Evaluation of the Diagnostic Stability of the Early Autism Spectrum Disorder Phenotype in the General Population Starting at 12 Months

This study examines the diagnostic stability of autism spectrum disorder in a large cohort of toddlers starting at 12 months of age and compares this stability with that of toddlers with other disorders.


Examining clinical characteristics of included vs excluded toddlers
Psychologists at our Center initially performed developmental evaluations with 2,241 toddlers. Within this sample, 1,269 toddlers were evaluated two or more times and are the focus of this study. Although meaningful clinical differences may exist between toddlers who were excluded from the current study and those who were included, the lack of a 2nd evaluation was primarily due to a toddler's age at his/her first diagnostic evaluation. That is, if a toddler was referred at 32 months or older, we did not schedule a follow-up evaluation given the considerable expense associated with in-depth diagnostic testing by licensed clinical psychologists. This is evident in our histogram (Figure 1A), which shows that very few toddlers referred at 32 months or older are included in this study (simply because we did not call them back for an evaluation, not because they dropped out or had missing data).
Another reason that some toddlers may not have had a 2nd diagnostic evaluation is that they are still waiting for it to occur. Our Center protocol is that toddlers are invited for additional diagnostic evaluations approximately once per year until age 3 years. Thus, a portion of our sample was evaluated within the past year and is awaiting a follow-up test visit.
We nonetheless compared the clinical characteristics of ASD toddlers who were excluded from the study (i.e., toddlers who had only one diagnostic evaluation) with those of toddlers included in the study (i.e., toddlers with two or more diagnostic evaluations) using t-tests. There were no statistically significant differences in clinical characteristics. As expected, however, non-included toddlers were significantly older at their first diagnostic evaluation visit. See eFigure 1.

Diagnostic criteria and DSM version
A toddler was assigned to one of the following diagnostic categories based on the following criteria:
Autism Spectrum Disorder (ASD): scored within the range of concern on the ADOS and was considered ASD based on DSM (DSM-IV or DSM-5) criteria and clinical judgment.
ASD Features: showed signs of autism and may have had an elevated ADOS score, but did not meet full criteria for ASD.
Developmental Delay (DD): >1 standard deviation below expected values on two or more areas of the Mullen, with at least one of those areas outside of the verbal scales.
Language Delay (LD): >1 standard deviation below expected values on the receptive subtest, the expressive subtest, or both on the Mullen.
Other: showed a developmental issue not captured in any of the aforementioned categories, including motor delay, social emotional delay, attention deficit and speech articulation impediment.
Toddlers were determined to be typically developing (TD) if they fell within the normal range on all clinical assessments, and TypSib if they also had a sibling with ASD. See Table 1 for subject characteristics based on each toddler's final diagnostic visit.
Data collection for the present study began with small subject samples between 2006-2007, with the bulk of subject data collection occurring between 2008-2018. On May 18th, 2013, a new version of the Diagnostic and Statistical Manual (DSM), the DSM-5, was released, and our Center adopted the new DSM-5 criteria on August 1st, 2013. Thus, toddlers who participated in all of their evaluations prior to August 2013 were diagnosed using only DSM-IV criteria, and toddlers who participated in all of their evaluations after August 1st, 2013 were diagnosed using only DSM-5 criteria. A percentage of toddlers (~20%) began the study during the DSM-IV period but ended during the DSM-5 period, and thus psychologists used both DSM versions as part of their diagnostic evaluation. eFigure 2 illustrates the percentage of toddlers within each diagnostic group who were evaluated under the DSM-IV only, the DSM-5 only, or both. As illustrated, within the ASD group, 52% were diagnosed using DSM-IV criteria only, 31% using DSM-5 only and 17% using both.
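The assignment rule described above can be sketched in a few lines. This is only an illustration, not the Center's actual procedure: the function name `dsm_exposure` is hypothetical, and treating an evaluation that falls exactly on August 1, 2013 as a DSM-5 evaluation is an assumption, since the text does not say how that day was handled.

```python
from datetime import date

# Adoption date stated in the text. Counting evaluations on this exact day
# as DSM-5 is an assumption made for illustration.
DSM5_ADOPTION = date(2013, 8, 1)

def dsm_exposure(evaluation_dates):
    """Classify a toddler as 'DSM-IV only', 'DSM-5 only', or 'both'."""
    if not evaluation_dates:
        raise ValueError("at least one evaluation date is required")
    before = any(d < DSM5_ADOPTION for d in evaluation_dates)
    after = any(d >= DSM5_ADOPTION for d in evaluation_dates)
    if before and after:
        return "both"
    return "DSM-5 only" if after else "DSM-IV only"

# Hypothetical evaluation dates, not study data.
print(dsm_exposure([date(2012, 3, 5), date(2013, 2, 11)]))   # DSM-IV only
print(dsm_exposure([date(2013, 1, 15), date(2014, 1, 20)]))  # both
```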

Collection of clinical history, parent feedback and treatment referral
Clinical History. The unique age and subject ascertainment procedure used in our study brings with it some variation in the quantity of historical information collected across subjects. With a modal age of 14 months, our cohort is very young and some parents brought their child for an evaluation solely because their pediatrician recommended they do so based on a failure of the screening tool we use in our early detection program, the CSBS, and not because they had any initial concerns. A brief history and parent concerns, if any, are gathered first by study coordinators when the family first contacts our Center, and then reviewed by the psychologist. The psychologist also asks the family about any concerns and past assessments, prior to starting the evaluation session. While the Vineland Adaptive Behavior Scales was collected on all subjects, in some cases, the Vineland was administered as an interview, and in other cases the parents filled out the form. In either case, the Vineland served to provide further history for the psychologist.
Parent Feedback and Treatment Referral. Feedback regarding standardized test results and general clinical impression was provided to parents following completion of testing at every evaluation visit. The child's strengths and weaknesses were discussed with the parents, and any evidence of developmental delay was reviewed in detail. Based on the delay, the child was immediately referred for appropriate community services (e.g., speech therapy for language delays, ABA-based services for delays related to ASD, etc.) with parent consent. Prior to age 3 years, the psychologist would share concerns for delay, and once the child turned 3 years, the psychologist made a formal diagnosis based on the child's current presentation. In our community, showing risk factors for ASD qualifies a child under age 3 years for autism-focused services, including ABA-based services, which are generally 8-10 hours per week. Therefore, it is standard protocol within our community to wait until age 3 years to make an ASD diagnosis, given progressive service availability and given that previous research has shown the diagnosis of ASD is reasonably stable at age 3 years (see 4 for a review). For research purposes, the psychologist completes a diagnostic judgment form following every evaluation visit to track diagnostic impression over time and to calculate the stability examined in the current study.

Description of psychologist training and reliability
At any given time across the study period, a total of two or three psychologists performed diagnostic evaluations. One of the diagnosticians, author CCB, achieved research reliability directly with the ADOS creator, Catherine Lord, prior to the study and was responsible for training new psychologists throughout the study period; training generally extended across 3-6 months. CCB has achieved the highest level of ADOS certification as a certified independent ADOS trainer. Across the study period, inter-rater ADOS reliability between psychologists was established at >.80 on algorithm items across all modules. Across the past year, inter-rater reliability has been assessed once every month for both diagnostic impression and ADOS algorithm items, with 100% concordance in overall diagnosis and an average of .89 on ADOS algorithm items.

Data verification and QC process
Using scatterplots, data were plotted for subdomain and total scores for each standardized test to visually identify data entry errors. A script was also written that generated a list of subjects whose scores fell outside of the possible range on that test (e.g., ADOS-2 Toddler Module Total score > 28). Data errors, which comprised <1% of all data, were fixed by retrieving the original test booklet used by the psychologist during testing and re-entering test scores.
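A range check of the kind described above might look like the following minimal sketch. It is not the script used in the study: the field name `ados_toddler_total`, the record layout and the lower bound of 0 are assumptions for illustration, and only the upper bound of 28 comes from the text.

```python
# Valid score ranges per test. The upper bound of 28 comes from the text;
# the lower bound of 0 and the field name are assumptions for illustration.
VALID_RANGES = {"ados_toddler_total": (0, 28)}

def find_out_of_range(records, valid_ranges=VALID_RANGES):
    """Return (subject_id, field, value) for every score outside its valid range."""
    flagged = []
    for rec in records:
        for field, (low, high) in valid_ranges.items():
            value = rec.get(field)
            # Missing scores are skipped here; they are not treated as errors.
            if value is not None and not (low <= value <= high):
                flagged.append((rec["subject_id"], field, value))
    return flagged

# Hypothetical records, not study data.
records = [
    {"subject_id": "S001", "ados_toddler_total": 14},
    {"subject_id": "S002", "ados_toddler_total": 41},  # impossible total: likely entry error
]
print(find_out_of_range(records))  # [('S002', 'ados_toddler_total', 41)]
```

Flagged entries would then be checked against the original test booklet, as described above.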

Mullen estimated T score
The minimum subscale T score based on the Mullen scoring manual is 20. However, some toddlers in our study performed at levels below 20, and we elected to generate a score for each such toddler that approximately reflected ability, rather than artificially assigning all of them a score of 20. In these cases, which occurred in 9.4% (120 toddlers) of our sample, we estimated T scores from the raw scores using the table for the child's chronological age, as follows. The estimated T score is calculated by examining the variation in raw scores for the lowest T scores available for the child's age and applying that variation to estimate a lower T score. For example, if the lowest raw score available for the child's age is 14 but the child actually has a raw score of 12, two steps would be counted down, and an estimated score would be calculated based on the difference between T scores for each raw score above the cutoff. So, if the raw score of 14 corresponded to a T of 20, and there was a 2-point difference between each T score above 20, then the estimated T score would be 16 (2 steps times a 2-point difference = 4, and thus the estimated T score is 20 - 4 = 16).
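The extrapolation in the worked example can be written as a short function. This is a sketch under the assumption that the T-score difference per raw-score step is constant near the floor; the function and parameter names are hypothetical, and the step size must be read from the Mullen table for the child's age.

```python
def estimate_t_score(raw, lowest_tabled_raw, t_step, floor_t=20):
    """Extrapolate a T score below the tabled floor.

    raw               -- the child's raw score
    lowest_tabled_raw -- lowest raw score in the table for the child's age
                         (the score that corresponds to floor_t)
    t_step            -- T-score difference between adjacent raw scores just
                         above the floor (assumed constant here)
    floor_t           -- minimum tabled T score (20 per the text)
    """
    if raw >= lowest_tabled_raw:
        raise ValueError("raw score is within the table; no estimate is needed")
    steps = lowest_tabled_raw - raw
    return floor_t - steps * t_step

# Worked example from the text: raw 12, lowest tabled raw 14, 2-point steps.
print(estimate_t_score(raw=12, lowest_tabled_raw=14, t_step=2))  # 16
```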

Diagnostic transition tables per 2-month age band
In addition to the overall diagnostic transition table presented in Figure 2, which was calculated for the entire 1,269-subject cohort regardless of age at first diagnosis, diagnostic transition tables were also created for each 2-month age band. These tables provide detail regarding the sample sizes per 2-month age band, as well as diagnostic stability coefficients within each age band and diagnostic category. See eFigure 3.

Employing a linear regression model to assess the effect of age at first DX on stability
In addition to using logistic regression as described in the main body of the paper, we also examined the potential effects of gender, age at first diagnosis, and diagnostic group (based on diagnosis at last visit) on the stability coefficients using a non-parametric linear regression model. To do so, the cohort was partitioned into 9 age bands based on 2-month age intervals up to 30 months (e.g., 12-13 months, 14-15 months, etc.). Given the relatively small ASD sample sizes at older ages, data from 30-36 months were collapsed into a single age band, for a grand total of 10 age bands. Stability coefficients were determined for each combination of age band, gender, and diagnostic group. Linear regression was next used to model stability coefficients with age (mean age within each age band; age effects were modeled with a B-spline method using three degrees of freedom), diagnostic group (7 groups), and gender (male, female) as predictors.
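The age-band partition described above can be sketched as follows. This is an illustration rather than the study's code: the function name `age_band` is hypothetical, and flooring fractional ages to whole months before banding is an assumption, since the text does not specify how fractional ages were handled.

```python
import math

def age_band(age_months):
    """Return the age-band label for an age at first diagnosis.

    Bands are 2 months wide from 12 to 29 months, plus one collapsed
    30-36 month band. Returns None outside the 12-36 month range.
    Flooring fractional ages to whole months is an assumption.
    """
    m = math.floor(age_months)
    if m < 12 or m > 36:
        return None
    if m >= 30:
        return "30-36"
    lower = 12 + 2 * ((m - 12) // 2)
    return f"{lower}-{lower + 1}"

# Enumerate every band by sweeping whole-month ages from 12 to 36.
ALL_BANDS = sorted({age_band(m) for m in range(12, 37)},
                   key=lambda b: int(b.split("-")[0]))
print(len(ALL_BANDS))                                  # 10
print(age_band(13.5), age_band(17.6), age_band(33))    # 12-13 16-17 30-36
```

Stability coefficients would then be computed within each combination of band, gender and diagnostic group before fitting the regression.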
The overall regression model was significant, R² = .49, p<.0001. There were no significant differences in stability based on sex (overall stability .84 for boys and .84 for girls; β = .0039, p=.934); thus, all stability coefficient data presented in tables and figures are reported collapsed across sex. Linear regression analyses revealed that stability coefficients were similar between ASD (.84) and TD (.78), p=.775. In contrast, significant differences were found between ASD and the remaining diagnostic groups (all ps <.0016). Within the ASD group, stability changed significantly across ages (χ² = 31.01, p=.0003; eFigure 4). Fisher's exact test revealed that only one age band, 12-13 months, was significantly lower than other age bands (stability coefficient .50, p=.00003; eFigure 5) and only one age band, 30-36 months, was significantly higher (stability coefficient .94, p=.026). Stability of ASD diagnosis increased to .79 by 14 months and .83 by 16 months (eFigure 5). Given the transient nature of many early delays 25, not surprisingly overall stability was low for the remaining delay groups (all coefficients <.50).

Impact of including toddlers with non-verbal age equivalents < 12 months
Developers of the ADOS Toddler Module note that the validity of the test might be weakest at the youngest ages and/or with toddlers who have not yet achieved a non-verbal mental age of 12 months. Luyster and colleagues (2009) state: "Pilot analyses indicated that children developmentally younger than 12 months consistently obtained elevated scores... We therefore set a lower cutoff of 12 months non-verbal mental age." Thus, one potential reason that diagnostic stability was lowest in the youngest age bands, particularly the 12-13 month age range, is that a percentage of toddlers had a non-verbal mental age lower than 12 months at the time of ADOS testing. To examine this possibility, we removed from the analyses 73 subjects (34 ASD, 1 ASD Features, 24 DD, 7 Other, 1 TypSib, and 6 TD) whose age equivalent scores were less than 12 months based on the visual reception subtest of the Mullen. See eTable 5 for more detail. Instead of improving diagnostic stability coefficients as would be predicted, removing these subjects actually lowered them. As such, we do not believe that the lower cutoff of 12 months is required in order to obtain a valid ADOS Toddler Module score. See eFigure 5.

Early, Middle and Late-Identified ASD in comparison to all diagnostic groups
In the main body of the paper, selected scores from the Mullen, Vineland and ADOS were compared between Early, Middle and Late-Identified toddlers with ASD. Typically developing toddlers who received a diagnosis of TD at every visit were used as a contrast. Here we provide an expanded visualization of clinical scores that includes all study groups. This may be useful when considering how very early identified ASD is similar to or different from global developmental delay or language delay. See eFigure 6.

eFigure 1. Examination of clinical characteristics between excluded (1 visit) and included (≥ 2 visits) toddlers
As illustrated, toddlers with only one diagnostic visit tend to have their first visit at older ages. However, the two groups are similar across a broad set of clinical characteristics at their first diagnostic visit, suggesting a minimal effect of selection bias in this regard.

Clinical characteristics of ASD group stratified by identification age and other DX groups
Violin plots illustrating clinical scores on the ADOS, Mullen expressive language subscale, Mullen receptive language subscale and the Vineland Adaptive Behavior Composite for toddlers with ASD who were identified between 12-18 months (Persistent ASD, Early Age DX), after 18 months (Persistent ASD, Middle Age DX), or who were not identified as ASD at their first diagnostic visit (Late Identified ASD), and for all other diagnostic groups. This figure is an expanded version of Figure 4 in the main paper, which included only TD toddlers with a persistent diagnosis of TD as a contrast group. This expanded figure highlights the fact that toddlers with ASD who were initially missed by the clinician (i.e., "Late Identified" ASD) were showing delays at their first diagnostic evaluation visit. These delays, however, were similar to those exhibited by toddlers categorized as DD or LD, underscoring the challenges of differential diagnosis during early development.

eFigure 5. Diagnostic stability plots by age at first diagnosis
Raw data plots illustrating diagnostic stability per group across 2-month age intervals based on the age at first diagnostic evaluation. Age intervals with missing data points reflect an absence of subjects who received their first diagnostic evaluation at that age. The B-spline regression line is shown in blue; gray bands represent 95% confidence intervals for the fit line. Overall stability was highest in toddlers initially designated as ASD or TD, as illustrated by the relatively tight confidence interval bands and the largely consistent stability coefficients.

eFigure 6. Diagnostic stability after removing toddlers with non-verbal mental age < 12 mo
In our cohort, 73 toddlers had a non-verbal mental age of less than 12 months. Comparisons between this figure and Figure 3 in the main body of the paper illustrate that removing toddlers with non-verbal mental ages <12 months does not improve diagnostic stability at 12 months (stability actually declines, from .50 to .44). See also eTable 5.

eTable 1. Overall logistic regression model
This model included four covariates: first diagnosis, age at first diagnosis, gender, and time interval between first and last diagnostic visits. The contrast group associated with each covariate is designated as "Contr". Due to rounding, proportions may not add to one.

eTable 4. Overall age adjusted and unadjusted stability coefficients
Both adjusted and unadjusted coefficients were determined based on within-diagnostic-group logistic regression models. Overall, unadjusted stability coefficients were obtained by a logistic regression without age as a covariate. Age-adjusted stability coefficients are based on a logistic regression model in which the non-linear effect of age was modeled using a B-spline method, and are evaluated at the median age at 1st diagnosis of our cohort across diagnosis groups (i.e., 17.6 months).

eTable 5. Distribution of toddlers with non-verbal age below 12 mo. across diagnosis groups
Data are stratified by age at first diagnosis (DX), and the number of subjects with a Mullen visual reception (VR) age equivalent score <12 months is denoted in the corresponding cell. In an exploratory analysis to determine the impact of VR score on ADOS results, subjects with a VR score <12 months were removed (eFigure 2).

Age at First DX (mo)
Age Band