Temporal Changes in Effect Sizes of Studies Comparing Individuals With and Without Autism

This meta-analysis assesses effect sizes for statistically significant group-level differences between individuals with autism and control individuals for 5 distinct psychological constructs and 2 neurologic markers.


eTable 1. Data for Emotion Recognition (Autism)
The meta-analysis by Chung et al. included only a coarse annotation of the specific method used in each study. Studies identified solely through this meta-analysis were examined and annotated with a task, following the task categories obtained from the remaining three meta-analyses of the emotion recognition construct.
From the meta-analysis by Uljarevic & Hamilton, only studies using the Ekman task were included, as the remaining studies used a wide range of test procedures. Both of the Ekman variants (Emotion Labelling and Emotion Matching) were included.

Title
Year

"Grey matter volume" was chosen for analysis over the more closely matching "total brain volume", as data for the latter was not provided in a readable format.
Since macrocephaly was compared to population-wide norms rather than individual control-groups, some NOS items could not be rated for the macrocephaly studies.

eTable 19. NOS Rating Criteria
The Newcastle-Ottawa Scale (NOS) for Assessing the Quality of Studies Included -AUTISM adaptation
0.5 point if a standardised assessment (ADOS and/or ADI and/or CARS) was used in a majority of autistic participants (80% and over).
0.5 additional point if the standardised assessment was combined with clinical judgment. "Clinical judgment" can be a clinical interview, a best-estimate diagnosis, or a team of professionals.
(Note: The diagnosis can have been made previously or reconfirmed for the study. A study with no standardised assessment gets a score of 0.)
Outcome (Tot=3)
**If the study contains more than one task/experiment, or many outcome measures (ex: score + response time + rating by examiner), please refer to the meta-analysis to verify which task was included in the meta-analysis.
Item 6: Ascertainment of outcome
1 point if the outcome is an objective measure (ex: score on a task, reaction time, volume of brain structures) OR the outcome is rated by an evaluator who is blind to group. (Note: most studies will get 1.)
0 points if the outcome is rated by an evaluator who is not blind to group.
Item 7: Same task/procedure in both groups
1 point if the same task/procedure is used in both groups for the task/variable of interest (ex: theory of mind task, brain volume…). (Note: most studies will get 1. An example of 0 would be a study in which autistics get the full task/battery and controls get a short version.)
Item 8: Loss of participants
1 point if loss of participants (ex: did not complete the task, technical difficulty, excluded for too low performance) is similar in both groups. "Similar loss of participants" means a between-group difference in loss of participants of at most 10%.
(Note: only consider participants lost because they did not complete the task or because their data could not be used. Do NOT consider participants excluded because they did not meet inclusion criteria. Ex: a participant who was recruited but in the end had too low an IQ for inclusion.)
n autistic group: number of participants used in analyses (after participant loss); if more than one autistic group, refer to meta-analysis to verify which group they kept or both
n control group: number of participants used in analyses (after participant loss)
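The Item 8 rule can be sketched in code. This is a minimal illustration, not the raters' actual procedure: it assumes that "10% of between-group difference" means a difference of at most 10 percentage points between the two groups' loss rates, and the function names and arguments (`loss_rate`, `score_item8`, recruited/analysed counts) are hypothetical.

```python
def loss_rate(recruited, analysed):
    """Share of eligible participants whose data were lost (0.0 to 1.0)."""
    if recruited <= 0 or not 0 <= analysed <= recruited:
        raise ValueError("need 0 <= analysed <= recruited and recruited > 0")
    return (recruited - analysed) / recruited


def score_item8(aut_recruited, aut_analysed, ctl_recruited, ctl_analysed,
                max_difference=0.10):
    """Return 1 if participant loss is similar in both groups, else 0.

    Assumption: "similar" is read as an absolute difference in loss rates
    of at most 10 percentage points. Participants excluded for not meeting
    inclusion criteria must already be left out of the recruited counts.
    """
    difference = abs(loss_rate(aut_recruited, aut_analysed)
                     - loss_rate(ctl_recruited, ctl_analysed))
    # Small tolerance so that a difference of exactly 10% still scores 1.
    return 1 if difference <= max_difference + 1e-12 else 0


# Hypothetical studies: 10% vs 0% loss, 20% vs 0% loss, 10% vs 15% loss.
example_scores = [
    score_item8(30, 27, 30, 30),
    score_item8(30, 24, 30, 30),
    score_item8(40, 36, 40, 34),
]
```

Under a relative reading of the 10% criterion the comparison would differ, so the threshold interpretation should be checked against the rating manual before reuse.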

Autism group composition Proportion of "strict" autism
Score 1 if majority of autistic participants have an "autism" (or High-functioning autism) diagnosis (minimum of 80% of the sample)
Score 0 if ASD (autism spectrum disorder/condition), or Asperger, or PDD (Pervasive developmental disorder), or a mix of diagnoses
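As a hedged illustration of the 80% rule above, the following sketch scores a sample from diagnosis counts. The input format (a mapping from diagnosis label to number of participants) and the names `STRICT_LABELS` and `score_strict_autism` are assumptions made for this example only.

```python
# Labels counted as "strict" autism in this sketch (hypothetical spelling).
STRICT_LABELS = {"autism", "high-functioning autism"}


def score_strict_autism(diagnosis_counts, threshold_percent=80):
    """Return 1 if at least 80% of autistic participants have a strict
    "autism" (or high-functioning autism) diagnosis, else 0."""
    total = sum(diagnosis_counts.values())
    if total <= 0:
        raise ValueError("sample must contain at least one participant")
    strict = sum(count for label, count in diagnosis_counts.items()
                 if label.lower() in STRICT_LABELS)
    # Integer comparison avoids rounding issues at exactly 80%.
    return 1 if 100 * strict >= threshold_percent * total else 0


# Hypothetical samples: 16/20 strict, all ASD, and a 50/50 mix.
composition_scores = [
    score_strict_autism({"autism": 16, "Asperger": 4}),
    score_strict_autism({"ASD": 20}),
    score_strict_autism({"High-functioning autism": 10, "PDD": 10}),
]
```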

Exclusion of syndromic autism
Score 1 if it is mentioned that autistic participants with a known genetic condition, or neurological conditions, were excluded. If it is not mentioned, then it's 0.
(Note: syndromic autism means with an identified genetic/neurologic condition like Fragile X or under identified mutations, or Tuberous sclerosis, etc. However, papers will rarely mention "syndromic autism" explicitly.)
IQ
IQ autistic group: mean IQ in autistic group (if more than one autistic group, refer to meta-analysis to verify which group they kept or both). Order of priority (take the first measure available in this order of priority):
Not likely to be an issue, because combining data from both meta-analyses yields a better representativeness of the autistic population (i.e. adolescents and adults)
Low, because the inclusion periods described by respective search strategies overlap significantly (see Table 21)
-Not likely to be an issue, because combining data from these meta-analyses yields a better representativeness of the autistic population (i.e. children, adolescents and adults).
-Not likely to be an issue, because combining data from these meta-analyses yields a broader assessment of the construct.
Low, because the inclusion periods described by respective search strategies overlap significantly (see Table 21)
Planning 1 No major differences between selection criteria - Low, because the inclusion periods described by respective search strategies overlap significantly (see Table 22)
Inhibition 1 No major differences between selection criteria - Low, because the inclusion periods described by respective search strategies overlap significantly (see Table 22)
Flexibility 3 No major differences between selection criteria - Low, because the inclusion periods described by respective search strategies overlap significantly (see Table 22)
P3b amplitude 0 As data were extracted from only one meta-analysis for this construct, there was no risk of differences between selection criteria from several meta-analyses. - N/A
Brain size 0 As we used data from only one meta-analysis for this construct, there was no risk of differences between selection criteria from several meta-analyses.

Social domain
Data for the analysis of emotion recognition was obtained from meta-analyses conducted by Chung et al., Leppanen et al., Peñuelas-Calvo, and Uljarevic & Hamilton. From these meta-analyses, we analysed 64 effect sizes from studies published from 1989 to 2017, based on a total of 3,895 participants. A regression analysis, with task, sample size, and publication year as independent factors, resulted in a significant effect of publication year (see Table 1). The slope estimate of the temporal trend was -0.028, meaning that the effect size decreased over time.
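The structure of such a regression can be sketched as follows. This is a minimal, unweighted ordinary least squares fit on synthetic data, not the authors' model or data: the study tuples, the coefficients used to simulate effect sizes, and all function names are assumptions made for illustration. Task is dummy coded, and publication year is centred on the earliest year.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("design matrix is rank deficient")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Synthetic studies: (task, sample size, publication year). Not real data.
STUDIES = [("labelling", 20, 1990), ("labelling", 35, 1998),
           ("labelling", 50, 2005), ("matching", 25, 1994),
           ("matching", 40, 2010), ("matching", 60, 2016)]


def simulated_effect(task, n, year):
    """Noise-free effect size with an arbitrary yearly slope of -0.03."""
    task_shift = 0.3 if task == "matching" else 0.0
    return 1.0 + task_shift + 0.001 * n - 0.03 * (year - 1990)


tasks = sorted({task for task, _, _ in STUDIES})   # first task is the reference
first_year = min(year for _, _, year in STUDIES)
design = [[1.0] + [1.0 if task == t else 0.0 for t in tasks[1:]]
          + [float(n), float(year - first_year)]
          for task, n, year in STUDIES]
effects = [simulated_effect(*study) for study in STUDIES]

coefficients = fit_ols(design, effects)
year_slope = coefficients[-1]   # change in effect size per publication year
```

A negative `year_slope` corresponds to effect sizes shrinking over time. A meta-regression would typically also weight studies by precision, which this sketch omits.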
Data for the analysis of theory of mind was obtained from the meta-analyses conducted by Chung et al. and Leppanen et al. We identified 62 effect sizes from studies published from 1992 to 2017, based on a total of 4,478 participants. For theory of mind, the temporal trend was significant, and the slope was estimated to be -0.045. For one task (strange stories), there was evidence of the Proteus phenomenon: the first study found a much larger effect size than the other studies and had a studentized residual above the 95th percentile. We tested the influence of this data point by repeating the analysis without this study, which still showed a significant effect of publication year (p < 0.001), with a slope of -0.032.
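The outlier check described above can be illustrated with internally studentized residuals from a simple regression of effect size on publication year. This is a sketch under stated assumptions: the years and effect sizes are invented, the full model's other covariates are left out, and the comparison against a 95th percentile is omitted because the reference distribution is not specified here.

```python
import math


def studentized_residuals(x, y):
    """Internally studentized residuals for a simple linear regression.

    Each residual e_i is divided by s * sqrt(1 - h_i), where s is the
    residual standard error and h_i is the leverage of observation i.
    """
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / sxx
    intercept = y_mean - slope * x_mean
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s_squared = sum(e * e for e in residuals) / (n - 2)
    leverages = [1 / n + (xi - x_mean) ** 2 / sxx for xi in x]
    return [e / math.sqrt(s_squared * (1 - h))
            for e, h in zip(residuals, leverages)]


# Invented data: the earliest study reports a much larger effect size.
years = [1994, 1999, 2003, 2007, 2011, 2015]
effect_sizes = [3.0, 0.9, 0.8, 0.8, 0.7, 0.6]

values = studentized_residuals(years, effect_sizes)
most_extreme = max(range(len(values)), key=lambda i: abs(values[i]))
```

Here `most_extreme` is the index of the study with the largest absolute studentized residual. Refitting the regression without that study is the sensitivity analysis described in the text.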

Executive domain
We explored three executive constructs: cognitive flexibility, planning, and inhibition. The data on cognitive flexibility was obtained from three meta-analyses conducted by Landry & Al-Taie, Lai et al., and Westwood et al. We included 51 effect sizes from studies published from 1985 to 2015, based on a total of 3,137 participants. The slope for publication year was estimated to be -0.013. Effect sizes from one study, Ozonoff 1994 study 2, deviated substantially from those of almost all other studies and could thus be considered outliers. This unusual result was also noted by the authors themselves, and a replication of the study (Ozonoff 1994, study 3) found results consistent with the remaining literature. When the outlying effect sizes were excluded from the analysis, the results changed markedly: the slope was estimated to be -0.018, and the effect of publication year became significant (p = 0.02).
We examined the planning construct using data from the meta-analyses of Olde Dubbelink & Geurts and Lai et al. We included 46 effect sizes published from 1994 to 2015, based on a total of 3,033 participants. In addition to task type, the studies were sorted by the applied outcome metric, as this varied between studies. The analysis of planning resulted in a significant slope for publication year of -0.067.
The inhibition construct was explored by analyzing data obtained from Geurts et al. and Lai et al. We included 71 effect sizes from studies published from 1994 to 2015, based on a total of 4,460 participants. As with the analysis of planning, the studies were sorted by task and outcome metric. The slope for inhibition was estimated to be -0.003.

Neurological domain
Data for the analysis of P3b amplitude was obtained from a meta-analysis conducted by Cui et al. (34). We included 14 effect sizes from studies published from 1980 to 2014, based on a total of 374 participants. The studies were partitioned by task type based on which modality was investigated within each study. The analysis of P3b amplitude resulted in a significant slope of -0.048.
Data for the brain size construct was obtained from a meta-analysis by Sacco et al. (35). In total, 89 effect sizes were obtained from studies published from 1994 to 2014, based on a total of 8,326 participants. The brain size construct showed a significant decrease in effect size over time, with a slope of -0.047.