Shared Neural Phenotypes for Mood and Anxiety Disorders

Key Points
Question: Is the clinical overlap seen in major depressive disorder, bipolar disorder, anxiety disorders, and posttraumatic stress disorder reflected at the neurobiological level?
Findings: In this meta-analysis of 226 task-related functional imaging studies, transdiagnostic clusters of hypoactivation were identified in the inferior prefrontal cortex/insula, inferior parietal lobule, and putamen.
Meaning: Across mood and anxiety disorders, the most consistent transdiagnostic abnormalities in task-related brain activity converge in regions that are primarily associated with inhibitory control and salience processing.

Cognitive tasks: working memory, declarative episodic memory, executive function
Affective: processing of affective stimuli including pain and mental imagery of trauma-related images

Number of clusters of convergent case-control differences identified in meta-analyses of functional neuroimaging studies in mood, anxiety and posttraumatic stress disorders
The meta-analyses presented in this graph are detailed in eTable 1. The x-axis depicts the year of publication of each meta-analysis. The y-axis indicates the number of clusters reported in each meta-analysis. The size of the circles indicates the number of participants included in the meta-analysis. Violet = meta-analyses on major depressive disorder; Green = meta-analyses on bipolar disorder; Blue = meta-analyses focusing on anxiety / post-traumatic stress disorders.

Eligibility Criteria for Article Selection
We included articles that (a) examined adults aged 18-65 years; (b) used the diagnostic criteria of the Diagnostic and Statistical Manual of Mental Disorders (DSM) or the International Statistical Classification of Diseases and Related Health Problems (16); (c) studied healthy individuals and patients with major depressive disorder, bipolar disorder, generalized anxiety disorder, panic disorder, agoraphobia, specific and social phobias, and post-traumatic stress disorder as separate groups; (d) investigated case-control differences originating from tasks comparing an active to a control condition, rather than to rest; (e) reported case-control differences arising from whole-brain analyses as coordinates in Talairach or Montreal Neurological Institute standard reference space. When the same sample was studied longitudinally (in either observational or interventional designs) using the same experiments, we only included the article that reported the baseline findings. When articles reported results from overlapping samples, we included the article with the largest sample size. Three authors (DJ, DAM, SF) independently reviewed all the articles to determine eligibility (pairwise interclass correlations >0.91) and reached consensus about inclusion in cases of divergence in opinion.

Results of Literature Search
The 226 articles identified through the literature search comprised 83 articles on major depressive disorder, 66 on bipolar disorder, 35 on post-traumatic stress disorder, 6 on generalized anxiety disorder, 6 on panic disorder and agoraphobia, 8 on specific phobia and 22 on social phobia. Of the 66 articles on bipolar disorder, 47 included patients with bipolar disorder type I, 5 included patients with bipolar disorder type II, and 14 did not provide information about type. Of the 66 articles on bipolar disorder, 45 included patients that were described as not psychotic at the time of scanning, 2 included patients with psychotic features and 19 did not provide information regarding psychotic features in their sample.

Database Construction

7a. Samples Considered
When articles divided their sample into subgroups (e.g., patients subdivided by trauma type exposure), we included results originating from the entire sample and not the subgroups. When a single article examined the same sample longitudinally using the same fMRI tasks, we only included the results at baseline. Conversely, when the same group of participants performed different tasks (e.g., a facial affect task and a working memory task), we included results from each task. The following were recorded from each article separately for patient and control groups: number of participants, age, sex (% male), and diagnostic classification system (e.g., DSM-IV, ICD-10). The medication status of the patient groups (% receiving any psychotropic medication) was also recorded; this was 35% for major depressive disorder, 78% for bipolar disorder, 14% for post-traumatic stress disorder and 14% for anxiety disorders.

7b. Experiments Considered
Experiments (i.e., sets of coordinates) were only included if derived from whole-brain analyses. When needed, coordinates were transformed from Montreal Neurological Institute space to Talairach space using the icbm_other2tal transformation [27]. For each task, we included the coordinates from the contrast between the active vs control condition; coordinates from resting-state or association analyses (e.g., correlation with clinical features) were not included. We included coordinates from multiple contrasts originating from the same task if they corresponded to different RDoC domains/constructs. For example, the monetary incentive delay task yields coordinates for contrasts between a control condition and reward anticipation or reward receipt; the coordinates of both such contrasts would be included as they respectively map to the constructs of approach/motivation and reward attainment. Based on this criterion, 26 studies contributed more than one experiment to the dataset. For tasks with multiple levels of difficulty, we included only the coordinates of the contrast corresponding to the most difficult condition. Coordinates from contrasts between different stimuli of the same active condition were not included if a control condition was available (e.g., in facial affect processing tasks contrasting emotional and neutral facial expressions, the contrasts between different emotional facial expressions were not included).

7c. Diagnostic coding of experiments
Coordinates from experiments in articles on major depressive disorder, bipolar disorder and post-traumatic stress disorder were coded accordingly. For experiments in patients with generalized anxiety disorder, panic disorder and agoraphobia, or specific and social phobias, we used the single coding of "anxiety disorders"; this is because the diagnostic boundaries of these disorders are unclear and because the number of experiments for each separate anxiety disorder was below the recommended minimum of 20 [28]. Post-traumatic stress disorder was coded separately since it has been placed in the new category of trauma and stressor-related disorders in DSM-5. Each experiment was further coded by the direction of change in brain activity in patients relative to healthy individuals (hypo- or hyperactivation).

7d. RDoC Coding of experiments
We used the RDoC framework to code experiments (eTables 2-5) by domain and construct. The RDoC proposes a modular organization of brain function into positive valence systems, negative valence systems, cognitive systems, social processes and arousal and regulatory systems (https://www.nimh.nih.gov/researchpriorities/rdoc/constructs/rdoc-matrix.shtml). Experiments that combined mechanisms attributable to more than one domain (e.g., affective Go/NoGo) were designated cross-domain (See eMethods and eTables 2-5).

7e. Coding Experiments according to symptom severity
Patients' symptom severity was based on the mean score of the instrument used to rate their symptoms in the primary study. When multiple instruments were used, we extracted only the mean value and standard deviation (SD) of the instrument that was most commonly employed in all other studies. The name of the instrument and the mean (SD) psychopathology rating per study are shown in eTables 3-5. To accommodate the various instruments, their ratings were scored as "minimal/mild", "moderate" and "severe". For most scales it was possible to conduct this scoring based on the instrument's manual. For scales without scoring guidelines or standardized values, we used the tertile scores.
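The tertile-based fallback can be illustrated with a minimal sketch. The cut-off convention (inclusive quantiles, with scores equal to a cut-off assigned to the lower category) and the example scores are assumptions made for illustration only; the source does not specify how boundary values were handled.

```python
from statistics import quantiles

def tertile_labels(mean_scores):
    """Assign a severity label to each study's mean score using tertile cut-offs."""
    # quantiles(n=3) returns the two cut points that split the scores into thirds
    low_cut, high_cut = quantiles(mean_scores, n=3, method="inclusive")
    labels = []
    for score in mean_scores:
        if score <= low_cut:
            labels.append("minimal/mild")
        elif score <= high_cut:
            labels.append("moderate")
        else:
            labels.append("severe")
    return labels

# Hypothetical mean ratings from six studies that used the same instrument
scores = [4.0, 7.5, 11.0, 15.0, 19.5, 26.0]
labels = tertile_labels(scores)
```

Here the tertiles are computed across the mean scores of all studies that share an instrument, so the labels are relative to the included literature rather than to clinical norms.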

Activation Likelihood Estimation (ALE)
The ALE algorithm tests whether the spatial distribution within the brain of the peak coordinates (foci) from the experiments included in a meta-analysis differs from a random distribution [28][29][30]. The foci of each experiment are modeled as centers of 3D Gaussian probability distributions that account for the spatial uncertainty associated with each focus. The full-width-at-half-maximum of these probability distributions is based on empirical data on the between-subject and between-template variance. The between-subject variance is weighted by the size of the sample that performed the experiment, so that experiments from larger samples have a higher localizing power. The probabilities of all foci associated with each experiment are aggregated by taking, at each voxel, the highest probability associated with any focus reported for that experiment. This approach ensures that foci from a single experiment that are in close vicinity do not exert a cumulative influence on probability values. A modeled voxel-wise activation map is then created for each experiment. The combination of all activation maps yields voxel-wise ALE scores that describe the convergence of results at each location of the brain. To distinguish true spatial convergence across experiments from random overlap, a random effects model is used to compare ALE scores against an analytically derived null-distribution map reflecting a random spatial association between experiments. The p-value of a given voxel-wise ALE score represents the proportion of equal or higher values obtained under the null distribution. The resulting non-parametric uncorrected voxel-wise p-values are thresholded at the cluster-forming threshold of P<0.001. The size of the clusters surviving this threshold is then compared against a null distribution of cluster sizes derived by simulating 5000 datasets of randomly distributed foci with otherwise identical properties (number of foci, uncertainty) to the original dataset. Family-wise error correction at P<0.05 is then applied to this distribution to identify the cluster size that is exceeded in only 5% of all random simulations.
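The first two steps described above (per-experiment modeled activation maps, then their combination into ALE scores) can be sketched as follows. This is an illustrative toy under stated assumptions, not the software used in the analyses: coordinates are in voxel units on a small hypothetical grid, the FWHM values are made up rather than derived from the empirical sample-size model, and the combination rule is the standard probabilistic union, 1 - prod(1 - MA). The null distribution and cluster-level thresholding are not reproduced.

```python
import math
from itertools import product

def modeled_activation(foci, fwhm, shape):
    """Per-experiment modeled activation (MA) map.

    Each focus is a 3D Gaussian; at every voxel only the highest probability
    over the experiment's foci is kept, so nearby foci do not accumulate.
    """
    sigma = fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    norm = (2.0 * math.pi) ** 1.5 * sigma ** 3  # Gaussian density, unit voxel volume assumed
    ma = {}
    for voxel in product(*(range(n) for n in shape)):
        best = 0.0
        for focus in foci:
            d2 = sum((v - f) ** 2 for v, f in zip(voxel, focus))
            best = max(best, math.exp(-d2 / (2.0 * sigma ** 2)) / norm)
        ma[voxel] = best
    return ma

def ale_map(ma_maps):
    """Combine MA maps voxel-wise as a union of probabilities: 1 - prod(1 - MA_i)."""
    ale = {}
    for voxel in ma_maps[0]:
        not_active = 1.0
        for ma in ma_maps:
            not_active *= 1.0 - ma[voxel]
        ale[voxel] = 1.0 - not_active
    return ale

# Hypothetical experiments on a 9x9x9 grid; the narrower kernel stands in
# for an experiment with a larger sample (higher localizing power).
SHAPE = (9, 9, 9)
exp_a = modeled_activation([(4, 4, 4), (4, 4, 5)], fwhm=4.0, shape=SHAPE)
exp_b = modeled_activation([(4, 4, 4)], fwhm=6.0, shape=SHAPE)
exp_c = modeled_activation([(1, 7, 2)], fwhm=6.0, shape=SHAPE)
ale = ale_map([exp_a, exp_b, exp_c])
```

In this toy example the ALE score is highest where two experiments report a focus at the same location, and the two adjacent foci of the first experiment contribute no more at that location than a single focus would.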

8a. Rationale for excluding results from region-of-interest analyses
While region-of-interest (ROI) analyses are informative in the investigation of the neural correlates of processes that are known or are predicted to involve specific regions, including them in quantitative coordinate-based meta-analyses would almost surely bias results by inflating the contribution of the corresponding regions. This is because the fundamental assumption of the Activation Likelihood Estimation approach [28][29][30] is that each voxel has the same a priori chance of differentiating cases from controls (null hypothesis). To use a simple example, if all of the included studies examined only the amygdala and some of them consequently reported activation of this structure, it would be almost certain that "amygdala activation" would emerge as a significant finding against a null hypothesis of random spatial convergence across the entire brain. However, this "finding" would reflect nothing more than the propagation of the bias of looking exclusively in the amygdala. We have demonstrated such biases in our previous work by Sprooten et al. [31], where we compared ROI and whole-brain analyses across diagnoses; we found that ROI analyses led to significant over-representation of case-control differences in some brain regions (such as the amygdala). At the same time, ROI studies led to significant under-representation of other brain regions (such as the thalamus), which are rarely included in ROI analyses but often show case-control differences in studies using whole-brain analyses. Thus, ROI analyses were excluded as they would artificially bias results in favor of voxels within these regions by violating the fundamental ALE null hypothesis.

8b. Rationale for pooling results across diagnoses
Our main analyses pooled results across diagnoses for two reasons. First, mood, post-traumatic stress and anxiety disorders are often comorbid, and hence the pooled analyses accommodate uncertainty about the symptomatic and syndromal boundaries between them. Comorbidity is often not assessed in the primary studies, and it is therefore difficult to estimate its prevalence in the samples examined and its potential contribution to the neuroimaging results. Second, pooling results across diagnoses balances power, specificity and sensitivity and allows a data-driven identification of the relative significance of each suprathreshold cluster for each diagnosis, based on their contribution to the likelihood of hypo- or hyperactivation in that particular cluster.

8c. Rationale for pooling results across tasks and cognitive domains/constructs
The Research Domain Criteria (RDoC) framework [32] provides a principled way of classifying neurocognitive tasks according to their presumed association with recognized brain circuits. However, there is no one-to-one correspondence between tasks and brain circuits; the relationship between brain structure and function has been described both as pluripotent (one-to-many) and degenerate (many-to-one) [33,34]. Therefore, any given task engages brain regions outside those predicted by the cognitive mechanisms attributed to that particular task, while a single brain area may be activated by disparate tasks that may not share cognitive components [32][33][34]. In this context, pooling results across tasks and estimating the contribution of each RDoC domain/construct to each suprathreshold cluster accommodates the pluripotency of the tasks and offers a more realistic representation of their relevance to case-control differences.

Reproducibility and Ancillary Analyses
To test the reproducibility of results, we repeated the meta-analyses using the Seed-Based d Mapping software (SDM, version 5.15, http://www.sdmproject.com, formerly "Signed Differential Mapping") [34,35]. The SDM software creates maps of the effect size of case-control differences in fMRI studies by converting the t-value of each peak to Hedges' effect size and then applying an anisotropic non-normalized Gaussian kernel so that voxels more correlated with the peak have higher effect sizes. Following this, a mean map is created by performing a voxel-wise calculation of the random effects mean of the study maps, weighted by sample size and variance of each study and between-study heterogeneity. The distribution of the resulting Z-values typically deviates from normality, and thus their deviation from a null distribution (i.e., random distribution) is empirically estimated using permutation statistics (i.e., randomizations of effect sizes across voxels). All the analyses conducted here were based on 50 permutations. Statistically significant clusters of convergence are identified using voxel-wise uncorrected P<0.005, peak-level threshold of Z > 1 and cluster size > 10 voxels.
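The effect-size conversion and random-effects averaging described above can be sketched for a single voxel as follows. This is a textbook illustration and not the SDM implementation: the t-to-g conversion uses the standard small-sample correction, the between-study heterogeneity is estimated with the DerSimonian-Laird method (an assumption, as the estimator used by SDM is not stated here), and the anisotropic kernel and permutation steps are omitted. The t-values and group sizes are hypothetical.

```python
import math

def hedges_g_from_t(t, n_patients, n_controls):
    """Convert a two-sample t-value into Hedges' g and its approximate variance."""
    df = n_patients + n_controls - 2
    d = t * math.sqrt(1.0 / n_patients + 1.0 / n_controls)  # Cohen's d
    j = 1.0 - 3.0 / (4.0 * df - 1.0)  # small-sample correction factor
    n_total = n_patients + n_controls
    var_d = n_total / (n_patients * n_controls) + d ** 2 / (2.0 * n_total)
    return j * d, j ** 2 * var_d

def random_effects_mean(effects, variances):
    """DerSimonian-Laird random-effects mean of study effect sizes at one voxel."""
    w = [1.0 / v for v in variances]
    fixed = sum(wi * g for wi, g in zip(w, effects)) / sum(w)
    q = sum(wi * (g - fixed) ** 2 for wi, g in zip(w, effects))  # heterogeneity statistic
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(effects) - 1)) / c) if c > 0 else 0.0
    w_star = [1.0 / (v + tau2) for v in variances]  # weights include between-study variance
    return sum(wi * g for wi, g in zip(w_star, effects)) / sum(w_star)

# Hypothetical peak t-values and group sizes (patients, controls) from three studies
studies = [(2.0, 20, 20), (3.1, 15, 18), (1.2, 40, 35)]
g_and_var = [hedges_g_from_t(t, n1, n2) for t, n1, n2 in studies]
pooled = random_effects_mean([g for g, _ in g_and_var], [v for _, v in g_and_var])
```

Larger studies have smaller variances and therefore receive more weight, while a non-zero between-study variance pulls the weights closer together.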

Ancillary Analyses
Using the Activation Likelihood Estimation methodology described above, we conducted the following ancillary analyses: (a) Our main analyses did not identify any clusters of hyperactivation; however, current models of affective morbidity emphasize regions of hyperactivation in response to emotionally valenced stimuli. We therefore conducted separate meta-analyses focused only on affective experiments pooled across all diagnoses. (b) We conducted separate diagnosis-specific meta-analyses, each restricted to studies on major depressive disorder, posttraumatic stress disorder, anxiety disorders, or bipolar disorder. Details of the study samples and experiments involved are shown in Table 1 and eTable 6.
Additionally, for each suprathreshold cluster derived from the main analyses, we extracted the per-voxel probability of functional change from the modelled activation maps in order to estimate (a) the contribution of symptom severity to each cluster and (b) the contribution of each task type to the suprathreshold clusters.
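One way such contributions could be computed is sketched below. The exact formula used in the analyses is not given in this section, so the approach shown (summing each experiment's modeled activation values over the cluster voxels, grouping by category, and expressing each group as a percentage of the total) is an assumption, and the maps, labels and cluster are hypothetical.

```python
from collections import defaultdict

def cluster_contributions(ma_maps, labels, cluster_voxels):
    """Percentage contribution of each category to one suprathreshold cluster.

    ma_maps: one dict per experiment mapping voxel -> modeled activation probability
    labels: one category per experiment (e.g., severity level or task type)
    cluster_voxels: voxels belonging to the cluster
    """
    totals = defaultdict(float)
    for ma, label in zip(ma_maps, labels):
        totals[label] += sum(ma.get(v, 0.0) for v in cluster_voxels)
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {label: 100.0 * t / grand_total for label, t in totals.items()}

# Hypothetical two-voxel cluster and three experiments
cluster = [(0, 0, 0), (0, 0, 1)]
maps = [
    {(0, 0, 0): 0.02, (0, 0, 1): 0.01},
    {(0, 0, 0): 0.01},
    {(0, 0, 1): 0.01, (5, 5, 5): 0.5},  # the voxel outside the cluster is ignored
]
severity = ["severe", "moderate", "severe"]
contrib = cluster_contributions(maps, severity, cluster)
```

The same function applies unchanged when the labels are task types or RDoC domains instead of severity levels.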