This procedure sequentially groups symptoms according to the similarity of their responses across a patient cohort. With this procedure, groups of symptoms that merge at high values relative to the merge points of their subgroups are considered candidates for natural clusters. A and B, In the Quick Inventory of Depressive Symptomatology–Self Report (QIDS-SR) checklist, we identified an identical 3-cluster solution in both the Sequenced Treatment Alternatives to Relieve Depression (STAR*D) (n = 4017) and Combining Medications to Enhance Depression Outcomes (CO-MED) trials (n = 640). C, A comparable symptom structure was also observed at baseline for STAR*D patients when measured according to the Hamilton Depression (HAM-D) rating scale. The names of the individual checklist items are colored according to their cluster assignment. Line lengths in the dendrogram reflect how similar items or clusters are to one another (shorter line length indicates greater similarity).
A, Measured according to the Quick Inventory of Depressive Symptomatology–Self Report (QIDS-SR) checklist in the Sequenced Treatment Alternatives to Relieve Depression (STAR*D) and Combining Medications to Enhance Depression Outcomes (CO-MED) trials (12 weeks). B, Measured according to the Hamilton Depression (HAM-D) rating scale in 7 phase 3, placebo-controlled trials of duloxetine (8 weeks). The y-axes represent mean severity within a cluster and so should be multiplied by the number of symptoms within a cluster to convert to original units.
For each symptom cluster, a new model was trained on patients who received citalopram in the Sequenced Treatment Alternatives to Relieve Depression (STAR*D) trial (A). After cross-validation, we applied the models to patients in 3 treatment arms of the Combining Medications to Enhance Depression Outcomes (CO-MED) trial (B) to test their ability to generalize to an independent clinical trial sample. Core emotional symptoms could be predicted with significantly above-chance performance in the escitalopram with placebo and venlafaxine with mirtazapine arms. Sleep/insomnia symptoms could be predicted above chance for escitalopram with bupropion.
eMethods. Detailed Methodology
eFigure 1. Study Overview
eFigure 2. Symptom Trajectories for the Individual Excluded Weight and Appetite Symptoms in STAR*D and COMED
eTable 1. Factor Loadings for the Three Factor Solution
eFigure 3. Factor Analysis Scree Plot
eTable 2. Factor Loadings (Oblique Rotation)
eFigure 4. Euclidean Distance Metric
eFigure 5. Divisive Clustering
eFigure 6. Clustering After 9 Weeks of Treatment
eFigure 7. Clustering Data From All Time Points
eFigure 8. Clustering All 16-Items of the QIDS Checklist in STAR*D and COMED
eTable 3. Manhattan Distance Matrix
eFigure 9. Silhouette Analysis of Main 12-Item Clustering
eResults. Supplementary Efficacy Analysis
eFigure 10. Dose Dependence Curves for Duloxetine and Placebo
eTable 4. STAR*D/COMED Slope Contrasts
eTable 5. Duloxetine Trial Slope Contrasts
eTable 6. Effect Size (Total Cluster Severity, Measured in QIDS Units)
eTable 7. Effect Size (Total Cluster Severity, in HAM-D Units)
eTable 8. GBM Model Performance in STAR*D During Repeated 10-Fold Cross Validation
eTable 9. Performance of a Simplified Analysis Pipeline (Combined Elastic Net and GLM) in STAR*D During Repeated 10-Fold Cross Validation
eFigure 11. Illustrating the External Validation of Our Simplified Analysis Pipeline
eDiscussion. Supplementary Machine Learning Discussion
eTable 10. Complete Variable List
Chekroud AM, Gueorguieva R, Krumholz HM, Trivedi MH, Krystal JH, McCarthy G. Reevaluating the Efficacy and Predictability of Antidepressant Treatments: A Symptom Clustering Approach. JAMA Psychiatry. 2017;74(4):370–378. doi:10.1001/jamapsychiatry.2017.0025
Are antidepressants equally good at treating different kinds of symptoms in depression?
Individual patient data from 9 clinical trials of major depression in 7221 patients were analyzed, with a focus on specific clusters of symptoms rather than total depressive severity. For each cluster, significant differences in efficacy between antidepressants were identified.
Antidepressant medications can be selected to benefit specific clusters of symptoms in depression.
Depressive severity is typically measured according to total scores on questionnaires that include a diverse range of symptoms despite convincing evidence that depression is not a unitary construct. When evaluated according to aggregate measurements, treatment efficacy is generally modest and differences in efficacy between antidepressant therapies are small.
To determine the efficacy of antidepressant treatments on empirically defined groups of symptoms and examine the replicability of these groups.
Design, Setting, and Participants
Patient-reported data on patients with depression from the Sequenced Treatment Alternatives to Relieve Depression (STAR*D) trial (n = 4039) were used to identify clusters of symptoms in a depressive symptom checklist. The findings were then replicated using the Combining Medications to Enhance Depression Outcomes (CO-MED) trial (n = 640). Mixed-effects regression analysis was then performed to determine whether observed symptom clusters have differential response trajectories using intent-to-treat data from both trials (n = 4706) along with 7 additional placebo and active-comparator phase 3 trials of duloxetine (n = 2515). Finally, outcomes for each cluster were estimated separately using machine-learning approaches. The study was conducted from October 28, 2014, to May 19, 2016.
Main Outcomes and Measures
Twelve items from the self-reported Quick Inventory of Depressive Symptomatology (QIDS-SR) scale and 14 items from the clinician-rated Hamilton Depression (HAM-D) rating scale. Higher scores on the measures indicate greater severity of the symptoms.
Of the 4706 patients included in the first analysis, 1722 (36.6%) were male; mean (SD) age was 41.2 (13.3) years. Of the 2515 patients included in the second analysis, 855 (34.0%) were male; mean (SD) age was 42.65 (12.17) years. Three symptom clusters in the QIDS-SR scale were identified at baseline in STAR*D. This 3-cluster solution was replicated in CO-MED and was similar for the HAM-D scale. Antidepressants in general (8 of 9 treatments) were more effective for core emotional symptoms than for sleep or atypical symptoms. Differences in efficacy between drugs were often greater than the difference in efficacy between treatments and placebo. For example, high-dose duloxetine outperformed escitalopram in treating core emotional symptoms (effect size, 2.3 HAM-D points during 8 weeks; 95% CI, 1.6 to 3.1; P < .001), but escitalopram was not significantly different from placebo (effect size, 0.03 HAM-D points; 95% CI, −0.7 to 0.8; P = .94).
Conclusions and Relevance
Two common checklists used to measure depressive severity can produce statistically reliable clusters of symptoms. These clusters differ in their responsiveness to treatment both within and across different antidepressant medications. Selecting the best drug for a given cluster may have a bigger benefit than that gained by use of an active compound vs a placebo.
Meta-analyses1 and factor analytic studies of large populations with depression2,3 indicate that the symptoms of major depressive disorder are organized into 2 to 5 clusters depending on the checklist used. Nevertheless, clinical trials of patients with depression nearly always report total symptom severity scores as their primary outcome measures. These studies also frequently report the proportion of patients whose total symptom severity falls below a certain threshold and thus achieve clinical response or remission.4 Few patients reach remission with their initial treatment, although depression eventually remits in most patients after a largely trial-and-error treatment selection process.5 Statistical models might improve clinical outcomes by accelerating the treatment matching process. Despite concerted efforts using genomic data,6 structural and functional magnetic resonance imaging,7 and machine learning of clinical data,8 performance in predicting outcomes remains modest.9,10
Heterogeneity among depressive symptoms may impede the evaluation of treatments for depression.11,12 For example, treatment efficacy for one group of symptoms may be masked by a lack of efficacy for other symptoms, potentially explaining mixed results from large comparative efficacy meta-analyses.4,13 For example, selective serotonin reuptake inhibitors are generally effective in reducing low mood14 relative to other symptoms. However, evaluating outcomes on an individual symptom level may be cumbersome since clinicians would need to remember treatment guidelines specific to each symptom. Although symptoms might be grouped based on clinical experience (eg, “melancholic depression”)15 or the use of rating subscales (eg, Hamilton Rating Scale for Depression–7), novel associations might be overlooked by this process.
Statistical methods enable one to categorize depressive symptoms into subcomponents. For example, one study showed that nortriptyline hydrochloride is more effective than escitalopram in treating a neurovegetative symptom dimension, but escitalopram was more effective in treating mood and cognitive symptom dimensions.16 However, traditional statistical approaches have some shortcomings. Factor analyses, for example, may generate complicated combinations of symptoms within particular dimensions.16 These analyses also may be susceptible to experimenter bias since one often has to choose the desired number of clusters or components in the data, as in k means clustering.17 By contrast, hierarchical clustering is an easy-to-visualize, deterministic method in which each symptom is assigned to a single cluster (ie, not loading across multiple clusters) without prespecifying the desired number of clusters.
In this study, we explored the efficacy and predictability of antidepressant therapies in treating specific groups of symptoms (eMethods [which includes eTables 1-10 of various analyses] and eFigure 1 in the Supplement). We used an unsupervised machine-learning approach (hierarchical clustering) to establish a data-driven grouping of baseline symptoms. The clustering method was applied to patients from a large multisite trial of depression and a replication sample from an independent clinical trial with similar inclusion criteria. Next, we reanalyzed treatment outcomes for 9 archival clinical trials (Table 1) according to the severity of each symptom cluster (rather than total severity) to determine whether symptom clusters are equally responsive to antidepressant treatments and whether certain drugs and doses are more effective than others. Finally, we used supervised machine learning to predict outcomes specific to each cluster of symptoms since there may be good clinical or biological indicators of changes in some symptoms that do not correlate strongly with changes in other features of depression.
The Sequenced Treatment Alternatives to Relieve Depression (STAR*D) trial is the largest prospective, randomized clinical trial of outpatients with major depressive disorder.18-21 Eligible participants were treatment-seeking outpatients with a primary clinical (DSM-IV) diagnosis of nonpsychotic major depressive disorder who scored 14 or higher on the 17-item Hamilton Depression (HAM-D) rating scale, were aged 18 to 75 years, and were recruited from primary and psychiatric care settings in the United States from June 2001 to April 2004.19 We focused on the first treatment stage consisting of a 12-week course of citalopram hydrobromide. The present study was conducted from October 28, 2014, to May 19, 2016. It was approved by the Yale University Human Subjects Committee, with a waiver of informed consent.
The Combining Medications to Enhance Depression Outcomes (CO-MED) trial was a multisite, single-blind, randomized clinical trial comparing the efficacy of medication combinations in the treatment of unipolar major depressive disorder.22,23 Eligible patients were aged 18 to 75 years, had a primary DSM-IV–based diagnosis of nonpsychotic major depressive disorder, had recurrent or chronic depression (current episode ≥2 years), and scored 16 or higher on the 17-item HAM-D rating scale. Participants were enrolled between March 2008 and February 2009. Patients were randomly allocated (1:1:1) to escitalopram plus placebo (monotherapy), escitalopram plus bupropion hydrochloride, or venlafaxine hydrochloride plus mirtazapine.
We also analyzed all arms from 7 randomized, multicenter, double-blind, placebo-controlled, and active comparator-controlled clinical trials of duloxetine for major depressive disorder (Table 1). Four different protocols were used for these studies; parts A and B reflect trials run in parallel following the same protocol. All studies incorporated double-blind, variable-duration placebo lead-in periods. Safety and efficacy results from these studies have been published previously24-27 and summarized as pooled analyses of safety28 and efficacy.29 Study HMCR is registered at clinicaltrials.gov.30 The other studies were conducted before clinical trial registration was necessary.
Outcomes for STAR*D and CO-MED are based on the 16-item self-report Quick Inventory of Depressive Symptomatology (QIDS-SR) checklist during 12 weeks of treatment. Outcomes for all other trials are based on the 17-item HAM-D rating scale31 during 8 weeks. We excluded the HAM-D “loss of insight” item because there is no equivalent in the QIDS-SR and excluded weight/appetite items because they were not collected in the same way across trials and are often excluded from item-level analyses32 (eFigure 2 in the Supplement). Study selection was driven primarily by access to individual patient-level data. Patients provided informed consent to treatment when they participated in the original clinical trials. Consent was not needed for the present analyses since the data were deidentified. Of the 4706 patients included in the first analysis, 1722 (36.6%) were male; mean (SD) age was 41.2 (13.3) years. Of the 2515 patients included in the second analysis, 855 (34.0%) were male; mean (SD) age was 42.65 (12.17) years.
Rating scales in depression include a diverse range of symptoms. We applied a data-driven approach to identify groups of symptoms within depression rating scales. Higher scores on the rating scales indicate more severe symptoms. Hierarchical clustering shows structure in data without making assumptions about the number of clusters that are present in the data and gives a deterministic solution. We applied agglomerative (bottom-up) hierarchical clustering to the QIDS-SR checklist completed at baseline in STAR*D by 4017 patients and replicated the analysis using baseline QIDS-SR data from CO-MED (n = 640) and the baseline HAM-D scale that was also collected on 4039 patients in STAR*D. We conducted multiple sensitivity analyses using alternative approaches (eFigures 3-9 and eTables 1-3 in the Supplement).
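As a rough illustration of the agglomerative procedure described above, the following sketch clusters checklist items by how similarly patients scored them. The item names and scores are invented toy data, and the choice of Manhattan distance with complete linkage is an assumption made for this example; the text here does not state which distance or linkage the main analysis used.

```python
from itertools import combinations

def manhattan(u, v):
    """Sum of absolute differences between two equal-length response vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))

def agglomerate(items, responses):
    """Bottom-up clustering of symptom items.

    items: list of item names.
    responses: dict mapping item name -> list of scores, one per patient.
    Returns the merge history as (cluster_a, cluster_b, merge_height) tuples.
    Complete linkage is an assumption of this sketch.
    """
    clusters = [frozenset([name]) for name in items]
    dist = {
        frozenset([a, b]): manhattan(responses[a], responses[b])
        for a, b in combinations(items, 2)
    }

    def linkage(c1, c2):
        # Complete linkage: distance between the two most dissimilar members.
        return max(dist[frozenset([a, b])] for a in c1 for b in c2)

    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        height = linkage(c1, c2)
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
        merges.append((sorted(c1), sorted(c2), height))
    return merges

# Hypothetical baseline scores (0-3) for 5 patients on 4 made-up items.
toy = {
    "sad_mood": [3, 2, 3, 1, 0],
    "anhedonia": [3, 2, 2, 1, 0],
    "early_insomnia": [0, 3, 1, 3, 2],
    "late_insomnia": [0, 3, 1, 2, 3],
}
merges = agglomerate(list(toy), toy)
for left, right, height in merges:
    print(left, "+", right, "merge at", height)
```

In this toy example the two mood items and the two insomnia items merge at low heights, and the two resulting groups merge only at a much greater height, which is the pattern used to nominate candidate clusters.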
We analyzed the full intent-to-treat samples in all trials using linear mixed-effects regression models (STAR*D, 4041; CO-MED, 665; and other trials, 2515). The dependent measure was mean within-cluster severity: for each patient at each time point, we calculated the mean symptom severity within each cluster. Fixed effects included symptom cluster, time (log-transformed weeks), treatment regimen, and all 2- and 3-way interaction effects. We included a separate random intercept and slope for each symptom cluster with unstructured variance-covariance of the random effects within subject based on improvements in the Schwarz-Bayesian information criterion.33 False-discovery rate-adjusted34 P values were used to determine statistical significance for post hoc comparisons by cluster and drug within each mixed-model analysis.
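To make the false-discovery rate adjustment concrete, the sketch below computes adjusted P values for a set of post hoc comparisons. It assumes the Benjamini-Hochberg step-up procedure, which is one common choice; the text here does not name the specific procedure, and the input P values are invented.

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value to the smallest so that adjusted
    # values never increase as the raw P value decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw P values for 4 post hoc contrasts.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = bh_adjust(raw)
significant = [q < 0.05 for q in adjusted]
```

Each adjusted value is then compared with the α level (.05 for the mixed-effects analyses in this study).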
One model was used to analyze QIDS-SR–based clusters across STAR*D and CO-MED, and another model was used to analyze HAM-D–based clusters for the 7 other placebo-controlled trials. In the HAM-D model, we also included the main effect of the trial to control for potential systematic differences among trials. Preliminary analyses of the 4 duloxetine doses in each cluster indicated that 120-mg/d and 80-mg/d dosages were not significantly different from each other but differed from the lower doses and placebo (eResults and eFigure 10 in the Supplement). The 60-mg/d and 40-mg/d duloxetine dosages were similar to each other and nearly indistinguishable from placebo. We therefore grouped cohorts into high-dose duloxetine (80-120 mg/d) and low-dose duloxetine (40-60 mg/d).
We used a recently developed statistical modeling pipeline8 to predict treatment outcomes specific to each symptom cluster using information available at baseline. We extracted 164 items, including demographics, medical and psychiatric histories, and specific symptom items that were used as predictor variables (eTable 10 in the Supplement). Penalized logistic regression (elastic net35,36) was then used to identify the 25 variables that best predicted each cluster separately. These variables were then used to train machine-learning algorithms (gradient boosting machines37,38), resulting in a separate model for each symptom cluster, with each using 25 predictor variables. Predictability was measured as the percentage of variance explained in final cluster scores (ie, R2) using 5 repeats of 10-fold cross-validation. The statistical significance of each model was assessed using a permutation test (eMethods in the Supplement). We trained models on patients with complete baseline data for whom a severity score was recorded after 12 or more weeks of treatment (n = 1962) to ensure adequate treatment duration. To externally validate our predictive models, they were applied without modification to predict final cluster scores in CO-MED treatment completers. Here, statistical significance was measured by a P value calculated for Pearson correlations between predicted outcomes and observed outcomes in each treatment group of CO-MED. We did not have comparable predictor data in the duloxetine trials; thus, predictive analyses were conducted only for STAR*D and CO-MED. For significance, permutation-based tests used an α level of .01, mixed-effects regressions used a false-discovery rate correction and then an α level of .05, and Pearson correlations used an α level of .05.
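The evaluation logic described above (repeated 10-fold cross-validation of R2, followed by a permutation test) can be sketched in a few lines. This is a simplified stand-in rather than the study pipeline: a one-predictor least-squares line replaces the elastic net and gradient boosting steps, R2 is pooled over held-out predictions within each repeat, the permutation scheme (shuffling outcomes and rerunning the cross-validation) is an assumption, and all data are simulated.

```python
import random
from statistics import mean

def fit_line(xs, ys):
    """Least squares for one predictor; a placeholder for the real learner."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx

def cv_r2(xs, ys, k=10, repeats=5, seed=0):
    """Mean out-of-fold R2 over `repeats` reshuffled k-fold splits."""
    rng = random.Random(seed)
    n = len(xs)
    scores = []
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        preds = [0.0] * n
        for fold in range(k):
            test = idx[fold::k]
            held_out = set(test)
            train = [i for i in idx if i not in held_out]
            slope, intercept = fit_line([xs[i] for i in train],
                                        [ys[i] for i in train])
            for i in test:
                preds[i] = intercept + slope * xs[i]
        my = mean(ys)
        ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
        ss_tot = sum((y - my) ** 2 for y in ys)
        scores.append(1 - ss_res / ss_tot)
    return mean(scores)

def permutation_p(xs, ys, n_perm=99, seed=1):
    """Share of outcome-shuffled runs whose R2 matches or beats the observed R2."""
    rng = random.Random(seed)
    observed = cv_r2(xs, ys)
    hits = 0
    for _ in range(n_perm):
        shuffled = ys[:]
        rng.shuffle(shuffled)
        if cv_r2(xs, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Simulated data: one baseline predictor with a strong linear signal.
sim = random.Random(42)
baseline = [float(i) for i in range(60)]
outcome = [2.0 * x + sim.gauss(0, 5) for x in baseline]
r2, p_value = permutation_p(baseline, outcome)
```

With 99 permutations the smallest attainable P value is .01, so a test at the α level of .01 used for permutation tests here would need more permutations in practice.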
Predictive and clustering analyses were implemented in R, version 3.2.3 (R Foundation). Efficacy analyses were conducted using SAS, version 9.4 (proc mixed) (SAS Institute).
In 2 independent trials, we identified the same clustering of symptoms in the QIDS-SR checklist, consisting of core emotional, sleep (insomnia), and atypical symptoms (Figure 1A and B). A similar clustering solution was also found for the HAM-D scale checklist (Figure 1C). The clustering solution was robust across a number of sensitivity analyses using different parameters, time points, and approaches (eFigures 3-9 and eTables 1-3 in the Supplement).
Treatment efficacy was measured according to the rate of symptom improvement over time (ie, steeper symptom trajectories are better, as shown in Figure 2). No antidepressant treatment worked equally well across all 3 symptom clusters. As shown in Figure 2A, when measured according to the QIDS-SR, trajectories were significantly better for core emotional symptoms than for either sleep symptoms or atypical symptoms for citalopram, escitalopram with placebo, and escitalopram with bupropion (all β>0.079; all false-discovery rate corrected P < .001). Sleep trajectories were also better than atypical trajectories for these 3 treatments (all β>0.099; all P ≤ .001). As shown in Figure 2B, when measured according to the HAM-D rating scale, a similar pattern was observed. Core emotional trajectories were better than sleep and atypical trajectories for all treatments (all β>0.12; all P ≤ .001). Sleep trajectories were also better than atypical trajectories for low-dose duloxetine and escitalopram (all β>0.080; all P ≤ .001). All slope contrast estimates, SEs, 95% CIs, and P values are included in eTables 4 and 5 in the Supplement.
To interpret the magnitude of differences between drugs, we calculated an effect size (ES), measured in raw rating scale points, that reflects the difference between treatments in reducing the overall severity of a symptom cluster (ie, we multiplied slope contrasts by the natural log of treatment duration and then by the number of symptoms in each cluster). For example, in this study, high-dose duloxetine was significantly better than escitalopram in treating atypical symptoms, such that a patient’s total improvement in atypical severity was a mean of 1.9 HAM-D points greater with high-dose duloxetine than escitalopram (ES, 1.9; 95% CI, 1.4-2.3; false-discovery rate corrected P < .001).
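The conversion just described is a single product, shown here as a small helper. The function name and the input numbers are hypothetical and serve only to show the arithmetic; they are not values from the study.

```python
import math

def effect_size(slope_contrast, weeks, n_symptoms):
    """Convert a slope contrast (difference in mean within-cluster severity
    per log-week) into total rating-scale points for the whole cluster."""
    return slope_contrast * math.log(weeks) * n_symptoms

# Hypothetical inputs: contrast of 0.10, an 8-week trial, a 5-item cluster.
es = effect_size(slope_contrast=0.10, weeks=8, n_symptoms=5)
print(round(es, 2))  # 0.10 * ln(8) * 5
```

Multiplying by the number of symptoms undoes the averaging used in the mixed models, so the result is expressed in original checklist units.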
For each symptom cluster, there were significant differences in efficacy between treatments (Figure 2). Combined escitalopram and bupropion treatment was significantly more effective in treating core emotional symptoms than citalopram (ES, 0.7 QIDS-SR points; 95% CI, 0.2 to 1.3; P = .03). For sleep/insomnia symptoms, venlafaxine with mirtazapine outperformed citalopram (ES, 1.4; 95% CI, 1.0 to 1.8; P < .001). For core emotional symptoms in HAM-D scale trials (Figure 2B), high-dose duloxetine outperformed escitalopram (ES, 2.3 HAM-D points; 95% CI, 1.6 to 3.1; P < .001). Escitalopram was not significantly different from placebo for core emotional symptoms (ES, 0.03 HAM-D points; 95% CI, −0.7 to 0.8; P = .94). For sleep symptoms, high-dose duloxetine outperformed fluoxetine (ES, 0.9; 95% CI, 0.1 to 1.7; P = .046). For atypical symptoms, high-dose duloxetine outperformed all others (ES, 0.5-1.9) and escitalopram was worse than placebo (ES, 0.7; 95% CI, 0.3 to 1.1; P = .002). Among our HAM-D studies, only 2 antidepressant treatments (high-dose duloxetine and paroxetine) outperformed placebo for all 3 symptom clusters. All other comparisons are presented in eTables 6 and 7 in the Supplement.
Within STAR*D, although all models performed significantly above chance (all P < .01), we observed substantial variability in the predictability of outcomes for each cluster (Table 2 and eTable 8 in the Supplement). The sleep symptom cluster was the most predictable (R2 = 19.6%; SD, 5.0%; P < .01) and substantially more predictable than core symptoms (R2 = 14.5%; SD, 4.6%; P < .01) and atypical symptoms (R2 = 15.1%; SD, 5.3%; P < .01). The observed range in cluster predictability (R2 difference, 5.1%) was also significantly larger than any range observed during permutation testing (mean [SD] range, 0.56% [0.50%]; P < .01). We inspected the best predictive baseline variables for each model separately, highlighting those identified as predictive for 1 cluster but not others (ie, specific predictors) (Table 2). Baseline HAM-D scale severity was a top predictor of core emotional outcomes but not of outcomes for either of the other 2 clusters. Baseline atypical symptom severity and hypersomnia predicted atypical outcomes; baseline sleep cluster severity and early-morning insomnia predicted sleep outcomes.
We then applied the best-performing models, without modification, to predict outcomes for each cluster in the 3 treatment groups of CO-MED (Figure 3). Performance was statistically above chance, although clinically modest, for predicting core emotional outcomes in the escitalopram monotherapy arm (r(149) = 0.18; P = .03) and the venlafaxine-mirtazapine arm (r(138) = 0.17; P = .04). Performance was above chance predicting sleep outcomes in the escitalopram-bupropion arm (r(132) = 0.36; P < .001).
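For readers who want to check how a correlation of this size maps to a P value, the sketch below computes Pearson's r and the corresponding t statistic. It assumes the number attached to each r is its degrees of freedom (n − 2), and it uses a normal approximation to the t distribution for the 2-sided P value, which is reasonable only because the degrees of freedom are large; the exact test used in the study may differ.

```python
import math
from statistics import NormalDist, mean

def pearson_r(xs, ys):
    """Pearson correlation between predicted and observed outcomes."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def t_from_r(r, df):
    """t statistic for testing r against zero with df = n - 2."""
    return r * math.sqrt(df / (1 - r * r))

def approx_two_sided_p(t):
    """Normal approximation to the 2-sided P value (adequate for large df)."""
    return 2 * (1 - NormalDist().cdf(abs(t)))

# First correlation reported above: r = 0.18 with 149 degrees of freedom.
t_stat = t_from_r(0.18, 149)
p_approx = approx_two_sided_p(t_stat)
```

Under these assumptions the approximate P value rounds to the reported .03.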
To help translate these findings into clinical practice, we based a clinical decision support tool on these findings. It is implemented as a brief questionnaire that can be accessed from any web browser and returns results in real time (https://www.spring.care/spring-assessment).
Using a data-driven approach, we identified 3 symptom clusters within the QIDS-SR checklist. We replicated our clustering solution in an independent trial cohort (CO-MED) and found it to be robust across different parameters and time points and consistent with other statistical approaches. No antidepressant was equally effective for all 3 symptom clusters, and, for each symptom cluster, there were significant differences in treatment efficacy between drugs. Antidepressants in general worked best in treating core emotional and sleep symptoms and were less effective in treating atypical symptoms. The magnitude of these differences suggests that selecting the best drug for a given cluster may have a bigger benefit than that gained by use of an active compound vs a placebo. Treatment outcomes at the symptom cluster level were predictable by machine learning of self-report data.
These results might help to guide future research on personalized antidepressant treatment. The 2015 revision to the 2008 British Association for Psychopharmacology guidelines indicates that clinicians should “match choice of antidepressant to individual patient requirements…taking into account likely short-term and long-term effects.”39(p463) However, there is currently little appropriately powered evidence on which symptom-specific recommendations might be made. Our present finding of better trajectories for core symptoms with citalopram supports Genome-Based Therapeutic Drugs for Depression findings that mood and cognitive symptom dimensions were significantly better for escitalopram treatment than nortriptyline.16 Whereas large-scale comparative efficacy studies of aggregate severity show modest (if any) differences between antidepressants,4,13 our results at the symptom cluster level indicate substantial differences between drugs both within and across putative antidepressant classes. Moving forward, we must establish how improvements in a given cluster relate to quality of life, keeping in mind that medication tolerability remains an important clinical concern (as reviewed elsewhere4,13).
The approach outlined in this article may have implications for the drug approval process. United States Food and Drug Administration and European Medicines Agency approvals are currently determined in trials that use aggregate scores on severity measures to enroll patients or measure outcomes. Although some trials have used a specific symptom as an outcome (eg, depressed mood), our findings indicate that medications might be developed for specific clusters of symptoms, as they appear to respond differentially to antidepressant medications. Symptom clusters may also enable drug testing in smaller but more informative populations with a more consistent phenotype. This approach is consistent with the National Institute of Mental Health Research Domain Criteria40 (symptom clusters or dimensions might have distinctive underlying neural circuitry and signaling mechanisms) and paves the way for developing treatments that target and biomarkers that predict changes in specific clusters of symptoms.
Further clinical research will determine whether these clusters generalize to other cohorts and reflect good candidates for a true symptom structure in major depression.41,42 The present cluster structure resembles that of other scales in other large samples of patients with depression,1-4 although a recent review concluded that the debate is not over.41 These studies and ours are largely consistent in isolating symptoms of insomnia and a core group of symptoms that includes low mood, anhedonia, and low self-worth. However, direct comparisons are impeded by the use of many different rating scales in depression.42 Our data-driven approach offers some novel symptom groupings relative to previous approaches. For instance, our emotional cluster resembled the HRSD-7 subscale but never included a suicide item, and when scored according to the HAM-D scale, the HRSD-7 energy/fatigability item clustered with insomnia symptoms rather than emotional symptoms. There were slight differences between the QIDS-SR and HAM-D scale results. In the HAM-D scale, the emotional cluster included an anxiety item, whereas in the QIDS-SR scale, the same cluster included low energy and concentration. The energy/concentration item falls in the sleep cluster for the HAM-D scale. This data-driven approach may have identified a set of symptoms in the emotional presentation of depression that may have neural circuit correlates that are more cohesive than either the DSM criteria or theory-driven clusters, such as the Bech/Maier scales, which have not yet produced meaningful signatures on neural circuits or treatment response prediction.10,43 Finally, the atypical cluster contains items that are not considered atypical items in the DSM, so conclusions about broader atypical symptoms should not be drawn from the naming of this cluster.
This study has some limitations. First, there was a high degree of study heterogeneity. Two rating scales (clinician-rated HAM-D vs self-rated QIDS-SR) and treatment durations (8 vs 12 weeks) were used. The studies used a mixture of fixed- and variable-dosage protocols and had differences in blinding (STAR*D was unblinded, CO-MED was single-blind, and all other trials were double-blind). The consistency of these findings across more than 7000 patients from these heterogeneous studies suggests that the findings should generalize. However, study differences precluded direct comparisons using all available data, and study selection based on data availability may be a source of bias.44 Our inclusion of placebo-controlled duloxetine trials was critical for considering the pattern of cluster response trajectories for placebo and determining whether trajectories were better with drug treatment than placebo. Ideally, behavioral interventions might be focused on atypical symptoms that are generally less responsive to antidepressants or combined with other focused interventions for specific/residual symptoms (eg, modafinil for energy/fatigue, zolpidem for insomnia). Finally, group-level differences do not translate to individual patient differences in a simple manner45; therefore, further research is needed to test whether the web tool is accurate and effective in real-world practice.
Larger limitations surround the interpretation of current predictive analyses (eTable 9, eFigure 11, and eDiscussion in the Supplement). Generalizability of our original pipeline was poor. Alternative analytic strategies may be more effective (eFigure 11 in the Supplement). This limitation highlights the importance of externally validating predictive tools rather than relying on metrics based on the discovery sample.8 Because it is impractical for each model to require 25 different items, we must identify a more limited group of predictor variables to use cluster-specific tools in the clinic.
Clusters of symptoms are detectable in 2 common depression rating scales, and these symptom clusters vary in their responsiveness to different antidepressant treatments. These patterns may offer clinicians evidence for tailoring antidepressant selection according to the symptoms that a specific patient is experiencing immediately—almost doubling the expected effect size of a treatment.
Accepted for Publication: December 31, 2016.
Corresponding Author: Adam M. Chekroud, MSc, Department of Psychology, Yale University, 2 Hillhouse Ave, New Haven, CT 06511 (firstname.lastname@example.org).
Published Online: February 22, 2017. doi:10.1001/jamapsychiatry.2017.0025
Author Contributions: Drs Krystal and McCarthy contributed equally to the study. Mr Chekroud and Dr Gueorguieva had full access to all the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis.
Study concept and design: Chekroud, Krystal.
Acquisition, analysis, or interpretation of data: All authors.
Drafting of the manuscript: Chekroud, Gueorguieva.
Critical revision of the manuscript for important intellectual content: All authors.
Statistical analysis: Chekroud, Gueorguieva.
Obtained funding: Chekroud, Krystal.
Administrative, technical, or material support: Chekroud, Krystal.
Study supervision: Chekroud, Krystal, McCarthy.
Conflict of Interest Disclosures: Mr Chekroud holds equity in Spring Health (doing business as Spring Care Inc), a behavioral health startup. He is lead inventor on a provisional patent submission by Yale University. Dr Gueorguieva discloses consulting fees for Palo Alto Health Sciences and Mathematica Policy Research and a provisional patent submission by Yale University (Y0087.70116US00). Dr Krumholz is a recipient of research agreements from Medtronic and Janssen (a pharmaceutical company of Johnson & Johnson), through Yale University, to develop methods of clinical trial data sharing; is the recipient of a grant from the US Food and Drug Administration and Medtronic to develop methods for postmarket surveillance of medical devices; works under contract with the Centers for Medicare & Medicaid Services to develop and maintain performance measures; chairs (paid) a cardiac scientific advisory board for UnitedHealth; and is the founder of Hugo, a personal health information platform. Dr Trivedi has served as a paid adviser or consultant to Abbott, Abdi Ibrahim, Akzo (Organon), Alkermes, AstraZeneca, Axon Advisors, Bristol-Myers Squibb, Cephalon, Cerecor, Concert Pharmaceuticals, Eli Lilly, Evotec, Fabre Kramer Pharmaceuticals, Forest Pharmaceuticals, GlaxoSmithKline, Janssen Global Services, Janssen Pharmaceutical Products, Johnson & Johnson PRD, Libby, Lundbeck, Mead Johnson, MedAvante, Medtronic, Merck, Mitsubishi Tanabe Pharma Development America, Naurex, Neuronetics, Otsuka, Pamlab, Parke-Davis, Pfizer, PgxHealth, Phoenix Marketing Solutions, Rexahn Pharmaceuticals, Ridge Diagnostics, Roche Products, Sepracor, Shire Development, Sierra, SK Life and Science, Sunovion, Takeda, Tal Medical/Puretech Venture, Targacept, Transcept, VantagePoint, Vivus, and Wyeth-Ayerst Laboratories; he has received research support from the Agency for Healthcare Research and Quality, Corcept Therapeutics, Cyberonics, National Alliance for Research on Schizophrenia and Depression (now The Brain & Behavior Research Foundation), National Institute of Mental Health (NIMH), National Institute on Drug Abuse, Novartis, Pharmacia & Upjohn, Predix Pharmaceuticals (Epix), and Solvay. Dr Krystal is the editor of Biological Psychiatry. He has been a paid consultant to the following companies: LLC, AstraZeneca Pharmaceuticals, Biogen, Biomedisyn Corporation, Forum Pharmaceuticals, Janssen Pharmaceuticals, Otsuka America Pharmaceutical, Sunovion Pharmaceuticals, Takeda Industries, and Taisho Pharmaceutical Co. He is an unpaid member of the Scientific Advisory Board of Biohaven Pharmaceuticals, Blackthorn Therapeutics, Lohocla Research Corporation, Luc Therapeutics, Pfizer Pharmaceuticals, Spring Care, Inc, and TRImaran Pharma. He holds stock in ArRETT Neuroscience and Biohaven Pharmaceuticals and stock options in Blackthorn Therapeutics and Luc Therapeutics. Dr Krystal has the following patents and inventions: (1) dopamine and noradrenergic reuptake inhibitors in treatment of schizophrenia (patent No. 5,447,948); (2) co-inventor on a filed patent application by Yale University related to targeting the glutamatergic system for the treatment of neuropsychiatric disorders (PCTWO06108055A1); (3) intranasal administration of ketamine to treat depression (US application No. 14/197,767 and US application or Patent Cooperation Treaty international application No. 14/306,382); (4) composition and methods to treat addiction (provisional use patent application No. 61/973/961); and (5) treatment selection for major depressive disorder (US Patent and Trademark Office docket No. Y0087.70116US00). No other disclosures were reported.
Funding/Support: This study was supported in part by Yale University; The William K. Warren Foundation; grants 1UH2TR000960-01 and 5ULTR000142-08 from the National Center for Advancing Translational Sciences; the Department of Veterans Affairs (National Center for Posttraumatic Stress Disorder); grants P50AA12870 and M01RR00125 from the National Institute on Alcohol Abuse and Alcoholism; and grant UL1 RR024139 from the Yale Center for Clinical Investigation.
Role of the Funder/Sponsor: The funding sources had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.
Additional Contributions: Data for Sequenced Treatment Alternatives to Relieve Depression (STAR*D) and Combining Medications to Enhance Depression Outcomes (CO-MED) were acquired from the NIMH through limited access data use certificates (eAppendix in the Supplement). Data for other trials were provided by Eli Lilly and Company. Amanda Zheutlin, MS (Yale University), Nikolaos Koutsouleris, MD (Ludwig-Maximilians University), and Martin Paulus, MD (Laureate Institute for Brain Research), provided advice and thoughtful comments on this article; there was no financial compensation.