Figure. The cross point of M* = .01 indicates that individuals with M* > .01 may prefer interpersonal psychotherapy (IPT), while those with M* < .01 may prefer selective serotonin reuptake inhibitor (SSRI) pharmacotherapy.
Wallace ML, Frank E, Kraemer HC. A Novel Approach for Developing and Interpreting Treatment Moderator Profiles in Randomized Clinical Trials. JAMA Psychiatry. 2013;70(11):1241-1247. doi:10.1001/jamapsychiatry.2013.1960
Copyright 2013 American Medical Association. All Rights Reserved. Applicable FARS/DFARS Restrictions Apply to Government Use.
Importance
Identifying treatment moderators may help mental health practitioners arrive at more precise treatment selection for individual patients and can focus clinical research on subpopulations that differ in treatment response.
Objective
To demonstrate a novel exploratory approach to moderation analysis in randomized clinical trials.
Design, Setting, and Participants
A total of 291 adults from a randomized clinical trial that compared an empirically supported psychotherapy with selective serotonin reuptake inhibitor (SSRI) pharmacotherapy as treatments for depression.
Main Outcomes and Measures
We selected 8 relatively independent individual moderators out of 32 possible variables. A combined moderator, M*, was developed as a weighted combination of the 8 selected individual moderators. M* was then used to identify individuals for whom psychotherapy may be preferred to SSRI pharmacotherapy or vice versa.
Results
Among individual moderators, psychomotor activation had the largest moderator effect size (0.12; 95% CI, <.01 to 0.24). The combined moderator, M*, had a larger moderator effect size than any individual moderator (0.31; 95% CI, 0.15 to 0.46). Although the original analyses demonstrated no overall difference in treatment response, M* divided the study population into 2 subpopulations, with each showing a clinically significant difference in response to psychotherapy vs SSRI pharmacotherapy.
Conclusions and Relevance
Our results suggest that the strongest determinations for personalized treatment selection will likely require simultaneous consideration of multiple moderators, emphasizing the value of the methods presented here. After validation in a randomized clinical trial, a mental health practitioner could input a patient’s relevant baseline values into a handheld computer programmed with the weights needed to calculate M*. The device could then output the patient’s M* value and suggested treatment, thereby allowing the mental health practitioner to select the treatment that would offer the greatest likelihood of success for each patient.
Mental health practitioners need better methods for personalizing treatment selection for individual patients or, as the Institute of Medicine has characterized the challenge, of moving toward “precision medicine.”1 This is particularly true for the treatment of depression, since initial treatments may succeed as infrequently as one-third of the time.2 With additional treatments, the success rate may be increased to two-thirds.2 However, this trial-and-error approach is costly for patients and health care providers because the process can take months, during which treatment costs accumulate while patients’ functioning—often including occupational functioning and earning capacity—remains compromised. Moreover, patients frequently lose faith after multiple failures and drop out of treatment entirely. Mental health practitioners and researchers need better methods for predicting which treatment will have the greatest likelihood of success for a patient.
One approach to precision medicine is using a patient’s baseline characteristics to determine which treatment will be optimally suited for him or her, ideally considering the benefits and harms that each potential treatment entails. In theory, this can be achieved through identifying moderators in a randomized clinical trial (RCT). In an RCT, a baseline variable is called a moderator when the effect of the treatment on the outcome differs based on the value of that variable.3 If moderators in RCTs can be identified and interpreted, they have the potential to help mental health practitioners determine the best treatment for a patient based on his or her baseline characteristics.
Central to this effort is the distinction between quantitative and qualitative moderators. For both types of moderators, the treatment effect size differs depending on the value of the moderator. However, quantitative moderators point to a recommendation of the same treatment for all patients, while qualitative moderators suggest different preferred treatments depending on the patients’ moderator values. Both quantitative and qualitative moderators are important for scientific reasons, but for clinical decision making, qualitative moderators are more important because they can be used to discriminate among multiple treatments.
Thus far, the searches for consistent and clinically meaningful moderators of depression treatment have been mostly unsuccessful. For example, the Depression Phenotypes Study,4 designed and powered specifically to identify moderators of depression treatment, had little success. More than 30 baseline variables were considered potential moderators of treatment assignment (selective serotonin reuptake inhibitor [SSRI] pharmacotherapy or psychotherapy) on time to remission in outpatients with depression. However, only 2 relatively weak moderators were identified as statistically significant: the psychomotor activation factor from the Mood Spectrum Scale5,6 and the medical reassurance factor from the Panic-Agoraphobic Spectrum Scale.7,8 Moreover, with 2 separate moderators, the decision as to the preferred treatment would still be unclear if the individual’s scores on the 2 measures indicated different treatment preferences.
To identify strong moderators that can be used for precision medicine, researchers may need to implement new guidelines and methods designed specifically for this purpose.9 First, it is important to define a single outcome measure, preferably one that incorporates the benefits and harms experienced by a patient.10-12 Although many separate outcomes could be used for a single RCT, distinguishing separate moderators based on each outcome raises concerns regarding multiple comparisons and the synthesis of the results. Second, the identification of moderators should be based on effect sizes rather than P values. Effect sizes measure the potency of the effect, while P values tend to reflect sample size. Appropriate moderator effect size measures have recently become available,9 and their use will greatly enhance researchers’ abilities to identify meaningful moderators. Third, given all possible baseline variables that could be tested as moderators in a given RCT, it is essential to organize this daunting task properly. Fourth, for strong moderation, it is possible that no individual moderator suffices. Thus, it may be necessary to combine individual weak moderators to determine a stronger, combined moderator. Individual moderators can also be difficult to interpret and implement in practice, particularly when more than 1 is identified. However, a single moderator that is an optimal combination of multiple individual moderators can provide mental health practitioners with a clear and consistent algorithm for making treatment decisions.9
Here, data from the Depression Phenotypes Study (MH065376; E. Frank, PhD, and G. B. Cassano, MD, principal investigators) are used to establish a novel, exploratory approach to moderation analysis that implements the aforementioned guidelines and methods. In the Methods section, we describe the data and the outcome measure used in the demonstration. In the Results section, we interweave the presentation of the analytic methods with our explanation of how they work when applied to the data from the Depression Phenotypes Study. This was done for clarity of presentation and to provide a specific example for each step of the proposed methods.
The data used in this exploratory study come from the short-term treatment phase of the Depression Phenotypes Study. This RCT was performed at 2 sites: the University of Pittsburgh, Pennsylvania, and the University of Pisa, Italy. It began with a variable-length short-term treatment phase that lasted at least 12 weeks or until stable remission was achieved. The primary study aim was to determine which patients with unipolar depression benefit most from initiating treatment with pharmacotherapy and which patients should receive depression-specific psychotherapy instead.
The analytic sample consisted of 291 outpatients who had a DSM-IV–defined episode of unipolar major depression as determined by the Structured Clinical Interview for DSM, with a minimum score of 15 on the 17-item Hamilton Rating Scale for Depression (HRSD).13 Within this sample of outpatients, 89 (30.6%) were in their first episode of depression, whereas 202 (69.4%) had a history of recurrent depression. Some patients had been treated for a psychiatric disorder, not necessarily depression, with psychotherapy (6.9%), pharmacotherapy (30.6%), or both (22.0%). However, individuals who had been unresponsive to an adequate trial of either 6 weeks of escitalopram oxalate at a therapeutic dose or 11 sessions of interpersonal psychotherapy (IPT) during the first episode of depression were excluded from the study.
At baseline, 149 participants were randomly allocated to psychotherapy and 142 to SSRI pharmacotherapy. Because of its origins in clinical practice,14 IPT was selected to represent psychotherapy of depression. Escitalopram, selected to represent SSRI pharmacotherapy, was begun at 10 mg and titrated up or down as needed, with the aim of symptom remission and/or achieving a dosage of 20 mg/d.
Study participants were evaluated approximately weekly throughout the short-term treatment phase. Those who had not responded by 6 weeks or remitted by 12 weeks were given the combination of psychotherapy and SSRI pharmacotherapy, regardless of their initial randomization assignment. All participants signed written informed consent approved by the institutional review boards of the University of Pittsburgh and the Ethics Committee of the Azienda Ospedaliero—University of Pisa. Full study details are provided elsewhere.4
The integrated preference score (IPS) is a single metric designed to quantify the total clinical value provided by a given treatment, as measured by both benefits and harms that accrue to each individual during an RCT.10-12 The IPS is highly sensitive to the crucial individual differences in clinical outcome among the patients in an RCT, making it an ideal outcome for a moderation study.
To calculate the IPS for the Depression Phenotypes Study, we convened an expert clinical panel consisting of a patient who had experienced depression but was not involved in the Depression Phenotypes Study, a patient advocate, a psychiatric nurse, a psychiatric social worker, and 2 psychiatrists. Members of this panel were given a series of “scorecards,” each of which contained information from a randomly selected psychotherapy patient and pharmacotherapy patient. These scorecards showed plots of the 17-item HRSD and the Patient Rated Inventory of Side-Effects–Modified Version15 (PRISE-M) throughout the short-term treatment phase, as well as age, sex, and body mass index. Panel members were asked to use the information on each scorecard to select which of the 2 patients had the overall preferred clinical outcome.
We selected the trajectory of the 17-item HRSD to represent the “benefit” and the mean of the PRISE-M as the “harm” of the treatment. The IPS was then calculated as a weighted combination of the benefit, the harm, and their interaction, with specific weights for each component derived by an analysis of the expert clinical panel’s ratings. A confirmatory analysis on an independent sample showed the IPS to be strongly correlated with the ratings of the expert clinical panel. A patient with a higher IPS was likely judged by members of this panel to have had a better overall clinical outcome and vice versa. Additional details of the derivation of the IPS are described elsewhere.10-12 As anticipated, no differences between psychotherapy and SSRI pharmacotherapy were observed when using the IPS as the outcome.12
At first glance, it may appear that the consideration of adverse effects in the IPS unfairly penalizes SSRI pharmacotherapy in comparison with psychotherapy. However, the members of the expert clinical panel compared pairs of psychotherapy and SSRI pharmacotherapy patients based on the benefits and harms experienced by each individual. Thus, a patient with many adverse effects but a rapid reduction in symptoms would not automatically be judged to have a worse clinical outcome than a patient with few adverse effects but little or no reduction in symptoms. Instead, such decisions were made case by case by each member of the expert clinical panel and then accounted for through the derivation of weights in the IPS. Furthermore, the PRISE-M lists 33 symptoms and asks patients to rate each one as “not present,” “tolerable,” or “distressing” during the past week. This list includes typical adverse effects of SSRI pharmacotherapy (eg, nausea and diarrhea), as well as symptoms that could be a function of either SSRI pharmacotherapy or depression itself (eg, difficulty sleeping, loss of sexual desire, and anxiety). Given the variety of symptoms on the PRISE-M, both the psychotherapy and SSRI pharmacotherapy groups had the potential to experience “harms” during the study.
Our ultimate goal was to establish the ideal combination of variables for distinguishing those who preferably respond to psychotherapy from those who preferably respond to SSRI pharmacotherapy. With this goal in mind, we initially selected 32 baseline variables, each with a rationale and justification as a moderator of treatment effect on outcome in the population4 (Table 1). Each of the 32 baseline variables was standardized so that all resulting effect sizes would be comparable, regardless of the original scale on which they were measured.18,19
For each of the 32 baseline variables, we calculated the individual moderator effect size with a 95% bootstrap CI9 (Table 1). The absolute values of these individual moderator effect sizes were small, ranging from 0.001 (sensitive to loss) to 0.12 (psychomotor activation), with a median of 0.05. Other than psychomotor activation, no other variable had a moderator effect size greater than 0.10. Even when moderators are identified, it is not unusual for the effect of any individual moderator to be quite small.
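The bootstrap CIs reported above can be illustrated with a minimal sketch. It assumes a percentile bootstrap with resampling done separately within each treatment arm; the source does not state which bootstrap variant was used. The function names and the `statistic` callback are hypothetical, and the moderator effect size itself (defined in reference 9) is not reproduced here, so a simple difference in means stands in as a placeholder statistic.

```python
import random


def mean_difference(arm_a, arm_b):
    """Placeholder statistic (NOT the moderator effect size of reference 9)."""
    return sum(arm_a) / len(arm_a) - sum(arm_b) / len(arm_b)


def bootstrap_ci(arm_a, arm_b, statistic, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for statistic(arm_a, arm_b).

    Assumption: participants are resampled with replacement within each
    arm, so the arm sizes stay fixed across bootstrap replicates.
    """
    rng = random.Random(seed)
    replicates = []
    for _ in range(n_boot):
        resample_a = rng.choices(arm_a, k=len(arm_a))
        resample_b = rng.choices(arm_b, k=len(arm_b))
        replicates.append(statistic(resample_a, resample_b))
    replicates.sort()
    # Indices of the alpha/2 and 1 - alpha/2 empirical quantiles.
    lower_index = max(0, int((alpha / 2) * n_boot))
    upper_index = min(n_boot - 1, max(0, int((1 - alpha / 2) * n_boot) - 1))
    return replicates[lower_index], replicates[upper_index]
```

Any statistic that takes the two resampled arms and returns a number can be passed in place of `mean_difference`.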
When calculating a combined moderator, it is important to use independent variables to avoid collinearity problems. However, many of the 32 original baseline variables were correlated. To identify an independent subset of the 32 variables, we first performed a principal-components analysis based on the 276 participants with complete baseline information. This analysis identified 8 factors with eigenvalues greater than 1. Using the individual moderator effect sizes, loadings from the principal-components analysis, clinical meaningfulness, and access to complete data, we selected 1 individual moderator to represent each of these 8 factors. These 8 moderators, shown in Table 1, were observed in 290 (of the 291) participants. Only 6 of the 28 Spearman rank correlations among these 8 moderators had magnitudes greater than 0.10, and only 1 had a magnitude of more than 0.20. The median magnitude of the correlations was 0.09.
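The independence check on the selected moderators can be sketched as follows. This is a minimal illustration in plain Python rather than the authors' code: the helper names are hypothetical, and ties are handled with average ranks, which is an assumption. With 8 variables there are 8 × 7 / 2 = 28 distinct pairs, matching the count reported above.

```python
from itertools import combinations
from math import sqrt
from statistics import median


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; inputs must not be constant."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def summarize_correlations(columns, thresholds=(0.10, 0.20)):
    """columns: dict mapping variable name -> list of values (same length)."""
    magnitudes = [
        abs(spearman(columns[a], columns[b]))
        for a, b in combinations(sorted(columns), 2)
    ]
    return {
        "n_pairs": len(magnitudes),
        "n_above": {t: sum(m > t for m in magnitudes) for t in thresholds},
        "median_magnitude": median(magnitudes),
    }
```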
Next, it was necessary to determine the weight that each of the 8 selected variables should contribute to the combined moderator. To calculate the weights, we first created a data set in which each psychotherapy patient was paired with each SSRI pharmacotherapy patient. Then, for every psychotherapy (IPT) and SSRI pharmacotherapy pair, we calculated the difference in the IPS outcome, $\Delta\mathrm{IPS} = \mathrm{IPS}_{\mathrm{IPT}} - \mathrm{IPS}_{\mathrm{SSRI}}$, and the within-pair mean of each of the 8 variables. Still using the paired data, we performed a linear regression analysis of ΔIPS on the 8 mean scores. Each estimated regression coefficient reflects the moderator strength of its respective variable in the context of all other individual variables. Thus, these regression coefficients were used as the weights for calculating the combined moderator. Table 1 displays the weights for the 8 selected moderators. The individual moderator contributing the largest weight was the number of past depressive episodes; sex contributed the smallest weight.
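The pairing and regression steps can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the regression is fit by ordinary least squares via the normal equations, and an intercept is included (the source does not say whether one was used).

```python
from itertools import product


def solve_linear_system(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def least_squares(design, y):
    """OLS coefficients from the normal equations (X'X) beta = X'y."""
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def moderator_weights(ipt, ssri):
    """Estimate one weight per moderator from all IPT x SSRI pairs.

    ipt, ssri: lists of (ips, [moderator values]) tuples, one per patient.
    """
    design, delta_ips = [], []
    for (ips_p, mods_p), (ips_s, mods_s) in product(ipt, ssri):
        delta_ips.append(ips_p - ips_s)  # IPS_IPT - IPS_SSRI for this pair
        pair_means = [(a + b) / 2 for a, b in zip(mods_p, mods_s)]
        design.append([1.0] + pair_means)  # leading 1.0 is the assumed intercept
    coefficients = least_squares(design, delta_ips)
    return coefficients[1:]  # drop the intercept; the rest are the weights
```

With 149 psychotherapy and 142 SSRI participants, full pairing would yield up to 149 × 142 = 21,158 pairs, so standard errors from this regression should not be read as if the pairs were independent observations.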
After determining the weights for all individual moderators, we calculated the combined moderator for every participant. Specifically, we multiplied each moderator by its estimated weight and then added all terms together. We denote this combined moderator by M*. Additional details regarding the theoretical justification and calculation of M* are described elsewhere.9
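In symbols, the calculation just described can be written as below. The notation is introduced here for illustration and does not appear in the source: $m_{ij}$ is participant $i$'s standardized value on selected moderator $j$, and $w_j$ is the estimated weight for that moderator.

```latex
M^{*}_{i} = \sum_{j=1}^{8} w_j \, m_{ij}
```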
Once the combined moderator, M*, was calculated for each individual, the next step was to estimate the strength with which it moderated the effect of treatment on the outcome (the IPS). Thus, we performed a regression analysis of the IPS on the treatment assignment, the combined moderator, and the interaction of the combined moderator and the treatment. Using estimates from this model, the moderator effect size9 of the combined moderator, M*, was calculated to be 0.31 with a 95% bootstrap CI of 0.15 to 0.46.
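The regression described above can be written out as follows. The symbols are assumptions introduced for illustration: $T_i$ is a treatment indicator (coded either 0 and 1 or $-1/2$ and $+1/2$, with the larger value for psychotherapy) and $\varepsilon_i$ is the residual. The source does not state which coding was used.

```latex
\begin{aligned}
\mathrm{IPS}_i &= \beta_0 + \beta_1 T_i + \beta_2 M^{*}_i + \beta_3 T_i M^{*}_i + \varepsilon_i \\
\widehat{\mathrm{IPS}}_{\mathrm{IPT}}(M^{*}) - \widehat{\mathrm{IPS}}_{\mathrm{SSRI}}(M^{*}) &= \hat{\beta}_1 + \hat{\beta}_3 M^{*}
\end{aligned}
```

Under either coding, the predicted difference between treatments is linear in M*, so a nonzero interaction coefficient $\beta_3$ is what allows the treatment effect to vary with M*. The moderator effect size reported above is defined in reference 9 and is not derived here.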
To visualize how the treatment effect size changes depending on an individual’s M* value, we used the regression estimates to plot the predicted IPS values for the psychotherapy and SSRI pharmacotherapy groups across the observed values of M*. As shown in the Figure, the predicted regression lines for the psychotherapy and SSRI pharmacotherapy groups cross at a value of M* = 0.01, which is well within the observed range. Above the cross point, the predicted outcome for the psychotherapy group is better than that for the SSRI pharmacotherapy group. Below the cross point, the predicted outcome for the SSRI pharmacotherapy group is better than that for the psychotherapy group. Thus, M* is a qualitative moderator because it suggests a different treatment preference for individuals above and below the cross point.
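The cross point can be located from the two fitted lines. The following is a minimal sketch, assuming ordinary least squares with each arm's line fit separately; for a binary treatment this gives the same fitted lines as a single regression with a treatment by M* interaction. Function and argument names are hypothetical.

```python
def fit_line(x, y):
    """Simple least-squares line; returns (intercept, slope)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def cross_point(m_ipt, ips_ipt, m_ssri, ips_ssri):
    """M* value at which the two predicted lines intersect, or None if parallel."""
    a_ipt, b_ipt = fit_line(m_ipt, ips_ipt)
    a_ssri, b_ssri = fit_line(m_ssri, ips_ssri)
    if b_ipt == b_ssri:
        return None  # parallel lines: no cross point, no qualitative moderation
    # Solve a_ipt + b_ipt * m = a_ssri + b_ssri * m for m.
    return (a_ssri - a_ipt) / (b_ipt - b_ssri)


def is_within_observed_range(point, observed_m):
    """A cross point only supports qualitative moderation inside the data range."""
    return point is not None and min(observed_m) <= point <= max(observed_m)
```

Checking that the cross point lies within the observed range matters because lines that cross only outside the data would recommend the same treatment for every observed patient.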
Since the predicted regression lines crossed within the observed range of M*, we divided the sample into those below and above the cross point and estimated the treatment effect size in each group. The treatment effect size for the 44.8% of patients (n = 130) above the cross point is 0.35, with a 95% CI of 0.01 to 0.70; this indicates that psychotherapy is preferable to SSRI pharmacotherapy for patients with M* > .01. The treatment effect size for the 55.2% of patients (n = 160) below the cross point is −0.39, with a 95% CI of −0.71 to −0.08; this indicates that SSRI pharmacotherapy is preferable to psychotherapy for patients with M* < .01.
Finally, we used descriptive statistics to develop moderator profiles that characterize the 2 groups based on the 8 individual moderators used to calculate M*. Table 2 displays characteristics of the group with M* > .01 (psychotherapy preferable to SSRI pharmacotherapy) and the group with M* < .01 (SSRI pharmacotherapy preferable to psychotherapy). Since this is an exploratory study that requires additional validation to confirm M*, we refrain from testing specific differences between groups. However, individuals in the psychotherapy preferable to SSRI pharmacotherapy group tended to be older and had neither a full- nor a part-time job. They were more likely to be male and have an anxiety disorder diagnosis in addition to major depression and required more reassurance from medical personnel. They had higher 25-item HRSD depression scores and higher psychomotor activation scores but fewer past depressive episodes. Conversely, individuals in the SSRI pharmacotherapy preferable to psychotherapy group tended to be younger, employed, and female. They required less reassurance from medical personnel and were less likely to have an anxiety disorder diagnosis in addition to major depression. They had lower 25-item HRSD depression scores and lower psychomotor activation scores but more past depressive episodes.
In this exploratory study, we demonstrate a method that combines multiple individual moderators to develop a combined treatment moderator, M*, with a considerably larger effect size than any of the individual moderators examined. Furthermore, the combined moderator was qualitative—that is, the predicted regression lines for the psychotherapy and SSRI pharmacotherapy groups crossed well within the observed range of M*, indicating SSRI pharmacotherapy selection for individuals below the cross point and psychotherapy selection for those above the cross point.
To emphasize the point that individual moderators may not serve the purpose of identifying appropriate treatments for patients as well as a combined moderator, we note that the psychomotor activation score, which was the strongest individual moderator, was also a qualitative moderator. The predicted psychotherapy and SSRI pharmacotherapy regression lines crossed within the observed range of the psychomotor activation score, with SSRI pharmacotherapy indicated for individuals below the cross point and psychotherapy indicated for those above the cross point. However, the treatment effect sizes above and below the psychomotor activation cross point were only 0.11 and −0.14, respectively. These effect sizes are substantially weaker than those above and below the M* cross point (0.35 and −0.39, respectively) and probably insufficient to give a mental health practitioner confidence in recommending SSRI pharmacotherapy to a patient with a low psychomotor activation score or psychotherapy to a patient with a high score. This finding suggests that, while it is possible to find individual qualitative moderators, a determination strong enough to lead to confident treatment recommendations will likely require simultaneous consideration of several moderators.
Notably, the individual moderators that contributed the most weight in calculating the combined moderator did not necessarily have the largest individual moderator effect sizes. For example, the number of past depressive episodes contributed the largest weight to the combined moderator but had only the third largest absolute value of effect size among the 8 individual moderators (−0.09; 95% CI, −0.20 to 0.02). Thus, a single moderator may become more meaningful when considered in combination with other moderators. This finding emphasizes the importance of using a combined moderator, such as the one demonstrated here.
This is an exploratory study conducted for the purpose of demonstrating a novel methodologic approach to identifying moderators in RCTs. As such, the results presented here should not be used specifically for clinical decision making until a validation study is performed. To validate these results, they should be tested in a future RCT, ideally one designed specifically for this purpose. For example, a future RCT examining the same treatments might be conducted with participants stratified on M*, with a large enough sample size in each stratum to have adequate power to detect the treatment by M* interaction. Once confirmed, future research might focus on only participants with M* above or below the cross point and then concentrate on identifying mediators of treatment response in those 2 groups, which may well be different. Such mediators have the potential to indicate how to improve treatment outcomes within each subpopulation. The combination of a strong moderator that “targets” every treatment and strong mediators that “tailor” treatments to the needs of the patient gives promise of major strides in improving treatment outcome for psychiatric patients and, indeed, for patients in all areas of medicine.
When considering future RCTs for validating a combined moderator such as the one presented here, it is important to remember that the M* we developed is not the only possible combined moderator for the IPS outcome. The method described herein is just one of many potential approaches to identify a subset of individual moderators and use them to create one combined moderator. For example, one important future direction may be to explore interactions between individual moderators and the extent to which these interactions contribute to the derivation of the combined moderator.
Along these same lines, the M* we developed is also dependent on the baseline variables that were available to us. Specifically, all variables used in the present analyses relate to sociodemographic or clinical status. However, the methods we describe are entirely applicable to and should take on greater importance when used with genetic markers, brain imaging parameters, and biologic markers of other kinds. To our knowledge, no strong biologic moderators of major depression treatment response have been identified and replicated. This is perhaps because such moderators have little effect when applied in isolation but may have a strong impact when used together with the full range of possible moderators of treatment. By incorporating these types of variables, along with the kinds of variables used in the present analysis, it might be possible to develop an even stronger combined moderator than the one we have identified.
Once a combined moderator, M*, is developed and validated, it can be adopted for clinical decision making, thereby allowing mental health practitioners to move toward precision medicine. It is our goal that eventually they would be able to input each patient’s relevant baseline values into a handheld computer programmed with the weights needed to calculate M*. The handheld computer would then output each patient’s M* value and suggested treatment, thereby allowing mental health practitioners to provide the treatment that has the greatest likelihood of success for the patient. Of course, in actuality, they must select the best option among multiple different treatments, not just 1 form of psychotherapy and 1 SSRI. To accommodate this type of decision, it will be necessary to identify a combined moderator for each pairwise comparison of treatments (treatment A vs treatment B, treatment B vs treatment C, treatment A vs treatment C, etc). Although the full range of studies needed to propose and validate all such combined moderators is large indeed, such an effort could ultimately result in a ranking of all these treatments. In an ideal world, mental health practitioners could input the baseline levels of the key variables into a handheld computer programmed with these rankings to assist them in making these more complicated treatment decisions. While it seems unlikely that the full range of studies necessary to achieve this ideal for the treatment of depression will be carried out any time soon, it is not too far-fetched to imagine that such an effort might be mounted in relation to the treatment of more immediately life-threatening conditions, such as sepsis or specific cancers.
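The decision aid envisioned above, for the single comparison studied here, might look like the following sketch. It is illustrative only: the function names are hypothetical, the weights would have to come from a validated analysis (the Table 1 weights are not reproduced here), and the inputs are assumed to be standardized already in the same way as in the derivation sample. The default cross point of 0.01 is the value reported in this exploratory study.

```python
def combined_moderator(standardized_values, weights):
    """Weighted sum of standardized baseline values, one weight per moderator.

    standardized_values, weights: dicts keyed by moderator name.
    """
    missing = set(weights) - set(standardized_values)
    if missing:
        raise ValueError(f"missing baseline values: {sorted(missing)}")
    return sum(weights[name] * standardized_values[name] for name in weights)


def suggest_treatment(m_star, cross_point=0.01):
    """Map an M* value to the treatment favored on that side of the cross point."""
    if m_star > cross_point:
        return "psychotherapy (IPT)"
    if m_star < cross_point:
        return "SSRI pharmacotherapy"
    return "no preference"
```

Extending this to more than 2 treatments would require one such rule per pairwise comparison, each with its own weights and cross point, as described above.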
Submitted for Publication: August 14, 2012; accepted March 5, 2013.
Corresponding Author: Meredith L. Wallace, PhD, Department of Psychiatry, University of Pittsburgh, 3811 O’Hara St, Pittsburgh, PA 15213 (firstname.lastname@example.org).
Published Online: September 18, 2013. doi:10.1001/jamapsychiatry.2013.1960.
Author Contributions: Dr Wallace had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Study concept and design: All authors.
Acquisition of data: Frank.
Analysis and interpretation of data: All authors.
Drafting of the manuscript: All authors.
Critical revision of the manuscript for important intellectual content: All authors.
Statistical analysis: Wallace, Kraemer.
Obtained funding: Frank.
Administrative, technical, and material support: Frank.
Study supervision: Frank, Kraemer.
Conflict of Interest Disclosures: Dr Frank reported serving as a consultant to Servier and Vanda Pharmaceuticals; receiving grant or research support from the Fine Foundation, the Pittsburgh Foundation, and Forest Research Institute; and receiving royalties from Guilford Press. No other disclosures were reported.
Funding/Support: This study was supported by grants R01 MH065376 and K01 MH096944 from the National Institute of Mental Health.
Role of the Sponsor: The National Institute of Mental Health had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; or decision to submit the manuscript for publication.