A, Thirty patients with schizophrenia were randomized to either the antipsychotic treatment or the control group. Although the trial was simulated with a main effect of treatment (Cohen d = 1.32; t27.5 = 3.46; P = .002), it may be tempting to infer from the Positive and Negative Syndrome Scale (PANSS) pretreatment and posttreatment outcome difference scores that some patients in the treatment group responded better than others. This observed pretreatment and posttreatment outcome difference can be misleading given that apparent individual differences might merely reflect unexplained components of variance. B, The treatment group patients are ranked according to the observed pretreatment and posttreatment outcome differences and classified as responders or nonresponders based on an arbitrary threshold (dashed light blue line). Although the ranking is not necessary for the classification, it heightens the perception of individual differences in response to treatment.
Two simulated scenarios are shown: one with a constant treatment effect across patients (A-C) and one with a reversed ranking once a control condition is taken into account (D, E). A, Using the same ranking from the simulated parallel trial, we show that ranking is a flawed approach to quantifying symptom improvement. B, When a control condition is added to the initial parallel trial, the seeming differences in improvement among patients in the treatment group vanish. C, The crossover trial simulation eliminates spurious differences in the outcome and reveals no between-patient differences in response. Although seemingly unlikely, such a scenario cannot be ruled out from the results of a parallel group trial. In another scenario, a variable treatment effect is added to a parallel trial (A). D, Differences in improvement may reverse the ranking if patients vary in their response to the treatment compared with controls. E, The patient who appeared to have improved the most in D actually had the smallest net improvement. PANSS indicates Positive and Negative Syndrome Scale.
Four patients from the simulated trial were measured repeatedly over time using the Positive and Negative Syndrome Scale (PANSS). A, The differences in a patient’s symptom severity scores from time point to time point are independent of any intervention. B, Measuring repeatedly and calculating the means over all time points accounts for random within-patient fluctuations and reveals that patients had the same mean PANSS score. What differed was the amount of random fluctuation, a scenario that might be highly unlikely but that, until tested, cannot be ruled out.
The same simulation of 30 patients with schizophrenia in a parallel trial assessed with the Positive and Negative Syndrome Scale (PANSS) is shown. A, Patients received both the antipsychotic and placebo drugs in a crossover trial. B, Only by running a crossover trial more than once can we identify whether individual response is a trait, that is, a permanent feature of the patient. C, Associating the net improvement from crossover trials 1 and 2 (solid dots) shows that advantages from the first trial do not replicate in the second (r = 0.02; 95% CI). In this scenario, response to treatment is not a permanent feature of the patient. D, In another scenario, using the same crossover trial 1 (A) and repeating it, we might observe a response almost identical to that in crossover trial 1. E, Patients responded similarly in both trials (r = 0.77; 95% CI). In this scenario, we can consider response to treatment a trait in these patients. Repeatability of the association from one crossover trial to the next in the same sample is the only way to separate treatment-by-patient interaction from random within-patient variation and to determine whether response is a trait or a state.
The forest plot shows the variability ratio (VR) together with its 95% CI for treatment vs control across 52 studies. Each study is listed with its respective citation number. The overall VR was lower for the treatment group compared with the control group.
eFigure 1. No Associations Between Mean and SDs
eFigure 2. Variability for Treatment vs Control Across All Investigated Antipsychotic Drugs
Winkelbeiner S, Leucht S, Kane JM, Homan P. Evaluation of Differences in Individual Treatment Response in Schizophrenia Spectrum Disorders: A Meta-analysis. JAMA Psychiatry. 2019;76(10):1063–1073. doi:10.1001/jamapsychiatry.2019.1530
Is there evidence from randomized clinical trials that patients respond differently to antipsychotic drugs?
In this meta-analysis of 52 randomized clinical trials involving 15 360 patients with a schizophrenia or schizoaffective diagnosis, the outcome variability in the antipsychotic drug treatment group was not higher but slightly lower than that in the placebo control group.
This study cannot rule out that individual differences in drug response might still exist, but it does question the assumption of a personal element of response to antipsychotic treatment.
An assumption among clinicians and researchers is that patients with schizophrenia vary considerably in their response to antipsychotic drugs in randomized clinical trials (RCTs).
To distinguish the overall variation in individual treatment response from random variation by comparing the variability between treatment and control groups.
Cochrane Schizophrenia, MEDLINE/PubMed, Embase, PsycINFO, Cochrane CENTRAL, BIOSIS Previews, ClinicalTrials.gov, and World Health Organization International Clinical Trials Registry Platform from January 1, 1955, to December 31, 2016.
Double-blind, placebo-controlled, RCTs of adults with a diagnosis of schizophrenia spectrum disorders and prescription for licensed antipsychotic drugs.
Data Extraction and Synthesis
Means and SDs of the Positive and Negative Syndrome Scale pretreatment and posttreatment outcome difference scores were extracted. Data quality and validity were ensured by following the PRISMA guidelines.
Main Outcomes and Measures
The outcome measure was the overall variability ratio of treatment to control in a meta-analysis across RCTs. Individual variability ratios were weighted by the inverse-variance method and entered into a random-effects model. A personal element of response was hypothesized to be reflected by a substantial overall increase in variability in the treatment group compared with the control group.
An RCT was simulated, comprising 30 patients with schizophrenia randomized to either the treatment or the control group. The different components of variation in RCTs were illustrated with simulated data. In addition, we assessed the variability ratio in 52 RCTs involving 15 360 patients with a schizophrenia or schizoaffective diagnosis. The variability was slightly lower in the treatment compared with the control group (variability ratio = 0.97; 95% CI, 0.95-0.99; P = .01).
Conclusions and Relevance
In this study, no evidence was found in RCTs that antipsychotic drugs increased the outcome variance; instead, the variance was slightly lower in the treatment group than in the control group, suggesting no personal element of response to treatment. Although the study cannot rule out that subsets of patients respond differently to treatment, it suggests that the average treatment effect is a reasonable assumption for the individual patient.
Personalized medicine is based on a widely held assumption that patients differ substantially in their response to treatments. The goal of personalized medicine is to find the right treatment for the right patient. Psychiatry is no exception. An assumption among clinicians and researchers alike is that the response to antipsychotic drugs by patients with psychosis differs considerably between individuals.1
We report that this assumption may be ungrounded. Although variation in the observed treatment responses obviously exists, it is crucial to distinguish between observed and true treatment response: observed response consists of true response plus regression to the mean, some placebo effects, and random terms such as (but not restricted to) measurement error. First, we exemplify why confusing observed with true treatment response is so common, and we use simulated data to show how variation that is purely random and unrelated to permanent differences in treatment response may suggest the need for personalized treatment. Next, we review the evidence of the differences in treatment response by conducting a meta-analysis of the variation in antipsychotic treatment trials. Although this issue is brought up by statisticians regularly,2 it deserves more attention from a general psychiatric audience.
Where does the assumption of individual differences in treatment response come from? In general, antipsychotic drugs are assessed in randomized clinical trials (RCTs), the criterion standard for identifying the efficacy of a treatment. In RCTs, patients are assessed at baseline (eg, with the Positive and Negative Syndrome Scale [PANSS]) and randomized to either a treatment or a control group. What RCTs can ultimately provide is an answer to whether a treatment works in general. This average treatment effect is derived from the direct comparison of the response between the treatment and the control groups, which is imperative in an RCT.3 Understandably, this answer may leave clinicians unsatisfied; after all, they are treating individual, and not typical, patients. From a clinical perspective, patients vary considerably in their response to antipsychotic drugs, and the general response may seem almost like an uninformed guess for the individual patient. Furthermore, clinicians seem to prefer categories such as normal or abnormal and responders or nonresponders to inform diagnostic and therapeutic decisions. A consequence is that many investigators now try to personalize medicine by aiming to tailor treatments to individual patients. They agree that response to treatment varies from patient to patient.
However, estimating individual response to treatment, known as the treatment-by-patient interaction, is more complex than often appreciated and depends on laborious study designs, such as repeated crossover trials.2 As we illustrate with simulated data, such study designs are needed to distinguish individual response to treatment from other components of variation that are unrelated to permanent differences in treatment response.4-7
By design, RCTs cannot estimate the treatment-by-patient interaction, the index of individual response. Although RCTs cannot tell us anything about individual response itself, they might indicate something about its presence. As recognized early by Fisher,8 an increase in variance in the treatment group compared with the control group could indicate the presence of variation in response to treatment.2 The strength of this increase would then quantify the size of the personal element of response and provide evidence for the presence of a treatment-by-patient interaction.9 A method has been developed to compare variances between groups across studies10 and has been adopted by a meta-analysis package.11 In psychiatry, this method has been applied to compare variances in brain structure12 and inflammatory parameters in psychosis.13 This method compares the variance of treatment and control by computing their ratio: a ratio of 1 means equal variances, a ratio greater than 1 means more variability in the treatment group, and a ratio smaller than 1 means less variability in the treatment group compared with the control group.10,12,13
This study is organized in 2 parts. The first part illustrates the different components of variation in RCTs with simulated data, showing the importance of recognizing the treatment-by-patient interaction (which reflects individual treatment response) as the component of interest. The second part shows the results of a meta-analysis, which tested for the presence of treatment-by-patient interaction in empirical data from antipsychotic drug RCTs. We compared the overall variability in the treatment group with the overall variability in the control group, using data from a recently published meta-analysis,14 summarizing 24 years of placebo-controlled, antipsychotic RCTs in schizophrenia. We hypothesized that compared with control, the often-highlighted heterogeneity in patients with schizophrenia would be reflected by a clinically relevant increase in the overall variance of the treatment outcome, which is compatible with a personal element of response that deviates from the estimated average treatment effects.
To illustrate the different components of variation in RCTs, we simulated data from patients with schizophrenia who were randomized to either the antipsychotic treatment or the control group and assessed with the PANSS, with a positive effect of treatment (Cohen d = 1.32; t27.5 = 3.46; P = .002). First, we added a single crossover condition with either a constant or a varying treatment effect, and then we added a double crossover to this simulated trial. With these additions, we show how the variability between and within patients has to be distinguished from the treatment-by-patient interaction, the component reflecting the individual differences in treatment response.
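The parallel-trial setup described above can be sketched in a few lines of Python; the group sizes, effect magnitudes, and noise level here are illustrative assumptions, not the values used in the actual simulation:

```python
import random
import statistics

random.seed(42)

N_PER_GROUP = 15     # 30 patients total, as in the simulated RCT
TRUE_EFFECT = -10    # assumed extra PANSS improvement under treatment
NOISE_SD = 8         # assumed random variation in difference scores

# Pretreatment-posttreatment PANSS difference scores (negative = improvement).
control = [-10 + random.gauss(0, NOISE_SD) for _ in range(N_PER_GROUP)]
treatment = [-10 + TRUE_EFFECT + random.gauss(0, NOISE_SD) for _ in range(N_PER_GROUP)]

# Even though the true treatment effect is constant across patients,
# the observed difference scores spread out, inviting a spurious
# "responder" vs "nonresponder" reading.
mean_tx = statistics.mean(treatment)
mean_ct = statistics.mean(control)
pooled_sd = statistics.stdev(treatment + control)
print(f"mean change (treatment): {mean_tx:.1f}")
print(f"mean change (control): {mean_ct:.1f}")
```

Ranking the `treatment` list and cutting it at an arbitrary threshold would reproduce the misleading responder classification of Figure 1B.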
To ensure data quality and validity, this meta-analysis was conducted in accordance with the PRISMA guidelines. We searched Cochrane Schizophrenia, MEDLINE/PubMed, Embase, PsycINFO, Cochrane CENTRAL, BIOSIS Previews, ClinicalTrials.gov, and World Health Organization International Clinical Trials Registry Platform from January 1, 1955, to December 31, 2016.
Using the meta-analysis of Leucht et al15 as a basis, we included published and unpublished double-blind, placebo-controlled RCTs of at least 3 weeks’ duration. These studies investigated adults with a diagnosis of schizophrenia spectrum disorders and prescription for licensed antipsychotic medications, except clozapine. Studies were excluded if they investigated relapse prevention, patients with predominant negative symptoms, patients with major concomitant somatic or psychiatric illness, or intramuscular formulations of antipsychotic treatment, or if the research was conducted in China. We included only studies that reported the necessary information (mean, SD, and sample size) of the PANSS pretreatment and posttreatment outcome difference scores.
In studies that combined comparisons of multiple antipsychotic drugs with placebo, we calculated an aggregated SD across all comparisons, leaving only 1 SD per study. We extracted the PANSS means and SDs of the pretreatment and posttreatment outcome difference scores as well as the sample sizes for the treatment and the control groups. Further information on the search strategy is published elsewhere.15
The SDs of the pretreatment and posttreatment outcome difference scores in the treatment and control groups consist of the same variance components, including the within-patient variation. The treatment group, however, may also include the additional treatment-by-patient interaction, which could indicate the presence of individual response differences. Thus, in the case of a variable treatment effect, an increase of the variance in the treatment group, compared with the control group, should be observable. To assess this variation, we calculated for each comparison between antipsychotic and placebo drugs the relative variability of treatment and control as the log variability ratio (log VR)16 with

$$\ln \mathrm{VR} = \ln\!\left(\frac{\mathrm{SD}_{\mathrm{Tx}}}{\mathrm{SD}_{\mathrm{Ct}}}\right) + \frac{1}{2(n_{\mathrm{Tx}}-1)} - \frac{1}{2(n_{\mathrm{Ct}}-1)},$$

in which SD_Tx was the reported sample SD for treatment, SD_Ct was the reported sample SD for control, n_Tx was the treatment sample size, and n_Ct the control sample size.10 The corresponding sampling variance ($\sigma^2_{\ln\mathrm{VR}}$) for each comparison between antipsychotic and placebo drugs can be expressed as follows:

$$\sigma^2_{\ln\mathrm{VR}} = \frac{1}{2(n_{\mathrm{Tx}}-1)} + \frac{1}{2(n_{\mathrm{Ct}}-1)}.$$
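The log VR and its sampling variance described above are straightforward to compute from the reported summary statistics; a minimal Python sketch (function names are ours):

```python
import math

def log_vr(sd_tx, sd_ct, n_tx, n_ct):
    """Log variability ratio of treatment to control,
    including the small-sample correction terms."""
    return (math.log(sd_tx / sd_ct)
            + 1 / (2 * (n_tx - 1))
            - 1 / (2 * (n_ct - 1)))

def var_log_vr(n_tx, n_ct):
    """Sampling variance of the log variability ratio;
    it depends only on the two group sizes."""
    return 1 / (2 * (n_tx - 1)) + 1 / (2 * (n_ct - 1))

# Equal SDs and equal group sizes give log VR = 0 (ie, VR = 1).
print(log_vr(12.0, 12.0, 50, 50))
print(var_log_vr(50, 50))
```

Note that with equal group sizes the two correction terms cancel, so the log VR reduces to the log of the SD ratio.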
We did not find an association between the pretreatment and posttreatment outcome difference scores and their respective SDs in the data for the control group (β = 0.16; P = .15; eFigure 1A in the Supplement) or the treatment group (β = –0.05; P = .63; eFigure 1B in the Supplement). For this reason, we did not consider the log coefficient of variation ratio (log CVR) as an additional index for comparing variabilities.10
We weighted each log VR with the inverse of this sampling variance11 and entered it into a random-effects model. This approach allows for the quantification of the true individual response, after adjusting for within-patient variability and regression to the mean.5,9 Results were back-transformed from the log scale for better interpretability, with a variability ratio higher than 1 indicating greater variability under treatment than under control and a ratio lower than 1 indicating less variability under treatment than under control.
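The inverse-variance weighting and back-transformation described above can be sketched as follows. As a simple illustration we use the DerSimonian-Laird estimator for the between-study variance; the metafor package used by the authors may default to a different estimator (eg, REML), and the input log VRs below are hypothetical:

```python
import math

def random_effects_pool(yi, vi):
    """Inverse-variance weighted random-effects pooling with the
    DerSimonian-Laird estimator of between-study variance tau^2."""
    w = [1 / v for v in vi]
    sw = sum(w)
    fixed = sum(wi * y for wi, y in zip(w, yi)) / sw
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, yi))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (len(yi) - 1)) / c)
    # Re-weight with tau^2 added to each sampling variance.
    w_star = [1 / (v + tau2) for v in vi]
    mu = sum(ws * y for ws, y in zip(w_star, yi)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    return mu, se

# Pool three hypothetical log VRs and back-transform to the VR scale.
yi = [-0.05, -0.02, -0.03]
vi = [0.004, 0.006, 0.005]
mu, se = random_effects_pool(yi, vi)
vr = math.exp(mu)
ci = (math.exp(mu - 1.96 * se), math.exp(mu + 1.96 * se))
print(f"pooled VR = {vr:.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

A pooled VR below 1 with a CI excluding 1 would correspond to the paper's finding of slightly lower variability under treatment.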
The analysis was performed from October 31, 2018, to March 29, 2019, with the R package metafor, version 2.0.0,11 and the manuscript was produced with the R package knitr, version 1.20, in RStudio (R Foundation for Statistical Computing). All the data and code we used are freely available online to ensure reproducibility (https://doi.org/10.17605/OSF.IO/QARVS).
We simulated an RCT of 30 patients with schizophrenia randomized to either the treatment or the control group. The individual pretreatment and posttreatment outcome differences (Figure 1A) might tempt us to infer that some patients in the treatment group responded better than others. We might then rank these patients according to their outcome and classify them as either responders or nonresponders. However, such ranking and classification can be misleading.
Although seemingly different (Figure 2A), adding a simulated crossover condition to the initial parallel trial may reveal that the apparent differences in improvement among patients in the treatment group vanish (Figure 2B) and the treatment effect may actually be constant across patients (Figure 2C). Such a scenario cannot be ruled out from the results of a parallel group trial. In addition, the same ranking (Figure 2A) may reflect yet another scenario, in which differences in improvement as calculated from a crossover condition may reverse the ranking (Figure 2D), such that patients who appeared to have improved the most had actually the smallest net improvement (Figure 2E). Apparent outcome differences among patients in an RCT may still be compatible with a constant treatment effect.
Next, outcome differences may also be found within patients. Assessing patients repeatedly over time might reveal that symptoms fluctuate randomly around the same mean score (Figure 3). This fluctuation shows that within-patient variability alone may suggest differences in treatment response that are a mere reflection of random fluctuation.
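The within-patient fluctuation just described can be mimicked with repeated noisy measurements around a fixed underlying score; the true score, noise level, and number of visits below are assumed for illustration:

```python
import random
import statistics

random.seed(7)

TRUE_SCORE = 80      # assumed stable underlying PANSS score
WITHIN_SD = 5        # assumed random within-patient fluctuation
N_TIMEPOINTS = 20

# Repeated measurements of one patient: each visit differs from the
# last, yet no intervention has occurred.
measurements = [random.gauss(TRUE_SCORE, WITHIN_SD) for _ in range(N_TIMEPOINTS)]

# Averaging over time points recovers the stable underlying score.
print(round(statistics.mean(measurements), 1))
```

Any single pair of visits would suggest a change in severity, even though the patient's true score never moved.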
Again, we can add a simple crossover condition to the simulated parallel group trial (Figure 4A), in which each patient received both the treatment (antipsychotic drug) and control (placebo). Only by running the crossover trial once again (Figure 4B) can we determine whether the differences observed in the first crossover trial are indeed stable features of the patients. The net improvement from crossover trial 1 may not replicate in crossover trial 2, which indicates that the response differences are still not stable features of the patients (Figure 4C). For that stability to be the case, we would have to see a similar outcome in crossover trial 1 (Figure 4A) as in crossover trial 2 (Figure 4D), in which case we have identified a substantial treatment-by-patient interaction (Figure 4E).
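The logic of the repeated crossover can be sketched as follows: when each patient's net improvement is pure noise (a "state"), the two crossover trials barely correlate; when a stable patient-specific effect dominates (a "trait"), they do. All numbers are illustrative assumptions:

```python
import math
import random

random.seed(3)

def pearson_r(x, y):
    """Plain Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

N = 30
NOISE_SD = 2.0

# Scenario 1 (state): constant true effect; each trial's net
# improvement is independent random variation.
trial1_state = [random.gauss(-10, NOISE_SD) for _ in range(N)]
trial2_state = [random.gauss(-10, NOISE_SD) for _ in range(N)]

# Scenario 2 (trait): each patient has a stable individual effect
# (treatment-by-patient interaction) that dominates the noise.
individual = [random.gauss(-10, 10.0) for _ in range(N)]
trial1_trait = [e + random.gauss(0, NOISE_SD) for e in individual]
trial2_trait = [e + random.gauss(0, NOISE_SD) for e in individual]

print(f"state r = {pearson_r(trial1_state, trial2_state):.2f}")
print(f"trait r = {pearson_r(trial1_trait, trial2_trait):.2f}")
```

Only the trait scenario reproduces the replicating ranking of Figure 4D and 4E; a single crossover cannot distinguish the two.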
A careful distinction of the sources of variation in a simulated RCT has shown that it is not trivial to distinguish the source of primary interest (treatment-by-patient interaction) from components that tell nothing about individual response. In the meta-analysis, we assessed whether evidence exists for such treatment-by-patient interaction across antipsychotic drug trials.
We investigated 75 comparisons of antipsychotic drug with placebo in 52 RCTs.17-68 None of these studies used a design such as repeated crossovers that would have allowed for a direct estimate of individual responses. Overall, a total of 15 360 patients with a schizophrenia or schizoaffective diagnosis were included, of whom 8550 (55.7%) had been randomized to the treatment group and 6810 (44.3%) to the control group (more details can be found in the eResults in the Supplement).
We found an overall lower variability in treatment compared with control (variability ratio = 0.97; 95% CI, 0.95-0.99; P = .01; Figure 517-68). This finding indicates that the overall variability across treatment groups was 3% lower compared with that in the control groups. Furthermore, we compared the variances in individual antipsychotic drug outcome and found the same pattern, with lower variability across treatment compared with control (variability ratio = 0.97; 95% CI, 0.95-1.00; P = .02; eFigure 2 in the Supplement).
No evidence was found that antipsychotic treatment increased the outcome variance compared with the control. Instead, the outcome variance was slightly lower in the treatment than in the control group.
A widespread belief among clinicians and researchers is that patients differ substantially in their antipsychotic treatment response, but finding evidence for this assumption is complex. A likely explanation, supported by the simulations conducted for this study, is that taking an observed treatment response as the true treatment response is tempting, compelling us to ignore the components of variation most likely encountered: random variation within patients and differences between patients. The existing empirical evidence for such individual differences is weaker than expected: No evidence was found that the antipsychotic drug increased the outcome variance compared with the placebo. Instead, the outcome variance was slightly lower in the treatment group. With this finding, we still cannot rule out that subsets of patients responded differently to treatment, but the overall small difference in variances suggests that the average treatment effect is a reasonable assumption for the individual patient. By assuming heterogeneity in treatment outcomes, we might ultimately introduce noise into clinical practice by refusing to go with the best available evidence, the average treatment effect derived from RCTs.2
Although RCTs are questioned regularly, sometimes using questionable arguments,69 they remain the criterion standard in clinical research. They provide unbiased estimates of the relative efficacy of an intervention, which even the largest observational studies cannot provide.70 In addition, appreciating the role of randomization in RCTs is important. Randomization is not compatible with the notion that specific features, such as placebo response, increase in one but not the other group in an RCT. If evidence existed of an enhanced placebo response over time, as has been suggested repeatedly in the past years,14 this response would have been apparent in both the control and the treatment groups because of randomization and thus would have canceled out. Furthermore, the concept of placebo response, although regularly investigated,71 cannot be studied by looking at the observed responses in control groups,72 for the same reasons that this approach does not work for the treatment groups, as this study has shown.
Comparing the variabilities between treatment and control groups may provide valuable insight into the presence of individual response and the scope of personalized medicine. Recently, other groups have taken a similar approach to assess the presence of individual differences in brain structure12 and immunological parameters in psychosis.13 We assumed that, in the presence of a personal element of response to treatment, the variance in the treatment group should be higher compared with the control group, which in turn would require further investigation (eg, with n-of-1 trials).73-75 However, our results indicate that overall variability in the treatment groups was slightly lower, if only by a modest amount (1% to 5%). One explanation might be that the treatment had a stabilizing quality9 that reduced the variability in the treatment group. An example for such variance stabilization might be the floor effect, in which the assessment instrument is too coarse to capture patient improvements over a certain level.
Nevertheless, given the slightly lower variability under treatment compared with control found in this study, we cannot rule out that individual differences in response to antipsychotic drugs might still exist. A subset of ill patients may have responded well to treatment, whereas less affected patients may not have improved, resulting in an overall decreased variance under treatment than under control.9 Yet, the finding of a narrow CI around an overall only slightly lower variability suggests that substantial differences in drug response are rather unlikely. Thus, analyses aimed at estimating individual response might be premature until these differences have been shown to exist and to be clinically relevant.
As the simulations have shown, labeling patients as responders might be misleading. The label suggests that true response has been established as a permanent feature of the patient, even though the label is a mere reflection of the observed response, which includes true response plus regression to the mean, some placebo effect, and random terms such as measurement error. Thus, response rates that are calculated in RCTs reflect observed but not true response. We suggest that biomarker research aimed at identifying response to treatment of individuals or subgroups should consider the possibility that treatment outcome is less heterogeneous than anticipated and might even be close to constant across individuals.
This meta-analysis has some limitations. First, the calculation of the pretreatment and posttreatment outcome difference scores varied between studies. Although some RCTs calculated the differences between outcome and baseline PANSS scores, others used analyses of covariance with the baseline PANSS scores and additional variables as covariates. Thus, some of the included SDs of change were adjusted for covariates but others were not. Second, the use of pretreatment and posttreatment outcome difference scores might lead to a loss of information and might not be sensitive enough to capture differences in response to treatment.76 Third, we assumed that individual responses to treatment were reflected by increased variance in the treatment group. Yet, this increased variance could have also indicated the presence of subgroups who responded differently to the treatment.9 Such a case would argue for stratified medicine rather than personalized medicine, in which subgroups of patients receive varying treatments. Like any interaction, a treatment-by-patient interaction is ultimately scale dependent, which means that it can be removed by a transformation of the scale.77
Until the differences in individual response to treatment have been demonstrated with careful designs, the overall small differences in outcome variance suggest that the average treatment effect is a reasonable assumption for the individual patient.
Accepted for Publication: April 27, 2019.
Corresponding Authors: Stephanie Winkelbeiner, PhD, Center for Psychiatric Neuroscience, Feinstein Institute for Medical Research, 75-59 263rd St, New York, NY 11004 (firstname.lastname@example.org); Philipp Homan, MD, PhD, Center for Psychiatric Neuroscience, Feinstein Institute for Medical Research, 75-59 263rd St, New York, NY 11004 (email@example.com).
Published Online: June 3, 2019. doi:10.1001/jamapsychiatry.2019.1530
Author Contributions: Drs Homan and Winkelbeiner had full access to all of the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis.
Concept and design: Winkelbeiner, Leucht, Homan.
Acquisition, analysis, or interpretation of data: All authors.
Drafting of the manuscript: Winkelbeiner, Homan.
Critical revision of the manuscript for important intellectual content: All authors.
Statistical analysis: Winkelbeiner, Homan.
Obtained funding: Winkelbeiner.
Administrative, technical, or material support: All authors.
Conflict of Interest Disclosures: Dr Winkelbeiner reported grants from Swiss National Science Foundation during the conduct of the study. Dr Leucht reported personal fees from LB Pharma International, H Lundbeck A/S, Otsuka Pharmaceutical, Teva Pharmaceutical Industries Ltd, LTS Lohmann Therapy Systems, Gedeon Richter, Recordati SpA, MSD, Boehringer Ingelheim and Sandoz, Janssen Pharmaceutica, Eli Lilly & Company, Sanofi-Aventis, and Servier Laboratories outside of the submitted work; reanalysis of a clinical trial together with Gedeon Richter and the publication of its results; and honoraria from Johnson & Johnson, MSD, Angelini, and Sunovion. Dr Kane reported grants from Otsuka, Lundbeck, and Janssen, as well as other support from Alkermes, Allergan, Forum, Genentech, Lundbeck, Intracellular Therapies, Janssen, Johnson & Johnson, Merck, Neurocrine, Otsuka, Pierre Fabre, Reviva, Roche, Sunovion, Takeda, Teva, Vanguard Research Group, and LB Pharmaceuticals outside of the submitted work. No other disclosures were reported.
Meeting Presentation: The results of this study were presented at the World Congress of Biological Psychiatry, June 3, 2019, Vancouver, British Columbia, Canada.
Additional Contributions: The authors thank Majnu John, PhD, Department of Mathematics, Zucker School of Medicine at Northwell/Hofstra, for advice on the analysis of the current study, as well as Stephen Senn, PhD, Methodology and Statistics, Luxembourg Institute of Health, and Daniel Guinart, MD, Department of Psychiatry, Zucker School of Medicine at Northwell/Hofstra, for helpful comments on the manuscript. These individuals received no additional compensation, outside of their usual salary, for their contributions.
Additional Information: All data and code are freely available online to ensure reproducibility (https://doi.org/10.17605/OSF.IO/QARVS).