Figure 1. Patients are randomized to 1 of 4 arms, each 2 of which represent a double-blind substudy with an active treatment arm and a corresponding placebo arm. Regression to the mean and spontaneous changes are assumed to be equal in all groups. Treatment arm 2 has a smaller specific effect than treatment arm 1, but its total (specific + placebo) effect is larger because it is associated with a larger placebo effect. Adapted from Walach.6
Figure 2. This flowchart depicts the selection of the 79 studies.
Figure 3. Numbers and solid lines indicate studies with direct comparisons between active treatment and placebo groups; dashed lines, studies with direct comparisons with an additional group. Three-armed studies are displayed as triangles (the sources are given in the figure).
Figure 4. Results are displayed in text and graphically. CBT indicates cognitive-behavioral treatment; OR, odds ratio.
eTable 1. Search strategy for MEDLINE
eTable 2. Characteristics of the 79 studies included in the meta-analyses
eTable 3. Subgroup analyses for proportion of placebo responders
eTable 4. Multivariable analyses for proportions of placebo responders
eTable 5. Subgroup analyses according to type of placebo treatment for the 17 studies reporting pretreatment to posttreatment changes of continuous outcomes
eTable 6. Subgroup analyses according to type of placebo treatment for the specific effects in the 38 studies reporting posttreatment values of continuous outcomes
eTable 7. Results of the network meta-analysis from 7 studies reporting pretreatment to posttreatment changes of continuous outcomes
eTable 8. Results of the network meta-analysis from 11 studies reporting posttreatment values of continuous outcomes
Meissner K, Fässler M, Rücker G, Kleijnen J, Hróbjartsson A, Schneider A, Antes G, Linde K. Differential Effectiveness of Placebo Treatments: A Systematic Review of Migraine Prophylaxis. JAMA Intern Med. 2013;173(21):1941–1951. doi:10.1001/jamainternmed.2013.10391
Importance
When analyzing results of randomized clinical trials, the treatment with the greatest specific effect compared with its placebo control is considered to be the most effective one. Although systematic variations of improvements in placebo control groups would have important implications for the interpretation of placebo-controlled trials, the knowledge base on the subject is weak.
Objective
To investigate whether different types of placebo treatments are associated with different responses, using studies of migraine prophylaxis as the basis for the analysis.
Design, Setting, and Participants
We searched relevant sources through February 2012 and contacted the authors to identify randomized clinical trials on the prophylaxis of migraine with an observation period of at least 8 weeks after randomization that compared an experimental treatment with a placebo control group. We calculated pooled random-effects estimates according to the type of placebo for the proportions of treatment response. We performed meta-regression analyses to identify sources of heterogeneity. In a network meta-analysis, direct and indirect comparisons within and across trials were combined. Additional analyses were performed for continuous outcomes.
Interventions
Active migraine treatment and the placebo control conditions.
Main Outcomes and Measures
Proportion of treatment responders, defined as having an attack frequency reduction of at least 50%. Other available outcomes in order of preference included a reduction of 50% or greater in migraine days, the number of headache days, or headache score or a significant improvement as assessed by the patients or their physicians.
Results
Of the 102 eligible trials, 23 could not be included in the meta-analyses owing to insufficient data. Sham acupuncture (proportion of responders, 0.38 [95% CI, 0.30-0.47]) and sham surgery (0.58 [0.37-0.77]) were associated with a more pronounced reduction of migraine frequency than oral pharmacological placebos (0.22 [0.17-0.28]) and were the only significant predictors of response in placebo groups in multivariable analyses (P = .005 and P = .001, respectively). Network meta-analysis confirmed that more patients reported response in sham acupuncture groups than in oral pharmacological placebo groups (odds ratio, 1.88 [95% CI, 1.30-2.72]). Corresponding analyses for continuous outcomes showed similar findings.
Conclusions and Relevance
Sham acupuncture and sham surgery are associated with higher responder ratios than oral pharmacological placebos. Clinicians who treat patients with migraine should be aware that a relevant part of the overall effect they observe in practice might be due to nonspecific effects and that the size of such effects might differ between treatment modalities.
Placebo controls are important when evaluating the effectiveness of medical treatments. They separate specific and nonspecific effects (including placebo effects, regression to the mean, natural course of the disease, etc) and reduce bias by enabling blinding of participants (and, if possible, of providers and outcome assessors). Although placebo-controlled trials may not be possible for ethical or technical reasons in certain instances, they are an important tool when investigating whether a treatment truly works, in the sense that the postulated mechanism of action makes a clinically relevant difference.
Placebo controls are crucial in many evaluations of health care. However, for theoretical reasons and based on empirical evidence, responses to placebo treatments can be quite variable depending on several factors, such as the type of placebo treatment,1-3 how such treatments are provided (eg, in an enthusiastic or a neutral manner),4 and whether informed consent is obtained.5 Although these issues might be of little relevance for some conditions, they could be of major importance for those conditions in which cognitive and emotional processes play a central role, such as chronic pain or depression, or when the symptoms evaluated are mostly of a subjective nature.
Differential responses to different types of placebo controls would challenge the classic interpretation of randomized clinical trials that the treatment with the greatest specific effect compared with its placebo control is also the most effective one.6 Rather, a direct comparison of the different types of treatments would be necessary to find out the best treatment option for a certain disease. Otherwise, a complex intervention with a small specific effect but a large placebo effect, for example, would be considered of little value while still being more effective than a simple drug application with a moderate specific effect but only a small placebo effect. This paradox has been termed the efficacy paradox (Figure 1).6
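The efficacy paradox can be made concrete with a small numerical sketch. The figures below are hypothetical, chosen only for illustration and not drawn from any trial in this review. The additive split into specific and placebo components follows the simplification in Figure 1, with regression to the mean and spontaneous change assumed equal across arms and therefore omitted.

```python
from dataclasses import dataclass


@dataclass
class Treatment:
    name: str
    specific_effect: float  # improvement attributable to the postulated mechanism
    placebo_effect: float   # improvement attributable to context and ritual

    @property
    def total_effect(self) -> float:
        # Improvement a patient actually experiences (additive assumption)
        return self.specific_effect + self.placebo_effect


# Hypothetical values, for illustration only
drug = Treatment("simple drug", specific_effect=0.20, placebo_effect=0.10)
complex_tx = Treatment("complex intervention", specific_effect=0.05, placebo_effect=0.40)
arms = [drug, complex_tx]

# Classic reading of placebo-controlled trials: rank by specific effect
best_vs_own_placebo = max(arms, key=lambda t: t.specific_effect)

# Head-to-head reading: rank by total effect
best_overall = max(arms, key=lambda t: t.total_effect)

print("Largest specific effect:", best_vs_own_placebo.name)
print("Largest total effect:", best_overall.name)
```

Under these assumed values the two rankings disagree, which is the situation the efficacy paradox describes.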
Although systematic variations of improvements in placebo control groups would have important implications for the interpretation of placebo-controlled trials, the knowledge base on the subject is weak. Trials with a direct comparison of different types of placebo treatments are relatively rare, dispersed over a variety of conditions, and difficult to identify.3 For these reasons, systematic reviews making indirect comparisons are warranted. In this review, we collected randomized clinical trials on the prophylaxis of migraine that included a placebo control group. We focused on migraine because this disease has high prevalence and economic relevance, a variety of treatment strategies have been investigated, and diagnostic criteria and outcome measures are relatively well defined. An influence of contextual factors on clinical outcomes also seems plausible on theoretical grounds. The primary aim of this review was to determine whether different types of placebo treatments are associated with different effect sizes.
We searched MEDLINE, EMBASE, the Cochrane Controlled Trials Register, and PsychINFO (from inception to February 2012) using a combination of keywords and text words related to migraine and placebo controls, combined with validated filters for controlled clinical trials.7 The search strategy for MEDLINE is shown in the Supplement (eTable 1).
We included randomized placebo-controlled trials of the prophylactic treatment of migraine in adults with observation periods of at least 8 weeks after randomization. Studies had to report at least 1 clinical outcome related to migraine (eg, response, frequency, pain intensity, headache scores, or analgesic use). Migraine had to be diagnosed according to the International Headache Society criteria, or criteria for migraine diagnosis had to be in close agreement with the International Headache Society classification. We excluded crossover studies except when the results of the first administration were given separately. We also excluded studies in which migraine was associated with other neurological disorders, studies of daily or converted migraine, studies with single-blind placebo run-in periods, studies with changes in prophylactic migraine treatment (except for titration of study medication therapy), and studies with experimental cointerventions (except for acute migraine attacks).
We screened all abstracts identified by the literature search and removed irrelevant hits (eg, duplicate studies, studies that were not randomized clinical trials, studies of conditions other than migraine, and treatment of acute migraine). All other articles were obtained in full text and checked by 2 reviewers (K.M. and M.F.) for eligibility according to our selection criteria. Disagreements were resolved by discussion.
The 2 reviewers extracted information on patients, methods, interventions, outcomes, and results using a pretested standard form. In particular, we extracted the bibliographic details of the study; exact diagnoses and headache classifications used; number and type of centers; age and sex of the patients; duration of disease; number of patients undergoing randomization, treatment, and analysis; special inclusion criteria; resistance to previous treatment; number of and reasons for dropouts; duration of baseline, treatment, and follow-up periods; type, duration, and frequency of experimental and sham treatments; description of other groups (if any); handling of acute migraine attacks; cointerventions; informed consent procedure; definition of primary outcomes; adverse events; success of blinding; type of analysis (ie, intent to treat or per protocol); and randomization ratio of placebo to active treatment (eg, 1:1, 1:2).
The primary outcome measure was the proportion of responders. We preferably defined responders as patients with a reduction in attack frequency of at least 50%. If these data were not available, we used (in descending order of preference) patients with at least a 50% reduction in the number of migraine days, with at least a 50% reduction in the number of headache days, with at least a 50% reduction in headache scores, and with significant improvement as assessed by the patients or their physicians. As a secondary outcome measure, we preferably extracted the frequency of migraine attacks (means and SDs) per month at baseline and follow-up or (in descending order of preference) the number of migraine days per month, number of headache days per month, migraine index, headache index, headache scores, headache intensity, or frequency of analgesic use. The following time windows were applied: 8 weeks or 2 months after randomization, 3 to 4 months after randomization, 5 to 6 months after randomization, and more than 6 months after randomization. The analysis considered preferably the data 3 to 4 months after randomization; otherwise, the presented data closest to 3 to 4 months were used. All outcomes relied on patient reports, mainly collected in headache diaries. If a study contained multiple treatment groups that differed only in the dosage, we pooled the values.
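The selection rules for the responder definition and the time point can be sketched as follows. The outcome labels and function names are hypothetical, and the tie-breaking rule (the first listed time point wins when two are equally close) is an assumption, since the text does not specify one.

```python
# Responder definitions in descending order of preference, as described in the text.
# The string labels are hypothetical identifiers.
RESPONDER_PREFERENCE = [
    "attack_frequency_50",   # at least 50% reduction in attack frequency
    "migraine_days_50",      # at least 50% reduction in migraine days
    "headache_days_50",      # at least 50% reduction in headache days
    "headache_score_50",     # at least 50% reduction in headache scores
    "global_improvement",    # significant improvement per patient or physician
]


def pick_responder_definition(available):
    """Return the most preferred definition a study reports, or None."""
    for definition in RESPONDER_PREFERENCE:
        if definition in available:
            return definition
    return None


def pick_time_point(months_reported, window=(3.0, 4.0)):
    """Return the reported time point (in months) inside the preferred
    3-to-4-month window, or otherwise the one closest to that window."""
    low, high = window

    def distance(month):
        if low <= month <= high:
            return 0.0
        return min(abs(month - low), abs(month - high))

    return min(months_reported, key=distance)


print(pick_responder_definition({"headache_days_50", "global_improvement"}))
print(pick_time_point([2, 6, 12]))
```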
Risk of bias was assessed using the risk of bias tool of the Cochrane Collaboration.8 We considered adequate sequence generation, allocation concealment, patient blinding, addressing incomplete outcome data at 4 months and at 5 to 12 months after randomization, and absence of selective reporting. At least 2 reviewers (K.M., M.F., or K.L.) independently judged whether the risk of bias for each criterion was considered low, high, or unclear. Disagreements were resolved by discussion.
The proportion of responders in the placebo and active treatment groups with the associated 95% confidence intervals and the relative risk between the percentage of responders in the placebo and active treatment groups (responder ratio [RR]) were calculated for each study, and results were pooled using random-effects models. To evaluate the effects of different types of placebo treatments, we performed stratified analyses (for descriptive purposes) and meta-regression analyses using the inverse variance method (Freeman-Tukey arcsine transformation). In explorative analyses, we investigated the effects of 15 covariates on outcome separately and then in a multivariable meta-regression analysis (Supplement [eTables 3 and 4]). Because a strong overlap between the factors “type of placebo treatment” and “type of blinding” was present, we tested 2 models for multivariable analyses, in each of which only one of these factors was involved. Responder data in the active treatment groups, placebo control groups, and no-treatment groups were then subjected to a network analysis using a fixed-effects method based on logistic regression to enable indirect comparisons between studies while preserving within-group randomization.9-12
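As an illustration of how proportions can be pooled on the Freeman-Tukey scale, the following minimal sketch applies the double-arcsine transformation and inverse-variance random-effects pooling. It is not the code used for this review, which relied on R packages. Several choices are assumptions: the DerSimonian-Laird estimator for the between-study variance, the simple sin² back-transformation (an approximation, used here instead of an exact inversion), and the study counts, which are hypothetical.

```python
import math


def freeman_tukey(events, n):
    """Double-arcsine transform of events/n and its approximate variance."""
    y = 0.5 * (math.asin(math.sqrt(events / (n + 1)))
               + math.asin(math.sqrt((events + 1) / (n + 1))))
    return y, 1.0 / (4 * n + 2)


def back_transform(t):
    """Approximate inverse: clamp to [0, pi/2], then take sin squared."""
    t = min(max(t, 0.0), math.pi / 2)
    return math.sin(t) ** 2


def pool_random_effects(estimates, variances):
    """DerSimonian-Laird pooled estimate, its standard error, and tau^2."""
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    df = len(estimates) - 1
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum(w_re)
    return pooled, math.sqrt(1.0 / sum(w_re)), tau2


def pooled_proportion(studies, z=1.96):
    """studies: list of (responders, group size). Returns (estimate, lower, upper)."""
    transformed = [freeman_tukey(x, n) for x, n in studies]
    ys = [y for y, _ in transformed]
    vs = [v for _, v in transformed]
    pooled, se, _ = pool_random_effects(ys, vs)
    return (back_transform(pooled),
            back_transform(pooled - z * se),
            back_transform(pooled + z * se))


# Hypothetical placebo arms: (responders, patients)
example = [(12, 60), (30, 95), (8, 50)]
estimate, lower, upper = pooled_proportion(example)
print(f"pooled proportion {estimate:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

Pooling on the transformed scale stabilizes the variance of proportions near 0 or 1, which is the usual motivation for this transformation.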
Secondary analyses were performed on continuous outcomes. The changes from baseline to after treatment (pretreatment to posttreatment changes) were analyzed by calculating standardized mean differences in the placebo and active treatment groups with the associated 95% CIs. Posttreatment data were analyzed by calculating standardized mean differences between placebo and active treatment groups with the associated 95% CIs. For both types of continuous outcomes, individual study results were pooled using the inverse variance method and a random-effects model. Network analyses were performed by adapting the method originally developed for logistic regression9-12 for continuous outcomes.13
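For the between-group comparison of posttreatment values, a standardized mean difference can be computed as in the sketch below. The small-sample correction (Hedges g) and the large-sample variance approximation are assumptions; the text does not state which variant was applied. The input values in the usage line are hypothetical.

```python
import math


def hedges_g(mean_a, sd_a, n_a, mean_b, sd_b, n_b, z=1.96):
    """Standardized mean difference (group A minus group B) with a 95% CI."""
    df = n_a + n_b - 2
    # Pooled standard deviation of the two groups
    pooled_sd = math.sqrt(((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / df)
    d = (mean_a - mean_b) / pooled_sd
    # Small-sample correction factor
    j = 1.0 - 3.0 / (4.0 * df - 1.0)
    g = j * d
    # Common large-sample approximation of the variance
    var_g = j ** 2 * ((n_a + n_b) / (n_a * n_b) + d ** 2 / (2.0 * (n_a + n_b)))
    se = math.sqrt(var_g)
    return g, g - z * se, g + z * se


# Hypothetical posttreatment attack frequencies per month: placebo vs active
g, lower, upper = hedges_g(4.0, 2.0, 50, 3.0, 2.0, 50)
print(f"SMD {g:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

The resulting estimate and its variance are the quantities that would then enter inverse-variance pooling.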
We assessed heterogeneity between trials using the I2 statistic.14 All analyses were performed with the R packages meta15 and metafor.16
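The I2 statistic expresses the share of total variation across studies that exceeds what chance alone would produce, and it is derived from the Cochran Q statistic. A minimal sketch, with hypothetical function names and input values:

```python
def cochran_q(estimates, variances):
    """Cochran Q: weighted squared deviations from the fixed-effect mean."""
    weights = [1.0 / v for v in variances]
    mean = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    return sum(w * (y - mean) ** 2 for w, y in zip(weights, estimates))


def i_squared(q, n_studies):
    """I2 as a percentage: 100 * (Q - df) / Q, truncated at 0."""
    df = n_studies - 1
    if q <= 0 or df <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


# Hypothetical study estimates and variances
q = cochran_q([0.10, 0.35, 0.60], [0.01, 0.01, 0.01])
print(f"Q = {q:.2f}, I2 = {i_squared(q, 3):.1f}%")
```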
A total of 102 studies (reported in 118 publications) met the inclusion criteria. At least 1 outcome could be included in the meta-analyses from 79 studies (Figure 2).
The individual characteristics of the 79 studies17-93 included in the meta-analyses are given in the Supplement (eTable 2). Dichotomous and/or continuous data were retrieved from 56 and 55 studies, respectively, and 32 studies provided both types of data. The 79 trials had randomly allocated a total of 9278 patients to an experimental treatment, a placebo treatment, no treatment, or an active control treatment. The mean age of study patients was 39 years, they experienced migraine for a mean of 18 years, and most of the study participants (79.8%) were female. Twenty-four studies had a clearly predefined outcome measure and 21 had a partly predefined primary outcome measure (such as predefinition of the outcome but not of the time point). Fifty-one trials were double blind; 26, single blind; and 2, not specified. In 49 trials, the randomization ratio (placebo to active treatment) was 1:1; in 30 trials, 1:2 or less.
In 35 studies, the placebo treatment was an orally administered placebo for a pharmacological drug; in 13 studies, sham acupuncture (typically superficial needling at nonacupuncture points); in 10 studies, a pharmacological sham injection; in 8 studies, an orally administered placebo for an herbal remedy, a homeopathic remedy, or a vitamin supplement; in 8 studies, a sham cognitive-behavioral treatment; in 3 studies, a sham electromagnetic device; and in 2 studies, sham surgery (skin incision in the groin43 and exposure of cranial nerves and muscles at trigger sites56). In 4 studies, only the placebo arms were analyzed owing to different types of treatments in the placebo and active treatment groups.44,55,85,89 Four studies included a no-treatment group.24,49,52,64 One acupuncture study40 included a group that received guideline-oriented standard drug therapy.
Most studies had considerable weaknesses in relation to the details of sequence generation (51 studies with a high or unclear risk of bias), allocation concealment (68 studies with a high or unclear risk of bias), and blinding (48 studies with a high or unclear risk of bias). In 39 studies, the dropout rates at 4 months were substantial (>15%) and could have led to distortions. Although study results were often presented in a suboptimal manner, we considered the risk of selective reporting in most studies (n = 70) to be low.
The overall pooled proportion of responders in the active treatment groups was 0.42 (95% CI, 0.38-0.45), whereas it was 0.26 (0.22-0.30) in the placebo groups. On average, active treatment was significantly more effective than placebo treatment (RR, 1.40 [95% CI, 1.23-1.59]). Heterogeneity between trials was substantial (for placebo and active treatment groups, I2 = 80.8% and 80.7%, respectively [both P < .001]; for placebo vs active treatment groups, I2 = 60.5% [P < .001]).
Stratified analyses according to type of placebo treatment showed wide variations in the proportions of responders in the placebo groups (Tables 1, 2, 3, 4, and 5). On average, sham surgery and sham acupuncture (Table 4) were associated with the greatest responder proportions (0.58 [95% CI, 0.37-0.77] and 0.38 [0.30-0.47], respectively), whereas oral pharmacological placebos (Table 1) were associated with clearly smaller ones (0.22 [95% CI, 0.17-0.28]). The pooled proportions of responders for other placebo treatments ranged from 0.23 to 0.27 (Tables 2, 3, and 5). Subgroup analyses according to the type of placebo confirmed significant differences between different types of placebo treatments (Q = 22.02 [P = .001]), with substantial heterogeneity across trials (I2 = 80.8% [P < .001]). The proportions of responders in the sham acupuncture and sham surgery subgroups were significantly greater than those in the subgroup of oral pharmacological placebos (P = .004 and P = .03, respectively).
In addition to type of placebo treatment, separate subgroup analyses revealed that studies with sample sizes larger than 100 patients, patients unresponsive to previous prophylactic treatments, parallel-group design, and single blinding were associated with significantly greater placebo RRs (Supplement [eTable 3]). Explorative multivariable analyses showed the placebo treatments of sham surgery and sham acupuncture to be the only independent predictors of placebo RRs in model 1 (Supplement [eTable 4]). In model 2, blinding emerged as the only independent predictor of placebo RRs, with single-blinded studies associated with greater RRs.
Forty trials with dichotomous outcomes were included in a network meta-analysis, which allowed comparisons among sham acupuncture, oral pharmacological placebos, sham cognitive-behavioral treatment, their corresponding active treatments, and no treatment (Figure 3). Results are summarized in Figure 4. Sham acupuncture was associated with greater effects than were oral pharmacological placebos and no treatment (odds ratio, 1.88 [95% CI, 1.30-2.72] and 4.57 [2.34-8.92], respectively). Meta-analyses of continuous outcomes and the corresponding network meta-analyses broadly confirmed the findings for the dichotomous data (Supplement [eTables 5 through 8]).
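Reported odds ratios can be checked and re-expressed on the log scale, where their confidence intervals are approximately symmetric. The sketch below recovers the standard error of the log odds ratio from a reported 95% CI and derives a z statistic and two-sided P value. It assumes a Wald-type interval, which the text does not state explicitly, and the function name is hypothetical.

```python
import math


def log_or_summary(or_point, ci_low, ci_high, z_crit=1.959964):
    """Back-calculate log-scale quantities from an odds ratio and its 95% CI."""
    log_or = math.log(or_point)
    # Width of the CI on the log scale equals 2 * z_crit * SE
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = log_or / se
    # Two-sided P value from the standard normal distribution
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    # Geometric midpoint of the CI; should be close to the point estimate
    midpoint = math.sqrt(ci_low * ci_high)
    return {"log_or": log_or, "se": se, "z": z,
            "p_two_sided": p_two_sided, "midpoint": midpoint}


# Sham acupuncture vs oral pharmacological placebo, as reported in the text
result = log_or_summary(1.88, 1.30, 2.72)
print({key: round(value, 4) for key, value in result.items()})
```

If the geometric midpoint of a reported interval lies close to the point estimate, the interval is consistent with symmetry on the log scale, which supports the back-calculated standard error.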
In the studies included in this review, the amount of reduction in migraine frequency varied systematically between different types of placebo. We found consistent evidence across analyses that sham acupuncture was associated with a more pronounced frequency reduction than were oral pharmacological placebos. Sham surgery also appeared to be more effective, but data on this subject were sparse. Oral placebos for herbs, vitamins, or homeopathic drugs; injected placebos for pharmacological drugs; a sham electromagnetic device; and sham cognitive-behavioral treatment were associated with effects similar to oral pharmacological placebos.
By focusing on a single condition with well-defined diagnostic criteria and outcome measures, we were able to confirm and extend earlier findings that suggested physical placebo treatments in general and acupuncture in particular were associated with greater effects than were pharmacological placebo treatments.1,94,95 Experimental placebo research in the past decades has provided ample evidence that treatments without active ingredients can relieve pain and other symptoms, and the underlying mechanisms are increasingly understood.96 The context and meaning of a placebo treatment are more important than the placebo vehicle itself.4,97 However, the context and meaning of surgery, for example, differ considerably from those of an oral drug. Patients may develop greater expectations about treatments such as acupuncture and surgery because of the more elaborate and impressive treatment rituals.98 The higher level of attention and physical contact may also play a role.99 The most probable explanation for the apparently greater effectiveness of sham acupuncture and possibly sham surgery compared with placebo pills is thus their systematic association with contextual factors that are known to enhance placebo effects. Part of the enhanced placebo effect, however, may also result from the physiological effects of skin injury during sham acupuncture and sham surgery.100,101
Our results suggest that the response to sham acupuncture and sham surgery can be as great as the mean response to active drugs. The other placebo treatments in our data set showed RRs comparable to those of oral pharmacological placebos. The finding that studies with pharmacological sham injections as placebo controls did not show greater effects than oral placebo pills is in contrast to the results of 2 earlier meta-analyses2,102 reporting an enhanced placebo effect of injected vs oral placebos for the treatment of acute migraine attacks. The discrepancy with the literature may be due in part to the fact that, in our review, the placebo injections controlled for botulinum toxin, which induces persistent physical change due to muscle relaxation, whereas in the other reviews the placebo treatment controlled for analgesics without such adverse effects. The absence of physical changes in placebo-treated patients in botulinum toxin studies may have led to the unblinding of patients and physicians, thereby decreasing the placebo effect. The absence of a difference between the responses to a sham electromagnetic device and oral pharmacological placebos is in agreement with a recent meta-analysis that detected no difference between the placebo response for oral pharmacological placebos and repetitive transcranial magnetic stimulation.103
The multivariable analyses indicated that the placebo response was greater in single-blinded than in double-blinded designs (Supplement [eTable 4, model 2]). This response most probably results from a strong overlap of the factors blinding and type of treatment in that the placebo treatments associated with the greatest improvements (ie, sham acupuncture and sham surgery) can usually only be performed in a single-blinded manner. In contrast to what would be expected, single blinding apparently did not decrease the placebo response.
Our data cannot prove that the differences in migraine frequency reduction are caused by the different types of placebos and their associated context and meaning. In the first part of our analyses, we compared the outcome of control groups of separate trials receiving different types of placebo treatments. This comparison of uncontrolled case series is prone to biases. Although we investigated a variety of covariates, we cannot rule out the risk of spurious positive associations due to the high number of covariates and the possibility that we missed an important factor. Although our main results could be confirmed by network analyses, the connections in our networks did not allow us to include all studies of our data set, and the heterogeneity of patients, settings, and treatment modalities further limits the conclusiveness of the network meta-analyses. One potential explanation for our findings could be differential response bias.104 The use of patient diaries in migraine trials probably reduces the risk of response bias but does not eliminate it. Regression to the mean and spontaneous improvements also could vary between subgroups of studies. Our data suggest that baseline headache frequency was slightly lower in acupuncture trials compared with pharmacological trials, making larger regression to the mean less likely. However, this reduced frequency could also imply that patients in acupuncture trials were easier to treat.
Our results suggest that some types of placebo treatments can be, on average, associated with greater improvements than others. Although our study cannot prove that this association is causal, the results support the notion that some placebo treatments can trigger clinically relevant responses. Our findings warrant further studies testing different types of placebo as well as active treatments directly against each other. We also suggest that treatment options whose contexts vary strongly should be investigated in placebo-controlled trials and head-to-head comparisons. Otherwise, treatments with small specific effects greater than those of their sham controls are withheld from patients even though they work better than standard treatment. Clinicians who treat patients with migraine should be aware that a relevant part of the overall effect they observe in practice might be the result of nonspecific effects and that the size of such effects might differ between treatment modalities. In other words, the method of treatment delivery might have an important influence on outcome. Although our analyses focused on migraine, the same conclusion could well be true for other conditions.
Submitted for Publication: December 13, 2012; final revision received May 16, 2013; accepted May 19, 2013.
Corresponding Author: Karin Meissner, MD, Institute of Medical Psychology, Ludwig-Maximilians-University Munich, Goethestrasse 31, 80336 Munich, Germany (email@example.com).
Published Online: October 14, 2013. doi:10.1001/jamainternmed.2013.10391.
Author Contributions: Drs Meissner and Linde had full access to all the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis.
Study concept and design: Meissner, Fässler, Kleijnen, Hróbjartsson, Antes, Linde.
Acquisition of data: Meissner, Fässler, Linde.
Analysis and interpretation of data: Meissner, Fässler, Rücker, Kleijnen, Hróbjartsson, Schneider, Linde.
Drafting of the manuscript: Meissner, Rücker.
Critical revision of the manuscript for important intellectual content: Fässler, Rücker, Kleijnen, Hróbjartsson, Schneider, Antes, Linde.
Statistical analysis: Meissner, Rücker, Kleijnen.
Obtaining funding: Meissner, Kleijnen, Linde.
Administrative, technical, or material support: Schneider, Fässler, Antes.
Study supervision: Kleijnen, Antes, Linde.
Conflicts of Interest Disclosures: None reported.
Funding/Support: This study was supported by grant 01KG0924 from the German Ministry of Education and Research.
Role of the Sponsor: The funding source had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.