Hypericum Depression Trial Study Group. Effect of Hypericum perforatum (St John's Wort) in Major Depressive Disorder: A Randomized Controlled Trial. JAMA. 2002;287(14):1807–1814. doi:10.1001/jama.287.14.1807
Context: Extracts of Hypericum perforatum (St John's
wort) are widely used for the treatment of depression of varying severity.
Their efficacy in major depressive disorder, however, has not been conclusively demonstrated.
Objective: To test the efficacy and safety of a well-characterized H perforatum extract (LI-160) in major depressive disorder.
Design and Setting: Double-blind, randomized, placebo-controlled trial conducted in 12 academic
and community psychiatric research clinics in the United States.
Participants: Adult outpatients (n = 340) recruited between December 1998 and June
2000 with major depression and a baseline total score on the Hamilton Depression
Scale (HAM-D) of at least 20.
Interventions: Patients were randomly assigned to receive H perforatum, placebo, or sertraline (as an active comparator) for 8 weeks. Based
on clinical response, the daily dose of H perforatum
could range from 900 to 1500 mg and that of sertraline from 50 to 100 mg.
Responders at week 8 could continue blinded treatment for another 18 weeks.
Main Outcome Measures: Change in the HAM-D total score from baseline to 8 weeks; rates of full
response, determined by the HAM-D and Clinical Global Impressions (CGI) scores.
Results: On the 2 primary outcome measures, neither sertraline nor H perforatum was significantly different from placebo. The random regression
parameter estimate for mean (SE) change in HAM-D total score from baseline
to week 8 (with a greater decline indicating more improvement) was –9.20
(0.67) (95% confidence interval [CI], –10.51 to –7.89) for placebo
vs –8.68 (0.68) (95% CI, –10.01 to –7.35) for H perforatum (P = .59) and –10.53 (0.72)
(95% CI, –11.94 to –9.12) for sertraline (P = .18). Full response occurred in 31.9% of the placebo-treated patients
vs 23.9% of the H perforatum–treated patients
(P = .21) and 24.8% of sertraline-treated patients
(P = .26). Sertraline was better than placebo on
the CGI improvement scale (P = .02), which was a
secondary measure in this study. Adverse-effect profiles for H perforatum and sertraline differed relative to placebo.
Conclusion: This study fails to support the efficacy of H perforatum in moderately severe major depression. The result may be due to low
assay sensitivity of the trial, but the complete absence of trends suggestive
of efficacy for H perforatum is noteworthy.
Hypericum perforatum (St John's wort) is widely
used to treat depression, sometimes in an attempt to avoid adverse effects
associated with prescription antidepressants. One meta-analysis in 1996 concluded
that hypericum is superior to placebo for treatment of mild to moderate depression.1 Subsequent studies have found hypericum to be comparable
to active controls, such as amitriptyline,2
and fluoxetine,6 and superior to placebo.4,7 Some studies suggest that it may be
an effective treatment for moderately severe depression.3,4
Others have been unable to differentiate hypericum from placebo.8,9
Important issues have been raised regarding existing studies, including
limited information about use in clinically defined major depression, lack
of placebo-controlled trials that have included a selective serotonin reuptake
inhibitor arm, and absence of controlled data for continuation treatment.
Concern has been raised about adverse interactions of hypericum with certain
drugs.10,11 Most hypericum in
the United States is consumed without physician consultation. Even though
many patients prefer to avoid the use of medications with adverse effects,
there is a risk that people with clinically significant depression may self-medicate
with hypericum rather than receive effective medication or psychotherapy.
This placebo-controlled study was designed to expand on previous trials
by studying outpatients with well-defined major depression of moderate severity
and included a 4-month continuation phase and sertraline as an active comparator
to calibrate the trial's validity. The main hypothesis tested whether hypericum
would be superior to placebo after 8 weeks of treatment.
The study was a randomized, double-blind, parallel-group, 8-week, outpatient
trial of hypericum, sertraline, or placebo treatment for major depressive
disorder, followed by up to 18 weeks of double-blind continuation treatment
in participants meeting response criteria at 8 weeks.
Outpatients meeting Diagnostic and Statistical Manual
of Mental Disorders, Fourth Edition (DSM-IV)
criteria for major depressive disorder12 were
recruited from 12 academic or community clinics between December 1998 and
June 2000. Major depressive disorder was diagnosed with the modified Structured
Clinical Interview for Axis I DSM-IV disorders (SCID-Hypericum).13
Inclusion criteria were age at least 18 years; current diagnosis of
major depression; minimum total score of 20 on the 17-item Hamilton Depression
(HAM-D)14 scale and a maximum score of 60 on
the Global Assessment of Functioning (GAF)12
at screening and baseline following a 1-week, single-blind, placebo run-in;
no more than a 25% decrease in HAM-D total score between screening and baseline;
capacity to give informed consent and follow study procedures; and identification
of a close personal contact to be notified if warranted by clinical concerns.
Exclusion criteria were a score above 2 on the HAM-D suicide item; attempted
suicide in the past year or current suicide or homicide risk; being pregnant,
planning pregnancy, breastfeeding, or not using medically acceptable birth
control; clinically significant liver disease or liver enzyme levels elevated
to at least twice the upper normal limit; serious unstable medical illness;
history of seizure disorder; SCID diagnoses indicating alcohol or other substance-abuse
disorder within the past 6 months or lifetime diagnoses of schizophrenia,
schizoaffective or other psychotic disorder, bipolar disorder, panic disorder,
or obsessive-compulsive disorder; history of psychotic features of affective
disorder; evidence of untreated or unstable thyroid disorder; no response
to at least 2 adequate trials of antidepressants in any depressive episode;
daily use of hypericum or sertraline for at least 4 weeks within the past
6 months; current use of other psychotropic drugs, other medicines, dietary
supplements, natural remedies, or botanical preparations with psychoactive
properties; use of investigational drugs within 30 days of baseline or of
other psychotropic drugs within 21 days of baseline (within 6 weeks for fluoxetine);
allergy or hypersensitivity to study medications; positive urine drug screen;
introduction of psychotherapy within 2 months of enrollment or any ongoing
psychotherapy specifically designed to treat depression; and mental retardation
or cognitive impairment.
Patients provided written informed consent, and the institutional review
board approved the protocol at each site. Patients who remained eligible after
a 1-week placebo run-in were randomly assigned to receive 1 of 3 treatments
in a 1:1:1 ratio within permuted blocks of size 3 and 6 within site by sex
strata. Sites telephoned a 24-hour randomization service for computer-generated
treatment assignment. Drug kits were designed to be indistinguishable for
the 3 treatments at each dose level.
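The allocation scheme described above (1:1:1 assignment in permuted blocks of 3 and 6 within site-by-sex strata) can be sketched as follows. This is a minimal illustration, not the randomization service's actual program: the function names, the seed, and the assumption that block sizes 3 and 6 were chosen at random with equal probability are all hypothetical.

```python
import random

ARMS = ("hypericum", "placebo", "sertraline")


def permuted_block(rng, size):
    """Return one shuffled block holding each arm size/3 times (1:1:1)."""
    if size % len(ARMS) != 0:
        raise ValueError("block size must be a multiple of the number of arms")
    block = list(ARMS) * (size // len(ARMS))
    rng.shuffle(block)
    return block


def stratum_sequence(rng, n_slots):
    """Concatenate blocks of size 3 or 6 until n_slots assignments exist.

    Assumption: the block size is drawn at random for each block; the
    report only says that sizes 3 and 6 were used.
    """
    sequence = []
    while len(sequence) < n_slots:
        sequence.extend(permuted_block(rng, rng.choice((3, 6))))
    return sequence[:n_slots]


def build_schedule(sites, sexes, n_slots, seed=0):
    """One independent assignment sequence per (site, sex) stratum."""
    rng = random.Random(seed)
    return {
        (site, sex): stratum_sequence(rng, n_slots)
        for site in sites
        for sex in sexes
    }


if __name__ == "__main__":
    schedule = build_schedule(range(1, 13), ("F", "M"), n_slots=9)
    print(schedule[(1, "F")])
```

Blocking within strata keeps the three arms close to balanced at every site and for each sex, even if a site stops enrolling partway through a block.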
Patients were assessed weekly or biweekly until week 8. Patients who
fully or partially responded during these 8 weeks (ie, the acute phase) could
enter the continuation phase, with visits at weeks 10, 14, 18, 22, and 26.
The HAM-D, GAF, Clinical Global Impressions Scales15
for severity (CGI-S) and improvement (CGI-I) and Beck Depression Inventory
(BDI)16 were assessed at all visits. The Sheehan
Disability Scale (SDS)17 was completed at baseline
and weeks 8 and 26. As a way to evaluate blinding at weeks 8 and 26, clinicians
and patients indicated their beliefs about treatment assignment.
Drug accountability, concomitant therapies, vital signs, and self- and
physician-rated symptom reports were assessed at every visit. Blood chemistry
and hematologic tests and electrocardiography and physical examinations were
performed at screening and at weeks 8 and 26. Urine toxicology was performed
Columbia University biometric staff trained all raters in the use of
the SCID and HAM-D. For reliable scoring of the CGI, raters scored case vignettes
before the study. Throughout, audiotapes of SCID and HAM-D interviews and
of the medication-management sessions were audited by the coordinating center
for quality and adherence to the guidelines specified in the operations manual.
Hypericum and matching placebo were provided by Lichtwer Pharma (Berlin,
Germany); sertraline and matching placebo were provided by Pfizer Inc (New
York, NY). The Lichtwer extract (LI-160) was selected for its well-characterized
features and literature supporting its possible efficacy in depression.1,18,19 The extract was standardized
to between 0.12% and 0.28% hypericin, and the entire supply came from one
batch. The study was conducted under an investigational new drug application
filed by the manufacturer.
Medications were given 3 times daily. During the run-in period, patients
received placebo tablets in a single-blind fashion. At baseline, patients
were randomly assigned to receive hypericum (900 mg/d), sertraline (50 mg/d),
or placebo. Daily hypericum, sertraline, or placebo doses could be increased
to 1200 mg, 75 mg, or placebo equivalent, respectively, after weeks 3 or 4
and to 1500 mg, 100 mg, or placebo equivalent at week 6 if the CGI-S score
was 4 (moderately ill) or more at week 3, or 3 (mildly ill) or more at weeks
4 or 6. After week 8, those eligible to continue could receive maximum daily
doses of 1800 mg, 150 mg, or placebo equivalent, respectively. Medication
was dispensed in blister packets in double-dummy fashion. Doses could be held
or reduced for adverse effects. For insomnia, zolpidem (5 to 10 mg) was permitted
up to twice weekly during weeks 1 and 2 and up to 6 times total during continuation.
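The acute-phase titration rule above can be read as a small decision function. The sketch below is one interpretation of the text, not the protocol itself: the names are hypothetical, and it assumes that a week-6 increase may go to the top acute-phase level regardless of the current dose.

```python
# Acute-phase daily dose levels in mg, taken from the text.
DOSE_LEVELS = {
    "hypericum": (900, 1200, 1500),
    "sertraline": (50, 75, 100),
}


def increase_allowed(week, cgi_s):
    """CGI-S threshold for a dose increase at a given study week."""
    if week == 3:
        return cgi_s >= 4  # moderately ill or worse
    if week in (4, 6):
        return cgi_s >= 3  # mildly ill or worse
    return False


def next_dose(drug, current_mg, week, cgi_s):
    """Highest daily dose permitted after this visit (never lowers the dose)."""
    _, middle, top = DOSE_LEVELS[drug]
    if not increase_allowed(week, cgi_s):
        return current_mg
    target = top if week == 6 else middle  # assumption: week 6 unlocks the top level
    return max(current_mg, target)


if __name__ == "__main__":
    print(next_dose("hypericum", 900, week=3, cgi_s=4))
    print(next_dose("sertraline", 75, week=6, cgi_s=3))
```

Dose holds or reductions for adverse effects, which the protocol also allowed, are outside this sketch.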
The prospectively defined primary efficacy measures were the change
in the 17-item HAM-D total score from baseline to week 8 and the incidence
of full response at week 8 or early study termination. Full response was defined
as a CGI-I score of 1 (very much improved) or 2 (much improved) and a HAM-D
total score of 8 or less. Partial response was defined as a CGI-I score of
1 or 2, a decrease in the HAM-D total score from baseline of at least 50%,
and a HAM-D total score of 9 to 12. Secondary end points comprised the GAF,
CGI, BDI, and SDS scores.
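The response definitions above translate directly into a classification rule. A minimal sketch, with a hypothetical function name and argument order:

```python
def classify_response(baseline_hamd, hamd, cgi_i):
    """Classify response as 'full', 'partial', or 'none' per the stated criteria."""
    improved = cgi_i in (1, 2)  # very much or much improved
    if improved and hamd <= 8:
        return "full"
    halved = (baseline_hamd - hamd) >= 0.5 * baseline_hamd
    if improved and halved and 9 <= hamd <= 12:
        return "partial"
    return "none"


if __name__ == "__main__":
    print(classify_response(22, 7, 2))
    print(classify_response(24, 11, 2))
    print(classify_response(20, 12, 2))
```

Note that a patient with a CGI-I score of 1 or 2 and a HAM-D score of 9 to 12 is still a nonresponder under these criteria if the HAM-D score fell by less than 50% from baseline.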
After week 8, relapse was defined as a HAM-D score of 20 or more and
a CGI-S score of 4 or more at 2 consecutive visits. Serious suicidal ideation
or the development of psychosis also served as grounds for removal from the
study and prompt clinical assessment.
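The relapse criterion can be expressed as a check over consecutive continuation visits. A minimal sketch; the function name and the representation of visits as (HAM-D, CGI-S) pairs in visit order are assumptions made for illustration.

```python
def has_relapsed(visits):
    """True if two consecutive visits both show HAM-D >= 20 and CGI-S >= 4.

    `visits` is a sequence of (hamd_total, cgi_s) pairs in visit order.
    """
    meets = [hamd >= 20 and cgi_s >= 4 for hamd, cgi_s in visits]
    return any(a and b for a, b in zip(meets, meets[1:]))


if __name__ == "__main__":
    print(has_relapsed([(7, 2), (21, 4), (22, 4)]))
    print(has_relapsed([(21, 4), (10, 2), (21, 4)]))
```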
Any symptom or sign that appeared or became worse after baseline was
considered an adverse event. Adverse events were elicited and recorded by
the study physician at each visit, based on patient interview and on a 44-item
checklist completed by the patient and expanded from an earlier scale.20
Patients were deemed noncompliant if they had taken less than 80% of
the prescribed medication, according to pill counts at each follow-up visit.
The principal comparison was between the hypericum and placebo groups.
Sertraline served as an active comparator to evaluate the study's sensitivity.
Sample-size calculations were based on detecting a difference in full-response
rates at 8 weeks, assuming full-response rates of 55% for hypericum and 35%
for placebo. Accordingly, a sample size of 336 patients (112 per group) was
specified to ensure 85% power with a type I error rate of 5% (2-sided). The
sample-size calculation assumed no interactions of treatment with site or
sex, the blocking factors for randomization.
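For orientation, the sample-size figure can be approximated with the standard normal-approximation formula for comparing two proportions. The report does not say which formula was used, so the choice below is an assumption; it gives about 110 patients per group, close to but not exactly the 112 specified.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_group(p1, p2, alpha=0.05, power=0.85):
    """Per-group n for a 2-sided test of two proportions (no continuity correction)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return ceil(numerator / (p1 - p2) ** 2)


if __name__ == "__main__":
    # Assumed full-response rates from the text: 55% hypericum, 35% placebo.
    print(n_per_group(0.55, 0.35))
```

Other common variants of the formula (for example, with a continuity correction) give somewhat different numbers, which may account for the small gap.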
The primary analysis was according to assignment at randomization. However,
a systematic review of all protocol deviations in patient enrollment, as indicated
by the database, was undertaken before the study was unblinded, and patients
who did not meet the HAM-D total score of at least 20 entry criteria (n =
2) were excluded from the efficacy analyses as recommended by the scientific
advisors. These ineligibilities resulted from mistakes in summing the 17-item
Treatment differences in full-response rates were assessed with Wald χ² statistics from logistic regression, with fixed effects for treatment,
site, sex, and baseline HAM-D total score. Treatment differences in the change
in HAM-D total score from baseline to week 8 were evaluated through a random-coefficient
regression model. The longitudinal scores at baseline and weeks 1 through
8 (all available acute-phase data) were modeled as a linear function of fixed
effects for treatment, site, sex, study week (linear), and treatment by study
week, with random intercept and slope over time for each patient. Under the
assumptions of this model, tests of treatment differences for the change in
HAM-D total score from baseline to week 8 are equivalent to tests of treatment
differences in the linear trends or slopes with time.
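One way to write the random-coefficient model just described is shown below. The notation is an assumption introduced here for illustration (the report gives no formula): $y_{ij}$ is the HAM-D total score of patient $i$ at study week $t_{ij}$, and $g(i)$, $s(i)$, and $x(i)$ denote that patient's treatment, site, and sex.

```latex
\begin{aligned}
y_{ij} &= \bigl(\beta_0 + \alpha_{g(i)} + \gamma_{s(i)} + \delta_{x(i)} + b_{0i}\bigr)
        + \bigl(\beta_1 + \theta_{g(i)} + b_{1i}\bigr)\, t_{ij} + \varepsilon_{ij},
        \qquad t_{ij} \in \{0, 1, \dots, 8\}, \\[4pt]
\mathrm{E}\bigl[y \mid t = 8,\, g\bigr] - \mathrm{E}\bigl[y \mid t = 0,\, g\bigr]
       &= 8\,(\beta_1 + \theta_g), \\[4pt]
\Delta_{g} - \Delta_{g'} &= 8\,(\theta_g - \theta_{g'}).
\end{aligned}
```

Here $b_{0i}$ and $b_{1i}$ are the patient-level random intercept and slope, and $\Delta_g$ is the expected 8-week change in group $g$. Because the difference in expected change between two groups is a fixed multiple of the difference in their slopes, testing one is the same as testing the other, which is the equivalence stated above.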
These analyses on full response and change in the HAM-D total score
were specified in the final protocol as primary in assessment of acute-phase
efficacy. Secondary analyses included random-coefficient regression models
on secondary outcomes as described above for HAM-D, similar modeling on primary
and secondary outcomes but restricted to patients completing the acute phase
(completer analysis), and analysis of covariance models on primary and secondary
outcomes using the last available acute-phase measurement (last observation
carried forward). The analysis-of-covariance models included effects for treatment,
site, sex, and the respective baseline scores.
Among the 3 treatment comparisons, those of hypericum vs placebo and
sertraline vs placebo were of interest a priori, and their P values (2-sided) are presented. In addition, for efficacy measures,
the nominal significance level for the hypericum vs sertraline contrast is
noted in the text if P<.05 when the 3 treatment
groups differed overall (2 df) using a type I error
Simple tests for treatment differences included χ² and
Fisher exact tests for categorical variables, Kruskal-Wallis/Wilcoxon-Mann-Whitney
tests for ordinal and continuous measurements, and log-rank tests for time
to events. These tests were applied to baseline characteristics, protocol
deviations, adverse events, attrition, compliance, treatment beliefs, and
maintenance of response during continuation. In addition, the nonparametric
methods were used on the efficacy measures to substantiate results that rely
on distributional assumptions. The consistency of the data with the parametric
assumptions was checked for the primary analyses.
All analyses were performed with SAS version 6.12 software (SAS Institute
Inc, Cary, NC). The PROC MIXED procedure was used for the longitudinal data
In all, 428 patients entered the run-in phase, and 340 were randomized
(Figure 1). No differences were
noted between treatment groups at baseline (Table 1). In addition, with regard to severity of depression, the
numbers of patients judged to be mild, moderate, marked, and severe were 3,
261, 70, and 6, respectively, according to the CGI-S. Baseline total HAM-D
scores ranged from 18 to 33.
There were similar proportions of patients among the treatment groups
discontinued before week 8: 27% for hypericum (n = 31), 28% for placebo (n
= 32), and 29% for sertraline (n = 32). Likewise, time to early discontinuation
did not differ significantly (P = .91, log-rank test),
although 17 of the 32 dropouts in the sertraline group (53%) occurred during
the first 2 weeks vs 5 of 32 dropouts in the placebo group (16%) and 11 of
31 dropouts in the hypericum group (35%).
Of the 340 acute-phase subjects, 245 (72%) completed 8 weeks, 129 entered
the continuation phase, and 79 completed continuation. There were no treatment
differences in attrition during continuation. Nine of the 129 patients entering
continuation did not meet response criteria. These patients and the 2 who
were ineligible for the acute phase were excluded from efficacy analysis in
the continuation phase.
The mean (SD) highest daily dose prescribed was 1299 (243) mg (95% confidence
interval [CI], 1254-1344 mg) for hypericum and 75 (21) mg (95% CI, 71-79 mg)
for sertraline during the acute phase and 1382 (284) mg (95% CI, 1292-1473
mg) for hypericum and 89 (32) mg (95% CI, 80-98 mg) for sertraline during
continuation. Fewer patients in the sertraline group achieved the highest
daily dose level during the acute phase (36% compared with 54% for hypericum
and 54% for placebo; P = .005, Kruskal-Wallis test).
Similar proportions of each treatment group required dose reductions during
the acute phase because of adverse events (hypericum, 4%; placebo, 5%; and
The HAM-D total scores throughout the 8-week trial are summarized by
treatment group in Figure 2. The
random-coefficient regression analysis on the longitudinal HAM-D total scores
detected a downward linear trend with time (F(1,263) = 565.2; P<.001) and general differences in scores by site (F(12,328) = 4.56; P<.001) and sex (lower for
men; F(1,326) = 8.97; P = .003). Linear
trends with time did not differ significantly by treatment (hypericum vs placebo:
F(1,265) = 0.30 and P = .59; sertraline
vs placebo: F(1,264) = 1.83 and P = .18),
and no interactions were found between treatment and site or sex. Model estimates
for the mean (SE) change in HAM-D total score (week 8 minus baseline) were −8.68
(0.68) for hypericum, −9.20 (0.67) for placebo, and −10.53 (0.72)
for sertraline. According to the model estimates for the difference in slopes
between sertraline and placebo and the variance estimate for the random slope
coefficient, the sertraline effect size was 0.24.
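The 95% confidence intervals given in the abstract for these model estimates are consistent with a normal approximation, estimate ± z × SE. The short check below reproduces them; the function name is illustrative, and the use of a normal rather than a t quantile is an assumption.

```python
from statistics import NormalDist


def ci95(estimate, se):
    """Normal-approximation 95% CI, rounded to 2 decimals."""
    z = NormalDist().inv_cdf(0.975)
    return round(estimate - z * se, 2), round(estimate + z * se, 2)


# Model estimates (mean change in HAM-D total score, SE) from the text.
ESTIMATES = {
    "placebo": (-9.20, 0.67),
    "hypericum": (-8.68, 0.68),
    "sertraline": (-10.53, 0.72),
}

if __name__ == "__main__":
    for arm, (change, se) in ESTIMATES.items():
        print(arm, ci95(change, se))
```

The three intervals overlap substantially, in line with the nonsignificant slope comparisons reported above.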
Full response rates at acute-phase exit did not differ between placebo
and either hypericum (P = .21) or sertraline (P = .26) (Table 2).
Patients with a lower HAM-D total score at baseline had a higher rate of full
response (P = .002), which did not differ by treatment
group. No differences were noted by site or sex. No interactions were found
for treatment by site, sex, or baseline HAM-D total score. The ordinal distribution
of responses (full, partial, or none) differed significantly across treatments
(P = .04, Kruskal-Wallis test), with more partial
responders among the sertraline patients and more full responders among the
placebo patients. Findings were similar for patients completing the acute
phase, with percentages (full, partial, or no response) of 30.5%, 17.1%, and
52.4% for hypericum, 41.6%, 14.3%, and 44.0% for placebo, and 31.2%, 29.9%,
and 39.0% for sertraline, respectively.
Other acute-phase outcomes did not differ between hypericum and placebo
(Table 3). Sertraline was superior
to placebo on the CGI-I score at week 8 (F(1,254) = 6.36; P = .02) and showed a general trend toward better outcomes. In post
hoc analysis, sertraline proved superior to hypericum on the CGI-I scale (F(1,252) = 7.91; P = .01).
The numbers of patients who entered continued therapy for hypericum,
placebo, and sertraline were 38, 42, and 49, respectively; for those completing
treatment, 24, 27, and 28, respectively. Efficacy assessment during continuation
excluded the 2 patients not meeting the HAM-D entry criteria at baseline and
9 patients who began continuation without meeting the full or partial response
criteria. The HAM-D total score means (SDs) were 6.7 (3.5), 6.2 (3.0), and
6.9 (3.6) at entry to continuation and 6.6 (4.5), 5.3 (5.2), and 6.7 (4.9)
at week 26, respectively, by treatment. Only 1 patient (receiving hypericum)
relapsed during continuation.
Hypericum and sertraline were associated with more acute-phase adverse
events than placebo. Table 4 displays
those that differed significantly. Analyses of data from the continuation
phase yielded similar results. Rates of diarrhea, nausea, and sweating (sertraline);
anorgasmia (sertraline and hypericum); and frequent urination and swelling
(hypericum) all were higher than those of placebo. Forgetfulness was less
common with sertraline than with placebo. No serious adverse events occurred.
At the end of 8 weeks, the proportion of patients guessing their treatment
correctly was 55% for sertraline, 29% for hypericum, and 31% for placebo (P = .02 for differences between treatment groups). Correct
guesses for clinicians totaled 66% for sertraline, 29% for hypericum, and
36% for placebo (P = .001 for differences between
treatment groups). The mean change in HAM-D total score from baseline to
week 8 did not differ between patients in the sertraline group who
had guessed the correct treatment (−11.6; 95% CI, −13.1 to −10.1)
and those who had not (−11.9; 95% CI, −13.9 to −9.9).
As with 2 other trials,8,9
we have found no evidence for a superior effect of hypericum relative to placebo.
Neither hypericum nor sertraline could be differentiated from placebo on the
primary efficacy measures. Although the efficacy of sertraline was demonstrated
on the secondary CGI-I measure, resulting on average in much improvement,
hypericum had no efficacy on any measure. Although not designed to compare
sertraline with hypericum, the study showed superiority of sertraline on the
CGI-I. Responders who entered continuation treatment maintained their improvement
equally in each treatment group.
The overall effect size for sertraline on the HAM-D total score was
0.24, which is consistent with reported effect sizes for standard antidepressants,21,22 while on the CGI-I, it was 0.41.
These findings can also be observed in the context of 3 other sertraline studies23-25 that yielded effect
sizes of 0.31, 0.33, and 0.45 for the drug relative to placebo on the HAM-D
change from baseline to last observation in all study patients.
Adverse effects for sertraline were consistent with its profile, while
for hypericum, more frequent anorgasmia, swelling, and urination were noted
relative to placebo, although these were mild and the multiple comparisons
may have produced spurious associations.
When a new treatment cannot be distinguished from placebo, it is important
to determine whether a drug of known efficacy would have been proven effective
in that sample. Failure of established antidepressants to show such superiority
occurs in up to 35% of trials,26,27
which illustrates the difficulties plaguing randomized placebo-controlled
trials in this population. We should, therefore, consider some of the factors
that might have contributed to our results. One concern is a high placebo
response rate, but this was not unusually high in our sample and is therefore
an unlikely explanation.
Addressing specific issues relevant to sertraline, we note the following.
Although a dose-effect relationship within the therapeutic range (50-200 mg/d)
has not been demonstrated,28-30
one may wonder whether the study dose limitation (up to 100 mg/d in the 8-week
trial) was too restrictive and whether even the highest prescribed dose was
administered for an inadequate duration. The protocol dose regimen was chosen
on the basis of extensive discussion by all parties concerned in the study
design and oversight as the best compromise for ensuring effective treatment
while minimizing the incidence of dose-related adverse events. In fact, only
36% of sertraline patients had their dose maximized to 100 mg/d compared with
54% for hypericum or placebo. There were more partial responders to sertraline
(23.9%) than to hypericum (14.2%) or placebo (11.2%; P
= .03). Analyses of the sertraline patients eligible for a dose increase revealed
no association between lack of dose increase and presence of adverse events.
Thus, it appears that, in this trial, clinicians tended not to increase the
sertraline dose for patients with partial response, electing instead to allow
more time with the same dose. On the matter of dosing, if any protocol bias
existed at all, it would favor hypericum, which could be dosed to the maximum
of its permissible range, whereas the maximum permitted dose of sertraline
was only 50% of its highest recommended amount.
The study was adequately powered to detect moderate effect sizes (ie,
at least 0.40). The observed sertraline effect size was small (0.24) on the
HAM-D total score and moderate (0.41) on the CGI-I; hence, the lack of statistical
significance on the primary outcome measure.
For hypericum, 2 issues are relevant. Although the hyperforin content
of this batch was 3.1%, the formulation was not standardized to hyperforin,
which has been suggested by some as an important active ingredient.31,32 Hypericum may be most effective in
less severe major depression (eg, HAM-D scores <20), but further study
of this possibility needs to be conducted according to the standard diagnostic
From a methodological point of view, this study can be considered an
example of the importance of including inactive and active comparators in
trials testing the possible antidepressant effects of medications. In fact,
without a placebo, hypericum could easily have been considered as effective
as sertraline, as some studies have done with respect to active antidepressants.2,3 On the other hand, without sertraline
as an active comparator, the results would have been interpreted as evidence
for the lack of efficacy of hypericum, without consideration of the possibility
that a low assay sensitivity of the trial might have contributed to the finding.
An increasing number of studies have failed to show a difference between
active antidepressants and placebo.33,34
Many of the presumed factors underlying this phenomenon were carefully attended
to in this study, eg, adherence to quality control by rater training, treatment
adherence monitoring, inclusion of experienced investigators, and carefully
defined entry criteria. Despite all of this, sertraline failed to separate
from placebo on the 2 primary outcome measures.
Besides the limitations already discussed, our study tested one particular
hypericum extract, although many are marketed. Because the active ingredient
of hypericum is unclear, it is difficult to extrapolate clinical data from
one extract to other products. The extract we tested is among the best characterized,
however, and is the one for which the most efficacy data are available. Thus,
we believe that the results can be considered relevant to other hypericin-standardized
Because hypericum is widely available, it is likely to be used for milder
depression, but its use in this population cannot be supported until trials
show clear evidence of efficacy. According to available data, hypericum should
not be substituted for standard clinical care of proven efficacy, including
antidepressant medications and specific psychotherapies, for the treatment
of major depression of moderate severity.