Ioannidis JPA. Effect of the Statistical Significance of Results on the Time to Completion and Publication of Randomized Efficacy Trials. JAMA. 1998;279(4):281-286. doi:10.1001/jama.279.4.281
From the HIV Research Branch, Division of AIDS, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, Md.
Context.— Medical evidence may be biased over time if completion and publication
of randomized efficacy trials are delayed when results are not statistically significant.
Objective.— To evaluate whether the time to completion and the time to publication
of randomized phase 2 and phase 3 trials are affected by the statistical significance
of results and to describe the natural history of such trials.
Design.— Prospective cohort of randomized efficacy trials conducted by 2 trialist
groups from 1986 to 1996.
Setting.— Multicenter trial groups in human immunodeficiency virus infection sponsored
by the National Institutes of Health.
Patients.— A total of 109 efficacy trials (total enrollment, 43708 patients).
Main Outcome Measures.— Time from start of enrollment to completion of follow-up and time from
completion of follow-up to peer-reviewed publication assessed with survival analysis.
Results.— The median time from start of enrollment to publication was 5.5 years
and was substantially longer for negative trials than for results favoring
an experimental arm (6.5 vs 4.3 years, respectively; P<.001;
hazard ratio for time to publication for positive vs negative trials, 3.7;
95% confidence interval [CI], 1.8-7.7). This difference was mostly attributable
to differences in the time from completion to publication (median, 3.0 vs
1.7 years for negative vs positive trials; P<.001).
On average, trials with significant results favoring any arm completed follow-up
slightly earlier than trials with nonsignificant results (median, 2.3 vs 2.5
years; P=.045), but long-protracted trials often
had low event rates and failed to reach statistical significance, while trials
that were terminated early had significant results. Positive trials were submitted
for publication significantly more rapidly after completion than were negative
trials (median, 1.0 vs 1.6 years; P=.001) and were
published more rapidly after submission (median, 0.8 vs 1.1 years; P=.04).
Conclusion.— Among randomized efficacy trials, there is a time lag in the publication
of negative findings that occurs mostly after the completion of the trial follow-up.
SEVERAL INVESTIGATORS have raised concerns that clinical studies with
negative results may never be published and their failure to appear in the
literature may distort the picture we obtain from clinical experiments about
the optimal practice of medicine.1-5
However, ascertaining the extent of this bias is difficult. Retrieving information
about lost studies is a challenge.3-6
Prior research has been based on retrospective interviews about the fate of
research protocols located through questionnaires,6
meeting abstracts,7 or archives of institutional
boards or funding organizations.1,2,5
Prospective evaluation of the phenomenon with detailed trial registries8 gathering information on all the implementation milestones
of randomized trials (onset, completion of enrollment, completion of follow-up,
submission, and publication) has not been accomplished. Also, most prior investigations
have applied the term publication bias indiscriminately
to phase 1, 2, and 3 trials and to both randomized and nonrandomized studies.
However, the loss of information from a small, unrevealing pilot or observational
study is not comparable to the disappearance of the results of randomized
efficacy trials, which form the mainstream of evidence for medical practice.
Such loss of information could affect systematic reviews, the decisions of
funding agencies, and the outcomes of patients.
From retrospective investigations, it remains unclear whether publication
bias affects specifically phase 2 and 3 trials to a substantial extent.1,4 Moreover, since many efficacy trials
perform interim analyses during their conduct and may be prematurely interrupted
if significant results are seen, it is unknown whether the results of a study
would affect not only the time to publication after completion, but also the
time a trial takes to be completed. Until now it has not been possible to
address this last question with retrospectively constructed databases. Knowledge
of the exact natural history of efficacy trials is needed to provide an accurate
perspective on how medical evidence is obtained and whether there is a time
lag between studies with "positive" and "negative" results.
The best insight into the fate of clinical trials can be gained from
trial databases that collect prospectively information about the conduct of
all studies in a given domain. Such an approach can offer a perspective on
the natural history of clinical trials from inception through completion to
publication. In this article detailed information from a large database of
clinical trial protocols was used with data on the implementation milestones
and publication dates of registered trials. This allowed assessment of the
natural history of randomized efficacy trials in the domain of human immunodeficiency
virus (HIV) infection and its complications, a discipline of rapidly expanding
therapeutics with intense clinical research activity.
All efficacy clinical trials conducted from 1986 until 1996 by the AIDS
[acquired immunodeficiency syndrome] Clinical Trials Group (ACTG) and by the
Terry Beirn Community Programs for Clinical Research on AIDS (CPCRA) were
considered in the analysis. These trial groups sponsored by the Division of
AIDS of the National Institute of Allergy and Infectious Diseases (NIAID)
represent globally the largest networks for the conduct of clinical trials
on HIV infection and its complications. ACTG uses the resources of 30 university
sites across the United States as well as other collaborative clinical units.
The CPCRA is a community-based program encompassing more than 160 clinical
practices across the United States. All protocols and detailed dated information
on their implementation and presentation are archived by the Division of AIDS.
Supplemental information about recently analyzed trials and clarifications
on unclear or missing data were obtained from investigators and medical officers
and staff responsible for the protocols.
The analysis was limited to randomized trials addressing the efficacy
of the compared interventions (phase 2 and phase 3 trials). Observational,
nonrandomized, pharmacokinetic, and safety phase 1 and phase 1/2 studies were
excluded as well as substudies of the main protocols. Qualification for inclusion
and recording of study design factors (including target sample size, end points,
designation of phase, blinding, and data management) was based on examination
of the complete protocols archived in the Division of AIDS, which offered
a distinct advantage to avoid bias in recording as compared with interviews,
questionnaires, and data extraction from abstracts that have been used in
previous investigations of publication bias.1-7
Trials were selected regardless of whether they compared a regimen with placebo,
different regimens, or different doses of the same medication. All protocols
that enrolled any patients have been registered. The considered implementation
milestones included the dates of starting enrollment, completion of follow-up,
submission for peer-reviewed publication, and publication. For studies that
continued follow-up beyond their primary analysis and publication, follow-up
was censored at the time of the primary analysis. For 9 early studies, exact
dates for completion of follow-up were not available but could be approximated
with high accuracy from data on closure to enrollment and trial follow-up.
All data were censored for analysis on October 10, 1996.
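The handling of milestone dates described above can be sketched in a few lines. This is a minimal illustration, not the procedure actually used for the analysis: the function names and the 365.25-day year are assumptions, and only the censoring date of October 10, 1996, comes from the text.

```python
from datetime import date

# Censoring date stated in the text.
CENSOR_DATE = date(1996, 10, 10)


def years_between(start, end):
    """Elapsed time in years (assumes a 365.25-day year)."""
    return (end - start).days / 365.25


def time_to_event(start, event_date, censor_date=CENSOR_DATE):
    """Return (duration_in_years, observed) for one trial milestone.

    If the milestone (for example, publication) has not occurred by the
    censoring date, the duration runs to the censoring date and the
    observation is flagged as censored (observed=False).
    """
    if event_date is not None and event_date <= censor_date:
        return years_between(start, event_date), True
    return years_between(start, censor_date), False
```

A trial that opened on a given date and has no publication date yet contributes its follow-up to the censoring date with `observed=False`, which is the input a Kaplan-Meier analysis expects.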
In this article, a trial is called "positive" if a statistically significant
finding (denoted by P<.05) had been found in the
analysis of the data for a main efficacy end point defined in the protocol
in favor of an experimental therapy arm. Trials with nonstatistically significant
findings or favoring the control arm are called "negative." Whenever there
was no distinct control (traditional therapy) arm, a study was called positive
if it showed a statistically significant finding in favor of any arm. When
multiple major efficacy end points were available, the trial was considered
positive if any major efficacy end point reached statistical significance.
In 2 trials, significance favored different arms for survival and another
end point; these trials were classified according to the direction of the
survival results. The availability of the complete archived protocols and
trial reports minimized the chances for subjective interpretation of end points
and trial results. For further quality assurance, a random selection of 12
protocols and trial reports was evaluated by another colleague who was blinded
to the study milestones. There was agreement in determination of pertinent
end points and classification of significance between the author and the second
independent observer in all cases.
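The classification rule described above can be written as a short function. This is a hedged sketch: the `EndPoint` structure, the arm labels, and the function name are hypothetical, and the fallback used when conflicting end points do not include survival is an assumption, since the text describes only the survival-based tiebreak.

```python
from dataclasses import dataclass

ALPHA = 0.05  # significance threshold stated in the text (P<.05)


@dataclass
class EndPoint:
    name: str         # e.g. "survival" (hypothetical label)
    p_value: float
    favored_arm: str  # "experimental", "control", or an arm name (hypothetical labels)


def classify_trial(end_points, has_distinct_control=True):
    """Label a trial "positive" or "negative" from its major efficacy end points."""
    significant = [e for e in end_points if e.p_value < ALPHA]
    if not significant:
        return "negative"
    if not has_distinct_control:
        # Without a distinct control arm, significance favoring any arm counts.
        return "positive"
    arms = {e.favored_arm for e in significant}
    if len(arms) > 1:
        # Conflicting directions: follow the direction of the survival result.
        survival = [e for e in significant if e.name == "survival"]
        if survival:
            return "positive" if survival[0].favored_arm == "experimental" else "negative"
    # Assumption: otherwise, any significant end point favoring the
    # experimental arm makes the trial positive.
    return "positive" if "experimental" in arms else "negative"
```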
Time-to-event analyses were performed with the Kaplan-Meier method,
and comparisons used the log-rank test. The significance levels of the findings
and other trial characteristics were used as covariates for the risk of publication
in Cox proportional hazards regressions. Trial characteristics included the
actual sample size, the ratio of accrual compared with originally anticipated
(target) enrollment (typically based on power calculations), the trialist
group, the age of the population (adult or pediatric), the trial domain (antiretroviral
therapy vs complications of HIV), the presence or not of double blinding,
and the place where data were managed (pharmaceutical industry or other).
Both univariate and multivariate models and interactions between variables
were assessed, but only univariate regressions are reported since the results
of the multivariate regressions were similar and no significant interactions
were identified. Statistical analyses were run on SPSS software, version 6.0
(SPSS, Inc, Chicago, Ill). All reported P values
are 2 tailed.
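For readers unfamiliar with the Kaplan-Meier method, the product-limit estimate and the median time derived from it can be sketched as follows. The analyses in this article were run in SPSS; the pure-Python code below is only an illustration of the technique, and the function names are assumptions.

```python
def kaplan_meier(durations, observed):
    """Product-limit estimate of the probability of remaining event-free.

    durations: times to event or censoring (e.g., years to publication)
    observed:  True if the event occurred, False if censored
    Returns a list of (time, survival) steps, one per distinct event time.
    """
    data = sorted(zip(durations, observed))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        events = 0
        removed = 0
        # Group all observations sharing the same time.
        while i < len(data) and data[i][0] == t:
            events += 1 if data[i][1] else 0
            removed += 1
            i += 1
        if events:
            survival *= 1 - events / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


def median_time(curve):
    """First time at which the estimate drops to 0.5 or below, else None."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None
```

Censored trials leave the risk set without lowering the curve, which is why unpublished trials still contribute information up to the censoring date.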
A total of 109 randomized efficacy trials with total enrollment of 43708
patients qualified for the analysis. Of these, 8 were closed, having failed
to accrue more than 20 patients, and are excluded from all subsequent analyses.
Typically the conduct of these 8 trials became unfeasible or futile, and no
publishable evidence materialized. Of the remaining 101 trials, 25 were still
open to accrual, 10 were closed to accrual and continuing follow-up, and 66
trials with 30715 patients had been completed; 36 of the 66 completed trials
had been published (18 in the New England Journal of Medicine, 6 in the Annals of Internal Medicine, 5
in the Journal of Infectious Diseases, and 1 each
in 7 other journals) at the time data were censored for analysis. Of the 30
completed but unpublished trials, 9 had been submitted at least once for publication,
and 8 more had been completed less than 1 year earlier. Characteristics of the
trials are shown in Table 1.
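The counts in this paragraph can be reconciled with simple arithmetic. The short check below uses only numbers reported above; the variable names are illustrative.

```python
# Trial flow as reported in the text.
registered = 109
failed_accrual = 8  # closed after accruing no more than 20 patients
analyzed = registered - failed_accrual

status = {"open to accrual": 25, "closed, in follow-up": 10, "completed": 66}
assert sum(status.values()) == analyzed  # 25 + 10 + 66 = 101

published_by_journal = {
    "New England Journal of Medicine": 18,
    "Annals of Internal Medicine": 6,
    "Journal of Infectious Diseases": 5,
    "7 other journals (1 each)": 7,
}
published = sum(published_by_journal.values())
unpublished_completed = status["completed"] - published
```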
In a Kaplan-Meier analysis, the median time to publication among the
101 analyzed trials was estimated to be 5.5 years from the time a trial started
enrollment (interquartile range, 3.9-7.0 years). On average, the time it took
to conduct a study was of similar magnitude as the time it took to publish
the results after its completion. The median time from starting until completing
follow-up was 2.6 years (interquartile range, 2.0-3.8 years). Among the 66
completed trials, the median time from completion of follow-up to publication
was 2.4 years (interquartile range, 1.6-3.8 years).
Overall, positive trials were published significantly earlier than trials
with negative findings (Figure 1, A; log-rank P<.001). The median time from starting
enrollment to publication was 4.3 years for positive trials vs 6.5 years for
negative trials (mean, 4.2 vs 6.4 years). As shown in Figure 1, B, this time lag was largely attributable to differences
between positive and negative trials in the time from completion of follow-up
to publication (median, 1.7 vs 3.0 years; mean, 1.8 vs 3.6 years; log-rank P<.001).
Conversely, as shown in Figure 1, C, positive and negative trials differed little in the time they took to complete
their follow-up (log-rank P=.17). However, when all
trials with statistically significant results were considered regardless of
whether the experimental or control arm was favored, these trials completed
follow-up slightly faster than trials with nonsignificant results (Figure 1, D; log-rank P=.045). The absolute difference was not large (median, 2.3 vs 2.5
years; mean, 2.3 vs 2.8 years). To avoid the possible bias that trials still
continuing follow-up at the censoring date may be more likely to be negative,
a separate analysis included only the 50 trials that had started before June
30, 1992, and had all been completed by the time of analysis. Trials with
significant results were completed slightly earlier (median, 2.3 vs 2.7 years;
mean, 2.4 vs 3.0 years; log-rank P=.02).
The time-to-completion difference was probably important mostly for
some early interrupted trials and some trials that had protracted enrollment
and follow-up. Ten trials completed follow-up within less than 18 months from
starting: 4 ran their prespecified course within this period of time; 1 was
interrupted prematurely because of the surfacing of the significant results
of a similar trial; and 5 trials were stopped early because of significant
differences in either survival or clinical outcomes (n=4) or both efficacy
and toxicity (n=1). On the other end, 12 trials had taken more than 4 years
to complete follow-up: 2 were still trying to accrue patients, and another
9 were protracted because of lower than anticipated event rates in interim
analyses despite full accrual; only 1 trial had the anticipated event rate.
Of the 8 trials not published at 6 years after starting enrollment,
5 had a prolonged enrollment and follow-up period (3.8-6.5 years) because
of lower than anticipated event rates. Seven of the 8 trials had negative
results. In contrast, the 5 trials published within less than 3 years from
starting enrollment were all interrupted prematurely on the basis of early
differences in efficacy between the arms in interim analyses (2 favoring the
defined experimental arm, 2 favoring 1 of the arms, 1 favoring the defined
control arm) and were all subsequently published in prestigious journals (New England Journal of Medicine, n=4; Annals of Internal Medicine, n=1).
Table 2 shows that significance
of results was the only major determinant of the time from starting enrollment
to publication for a clinical trial. The rate of publication was 3.7 (95%
confidence interval [CI], 1.8-7.7) times higher for positive trials compared
with negative ones, and the difference was explained again mostly by differences
in the rapidity of publication after trials had been completed. The magnitude
of the effect was unchanged in multivariate analyses adjusting for other trial
characteristics (odds ratio, 4.9 [95% CI, 2.1-11.0]). Interestingly, large
trials with more than 1000 patients took as long to be published as smaller
trials, if not less. Large trials probably took longer to
complete than smaller trials (P=.12), but their time
to publication after completion was significantly shorter (P=.02). Using study accrual as a continuous variable, the rate at which
a trial was completed decreased 1.8-fold (95% CI, 1.0-3.3) per 1000 patients,
but the rate at which a trial was subsequently published after completion
increased 2.5-fold (95% CI, 1.3-4.5) per each 1000 patients. Another interesting
feature is that trials with data management performed by the pharmaceutical
industry were of shorter duration (P<.001), but
this did not expedite their overall time to appearance in the peer-reviewed
literature (P=.33). There was no significant correlation
among the presence of statistical significance, study accrual, and whether
the data were managed by the industry or not (all correlation coefficients
<0.2 in absolute value), but it should be remembered that trials were still
monitored by NIH.
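As a consistency check on the reported hazard ratio, the point estimate and the standard error on the log scale can be recovered from the confidence limits. The sketch below assumes a Wald-type interval that is symmetric on the log scale, which the article does not state explicitly; the function name is illustrative.

```python
import math


def log_scale_summary(lower, upper, z=1.96):
    """Recover the point estimate and log-scale standard error from a 95% CI.

    Assumes the interval is exp(log_estimate +/- z * se), so the estimate is
    the geometric mean of the two limits.
    """
    log_se = (math.log(upper) - math.log(lower)) / (2 * z)
    point = math.exp((math.log(lower) + math.log(upper)) / 2)
    return point, log_se


# Reported 95% CI for positive vs negative trials: 1.8-7.7.
point, log_se = log_scale_summary(1.8, 7.7)
```

Under that assumption, the geometric mean of 1.8 and 7.7 rounds to the reported estimate of 3.7.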
Underpowered trials that accrued less than half of their required prespecified
target sample size (which was typically based on a priori power calculations)
were published as fast as trials that had reached closer to their target sample
size, although a moderate difference may have been missed in this analysis
(odds ratio, 1.2 [95% CI, 0.4-3.5]). Among the 43 completed trials that had
enrolled at least 90% of their prespecified target sample size, positive trials
were still published substantially more rapidly than negative ones (log-rank P<.001).
Among the 66 completed trials, 45 had been submitted for publication
by the time data were censored. The median time to first submission was 1.4
years after completion (interquartile range, 0.7-2.3 years), and the median
time to publication was 0.8 year after submission (interquartile range, 0.6-1.4
years). Positive trials were submitted significantly more rapidly compared
with trials with negative results (mean, 1.0 vs 2.4 years; median, 1.0 vs
1.6 years; log-rank P=.001; Figure 2, A). A similar time lag was observed when trials with significant
results favoring any arm were compared with trials with nonsignificant results
(mean, 1.3 vs 2.4 years; median, 1.0 vs 1.6 years; log-rank P=.008). Data were managed by the industry in 2 of the 3 negative
studies that had not yet been submitted despite a lapse of more than 3.5 years
after their completion. After submission, positive trials were also published
more rapidly than negative trials, but the time lag was relatively smaller
(for both mean and median, 0.8 vs 1.1 years; log-rank P=.04; Figure 2, B).
Of the 45 submitted trials, 17 were rejected by at least 1 journal.
At least 4 negative trials with over 300 patients each were rejected 2 or
3 times, while no positive trial was rejected multiple times. There was a
trend that positive results might increase the odds of acceptance on the first
submission (odds ratio, 1.6 [95% CI, 0.5-5.6]), but this was far from being
formally significant (P=.54 by the Fisher exact test).
Among the 36 published trials, there was a moderately strong correlation
between the time from completion to submission and the time from submission
to publication (r=0.50; P=.002). These 2 time intervals were not significantly different across the
published trials (mean difference, 0.2 year; P=.09
by t test).
This analysis shows that, even within multicenter trial groups of high
efficiency, randomized efficacy trials are published more rapidly when results
reach traditional levels of statistical significance. Negative studies suffer
a substantial time lag. With some exceptions, most of this lag is generated
after a trial has been completed. On average, the time it takes to publish
the results of efficacy trials, positive or negative, is of the same magnitude
as the time it takes to actually conduct them.
This analysis extends the observations of earlier retrospective investigations
of publication bias that claimed that investigators were not interested in
the publication of small, negative trials.1,3,4,6
Earlier investigations have not agreed on the extent of publication bias for
randomized trials,1,2 and the
distinction between efficacy trials and early phase 1 and 1/2 trials could
not be made clearly with retrospective approaches. For efficacy trials, publication lag may be a more exact term than publication
bias, and this is most accurately described with a survival analysis approach.
The effect of this publication lag can still be important, however. Trial
results may be outdated when published belatedly. Rapid advances in technology
and changes in the patient populations9 and
clinical practice decrease the importance of the conveyed information. This
is particularly true in rapidly evolving fields such as HIV therapeutics,
where even significant findings become rapidly outdated. Publication lag drastically
reduces the value of the provided evidence and introduces bias over time since
the publication of positive results tends to antedate the surfacing of negative results.
Besides proper peer-reviewed publication, information is disseminated
by meetings, preliminary non–peer-reviewed presentations, and pharmaceutical
advertisement. The extent of bias in non–peer-reviewed mechanisms of
disseminating information is unknown, but unavoidably advertisement is fed
preferentially by positive results,10 and prior
investigators have suggested that meetings favor positive results.11 Beyond doubt, medical advances should draw prompt,
strong attention. However, there is a concern that, in the rapidly changing
face of medicine, positive trials may one-sidedly dominate our information
system, as their negative counterparts take longer to appear, typically with
less or no advertising glare around them. There are several examples where,
on the same topic, positive trials caused early excitement, followed later
by disappointing or less promising counterparts. Typical examples in HIV disease
(Table 3) include the use of early
zidovudine monotherapy in asymptomatic patients,12-14
acyclovir,15 ditiocarb (Imuthiol),16,17 and oral ganciclovir prophylaxis,18 where positive and negative trials started at about
the same time, but negative studies appeared later14,17
or are still unpublished. Thus, a wave of enthusiasm is sometimes followed
by a wave of disappointment or skepticism. Early systematic reviews of the
accumulating evidence may give misleading results. An investigation in HIV-related
trials, in particular, has shown that meta-analyses including only the published
evidence would have found sizable treatment benefits for several controversial
or even abandoned treatments,19 while in the
case of zidovudine, a meta-analysis of the early published, short-term trials
would suggest markedly more favorable results compared with the pooled treatment
effect of the trials with longer follow-up that appeared later.20
Many phase 3 trials use sequential preplanned interim analyses to ensure
that early differences between arms are not missed,21,22
and thus, positive trials may be interrupted prematurely while negative trials
may be further protracted. This analysis shows that, with some exceptions,
this approach does not have a major direct impact on the timing of most clinical
trials. Exceptions do exist, mostly for trials at the 2 extremes: all the
efficacy trials published within less than 3 years from opening to enrollment
had been prematurely interrupted on the basis of interim analyses, while prolonged
enrollment and follow-up, often because of slowly accrued events, was common
in trials languishing for more than 6 years.
Publication delays for negative trials occur largely after the trial
completion. Delays can be substantial even in multicenter trial groups with
strict publication policies and high efficiency. As part of the ACTG and CPCRA
standards of practice, strong incentives to publication are offered, including
credit in the competitive renewal of funding for the site of the principal
investigator. Similarly, disincentives to negligent investigators include
potential dismissal from authorship and exclusion from chairmanship of future
protocols. It is possible that publication lag may be more pronounced for
trials conducted by groups not dependent on federal funding and for trials
where the sponsor has a financial interest in the outcome.23,24
Future analyses should try to address prospectively whether the natural history
of efficacy trials is different under these less ideal circumstances.
The observed average delay for negative findings after submission to
peer review seemed small in this analysis, but it should be acknowledged that
it could be an underestimate. Among published trials, the time to submit correlated
fairly strongly with the time to be published. It is unknown whether negative
trials whose submission had been deferred by the censoring date in this analysis
might similarly have faced delayed publication after an eventual submission.
Also, multiple rejections seemed to cluster among negative trials.
Three more points warrant discussion. First, this analysis shows that,
under the current circumstances, evidence from large trials becomes available
in the peer-reviewed literature as fast as evidence from smaller trials. Large
trials take longer to complete, but then they surface much more rapidly. The
relative merits of small vs large trials have often been debated.25 With limited resources and rapid changes in medicine,
large trials with prolonged follow-up are a challenge to design and conduct
in many medical disciplines, including HIV disease. However, if small trials
are to offer an advantage over larger trials, their results should be submitted
to peer review and published promptly. Otherwise, a few large trials may be
preferable to small trials presented in non–peer-reviewed sources and
potentially affected by publication lag and bias.
Second, publication lag and bias may increase under circumstances where
positive results accumulate rapidly, leaving even less interest for studies
with inconclusive or less spectacular findings. Currently, with a geometric
increase in the available therapeutic options in many medical disciplines,
including HIV infection, it is possible that the light of publicity may fall
even earlier and more heavily on the most positive results, probably even
for trials considering the same or comparable regimens. The extent of publication
lag should be addressed with similar methods in other domains besides HIV
infection to examine its extent in less rapidly evolving specialties.
Third, publication lag may affect evidence-based medicine and systematic
reviews and may lead to spuriously larger treatment effects in early meta-analyses
of the available evidence. Meta-analysts should be aware of ongoing studies
that may influence their conclusions. Late-appearing trials are more likely
to show nonsignificant results. Lack of significance could be attributable
either to smaller treatment effect estimates or to lower event rates resulting
in wider CIs. In this registry, delay in completion and publication because
of low event rates and slow accrual of events was a common occurrence. Differences
between early- and late-appearing trials could reflect heterogeneity in the
trial protocols and the disease risk of the studied patient populations.26 Cumulative meta-analysis may be used to evaluate
systematically the extent of the impact of late negative trials,27
and reasons for discrepancies between early meta-analyses and late-appearing
trials should be studied systematically.28
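The idea of cumulative meta-analysis can be illustrated with a short sketch that re-pools the evidence each time a trial appears. A fixed-effect, inverse-variance model is assumed here for simplicity and is not necessarily the method of the cited work; the function name and the input format are hypothetical, and no trial data from this registry are used.

```python
import math


def cumulative_meta_analysis(trials):
    """Pool log effect estimates one trial at a time, in publication order.

    trials: list of (label, log_effect, standard_error) tuples
    Returns a list of (label, pooled_log_effect, pooled_standard_error),
    one entry per step, using fixed-effect inverse-variance weights.
    """
    results = []
    weight_sum = 0.0
    weighted_effect_sum = 0.0
    for label, log_effect, se in trials:
        weight = 1.0 / se ** 2
        weight_sum += weight
        weighted_effect_sum += weight * log_effect
        pooled = weighted_effect_sum / weight_sum
        pooled_se = math.sqrt(1.0 / weight_sum)
        results.append((label, pooled, pooled_se))
    return results
```

If an early trial shows a large effect and a later, equally precise trial shows none, the pooled estimate after the second step is halved, which is the pattern that publication lag would produce in early meta-analyses.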
Enthusiasm about the results of clinical research should not be based
on its P value.29
Clinical trials and meta-analyses provide a continuum of evidence and nondefinitive
trials may provide as important information as trials with high levels of
statistical significance when seen in the context of other pieces of evidence.
It is still very questionable what really optimizes clinical care, but the
results of properly conducted trials are definitely a cornerstone in the process.
Equipoise in the presentation of all results is important for an objective
evaluation of the accumulated evidence.