July 13, 1994

Statistical Power, Sample Size, and Their Reporting in Randomized Controlled Trials

Author Affiliations

From the Clinical Epidemiology Unit, Loeb Medical Research Institute (Mr Moher and Dr Wells), and the Faculties of Medicine (Mr Moher and Dr Wells) and Health Sciences (Dr Dulberg), University of Ottawa, Ottawa, Ontario.

JAMA. 1994;272(2):122-124. doi:10.1001/jama.1994.03520020048013

Objective.  —To describe the pattern over time in the level of statistical power and the reporting of sample size calculations in published randomized controlled trials (RCTs) with negative results.

Design.  —Our study was a descriptive survey. Power to detect 25% and 50% relative differences was calculated for the subset of trials with negative results in which a simple two-group parallel design was used. Criteria were developed both to classify trial results as positive or negative and to identify the primary outcomes. Power calculations were based on results from the primary outcomes reported in the trials.
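As an illustration of the kind of calculation described above, the following is a minimal sketch of power for a two-group parallel trial with a dichotomous primary outcome. The abstract does not state which formula the authors used, so the normal-approximation test for two proportions shown here is an assumption, as are the function names, the equal group sizes, the two-sided alpha of 0.05, and the control event rate of 50% in the usage example.

```python
from math import ceil, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and SD 1


def _standard_errors(p_control, relative_difference):
    """Return (absolute difference, null SE term, alternative SE term).

    The treated-group event rate is assumed to be the control rate
    reduced by the given relative difference (e.g. 0.25 or 0.50).
    relative_difference must be greater than 0.
    """
    p_treat = p_control * (1.0 - relative_difference)
    diff = abs(p_control - p_treat)
    p_bar = (p_control + p_treat) / 2.0
    se_null = sqrt(2.0 * p_bar * (1.0 - p_bar))
    se_alt = sqrt(p_control * (1.0 - p_control) + p_treat * (1.0 - p_treat))
    return diff, se_null, se_alt


def power_two_proportions(p_control, relative_difference, n_per_group, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two proportions."""
    diff, se_null, se_alt = _standard_errors(p_control, relative_difference)
    z_alpha = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    z = (diff * sqrt(n_per_group) - z_alpha * se_null) / se_alt
    return _STD_NORMAL.cdf(z)


def n_per_group_for_power(p_control, relative_difference, power=0.80, alpha=0.05):
    """Approximate per-group sample size needed to reach the target power."""
    diff, se_null, se_alt = _standard_errors(p_control, relative_difference)
    z_alpha = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    z_beta = _STD_NORMAL.inv_cdf(power)
    return ceil(((z_alpha * se_null + z_beta * se_alt) / diff) ** 2)


if __name__ == "__main__":
    # Hypothetical trial: 50% control event rate, 40 patients per group.
    for rel in (0.25, 0.50):
        pw = power_two_proportions(0.50, rel, n_per_group=40)
        n80 = n_per_group_for_power(0.50, rel)
        print(f"relative difference {rel:.0%}: power={pw:.2f}, n per group for 80%={n80}")
```

Under these assumptions, a trial of a given size has less power to detect a 25% relative difference than a 50% one, which matches the ordering of the two thresholds examined in the survey.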

Population.  —We reviewed all 383 RCTs published in JAMA, Lancet, and the New England Journal of Medicine in 1975, 1980, 1985, and 1990.

Results.  —Twenty-seven percent of the 383 RCTs (n=102) were classified as having negative results. The number of published RCTs more than doubled from 1975 to 1990, with the proportion of trials with negative results remaining fairly stable. Of the simple two-group parallel design trials having negative results with dichotomous or continuous primary outcomes (n=70), only 16% and 36% had sufficient statistical power (80%) to detect a 25% or 50% relative difference, respectively. These percentages did not consistently increase over time. Overall, only 32% of the trials with negative results reported sample size calculations, but the percentage doing so has improved over time from 0% in 1975 to 43% in 1990. Only 20 of the 102 reports made any statement related to the clinical significance of the observed differences.

Conclusions.  —Most trials with negative results did not have large enough sample sizes to detect a 25% or a 50% relative difference. This result has not changed over time. Few trials discussed whether the observed differences were clinically important. There are important reasons to change this practice. The reporting of statistical power and sample size also needs to be improved. (JAMA. 1994;272:122-124)