Original Investigation
February 13, 2017

Evaluation of Evidence of Statistical Support and Corroboration of Subgroup Claims in Randomized Clinical Trials

Author Affiliations
  • 1Department of Health Research and Policy, Stanford University School of Medicine, Stanford, California
  • 2Meta-Research Innovation Center at Stanford (METRICS), Stanford University School of Medicine, Stanford, California
  • 3Department of Medicine, Stanford University School of Medicine, Stanford, California
  • 4Stanford Prevention Research Center, Department of Medicine, Stanford University School of Medicine, Stanford, California
  • 5Department of Public Health, Erasmus MC, Rotterdam, the Netherlands
  • 6Department of Statistics, Stanford University School of Humanities and Sciences, Stanford, California
JAMA Intern Med. Published online February 13, 2017. doi:10.1001/jamainternmed.2016.9125
Key Points

Question  How often are subgroup claims reported in the abstracts of randomized clinical trials supported by a statistically significant interaction test result and corroborated by subsequent randomized clinical trials and meta-analyses?

Findings  In this meta-epidemiological survey, a minority of subgroup claims (46 of 117) in the abstracts of randomized clinical trials were supported by their own data. Only 5 of these 46 subgroup findings had at least 1 subsequent corroboration attempt, and none of the corroboration attempts had a statistically significant P value from an interaction test.

Meaning  Claims of subgroup differences in randomized clinical trials are typically spurious or chance findings.

Abstract

Importance  Many published randomized clinical trials (RCTs) make claims for subgroup differences.

Objective  To evaluate how often subgroup claims reported in the abstracts of RCTs are actually supported by statistical evidence (P < .05 from an interaction test) and corroborated by subsequent RCTs and meta-analyses.

Data Sources  This meta-epidemiological survey examines data sets of trials with at least 1 subgroup claim, including Subgroup Analysis of Trials Is Rarely Easy (SATIRE) articles and Discontinuation of Randomized Trials (DISCO) articles. We used Scopus (updated July 2016) to search for English-language articles citing each of the eligible index articles with at least 1 subgroup finding in the abstract.

Study Selection  Articles with a subgroup claim in the abstract with or without evidence of statistical heterogeneity (P < .05 from an interaction test) in the text and articles attempting to corroborate the subgroup findings.

Data Extraction and Synthesis  Study characteristics of trials with at least 1 subgroup claim in the abstract were recorded. Two reviewers extracted the data necessary to calculate subgroup-level effect sizes, standard errors, and the P values for interaction. For individual RCTs and meta-analyses that attempted to corroborate the subgroup findings from the index articles, trial characteristics were extracted. The Cochran Q test was used to reevaluate heterogeneity with the data from all available trials.
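
As an illustration of the calculations described above, the following is a minimal sketch, not the authors' actual code, of how a P value for interaction and a Cochran Q statistic might be computed from subgroup-level effect sizes and standard errors. It assumes effect sizes on an additive scale (eg, log hazard ratios), independent subgroups, and inverse-variance weights; the function names and the numeric inputs are hypothetical.

```python
import math


def interaction_test(est_a, se_a, est_b, se_b):
    """Two-sided z test for the difference between two subgroup effects.

    Assumes the two estimates are independent and approximately normal.
    """
    diff = est_a - est_b
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    z = diff / se_diff
    # Two-sided P value from the standard normal distribution
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


def cochran_q(estimates, std_errors):
    """Cochran Q statistic with inverse-variance weights.

    Returns the statistic and its degrees of freedom (k - 1).
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, estimates))
    return q, len(estimates) - 1


# Hypothetical log hazard ratios and standard errors for two subgroups
z, p = interaction_test(-0.40, 0.15, 0.05, 0.18)
q, df = cochran_q([-0.40, 0.05], [0.15, 0.18])
print(f"z = {z:.3f}, P for interaction = {p:.4f}, Q = {q:.3f}, df = {df}")
```

With exactly 2 subgroups, Q equals the square of the z statistic, so the two approaches give the same P value; with more than 2 subgroups or trials, Q is compared against a chi-square distribution with k - 1 degrees of freedom.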

Main Outcomes and Measures  The number of subgroup claims in the abstracts of RCTs, the number of subgroup claims in the abstracts of RCTs with statistical support (subgroup findings), and the number of subgroup findings corroborated by subsequent RCTs and meta-analyses.

Results  Sixty-four eligible RCTs made a total of 117 subgroup claims in their abstracts. Of these 117 claims, only 46 (39.3%) in 33 articles had evidence of statistically significant heterogeneity from a test for interaction. In addition, out of these 46 subgroup findings, only 16 (34.8%) ensured balance between randomization groups within the subgroups (eg, through stratified randomization), 13 (28.3%) entailed a prespecified subgroup analysis, and 1 (2.2%) was adjusted for multiple testing. Only 5 (10.9%) of the 46 subgroup findings had at least 1 subsequent pure corroboration attempt by a meta-analysis or an RCT. In all 5 cases, the corroboration attempts found no evidence of a statistically significant subgroup effect. In addition, all effect sizes from meta-analyses were attenuated toward the null.

Conclusions and Relevance  A minority of subgroup claims made in the abstracts of RCTs are supported by their own data (ie, a significant interaction effect). For those that have statistical support (P < .05 from an interaction test), most fail to meet other best practices for subgroup tests, including prespecification, stratified randomization, and adjustment for multiple testing. Attempts to corroborate statistically significant subgroup differences are rare; when done, the initially observed subgroup differences are not reproduced.
