Table 1.  Characteristics of Included Clinical Trials or End Points
Table 2.  Analysis of Trial Characteristics and Reporting a P Value Less Than .005
Research Letter
November 6, 2018

Evaluation of Lowering the P Value Threshold for Statistical Significance From .05 to .005 in Previously Published Randomized Clinical Trials in Major Medical Journals

Author Affiliations
  • 1Oklahoma State University Center for Health Sciences, Tulsa
JAMA. 2018;320(17):1813-1815. doi:10.1001/jama.2018.12288

Lowering the threshold for statistical significance in medical research from a P value of .05 to .005 was recently proposed to reduce misinterpretation of study results.1,2 P values less than .05 but greater than .005 would be reclassified as “suggestive.” What effect this proposal would have on the medical literature is unclear. We evaluated primary end points in randomized clinical trials (RCTs) published in 3 major general medical journals with high impact factors to determine how the new threshold could affect the interpretation of previously published RCTs.

Methods

We searched PubMed from January 1, 2017, to December 31, 2017, for phase 3 RCTs published in JAMA, Lancet, and New England Journal of Medicine (NEJM). We excluded single-group trials, pooled analyses, RCTs without P values, and RCTs that used Bayesian or noninferiority analyses. Two authors (C. W., J. S.) screened all trials.

We extracted data for primary end points because RCTs are most often powered for these end points. The following data were extracted from each trial: P values for primary end points (excluding subgroups), study title, journal name, funding source, sample size, type of intervention, whether the end point was mortality, whether the trial was multicentered, and whether the trial was multinational. Data were extracted blinded and in duplicate. Discrepancies were resolved by consensus.

We first determined the proportion of end points that would maintain statistical significance with a threshold of P less than .005 and that would be reclassified as suggestive (ie, P values >.005 but <.05). Second, we investigated trial characteristics associated with reporting at least 1 primary end point with a P value less than .005 using a logistic regression model adjusting for all extracted trial characteristics. We used Google Forms for data extraction and STATA version 13.1 (StataCorp) for the data analysis.
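The reclassification rule described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' analysis code (the analysis was run in STATA); the function name `classify_p_value` and the category labels are hypothetical. The text defines suggestive as P values >.005 but <.05, so the handling of a P value exactly equal to .005 is an assumption here.

```python
def classify_p_value(p, old_threshold=0.05, new_threshold=0.005):
    """Classify a P value under the proposed .005 threshold.

    Hypothetical helper for illustration. Assumption: a P value exactly
    equal to .005 is treated as suggestive, because only values less
    than .005 are described as maintaining statistical significance.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("a P value must lie between 0 and 1")
    if p < new_threshold:
        return "significant"      # maintains significance at P < .005
    if p < old_threshold:
        return "suggestive"       # significant at .05 but not at .005
    return "not significant"      # not significant under either threshold


# Example with made-up P values (not data from the included trials)
for p in (0.001, 0.02, 0.2):
    print(p, classify_p_value(p))
```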

Results

Of 290 articles retrieved, 203 were included. The 87 excluded were mostly phase 1 or 2 trials (n = 26), noninferiority or Bayesian analyses (n = 26), or pooled analyses (n = 11) or did not report P values (n = 10). Characteristics of included RCTs are outlined in Table 1.

We identified 272 primary end points from 203 trials: 174 end points had a P value less than .05 and 98 had a P value greater than .05. Overall, 70.7% (123 of 174) of statistically significant primary end points were less than .005, whereas 29.3% (51 of 174) were between .005 and .05 and would be reclassified as suggestive. Of these 272 total P values, 53.5% (76 of 142) in NEJM, 47.7% (21 of 44) in Lancet, and 30.2% (26 of 86) in JAMA were less than .005.
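The percentages in this paragraph follow directly from the reported counts. The short Python check below recomputes them; the variable names and the `percent` helper are hypothetical, and only the counts come from the text.

```python
def percent(numerator, denominator):
    """Return a percentage rounded to 1 decimal place."""
    return round(100 * numerator / denominator, 1)


# Counts as reported in the Results
total_end_points = 272
significant_at_05 = 174   # P < .05
not_significant = 98      # P > .05
below_005 = 123           # P < .005
suggestive = 51           # .005 to .05

# End points with P < .005 and total end points, by journal
by_journal = {"NEJM": (76, 142), "Lancet": (21, 44), "JAMA": (26, 86)}

# Internal consistency of the reported counts
assert significant_at_05 + not_significant == total_end_points
assert below_005 + suggestive == significant_at_05
assert sum(hits for hits, _ in by_journal.values()) == below_005
assert sum(total for _, total in by_journal.values()) == total_end_points

print("maintain significance:", percent(below_005, significant_at_05))
print("reclassified as suggestive:", percent(suggestive, significant_at_05))
for journal, (hits, total) in by_journal.items():
    print(journal, percent(hits, total))
```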

We next analyzed the 203 trials to determine which trial characteristics were associated with reporting at least 1 P value less than .005. Before adjusting for covariates, industry funding, drug and “other” (eg, nonpharmacological) interventions, and trials published in NEJM and Lancet were associated with primary end points that met the new threshold for significance of P less than .005. Sample size, multicenter trials, multinational trials, and mortality end points were not related to maintaining statistical significance. After adjusting for covariates, only trials with industry funding (n = 86) were more likely to report primary end points that would maintain statistical significance (59 of 86 articles [68.6%] with industry funding vs 38 of 115 [33.0%] without industry funding; adjusted odds ratio, 7.87; 95% CI, 3.14-19.71) (Table 2).
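To illustrate how the adjusted estimate differs from a simple comparison, the sketch below computes a crude (unadjusted) odds ratio with a Wald 95% CI from the 2 reported proportions. This is not the authors' logistic regression model, which adjusted for all extracted trial characteristics; the function name `crude_odds_ratio` is hypothetical, and the Wald interval is a swapped-in textbook method chosen for simplicity.

```python
import math


def crude_odds_ratio(events_a, total_a, events_b, total_b, z=1.959964):
    """Unadjusted odds ratio (group A vs group B) with a Wald 95% CI.

    Hypothetical helper for illustration; it ignores covariates, so it
    will not reproduce an adjusted odds ratio from a regression model.
    """
    a, b = events_a, total_a - events_a   # group A: with / without event
    c, d = events_b, total_b - events_b   # group B: with / without event
    if min(a, b, c, d) <= 0:
        raise ValueError("all 4 cells of the 2x2 table must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of the log odds ratio (Wald method)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - z * se_log_or)
    upper = math.exp(log_or + z * se_log_or)
    return odds_ratio, lower, upper


# Counts from the text: 59 of 86 trials with industry funding vs
# 38 of 115 without reported at least 1 primary end point with P < .005
or_, lower, upper = crude_odds_ratio(59, 86, 38, 115)
print(f"crude OR = {or_:.2f} (95% CI, {lower:.2f}-{upper:.2f})")
```

The crude estimate from these counts is roughly 4.4, lower than the adjusted odds ratio of 7.87 reported above, which reflects the adjustment for the other trial characteristics in the logistic regression model.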

Discussion

Of statistically significant primary end points in RCTs published in 2017 in 3 major general medical journals with high impact factors, 70.7% would maintain their statistical significance with a P value threshold of less than .005. A .005 threshold for significance may address the shortcomings of P values, such as spurious false-positive results,3 P-hacking (when researchers analyze data multiple ways until a significant effect is found),4 and underpowered RCTs.5 Furthermore, a .005 threshold may encourage a reliance on effect sizes rather than P values. A comparison between interventional and observational studies is warranted to evaluate the study type most affected by the proposed significance threshold change.

This study included only 3 general medical journals with high impact factors over a 1-year period; thus, the results may not be generalizable.

Section Editor: Jody W. Zylke, MD, Deputy Editor.
Article Information

Accepted for Publication: July 31, 2018.

Corresponding Author: Cole Wayant, BS, Oklahoma State University Center for Health Sciences, 1111 W 17th St, Tulsa, OK 74107 (cole.wayant@okstate.edu).

Author Contributions: Mr Wayant had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.

Concept and design: All authors.

Acquisition, analysis, or interpretation of data: All authors.

Drafting of the manuscript: All authors.

Critical revision of the manuscript for important intellectual content: All authors.

Statistical analysis: All authors.

Administrative, technical, or material support: Wayant.

Supervision: Wayant, Vassar.

Conflict of Interest Disclosures: All authors have completed and submitted the ICMJE Form for Disclosure of Potential Conflicts of Interest and none were reported.

Additional Contributions: We thank Denna Wheeler, PhD, from the Oklahoma State University Center for Health Sciences, who was not financially compensated, for assistance with statistical analysis.

References
1. Ioannidis JPA. The proposal to lower P value thresholds to .005. JAMA. 2018;319(14):1429-1430. doi:10.1001/jama.2018.1536
2. Benjamin DJ, Berger JO, Johannesson M, et al. Redefine statistical significance. Nat Hum Behav. 2017;2(1):6-10. doi:10.1038/s41562-017-0189-z
3. Ioannidis JPA. Why most published research findings are false. PLoS Med. 2005;2(8):e124. doi:10.1371/journal.pmed.0020124
4. Head ML, Holman L, Lanfear R, Kahn AT, Jennions MD. The extent and consequences of P-hacking in science. PLoS Biol. 2015;13(3):e1002106. doi:10.1371/journal.pbio.1002106
5. Halpern SD, Karlawish JHT, Berlin JA. The continuing unethical conduct of underpowered clinical trials. JAMA. 2002;288(3):358-362. doi:10.1001/jama.288.3.358