Viewpoint
April 4, 2019

The Importance of Predefined Rules and Prespecified Statistical Analyses: Do Not Abandon Significance

John P. A. Ioannidis, MD, DSc

Author Affiliations
  • 1Meta-Research Innovation Center at Stanford (METRICS), Stanford University, Stanford, California
  • 2Meta-Research Innovation Center–Berlin (METRIC-B), Berlin, Germany
JAMA. 2019;321(21):2067-2068. doi:10.1001/jama.2019.4582

For decades, statisticians and clinicians have debated the meaning of statistical and clinical significance. In general, most journals remain married to the frequentist approach to statistical testing and using the term statistical significance. A recent proposal to ban statistical significance gained campaign-level momentum in a commentary with 854 recruited signatories.1 The petition proposes retaining P values but abandoning dichotomous statements (significant/nonsignificant), suggests discussing “compatible” effect sizes, denounces “proofs of the null,” and points out that “crucial effects” are dismissed on discovery or refuted on replication because of nonsignificance. The proposal also indicates that “we should never conclude there is ‘no difference’ or ‘no association’ just because a P value is larger than a threshold such as 0.05 or, equivalently, because a confidence interval includes zero,”1 and that categorization based on other statistical measures (eg, Bayes factors) should be discouraged. Other recent articles have also addressed similar topics, with an entire supplemental issue of a statistics journal devoted to issues related to P values.2

    9 Comments for this article
    Abandoning statistical significance is both sensible and practical
    Valentin Amrhein, PhD | University of Basel
    Authors of this Comment: Valentin Amrhein, Andrew Gelman, Sander Greenland, Blakeley B. McShane

    Dr Ioannidis writes against our proposals to abandon statistical significance in scientific reasoning and publication, as endorsed in the editorial of a recent special issue of an American Statistical Association journal devoted to moving to a “post p<0.05 world.” We appreciate that he echoes our calls for “embracing uncertainty, avoiding hyped claims…and recognizing ‘statistical significance’ is often poorly understood.” We also welcome his agreement that the “interpretation of any result is far more complicated than just significance testing” and that “clinical, monetary, and other considerations may often have more importance than statistical findings.”

    Nonetheless, we disagree that a statistical significance-based “filtering process is useful to avoid drowning in noise” in science and instead view such filtering as harmful. First, the implicit rule to not publish nonsignificant results biases the literature with overestimated effect sizes and encourages “hacking” to get significance. Second, nonsignificant results are often wrongly treated as zero. Third, significant results are often wrongly treated as truth rather than as the noisy estimates they are, thereby creating unrealistic expectations of replicability. Fourth, filtering on statistical significance provides no guarantee against noise. Instead, it amplifies noise because the quantity on which the filtering is based (the p-value) is itself extremely noisy and is made more so by dichotomizing it.
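
    A minimal simulation (not part of the authors' comment; all numbers are assumed for illustration) makes the fourth point concrete: exact replications of the same modest-effect experiment produce p-values spanning orders of magnitude, so dichotomizing at 0.05 turns an already noisy quantity into a verdict that flips from run to run.

```python
# Minimal sketch (assumed parameters): replicate one two-group experiment
# many times and watch how much the p-value itself varies, and how often
# the significant/nonsignificant verdict flips at the 0.05 cutoff.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
true_effect, n, reps = 0.3, 50, 1000      # modest true effect, n per group

pvals = np.array([
    stats.ttest_ind(rng.normal(0.0, 1.0, n),
                    rng.normal(true_effect, 1.0, n)).pvalue
    for _ in range(reps)
])

print(f"p-values range from {pvals.min():.2g} to {pvals.max():.2g}")
print(f"fraction declared 'significant' at 0.05: {(pvals < 0.05).mean():.2f}")
# The same underlying effect yields p-values spread over orders of magnitude,
# so the binary verdict is itself a noisy quantity.
```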

    We also disagree that abandoning statistical significance will reduce science to “a state of statistical anarchy.” Indeed, the journal Epidemiology banned statistical significance in 1990 and is today recognized as a leader in the field.

    Valid synthesis requires accounting for all relevant evidence—not just the subset that attained statistical significance. Thus, researchers should report more, not less, providing estimates and uncertainty statements for all quantities, justifying any exceptions, and considering ways the results are wrong. Publication criteria should be based on evaluating study design, data quality, and scientific content—not statistical significance.

    Decisions are seldom necessary in scientific reporting. However, when they are required (as in clinical practice), they should be made based on the costs, benefits, and likelihoods of all possible outcomes, not via arbitrary cutoffs applied to statistical summaries such as p-values which capture little of this picture.

    The replication crisis in science is not the product of the publication of unreliable findings. The publication of unreliable findings is unavoidable: as the saying goes, if we knew what we were doing, it would not be called research. Rather, the replication crisis has arisen because unreliable findings are presented as reliable.

    Reference

    Amrhein V, Gelman A, Greenland S, McShane BB. 2019. Abandoning statistical significance is both sensible and practical. PeerJ Preprints 7:e27657v1. https://doi.org/10.7287/peerj.preprints.27657v1
    CONFLICT OF INTEREST: None Reported
    Reply To Amrhein et al
    John P.A. Ioannidis, MD, DSc | Meta-Research Innovation Center at Stanford (METRICS), Stanford University, Stanford, CA, USA
    By “filtering” I do not mean deciding on publication based on whether “statistical significance” is reached. All results should be published, regardless of whether they are “statistically significant” or not. Hopefully more journals will follow JAMA [1] in not using “statistical significance” as an editorial criterion for publication of important studies. Pre-specified analyses help reduce selective and distorted reporting.

    “Filtering” means publishing everything but not wasting excessive resources and not urging action on associations that are probably either null or too small to be consequential. There are differences across fields, but in most fields of modern research such false positives are the major problem. With massive association testing, some filtering is essential to select the most informative signals [2]. For example, in genome studies, of tens of millions of tested associations, >99.9% are null and even the non-null ones have on average extremely tiny effects.
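
    A back-of-the-envelope sketch (illustrative only; the proportions and power values below are assumed, not taken from the reply) shows why some screening is used in this setting: with tens of millions of mostly null tests, an unfiltered p<0.05 criterion would flag hundreds of thousands of false positives, whereas a far stricter genome-wide threshold keeps the flagged set informative.

```python
# Illustrative arithmetic (all numbers assumed): expected true and false hits
# when testing ten million mostly-null associations at two thresholds.
n_tests = 10_000_000
frac_nonnull = 0.0005                       # >99.9% of associations truly null
n_null = int(n_tests * (1 - frac_nonnull))
n_real = n_tests - n_null

# (threshold, assumed power at that threshold, label)
scenarios = [(0.05, 0.9, "p < 0.05"), (5e-8, 0.5, "p < 5e-8 (genome-wide)")]

for alpha, power, label in scenarios:
    false_hits = alpha * n_null
    true_hits = power * n_real
    fdp = false_hits / (false_hits + true_hits)
    print(f"{label}: ~{false_hits:,.0f} false vs ~{true_hits:,.0f} true hits "
          f"(false-discovery proportion ~{fdp:.3f})")
```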

    The proposal of Amrhein et al. to retire the term “statistical significance,” keep p-values, and avoid all dichotomous statements may be feasible in select niches but is entirely impractical for science-wide application. The fact that the highly respectable journal Epidemiology adopted this approach in 1990, yet after 30 years almost no other journals have followed, demonstrates its severe impracticality. Moreover, by setting this policy, Epidemiology narrowly attracted specific types of traditional epidemiological studies, practically lost contact with modern -omics epidemiology, and left most clinical epidemiology and randomized trials outside its remit. From 1/1/2014 until now, PubMed retrieves 39,725 published articles with the term “genome-wide”. Of those, only 3 were published by Epidemiology, and only 2 of them are primary analyses [3,4]. Interestingly, both of these papers still use p-value dichotomization (“A p value <0.05 was used as statistical cut-off” [3]; “nine of 16 SNPs had evidence (p<0.05) of heterogeneity” [4]). From 1/1/2014 until now, PubMed retrieves 110,956 published articles of type “randomized controlled trial”, and only 7 were published by Epidemiology (mostly secondary analyses).

    I disagree with the pessimistic statement that “the publication of unreliable findings is just unavoidable”. Reliability can be improved with better methods and (ideally pre-emptive) protection from bias. The laissez-faire definition of research by Amrhein et al. as something where we don’t know what we are doing is barely tolerable, and only for a fraction of exploratory research. Much exploratory research is hopefully more disciplined nowadays, while research that entails validation, confirmation, implementation, or policy change cannot just be haphazard. Rules for the game do not necessarily have to use null-hypothesis significance-testing. Other approaches, e.g. Bayesian methods or false-discovery rates, might be preferable in most applications, but their use requires proper training, and this is still deficient. Simply removing “statistical significance” while keeping the widely misunderstood p-values leaves 25 million statistically-undertrained publishing scientists with vague advice to just describe their data and fabricate narratives ad libitum. Instead of proposing old, extreme, failed solutions that are evidently impractical, we should commit more seriously to training scientists in how to use and understand fit-for-purpose, rigorous statistical methods.

    References

    1. Olson CM, Rennie D, Cook D, Dickersin K, Flanagin A, Hogan JW, et al. Publication bias in editorial decision making. JAMA. 2002;287(21):2825-8.

    2. Manrai AK, Ioannidis JPA, Patel CJ. Signals among signals: prioritizing non-genetic associations in massive datasets. Am J Epidemiol. 2019 Mar 16. pii: kwz031. doi: 10.1093/aje/kwz031.

     3. Mostafavi N, Vlaanderen J, Portengen L, et al. Associations between genome-wide gene expression and ambient nitrogen oxides. Epidemiology. 2017;28(3):320-328.

     4. Seyerle AA, Young AM, Jeff JM, et al. Evidence of heterogeneity by race/ethnicity in genetic determinants of QT interval. Epidemiology. 2014;25(6):790-8.
    CONFLICT OF INTEREST: None Reported
    Disallowing "Statistically Significant" is Both Sensible and Practical
    Stuart Hurlbert, Ph.D., Zoology | Department of Biology, San Diego State University
    Once again we are burdened by the propensity of some statisticians to try to solve, or at least clearly define, every controversial aspect of statistical analysis at once. To its credit, the 2016 ASA statement on P values focused on a very narrow range of issues and clarified them – even if they were matters that any good teacher of statistics fully understood half a century ago.

    With a similar tight focus, in our contribution to The American Statistician special issue (1), we proposed only “that in research articles all use of the phrase ‘statistically significant’ and closely related terms (‘nonsignificant,’ ‘significant at p = 0.xxx,’ ‘marginally significant,’ etc.) be disallowed on the solid grounds long existing in the literature. Just present the p-values without labeling or categorizing them. … For a journal, an additional instruction to authors could read something like the following: ‘There is now wide agreement among many statisticians who have studied the issue that for reporting of statistical tests yielding p-values it is illogical and inappropriate to dichotomize the p-scale and describe results as “significant” and “nonsignificant.” Authors are strongly discouraged from continuing this never-justified practice that originated from confusions in the early history of modern statistics.’”

    Amrhein, Greenland and McShane were among the 48 statisticians and other scientists who endorsed that statement pre-publication, as listed in its Appendix A.

    The three editors of The American Statistician declined to do so pre-publication but did so post-publication. In their introductory editorial to the issue they state, “We conclude, based on our review of the articles in this special issue and the broader literature, that it is time to stop using the term “statistically significant” entirely.” (2) 

    Ioannidis raises many important issues, but he does not put forward any cogent argument against the recommendation to disallow use of the phrase “statistically significant” in scientific writing. Nor have others before him.

    Though I am happy to be one of the 854 endorsers of the Amrhein-Greenland-McShane commentary in Nature, in retrospect the use of “statistical significance” in their title may be one source of misunderstanding. That phrase, depending on context, sometimes refers to the mere conduct of statistical tests, sometimes to a pre-specified alpha or critical p-value, and sometimes to an actual calculated p-value. The intended meaning will not always be clear.

    Given that Amrhein, Greenland, and McShane cited the “Coup de Grâce” recommendation in their commentary, it seems likely that a large percentage of their 854 endorsers quite specifically support the simple measure of disallowing “statistically significant” in scientific writing.

    Deep background on the issue can be found in Hurlbert and Lombardi's 38-page review article (3).

    REFERENCES

    1. Hurlbert SH, Levine RA, Utts J. Coup de Grâce for a Tough Old Bull: “Statistically Significant” Expires. The American Statistician. 2019;73(S1). https://doi.org/10.1080/00031305.2018.1543616

    2. Wasserstein RL, Schirm AL, Lazar NA. Editorial: Moving to a world beyond “p < 0.05”. The American Statistician. 2019;73(S1):1-19. https://tandfonline.com/doi/full/10.1080/00031305.2019.1583913

    3. Hurlbert SH, Lombardi CM. Final collapse of the Neyman-Pearson decision-theoretic framework and rise of the neoFisherian. Annales Zoologici Fennici. 2009;46:311-349.
    CONFLICT OF INTEREST: None Reported
    What's so significant about (statistical) significance?
    Martin Mayer, DMSc, MS, PA-C | Innovations and Evidence-Based Medicine Development, EBSCO Health; East Carolina Heart Institute, General Medicine Service, Vidant Medical Center
    Unsurprisingly, Ioannidis has meritorious points (e.g., detailed, transparent preregistration and carefully considering the best fit-for-purpose approach for any given analysis). [1] However, he suggests [1,2] retiring statistical significance [3] might lead to scientific anarchy, where “[i]rrefutable nonsense would rule.” [2] He also says “Dichotomous decisions are the rule in medicine and public health … [An intervention] will either be licensed or not and … used or not.” [1] These sentiments seem to distract from the proposal’s [3] central message rather than build on it. The dichotomous decisions Ioannidis mentions seem like an admixture of a straw man and a red herring; while certainly real, they must be based on much more than the arbitrary threshold for statistical significance, lest they be unacceptably reductionist. Ioannidis acknowledges this: “[i]nterpretation of any result is far more complicated than just significance testing”. [1] Additionally, myopic reverence for and poor understanding of “statistical significance” have led to poor research practices, retaining “statistical significance” hardly seems a meaningful “gatekeeper”, [1] and surely, “statistical significance” is not necessary for good science. Suggesting “[i]rrefutable nonsense would rule” seems hyperbolic and implies “statistical significance” helps us sort out “nonsense” from “sense”. However, as Wasserstein and colleagues note, “Regardless of whether it was ever useful, a declaration of ‘statistical significance’ has today become meaningless [and] using bright-line rules for justifying scientific claims or conclusions can lead to erroneous beliefs and poor decision making.” [4] These warnings are not new. Consider, for instance, Altman’s 1991 warning (many others exist, including decades-earlier warnings):

    "It is ridiculous to interpret the results of a study differently according to whether the P value obtained was, say, 0.055 or 0.045. [Additionally, instead of P values,] confidence intervals … are greatly preferred. The use of a cut-off for P leads to treating the analysis as a process for making a decision. Within this framework, it is customary (but unwise) to consider that a statistically significant effect is a real one, and conversely that a non-significant result indicates that there is no effect. Forcing a choice between significant and non-significant obscures the uncertainty present whenever we draw inferences from a sample. [5]

    Although discourse on this matter should continue (and it is [4]), what we need much more than “statistical significance” is statistically-minded researchers, clinicians, journal editors, and the like who are transparent in their practices, meticulous in their analyses, judicious in their inferences, and comfortable with uncertainty and the limitations of “knowing”.

    References

    1. Ioannidis JPA. The Importance of Predefined Rules and Prespecified Statistical Analyses: Do Not Abandon Significance. JAMA; 4 April 2019. [Epub ahead of print] doi:10.1001/jama.2019.4582
    2. Ioannidis JPA. Retiring statistical significance would give bias a free pass. Nature 2019;567:461. doi:10.1038/d41586-019-00969-2
    3. Amrhein V, Greenland S, McShane B. Scientists rise up against statistical significance. Nature 2019;567:305–7. doi:10.1038/d41586-019-00857-9
    4. Wasserstein RL, Schirm AL, Lazar NA. Moving to a World Beyond “p < 0.05”. The American Statistician 2019;73:1–19. doi:10.1080/00031305.2019.1583913
    5. Altman DG. Practical statistics for medical research. 1st ed. Boca Raton; London; New York; Washington, D.C.: Chapman & Hall/CRC 1991.
    CONFLICT OF INTEREST: None Reported
    Reply to Ioannidis
    Valentin Amrhein, PhD | University of Basel
    Authors of this reply: Valentin Amrhein, Andrew Gelman, Sander Greenland, Blakeley B. McShane

    We are delighted to hear Ioannidis believes "all results should be published"—not just those that attain statistical significance—and we agree with much of what he writes in his reply. For example, we agree that in certain applications, like genome-wide surveys with huge data sets, simple screening can sometimes be useful. However, in such applications, screening based on statistical significance will lead to a "large upward bias in estimation of locus-specific effects from genomewide scans" [1].
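
    The cited upward bias (the "winner's curse") is easy to reproduce in a small simulation; the setup below is illustrative and assumed, not drawn from reference [1]: averaging effect estimates only over replications that pass a significance screen systematically overstates the true effect.

```python
# Assumed setup, for illustration of the bias described in reference [1]:
# average effect estimates over all replications versus only over those
# that pass a significance screen.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
true_effect, n, reps, alpha = 0.15, 100, 20_000, 0.05

all_estimates, selected = [], []
for _ in range(reps):
    a = rng.normal(0.0, 1.0, n)
    b = rng.normal(true_effect, 1.0, n)
    est = b.mean() - a.mean()
    all_estimates.append(est)
    if stats.ttest_ind(b, a).pvalue < alpha and est > 0:
        selected.append(est)          # kept only if it cleared the screen

print(f"true effect:                           {true_effect}")
print(f"mean estimate over all runs:           {np.mean(all_estimates):.3f}")
print(f"mean estimate over 'significant' runs: {np.mean(selected):.3f}")
# The screened average is inflated well above the true effect: selection on
# significance and estimation from the same data do not mix.
```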

    In applications requiring direct decision making, screening or otherwise, we would prefer a decision-analytic approach rather than P-value (or Bayes factor) thresholds. Some may say that decision analysis is great in theory but requires too much effort in practice as compared to making decisions based on "statistical significance." This excuse seems to us analogous to skipping a biopsy because it is too much effort before deciding whether to apply radiation therapy for a tumor discovered by palpation. We would rather see decisions made using explicit (even if imperfect) quantification of costs, benefits, and probabilities.
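
    As a rough illustration of what such a decision-analytic approach might look like (all quantities below are invented placeholders, not a prescription), the choice hinges on explicit costs, benefits, and probabilities rather than on whether a p-value crosses a threshold:

```python
# Invented placeholder quantities: a decision rests on expected costs,
# benefits, and probabilities, none of which a p-value threshold encodes.
p_effect_real = 0.6        # assumed probability that the treatment effect is real
benefit_if_real = 10.0     # assumed benefit if the effect is real (arbitrary units)
harm = 1.0                 # assumed side-effect burden, incurred either way
cost = 2.0                 # assumed resource cost, same units

expected_value_treat = p_effect_real * benefit_if_real - harm - cost
expected_value_wait = 0.0  # baseline: do not treat / gather more evidence

decision = "treat" if expected_value_treat > expected_value_wait else "do not treat"
print(f"expected net value of treating: {expected_value_treat:.1f} -> {decision}")
```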

    Although we continue to concur with the maxim that "the publication of unreliable findings is unavoidable," we disagree with Ioannidis that this implies we are advocating a "laissez-faire definition of research." Instead, we call for removing the old, extreme, and failed solution of drawing laissez-faire conclusions from single studies based on statistical significance or other simplistic and noisy filters. Such filters may once have seemed useful because they were easily applied, but they have led to overconfident conclusions and the replication crisis.

    We think research reports would be a lot more trustworthy if the narrative that researchers used for their studies was "to just describe their data" (in the context of a clearly-stated model) rather than to make decisions on the "most informative signals" based on noisy and widely misunderstood statistical filters. Once more, the most reliable decision-making will be based on the totality of available evidence and a weighing of costs and benefits of each decision, not just a P-value or other statistic from the latest study, or a meta-analysis based on such summaries that have already ignored so much crucial information.

    References

    [1] Göring HHH, Terwilliger JD, Blangero J. 2001. Large upward bias in estimation of locus-specific effects from genomewide scans. American Journal of Human Genetics 69: 1357-1369. https://doi.org/10.1086/324471
    CONFLICT OF INTEREST: None Reported
    The Other Side of Statistical Significance
    Milind Watve, M. Sc., Ph. D. | Deenanath Mangeshkar Hospital and Research Centre
    So far only one side of the problem seems to be highlighted, and most seem to have ignored the other, extremely common way of misinterpreting significant results. In the field of medicine, researchers are in haste to suggest preventive or remedial measures and are often carried away by statistical significance. For example, obesity is consistently and significantly associated with insulin resistance across different studies with different ethnic, cultural, nutritional, and behavioural backgrounds. A widespread belief therefore is that controlling obesity is sufficient to prevent type 2 diabetes mellitus (T2DM). However, across studies the variance in insulin resistance explained by obesity is very small, with a mode of less than 10% and a median around 15% (Vidwans and Watve, J Insulin Resist 2017;1(1)). Therefore, whether controlling obesity alone would be sufficient to prevent T2DM is questionable, but since most people are carried away by the high level of statistical significance, nobody looks at the variance explained and wonders whether and to what extent the association is biologically significant.
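
    An illustrative simulation (assumed values, not the data from Vidwans and Watve) shows how a predictor explaining only about 10% of the variance can nonetheless be overwhelmingly "significant" once the sample is large, which is why statistical significance alone says little about biological importance:

```python
# Assumed values, for illustration only: a predictor explaining ~10% of the
# variance is still overwhelmingly "significant" when the sample is large.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n = 5000
x = rng.normal(size=n)                     # e.g. a standardized obesity measure
y = 0.33 * x + rng.normal(size=n)          # x contributes ~10% of the variance of y

r, p = stats.pearsonr(x, y)
print(f"variance explained (r^2): {r**2:.2f}")
print(f"p-value: {p:.1e}")
# A vanishingly small p-value says the association is unlikely to be exactly
# zero; it says nothing about whether ~10% of variance is clinically decisive.
```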

    This applies to several other associations. The associations of specific food components with obesity, of microbiota with behaviour and health, and genome-wide association studies of obesity or diabetes are all marked by high levels of significance but very low variance explained. Yet researchers are often quick to suggest translational measures based on high levels of statistical significance alone. Unlike statistical significance, biological significance is contextual and purpose-driven, and therefore unlikely to be captured in a single number. Nevertheless, we need to talk more about biological significance than about statistical significance in the field of translational medicine.

    Harshada Vidwans, Anagha Pund, Milind Watve
    CONFLICT OF INTEREST: None Reported
    Additional Reply to Amrhein et al and Watve
    John P.A. Ioannidis, MD, DSc | Meta-Research Innovation Center at Stanford (METRICS), Stanford University, Stanford, CA
    I agree on many points with Amrhein et al. as clarified by their latest response. However, I concur with Watve that the misinterpretation of significant results as being necessarily truly non-null and actionable is a far more serious/common problem than the misinterpretation of nonsignificant results as being necessarily null and non-actionable. This is a substantial deviation from the inverse rationale of Amrhein et al.

    More importantly, I continue to disagree with Amrhein et al. and Hurlbert that problems can be fixed just with a linguistic ban. Banning the use of a word will not suddenly make any researcher better trained in statistics overnight. Indeed, we should avoid strong statements and major decisions based on single, error-prone studies, but the notion that authors can and should “just describe their data” is an open invitation to subjectivity. Amrhein et al. try to keep decisions at arm’s length from statistical analysis. They insist on avoiding all dichotomization yardsticks, but researchers will use some yardsticks anyhow, good or bad, overt or covert, in trying to “just describe their data”. In the absence of any formal rules, these yardsticks are likely to be those that satisfy their prior beliefs and biases in the ensuing narratives.

    Who will be the wise stakeholder(s) to run the sophisticated decision analyses that Amrhein et al. envision on every topic touched by scientific inquiry? With over 5 million papers published per year, some of them currently including millions of analyses, some simple, if crude, screening is often necessary. In many cases (e.g. with agnostic testing of massive data), issues of “cost, benefits, and probabilities” are not even relevant. In other cases, the additional components that are introduced in the decision-making carry far more subjectivity, error, and propensity for misinterpretation than the statistical analysis itself.

    Statistical significance at p<0.05 is like a poorly equipped peace-keeping battalion in a land torn by war. Many atrocities are committed daily despite its presence. The solution is not to remove entirely the peace-keeping battalion, however – let alone remove it and leave behind its p-value weapons for free use by the belligerents. The notion that a simple linguistic ban can salvage applied statistics may even undermine the effort to inform the scientific community at large that training in statistics and data science requires major commitment. This serious commitment cannot be replaced by any over-simplification like “statistical significance” or its linguistic ban.
    CONFLICT OF INTEREST: None Reported
    Additional reply to Ioannidis
    Valentin Amrhein, PhD | University of Basel
    Authors of this reply: Valentin Amrhein, Andrew Gelman, Sander Greenland, Blakeley B. McShane

    We appreciate Dr Ioannidis’ acknowledgement of increased agreement, and we hope, with the further clarification offered here, the convergence will continue.

    Dr Ioannidis misattributes to us the view that the error of taking statistically nonsignificant results as zero is worse than the error of taking statistically significant results as true. We have repeatedly argued against both errors and do not take a general position on which is worse, because which is worse surely varies by context. We hope this resolves that perceived disagreement.

    He also misattributes to us the view that “a simple linguistic ban can salvage applied statistics.” Instead, we have called for a stop to the use of statistical significance as determined by thresholded p-values (or other statistical measures) “to decide whether a result refutes or supports a scientific hypothesis” [1]. We believe this is an important first step against the many statistical and cognitive errors that arise from dichotomization, but we would never deny that “training in statistics and data science requires major commitment.” We hope this resolves that additional perceived disagreement.

    We believe the subjectivity Dr Ioannidis fears in research is a reality regardless of whether we abandon statistical significance or not. All aspects of study design and presentation are rife with subjective choices; even p-values themselves are subjective in the sense that they are affected by the many necessarily subjective choices involved in statistical modeling.

    We disagree that “statistical significance at p<0.05 is like a poorly equipped peace-keeping battalion in a land torn by war. Many atrocities are committed daily despite its presence.” Instead, we view statistical significance as a primary driver of these atrocities.

    The need to filter by p<0.05 is perhaps the only persistent disagreement we seem to have with Dr Ioannidis, but we cannot discern in what contexts he finds this filter necessary. Above, he has written that (i) “all results should be published, regardless of whether they are ‘statistically significant’ or not” and (ii) “some simple, if crude, screening is often necessary. In many cases …‘cost, benefits, and probabilities’ are not even relevant.” Given (i), he must think filtering by p<0.05 unnecessary for publication. Given (ii), he must also think it unnecessary for decision-making because “costs, benefits, and probabilities” are clearly relevant for decision-making. We thus appear to agree on publication and decision-making but are then left at a loss for any remaining contexts where filtering by p<0.05 would be necessary (or even useful or desirable).

    Finally, we agree that one “should avoid strong statements and major decisions based on single, error-prone studies”, which is one reason why Dr Ioannidis is correct in attributing to us, in his very next sentence, the view that we should “keep decisions at arm’s length from statistical analysis.” We view his calling for the former while chiding us for the latter as a contradiction.

    [1] Amrhein V, Greenland S, McShane B. 2019. Retire statistical significance. Nature 567, 305-307. https://doi.org/10.1038/d41586-019-00857-9
    CONFLICT OF INTEREST: None Reported
    The Problem is Overestimation of Statistical Significance, Not Use of Significance Level
    Frederico Sousa, PhD | Histology, Department of Morphology, Federal University of Paraiba (Brazil)
    Dr Ioannidis does not make himself clear when defending his article’s title (“…Do not abandon significance”). The subject of the “war” he writes about is unclear. In the second paragraph of the first page he mentions “the so-called war on significance,” and in the first paragraph of the second page he writes that “A low barrier such as P< .05 is typically too easy to pass” and refers to “The proposal to entirely remove the barrier...”. On the second page, he distinguishes “statistical significance” from “P<.05” (the latter referring to the significance level), but further on he uses “P=.09” to refer to statistical significance, creating confusion between the significance level and statistical significance. Such confusion is particularly damaging for those new to the debate, especially those accustomed to the flawed practice of basing their conclusions on statistical significance.

    A clear definition of statistical barriers could contribute a lot to the debate. Given a test of a hypothesis of difference between two groups (A and B, with group B presenting higher values than group A) with continuous (normally distributed) data, each one represented by a bell curve, the area of intersection between them is the main concern for statistical analysis. The first barriers to be selected are two reference lines in one of the bell curves (group A), represented by statistical scores (Z or T scores, for instance) equidistant from the group A mean. The distances from those lines to the nearest tails of group A yield the probability of the significance level, while the distance between them is the confidence level (used for confidence interval). The statistical significance (also known as p value) is the probability located between the mean of group B and the nearest tail of group A. The type II error is the probability in group B located between the significance level and the tail of group B that approaches group A. The power is the probability that remains in group B excluding type II error. The effect size can be represented by the distance between the means of the two groups, and its confidence interval is represented by the area in group A within the confidence level. For any given effect size (distance between groups’ means), type II error and the confidence interval of the effect size can be reduced by increasing sample size.
    It is widely known in statistics that the significance level is required to compute type II error, power, and the confidence interval. The point raised by Amrhein et al. [1] is that statistical significance has been overestimated in research papers, and in fact statistical significance has relatively low importance compared to the other parameters (effect size and its confidence interval, type II error, and power). It is important to highlight that even if one chooses not to report statistical significance, one must still use barriers (the significance and confidence levels) to compute power and the confidence interval. The proper definition of statistical barriers is of paramount importance in the current debate on the use of p values. Ill-defined statistical significance and other statistical parameters might well be the reason for the disagreement that some scientists (including Dr Ioannidis) express regarding the proposition to “retire statistical significance” [1].
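
    A short worked example (assumed effect size and sample size) illustrates the commenter's point that the significance level reappears as soon as power or a confidence interval is computed for a two-group comparison:

```python
# Assumed effect size and sample size: power and the confidence interval for a
# two-group comparison both depend on the chosen significance level (alpha).
import math
from scipy import stats

effect = 0.5       # standardized mean difference between groups A and B (assumed)
n = 64             # observations per group (assumed)
alpha = 0.05       # significance level, i.e. a 95% confidence level

se = math.sqrt(2 / n)                       # SE of the difference (unit SDs)
z_crit = stats.norm.ppf(1 - alpha / 2)      # the "barrier" implied by alpha

power = stats.norm.sf(z_crit - effect / se) + stats.norm.cdf(-z_crit - effect / se)
ci_low, ci_high = effect - z_crit * se, effect + z_crit * se

print(f"critical z for alpha={alpha}: {z_crit:.2f}")
print(f"power: {power:.2f}  (type II error = {1 - power:.2f})")
print(f"95% CI for the effect: {ci_low:.2f} to {ci_high:.2f}")
# Neither the interval nor the power calculation can be written down without
# first fixing alpha (or the equivalent confidence level).
```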

    REFERENCES
    [1] Amrhein V, Greenland S, McShane B. 2019. Retire statistical significance. Nature 567, 305-307. https://doi.org/10.1038/d41586-019-00857-9.
    CONFLICT OF INTEREST: None Reported