RCT indicates randomized clinical trial.
HR indicates hazard ratio; ITT, intention to treat; NA, not applicable; PPS, per protocol set.
The x-axis shows the point estimates and 95% CIs of the prevalence of spin.
ᵃ Spin prevalence for the abstract only.
eMethods. PubMed Detailed Search Strategy
eTable 1. Examples of Spin in the Abstract Conclusion Section
eTable 2. Factors Associated With Level of Spin
Ito C, Hashimoto A, Uemura K, Oba K. Misleading Reporting (Spin) in Noninferiority Randomized Clinical Trials in Oncology With Statistically Not Significant Results: A Systematic Review. JAMA Netw Open. 2021;4(12):e2135765. doi:10.1001/jamanetworkopen.2021.35765
Are the interpretation and reporting of noninferiority trials with primary end point results that are not statistically significant accurate, and what factors are associated with misleading reporting?
In this systematic review of 52 noninferiority randomized clinical trials of cancer treatments with results for primary end points that were not statistically significant, misleading reporting was found in 75% of reports. Multivariable analysis found that the prevalence of misleading reporting was significantly lower in reports with funding from for-profit sources and higher in reports of novel experimental treatments.
These findings suggest that authors should interpret and report the results of noninferiority cancer clinical trials carefully, especially when the results for the primary outcome are not statistically significant.
Spin, the inaccurate reporting of randomized clinical trials (RCTs) with results that are not statistically significant for the primary end point, distorts the interpretation of results and can mislead readers. However, the prevalence of spin and its associated factors in noninferiority cancer RCTs remain unclear.
To examine misleading reporting, or spin, and the associated factors in noninferiority cancer RCTs through a systematic review.
A systematic search of the PubMed database was performed for articles published between January 1, 2010, and December 31, 2019, using the Cochrane Highly Sensitive Search Strategy.
Two investigators independently selected studies using the inclusion criteria of noninferiority parallel-group RCTs aiming to confirm the effects of cancer treatments, published between January 1, 2010, and December 31, 2019, and reporting results that were not statistically significant for the primary end points.
Data Extraction and Synthesis
Standardized data abstraction was used to extract information concerning the trial characteristics and spin based on a prespecified definition. The main investigator extracted the trial characteristics while both readers independently evaluated the spin. The Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) reporting guideline was followed.
Main Outcomes and Measures
The main outcome was spin prevalence in any section of the report. Spin was defined as use of specific reporting strategies, from whatever motive, to highlight that the experimental treatment is beneficial, despite no statistically significant difference for the primary outcome, or to distract the reader from results that are not statistically significant. The associations (prevalence difference and odds ratios [ORs]) between spin and trial characteristics were also evaluated.
The analysis included 52 of 2752 reports identified in the PubMed search. Spin was identified in 39 reports (75.0%; 95% CI, 61.6%-84.9%), with spin in the abstract in 34 reports (65.4%; 95% CI, 51.8%-76.9%) and in the main text in 38 reports (73.1%; 95% CI, 59.7%-83.3%). Univariable analysis found that spin prevalence was higher in reports without data managers (prevalence difference, 27.0%; 95% CI, 1.1%-50.3%), reports without funding from for-profit sources (prevalence difference, 31.2%; 95% CI, 4.8%-53.8%), and reports of novel experimental treatments (prevalence difference, 37.5%; 95% CI, 5.8%-64.7%). Multivariable analysis found that novel experimental treatment (OR, 4.64; 95% CI, 0.98-22.02) and funding from nonprofit sources only (OR, 5.20; 95% CI, 1.21-22.29) were associated with spin.
Conclusions and Relevance
In this systematic review, most noninferiority RCTs reporting results that were not statistically significant for the primary end points showed distorted interpretation and inaccurate reporting. The novelty of an experimental treatment and funding only from nonprofit sources were associated with spin.
Randomized clinical trials (RCTs) are the criterion standard for hypothesis-based evaluation of treatment efficacy and safety.1,2 RCTs must be performed according to predefined study protocols and statistical analysis plans.3 RCT results are typically interpreted based on the statistical significance of the primary end point analysis. Trials with statistically significant results for the primary end point are known as positive trials, and those with results that are not statistically significant are known as negative trials. Both types of results are equally important for scientific progress, provided that the RCTs are well planned, conducted, analyzed, and reported.4
However, problems can arise when reports mislead readers by distorting the interpretation of results and suggesting that positive results were obtained, even when no statistically significant difference was found for the primary end point. This misleading reporting is called spin.5-10 Boutron et al6 observed that RCT reports with spin were more likely to emphasize the benefit of the experimental treatment by focusing on statistically significant results, including those of secondary end points and subgroup analyses, which should be interpreted as exploratory. Systematic reviews have previously classified spin in the field of oncology.7-11 Studies on the impact of spin have concluded that it might lead readers to overestimate how positive the results are.12-14
Spin in RCTs has mainly been discussed in superiority trials, with few reports assessing spin in noninferiority trials.15 Compared with superiority RCTs, noninferiority RCTs may have more factors that complicate interpretation, including the noninferiority margin, assay sensitivity, and choice of analysis population.16 However, no systematic review has investigated spin prevalence in noninferiority RCTs in oncology. Given the complexities of interpreting noninferiority RCTs, we consider it important to clarify how much spin exists and which factors are associated with it.
Therefore, we performed a systematic review of spin prevalence and its associated factors in negative noninferiority cancer RCTs published since 2010, when the US Food and Drug Administration guidelines for noninferiority trials were published.16
This systematic review follows the Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) reporting guideline. Data were analyzed from March 22 to September 15, 2021.
The inclusion criteria were reports of RCTs that examined anticancer treatments, used a noninferiority design, and were published between January 1, 2010, and December 31, 2019. Using PubMed, we created a search strategy reflecting the inclusion criteria and systematically collected reports. This search strategy is described in the eMethods in the Supplement. The Cochrane Highly Sensitive Search Strategy17 was used to identify the RCTs. After collecting the reports identified by the search strategy, we read the abstract and main text of each report and excluded reports that met our exclusion criteria: reports other than direct assessments of anticancer efficacy (eg, quality of life, cost-effectiveness, or diagnostic methods) and reports in which the primary end point could not be identified; exploratory (eg, phase 1 or phase 2) or pilot studies; superiority or equivalence studies; other study designs (eg, crossover, multigroup, single-group, or factorial designs); studies terminated early at an interim analysis; and reports not written in English. We defined negative trials as those that did not show that the effect of the experimental treatment, compared with the control treatment, was better than the prespecified noninferiority margin, and we included these trials in the analysis (Figure 1).
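The negative-trial definition above depends on where the confidence interval falls relative to the noninferiority margin. The sketch below makes that rule concrete under assumptions the article does not spell out: for a hazard ratio (experimental vs control, lower is better), noninferiority is taken as shown when the upper limit of the 2-sided 95% CI is below the margin; for a difference in survival proportions (experimental minus control, higher is better), when the lower limit is above the negative margin. The function names are hypothetical, and individual trials may have used other conventions.

```python
def noninferior_by_hr(hr_upper_ci: float, hr_margin: float) -> bool:
    """Assumed rule for an HR (experimental vs control; lower is better):
    noninferiority is shown only if the whole CI lies below the margin."""
    return hr_upper_ci < hr_margin


def noninferior_by_difference(diff_lower_ci: float, margin: float) -> bool:
    """Assumed rule for a difference in survival proportions
    (experimental minus control; higher is better). `margin` is positive,
    eg 0.10 for 10 percentage points."""
    return diff_lower_ci > -margin


def is_negative_trial(noninferiority_shown: bool) -> bool:
    """A negative trial, as defined in this review, failed to show noninferiority."""
    return not noninferiority_shown


# Hypothetical example: HR 95% CI upper limit 1.35 against a margin of 1.25.
print(is_negative_trial(noninferior_by_hr(1.35, 1.25)))  # True
```

Under these assumptions, a hypothetical trial whose HR upper limit (1.35) exceeds the margin (1.25) would be counted as negative even if its point estimate were close to 1.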
To ensure the reliability of the literature collection, 2 authors (C.I. and A.H.) conducted the search independently, and we confirmed that they selected the same reports. Inclusion of reports not selected by both authors was discussed among all authors.
From the selected reports, we extracted journal information (name, impact factor in 2018), general report information (publication year, number of citations in March 2021), author information (numbers of statisticians and of authors in other roles, eg, project and data managers), funding source (for-profit [eg, pharmaceutical or medical device industry], nonprofit [eg, public funds], both, or no funding), rationale for conducting the study (novelty [ie, not used in clinical practice], safety, or simplicity of applying the experimental treatment), primary end point (efficacy measure), noninferiority margin, number of patients or participants (planned and actual numbers of individuals enrolled), and analysis population. This information was recorded in a data collection form prepared in advance.
We defined spin as the “use of specific reporting strategies, from whatever motive, to highlight that the experimental treatment is beneficial, despite a statistically nonsignificant difference for the primary endpoint, or to distract the reader from statistically nonsignificant results,” as proposed by Boutron et al.6 We used this definition to evaluate the presence of spin in the results and conclusions sections of the abstract and in the results, discussion, and conclusions sections of the main text of each report. Because spin strategies have been classified according to the type of research,18-21 we specified spin strategies for noninferiority RCTs. Spin was considered present if the report used any of the prespecified spin strategies and improperly emphasized the benefit of the experimental treatment.
The 8 spin strategies specified in advance claimed the benefit of the experimental treatment (1) by emphasizing trends in point estimates despite a lack of significance for the primary end point results, denoted as trend for primary end point; (2) based on the results of the secondary end point, denoted as secondary end point; (3) based on the results of subgroup analysis, denoted as subgroup analysis; (4) based on secondary analyses of the primary end point, such as those changing the analysis population or the measure of treatment effect, denoted as secondary analysis of the primary end point; (5) based on within-group comparisons, such as before vs after treatment, denoted as within-group comparisons; (6) without any mention in the discussion of the experimental treatment’s unclear safety profile in reports that stated safety of the experimental treatment as the rationale for conducting the study, denoted as no mention of safety profile; (7) based on safety alone in the conclusion section, despite results for the primary end point that were not statistically significant, denoted as safety; or (8) in any other way deemed by the reviewers to be spin, denoted as other. No report was found in the within-group comparison category, so this category is not presented in the results.
The spin levels in the conclusions of the abstract and the main text were rated as none, low, moderate, or high. Low indicates that the report contained spin despite stating that noninferiority was not confirmed or, alternatively, that the report did not state that noninferiority was not confirmed but did mention both the uncertainty in claiming the treatment benefit and the need for additional confirmatory studies. Moderate indicates that the report did not state that noninferiority was not confirmed and failed to mention the uncertainty in claiming the experimental treatment benefit or the need for additional confirmatory studies. High indicates that the report did not state that noninferiority was not confirmed and recommended clinical application of the experimental treatment without mentioning either the uncertainty in claiming the experimental treatment benefit or the need for additional confirmatory studies.
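The 4 spin levels form a decision rule that can be written out explicitly. This is a hypothetical sketch, not the reviewers' instrument: the boolean inputs are illustrative names, and combinations the text does not assign (for example, a recommendation for clinical use together with only 1 of the 2 caveats) are assumed here to fall into the moderate level.

```python
from enum import Enum


class SpinLevel(Enum):
    NONE = "none"
    LOW = "low"
    MODERATE = "moderate"
    HIGH = "high"


def spin_level(
    has_spin: bool,
    states_ni_not_confirmed: bool,
    mentions_uncertainty: bool,
    mentions_need_for_confirmatory_study: bool,
    recommends_clinical_use: bool,
) -> SpinLevel:
    """Hypothetical encoding of the none/low/moderate/high rules.

    Assumption: any combination not explicitly assigned in the text
    falls back to MODERATE.
    """
    if not has_spin:
        return SpinLevel.NONE
    # Low: spin despite stating that noninferiority was not confirmed ...
    if states_ni_not_confirmed:
        return SpinLevel.LOW
    caveats = (mentions_uncertainty, mentions_need_for_confirmatory_study)
    # ... or no such statement, but both caveats are present.
    if all(caveats):
        return SpinLevel.LOW
    # High: recommends clinical use with neither caveat.
    if recommends_clinical_use and not any(caveats):
        return SpinLevel.HIGH
    return SpinLevel.MODERATE


print(spin_level(True, False, False, False, True).value)  # high
```

Writing the rule this way makes the gap in the verbal definition visible: the text does not say how to rate a report that recommends clinical use while mentioning only 1 caveat, so any real coding scheme would need to settle that case explicitly.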
Spin was evaluated independently by the main reviewer (C.I.) and 2 secondary reviewers (K.U. and K.O.). Disagreements were discussed to reach a final evaluation.
The primary end point of this study was spin prevalence in any section of the reports, including the results and conclusions of the abstract and the results, discussion, and conclusions of the main text. The secondary end points were spin prevalence in the abstract and in the main text and the spin level in the conclusions of the abstract and the main text.
For discrete variables, we calculated the number and percentage of reports in each category. For continuous variables, we calculated medians and IQRs. To determine spin prevalence in any section of a report, we calculated the proportion of reports with spin and its 95% CI. To examine the associations between trial characteristics and spin prevalence, we calculated the prevalence difference relative to the reference category and its 95% CI. The Agresti-Coull22 or Agresti-Caffo methods23 were used for all 95% CI calculations. Multivariable logistic regression was performed to adjust for covariates, using backward stepwise selection with a removal threshold of P = .15, and Wald 95% CIs for odds ratios (ORs) were calculated. Journal specialty, publication year, journal impact factor, number of citations, presence of various author roles, funding from for-profit sources, rationale of novelty, safety, or application simplicity, primary outcome type (hazard ratio or survival proportion difference), noninferiority margin, achievement of the planned sample size, and analysis population were used as candidate variables. All statistical analyses were performed using SAS statistical software version 9.4 (SAS Institute). Statistical significance was defined as a 2-sided 95% CI that did not include the null value (1 for ORs and 0 for prevalence differences).
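The 2 interval methods cited above can be sketched in a few lines of Python using only the standard library. This is an illustration, not the authors' SAS code: the Agresti-Coull interval adds z²/2 successes and z²/2 failures before applying a Wald-type formula, and the Agresti-Caffo interval for a difference in proportions adds 1 success and 1 failure to each group. The function names are hypothetical, and truncating the single-proportion interval to [0, 1] is a choice made here.

```python
import math
from statistics import NormalDist


def agresti_coull(x: int, n: int, level: float = 0.95) -> tuple[float, float]:
    """Agresti-Coull interval for a single proportion x/n."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    n_tilde = n + z**2
    p_tilde = (x + z**2 / 2) / n_tilde  # adjusted center
    half = z * math.sqrt(p_tilde * (1 - p_tilde) / n_tilde)
    # Truncate to the valid range of a proportion (a choice made in this sketch).
    return max(0.0, p_tilde - half), min(1.0, p_tilde + half)


def agresti_caffo(
    x1: int, n1: int, x2: int, n2: int, level: float = 0.95
) -> tuple[float, float]:
    """Agresti-Caffo interval for p1 - p2 (1 success and 1 failure added per group)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p1 = (x1 + 1) / (n1 + 2)
    p2 = (x2 + 1) / (n2 + 2)
    se = math.sqrt(p1 * (1 - p1) / (n1 + 2) + p2 * (1 - p2) / (n2 + 2))
    diff = p1 - p2
    return diff - z * se, diff + z * se
```

With 34 of 52 reports, `agresti_coull(34, 52)` gives roughly 0.518 to 0.769, in line with the 51.8%-76.9% reported for the abstract; for 39 of 52 it gives roughly 0.617 to 0.849, within 0.1 percentage point of the reported 61.6%-84.9%, a gap that may reflect rounding or implementation details.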
Our PubMed search identified 2752 studies, from which we selected 166 eligible parallel-group noninferiority RCTs, including 52 negative studies (Figure 1).24-75 Table 1 presents the characteristics of the selected studies. The negative studies comprised 12 reports published from 2010 to 2013, 19 reports published from 2014 to 2016, and 21 reports published from 2017 to 2019. Two-thirds of the reports (34 reports [65.4%]) assessed drugs as experimental cancer treatments. More than half of the trials (29 reports [55.8%]) were funded by nonprofit sources only. No particular noninferiority margin was used more frequently than others.
Table 2 shows the spin prevalence in the 52 negative studies according to each section of the abstract and main text. A total of 39 reports (75.0%; 95% CI, 61.6%-84.9%) contained spin in at least 1 section. In the abstracts, 34 reports (65.4%; 95% CI, 51.8%-76.9%) contained spin in at least 1 section, and 10 reports (19.2%) contained spin in all sections. One report (1.9%) contained spin in the results only, and 23 reports (44.2%) contained spin in the conclusions only. In the main text, 38 reports (73.1%; 95% CI, 59.7%-83.3%) had spin in at least 1 section, and 6 reports (11.5%) had spin in all sections; 3 reports (5.8%) contained spin in the discussion only, and 13 reports (25.0%) contained spin in the conclusions only. In both the abstract and the main text, spin was most frequent in the conclusions section.
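The section-level counts above (spin in at least 1 section, in all sections, or in a single section only) are simple tallies over per-section flags. The sketch below uses a hypothetical data layout, 1 dictionary per report mapping section names to whether spin was found, and toy data rather than the review's data; the function names are illustrative.

```python
def summarize_part(reports: list[dict[str, bool]]) -> dict:
    """Tally spin across the sections of one part of a report (abstract or main text).

    Each report is a dict mapping a section name to True if spin was found there.
    """
    summary = {"at_least_one": 0, "all_sections": 0, "only": {}}
    for flags in reports:
        flagged = [name for name, has_spin in flags.items() if has_spin]
        if flagged:
            summary["at_least_one"] += 1
            if len(flagged) == len(flags):
                summary["all_sections"] += 1
        if len(flagged) == 1:
            # Spin confined to exactly 1 section, eg "conclusions only".
            only = summary["only"]
            only[flagged[0]] = only.get(flagged[0], 0) + 1
    return summary


def spin_anywhere(abstract: dict[str, bool], main_text: dict[str, bool]) -> bool:
    """Primary end point of the review: spin in any section of the report."""
    return any(abstract.values()) or any(main_text.values())


# Toy abstracts (not the review's data).
abstracts = [
    {"results": False, "conclusions": True},
    {"results": True, "conclusions": True},
    {"results": False, "conclusions": False},
]
print(summarize_part(abstracts))
# {'at_least_one': 2, 'all_sections': 1, 'only': {'conclusions': 1}}
```

Counting the primary end point with `spin_anywhere` across both parts explains why the overall figure (39 reports) can exceed both the abstract-only (34) and main-text-only (38) counts.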
According to our spin classification system, 7 reports (13.5%) contained secondary end point spin in the results section of the abstract. In the conclusion of the abstract, 9 reports (17.3%) included secondary end point or subgroup analysis spin without acknowledging that the primary outcome result was not statistically significant, 10 reports (19.2%) had secondary end point or subgroup analysis spin that acknowledged it, and 6 reports (11.5%) contained safety spin. In the main text, 6 reports (11.5%) contained secondary end point spin, and 7 reports (13.5%) had secondary analysis of the primary end point spin in the discussion. In the conclusion of the main text, 9 reports (17.3%) showed secondary end point or subgroup analysis spin without mentioning that results for the primary outcome were not statistically significant, 11 reports (21.2%) had secondary end point or subgroup analysis spin that mentioned this, and 6 reports (11.5%) had safety spin.
We identified 33 reports (63.5%) with spin in the conclusion of the abstract; of these, 24 reports (46.2%) had low spin levels and 8 reports (15.4%) had high spin levels. Similarly, 35 reports (67.3%) had spin in the conclusion of the main text, with 25 reports (48.1%) showing low spin levels and 9 reports (17.3%) showing high spin levels. Examples of abstract conclusions at each spin level are shown in eTable 1 in the Supplement.
We evaluated the associations between the assessed variables and spin prevalence (Figure 2). Lack of data managers, project managers, or similar roles among authors (prevalence difference, 27.0%; 95% CI, 1.1%-50.3%) and no funding from for-profit sources (prevalence difference, 31.2%; 95% CI, 4.8%-53.8%) were associated with higher spin prevalence. Spin was also more prevalent in studies of novel treatments (prevalence difference, 37.5%; 95% CI, 5.8%-64.7%). For the backward stepwise multivariable logistic regression, 3 of the 52 reports were excluded because they did not report the efficacy measure (hazard ratio or difference in survival proportions) or funding sources; all variables shown in Figure 2 were entered, and the final model included the novelty of the study treatment and the source of funding. In this model, reports without funding from for-profit sources (OR, 5.20; 95% CI, 1.21-22.29) and reports on novel study treatments (OR, 4.64; 95% CI, 0.98-22.02) were associated with more prevalent spin.
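Because a Wald CI for an OR is symmetric on the log scale, a reported OR and its CI can be checked against each other, and the significance criterion in the methods (a 95% CI that does not cross 1) can be applied directly. The sketch below is illustrative only: it does not refit the authors' model, the helper names are hypothetical, and back-deriving the coefficient from rounded, published limits is approximate.

```python
import math
from statistics import NormalDist


def wald_or_ci(beta: float, se: float, level: float = 0.95) -> tuple[float, float, float]:
    """OR and Wald CI from a logistic regression coefficient and its standard error."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def log_scale_from_ci(lower: float, upper: float, level: float = 0.95) -> tuple[float, float]:
    """Back-derive (beta, se) from a reported Wald CI, symmetric on the log scale."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    beta = (math.log(lower) + math.log(upper)) / 2
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    return beta, se


def excludes_one(lower: float, upper: float) -> bool:
    """Significance criterion for ORs: the 95% CI does not cross 1."""
    return lower > 1 or upper < 1
```

For the funding variable, `log_scale_from_ci(1.21, 22.29)` implies an OR of about 5.19, consistent with the reported 5.20 after rounding. Applying `excludes_one` shows that the CI for funding (1.21-22.29) excludes 1, whereas the CI for novelty (0.98-22.02) narrowly includes it.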
We examined the associations between spin level in the conclusion of the abstract and the factors that showed large differences in spin prevalence and had relatively large numbers of studies in each category; factors with relatively few studies were not examined (eTable 2 in the Supplement).
This systematic review found that three-quarters of reports of noninferiority oncology RCTs with results that were not statistically significant contained spin. Direct comparisons are complicated by differences in study periods and in the definition of spin; nevertheless, the spin prevalence in these reports was higher than that reported in studies of superiority RCTs of cancer treatments6-10 (Figure 3). Certain design features of noninferiority trials make interpretation difficult, and some regulatory authorities16,76 and guidelines77 have issued recommendations for interpretation. These complexities could contribute to a high spin prevalence.
We identified spin strategies specific to noninferiority RCTs. Approximately 10% of reports focused on safety, a type of spin unique to noninferiority trials. Safety is only the rationale for, or merely a prerequisite of, conducting a noninferiority trial. Researchers should judge the benefit of an experimental treatment by the results of the confirmatory efficacy analysis, not by the safety of the treatment.
As with superiority trials, the most prevalent spin strategy was focusing on secondary end points or subgroup analyses, and this strategy was common across all sections. Some of this spin claimed similar efficacy without statistical justification (eg, on the basis of point estimates alone).
Our analysis found that the conclusions sections had the highest spin prevalence of all sections, a finding consistent with other reports6,7,9 and with important implications, because most physicians are interested only in the conclusions sections of publications.78 Spin-based reporting of P values results in distorted conclusions. Moreover, readers might interpret a study as positive even when the results for the primary outcome are not statistically significant.79
Few studies have explored factors related to spin. Khan et al80 reported that a small number of citations per year, a primary end point of efficacy or a nonbinary outcome, use of a drug as the control treatment, and publication in specific journals were associated with a high spin level in reports of superiority trials in cardiology. In our study, we generated 2 hypotheses about factors associated with spin prevalence. The first factor was novelty as the rationale for the experimental treatment. To our knowledge, this is the first study to suggest a possible association between experimental treatment novelty and spin. Even if authors report negative results for an experimental treatment that is already in clinical use, a single clinical trial might not be enough to remove that treatment from clinical practice. However, if the experimental treatment is not yet used in clinical practice, negative results mean not only that it will not become the standard of care but also that the results might not be published at all (ie, publication bias), which may give authors an incentive to present the results favorably. The second factor was a lack of funding from for-profit sources. Clinical trials funded by for-profit sources tend to report data that favor those sources,81 and Vera-Badillo et al10 reported that funding did not appear to be significantly associated with spin; our study, however, found a different trend. We suggest that this may be because studies conducted with an external funding source are more likely to have a well-designed organization responsible for reviewing the contents of the RCT report. The organizational structure of an RCT is important for the study’s feasibility and the reliability of the resulting data.82 Our findings suggest that organizational structure might also be important for ensuring proper reporting.
In this context, data managers might contribute not only to improving data reliability but also to reducing spin in reports.
To improve the quality of RCT reports, the International Committee of Medical Journal Editors required in 2004 that RCTs be registered to be considered for publication.83 This requirement has resulted in an increasing proportion of cancer clinical trials registering their primary end point84 and of studies disclosing full protocol details in addition to the primary end point.85 These sustained efforts, together with the efforts of journal editors to reduce spin, might contribute to a decreasing trend in spin prevalence.86 We anticipate that more studies will be reported properly in the future.
This study has some limitations. We searched only PubMed; therefore, this review may have missed studies not indexed in PubMed. Because this review was limited to studies of cancer, spin in reports on other diseases might differ. Although we examined the associations between trial characteristics and spin prevalence, the number of included reports was too small, and the characteristics too heterogeneous, to examine these associations in detail. Moreover, in the multivariable analysis, variables were selected by backward stepwise selection, so the results are useful only for hypothesis generation; other factors that we could not adjust for may have confounded the associations we found.
This systematic review found that 75% of negative reports on noninferiority RCTs of cancer treatments contained spin. Our results suggest that lack of funding from for-profit sources and novel experimental treatments may be associated with a high spin prevalence.
Accepted for Publication: September 28, 2021.
Published: December 7, 2021. doi:10.1001/jamanetworkopen.2021.35765
Open Access: This is an open access article distributed under the terms of the CC-BY License. © 2021 Ito C et al. JAMA Network Open.
Corresponding Author: Koji Oba, PhD, Interfaculty Initiative in Information Studies, The University of Tokyo, Annex of Bldg 3, 5th Floor, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-0033 Japan (email@example.com).
Author Contributions: Mr Ito and Dr Oba had full access to all of the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis.
Concept and design: Ito, Uemura, Oba.
Acquisition, analysis, or interpretation of data: All authors.
Drafting of the manuscript: Ito, Uemura.
Critical revision of the manuscript for important intellectual content: All authors.
Statistical analysis: Ito, Uemura, Oba.
Administrative, technical, or material support: Ito, Oba.
Supervision: Uemura, Oba.
Conflict of Interest Disclosures: Dr Oba reported receiving personal fees from Eisai, Chugai Pharmaceutical, Ono Pharmaceutical, Asahi-Kasei Pharma, Takeda Pharmaceutical, Daiichi-Sankyo, and Bristol Myers Squibb outside the submitted work. No other disclosures were reported.
Additional Contributions: Editage provided English language editing.