Lee M, Valero C, Morris LGT, Marti JL. Association of Study Methods and Industry Sponsorship With Inconsistent Performance of Molecular Assays for Indeterminate Thyroid Nodules. JAMA Otolaryngol Head Neck Surg. 2021;147(1):101–103. doi:10.1001/jamaoto.2020.3252
A subset (15%-25%) of thyroid nodules undergoing biopsy cannot be classified as benign or malignant. Under the Bethesda System for Reporting Thyroid Cytopathology, these cytologically indeterminate thyroid nodules (ITNs) fall into categories III and IV.1 Molecular assays using genetic and/or gene expression data may help risk-stratify ITNs, potentially guiding decisions to operate or observe. These assays include the Afirma Gene Expression Classifier and Genomic Sequencing Classifier (Veracyte), and ThyroSeq v2 and v3 (CBLPath). Afirma panels are marketed as rule-out tests with a high negative predictive value (NPV), and ThyroSeq panels are marketed as rule-out/rule-in tests with both a high NPV and a high positive predictive value (PPV).
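As a reminder of what NPV and PPV measure, the following minimal Python sketch derives both from sensitivity, specificity, and the prevalence of malignancy using the standard definitions. The function name and the input values are illustrative assumptions; they are not taken from any assay or from the studies analyzed here.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (ppv, npv) for a test applied to a population with the given
    prevalence of malignancy. All arguments are proportions between 0 and 1."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)   # P(malignant | positive test)
    npv = true_neg / (true_neg + false_neg)   # P(benign | negative test)
    return ppv, npv


# Hypothetical sensitivity and specificity, chosen only for illustration.
for prevalence in (0.15, 0.25, 0.40):
    ppv, npv = predictive_values(0.90, 0.60, prevalence)
    print(f"prevalence={prevalence:.2f}  PPV={ppv:.2f}  NPV={npv:.2f}")
```

With sensitivity and specificity held fixed, PPV rises and NPV falls as prevalence increases, which is one reason the same assay can report different predictive values in different cohorts.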
Recently, some studies have observed that the real-world performance of these assays often falls short of that reported by initial industry-sponsored studies.2-4 Causes for this divergence between efficacy (performance under ideal circumstances) and effectiveness (performance in real-world conditions) are not well understood. In this article, we examined whether methodologic differences in study design were associated with observed performance of molecular assays for ITNs.
A PubMed search identified studies of Afirma and ThyroSeq assays for Bethesda category III/IV ITNs published between January 2012 and December 2019. Only 1 study of ThyroSeq v3 was identified, and it was not included. A prior systematic review2 was used to supplement missing data. We examined several methodologic questions for each study: (1) Was the assay used selectively or reflexively (universally sent for all ITNs)? (2) Were final surgical pathology results matched to the biopsied nodule? (3) Were separate incidental carcinomas identified and excluded? (4) Were noninvasive follicular thyroid neoplasms with papillary-like nuclear features (NIFTP) classified as nonmalignant? and (5) Was the denominator for the NPV of negative molecular tests based on resected nodules only or on all biopsied nodules, including those not resected?
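Questions 4 and 5 change the arithmetic of PPV and NPV directly. The sketch below shows how, using a single made-up cohort; the `Cohort` class, its field names, and every count are hypothetical and exist only to illustrate the two denominator and classification choices.

```python
from dataclasses import dataclass


@dataclass
class Cohort:
    """Invented counts for one hypothetical study (not real data)."""
    neg_resected_benign: int     # negative test, resected, benign pathology
    neg_resected_malignant: int  # negative test, resected, malignant pathology
    neg_not_resected: int        # negative test, observed without surgery
    pos_malignant: int           # positive test, cancer in the biopsied nodule
    pos_niftp: int               # positive test, NIFTP on pathology
    pos_benign: int              # positive test, benign pathology


def npv(c, assume_unresected_benign):
    """NPV with the denominator limited to resected nodules, or extended to
    all test-negative nodules by assuming nonresected ones are benign."""
    true_neg = c.neg_resected_benign
    if assume_unresected_benign:
        true_neg += c.neg_not_resected
    return true_neg / (true_neg + c.neg_resected_malignant)


def ppv(c, niftp_counts_as_malignant):
    """PPV with NIFTP counted either as malignant or as nonmalignant."""
    true_pos = c.pos_malignant
    if niftp_counts_as_malignant:
        true_pos += c.pos_niftp
    return true_pos / (c.pos_malignant + c.pos_niftp + c.pos_benign)


cohort = Cohort(neg_resected_benign=18, neg_resected_malignant=2,
                neg_not_resected=80, pos_malignant=30, pos_niftp=10,
                pos_benign=60)
print(f"NPV, resected only:           {npv(cohort, False):.2f}")
print(f"NPV, nonresected assumed benign: {npv(cohort, True):.2f}")
print(f"PPV, NIFTP as malignant:      {ppv(cohort, True):.2f}")
print(f"PPV, NIFTP as nonmalignant:   {ppv(cohort, False):.2f}")
```

With these invented counts, NPV moves from 0.90 to 0.98 when nonresected nodules are assumed benign, and PPV moves from 0.40 to 0.30 when NIFTP is classified as nonmalignant, without any change in the assay itself.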
Studies receiving funding from the assay's parent company were considered industry sponsored. For each methodologic choice, sensitivity, specificity, NPV, and PPV were compared across all studies between those that made the choice and those that did not, using a nonparametric Mann-Whitney U test in Stata, version 16 (StataCorp), with a prespecified α of .05.
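For readers unfamiliar with the test, a minimal pure-Python sketch of a two-sided Mann-Whitney U comparison follows. It is not the Stata code used for the analysis: it assumes the normal approximation with a tie correction and no continuity correction, and the per-study PPV values in the usage example are invented.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U statistic for sample x, two-sided p-value).
    """
    n1, n2 = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    tie_term = 0.0
    i = 0
    while i < len(pooled):
        j = i
        # Extend j to the end of the run of values tied with pooled[i].
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        midrank = (i + j) / 2 + 1  # average of the 1-based ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[k] = midrank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    n = n1 + n2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # every observation identical: no evidence of a difference
        return u_x, 1.0
    z = (u_x - mean_u) / math.sqrt(var_u)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return u_x, p_value


# Hypothetical per-study PPVs for two groups of studies (not real data).
ppv_group_a = [0.62, 0.55, 0.70, 0.58]
ppv_group_b = [0.41, 0.39, 0.45, 0.50, 0.36]
u_stat, p = mann_whitney_u(ppv_group_a, ppv_group_b)
print(f"U = {u_stat}, two-sided P = {p:.3f}, significant at .05: {p < 0.05}")
```

Because the test works on ranks rather than raw values, it makes no assumption that study-level PPVs or NPVs are normally distributed, which suits small groups of heterogeneous studies. For samples this small, an exact P value would usually be preferred over the normal approximation used here.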
There were 54 published studies that met inclusion criteria: ThyroSeq v2 (9 [16.7%]), the Afirma Gene Expression Classifier (39 [72.2%]), and the Afirma Genomic Sequencing Classifier (6 [11.1%]). Overall, 30 of 44 studies (68%) used the molecular assay selectively; 36 of 54 (67%) matched surgical pathology results to the biopsied nodule; 15 of 54 (28%) identified and excluded separate carcinomas; and 12 of 54 (22%) classified NIFTP as nonmalignant. Each of these methodologic choices was associated with lower PPVs (selective use, 40% vs 48%, P = .11; matched pathology, 39% vs 49%, P = .03; separate carcinomas excluded, 33% vs 46%, P = .01; and NIFTP classified as nonmalignant, 26% vs 45%, P < .001). Similar numeric differences were observed within each assay (Table). The denominator for NPV calculation was limited to resected nodules in 22 of 28 studies (79%). This was associated with a lower NPV compared with studies that assumed nonsurgical cases were benign (95% vs 99%, P = .01). The PPV was higher (62% vs 41%, P = .04) in industry-sponsored studies.
The observed performance of molecular assays used to evaluate ITNs has varied widely across different studies.2 This has been partially attributed to differences in disease prevalence and institutional characteristics.3,4 We found that study methods and industry sponsorship were also markedly associated with the PPV and NPV of these assays, further explaining variation and inconsistencies in published results. To optimize internal and external validity, research should apply methods that reflect prevalent practices and clinical guidelines for the use of these assays, eg, using the assay selectively rather than universally,5 matching surgical pathology results with the biopsied nodule, not assuming that all nonresected nodules with negative molecular testing are benign, and not classifying NIFTPs as cancers.6 These methodologic choices were each associated with lower PPVs and NPVs, and were less commonly used in industry-sponsored studies. Harmonization of study methods may help better represent the value of these assays.
Accepted for Publication: July 31, 2020.
Corresponding Author: Jennifer L. Marti, MD, Department of Surgery, Weill Cornell Medicine, 420 E 70th St, 2nd Floor, New York, NY 10065 (email@example.com).
Published Online: October 8, 2020. doi:10.1001/jamaoto.2020.3252
Author Contributions: Dr Marti had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Concept and design: Lee, Marti.
Acquisition, analysis, or interpretation of data: All authors.
Drafting of the manuscript: Lee.
Critical revision of the manuscript for important intellectual content: All authors.
Statistical analysis: Lee, Valero.
Administrative, technical, or material support: Lee.
Supervision: Morris, Marti.
Conflict of Interest Disclosures: Dr Valero reported grants from Fundación Alfonso Martín Escudero during the conduct of the study. Dr Morris reported grants from AstraZeneca and Illumina, Inc outside the submitted work. No other disclosures were reported.