Question
Is the diagnostic performance of a common thyroid nodule gene expression classifier in the initial validation study consistent with results of postmarketing studies?
Findings
In this systematic review and meta-analysis of 19 studies involving 2568 cytologically indeterminate thyroid nodules, the diagnostic performance of the gene expression classifier reported in the initial validation study could not explain the results in subsequent publications and was significantly different for atypia or follicular lesion of undetermined significance compared with follicular neoplasm specimens.
Meaning
The initial validation study cohort did not appear to be representative of the populations to which the gene expression classifier has subsequently been applied.
Importance
In the United States, the most widely used molecular test for the evaluation of cytologically indeterminate thyroid nodules is the Afirma gene expression classifier (GEC).
Objective
To evaluate the GEC’s diagnostic performance using a novel approach that assesses whether the findings of the initial validation study are consistent with the results of postmarketing studies.
Data Sources
PubMed was systematically searched from inception through October 26, 2017, using the terms gene expression classifier or Afirma or GEC and thyroid.
Study Selection
Studies were included if the GEC diagnostic performance could be calculated on consecutively resected cytologically indeterminate thyroid nodules.
Data Extraction and Synthesis
Two observers independently assessed study eligibility and risk of bias using the quality assessment tool for observational cohort and cross-sectional studies of the National Heart, Lung, and Blood Institute. Summary data were extracted by a reviewer and reviewed independently by another. Study authors were contacted if missing data were needed. Data were pooled using a random-effects model. PRISMA and MOOSE guidelines were followed.
Main Outcomes and Measures
Evaluation of the linear correlation between the benign call rate (BCR) and the positive predictive value (PPV).
Results
Of the 137 retrieved titles, 19 (13.9%) were included, comprising a total of 2568 thyroid nodules. Based on a simulation using the sensitivity and specificity reported in the initial validation study, the observed BCR and PPV values in postmarketing studies could be explained only by two different underlying prevalence rates of cancer (15% vs 30%), which is impossible because the same set of nodules cannot have two different prevalence rates. Furthermore, the overall correlation between BCR and PPV for independent studies fell outside the PPV 95% CI of the initial validation study (95% CI, 0.17-0.32) at the BCR of pooled independent studies (0.45) and was just at the limit of the BCR 95% CI of the initial validation study (95% CI, 0.32-0.45) at the PPV of pooled independent studies (0.45). The diagnostic performance was statistically significantly better for atypia or follicular lesions of undetermined significance (diagnostic odds ratio [DOR], 5.67; 95% CI, 4.23-7.60) compared with follicular neoplasms (DOR, 2.24; 95% CI, 1.45-3.47).
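The consistency check behind the prevalence argument can be illustrated with a minimal sketch. For a test with fixed sensitivity and specificity, both the BCR and the PPV are determined by the cancer prevalence, so each observed value can be inverted to give the prevalence it implies. The function names below are hypothetical, and the sensitivity (0.92) and specificity (0.52) are assumed illustrative inputs that are not stated in this abstract; the observed BCR and PPV of 0.45 are the pooled values quoted above. This is not the authors' simulation code.

```python
def benign_call_rate(sens, spec, prev):
    """Fraction of nodules called benign: false negatives plus true negatives."""
    return prev * (1 - sens) + (1 - prev) * spec


def positive_predictive_value(sens, spec, prev):
    """Fraction of suspicious calls that are truly malignant."""
    true_pos = prev * sens
    false_pos = (1 - prev) * (1 - spec)
    return true_pos / (true_pos + false_pos)


def prevalence_from_bcr(sens, spec, bcr):
    """Invert BCR = spec - prev * (sens + spec - 1) for the prevalence."""
    return (spec - bcr) / (sens + spec - 1)


def prevalence_from_ppv(sens, spec, ppv):
    """Invert the PPV formula for the prevalence."""
    fpr = 1 - spec
    return ppv * fpr / (sens * (1 - ppv) + ppv * fpr)


# Assumed illustrative inputs (not given in this abstract).
SENS, SPEC = 0.92, 0.52
# Pooled postmarketing values quoted in the Results.
OBSERVED_BCR, OBSERVED_PPV = 0.45, 0.45

prev_bcr = prevalence_from_bcr(SENS, SPEC, OBSERVED_BCR)
prev_ppv = prevalence_from_ppv(SENS, SPEC, OBSERVED_PPV)
print(f"Prevalence implied by BCR: {prev_bcr:.2f}")
print(f"Prevalence implied by PPV: {prev_ppv:.2f}")
```

Under these assumed inputs, the two implied prevalences are roughly 0.16 and 0.30, close to the 15% vs 30% contrast reported above. Because both values describe the same nodules, a gap of this size indicates that the assumed sensitivity and specificity cannot jointly reproduce the observed BCR and PPV.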
Conclusions and Relevance
The findings suggest that the initial validation study cohort was not representative of the populations in whom the GEC has been used, calling into question its reported diagnostic performance, including its negative predictive value.
Valderrabano P, Hallanger-Johnson JE, Thapa R, Wang X, McIver B. Comparison of Postmarketing Findings vs the Initial Clinical Validation Findings of a Thyroid Nodule Gene Expression Classifier: A Systematic Review and Meta-analysis. JAMA Otolaryngol Head Neck Surg. 2019;145(9):783–792. Published online July 18, 2019. doi:10.1001/jamaoto.2019.1449