GEC indicates Afirma gene expression classifier.
a Two studies described patients sent to a surgical procedure only; 1, nodules with oncocytic features only; 1, GEC-benign nodules only; 1, GEC-suspicious nodules only; and 1, noninvasive follicular thyroid neoplasms with papillary-like nuclear features only.
b One multicenter publication was included only in part.
A, Each open circle represents the BCR in each study for all indeterminate thyroid nodules (blue) and for atypia or follicular lesion of undetermined significance (A/FLUS) and follicular neoplasm (FN) specimens separately. The orange diamond represents the value in the initial validation study (Alexander et al4). For B and C, the curves for each independent (postmarketing) study are represented in gray, the curve of the pooled independent studies is represented in light blue, and the curve of the initial validation study4 is represented in dark blue. NPV indicates negative predictive value; PPV, positive predictive value.
Benign cell rate–positive predictive value (BCR-PPV) correlation (dark blue line) for all cytologically indeterminate thyroid nodules (ITNs; A and B), including atypia or follicular lesion of undetermined significance (A/FLUS; C and D) and follicular neoplasm (FN; E and F) specimens separately, derived from the data of the initial validation study4 (orange dot reflects the actual value in that study). The correlation shows the PPV 95% CIs (shaded area) in the left panels, and the BCR 95% CIs (shaded area) in the right panels. The specific BCR-PPV value observed in each study is represented by open circles, and the overall value by a dark blue dot.
Data from Alexander et al4 are excluded from this analysis. Dashed lines represent the observed BCR and PPV values in each cohort. The boxplots at 5% prevalence of cancer intervals represent the BCR and PPV values derived from 200 simulations using the sensitivity and specificity of the initial validation study4 and the sample size and percentage of resected gene expression classifier–suspect nodules of the cohort. The BCR and PPV curves are plotted through the means of each boxplot. Shaded rectangles cover the prevalence rates of cancer for which the observed values (dashed lines) would be within the 5th and 95th quantile of the simulated values. A/FLUS indicates atypia or follicular lesion of undetermined significance; FN, follicular neoplasm; and ITNs, cytologically indeterminate thyroid nodules.
eTable 1. Exclusion Criteria of Full Text Articles
eTable 2. Risk of Bias Assessment of Enrolled Studies
eTable 3. Summary Data for Meta-analysis (All ITNs)
eTable 4. Summary Data for Meta-analysis (A/FLUS)
eTable 5. Summary Data for Meta-analysis (FN)
eTable 6. Diagnostic Performance of the GEC
eFigure 1. Forest Plots of Reported GEC Sensitivities and Specificities
eFigure 2. Observed BCR and Predictive Values for Studies With and Without Conflict of Interest with Veracyte
eFigure 3. Observed BCR and Predictive Values for Studies With Histological Diagnosis Matched by Both Size and Location to the Index Nodule
eFigure 4. Diagnostic Odds Ratios (DOR), Likelihood Ratios, SROC Curves, and Forest Plots of Log DOR
eFigure 5. Publication Bias (Log DOR Funnel Plot)
eFigure 6. BCR-PPV Correlation for Studies With and Without Conflict of Interest with Veracyte
eFigure 7. BCR-PPV Correlation for Studies With Histological Diagnosis Matched by Both Size and Location to the Index Nodule
eFigure 8. Expected Underlying Prevalence of Malignancy for BCR and PPV Values (Overall Cohort)
eFigure 9. Simulation for Individual Studies With ≥35 Nodules Resected (All ITNs)
eFigure 10. Simulation for Individual Studies With ≥35 Nodules Resected (A/FLUS)
eFigure 11. Simulation for Individual Studies With ≥35 Nodules Resected (FN)
Valderrabano P, Hallanger-Johnson JE, Thapa R, Wang X, McIver B. Comparison of Postmarketing Findings vs the Initial Clinical Validation Findings of a Thyroid Nodule Gene Expression Classifier: A Systematic Review and Meta-analysis. JAMA Otolaryngol Head Neck Surg. 2019;145(9):783–792. doi:10.1001/jamaoto.2019.1449
Is the diagnostic performance of a common thyroid nodule gene expression classifier in the initial validation study consistent with results of postmarketing studies?
In this systematic review and meta-analysis of 19 studies involving 2568 cytologically indeterminate thyroid nodules, the diagnostic performance of the gene expression classifier reported in the initial validation study could not explain the results in subsequent publications and was significantly different for atypia or follicular lesion of undetermined significance compared with follicular neoplasm specimens.
The initial validation study cohort did not appear to be representative of the populations to which the gene expression classifier has subsequently been applied.
In the United States, the most used molecular test for the evaluation of cytologically indeterminate thyroid nodules is the Afirma gene expression classifier (GEC).
To evaluate the GEC’s diagnostic performance through a novel approach to assess whether the findings of the initial validation study are consistent with the results of postmarketing studies.
PubMed was systematically searched from inception through October 26, 2017, using the terms gene expression classifier or Afirma or GEC and thyroid.
Studies included were those in which the GEC diagnostic performance could be calculated on consecutively resected cytologically indeterminate thyroid nodules.
Data Extraction and Synthesis
Two observers independently assessed study eligibility and risk of bias using the quality assessment tool for observational cohort and cross-sectional studies of the National Heart, Lung, and Blood Institute. Summary data were extracted by a reviewer and reviewed independently by another. Study authors were contacted if missing data were needed. Data were pooled using a random-effects model. PRISMA and MOOSE guidelines were followed.
Main Outcomes and Measures
Evaluation of the linear correlation between the benign call rate (BCR) and the positive predictive value (PPV).
Of the 137 retrieved titles, 19 (13.9%) were included, comprising a total of 2568 thyroid nodules. Based on a simulation using the sensitivity and specificity reported in the initial validation study, the observed BCR and PPV values in postmarketing studies would have to be explained by different underlying prevalence rates of cancer (15% vs 30%), which is an impossible event. Furthermore, the overall correlation between BCR and PPV for independent studies fell outside the PPV 95% CI of the initial validation study (95% CI, 0.17-0.32) at the BCR of pooled independent studies (0.45) and was just at the limit of the BCR 95% CI of the initial validation study (95% CI, 0.32-0.45) at the PPV of pooled independent studies (0.45). The diagnostic performance was statistically significantly better for atypia or follicular lesions of undetermined significance (diagnostic odds ratio [DOR], 5.67; 95% CI, 4.23-7.60) compared with follicular neoplasms (DOR, 2.24; 95% CI, 1.45-3.47).
Conclusions and Relevance
The findings suggest that the initial validation study cohort was not representative of the populations in whom the GEC has been used, calling into question its reported diagnostic performance, including its negative predictive value.
In the past decade, molecular markers have revolutionized the clinical management of patients with cytologically indeterminate thyroid nodules (ITNs), and these markers are increasingly being integrated into professional guidelines.1-3 The most widely used molecular test for the evaluation of ITNs is a commercial proprietary gene expression classifier (GEC) called the Afirma GEC (Veracyte), which is performed more than 20 000 times annually in the United States.
The Afirma GEC was designed as a rule-out test for cancer and reports a binary outcome: GEC-benign or GEC-suspicious. Its diagnostic performance was validated in a blinded prospective multicenter trial, in which it achieved a negative predictive value (NPV) of 95% in atypia or follicular lesion of undetermined significance (A/FLUS) and 94% in follicular neoplasm (FN) specimens, which was similar to that of a benign cytological diagnosis.4,5 However, the sample size of the study was relatively small (129 A/FLUS and 81 FN specimens); thus, the NPV 95% CIs were wide (85%-99% for A/FLUS and 79%-99% for FN nodules).4 A study of this type would usually be followed by additional independent studies to confirm the findings and narrow the CIs. The GEC, however, immediately garnered wide acceptance in clinical practice, and patients with GEC-benign results started to be offered follow-up in lieu of a diagnostic surgical procedure.6-8 Furthermore, guidelines from major professional associations endorsed such an approach.1-3
After the publication of this pivotal study,4 several groups have published studies of their clinical experience with the Afirma GEC. However, none of these publications is a formal independent validation study, which requires the resection of ITNs regardless of GEC results.9 Instead, these postmarketing studies have been unblinded and have resection rates that are low for GEC-benign nodules and high for GEC-suspicious nodules.10 Selective resection makes measuring the false-negative rate and NPV impossible and provides unreliable estimates of sensitivity, specificity, and underlying prevalence of cancer in the study cohort. One approach to overcoming this fundamental limitation has been to treat GEC-benign nodules that do not develop evidence of cancer during follow-up as true-negative nodules.9 However, given the slow rate of progression of many early-stage thyroid cancers, this approach continues to underestimate the prevalence of cancer and the false-negative rate, artificially inflating the reported sensitivity, specificity, and NPV of the GEC.
The only reliably reported test metrics in postmarketing studies are the benign call rate (BCR, the proportion of nodules tested with a GEC-benign result) and the positive predictive value (PPV). Because an additional true clinical validation study is unlikely to ever be conducted, we developed a novel approach that uses the BCR and the PPV to assess whether the diagnostic performance of the Afirma GEC in the initial validation study4 is consistent with the results of postmarketing studies.
This systematic review and meta-analysis followed the Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) reporting guideline11 and the Meta-analyses Of Observational Studies in Epidemiology (MOOSE) guidelines.12
PubMed was systematically searched from inception through October 26, 2017, without applying any limits using the following search terms: gene expression classifier or Afirma or GEC and thyroid. The search strategy that 2 of us (P.V. and J.E.H.-J.) developed retrieved 137 items, which were manually screened. References that consisted of abstracts alone were not considered. The reference lists of the selected articles were also screened to identify additional studies. Endnote X7 (Thomson Reuters) was used to compile and manage citations.
Two of us (P.V. and J.E.H.-J.) independently assessed each study’s inclusion eligibility according to the following predefined exclusion criteria: (1) the GEC test was not done on A/FLUS or FN nodules, or the results of these 2 categories could not be individualized; (2) the test metrics could not be calculated on resected nodules (the number of true-positives, false-positives, true-negatives, and false-negatives was not specified or could not be determined in resected nodules); (3) the series did not include all consecutively evaluated thyroid nodules with A/FLUS and/or FN during a specific period (the cohort was selected according to criteria); and (4) the cohort was included in a larger or methodologically better series. We contacted the authors of articles with missing information to offer them the opportunity to provide such data.
One of us (P.V.) extracted summary data, and another (J.E.H.-J.) independently reviewed these data. Disagreements at any time during the process were discussed and resolved by consensus. We included only nodules with A/FLUS or FN cytological results that yielded sufficient RNA for GEC testing (GEC-benign or GEC-suspicious). We extracted from each publication the following information: number of nodules evaluated with GEC (total, GEC-benign, and GEC-suspicious); number of resected nodules (total, GEC-benign, and GEC-suspicious); and number of true-positives (GEC-suspicious and malignant histological diagnosis), false-positives (GEC-suspicious and benign histological diagnosis), true-negatives (GEC-benign and benign histological diagnosis), or false-negatives (GEC-benign and malignant histological diagnosis).
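As an illustration of how the extracted counts map onto the test metrics discussed in this article, the following minimal sketch computes sensitivity, specificity, PPV, and NPV from the 4 cells of resected nodules, and the BCR from all tested nodules. The names `Counts`, `metrics`, and `benign_call_rate` are hypothetical, and the cell counts in the usage lines are invented for illustration; only the 1158 GEC-benign results among 2568 nodules come from this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    """2x2 cells for RESECTED nodules only (hypothetical container)."""
    tp: int  # GEC-suspicious, malignant histology
    fp: int  # GEC-suspicious, benign histology
    tn: int  # GEC-benign, benign histology
    fn: int  # GEC-benign, malignant histology


def metrics(c: Counts) -> dict:
    """Standard diagnostic metrics computed on resected nodules."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
        "prevalence": (c.tp + c.fn) / total,
    }


def benign_call_rate(n_gec_benign: int, n_tested: int) -> float:
    """BCR uses ALL tested nodules, so it does not depend on resection."""
    return n_gec_benign / n_tested


# Invented counts, for illustration only.
example = metrics(Counts(tp=90, fp=110, tn=40, fn=10))
# Pooled counts reported in this article: 1158 GEC-benign of 2568 nodules.
pooled_bcr = benign_call_rate(1158, 2568)
```

Note that every metric except the BCR is computed on resected nodules, which is why selective resection of GEC-suspicious nodules distorts sensitivity, specificity, and NPV but leaves the BCR unaffected.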
When both A/FLUS and FN cytological results were included in the same study, we collected aggregate and distinct data for each cytological category. When these data were missing, we obtained the information directly from the authors and used this information to validate the reported aggregate data. Test metrics were calculated with 95% CIs. We also collected study design (prospective; retrospective), name and number (multicenter; single center) of participating institutions, type of institutions (academic; nonacademic), period of study, cytological interpretation (local pathologist; GEC’s associated pathology laboratory), and conflict of interest (none; sponsored by Veracyte; or author’s conflict of interest with Veracyte or other companies).
Two of us (P.V. and J.E.H.-J.) independently assessed the risk of bias of each enrolled study using the quality assessment tool for observational cohort and cross-sectional studies of the National Heart, Lung, and Blood Institute.13 All disagreements were resolved by consensus.
In postmarketing studies, only 2 statistics have been consistently and appropriately evaluated: the BCR, which is independent of the surgical resection rates, and the PPV, because final histological diagnosis is available in nearly 80% of GEC-suspicious nodules in most studies. These 2 statistics were used in 2 approaches to validate the GEC’s diagnostic performance.
First, both BCR and PPV are linearly dependent on the same 3 variables: the underlying prevalence of cancer as well as the sensitivity and specificity of the test. The BCR and PPV alone, however, lack sufficient information to calculate the cohort’s prevalence of cancer or the sensitivity, specificity, and NPV of GEC without making assumptions about the false-negative rate, which was unknown in these studies. Nonetheless, if the sensitivity and specificity of the initial validation study4 were consistent with subsequent studies, the BCR-PPV correlation of those studies should fall within the 95% CIs of the initial validation study. The BCR-PPV values that fall outside of those CIs could not be explained by the differences in the underlying prevalence of cancer alone, indicating that the performance of the GEC is inconsistent with that reported in the initial validation study. The BCR-PPV correlation was explored for all published postmarketing studies, analyzing the results in the entire cohort of ITNs and separately for A/FLUS and FN specimens. Post hoc analyses were conducted to evaluate the differences between studies with and without conflicts of interest with the company that marketed the Afirma GEC and to reevaluate diagnostic performance only in studies with strict matching of histological diagnosis to the biopsied nodule by both size and location.
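The dependence of BCR and PPV on prevalence, sensitivity, and specificity can be sketched with the standard identities below. This is a minimal illustration and not the code used in the study: the function names are hypothetical, and the default inputs are the 90% sensitivity and 52% specificity reported in the initial validation study. Sweeping the prevalence traces the expected BCR-PPV trajectory for a fixed sensitivity and specificity.

```python
def bcr(prev: float, sens: float, spec: float) -> float:
    """Benign call rate: false-negatives plus true-negatives among all tested."""
    return prev * (1 - sens) + (1 - prev) * spec


def ppv(prev: float, sens: float, spec: float) -> float:
    """Positive predictive value from Bayes' rule."""
    true_pos = prev * sens
    false_pos = (1 - prev) * (1 - spec)
    return true_pos / (true_pos + false_pos)


def bcr_ppv_curve(sens: float = 0.90, spec: float = 0.52, steps: int = 19):
    """Return (prevalence, BCR, PPV) triples on an evenly spaced prevalence grid.

    The grid (5% to 95% for steps=19) is an assumption chosen for illustration.
    """
    points = []
    for i in range(1, steps + 1):
        prev = i / (steps + 1)
        points.append((prev, bcr(prev, sens, spec), ppv(prev, sens, spec)))
    return points


curve = bcr_ppv_curve()
```

Under these identities, a higher prevalence of cancer lowers the expected BCR and raises the expected PPV, so each prevalence corresponds to exactly one point on the curve. An observed BCR-PPV pair far from the curve cannot be reached by changing prevalence alone.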
Second, if the sensitivity and specificity of the initial validation study remained consistent in subsequent studies, the observed BCR and PPV values in those studies should vary solely with the underlying prevalence of cancer in the cohort studied. Consequently, the observed BCR and PPV values should be aligned with (explained by) the same underlying prevalence of cancer. However, if the prevalence of cancer that predicts the observed BCR is inconsistent with that required to predict the observed PPV, the GEC diagnostic performance reported in the initial validation study would be called into question. The observed BCR and PPV values are subject to probabilistic error, so we calculated the range of prevalence of cancer that could explain the observed results. Expected values of BCR and PPV with 95% CIs were calculated for underlying prevalence of cancer at 5% intervals (ranging between 1% and 99%). These values were derived by randomly simulating the BCR and PPV values 200 times using the 90% sensitivity and 52% specificity of the GEC reported in the initial validation study, the sample size, and the proportion of GEC-benign and GEC-suspicious nodules resected in the study. This simulation was performed for the entire cohort of ITNs, for A/FLUS and FN specimens separately, and for each independent study with at least 35 resected nodules.
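A simulation of this kind might look like the sketch below. The article does not give the exact sampling scheme, so the per-nodule Bernoulli draws, the function names, the seed, and the cohort size of 1000 are all assumptions made for illustration. Only the 90% sensitivity, 52% specificity, 200 repetitions, and 5th and 95th quantiles come from the text. The resection fraction of GEC-benign nodules is omitted here because it enters neither the BCR nor the PPV.

```python
import random
import statistics


def simulate_once(rng, n, prev, sens, spec, resect_susp):
    """Simulate one cohort; return (BCR on all nodules, PPV on resected suspicious)."""
    tp = fp = benign_calls = 0
    for _ in range(n):
        malignant = rng.random() < prev
        # Suspicious with probability sens if malignant, 1 - spec if benign.
        suspicious = rng.random() < (sens if malignant else 1 - spec)
        if not suspicious:
            benign_calls += 1
        elif rng.random() < resect_susp:  # only resected nodules have histology
            if malignant:
                tp += 1
            else:
                fp += 1
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    return benign_calls / n, ppv


def simulate(n, prev, sens=0.90, spec=0.52, resect_susp=0.78, reps=200, seed=0):
    """Repeat the cohort simulation and summarize BCR and PPV."""
    rng = random.Random(seed)
    runs = [simulate_once(rng, n, prev, sens, spec, resect_susp) for _ in range(reps)]
    summary = {}
    for name, values in (("bcr", [r[0] for r in runs]), ("ppv", [r[1] for r in runs])):
        values = [v for v in values if v == v]  # drop NaN (no resected suspicious)
        cuts = statistics.quantiles(values, n=20)  # 19 cut points at 5% steps
        summary[name] = {"mean": statistics.mean(values), "q05": cuts[0], "q95": cuts[-1]}
    return summary


# Hypothetical cohort of 1000 nodules at a 15% prevalence of cancer.
result = simulate(n=1000, prev=0.15, seed=1)
```

Repeating the call across a grid of prevalence values and checking where the observed BCR and PPV fall between `q05` and `q95` would give the range of prevalence compatible with each observation.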
All studies that met the inclusion criteria were included in the analysis. Cancer prevalence, sensitivity, specificity, NPV, PPV, and diagnostic odds ratios (DORs) were calculated on resected nodules only with 95% CIs. Measures of heterogeneity such as the I² statistic and the DerSimonian-Laird estimator for τ² were calculated. In 1 study, the cytological diagnosis of 21 unresected nodules was unavailable; thus, it was excluded from the comparison of BCR and resection rates between A/FLUS and FN specimens.14 Data were pooled using a random-effects model. All data analysis was done using R, version 3.4.3 (R Foundation for Statistical Computing),15 and the mada package, version 0.5.8, was used for meta-analysis.16
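For readers unfamiliar with these quantities, the sketch below shows one common way to pool log DORs with a DerSimonian-Laird random-effects model and to derive τ² and I². It is written in Python for illustration and does not reproduce the mada package or the study's R code. The function names, the 0.5 continuity correction for zero cells, the normal-approximation 95% CI, and the study counts in the usage lines are all assumptions.

```python
import math


def log_dor(tp, fp, tn, fn):
    """Log diagnostic odds ratio and its variance for one study."""
    if 0 in (tp, fp, tn, fn):
        # Continuity correction for zero cells (a common convention, assumed here).
        tp, fp, tn, fn = (x + 0.5 for x in (tp, fp, tn, fn))
    y = math.log((tp * tn) / (fp * fn))
    v = 1 / tp + 1 / fp + 1 / tn + 1 / fn
    return y, v


def pool_dersimonian_laird(studies):
    """Pool (tp, fp, tn, fn) tuples; return pooled DOR, 95% CI, tau2, and I2."""
    ys, vs = zip(*(log_dor(*s) for s in studies))
    w = [1 / v for v in vs]
    y_fixed = sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, ys))  # Cochran Q
    df = len(ys) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    w_re = [1 / (v + tau2) for v in vs]  # random-effects weights
    y_re = sum(wi * yi for wi, yi in zip(w_re, ys)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    ci = (math.exp(y_re - 1.96 * se), math.exp(y_re + 1.96 * se))
    return {"dor": math.exp(y_re), "ci": ci, "tau2": tau2, "i2": i2}


# Invented counts: three identical studies, then two discordant ones.
homogeneous = pool_dersimonian_laird([(90, 110, 40, 10)] * 3)
heterogeneous = pool_dersimonian_laird([(90, 10, 90, 10), (50, 50, 50, 50)])
```

When the studies agree, τ² and I² collapse to 0 and the pooled DOR equals the per-study DOR; discordant studies yield a positive τ², which widens the pooled CI.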
The PubMed search retrieved 137 titles, of which 31 (22.6%) were selected for full-text review and 19 (13.9%) were included in the final analysis, comprising a total of 2568 ITNs (Figure 1).4,6,14,17-32 One (5%) of the included studies was a multicenter study for which we included the data of only 1 of the 5 contributing institutions (3 had published larger series that overlapped with the cohort reported in this study, and another declined to unblind and share their data).6 The exclusion criteria for the remaining 12 studies are detailed in eTable 1 in the Supplement. The quality rating in the risk of bias was good in 1 study (the initial validation study) (5%), poor in 1 (5%), and fair in the other 17 (89%) (eTable 2 in the Supplement). Only 3 studies (16%), including the initial validation study,4 were prospective, and 4 (21%) were multicenter (Table). All except 1 (5%) of the studies were conducted in the United States, and cytological specimens were evaluated by local pathologists in all but 2 studies (10%) in which specimens were evaluated by the GEC’s associated laboratory. At least 1 study author disclosed a conflict of interest in 7 (39%) of the 18 studies included in the analysis; Veracyte was the source of conflict in 6 of these 7 studies.
The overall cytological diagnosis was A/FLUS in 1872 ITNs (73%) and FN in 675 (26%), but the diagnosis was unavailable in 21 (0.8%) unresected nodules in 1 study.14 The breakdown for each study is reported in eTables 3 to 5 in the Supplement. Altogether, 1158 nodules were GEC-benign (45%), and 1410 (55%) were GEC-suspicious. The BCR was higher for A/FLUS compared with FN specimens (49%; 95% CI, 46%-51% vs 36%; 95% CI, 32%-40%) (Figure 2A). The odds of having a GEC-benign result were 68% higher in A/FLUS compared with FN specimens (OR, 1.68; 95% CI, 1.40-2.02).
A total of 1367 nodules (53%) were resected. The resection rates were lower for GEC-benign than for GEC-suspicious nodules (20% vs 80%), particularly if the data from the initial validation study4 were excluded (14% vs 78%), and were lower for A/FLUS compared with FN specimens (50% vs 62%). The resection rates of GEC-benign nodules were lower in A/FLUS compared with FN nodules (19% vs 25%) only if the initial validation study was included but not otherwise (14% vs 14%). The resection rates of GEC-suspicious nodules were similar for A/FLUS and FN specimens either with (79% vs 82%) or without (77% vs 80%) the initial validation study.
The Afirma GEC’s diagnostic performance was calculated on 1364 ITNs with histological follow-up (53% of the entire cohort), 944 A/FLUS specimens (69%), and 420 FN specimens (31%) (eFigure 1 in the Supplement). The overall prevalence of cancer on resected nodules was 39%, 39% for A/FLUS specimens, and 38% for FN specimens. The sensitivity of the GEC was 95% (95% CI, 92%-96%) overall, 95% (95% CI, 92%-97%) for A/FLUS specimens, and 94% (95% CI, 89%-97%) for FN specimens. The specificity of the GEC was 25% (95% CI, 22%-28%) overall, 27% (95% CI, 24%-31%) for A/FLUS specimens, and 20% (95% CI, 15%-25%) for FN specimens (eTable 6 in the Supplement).
The predicted PPV-prevalence of cancer correlation was consistent across the studies but lower than that reported in the initial validation study (Figure 2B). The PPV was 45% (95% CI, 42%-47%) overall, 46% (95% CI, 42%-50%) for A/FLUS specimens, and 41% (95% CI, 36%-47%) for FN specimens. In contrast, the predicted NPV-prevalence of cancer correlation was inconsistent across the studies (Figure 2C). The NPV was 88% (95% CI, 83%-92%) overall, 90% (95% CI, 84%-94%) for A/FLUS specimens, and 84% (95% CI, 72%-92%) for FN specimens. Similar results were found for studies with or without a conflict of interest with Veracyte (eFigure 2 in the Supplement) and for studies in which histological diagnosis was strictly matched by size and location to the biopsied nodule only (eFigure 3 in the Supplement).
The DOR was 4.67 (95% CI, 3.67-5.95) overall (eFigure 4 in the Supplement), and no evidence of publication bias was found (eFigure 5 in the Supplement). The test performance was statistically significantly better for A/FLUS specimens (DOR, 5.67; 95% CI, 4.23-7.60) compared with FN specimens (DOR, 2.24; 95% CI, 1.45-3.47), as shown by the lack of overlap of their 95% CIs. The positive likelihood ratio was statistically significantly higher for A/FLUS compared with FN specimens (1.22 vs 1.08), and the negative likelihood ratio was significantly lower (0.25 vs 0.51), without overlap of their 95% CIs. The summary receiver operating characteristic curve and area under the curve were evaluated to test the pooled diagnostic performance of the studies (eFigure 4 in the Supplement). The area under the curve from the univariate model approach was 0.84 (95% CI, 0.79-0.89).
We derived the expected BCR-PPV correlation from the data of the initial validation study4 for all ITNs and separately for A/FLUS and FN specimens (Figure 3). For simplicity, the correlation was plotted separately within the PPV and BCR 95% CIs of the initial validation study. The overall correlation between BCR and PPV for independent studies fell outside the PPV 95% CI of the initial validation study (95% CI, 0.17-0.32) (Figure 3A) at the BCR of pooled independent studies (0.45) and was at the limit of the BCR 95% CI of the initial validation study (95% CI, 0.32-0.45) (Figure 3B) at the PPV of pooled independent studies (0.45). Most of the postmarketing studies had values above the expected BCR-PPV trajectory, which was calculated from the sensitivity and specificity reported in the initial study. For A/FLUS specimens, the overall BCR-PPV value fell outside both 95% CIs, and most studies also had values above the expected BCR-PPV curve. The overall independent BCR-PPV value of FN specimens, on the other hand, fell within both 95% CIs. Single studies had, however, much more heterogeneous results, with values above and below the expected BCR-PPV curve, and most studies also were outside the 95% CIs. Similar results were found for nodules in studies with or without a conflict of interest with Veracyte (eFigure 6 in the Supplement) and when the analysis was restricted to studies with strict matching of histological diagnosis to the biopsied nodule (eFigure 7 in the Supplement).
Next, we derived the expected BCR and PPV values with 95% CIs (Figure 4). Shaded rectangles in the figure represent the empirical lower and upper limits of prevalence in which the observed BCR and PPV values lie within the 5th and 95th quantile of the simulated values. In the overall cohort of independent studies, the observed BCR and PPV values were both 45%. However, the prevalence rates of cancer that could explain these findings (in which the observed value would fall within the 5th and 95th quantile of the simulated values) were different: 15% for BCR and 30% for PPV. For A/FLUS specimens, the overall BCR was 49% and the PPV was 47%. Again, the prevalence rates of cancer that would explain these observations were different. Although the observed BCR values would be expected for an underlying prevalence of cancer between 5% and 10%, the observed PPV values would be expected for an underlying prevalence of cancer of 30%. The observed BCR value was 36% and the PPV value was 42% for FN specimens. In this cohort, the expected underlying prevalence of cancer that could explain the BCR value was between 35% and 45%, but the observed PPV value would be explained by a lower prevalence of cancer (30%). The simulations for the overall cohort, including the initial validation study, and for each postmarketing study with at least 35 nodules resected are presented in eFigures 8 to 11 in the Supplement.
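The mismatch between the two implied prevalence rates can be checked approximately with a closed-form inversion of the standard BCR and PPV identities. This is a simplification and not the simulation used in the study: it ignores sampling error, and the function names are hypothetical. The inputs are the 90% sensitivity and 52% specificity of the initial validation study and the observed BCR and PPV of 45% each.

```python
def prevalence_from_bcr(bcr: float, sens: float, spec: float) -> float:
    """Solve BCR = p*(1 - sens) + (1 - p)*spec for the prevalence p."""
    return (spec - bcr) / (sens + spec - 1)


def prevalence_from_ppv(ppv: float, sens: float, spec: float) -> float:
    """Solve PPV = p*sens / (p*sens + (1 - p)*(1 - spec)) for the prevalence p."""
    false_pos_rate = 1 - spec
    return ppv * false_pos_rate / (sens * (1 - ppv) + ppv * false_pos_rate)


SENS, SPEC = 0.90, 0.52  # initial validation study values quoted in this article

p_bcr = prevalence_from_bcr(0.45, SENS, SPEC)  # about 0.17
p_ppv = prevalence_from_ppv(0.45, SENS, SPEC)  # about 0.30
```

The point estimates, roughly 17% from the BCR and 30% from the PPV, fall close to the 15% and 30% grid values reported from the simulation, which was run at 5% prevalence intervals. A single cohort cannot have both prevalence rates at once, which is the inconsistency the analysis highlights.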
This systematic review and meta-analysis summarized the available data on the Afirma GEC diagnostic performance, using what is to our knowledge a novel statistical validation approach to overcome the limitations of postmarketing studies. The findings demonstrate substantial differences in the GEC’s diagnostic performance for A/FLUS and FN specimens and suggest that the diagnostic performance in routine clinical practice differed significantly from that reported in the sole clinical validation study4 because such performance could not explain the observed outcomes in subsequent publications. Because of the lack of resection of GEC-benign nodules, the sensitivity, specificity, and NPV of the GEC could not be calculated from postmarketing studies. Results of this present study, however, indicate that the true sensitivity and/or specificity of the GEC differed significantly from those reported in the initial study, suggesting that the cohort of nodules included in the initial study was not representative of the populations to which the test has subsequently been applied. These findings are particularly relevant given the widespread use of the GEC; thus, a note of caution should be added to the clinical use and interpretation of the GEC results.
These findings support the emerging evidence that the diagnostic performance of molecular tests is not uniform across the heterogeneous spectrum of ITNs.14,29,33 As a result, differences in the proportion of A/FLUS and FN aspirates and in the proportion of cytological scenarios (cytological atypia, architectural atypia, or oncocytic features) in each series may be associated with the observed heterogeneity in the GEC’s diagnostic performance. Differences in the clinical application of the GEC also may be associated with this heterogeneity. In some studies, the GEC was used as a reflex test in all ITNs, but in others it was used at the discretion of the attending physician or even after a repeated indeterminate cytological diagnosis only.
Although one-third of the independent publications reported a conflict of interest with Veracyte, we did not find differences in diagnostic performance between studies with or without a conflict of interest. This finding is likely a consequence of calculating the GEC diagnostic performance in the same way in all studies, which relied exclusively on resected nodules with histological confirmation. What we believe is our novel approach to this analysis, which exploits the BCR-PPV correlation, is applicable to the evaluation of any diagnostic tests and may prove useful in the evaluation of other molecular marker studies applied to ITNs that may similarly lack appropriate independent validation studies. The limited number of studies for other molecular tests to diagnose ITNs precludes the conduct of similar analyses at this time.
The Afirma GEC is being replaced by a newer test offered by the same company (Genomic Sequencing Classifier [GSC]).34 The GSC was validated using the same cohort of patients as the GEC, which does not seem representative of the population in whom these tests are currently being applied. As a result, the estimated diagnostic performance of the GSC likely has the same limitations as the diagnostic performance of the GEC, and the real-life performance of the GSC may substantially differ from that recently reported.34 Nonetheless, we acknowledge that the method used to develop the GSC is substantially different from that used to develop the GEC, and this method might improve the diagnostic performance of the test, which may, at least partially, overcome the limitation of using a nonrepresentative validation cohort. Future studies will need to appropriately validate the GSC diagnostic performance.
In contrast to previous findings in oncogene panels,35,36 which had better diagnostic performance in FN compared with A/FLUS specimens, the GEC appeared to perform better among A/FLUS than FN specimens. A type II error could be the reason significant differences in NPV were not found in the postmarketing studies between A/FLUS and FN resected nodules (87% vs 73%), but the low resection rates of GEC-benign nodules (14% for both categories) significantly limited the interpretation of these results. Nonetheless, it seems likely that the GEC has a higher false-negative rate than expected, particularly for FN specimens. As a consequence, patients whose nodules remain unresected on the basis of a GEC-benign result should be monitored more closely and for longer periods than typically would be recommended for a thyroid nodule with a benign cytological diagnosis.
The Afirma GEC’s diagnostic performance differs significantly from that reported in the initial validation study. These differences are not explained simply by the differences in the underlying prevalence of cancer.37
This study has a few limitations. Most of the independent studies were retrospective, and the meta-analysis was conducted on study-level data, which limits the quality of the results. However, we found no evidence of publication bias. Many studies were published before the change in nomenclature for the encapsulated noninvasive follicular variant of papillary thyroid carcinoma.38 Thus, noninvasive follicular thyroid neoplasms with papillary-like nuclear features (NIFTPs) were analyzed as malignant to estimate the GEC’s diagnostic performance. A possible limitation of this approach is the low interobserver agreement for NIFTPs, which might affect the PPV reliability.39,40 However, this limitation seems to have had little implication for the current analysis given the consistent PPV prediction across the studies, which was below that of the initial validation study in all postmarketing studies. If NIFTPs were not considered malignant, a different underlying prevalence of cancer (which could decrease by more than 40%)32 and different test statistics would likely be found.41 Altogether, GEC-suspicious results should be interpreted cautiously, particularly when considering the extent of surgical procedures, because these results are usually associated with either benign nodules or low-risk cancer.
These findings suggest that the cohort of the initial validation study4 was not representative of the populations in whom the test has subsequently been applied, calling into question the diagnostic performance of the GEC, including its NPV, as reported in the initial study. Furthermore, our study’s findings suggest that the GEC’s diagnostic performance is not uniform across different cytological subsets within the spectrum of indeterminate aspirates. Additional studies appear to be needed to better understand the GEC’s true diagnostic performance. Until such studies are available, the follow-up of unresected GEC-benign nodules should be more intense and prolonged than that recommended for thyroid nodules with benign cytological diagnosis.
Accepted for Publication: May 30, 2019.
Corresponding Author: Pablo Valderrabano, MD, PhD, Department of Endocrinology and Nutrition, Hospital Universitario Ramón y Cajal, Ctra Colmenar Viejo, km 9, 100, 28034 Madrid, Spain (email@example.com).
Published Online: July 18, 2019. doi:10.1001/jamaoto.2019.1449
Author Contributions: Dr Valderrabano had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Concept and design: Valderrabano, Hallanger-Johnson, Wang, McIver.
Acquisition, analysis, or interpretation of data: All authors.
Drafting of the manuscript: Valderrabano, Thapa.
Critical revision of the manuscript for important intellectual content: Hallanger-Johnson, Thapa, Wang, McIver.
Statistical analysis: Hallanger-Johnson, Thapa, Wang.
Administrative, technical, or material support: Hallanger-Johnson.
Supervision: Valderrabano, McIver.
Conflict of Interest Disclosures: Dr McIver reported personal fees from Sonic Healthcare USA and grants from GenePro-Dx outside of the submitted work. No other disclosures were reported.
Funding/Support: The statistical analysis in this study was supported in part by the Biostatistics Core Facility at the H. Lee Moffitt Cancer Center & Research Institute, an NCI-designated Comprehensive Cancer Center (P30-CA076292).
Role of the Funder/Sponsor: The Biostatistics Core Facility at the H. Lee Moffitt Cancer Center & Research Institute had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.
Additional Contributions: We thank all of the study authors who shared additional information from their published cohorts necessary for us to conduct the current study.