In the current Bethesda III and IV cohort, at a cancer prevalence of 24%, the negative predictive value (NPV) was 96% (95% CI, 90-99) (A) and the positive predictive value (PPV) was 47% (95% CI, 36-58) (B). A 2016 meta-analysis33 reported the prevalence of malignancy among Bethesda III and IV nodules as 17% (95% CI, 11-23) and 25% (95% CI, 20-29), respectively. At the lower bound of this range (11% cancer prevalence), the derived NPV and PPV were 98% and 26%; at the upper bound (29% cancer prevalence), they were 95% and 54%.
eMethods 1. Training cohort.
eMethods 2. Reference methods.
eMethods 3. RNA purification.
eMethods 4. Library preparation.
eMethods 5. Next-generation sequencing.
eMethods 6. RNA sequencing pipeline, feature extraction, and quality control.
eAppendix. Genomic sequence classifier to gene expression classifier comparison on a per-sample basis.
eTable 1. Blinding of the independent test set.
eTable 2. Composition of the core ensemble model training set.
eTable 3. Feature sets used in each classifier within the final ensemble model.
eTable 4. List of 1115 core genes deriving the ensemble model prediction.
eTable 5. Histology subtype comparison between validation cohorts.
eTable 6. Prevalence of malignancy between validation cohorts.
eTable 7. Performance comparison between the genomic sequence classifier and gene expression classifier.
eFigure 1. Afirma gene expression classifier system.
eFigure 2. Standards for Reporting of Diagnostic Accuracy Studies diagram of sample flow through the study.
Patel KN, Angell TE, Babiarz J, et al. Performance of a Genomic Sequencing Classifier for the Preoperative Diagnosis of Cytologically Indeterminate Thyroid Nodules. JAMA Surg. 2018;153(9):817–824. doi:10.1001/jamasurg.2018.1153
What is the performance of a genomic sequencing classifier in cytologically indeterminate thyroid nodules?
In this validation study of 183 patients with 191 cytologically indeterminate thyroid nodules, the genomic sequencing classifier was validated against blinded expert histopathology diagnoses and compared with the gene expression classifier. The genomic sequencing classifier had a sensitivity of 91% and a specificity of 68%.
The genomic sequencing classifier accurately classified more patients with indeterminate thyroid cytology as benign than its predecessor, the gene expression classifier.
Importance
Use of next-generation sequencing of RNA and machine learning algorithms can classify the risk of malignancy in cytologically indeterminate thyroid nodules to limit unnecessary diagnostic surgery.
Objective
To measure the performance of a genomic sequencing classifier for cytologically indeterminate thyroid nodules.
Design, Setting, and Participants
A blinded validation study was conducted on a set of cytologically indeterminate thyroid nodules collected by fine-needle aspiration biopsy between June 2009 and December 2010 from 49 academic and community centers in the United States. All patients underwent surgery without genomic information and were assigned a histopathology diagnosis by an expert panel blinded to all genomic information. There were 210 potentially eligible thyroid biopsy samples with Bethesda III or IV indeterminate cytopathology that constituted a cohort previously used to validate the gene expression classifier. Of these, 191 samples (91.0%) had adequate residual RNA for validation of the genomic sequencing classifier. Algorithm development and independent validation occurred between August 2016 and May 2017.
Exposures
Thyroid nodule surgical histopathology diagnosis by an expert panel blinded to all genomic data.
Main Outcomes and Measures
The primary end point was measurement of genomic sequencing classifier sensitivity, specificity, and negative and positive predictive values in biopsies from Bethesda III and IV nodules. The secondary end point was measurement of classifier performance in biopsies from Bethesda II, V, and VI nodules.
Results
Of the 183 included patients, 142 (77.6%) were women, and the mean (range) age was 51.7 (22.0-85.0) years. The genomic sequencing classifier had a sensitivity of 91% (95% CI, 79-98) and a specificity of 68% (95% CI, 60-76). At 24% cancer prevalence, the negative predictive value was 96% (95% CI, 90-99) and the positive predictive value was 47% (95% CI, 36-58).
Conclusions and Relevance
The genomic sequencing classifier demonstrates high sensitivity and accuracy for identifying benign nodules. Its 36% relative increase in specificity compared with the gene expression classifier (68% vs 50%) potentially increases the number of patients with benign nodules who can safely avoid unnecessary diagnostic surgery.
Thyroid cancer incidence has increased substantially in the United States in recent decades, with evidence to support both an increase in detection1 and a true increase in occurrence.2 Thyroid nodules are palpable in 5% of adults3 and are visualized with contemporary imaging in more than one-third of adults.3-5 Malignancy is present in only 5% to 15% of all thyroid nodules,3,5-7 and definitive diagnosis is achieved by surgical histopathology on resected tissue. Unfortunately, thyroid surgery is associated with discomfort, scarring, inconvenience, direct and indirect costs, potential lifelong medication, and occasional surgical complications.8,9 Efforts to exclude cancer with clinical assessment alone are admittedly imperfect,5 and laboratory testing of serum thyroid-stimulating hormone levels and thyroid imaging with radionuclides or ultrasonography identify benignity with high confidence in only 4% to 26% of nodules.10-13 Forty years ago, the application of cytology to thyroid nodule specimens obtained by fine-needle aspiration (FNA) biopsy had a substantial effect on patient management by reducing surgery by one-half and doubling the proportion of cancer among patients who underwent surgery.3,5 However, approximately one-third of thyroid nodule cytology findings today are cytologically indeterminate,14,15 with estimated risks of malignancy ranging from 5% to 30%.16 Consequently, approximately three-quarters of patients with cytologically indeterminate thyroid nodules have been referred for surgery,17,18 even though 80% ultimately prove to have benign nodules.15,16,18
The practice of using preoperative genomic information for thyroid nodule differential diagnosis is more than a decade old, and several commercial and noncommercial genomic approaches are currently available.19 Performance data from blinded prospective multicenter validation trials are limited and include the gene expression classifier (GEC), in which a machine learning–derived classification algorithm uses messenger RNA transcript expression levels to categorize cytologically indeterminate FNAs as either benign or suspicious.20 Altered messenger RNA expression can occur for several reasons, including complex upstream interactions that occur because of sequence changes in key core genes or in relevant peripheral genes,21 the effect of epigenetic changes that occur without DNA sequence alterations, and both internal and external modifiers, such as inflammation and lifestyle or environment.22,23 In a cohort with a 24% prevalence of malignancy, the GEC accurately identified 90% of malignancies (ie, sensitivity) and 52% of benign nodules (ie, specificity) with indeterminate Bethesda III or IV cytology.20 It intentionally favored high sensitivity over specificity to ensure the accuracy and safety of a benign genomic result. A test with improved specificity for identification of benign nodules and maintained high sensitivity for malignancy detection could spare even more patients from surgery with an accurate benign genomic result (negative predictive value [NPV]) and increase the cancer yield among those with a suspicious result (positive predictive value [PPV]).
Enhanced technologies for characterizing genomic information, including improved methods for measuring RNA transcriptome expression, sequencing nuclear and mitochondrial RNAs, and measuring changes in genomic copy number (including loss of heterozygosity), together with enhanced bioinformatics and machine learning strategies, have created the opportunity to develop a new, more robust genomic test. This study describes the blinded clinical validation24 of the novel genomic sequence classifier (GSC) on a prospective, multicenter-derived set of patients with FNA samples whose referral to surgery and histopathological diagnosis were determined in the absence of genomic information.
The study was approved by institution-specific institutional review boards as well as by Liberty IRB (DeLand, Florida; now Chesapeake IRB) and Copernicus Group Independent Review Board (Cary, North Carolina). All patients provided written informed consent prior to participating in the study. The training cohort is described in eMethods 1 in the Supplement.
Dedicated thyroid nodule FNA specimens and surgical histopathology from nodules 1 cm or larger were collected using a prospective and blinded protocol at 49 academic and community centers in the United States from patients 21 years or older. These samples, stored at −80°C, were previously used to validate the GEC. The details of their enrollment and prespecified inclusion and exclusion criteria have been reported elsewhere.20 Histopathology diagnoses were previously established by an expert panel of thyroid surgical histopathologists who were blinded to all clinical and molecular data.20 BRAF V600E DNA mutational reference status was established by testing DNA from all samples with the competitive allele-specific TaqMan polymerase chain reaction, as described in eMethods 2 in the Supplement. This independent validation cohort was prespecified and divided into 2 test sets. The primary test set was composed of all patients with Bethesda III and IV samples described in the clinical validation of the Afirma GEC20 with sufficient RNA remaining. The secondary test set was composed of all patients with Bethesda II, V, or VI samples described in the clinical validation of the Afirma GEC20 with sufficient RNA remaining and not randomly assigned to the training set, as described in eMethods 1 in the Supplement.
The following steps were implemented to ensure the independent test set was securely blinded throughout algorithm development and validation (eTable 1 in the Supplement). First, each step was documented in a prespecified protocol and time-stamped on execution. Each team member was assigned a single role and allowed access only to information designated for that role. A randomly generated blinded identification number was assigned to each sample in the validation set by information technology engineers who operated independently of all other teams to ensure that all other personnel were unable to link clinical and genomic data. All historic information that could potentially reveal the clinical label on the independent test set was secured in a password-protected folder prior to the start of algorithm development. Information technology engineers conducted performance testing of the validation test set independently of all other teams. RNA purification, library preparation, next-generation sequencing, RNA sequencing pipeline, feature extraction, and quality control methods are described in eMethods 3-6 in the Supplement.
Fine-needle aspiration samples (n = 634) were used to build the GSC core ensemble model, as described in eMethods 1 and eTable 2 in the Supplement. The ensemble model consists of 12 independent classifiers: 6 are elastic net logistic regression models25 and 6 are support vector machines.26 The 6 models within each category differ from each other according to the gene sets used (eTable 3 in the Supplement).
To minimize overfitting and to accurately reflect classifier performance incorporating random noise, hyperparameter tuning and model selection were performed using repeated nested cross-validation.27 Hyperparameter tuning was performed within the inner layer of the cross-validation, and classifier performance was summarized using the outer layer, a 5-fold cross-validation repeated 40 times. For each classifier, the decision boundary was chosen to maximize specificity subject to a minimum sensitivity of 90% for detecting malignancy.
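The decision-boundary rule above (maximize specificity subject to at least 90% sensitivity) and the repeated outer cross-validation loop can be sketched as follows. This is a minimal, dependency-free illustration, not the authors' code: the function names `repeated_kfold` and `choose_threshold` are hypothetical, the sketch assumes that a higher classifier score means "more suspicious", and it assumes the cutoff is chosen from out-of-fold scores. The inner hyperparameter-tuning layer would apply the same splitter to each outer training set and is omitted here.

```python
import random


def repeated_kfold(n, k=5, repeats=40, seed=0):
    """Yield (train, test) index lists for k-fold CV repeated `repeats` times.

    Each repeat reshuffles the sample indices, so 5 folds x 40 repeats
    gives 200 outer splits, mirroring the outer layer described in the text.
    """
    rng = random.Random(seed)
    indices = list(range(n))
    for _ in range(repeats):
        rng.shuffle(indices)
        for fold in range(k):
            test = indices[fold::k]
            test_set = set(test)
            train = [i for i in indices if i not in test_set]
            yield train, test


def choose_threshold(scores, labels, min_sensitivity=0.90):
    """Pick the cutoff t (call 'suspicious' when score >= t) that maximizes
    specificity among cutoffs whose sensitivity is at least min_sensitivity.

    labels: 1 = malignant, 0 = benign. Returns (threshold, specificity, sensitivity).
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = None
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < t)
        sensitivity, specificity = tp / n_pos, tn / n_neg
        # Keep the lowest cutoff that reaches a new specificity maximum, which
        # also retains the highest sensitivity among tied cutoffs.
        if sensitivity >= min_sensitivity and (best is None or specificity > best[1]):
            best = (t, specificity, sensitivity)
    return best


# Toy out-of-fold scores (hypothetical, not study data).
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95]
labels = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1]
print(choose_threshold(scores, labels))  # (0.4, 0.6, 1.0)
```

In this toy example, raising the cutoff to 0.5 would improve nothing for specificity while dropping sensitivity to 80%, so the rule stops at 0.4; relaxing the floor to 80% sensitivity would move the cutoff to 0.7.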
The locked ensemble model uses a total of 10 196 genes, among which are 1115 core genes (eTable 4 in the Supplement). These core genes drive the prediction behavior of the model, and the remaining genes improve classifier stability against assay variability.
In addition to the ensemble model described above, the Afirma GSC system includes 7 other components: a parathyroid cassette, a medullary thyroid cancer (MTC) cassette, a BRAF V600E cassette, RET/PTC1 and RET/PTC3 fusion detection modules, follicular content index, Hürthle cell index, and Hürthle neoplasm index. The first 4 are upstream of the ensemble classifier, targeting specific and rare patient subgroups (eFigure 1 in the Supplement). The last 3 (the follicular content index, Hürthle cell index, and the Hürthle neoplasm index) were developed to further improve the benign vs suspicious classification performance. They were incorporated with the ensemble classifier to form the core benign vs suspicious classifier engine.
Statistical analyses were performed using R statistical software version 3.2.3 (https://www.r-project.org). Continuous variables were compared using t tests, and categorical variables were compared using Fisher exact tests. We evaluated test performance using sensitivity, specificity, NPV, and PPV based on established methods.28 All confidence intervals are 2-sided 95% CIs and were computed using the exact binomial test.29 Test performance of the GSC and GEC was compared using the McNemar χ2 test on the matched data set.30 Significance in differential gene expression analysis is reported as a false discovery rate–adjusted P value.31 Two-sided P values less than .05 were considered significant.
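Exact binomial CIs are commonly computed with the Clopper-Pearson method, which is the interval reported by R's `binom.test`. The sketch below assumes that method and finds each bound by bisection on the binomial tail probability, using only the Python standard library; the function names are hypothetical. Applied to the sensitivity reported in the Results (41 of 45 malignant samples called suspicious), it gives an interval that rounds to the reported 95% CI of 79-98.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _bisect(f, target, increasing, tol=1e-10):
    """Find p in [0, 1] with f(p) == target for a monotone function f."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        # For increasing f, a value below target means the root lies above mid.
        if (f(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x, n, alpha=0.05):
    """Exact two-sided (1 - alpha) CI for a binomial proportion x / n."""
    if x == 0:
        lower = 0.0
    else:
        # Lower bound: P(X >= x | p) = alpha / 2; this tail rises with p.
        lower = _bisect(lambda p: 1 - binom_cdf(x - 1, n, p), alpha / 2, True)
    if x == n:
        upper = 1.0
    else:
        # Upper bound: P(X <= x | p) = alpha / 2; this tail falls as p grows.
        upper = _bisect(lambda p: binom_cdf(x, n, p), alpha / 2, False)
    return lower, upper


lo, hi = clopper_pearson(41, 45)  # GSC sensitivity among Bethesda III/IV nodules
print(f"{41 / 45:.1%} (95% CI, {lo:.0%}-{hi:.0%})")  # 91.1% (95% CI, 79%-98%)
```

Bisection is used here only to avoid a dependency; with SciPy available, the same bounds are the 2.5% and 97.5% quantiles of the appropriate beta distributions.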
We used the FNA samples that had previously been used to validate the GEC20 to independently validate the GSC. The earlier GEC validation samples were derived from 4812 nodule aspirations prospectively collected from 3789 patients at 49 clinical sites in the United States over a 2-year period.20 Of the 210 validation samples with corresponding Bethesda III or IV cytology and blinded postoperative consensus histopathology diagnoses, 191 (91.0%) had sufficient residual RNA for GSC testing. These samples from cytologically indeterminate nodules constituted the blinded primary test set.
The previously established cytological diagnoses of the thyroid nodules were reused.20 Patient demographic characteristics and baseline data are shown in Table 1. Age, sex, clinical risk factors, nodule size, histology subtype (eTable 5 in the Supplement), number of FNA passes, prevalence of malignancy (eTable 6 in the Supplement), and proportion of samples collected at community centers did not differ significantly between the primary study population (n = 191) and the GEC clinical validation cohort of samples (n = 210), consistent with unbiased dropout.
The Standards for Reporting of Diagnostic Accuracy Studies was developed to improve the quality of reporting diagnostic accuracy studies.32 eFigure 2 in the Supplement shows the flow of samples through the study in a Standards for Reporting of Diagnostic Accuracy Studies diagram. Of these 191 indeterminate FNAs, 46 (24.1%) were diagnosed as malignant by an expert surgical histopathology panel who were blinded to all cytologic and genomic results and to the local histopathology diagnosis. Results are reported in the order of testing through the GSC test system (eFigure 1 in the Supplement). Initially, all GSC samples are tested for RNA quantity and quality. None of the 191 samples failed. Subsequently, the GSC aimed to identify nodules composed of parathyroid tissue, those with MTC, and those with a BRAF V600E mutation or RET/PTC1 or RET/PTC3 fusion. Samples testing positive for these are included in performance calculations described below, except for samples testing positive for parathyroid tissue, as this result does not indicate a benign or malignant etiology. Among the 191 samples, positive results for parathyroid, MTC, BRAF, and RET/PTC occurred in 0, 1, 3, and 0 samples, respectively. All MTC and BRAF V600E results were concordant with reference methods (eMethods 2 in the Supplement). After this testing, samples were evaluated for follicular cell content by the follicular content index classifier. One sample, negative for the above results, was deemed to have inadequate follicular content and therefore was assigned no result. This sample was excluded from subsequent analyses, leaving 190 samples. Table 2 summarizes clinical performance characteristics for Bethesda III and IV nodules.
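The order of testing described above can be summarized as a short decision cascade. The sketch below is a hypothetical illustration, not Veracyte's implementation: the field names and result strings are invented, the ensemble model and the Hürthle indices are collapsed into a single `core_suspicious` flag, and it assumes that a positive MTC, BRAF V600E, or RET/PTC result is reported as suspicious, which is consistent with these samples being counted in the performance calculations.

```python
from dataclasses import dataclass


@dataclass
class ModuleOutputs:
    """Hypothetical per-sample outputs of each GSC component."""
    rna_adequate: bool           # RNA quantity and quality check
    parathyroid: bool            # parathyroid cassette
    mtc: bool                    # medullary thyroid cancer cassette
    braf_v600e: bool             # BRAF V600E cassette
    ret_ptc_fusion: bool         # RET/PTC1 or RET/PTC3 fusion module
    follicular_content_ok: bool  # follicular content index
    core_suspicious: bool        # ensemble + Hürthle indices, collapsed to one call


def gsc_call(s: ModuleOutputs) -> str:
    """Apply the components in the order of testing described in the text."""
    if not s.rna_adequate:
        return "qc_failure"
    if s.parathyroid:
        # Neither benign nor malignant; excluded from performance calculations.
        return "parathyroid"
    if s.mtc or s.braf_v600e or s.ret_ptc_fusion:
        return "suspicious"  # assumption: upstream positives count as suspicious
    if not s.follicular_content_ok:
        return "no_result"  # the category of the 1 sample excluded, leaving 190
    return "suspicious" if s.core_suspicious else "benign"


# A sample with a BRAF V600E mutation never reaches the core classifier.
braf_positive = ModuleOutputs(
    rna_adequate=True, parathyroid=False, mtc=False, braf_v600e=True,
    ret_ptc_fusion=False, follicular_content_ok=True, core_suspicious=False,
)
print(gsc_call(braf_positive))  # suspicious
```

Putting the parathyroid check first matters: a parathyroid result short-circuits everything downstream, which is why such samples cannot contribute a benign or suspicious call.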
The GSC correctly identified 41 of 45 malignant samples as suspicious, yielding a sensitivity of 91.1% (95% CI, 79-98), and 99 of 145 nonmalignant samples were correctly identified as benign by the GSC, yielding a specificity of 68.3% (95% CI, 60-76). Among Bethesda III and IV samples, the NPV was 96.1% (95% CI, 90-99) and the PPV was 47.1% (95% CI, 36-58). Performance of the GSC was similar between Bethesda III and IV categories (Table 2).
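The point estimates above follow directly from the 2 × 2 table of GSC calls against histopathology. The false-negative (4) and false-positive (46) counts used below are not stated separately in the text but follow from it: 45 - 41 = 4 malignant samples called benign and 145 - 99 = 46 nonmalignant samples called suspicious. The function name is hypothetical.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, NPV, PPV, and prevalence from a 2 x 2 table."""
    return {
        "sensitivity": tp / (tp + fn),  # malignant nodules called suspicious
        "specificity": tn / (tn + fp),  # benign nodules called benign
        "npv": tn / (tn + fn),          # benign calls that are truly benign
        "ppv": tp / (tp + fp),          # suspicious calls that are malignant
        "prevalence": (tp + fn) / (tp + fn + tn + fp),
    }


# Bethesda III and IV primary test set after exclusions (n = 190).
metrics = diagnostic_metrics(tp=41, fn=4, tn=99, fp=46)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

The computed prevalence, 45 of 190 (23.7%), is the 24% cancer prevalence at which the NPV of 96.1% and PPV of 47.1% are quoted.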
Among the 190 Bethesda III and IV samples, 17 (8.9%) were histologically Hürthle cell adenomas and 9 (4.7%) were Hürthle cell carcinomas, while 164 samples (86.3%) were histologically non-Hürthle. For samples with Hürthle histology, the sensitivity was 88.9% (95% CI, 52-100) and the specificity was 58.8% (95% CI, 33-82). For samples with non-Hürthle histology, the sensitivity was 91.7% (95% CI, 78-98) and the specificity was 69.5% (95% CI, 61-77).
A wide variety of malignant subtypes were correctly classified as suspicious (Table 3). Four false-negative cases occurred (Table 4). We assessed whether patient age or sex, malignancy subtype, or nodule size (by ultrasonography or on histopathology) was associated with false-negative results and found no association. Comparisons of GSC to GEC results on a per-sample basis are reported in the eAppendix in the Supplement. The performance of the GSC in secondary analyses of nodules with Bethesda II, V, or VI cytopathology is reported in Table 2. Among the entire secondary analysis group, the GSC sensitivity was 100% (95% CI, 90-100) and the specificity was 73.1% (95% CI, 52-88).
A 2016 meta-analysis33 reported the risks of malignancy among Bethesda III and IV thyroid nodules to be 17% (95% CI, 11-23) and 25% (95% CI, 20-29), respectively. To safely avoid unnecessary diagnostic surgery among these cytologically indeterminate nodules, a test with a high sensitivity and NPV for malignancy is required. This blinded clinical validation of the GSC in a prospectively collected, representative, universally operated, and histopathologically diagnosed cohort demonstrates the required high NPV across these ranges of cancer prevalence encountered in Bethesda III and IV nodules in clinical practice (Figure). To independently validate the GSC, we implemented a set of strict blinding and deidentification protocols that enabled us to use the same FNA samples previously used to validate the GEC.20 Use of these samples allowed testing of complete and representative sets of nodules with corresponding surgical histology unaffected by the current widespread use of molecular testing to avoid or encourage surgery.
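The Figure's projections at other prevalences follow from Bayes' theorem: with sensitivity and specificity held fixed, NPV and PPV depend only on the prevalence of malignancy. The sketch below assumes, as the Figure does, that sensitivity and specificity do not change with prevalence, and plugs in the observed values (41/45 and 99/145) at 11% and 29%, the lower and upper 95% CI bounds from the meta-analysis, as well as at this cohort's prevalence (45/190). It also returns the expected fraction of benign calls, which underlies the projection that more than half of tested patients at 24% prevalence would receive a GSC benign result. The function name is hypothetical.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (NPV, PPV, benign-call rate) at a given prevalence of malignancy."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    npv = true_neg / (true_neg + false_neg)
    ppv = true_pos / (true_pos + false_pos)
    benign_rate = true_neg + false_neg  # fraction of all tested nodules called benign
    return npv, ppv, benign_rate


SENS, SPEC = 41 / 45, 99 / 145  # observed GSC sensitivity and specificity

# 11% and 29% bracket the meta-analysis range; 45/190 is this cohort's prevalence.
for prev in (0.11, 45 / 190, 0.29):
    npv, ppv, benign = predictive_values(SENS, SPEC, prev)
    print(f"prevalence {prev:.0%}: NPV {npv:.0%}, PPV {ppv:.0%}, benign calls {benign:.0%}")
```

The printed values reproduce the Figure: NPV 98% and PPV 26% at 11% prevalence, NPV 96% and PPV 47% at 24%, and NPV 95% and PPV 54% at 29%. The NPV changes by only a few points across this range, whereas the PPV roughly doubles.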
Test sensitivity of the GSC (91%; 95% CI, 79-98) was maintained compared with the GEC (89%; 95% CI, 76-96), with the point estimate within the counterpart’s 95% CI, and the McNemar χ2 test (df = 1) on the matched sample set yielded a test statistic of 0 (P > .99). In contrast, test specificity of the GSC (68%; 95% CI, 60-76) was significantly improved over that of the GEC (50%; 95% CI, 42-59), with the point estimate outside the counterpart’s 95% CI, and the McNemar χ2 test (df = 1) on the matched sample set yielded a test statistic of 16.447 (P < .001) (eTable 7 in the Supplement). In practice, this enhanced performance suggests that among Bethesda III and IV nodules that are histopathologically benign, at least one-third more will receive a benign result using the GSC compared with the GEC. At a cancer prevalence of 24%, more than half of tested patients are projected to receive a GSC benign result, and among GSC suspicious nodules, nearly half are anticipated to have cancer on surgical histology. This increased benign call rate is expected to result in more patients being assigned to active observation as opposed to diagnostic surgery. Given the high cost of surgery in the United States among Medicare and private payers,34 the increased avoidance of diagnostic surgery because of GSC benign results is expected to further improve cost-effectiveness and reduce surgical complications.8,9
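The McNemar comparison depends only on the discordant pairs: b samples classified correctly by the GSC but not the GEC, and c samples for the reverse. This text does not list b and c (they are summarized in eTable 7 in the Supplement), so the counts below are hypothetical. The pair b = 32, c = 6 is shown only because, with the continuity correction that R's `mcnemar.test` applies by default, it reproduces the reported statistic of 16.447; any pair with |b - c| ≤ 1 gives a statistic of 0, as reported for sensitivity. The P value uses the fact that a χ2 variable with 1 df is the square of a standard normal variable. The function name is hypothetical.

```python
from math import erfc, sqrt


def mcnemar_cc(b, c):
    """Continuity-corrected McNemar chi-square (1 df) and its P value.

    b, c: discordant pair counts (GSC right/GEC wrong, and the reverse).
    """
    if b + c == 0:
        return 0.0, 1.0
    # Correction of 1, floored at 0 so that b == c gives a statistic of 0.
    statistic = max(abs(b - c) - 1, 0) ** 2 / (b + c)
    # Upper tail of chi-square with 1 df: P(Z**2 > x) = erfc(sqrt(x / 2)).
    p_value = erfc(sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical discordant counts (see lead-in); not taken from eTable 7.
stat, p = mcnemar_cc(32, 6)
print(f"chi2 = {stat:.3f}, P = {p:.1e}")  # prints chi2 = 16.447 and a P value below .001
```

Because concordant pairs drop out of the statistic, the test compares the 2 classifiers only on samples where they disagree, which is why it is applied to the matched sample set.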
While genomic data have been incorporated into clinical management decisions for multiple medical conditions for more than a decade, progress continues toward understanding the complexities of genomic and nongenomic pathways in the development and behavior of disease. Current evidence suggests that most common diseases are associated with small effects from a large number of genes and that most of these contributions are derived from transcriptionally active portions of the genome.21 This implies that diseases such as thyroid cancer are unlikely to be accounted for by the effects of a small number of genes. The fact that few genomic variants are associated with 100% penetrance toward malignant histology suggests that a complex interaction of multiple factors ultimately determines the benign or malignant nature of thyroid nodules.22,23 As the number of these factors expands, it becomes critical to use machine learning and statistical models to interpret their signals in a trained model to derive an accurate diagnosis.
Hürthle lesions exemplify the challenges inherent in complex biology and the opportunity to harness high-dimensional genomic data for predictive model training and subsequent validation. Most Hürthle cell–dominant Bethesda III and IV thyroid nodules have historically undergone surgery given the potential for Hürthle cell carcinoma, yet most have proven to be histologically benign. The GEC identified these samples at a high NPV, but most were categorized as GEC suspicious.35 We sought to maintain a high NPV while providing more benign results by including 2 dedicated classifiers to work with the core GSC classifier. Among the 26 Hürthle cell adenomas or Hürthle cell carcinomas reported here, the final GSC sensitivity was 88.9% and the specificity was 58.8%; the GEC sensitivity was 88.9% and the specificity was 11.8% among these same neoplasms. Thus, while the overall GSC sensitivity of 91.1% reported here is comparable with that of the GEC (by design), the improved overall GSC specificity of 68.3% results from significantly improved performances among both Hürthle and non-Hürthle specimen types. Given that most histologically benign Hürthle and non-Hürthle specimens are now both identified as GSC benign, GSC testing may further safely reduce unnecessary surgery among both specimen types.
Recently, the histological diagnosis of noninvasive follicular thyroid neoplasm with papillary-like nuclear features (NIFTP) was recognized as a biologically distinct entity with a low risk of malignant behavior following surgical excision, which remains the currently recommended treatment.36 These lesions were previously described as encapsulated noninvasive follicular variant of papillary thyroid cancer.37 No NIFTP histopathology diagnoses were available in this independent validation cohort, as it was collected prior to the establishment of this diagnostic category. However, subsequent studies38-40 have suggested a high rate of GEC suspicious results among NIFTP cases. The GSC was trained to identify NIFTP cases as suspicious. While removal of NIFTP from the malignant category would reduce the prevalence of cancers among cytological categories and alter the anticipated PPV of GSC tested cases, this exercise would not be clinically meaningful since the goal of a positive GSC test is to identify all thyroid nodules that warrant surgery, which currently remains necessary for NIFTP.
We performed a secondary analysis of 61 Bethesda II, V, or VI samples that also were included in the GEC validation study (Table 2).20 While performance of a genomic test among these more definitive cytology categories may not predict performance of the test within the Bethesda III and IV categories, the consistency of these performance metrics is reassuring and supportive of the findings in the primary analysis.
Limitations of this study include the lack of performance data among children, among nodules that had been previously biopsied, and among samples collected by methods other than 1 or 2 dedicated FNA passes. Another potential limitation is that the prevalence of cancer in this study was toward the higher end of the expected range among Bethesda III and IV nodules, as seen in the Figure. It is possible that a cytologically indeterminate cohort with a significantly lower prevalence of cancer may contain more benign nodules that are easier for the GSC to classify, as seen in Table 2 among nodules with Bethesda II cytopathology. In that case, the observed test specificity may be higher.
The current trend in thyroid nodule and cancer management is more conservative, with physicians more aware of the burden of unnecessary thyroid surgery35 and the indolent behavior of most thyroid malignancies confined to the thyroid.41-44 Current US guidelines indicate that molecular testing may be used among Bethesda III and IV nodules to provide additional information about the nodule’s risk of malignancy, which, along with patient preference, may guide clinical decision-making.7,45 This study demonstrates high test sensitivity and NPV among Bethesda III and IV cytologically indeterminate thyroid nodules across a broad range of nodule sizes (Table 1). As an adjunct to clinical judgment, the GSC is expected to reduce unnecessary diagnostic surgery, improve patient safety, reduce health care costs, and improve patient quality of life.
Accepted for Publication: February 25, 2018.
Corresponding Author: Kepal N. Patel, MD, Division of Endocrine Surgery, Department of Surgery, New York University Langone Medical Center, 530 First Ave, Ste 6H, New York, NY 10016 (email@example.com).
Published Online: May 23, 2018. doi:10.1001/jamasurg.2018.1153
Author Contributions: Drs Kennedy and Ladenson had full access to all of the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis.
Study concept and design: Babiarz, Barth, Harrell, Huang, Kennedy, Kloos, LiVolsi, Randolph, Shanik, Walsh, Whitney, Ladenson.
Acquisition, analysis, or interpretation of data: Patel, Angell, Babiarz, Barth, Blevins, Duh, Ghossein, Huang, Kennedy, Kim, Kloos, Randolph, Sadow, Shanik, Sosa, Traweek, Walsh, Whitney, Yeh, Ladenson.
Drafting of the manuscript: Patel, Babiarz, Barth, Harrell, Huang, Kennedy, Kim, Kloos, Randolph, Shanik, Ladenson.
Critical revision of the manuscript for important intellectual content: Patel, Angell, Babiarz, Barth, Blevins, Duh, Ghossein, Huang, Kennedy, Kim, Kloos, LiVolsi, Randolph, Sadow, Shanik, Sosa, Traweek, Walsh, Whitney, Yeh, Ladenson.
Statistical analysis: Barth, Huang, Kennedy, Kim.
Obtained funding: Kennedy.
Administrative, technical, or material support: Babiarz, Barth, Kennedy, Kloos, Randolph, Sadow, Traweek, Whitney.
Study supervision: Patel, Angell, Barth, Kennedy, Kloos, Randolph, Sosa, Walsh.
Conflict of Interest Disclosures: Drs Patel, Blevins, Shanik, and Ladenson have received speaker’s honoraria from Veracyte Inc. Drs Ghossein, LiVolsi, Sadow, and Ladenson serve as consultants for Veracyte Inc. Drs Blevins, Shanik, and Ladenson have received institutional research support from Veracyte Inc. Drs Babiarz, Barth, Huang, Kennedy, Kim, Kloos, and Whitney and Mr Walsh are employees of Veracyte Inc. Drs Babiarz, Barth, Huang, Kennedy, Kim, Kloos, Traweek, and Whitney and Mr Walsh own equity in Veracyte Inc. Dr Sosa is a member of the American Thyroid Association Data Monitoring Committee of the Medullary Thyroid Cancer Consortium, which is supported by GlaxoSmithKline, Novo Nordisk, AstraZeneca, and Eli Lilly. No other disclosures were reported.
Funding/Support: This study was funded by Veracyte Inc.
Role of the Funder/Sponsor: Veracyte Inc drafted the study design and oversaw the data collection, management, and initial analysis. Veracyte Inc had no role in data interpretation; preparation, review, and approval of the manuscript; and the decision to submit the manuscript.
Meeting Presentation: Summary findings from this study were presented as an abstract and oral presentation at the Third World Congress on Thyroid Cancer; July 27-30, 2017; Boston, Massachusetts.
Additional Contributions: We thank the many investigators and patients who provided the fine-needle aspiration samples used here for training and in the independent test set.