Box plot of TN size distributions for each Bethesda classification. The solid line within each box represents the median. The box encompasses the 25th to 75th percentiles of TN size within each Bethesda class. The whiskers represent the most extreme data point no further than 1.5 times the length of the box away from the 25th or 75th percentile. The empty circles represent the remaining TNs not included within the box and/or whisker ranges.
A, Probability of malignant disease as a function of thyroid nodule (TN) size based on a logistic regression model. The calculated probability (blue line) is presented with 95% CIs (gray shaded area). The TN size (in centimeters) is shown on the x-axis (wide/dark lines represent multiple values). Bethesda class 1 TNs were excluded from the analysis. B, Probability of malignant disease as a function of Bethesda classification and TN size based on a logistic regression model. Bethesda class 1 nodules were excluded from the analysis. The figure was truncated at 6.0 cm owing to a small number of TNs greater than 6.0 cm.
Magister MJ, Chaikhoutdinov I, Schaefer E, Williams N, Saunders B, Goldenberg D. Association of Thyroid Nodule Size and Bethesda Class With Rate of Malignant Disease. JAMA Otolaryngol Head Neck Surg. 2015;141(12):1089-1095. doi:10.1001/jamaoto.2015.1451
The ability to accurately stratify patients with thyroid nodules (TNs) preoperatively is imperative because most TNs are benign. The reliability of fine-needle aspiration biopsy (FNAB) in large TNs has been questioned in recent literature.
To determine whether TN size affects the reliability of FNAB results, and to determine the rates of malignant disease of each Bethesda class at Penn State Medical Center.
Design, Setting, and Participants
Retrospective electronic medical record review of patients undergoing FNAB followed by thyroidectomy from March 2010 through December 2013 at an academic, tertiary referral center. A total of 297 patients with 326 TNs were identified as part of a consecutive series.
Main Outcomes and Measures
The primary outcome was to determine the rate of malignant disease of TNs smaller than 3.0 cm or 3.0 cm or larger and of each Bethesda class. Statistical analysis included χ2 tests. The secondary outcome was to develop logistic regression models to estimate the probability of malignant disease on final pathologic diagnosis as predicted by TN size as well as TN size in conjunction with Bethesda class.
Of the 297 patients, 233 were female (78.4%). The mean (SD) age was 51.0 (15.4) years. Of the 326 TNs, 143 were malignant on surgical histopathologic analysis (43.9%). The mean TN size was 2.0 (1.4) cm. Rates of malignant disease for Bethesda classes 1 to 6 were 0% (95% CI, 0%-26.0%), 6.0% (95% CI, 1.7%-14.6%), 30.2% (95% CI, 18.3%-44.3%), 23.5% (95% CI, 14.8%-34.2%), 72.4% (95% CI, 52.8%-87.3%), and 98.8% (95% CI, 93.5%-99.9%), respectively. Overall sensitivity and specificity (excluding class 1 TNs) were 97.2% and 36.8%, respectively. The false-negative rate of benign cytologic results was 6.0% (95% CI, 1.7%-14.6%); only 1 false-negative result occurred in TNs 3.0 cm or greater. Of the TNs smaller than 3.0 cm, 48.4% were malignant compared with 33.3% of TNs 3.0 cm or greater (P = .049). Both Bethesda class and TN size were significant variables (P < .05) within our logistic regression models indicating that higher Bethesda class and TN size smaller than about 2.0 cm were associated with increased probabilities of malignant disease.
Conclusions and Relevance
Our results suggest that smaller TNs (smaller than about 2.0 cm) are associated with increased probabilities of malignant disease irrespective of Bethesda class. Routine diagnostic thyroid lobectomy solely owing to TN size of 3.0 cm or greater need not be performed.
The prevalence of thyroid nodules (TNs) increases linearly with age1 and has been found to be as high as 67%.2 Most TNs are benign, with only approximately 1 in 20 clinically identified nodules proving to be malignant.3 Although the likelihood of a given TN to be malignant is low, the incidence of thyroid cancer has been steadily rising in recent years, with annual increases of 5.3% and 4.5% for men and women, respectively.4 In 2015, there will be an estimated 62 450 new cases of thyroid cancer with an associated 1950 related deaths.4
Owing to its efficiency, reliability, and cost-effectiveness, fine-needle aspiration biopsy (FNAB) has become the diagnostic modality of choice for preoperatively evaluating TNs.3,5-7 In recent years, the regular use of ultrasonography (US) to aid in FNAB has served to increase the accuracy of the procedure and decrease potential sampling errors, especially in small and impalpable TNs.8-11 At experienced centers, FNAB has significantly reduced the total number of thyroidectomies and lowered associated health care costs.1 Historically, the terminology used to describe TN FNAB cytology was extremely variable but largely classified TNs as nondiagnostic, benign, malignant, or indeterminate.12,13 In 2009, the National Cancer Institute12 set forth a standardized method of classifying TN FNAB cytologic findings into 1 of 6 diagnostic categories generally known today as the Bethesda system for reporting thyroid cytopathologic results (TBSRTC). Since its inception, the TBSRTC has proven to be a reliable and reproducible method of stratifying TNs.14
Surgical treatment of TN hinges on FNAB and resultant cytologic findings. Previously, it was asserted that large TNs (≥3.0-4.0 cm) have an increased false-negative rate on FNAB, as high as 30%, compared with smaller nodules.15-20 Based on these reports, some clinicians have proposed performing diagnostic thyroid lobectomy based solely on TN size.15-20 Conversely, some studies21-27 suggest that increasing TN size does not adversely affect the accuracy and predictive value of FNAB. In this study, we sought to investigate the accuracy and predictive value of FNAB on TNs at our institution following the application of TBSRTC, specifically as it relates to the size of the TN. In addition, the rates of malignant disease ultimately associated with the cytopathologic characteristics of each Bethesda class were investigated.
Institutional review board approval from The Pennsylvania State University was obtained for this study. A retrospective medical record review of patients evaluated for TNs at the Penn State Hershey Medical Center (PSHMC) was performed from March 2010 (date of implementation of TBSRTC at our institution) through December 2013. Patients were evaluated in either the otolaryngology–head and neck surgery clinic or endocrine surgery clinic. A total of 326 discrete nodules in 297 patients were identified by way of a systematic review of the electronic medical record by cross-referencing FNAB Current Procedural Terminology (CPT)-4 code 10022 with operative CPT-4 codes 60200 to 60271. To meet inclusion criteria, patients had to be older than 18 years and have undergone preoperative FNAB followed by thyroidectomy or hemithyroidectomy. Exclusion criteria included pregnancy and/or incarceration at the time of surgery.
Information extracted from the electronic medical record included patient demographics, history of radiation exposure, cytologic classification based on TBSRTC, number and location of nodules, and final histopathologic diagnosis. Nodule size was determined as the largest TN dimension at the time of pathologic assessment of the thyroid specimen. Primary nodules were defined as the index TNs that underwent US-guided FNAB. Thyroid nodules of the same patient were treated independently when clear distinction between nodules was found both preoperatively and on final pathologic diagnosis. Papillary microcarcinomas (malignant lesions <1.0 cm) were considered malignant in the final analysis only when the microcarcinoma occurred within the index TN. Careful attention was given to ensure that all documented pathologic abnormalities, both benign and malignant, occurred within the index TN. In instances in which a clear correlation could not be made, the TN was excluded from the analysis. In cases in which multiple FNAB results were found for the same TN, insufficient aspirates were disregarded when followed by a sufficient study. Furthermore, if 2 sufficient studies were performed on the same TN with differing cytologic findings, the TN was classified as having the more suspicious of the 2 cytologic results.
More than 92% of the FNABs were performed by 1 of 11 board-certified radiologists at PSHMC with the remainder being performed by 1 of 2 board-certified endocrine or head and neck surgeons. All FNABs were performed using US guidance to directly visualize the needle tip within the nodule of interest; FNAB was performed primarily with 27-gauge needles and capillary action without a syringe or aspirator. The needle was attached to an air-filled syringe to express the specimen on a glass slide. A second glass slide was used to smear the specimen into a thin layer, creating 2 direct smears from each pass. One smear was immediately submerged in alcohol for Papanicolaou stain. The other was air dried for Diff-Quik stain. The needles were rinsed in a methanol-based fixative, and a monolayer slide was made from the sediment. Each FNAB consisted of 3 to 5 passes. The FNABs were evaluated by 1 of 5 board-certified pathologists (one of whom was N.W.) with more than 5 years of cytopathology experience at PSHMC. Eleven of the 326 TNs included in this study underwent FNAB at another facility; all of these FNABs were reviewed and confirmed by cytopathologists within our institution.
Cytologic evaluation was performed with strict adherence to current diagnostic criteria and reported using TBSRTC, assigning TNs to 1 of 6 diagnostic categories: nondiagnostic (class 1), benign (class 2), atypia of undetermined significance (AUS)/follicular lesion of undetermined significance (FLUS) (class 3), suspicious for follicular neoplasm/follicular neoplasm (class 4), suspicious for malignant disease (class 5), and malignant (class 6).12
The 12 class 1 TNs in our data set were reported in the summary tables but were excluded when determining FNAB test characteristics and other statistical analysis. In calculating FNAB sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV), only a class 2 cytologic finding was considered a negative test result; class 3 to 6 cytologic findings were considered positive test results. The false-negative rate was defined as a TN having a benign FNAB result in which the nodule was later found to harbor malignant disease on surgical histopathologic examination. Surgical histopathologic results were considered the reference criteria for defining malignant disease in any given TN. Exact binomial 95% CIs were calculated for sensitivity, specificity, PPV, NPV, and all other percentages.
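The classification rules above can be sketched in a few lines of Python. This is an illustrative sketch only, not the analysis code used in the study: the function name `fnab_test_characteristics` and the toy records are hypothetical, and the false-negative rate is computed as the share of cytologically benign (class 2) TNs that were malignant on histopathologic examination, which is how the rate is reported in this article.

```python
from typing import Dict, Iterable, Tuple


def fnab_test_characteristics(records: Iterable[Tuple[int, bool]]) -> Dict[str, float]:
    """Compute FNAB test characteristics from (bethesda_class, malignant) pairs.

    Class 1 (nondiagnostic) is excluded, class 2 counts as a negative test,
    and classes 3 to 6 count as a positive test. Histopathology is the reference.
    Note: no guard against empty cells (division by zero) in this sketch.
    """
    tp = fp = tn = fn = 0
    for bethesda_class, malignant in records:
        if bethesda_class == 1:
            continue  # nondiagnostic aspirates are left out of the 2x2 table
        test_positive = bethesda_class >= 3
        if test_positive and malignant:
            tp += 1
        elif test_positive:
            fp += 1
        elif malignant:
            fn += 1
        else:
            tn += 1
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        # Malignant nodules among those with benign cytology (equals 1 - NPV).
        "false_negative_rate": fn / (tn + fn),
    }


# Hypothetical toy records, not study data.
toy_records = [
    (1, False), (2, False), (2, False), (2, True),
    (3, False), (4, True), (5, True), (6, True),
]
toy_result = fnab_test_characteristics(toy_records)
print(toy_result)
```

Because every class 3 to 6 result is treated as a positive test, benign indeterminate nodules count as false positives, which tends to lower specificity and PPV.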
We used logistic regression to model TN malignant disease as a function of Bethesda class and nodule size. Two models were developed. The first model included only TN size. The second model included both TN size and Bethesda class. A natural spline with 3 df28 was used to estimate a nonlinear relationship between TN size and the log odds of malignant disease in each model. For the second model, an interaction between TN size and Bethesda class was investigated but ultimately dropped from the model owing to a nonsignificant result (P = .88). Results of the logistic regression models were reported using predicted probabilities. For each model, we also calculated the concordance index (C-index) as a measure of predictive ability. The C-index is calculated as the proportion of all pairs of TNs with different outcomes (benign vs malignant) in which the TN with higher predicted probability of malignant outcome was indeed the TN with a malignant outcome.
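The C-index definition above translates directly into a pairwise comparison. The following is a minimal sketch on hypothetical predicted probabilities, not the study's model output; the function name `c_index` is illustrative, and counting tied predictions as half-concordant is an assumption, since the definition given here does not address ties.

```python
from itertools import product
from typing import Sequence


def c_index(predicted: Sequence[float], malignant: Sequence[bool]) -> float:
    """Proportion of (malignant, benign) pairs ranked correctly by the model."""
    p_malignant = [p for p, m in zip(predicted, malignant) if m]
    p_benign = [p for p, m in zip(predicted, malignant) if not m]
    if not p_malignant or not p_benign:
        raise ValueError("need at least one malignant and one benign TN")
    score = 0.0
    for p_mal, p_ben in product(p_malignant, p_benign):
        if p_mal > p_ben:
            score += 1.0  # concordant pair
        elif p_mal == p_ben:
            score += 0.5  # assumption: ties count as half-concordant
    return score / (len(p_malignant) * len(p_benign))


# Hypothetical predicted probabilities and outcomes for four TNs.
example_c = c_index([0.9, 0.8, 0.3, 0.2], [True, False, True, False])
print(example_c)
```

A value of 0.5 corresponds to ranking no better than chance, and 1.0 to every malignant TN receiving a higher predicted probability than every benign TN.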
After excluding patients with a primary TN smaller than 0.5 cm27 (3), and those who did not have a preoperative FNAB performed and/or reviewed within our institution (15), 297 patients with 326 distinct primary TNs who met the inclusion criteria for this study remained for analysis. Our sample comprised 233 women and 64 men, a ratio of 3.6:1.0. The mean (SD) age was 51.0 (15.4) years (range, 15-89 years) (Table 1).
Of the 326 TNs, 12 (3.7%) were class 1, 67 (20.6%) were class 2, 53 (16.3%) were class 3, 81 (24.8%) were class 4, 29 (8.9%) were class 5, and 84 (25.8%) were class 6. Thyroid nodules ranged in size from 0.5 to 8.8 cm with a mean (SD) size of 2.0 (1.4) cm (Table 1). There was a total of 60 TNs smaller than 1.0 cm; 130 TNs were 1.0 to 1.9 cm, 73 were 2.0 to 2.9 cm, 31 were 3.0 to 3.9 cm, and 32 were 4.0 cm or greater. Age and sex were generally similar among patients in the Bethesda classes (Table 2).
The overall rate of malignant disease as determined by surgical histopathologic examination was 43.9% (143 of 326). Rates of malignant disease by Bethesda class were 0%, 6.0%, 30.2%, 23.5%, 72.4%, and 98.8% for patients in Bethesda classes 1 to 6, respectively (Table 3). These were generally within the expected ranges as prescribed by TBSRTC with the exception of class 3, which had roughly 2 times as many malignant TNs. The distributions of TN size among Bethesda classes showed only small variations (Figure 1). The class 2 TNs, however, did have a slightly higher median nodule size (just over 2.0 cm) and a slightly more skewed distribution. Papillary thyroid carcinoma (PTC) was the most common overall type of malignant disease followed by PTC, follicular variant (Table 3).
For only 59.8% of TNs was there specific mention as to the presence or absence of a history of radiation exposure. Of these, 23 TNs were associated with positive history of radiation exposure. Radiation exposure was not a statistically significant confounder when TNs were stratified by Bethesda class (P = .94), TN size (P = .20), or histopathologically diagnosed malignant disease (P = .83).
After excluding class 1 TNs and considering benign FNAB cytologic findings as a negative test result and higher Bethesda classes (3-6) as a positive test result, the sensitivity and specificity were 97.2% (95% CI, 93.0%-99.2%) and 36.8% (95% CI, 29.6%-44.5%), respectively. The PPV was 56.3% (95% CI, 49.8%-62.6%), and the NPV was 94.0% (95% CI, 85.4%-98.3%). Our data indicate a low false-negative rate of 6.0% (4 of 67), with only 1 false-negative occurring in TNs 3.0 cm or greater. Likewise, when considering only TNs with malignant FNAB, our data indicate that only 1 TN (1.2% [95% CI, 0%-6.5%]) proved to be falsely malignant on cytologic examination.
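As an illustration of how an exact binomial interval of the kind quoted above can be obtained, the sketch below computes a Clopper-Pearson interval by bisection on the binomial distribution. It assumes that "exact binomial" refers to the Clopper-Pearson method, which the article does not state explicitly; the function names are hypothetical, and the only study figure used is the reported 4 false-negative results among 67 class 2 TNs.

```python
from math import comb
from typing import Callable, Tuple


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve_increasing(f: Callable[[float], float], target: float, iters: int = 60) -> float:
    """Bisection on [0, 1] for f(p) = target, where f is increasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x: int, n: int, alpha: float = 0.05) -> Tuple[float, float]:
    """Two-sided exact (Clopper-Pearson) CI for x events in n trials."""
    # Lower limit: the p at which P(X >= x) equals alpha / 2.
    lower = 0.0 if x == 0 else _solve_increasing(
        lambda p: 1 - binom_cdf(x - 1, n, p), alpha / 2)
    # Upper limit: the p at which P(X <= x) equals alpha / 2.
    upper = 1.0 if x == n else _solve_increasing(
        lambda p: 1 - binom_cdf(x, n, p), 1 - alpha / 2)
    return lower, upper


# Reported false-negative count: 4 malignant TNs among 67 class 2 TNs.
fn_lower, fn_upper = clopper_pearson(4, 67)
print(f"{4 / 67:.1%} ({fn_lower:.1%}-{fn_upper:.1%})")
```

Small differences from published limits can arise from rounding or from software-specific conventions, so the output should be read as a check on the order of magnitude rather than a reproduction of the original analysis.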
Thyroid nodules smaller than 3.0 cm accounted for 80.7% of our data set. When stratifying only by TN size, nodules smaller than 3.0 cm had a 48.4% (95% CI, 42.1%-54.8%) rate of malignant disease compared with a 33.3% (95% CI, 21.7%-46.7%) rate of malignant disease observed in nodules 3.0 cm or greater. Although the difference is statistically significant (P = .049), this analysis required an arbitrary cutoff of 3.0 cm, which was chosen based on previously reported literature.15,17,22,23,26 As an alternative, we used the actual TN size in a logistic regression model and observed that smaller TNs (<1.6 to 2.0 cm) showed an increased probability of malignant disease (P < .001). As Figure 2A shows, at approximately 2.0 cm the slope of the estimated probability changes. The fitted model indicates that although the overall probability of malignant disease continued to decrease as TN size increased, the change was less dramatic for TNs greater than 2.0 cm. The C-index of this model was 0.64.
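The size comparison above is a test on a 2 × 2 table, which can be sketched as follows. Two assumptions should be stated plainly. First, the cell counts (123 of 254 malignant for TNs smaller than 3.0 cm and 20 of 60 for TNs 3.0 cm or greater) are not reported in the text; they are back-calculated from the reported percentages, the 143 malignant TNs, and the 314 TNs remaining after class 1 exclusion. Second, the use of a continuity (Yates) correction is assumed, because the article says only that χ2 tests were used. Under these assumptions the result lands near the reported P value.

```python
from math import erfc, sqrt
from typing import Tuple


def chi2_yates_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Yates-corrected chi-square statistic and P value for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    # Continuity correction: shrink |ad - bc| by n / 2, never below zero.
    corrected = max(abs(a * d - b * c) - n / 2, 0.0)
    chi2 = n * corrected**2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


# Back-calculated (assumed) counts: rows are size groups, columns malignant vs benign.
small_malignant, small_benign = 123, 254 - 123  # TNs smaller than 3.0 cm
large_malignant, large_benign = 20, 60 - 20     # TNs 3.0 cm or greater
chi2_stat, p_val = chi2_yates_2x2(small_malignant, small_benign,
                                  large_malignant, large_benign)
print(f"chi2 = {chi2_stat:.2f}, P = {p_val:.3f}")
```

The P value sits close to the .05 threshold and depends on where the size cutoff is placed, which is part of the motivation for modeling TN size as a continuous variable.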
Stratifying our data set by both TN size and Bethesda class reduced some of the subclassifications to very small numbers (ie, we observed only 4 Bethesda class 5 TNs ≥3.0 cm). Because of this, a second logistic regression model was used to estimate the probability of malignant disease as a function of both Bethesda class and nodule size (Figure 2B). Both Bethesda class and TN size were statistically significant variables in this model with P < .001 and P = .02, respectively. As was expected, the probability of malignant disease substantially increased as Bethesda class increased. Each probability curve in Figure 2B suggests that smaller nodules (approximately <2.0 cm) had the highest probability of malignant disease for any given Bethesda class except class 6 TNs, which were all almost certain to be malignant. The C-index of this logistic regression model was 0.91, rendering it a highly predictive model.
This study investigates the accuracy and predictive value of US-guided FNAB of TNs at a single institution since the routine implementation of TBSRTC. The effect of TN size on both FNAB and final histopathologic diagnosis was assessed. Our data indicate that TNs 3.0 cm or larger were associated with decreased rates of malignant disease compared with smaller TNs. From our logistic regression model, our data suggest that smaller TNs (approximately <2.0 cm) were associated with increased probabilities of malignant disease on final pathologic diagnosis. These results are similar to those found by Shrestha et al,27 which show accuracy and specificity of FNAB increase as TN size increases, suggesting that TNs 4.0 cm or larger are not associated with an increased FNAB false-negative rate. In addition, Shrestha et al27 demonstrated a tendency of TNs smaller than 1.0 cm to have higher rates of malignant disease and false-negative rates compared with TNs that are 1.0 to 3.9 cm and 4.0 cm or greater in size. These findings, however, were not statistically significant.
Various other studies have also sought to elucidate the correlation between TN size and the accuracy of FNAB.15-26 They have done so, however, with varying results, leading to the current differing opinions as to whether TN size, in itself, can be considered an independent indication for diagnostic thyroid lobectomy. In 1995, Meko and Norton17 described false-negative rates as high as 30% in “cystic/solid” TNs 3.0 cm or greater. A 2014 systematic review by Shin et al19 suggested reduced FNAB diagnostic accuracy in TNs greater than 3.0 to 4.0 cm. Two recent studies also report false-negative rates of 11%15 and 10.4%20 in cytologically benign TNs 4.0 cm or greater following US-guided FNAB. Our study reports a false-negative rate of 3.8% (1 of 26) in class 2 TNs 3.0 cm or greater.
Other authors have argued the contrary, stating that the validity of FNAB is not adversely affected by the size of a TN. According to the TBSRTC, cytologically benign TNs are expected to have a 1% to 3% risk of malignant disease.12 In a study performed at the Mayo Clinic, Porterfield et al23 reported a false-negative rate of 0.7% (1 of 145) when comparing TNs having a benign cytologic result obtained under US guidance with their postsurgical histopathologic examination. This rate is among the lowest in the reported literature. In a similar study evaluating only TNs 3.0 cm or greater, Yoon et al26 demonstrated a false-negative rate of 1.8% (2 of 112 TNs). Kuru et al29 reported a slightly higher false-negative rate of 4.1% (4 of 98) in TNs 4.0 cm or greater but concluded that this rate was still low enough to accurately discern benign from malignant TNs. Others have also demonstrated the uniformity of FNAB in nodules of all sizes. Results of a series of nearly 1000 patients published by Varshney et al25 revealed no significant difference in US-guided FNAB sensitivity, specificity, PPV, or NPV between TNs smaller than 4.0 cm and those 4.0 cm or greater; these results were true for all classes of FNAB cytologic findings studied (benign, indeterminate, and malignant and/or suspicious for malignant disease). The same study also showed an increased rate of malignant disease in small vs large (<4.0 cm vs ≥4.0 cm) cytologically benign TNs (28.2% vs 15.7%), although this was without stated statistical significance.25
To further evaluate the potential of smaller TNs being associated with higher rates of malignant disease in light of our relatively limited data set, we used logistic regression models to evaluate TN size instead of an arbitrary cutoff point of 3.0 cm. An arbitrary cutoff point of 3.0 cm makes the implicit assumption that nodules of size 2.9 and 3.1 cm are wholly different. Modeling the data using logistic regression allows for the data, not the investigator, to dictate the “cutoff” value, if there truly is one. In the logistic regression model, which included both tumor size and Bethesda class, Bethesda class was the single most important factor in predicting malignant disease, but size was also an important predictor. Figure 2 shows a distinct change in the probability curves above 1.6 to 2.0 cm, indicating that above this threshold, the probability of malignant disease does not drastically change with increasing TN size.
The overall rate of malignant disease for each Bethesda class within our data did not substantially differ from what is expected based on TBSRTC’s approximation, except for class 3. According to TBSRTC, the expected rate of malignant disease for class 3 TNs is 5% to 15%.12 Our data, however, revealed that 30.2% of these nodules were malignant. Similarly, a 2014 study from Memorial Sloan Kettering Cancer Center suggested an increased rate of malignant disease in class 3 nodules of 26.6% to 37.8%.30 In a study focusing on malignant risk factors of class 3 TNs, Ryu et al31 demonstrated that multiple AUS results from repeated FNABs and TNs smaller than 2.0 cm were risk factors for malignant disease on univariate analysis. The increased rate of malignant disease observed in our class 3 TNs may also partially explain the lower than expected specificity and PPV of our overall FNAB results. While sensitivity, specificity, PPV, and NPV are important metrics to evaluate a diagnostic test, in evaluating TNs, the local rates of malignant disease associated with each Bethesda class are among the most important factors to consider in arriving at an operative or nonoperative treatment plan.
The current study has several limitations. Our data represent the experience of only a single academic, tertiary referral center, and therefore the study is limited in size. Given that this was a retrospective analysis, there was no way to standardize the inherent variability associated with operative decision making. Furthermore, we used data only for patients who eventually went on to receive thyroid lobectomy or total thyroidectomy following a preoperative FNAB, as evidenced by our high overall rate of malignant disease of 43.9%. As such, it is not within the scope of this study to assess which patients opted for observation. Finally, we acknowledge that there is no way to completely eliminate all interpersonal variability within the reporting of FNAB pathologic findings.
Our study demonstrates that large TNs of any given Bethesda class are not associated with an increased probability of malignant disease beyond that which is expected based on their cytologic classification. Furthermore, our data suggest that smaller TNs, as opposed to larger TNs, pose a relatively increased risk of malignant disease and should be viewed with caution. Specifically with regard to class 3 TNs, our data support the possibility that at some comprehensive centers, the rate of malignant disease may be considerably higher than traditionally assumed. Overall, we do not support the use of diagnostic lobectomy based solely on large TN size.
Corresponding Author: David Goldenberg, MD, The Pennsylvania State University, College of Medicine, Division of Otolaryngology–Head and Neck Surgery, 500 University Dr, H091, Hershey, PA 17033-0850 (firstname.lastname@example.org).
Published Online: August 20, 2015. doi:10.1001/jamaoto.2015.1451.
Author Contributions: Dr Magister had full access to all data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Study concept and design: Magister, Chaikhoutdinov, Saunders, Goldenberg.
Acquisition, analysis, or interpretation of data: Magister, Schaefer, Williams, Saunders, Goldenberg.
Drafting of the manuscript: Magister, Williams, Goldenberg.
Critical revision of the manuscript for important intellectual content: Magister, Chaikhoutdinov, Schaefer, Saunders, Goldenberg.
Statistical analysis: Schaefer.
Administrative, technical, or material support: Williams, Goldenberg.
Study supervision: Chaikhoutdinov, Saunders, Goldenberg.
Conflict of Interest Disclosures: None reported.
Previous Presentation: This study was presented at the Annual Meeting of the American Head and Neck Society; April 22, 2015; Boston, Massachusetts.
Additional Contributions: Jonathan Derr, Department of Surgery, The Pennsylvania State University, College of Medicine, Hershey, assisted with data acquisition. He was not compensated specifically for his work in this study.