Tissue microarray staining with antibodies to galectin-3 and extracellular matrix metalloproteinase inducer (EMMPRIN). Sample 12191 represents follicular thyroid adenoma (FA) and demonstrates no staining with galectin-3 or EMMPRIN, whereas sample 9197, a follicular thyroid carcinoma (FTC), intensely stains with both antibodies. The positive control sample, in this case for galectin-3, was used for each of the 5 antibodies investigated. *Staining of epithelial cells was positive for galectin-3 and acted as an internal control, but all adenoma cells demonstrated negative immunohistochemical analysis results. Intensity levels are described in the “Tissue Microarray Scoring and Statistical Analysis” subsection of the “Methods” section.
Bryson PC, Shores CG, Hart C, et al. Immunohistochemical Distinction of Follicular Thyroid Adenomas and Follicular Carcinomas. Arch Otolaryngol Head Neck Surg. 2008;134(6):581–586. doi:10.1001/archotol.134.6.581
To use immunohistochemical (IHC) evaluation of proteins encoded by genes that were differentially expressed in follicular thyroid adenomas (FAs) vs follicular thyroid carcinomas (FTCs) to distinguish benign vs malignant follicular thyroid lesions. Multiple gene microarray studies suggest that benign and malignant follicular thyroid neoplasms have different gene expression profiles.
Immunohistochemical analysis of thyroid neoplasms, including FA (n = 62), FTC (n = 62), and follicular variant of papillary thyroid carcinoma (n = 58), using tissue microarrays. We evaluated antibodies to galectin-3, autotaxin, intestinal trefoil factor 3 (TFF3), extracellular matrix metalloproteinase inducer (EMMPRIN), and growth arrest and DNA damage-inducible protein 153 (GADD153). We analyzed data for quantitative differences in IHC intensity and the percentage of positive cells between FAs and combined follicular carcinomas. Sensitivity and specificity analyses are reported, along with a dual-protein clinical algorithm.
Academic tertiary care center.
Adults with known follicular and papillary thyroid lesions that were surgically resected during the past 15 years.
Main Outcome Measures
Sensitivity and specificity of individual and combined antibodies for distinguishing benign from malignant lesions.
Quantitative analysis showed IHC validation of the gene expression differences noted in previously published microarray reports. A significantly higher percentage of FTC cells stained with galectin-3, EMMPRIN, and GADD153. Galectin-3 and EMMPRIN also showed a significantly higher intensity of staining in FTC cells. Compared with malignant lesions, TFF3 stained a greater cell percentage in FAs. Galectin-3 (sensitivity, 0.72; specificity, 0.62) and EMMPRIN (sensitivity, 0.63; specificity, 0.49) had the most promising diagnostic potential with a dual-protein sensitivity of 0.80 and specificity of 0.70. Autotaxin and GADD153 had overall higher sensitivities (0.88 and 0.82, respectively) but very poor specificities (0.02 and 0.21, respectively).
Protein expression data validate the pooled gene expression results that differentiate FTC from FA. Our results show promise for multiple-protein IHC analysis algorithms and their diagnostic ability. Future studies should focus on clinical translation of these molecular differences for the diagnosis of follicular thyroid neoplasms.
The American Cancer Society estimates that there will be 33 550 new diagnoses of thyroid carcinoma in 2007.1 The incidence of thyroid cancer has been reported to be increasing at a rate of 3% per year.2 These patients are identified from a much larger group who present with thyroid nodules, most of which are benign. Fine-needle aspiration (FNA) biopsy is the recommended initial test for the evaluation of solitary thyroid nodules because it is safe, inexpensive, accurate, and office based.3 Although FNA biopsy excels at identifying papillary thyroid carcinoma, a major deficiency of this diagnostic procedure is that follicular thyroid carcinoma (FTC), follicular variant papillary thyroid carcinoma (FVTC), and follicular thyroid adenoma (FA) cannot be differentiated cytopathologically.3 The diagnosis of follicular carcinoma is dependent on the presence of capsular or vascular invasion on formal pathologic evaluation. In roughly 20% of all FNA biopsy specimens, the diagnosis is follicular neoplasm or follicular lesion and is regarded as indeterminate or suspicious.4 Of these indeterminate neoplasms, approximately 80% are benign and 20% are malignant.5,6 As a result, patients with this diagnosis are typically taken to the operating room for a thyroid lobectomy. If the final pathologic reading is carcinoma, most patients return to the operating room for completion thyroidectomy in anticipation of radioactive iodine ablation. The accurate, preoperative diagnosis of follicular thyroid lesions represents a clear diagnostic void that results in many unnecessary thyroid surgeries.
Recent molecular analysis of follicular thyroid lesions suggests that FA and FTC have characteristic microarray expression profiles that differentiate each of these lesions.7-10 As a research tool, DNA microarray gene profiling is expensive, time consuming, and operator dependent. These factors currently make microarray analysis difficult to translate into clinical use; however, these data can be used as a basis for development of more routine clinical tests.
In our study, we used 5 antibodies to proteins whose genes were identified as being differentially expressed in follicular neoplasms via gene microarray analysis7-10 and evaluated the immunohistochemical (IHC) staining of FTC, FA, and FVTC. These antibodies included an extracellular matrix metalloproteinase inducer (EMMPRIN),11 autotaxin (a tumor cell motility-stimulating factor), galectin-3 (a β-galactoside–binding protein that is involved in regulating cell-cell communication),12 trefoil factor 3 (TFF3) (a soluble protein that is secreted with mucin in the colon),13 and growth arrest and DNA damage-inducible protein 153 (GADD153) (a leucine zipper transcription factor that is up-regulated in response to stress).14 The goal of this research was to use our molecular and genetic understanding of follicular thyroid lesions and to translate these advances into a more accurate and serviceable preoperative diagnostic tool for follicular thyroid tumors.
After approval by the institutional review board of the University of North Carolina, thyroid pathology records from the past 15 years were queried for all specimens with a final pathologic diagnosis of FTC, FA, papillary thyroid carcinoma, and FVTC from the University of North Carolina Pathology and Laboratory Medicine surgical pathology database. Specimens with more than 1 diagnosis or histologic finding, such as Hürthle cell changes, were excluded. We identified 62 specimens of FA, 62 specimens of FTC, and 58 specimens of FVTC and obtained corresponding tissue blocks. The final diagnoses of the specimens were verified by independent reviewers (C.H. and L.T.).
Tissue microarrays were constructed by the University of North Carolina Anatomic Pathology Research Core Laboratory. Hematoxylin-eosin–stained slides of thyroid specimens were reviewed by the surgical pathology fellow (C.H.), who confirmed the diagnoses and identified the areas of tumor and normal tissue for each tissue microarray core. With the areas of interest identified, the recipient tissue microarray block was constructed using a manual tissue arrayer (MTA-1; Beecher Instruments Inc, Sun Prairie, Wisconsin). We placed 1-mm cores in the recipient block, heated the block to congeal the samples into the block, and applied a paraffin layer for proper facing. A spreadsheet (Excel; Microsoft Corporation, Redmond, Washington) was constructed using sample accession numbers without identifying the final pathologic finding to allow for blinded grading.
Analysis of published microarray literature generated a list of genes in which messenger RNA (mRNA) levels may distinguish different follicular thyroid pathologies (Table 1). The primary antibodies used in the study were EMMPRIN mouse monoclonal antibody (Santa Cruz Biotechnology, Santa Cruz, California), which had placenta as a positive control tissue and an ideal dilution of 1:100; galectin-3 mouse monoclonal antibody (AbCam, Cambridge, Massachusetts), in which the control tissue was also placenta with an ideal dilution of 1:100; TFF3 mouse monoclonal antibody (CalBiochem, San Diego, California), which had colonic tissue as a positive control tissue and an ideal dilution of 1:50; and GADD153 rabbit monoclonal antibody (Santa Cruz Biotechnology), for which the positive control tissue was skin with an ideal dilution of 1:50. The control tissue for autotaxin mouse monoclonal antibodies was placenta, and the ideal dilution was 1:200. Negative controls for each antibody were performed by removing the primary antibody.
Immunohistochemical staining for each antibody was performed using a commercially available kit (DAKO Cytomation LSAB plus kit; DAKO North America, Inc, Carpinteria, California). We heated 5-μm sections of the tissue microarray blocks at 60 °C for 1 hour, then deparaffinized and dehydrated them. After being rinsed in automation buffer (DAKO North America, Inc), the slides were subjected to antigen retrieval using 1× citra buffer (pH, 6.0) (DAKO North America, Inc) at 100 °C (steam) for 30 minutes and allowed to cool to room temperature. The slides were subjected to a serum-free protein block, peroxidase block, and an avidin/biotin system block before incubation with the primary antibody at the appropriate dilution overnight at 4 °C in a humidified chamber. A universal secondary antibody and streptavidin horseradish peroxidase (LSAB2 kit; DAKO North America, Inc) were applied to each slide after several washes with buffer. Antibodies were visualized using diaminobenzidine solution at room temperature for 5 minutes. The slides were counterstained with hematoxylin, dehydrated, cleared with xylene, and coverslipped using a mounting medium (Permount; Fisher Science, San Francisco, California).
A surgical pathology fellow (C.H.) who was blinded to the arrangement of specimens within the tissue microarray scored the slides (scale, 1-4). The pathologist scored the specimens for staining intensity (0, 1+, 2+, or 3+), staining pattern (cytoplasmic, membranous, or nuclear), and the percentage of tumor and nontumor cell staining (1%-25%, 26%-50%, 51%-75%, or 76%-100%) (see the Figure for examples). The staining intensity and cellular staining percentage data were then subjected to statistical analysis. Comparisons between quantitative protein scores were performed using Prism (GraphPad, San Diego, California) and SAS (SAS Institute, Cary, North Carolina) statistical software to validate other microarray work. For this analysis, we used the Wilcoxon-Mann-Whitney 2-sample test for nonparametric data, with a significance cutoff of P = .05. Sensitivities and specificities were then calculated using the Prism software and a 2 × 2 table contingency calculator in which programming is based on statistics from Ott.15 Likelihood ratios and posttest calculations were performed in accordance with formulas obtained from Fletcher.16 We report 95% confidence intervals along with P values.
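The sensitivity, specificity, and likelihood ratio calculations named in this subsection all derive from a 2 × 2 contingency table. The following is a minimal Python sketch of the standard formulas, not the Prism or contingency-calculator implementation used in the study. It assumes carcinoma is the disease-positive class; the function name `diagnostic_metrics` and the counts in the usage line are hypothetical and are not study data.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Return sensitivity, specificity, and likelihood ratios from 2 x 2 counts.

    tp: carcinomas scored positive    fn: carcinomas scored negative
    fp: adenomas scored positive      tn: adenomas scored negative
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # A perfectly specific test has an unbounded positive likelihood ratio.
    positive_lr = (
        float("inf") if specificity == 1 else sensitivity / (1 - specificity)
    )
    # A test with zero specificity has an unbounded negative likelihood ratio.
    negative_lr = (
        float("inf") if specificity == 0 else (1 - sensitivity) / specificity
    )
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "positive_lr": positive_lr,
        "negative_lr": negative_lr,
    }


# Hypothetical counts for illustration only (not taken from the study tables).
example = diagnostic_metrics(tp=40, fp=20, fn=10, tn=30)
```

A positive likelihood ratio above 1 raises the posttest probability of carcinoma, and a negative likelihood ratio below 1 lowers it; values near 1 change the pretest estimate very little.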
Genes for which mRNA expression levels distinguish differing follicular thyroid abnormalities have been identified in published gene expression microarray studies.7-10 Autotaxin, EMMPRIN, GADD153, and galectin-3 mRNAs are all overexpressed in FTCs compared with FAs (Table 1); conversely, TFF3 mRNA is overexpressed in follicular adenomas compared with follicular carcinomas (Table 1). The proteins were chosen for this study according to the differences noted in these microarray studies and the availability of antibodies for IHC (Table 1).
Autotaxin exhibited cytoplasmic staining of all samples, and 3 of 58 FVTC specimens exhibited nuclear staining. Galectin-3 stained all specimens cytoplasmically with simultaneous nuclear staining in 25 of 62 FA, 22 of 62 FTC, and 21 of 57 FVTC specimens. Cytoplasmic staining was also observed with EMMPRIN, with concurrent membranous staining in 16 of 62 FA, 20 of 62 FTC, and 15 of 57 FVTC specimens. Only cytoplasmic staining was observed with TFF3. All specimens stained cytoplasmically with GADD153, with concurrent nuclear staining in 27 of 62 FA, 24 of 62 FTC, and 31 of 57 FVTC specimens. None of the IHC cellular localizations significantly differentiated FA from FTC. Grading and analysis of IHC for all further reported data are based on cytoplasmic staining scores.
We compared the quantitative analysis of protein cytoplasmic IHC levels between FA and FTC. Comparisons were made between the intensity of staining and the percentage of cells stained (Table 2 and Figure). Galectin-3 (P = .006), EMMPRIN (P = .004), and GADD153 (P = .02) demonstrated significant differences in cell staining scores, with more cancer cells stained than adenoma cells. Furthermore, galectin-3 (P = .002) and EMMPRIN (P = .007) had significantly higher IHC intensity scores. The percentage of cell staining with TFF3 was higher in FA when compared with FTC specimens (P = .002), which correlates with the expected results from the gene array data. Autotaxin uniformly stained all but 1 FA sample and all FTC samples at intensity levels of 2+ and 3+. No significant differences were noted with autotaxin in the intensity (P = .37) or the percentage (P = .08) of cell staining. Mean scoring, standard deviations, and FA vs FTC comparisons are reported for all scoring measures (Table 2).
We performed sensitivity and specificity calculations according to the intensity of cell staining for autotaxin and the percentage of cell staining for all other proteins (Table 3). Results are reported with 95% confidence intervals for comparisons between FA and FTC and between FA and combined carcinomas with follicular histologic findings (FTC plus FVTC) (Table 3). Using an intensity cutoff of 2+, autotaxin had a sensitivity and specificity of 88% and 2%, respectively. Galectin-3 staining demonstrated 72% sensitivity and 62% specificity. The percentage of EMMPRIN staining was 63% sensitive and 49% specific. For GADD153 staining, sensitivity was 82% and specificity was 21%; for TFF3 staining, sensitivity was 40% and specificity was 51%, with a cutoff of less than 50% of cells stained. Many criteria for intensity and cell staining percentage were analyzed; however, the measures reported in Table 3 show the most clinically significant changes to posttest probabilities as determined by routine likelihood ratio calculations (data not shown).
Galectin-3 and EMMPRIN were the only 2 proteins to show a significant intensity and percentage of cell staining scores (Table 2). Both proteins also had the highest clinical utility based on positive likelihood ratio (LR) calculations of all the protein measures analyzed in Table 3 (galectin-3 positive LR, 1.9; EMMPRIN positive LR, 1.2). To evaluate the clinical utility of both proteins in differentiating FA from FTC, a summed scoring analysis was performed (Table 4). The cell percentage score (1-4) for each FA and FTC sample was summed for galectin-3 and EMMPRIN. Scores of 0, 1, and 2 were set as a negative test result and scores of 6, 7, and 8 were set as a positive test result. Approximately 50% of the samples scored as indeterminate (scores of 3, 4, and 5). Using this dual-protein system, a sensitivity of 80% and specificity of 70% were calculated (Table 5) with a positive LR of 2.6 and a negative LR of 0.3. Thus, assuming a 20% pretest probability of follicular cancer with a follicular neoplasm FNA result, then the calculated posttest probability is 39.4% for a positive test result and 6.5% for a negative test result (Table 5).
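The summed scoring rule and the posttest probability arithmetic in this subsection can be sketched in a few lines of Python. This is an illustrative reconstruction under stated assumptions, not the authors' code: the function names are hypothetical, the score thresholds are those stated in the text, and the posttest calculation uses the standard odds form (pretest odds multiplied by the likelihood ratio).

```python
def likelihood_ratios(sensitivity, specificity):
    """Positive and negative likelihood ratios from sensitivity and specificity."""
    positive_lr = sensitivity / (1 - specificity)
    negative_lr = (1 - sensitivity) / specificity
    return positive_lr, negative_lr


def posttest_probability(pretest_probability, likelihood_ratio):
    """Convert a pretest probability to a posttest probability via odds."""
    pretest_odds = pretest_probability / (1 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1 + posttest_odds)


def classify_summed_score(galectin3_score, emmprin_score):
    """Apply the thresholds stated in the text to the summed cell percentage score."""
    total = galectin3_score + emmprin_score
    if total <= 2:
        return "negative"
    if total >= 6:
        return "positive"
    return "indeterminate"  # summed scores of 3, 4, and 5


# Single-protein positive LRs from the reported sensitivities and specificities.
galectin3_lr, _ = likelihood_ratios(0.72, 0.62)
emmprin_lr, _ = likelihood_ratios(0.63, 0.49)

# Posttest probabilities from the reported dual-protein LRs and a 20% pretest probability.
after_positive = posttest_probability(0.20, 2.6)
after_negative = posttest_probability(0.20, 0.3)
```

Rounded to 1 decimal place, the single-protein values agree with the reported positive LRs of 1.9 and 1.2, and the positive-result calculation agrees with the reported 39.4%. With the rounded negative LR of 0.3, the sketch gives roughly 7%, close to but not identical with the reported 6.5%; the difference may reflect rounding of the published inputs.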
At present, most patients with FNA-positive “follicular lesions” and those labeled “indeterminate” are taken to the operating room for a thyroid lobectomy. For those patients with benign adenomas (80%), this is an unnecessary operation. For patients with thyroid carcinoma (20%), a lobectomy is not sufficient, and the patient is returned to the operating room for completion total thyroidectomy, with or without a neck dissection. Our current treatment algorithm results in unnecessary surgery for 80% of patients and inadequate surgery for 20% of patients. Creative use of molecular and genetic differences in follicular thyroid lesions sets up the possibility of translating these advances into a more accurate and clinically feasible preoperative diagnostic tool for follicular thyroid tumors.
In a recent meta-review of thyroid microarray, 21 studies using 10 different gene expression platforms were evaluated.10 In all, 1785 genes were reported as differentially expressed in these studies (827 were upregulated and 958 were downregulated).10 The amount of data is staggering, and the results are difficult to interpret, particularly because most of these studies combined papillary thyroid carcinoma and FTC as a single group to be compared against all adenomas (and in some cases normal thyroid tissue). When considering the clinical translation of basic research, one must keep the clinical question in primary focus. Differentiation of normal thyroid tissue and papillary lesions by means of FNA biopsy is very good, with accuracies of greater than 90%. The primary diagnostic void is differentiating FTC from FA. Only 4 of 21 reviewed studies specifically evaluated the gene differences between these lesions.7-9,17 We reviewed these studies and formulated a combined gene list to guide the selection of the 5 antibodies used when performing IHC on tissue microarrays to differentiate follicular neoplasms.
Our results demonstrate 2 important points. First, microarray gene expression data correlate with results of IHC quantitative analysis (Table 2). Four of our 5 antibodies correctly correlated with the expected gene expression results (Tables 1 and 2). Only autotaxin did not reveal significant differences in our IHC assay. It is probable that gene chips and reverse transcription–polymerase chain reaction are much more sensitive at detecting quantitative differences than IHC grading is, and the uniform positive staining with autotaxin in our results likely represents these sensitivity differences. Second, although these quantitative results help to validate the microarray results of others, they do not imply clinical utility. Of the 4 antibodies that demonstrated quantitative differences, GADD153 and TFF3 had poor sensitivities and specificities; however, galectin-3 and EMMPRIN had reasonable sensitivities and specificities, with the dual-protein analysis having even better utility at differentiating FA from FTC. Our scoring system used scores of 0 to 3 for intensity, and percentages of 10%, 20%, or greater for cellular staining, which is in accordance with accepted scoring systems, although there is no true standard scoring system for IHC.18,19
In a study by Chen et al,11 EMMPRIN was implicated in the metastatic potential of FTCs. Barden et al8 confirmed the differential expression of EMMPRIN by means of immunoblot analysis; however, its diagnostic potential was not evaluated. Galectin-3 is a member of the β-galactoside–binding proteins that is involved in regulating cell-cell communication and has been implicated in the initiation and regulation of cell growth and malignant transformation.12 A multicenter study demonstrated that IHC detection of galectin-3 in 1009 FNA biopsy samples from thyroid nodules served as a highly specific and sensitive means of identifying malignant lesions of the thyroid.20 They reported a sensitivity and specificity of galectin-3 in discriminating benign from malignant thyroid lesions at 99% and 98%, respectively.20 This study is very compelling and well done, but one limitation is the inclusion of a data set in which most of the data were derived from papillary thyroid carcinoma. This limitation has been borne out by at least 3 other studies whose results refute the utility of galectin-3 as a single-protein marker, especially in the case of follicular lesions.17,21,22 At least 5 studies have suggested TFF3 as a useful biomarker at the RNA level in thyroid cancer, and this gene was listed as 1 of 12 (of a total of 1785 genes) to have the most clinical promise (based on results of the array analysis) for the preoperative thyroid diagnosis.10 Our IHC results did not substantiate the anticipated promise of TFF3 as a clinical marker. To our knowledge, no studies other than ours have evaluated TFF3 at the protein level in thyroid tissues.
A review of more than 20 years of protein data in thyroid tumors suggests that a single protein is unlikely to be a sufficient biomarker for FTC.10,12,23 If we applied the sensitivity and specificity results from our dual-protein analysis (Table 4) (discussed in the “Dual-Protein Analysis” subsection of the “Results” section) to an estimated 60 000 follicular FNA biopsies with indeterminate results per year,24 approximately 9600 patients would avoid return trips to the operating room for a completion thyroidectomy. Approximately 2400 patients with an indeterminate FNA biopsy finding would have a false-negative result; however, this population would receive a correct diagnosis after formal pathologic evaluation of the lobectomy specimen. An estimated 14 400 patients would have an unnecessary total thyroidectomy, rather than a lobectomy. Thus, it is likely that multiple-protein analysis will be the key in advancing the sensitivities and specificities of IHC testing to reduce the percentage of false-positive results.
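The population estimates in the preceding paragraph follow from simple arithmetic on the dual-protein sensitivity (0.80) and specificity (0.70). A minimal Python sketch is given here, assuming the 20% malignancy rate among indeterminate FNA results cited earlier in the article; the function name `project_outcomes` is hypothetical.

```python
def project_outcomes(n_indeterminate, prevalence, sensitivity, specificity):
    """Project test outcomes onto a population of indeterminate FNA results."""
    malignant = n_indeterminate * prevalence
    benign = n_indeterminate - malignant
    true_positives = malignant * sensitivity
    false_negatives = malignant - true_positives
    false_positives = benign * (1 - specificity)
    true_negatives = benign - false_positives
    # Round to whole patients to absorb floating-point noise.
    return {
        "true_positives": round(true_positives),
        "false_negatives": round(false_negatives),
        "false_positives": round(false_positives),
        "true_negatives": round(true_negatives),
    }


projection = project_outcomes(
    n_indeterminate=60_000,  # estimated indeterminate follicular FNA biopsies per year
    prevalence=0.20,         # assumed share of indeterminate lesions that are malignant
    sensitivity=0.80,
    specificity=0.70,
)
```

Under these assumptions the projection yields 9600 true-positive, 2400 false-negative, and 14 400 false-positive results, matching the figures in the text; the remaining 33 600 benign lesions would be correctly classified as negative.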
In summary, the clinical translation of genomic (and in the future proteomic) markers, in addition to FNA biopsy, may allow endocrinologists and surgeons to more accurately advise patients about the necessity and extent of surgery for thyroid carcinomas and possibly to offer insight into the prognosis postoperatively.
Correspondence: Adam M. Zanation, MD, Department of Otolaryngology–Head and Neck Surgery, University of North Carolina at Chapel Hill, CB 7070, Chapel Hill, NC 27599 (firstname.lastname@example.org).
Submitted for Publication: July 24, 2007; final revision received November 1, 2007; accepted November 13, 2007.
Author Contributions: Drs Bryson, Shores, and Zanation and Mr Richey had full access to all the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis. Study concept and design: Bryson, Shores, Richey, and Zanation. Acquisition of data: Bryson, Hart, Thorne, and Farag. Analysis and interpretation of data: Bryson, Shores, Hart, Patel, Richey, and Zanation. Drafting of the manuscript: Bryson and Zanation. Critical revision of the manuscript for important intellectual content: Bryson, Shores, Hart, Thorne, Patel, Richey, and Farag. Statistical analysis: Richey and Zanation. Obtained funding: Bryson and Shores. Administrative, technical, and material support: Bryson, Hart, Thorne, and Richey. Study supervision: Bryson, Shores, and Zanation.
Financial Disclosure: None reported.
Funding/Support: Autotaxin mouse monoclonal antibodies were a generous gift from Tim Clair, PhD, at the National Cancer Institute, National Institutes of Health.
Previous Presentation: This study was presented at the 2007 American Head and Neck Society Annual Meeting; April 28, 2007; San Diego, California.