A total of 599 patients reported voice changes, 272 had a Voice Handicap Index-10 (VHI-10) score greater than 11, and 105 had a diagnosis of vocal fold motion impairment (VFMI). The composite VHI-10 score for each patient (range, 0-40) is the sum of the 10 elements of the VHI-10 questionnaire. Each element is scored on a Likert scale (0, never; 1, almost never; 2, sometimes; 3, almost always; 4, always).
Kovatch KJ, Reyes-Gastelum D, Hughes DT, Hamilton AS, Ward KC, Haymart MR. Assessment of Voice Outcomes Following Surgery for Thyroid Cancer. JAMA Otolaryngol Head Neck Surg. 2019;145(9):823–829. doi:10.1001/jamaoto.2019.1737
What is the association between thyroid cancer surgery and postoperative voice outcomes?
In this population-based study of 2325 patients aged 17 to 89 years diagnosed as having differentiated thyroid cancer, abnormal voice was noted in 272 patients following surgery for thyroid cancer.
These findings suggest a need for heightened awareness of voice abnormalities following surgery and warrant consideration in the preoperative risk-benefit discussion, planned extent of surgery, and postoperative rehabilitation.
An increasing number of surgeries are being performed for differentiated thyroid cancer (DTC). Long-term voice abnormalities are a known risk of thyroid surgery; however, few studies have used validated scales to quantify voice outcomes after surgery.
To identify the prevalence, severity, and factors associated with poor voice outcomes following surgery for DTC.
Design, Setting, and Participants
A cross-sectional, population-based survey was distributed from February 1, 2017, to October 31, 2018, via a modified Dillman method to 4185 eligible patients. Responses were linked to Surveillance, Epidemiology and End Results (SEER) data from the SEER sites in Georgia and Los Angeles, California. Multivariable logistic regression and zero-inflated negative binomial analysis were performed to determine factors associated with abnormal voice. Participants included patients undergoing surgery for DTC between January 1, 2014, and December 31, 2015. Those with voice abnormalities before surgery were excluded.
Main Outcomes and Measures
Abnormal Voice Handicap Index-10 (VHI-10) score, defined as a score greater than 11. The VHI-10 is designed to quantify 10 psychosocial consequences of voice disorders, each scored on a Likert scale (0, never, to 4, always).
A total of 2632 patients (63%) responded to the survey and 2325 met the inclusion criteria. With data reported as unweighted number and weighted percentage, 1792 were women (77.4%); weighted mean (SD) age was 49.4 (14.4) years. Of these, 599 patients (25.8%) reported voice changes lasting more than 3 months following surgery, 272 patients (12.7%) were identified as having an abnormal VHI-10 score, and 105 patients (4.7%) reported vocal fold motion impairment diagnosed by laryngoscopy. In multivariable analysis, factors associated with an abnormal VHI-10 score included age 45 to 54 years (reference, ≤44 years; odds ratio [OR], 1.49; 95% CI, 1.05-2.11), black race (OR, 1.73; 95% CI, 1.14-2.62), Asian race (OR, 1.66; 95% CI, 1.08-2.54), gastroesophageal reflux disease (OR, 1.67; 95% CI, 1.15-2.43), and lateral neck dissection (OR, 1.99; 95% CI, 1.11-3.56).
Conclusions and Relevance
The high prevalence of abnormal voice measured with the validated VHI-10 emphasizes the need for heightened awareness of voice abnormalities following surgery. It warrants consideration in the preoperative risk-benefit discussion, planned extent of surgery, and postoperative rehabilitation.
As the incidence of differentiated thyroid cancer (DTC) has risen greatly in the past 3 decades, an increasing number of thyroid surgeries are being performed.1 An estimated 118 000 to 166 000 thyroid surgeries are performed each year, many of them for DTC.2,3 While the mortality rates of DTC remain low, morbidity associated with thyroid surgery is a concern. One of the more common adverse effects following thyroid surgery is a change in voice. This change may be related to iatrogenic injury to the recurrent laryngeal nerve (RLN) or superior laryngeal nerve during surgery, or to direct cancer involvement.3-5 However, vocal fold paralysis or paresis (vocal fold motion impairment [VFMI]) can occur even when the RLN is left anatomically intact. Voice changes following surgery may also be present even when no risk factors, surgical complications, or signs of VFMI are readily apparent.6 Transient voice disturbance may be identified in up to 80% of patients after thyroidectomy, with approximately 10% showing temporary RLN injury.7,8 The burden of persistent voice changes at long-term follow-up is not well described.
Voice changes and VFMI following thyroid surgery are likely underrecognized. This is at least in part owing to the rare use of validated scales to assess the effect of thyroid surgery on patients’ voice, and to the variable use of routine preoperative and postoperative laryngoscopy.3,9,10 A number of validated scales assessing patient voice exist,3,9-16 including the Voice Handicap Index-10 (VHI-10).12 However, these scales have primarily been used in single-institution studies and studies with small cohorts rather than in large population-based cohorts.17-19
This investigation was designed as a population-based study to assess voice outcomes following surgery using patient VHI-10 questionnaire responses complemented by Surveillance, Epidemiology and End Results (SEER) clinical data. The objectives of this study were to describe the prevalence, severity, and characteristics of voice-related changes following thyroid surgery for DTC. We also sought to identify factors associated with abnormal voice (VHI-10 score >11), including clinical, pathologic, and treatment variables. Patient-reported diagnosis of VFMI by laryngoscopy is reported as a secondary outcome measure.
We conducted a large cross-sectional, population-based survey of patients aged 18 to 79 years diagnosed with DTC between January 1, 2014, and December 31, 2015. Patients were accrued from the SEER registries in Georgia and Los Angeles, California. The recruitment method included a modified Dillman approach, consisting of an initial survey mailing with a cover letter and small financial incentive, followed by telephone call follow-up and mailed reminders to nonresponders.20 Patient-reported survey data were collected between February 1, 2017, and October 31, 2018, at a point 2 to 4 years following diagnosis. Survey responses were linked to existing clinical data from SEER to construct an analytic data set. The study was approved by the University of Michigan, the University of Southern California, the California Protection of Human Subjects Review Board (California State Institutional Review Board), the Georgia Department of Public Health, and the Emory University Institutional Review Board, and received California Cancer Registry approval. A waiver of signed informed consent was provided at the SEER sites. Participants received financial compensation.
The questionnaire content was developed based on the research questions and hypotheses, prior literature on thyroid voice outcomes, and prior work studying differentiated thyroid cancer.21 We used standard techniques to assess content validity, including review by design experts and pilot studies in selected clinic populations.
The VHI-10 questionnaire is an abbreviated version of the VHI described by Jacobson et al11 in 1997. It is designed to quantify the psychosocial consequences of voice disorders (eg, “I feel left out of a conversation because of my voice,” “the clarity of my voice is unpredictable”).12 The survey included all 10 elements of the VHI-10 questionnaire, each scored on a Likert scale (0, never; 1, almost never; 2, sometimes; 3, almost always; 4, always). The 10 item scores were summed to generate a composite VHI-10 score for each patient (range, 0-40). Based on normative data, a VHI-10 score greater than 11 reliably identifies abnormal voice. Accordingly, abnormal voice was defined as a summed score greater than 11.19 Vocal fold paralysis or paresis outcomes were obtained by patient report. VFMI was defined as a positive response to the question, “Have you been diagnosed with vocal cord dysfunction (palsy or paralysis) on laryngoscopy?”
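As an illustration of the scoring rule described above, the following Python sketch sums 10 item responses into a composite score and applies the greater-than-11 cutoff. The function names and the input validation are assumptions for illustration only. They are not part of the study's analysis code.

```python
from collections.abc import Sequence

# Likert coding for each VHI-10 item, as described in the text.
LIKERT_CODES = {
    "never": 0,
    "almost never": 1,
    "sometimes": 2,
    "almost always": 3,
    "always": 4,
}
N_ITEMS = 10
ABNORMAL_CUTOFF = 11  # abnormal voice is a composite score strictly greater than 11


def vhi10_score(item_scores: Sequence[int]) -> int:
    """Sum 10 item scores (each 0-4) into a composite score (range 0-40)."""
    if len(item_scores) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} item scores, got {len(item_scores)}")
    if any(s not in LIKERT_CODES.values() for s in item_scores):
        raise ValueError("each item must be scored 0-4")
    return sum(item_scores)


def vhi10_from_labels(labels: Sequence[str]) -> int:
    """Convert Likert labels (e.g., 'sometimes') to codes, then score them."""
    return vhi10_score([LIKERT_CODES[label.lower()] for label in labels])


def is_abnormal_voice(item_scores: Sequence[int]) -> bool:
    """Apply the study's definition of abnormal voice (score > 11)."""
    return vhi10_score(item_scores) > ABNORMAL_CUTOFF
```

Consider a respondent who answers "sometimes" (code 2) to 6 items and "never" to the rest. That respondent scores 12 and is classified as abnormal. A total of exactly 11 is not abnormal under this definition.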
Variables extracted from the patient survey also included the following:

- patient-reported sex
- race/ethnicity
- diagnosis of gastroesophageal reflux disease (GERD)
- voice changes lasting more than 3 months following surgery
- voice problems before surgery
- number of surgeries performed (single vs multiple)

Variables extracted from the SEER data included age at diagnosis, tumor pathologic features (histologic subtype, size, extrathyroidal extension), disease extent (localized, regional, or distant), and surgical extent. Based on 2 SEER surgical variables, patients were categorized into 1 of 5 groups:

- lobectomy
- total thyroidectomy without lymph node dissection
- total thyroidectomy with lymph node dissection of unknown location
- total thyroidectomy with central neck dissection
- total thyroidectomy with lateral neck dissection

The dissection-location-unknown category comprised patients who had lymph nodes resected but no lymph node metastases. Only patients with positive lymph node metastases could be categorized as having central vs lateral neck dissection.
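The categorization rule above can be sketched as a small decision function. The 2 SEER surgical variables are not named in the text. The inputs here (`procedure`, `nodes_removed`, `positive_node_site`) are therefore hypothetical stand-ins. The sketch also assumes that the site of positive nodes is what distinguishes central from lateral neck dissection.

```python
from enum import Enum
from typing import Optional


class SurgicalExtent(Enum):
    LOBECTOMY = "lobectomy"
    TT_NO_NODES = "total thyroidectomy without lymph node dissection"
    TT_NODES_UNKNOWN = "total thyroidectomy, dissection location unknown"
    TT_CENTRAL = "total thyroidectomy with central neck dissection"
    TT_LATERAL = "total thyroidectomy with lateral neck dissection"


def categorize_extent(
    procedure: str, nodes_removed: int, positive_node_site: Optional[str]
) -> SurgicalExtent:
    """Hypothetical mapping from simplified surgical fields to the 5 study categories."""
    if procedure == "lobectomy":
        return SurgicalExtent.LOBECTOMY
    if procedure != "total":
        raise ValueError(f"unknown procedure: {procedure!r}")
    if nodes_removed == 0:
        return SurgicalExtent.TT_NO_NODES
    # Nodes were resected; without positive nodes, the dissection site is unknown.
    if positive_node_site is None:
        return SurgicalExtent.TT_NODES_UNKNOWN
    if positive_node_site == "central":
        return SurgicalExtent.TT_CENTRAL
    if positive_node_site == "lateral":
        return SurgicalExtent.TT_LATERAL
    raise ValueError(f"unknown node site: {positive_node_site!r}")
```

The node-negative branch sends every such dissection to the "location unknown" group, regardless of where it was actually performed. This mirrors the limitation that dissection location is unknown for node-negative patients.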
Descriptive statistics are reported and were computed on the full data set. Statistical models used complete cases only; less than 5% of values were missing for any variable. Tests of independence between categorical variables were performed using χ2 tests. Univariate analysis was performed for variables expected a priori, based on literature review, to be associated with voice outcomes. Multivariable logistic regression was then used to determine the degree to which demographic, clinical, and pathologic variables were associated with 2 outcomes: VHI-10 score greater than 11 and VFMI. Covariates included age, sex, race/ethnicity, GERD, tumor histologic findings, tumor size, extrathyroidal extension, and surgical extent. Odds ratios (ORs) with 95% CIs are reported. In addition, a zero-inflated negative binomial model was fit with VHI-10 score as the outcome. Incidence rate ratios (IRRs) with 95% CIs are reported.
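Logistic regression estimates log-odds coefficients, and the reported ORs are their exponentials. The same transformation applied to the count coefficients of a negative binomial model yields IRRs. The sketch below assumes Wald-type 95% CIs (coefficient ± 1.96 SE on the log scale), which the text does not state explicitly. It shows how a reported OR and CI can be converted back to an approximate coefficient, standard error, and 2-sided P value.

```python
import math

Z_975 = 1.959963984540054  # 97.5th percentile of the standard normal distribution


def ratio_with_ci(beta: float, se: float, z: float = Z_975) -> tuple[float, float, float]:
    """Exponentiate a log-scale coefficient and its Wald interval (an OR or IRR)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def coef_from_reported(
    ratio: float, lower: float, upper: float, z: float = Z_975
) -> tuple[float, float]:
    """Back out an approximate log-scale coefficient and SE from a reported ratio and CI."""
    beta = math.log(ratio)
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    return beta, se


def wald_p_value(beta: float, se: float) -> float:
    """Two-sided P value for the Wald statistic beta / se."""
    return math.erfc(abs(beta / se) / math.sqrt(2))


# Lateral neck dissection, as reported for VHI-10 score > 11 (OR 1.99; 95% CI 1.11-3.56).
beta, se = coef_from_reported(1.99, 1.11, 3.56)
or_, lo, hi = ratio_with_ci(beta, se)
```

The reconstructed interval matches the reported one to within rounding. This makes it a quick consistency check when reading tables of ORs or IRRs.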
Statistical analyses incorporated weights to account for differential sampling and survey nonresponse. Design weights accounted for differential probability of sample selection, and nonresponse weights accounted for disproportionate nonresponse rates across patient subgroups. This weighting aims to generate statistical inferences that are more representative of the target population.22,23 Reported percentages, ORs, and IRRs are weighted; numbers of participants, when provided, are unweighted for clarity. With 2-tailed testing, findings were considered significant at P < .05. Analyses were performed using R, version 3.5.2,20 and Stata, version 15.1 (StataCorp).24
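A weighted percentage replaces each respondent's count of 1 with a survey weight. The text does not give the weighting formulas. This sketch therefore assumes that each respondent's final weight is the product of a design weight and a nonresponse weight, a common convention. The function names and toy data are hypothetical.

```python
from collections.abc import Sequence


def final_weights(design: Sequence[float], nonresponse: Sequence[float]) -> list[float]:
    """Combine design and nonresponse weights multiplicatively (assumed convention)."""
    if len(design) != len(nonresponse):
        raise ValueError("weight vectors must have equal length")
    return [d * n for d, n in zip(design, nonresponse)]


def weighted_percentage(outcome: Sequence[bool], weights: Sequence[float]) -> float:
    """Percentage of the total weight belonging to respondents with the outcome."""
    if len(outcome) != len(weights):
        raise ValueError("outcome and weights must have equal length")
    total = sum(weights)
    return 100.0 * sum(w for y, w in zip(outcome, weights) if y) / total


# Toy data: 4 respondents, 2 with abnormal voice.
abnormal = [True, False, False, True]
w = final_weights(design=[1.0, 1.0, 2.0, 2.0], nonresponse=[1.0, 1.0, 1.0, 1.5])
```

Respondents with larger weights, such as those from subgroups with lower response rates, pull the estimate toward their outcomes. This is why unweighted counts and weighted percentages can diverge, as with 1792 women reported as 77.4% of the cohort.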
The survey was sent to 4185 eligible patients. A total of 2632 patients responded, resulting in a 63% overall response rate and 77% cooperation rate.25 Because this study examined voice changes following surgery for thyroid cancer, those reporting voice changes before surgery (n = 267) and/or those who did not undergo surgery (n = 48) were excluded, and analyses were performed on the remaining 2325 patients.
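The reported counts can be checked with simple arithmetic. Because exclusions were applied with "and/or," the 2 exclusion groups may overlap. Assuming no other exclusions, inclusion-exclusion implies that 8 patients met both criteria. This overlap figure is derived here and is not stated in the text.

```python
surveyed = 4185
responded = 2632
analytic_cohort = 2325
voice_change_before_surgery = 267
no_surgery = 48

response_rate = responded / surveyed  # about 0.629, reported as 63%

# Everyone excluded met at least 1 criterion (assumes no other exclusions).
excluded = responded - analytic_cohort
# Inclusion-exclusion: |A or B| = |A| + |B| - |A and B|
met_both_criteria = voice_change_before_surgery + no_surgery - excluded

print(f"response rate: {response_rate:.1%}")
print(f"excluded: {excluded}, met both exclusion criteria: {met_both_criteria}")
```

The 77% cooperation rate cannot be reproduced from these numbers. It depends on how many patients were actually contacted, which is not reported here.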
The 2325-patient cohort had a weighted mean (SD) age of 49.4 (14.4) years. Reported in Table 1 as unweighted number and weighted percentage, 1792 respondents were women (77.4%) and 1336 were white (53.0%). Tumor characteristics included predominantly papillary histologic findings (93.0%), size 2.0 cm or less (67.4%), extrathyroidal extension (29.2%), regional spread (28.7%), and distant spread (2.5%). Surgical extent was mostly total thyroidectomy alone (38.6%) or total thyroidectomy with nodal dissection of unknown location (27.1%). Central and lateral neck dissections were performed in 13.1% and 8.7% of patients, respectively. Four or more lymph nodes were dissected in 39.9% of the cohort with dissection location unknown, 65.0% of the central neck dissection cohort, and 84.9% of the lateral neck dissection cohort.
The Figure shows the 3 outcome measures in the cohort:

- 25.8% reported voice changes lasting more than 3 months following surgery.
- 12.7% were identified as having an abnormal VHI-10 score.
- 4.7% reported having VFMI (paresis or paralysis) diagnosed by laryngoscopy.

Patients reporting a diagnosis of VFMI were more likely to have an abnormal VHI-10 score. Of those who reported VFMI, 60.5% had a VHI-10 score higher than 11, compared with 10.3% of patients without VFMI (P < .001). Conversely, just 21.9% of those who met criteria for abnormal voice on the VHI-10 questionnaire also reported a diagnosis of VFMI by laryngoscopy. Patients who reported VFMI were also more likely than those without VFMI to have prolonged voice changes lasting more than 3 months following surgery (79.3% vs 23.3%, P < .001). Eighty-nine percent of the cohort was surveyed 3 years after diagnosis, 8% at 2 years, and 3% at 4 years. There was no significant difference in the proportion with an abnormal VHI-10 score based on years since diagnosis.
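The 2 conditional percentages in this paragraph look asymmetric (60.5% vs 21.9%) because VFMI is much less common than an abnormal VHI-10 score. A short Bayes' rule check using the reported weighted percentages illustrates this. Small discrepancies are expected because the inputs are rounded.

```python
def total_probability(p_a_given_b: float, p_a_given_not_b: float, p_b: float) -> float:
    """P(A) = P(A|B)P(B) + P(A|not B)(1 - P(B))."""
    return p_a_given_b * p_b + p_a_given_not_b * (1 - p_b)


def bayes_inverse(p_a_given_b: float, p_b: float, p_a: float) -> float:
    """P(B|A) = P(A|B)P(B) / P(A)."""
    return p_a_given_b * p_b / p_a


p_vfmi = 0.047                   # reported VFMI
p_abnormal_given_vfmi = 0.605    # VHI-10 > 11 among patients with VFMI
p_abnormal_given_no_vfmi = 0.103 # VHI-10 > 11 among patients without VFMI
p_abnormal = 0.127               # reported abnormal VHI-10

implied_p_abnormal = total_probability(
    p_abnormal_given_vfmi, p_abnormal_given_no_vfmi, p_vfmi
)
implied_p_vfmi_given_abnormal = bayes_inverse(p_abnormal_given_vfmi, p_vfmi, p_abnormal)

print(f"implied P(abnormal VHI-10) = {implied_p_abnormal:.3f} (reported 0.127)")
print(f"implied P(VFMI | abnormal) = {implied_p_vfmi_given_abnormal:.3f} (reported 0.219)")
```

The complement of the reported 21.9%, that is 78.1%, is the share of patients with an abnormal VHI-10 score who did not report VFMI.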
The 3 most commonly cited problems noted from the VHI-10 questionnaire were the same for patients with VHI-10 score greater than 11, for those with VFMI, and for the overall study population, although the proportion with these common abnormalities varied markedly between groups. The proportions reporting sometimes, almost always, or always for the statement, “my voice makes it difficult for people to hear me,” were 89.6% (VHI-10 score >11), 64.3% (VFMI), and 19.0% (overall study population). For the statement, “the clarity of my voice is unpredictable,” the proportions were 89.0% (VHI-10 score >11), 74.1% (VFMI), and 19.0% (overall study population). For the statement, “I feel as though I have to strain to produce voice,” the proportions reporting sometimes, almost always, or always were 88.9% (VHI-10 score >11), 73.9% (VFMI), and 19.4% (overall study population).
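The item-level percentages count a response of sometimes, almost always, or always, which corresponds to a Likert code of 2 or higher. A minimal sketch of that calculation follows. The optional `weights` argument is an assumption meant to mirror the survey weighting, and the toy responses are hypothetical.

```python
from collections.abc import Sequence
from typing import Optional

SOMETIMES_OR_MORE = 2  # codes 2 (sometimes), 3 (almost always), 4 (always)


def share_sometimes_or_more(
    item_scores: Sequence[int], weights: Optional[Sequence[float]] = None
) -> float:
    """Fraction of (weighted) respondents whose score on one item is 2 or higher."""
    if not item_scores:
        raise ValueError("no responses")
    if weights is None:
        weights = [1.0] * len(item_scores)
    if len(weights) != len(item_scores):
        raise ValueError("weights must match responses")
    endorsed = sum(w for s, w in zip(item_scores, weights) if s >= SOMETIMES_OR_MORE)
    return endorsed / sum(weights)


# Hypothetical responses to "the clarity of my voice is unpredictable".
responses = [0, 1, 2, 3, 4, 0, 0, 1]
```

Applying the same function within each subgroup (VHI-10 score >11, VFMI, overall) yields the kind of comparison reported in this paragraph.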
Multivariable logistic regression was performed to determine which patient, tumor, or treatment characteristics were associated with a VHI-10 score greater than 11 (primary outcome). As shown in Table 2, a VHI-10 score greater than 11 was significantly associated with the following:

- age 45 to 54 years (OR, 1.49; 95% CI, 1.05-2.11)
- black race (OR, 1.73; 95% CI, 1.14-2.62)
- Asian race (OR, 1.66; 95% CI, 1.08-2.54)
- GERD (OR, 1.67; 95% CI, 1.15-2.43)
- surgical extent including lateral neck dissection (OR, 1.99; 95% CI, 1.11-3.56)

Sex, tumor size, histologic characteristics, and extrathyroidal extension were not associated with the VHI-10 score in this multivariable analysis. Extrathyroidal extension did, however, show a significant association in the univariable analysis (OR, 1.45; 95% CI, 1.11-1.90).
Given the properties and distribution of the VHI-10 scores, we also modeled the data with a zero-inflated negative binomial model that included the same independent variables as the logistic model. In the zero-inflation component, male patients were more likely to have a score of 0 (OR, 1.53; 95% CI, 1.21-1.92). Tumor size larger than 4 cm (OR, 0.59; 95% CI, 0.41-0.84) and positive lateral neck lymph nodes (OR, 0.56; 95% CI, 0.36-0.87) were associated with an increased likelihood of a score greater than 0. Asian race (IRR, 1.28; 95% CI, 1.04-1.58), black race (IRR, 1.35; 95% CI, 1.12-1.64), and GERD (IRR, 1.43; 95% CI, 1.19-1.71) were associated with higher VHI-10 scores.
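A zero-inflated negative binomial model mixes 2 processes:

- a logistic component for the probability π of an excess ("structural") zero
- a negative binomial count component with mean μ

Exponentiated coefficients of the logistic part are ORs for being a structural zero. Exponentiated coefficients of the count part are IRRs that multiply μ. The sketch below writes out the probability mass function under the common mean-dispersion parameterization (variance μ + αμ²). It is an illustration only, not the model specification used in the study. Like any negative binomial model, it treats the score as an unbounded count even though VHI-10 scores are capped at 40.

```python
import math


def nb_pmf(k: int, mu: float, alpha: float) -> float:
    """Negative binomial P(Y = k) with mean mu and variance mu + alpha * mu**2."""
    r = 1.0 / alpha   # "size" parameter
    p = r / (r + mu)  # success probability
    log_pmf = (
        math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
        + r * math.log(p) + k * math.log1p(-p)
    )
    return math.exp(log_pmf)


def zinb_pmf(k: int, mu: float, alpha: float, pi: float) -> float:
    """Zero-inflated NB: structural zero with probability pi, else an NB draw."""
    count_part = (1.0 - pi) * nb_pmf(k, mu, alpha)
    return pi + count_part if k == 0 else count_part


def params_from_linear_predictors(eta_count: float, eta_zero: float) -> tuple[float, float]:
    """Log link for the count mean, logit link for the zero-inflation probability."""
    mu = math.exp(eta_count)
    pi = 1.0 / (1.0 + math.exp(-eta_zero))
    return mu, pi


# Hypothetical baseline values: mu = 5, alpha = 0.8, pi = 0.3.
mu0, alpha, pi0 = 5.0, 0.8, 0.3
# An IRR such as the 1.43 reported for GERD scales mu only; any effect of the same
# covariate on pi would come from its separate coefficient in the logistic part.
mu_gerd = mu0 * 1.43
```

Changing `pi0` shifts probability mass toward 0 without changing the shape of the positive counts. This is why a covariate can be significant in one component but not the other, as with sex in the zero component and GERD in the count component.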
As a secondary analysis, multivariable logistic regression was performed to determine which patient, tumor, or treatment characteristics were associated with patient-reported VFMI. As shown in Table 3, VFMI was significantly associated with the following:

- age groups 45 years and older (reference, ≤44 years)
- black race (OR, 3.24; 95% CI, 1.83-5.72)
- GERD (OR, 1.91; 95% CI, 1.08-3.39)
- tumor size larger than 4.0 cm (OR, 2.44; 95% CI, 1.22-4.87)

Follicular or Hürthle cell histologic characteristics were associated with a lower rate of VFMI compared with papillary histologic characteristics (OR, 0.30; 95% CI, 0.09-0.97). Sex, extrathyroidal extension, and surgical extent were not associated with VFMI in this multivariable analysis. However, extrathyroidal extension (OR, 1.75; 95% CI, 1.17-2.62) and surgical extent including lateral neck dissection (OR, 2.25; 95% CI, 1.30-3.90) were significantly associated with VFMI in univariable analysis.
In this large population-based study of patients with DTC 2 to 4 years after diagnosis, abnormal voice was reported by 12.7% of patients. Age, minority race, GERD, and lateral neck dissection were associated with abnormal voice based on the VHI-10. A total of 4.7% of patients reported VFMI diagnosed with laryngoscopy, within the previously reported range (1%-15%).21,26-28
More than twice as many patients had an abnormal VHI-10 score as had VFMI, and 78.1% of patients with an abnormal VHI-10 score did not report diagnosed VFMI. These findings likely reflect several factors.

First, there may be a subset of patients with undiagnosed VFMI. Not all patients with RLN injury are symptomatic or have symptoms severe enough to warrant diagnostic laryngoscopy. Similarly, more subtle cases of paresis may be evident on videostroboscopy but not on standard flexible laryngoscopy.

Second, some postsurgical effects unrelated to the RLN are associated with voice abnormalities. These include bilateral or unilateral superior laryngeal nerve injury, laryngeal irritation or edema, and cervical strap muscle injury.29-32 Among the strongest contributors to abnormal VHI-10 scores in the study population, 3 statements were most frequently endorsed: “my voice makes it difficult for people to hear me,” “the clarity of my voice is unpredictable,” and “I feel as though I have to strain to produce voice.” Superior laryngeal nerve injury alone would be unlikely to produce a grossly abnormal laryngoscopy examination. It would nonetheless be consistent with these frequent VHI-10 reports, which pertain to weakened projection, unpredictability of the voice, and strain.26
In addition, nonsurgical causes of abnormal voice may be confounded or unmasked by thyroid surgery. A diagnosis such as GERD may predate surgery, arise after surgery, or slow recovery of the voice after surgery. Patients reporting a coexisting diagnosis of GERD had higher rates of both abnormal VHI-10 score and VFMI. The association of GERD with voice abnormalities is plausible, because acid reflux irritation of the larynx is itself a common cause of hoarseness and diminished voice quality. Thyroid surgery may compound this insult to the laryngeal structures through the anatomic mechanisms described above. Voice abnormalities or other overlapping symptoms (eg, globus, hoarseness, dysphagia) may be secondary to GERD, swallowing difficulties, or poorer recovery and compensation following vocal fold paresis or paralysis. A GERD diagnosis may also prompt more screening for VFMI.
Poor voice outcomes are well known to be more common in older adults, and our data support this finding from prior studies.21,27,33,34 However, the relationship between race and voice outcomes is less well studied. Earlier investigations have examined racial disparities in the treatment of thyroid cancer and other cancers. Based on that work, the association of black and Asian race with VHI-10 scores greater than 11 may be secondary to less access to care, presentation and treatment at later stages, and/or treatment at low-volume institutions.35-37 Patients from minority racial and ethnic groups may therefore be more likely to see low-volume thyroid surgeons and to have more frequent complications. Higher complication rates among low-volume thyroid surgeons are well described.28,38
A number of studies have shown an increased risk of VFMI and/or poor voice outcomes with total thyroidectomy compared with thyroid lobectomy. These findings offer a rationale for de-escalation of surgical extent for small tumors without nodal metastasis.2,39-41 However, only a small percentage of patients in our study underwent thyroid lobectomy, and we did not find statistically significant differences between total thyroidectomy and thyroid lobectomy.

With regard to the extent of surgery, our study found a higher risk of abnormal VHI-10 score when lateral neck dissection was performed. Nam et al42 found that lateral neck dissection was associated with objective changes in pitch and with vocal fold edema in the immediate postoperative period. It was also associated with subjective voice abnormalities lasting much longer. Lateral neck dissection does not necessarily put the RLN at greater risk. It does, however, extend the operative field significantly, enlarge the postoperative bed of scarring, and disrupt the neck and perilaryngeal musculature to a greater degree. These effects may explain its stronger association with voice abnormalities measured by the VHI-10 instrument. The indication for lateral neck dissection is known nodal disease, and prophylactic lateral neck dissection is rarely performed. Counseling and postoperative management are therefore of particular importance in patients with lateral neck lymph node metastases.
Central neck dissection was not itself significantly associated with VFMI in this study, despite the previously reported increased risk to the RLN when this compartment is dissected.43 Starmer et al43 found that most patients with VFMI following total thyroidectomy with reoperative central neck dissection showed clinically relevant changes in postoperative VHI scores. Central neck dissection is performed in nearly all cases in which lateral neck dissection is performed. As such, relying on the SEER-defined pathologic variables may have underestimated the number of central neck dissections performed in our study. Many of the tumor and surgical characteristics that we anticipated would correlate with poor voice outcome did so on univariate analysis. However, some were no longer statistically significant after we controlled for additional patient and clinical characteristics. The most notable of these was extrathyroidal extension, which was associated with both abnormal VHI-10 score and VFMI in univariable but not in multivariable logistic regression analyses.
Strengths of this study include the following:

- a large patient cohort that is representative of patients treated for thyroid cancer and is racially and ethnically diverse44
- the combination of patient-reported outcomes with SEER data on tumor and treatment characteristics
- the use of the validated VHI-10 scale to measure voice abnormalities

The present study avoids a number of limitations of prior reports of voice outcomes. These include limited long-term follow-up, reliance on surgeon report, and a common focus on single-institution studies with high-volume surgeons. Patients were surveyed 2 to 4 years after diagnosis, and the VHI-10 is phrased in the present tense. We therefore believe that these data reflect long-term outcomes more than short-term transient problems that may fall within the expected bounds of surgical recovery.
Limitations of this study include those common to patient-reported survey data, although these are tempered by the rigor of population-based data collection through the SEER registry.

Regarding the primary outcome of abnormal voice, the demographics of the cohort used to develop the VHI-10 are not well described. They may differ from those of the large, diverse population in this study.

Patient survey is one of the few methods for obtaining patient reports of abnormal voice. Nevertheless, the secondary analysis of VFMI carries a risk of patient misunderstanding or recall bias. Reliance on patient report of VFMI also limits the ability to verify the degree of vocal fold paralysis or paresis.

Although SEER has exhaustive surgical data, details on the location of neck dissection are unknown for patients without positive lymph node metastases. This uncertainty may attenuate the observed difference between central and lateral neck dissection. Furthermore, the proportion with 4 or more lymph nodes resected is greatest with lateral neck dissection. This makes it difficult to separate the association of dissection location from that of the number of lymph nodes resected with poor voice outcome.

The data set does not include details of postoperative treatment, such as medialization procedures or voice therapy. Thus, we cannot comment on the relative efficacy or indications of such interventions. Operative details, such as the use of nerve monitoring, were also not available in this data set. The effect of nerve monitoring on voice and nerve integrity is a current topic of interest that may be considered in subsequent studies.45
The high prevalence of patient-reported voice problems and identification of risk factors for poor voice outcomes in this study highlight the need for heightened awareness of voice abnormalities following surgery. Anticipated voice outcomes warrant consideration in the preoperative risk-benefit discussion, planned extent of surgery, and postoperative rehabilitation.
Accepted for Publication: May 20, 2019.
Corresponding Author: Megan R. Haymart, MD, Division of Metabolism, Endocrinology, and Diabetes, University of Michigan, North Campus Research Complex, 2800 Plymouth Rd, Bldg 16, Room 408E, Ann Arbor, MI 48109 (email@example.com).
Published Online: July 18, 2019. doi:10.1001/jamaoto.2019.1737
Author Contributions: Dr Haymart had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Concept and design: Kovatch, Reyes-Gastelum, Hughes, Haymart.
Acquisition, analysis, or interpretation of data: All authors.
Drafting of the manuscript: Kovatch, Reyes-Gastelum, Hughes, Haymart.
Critical revision of the manuscript for important intellectual content: All authors.
Statistical analysis: Kovatch, Reyes-Gastelum.
Obtained funding: Haymart.
Administrative, technical, or material support: Hughes, Hamilton, Ward, Haymart.
Supervision: Kovatch, Hughes, Haymart.
Conflict of Interest Disclosures: None reported.
Funding/Support: This study was funded by grant R01 CA201198 from the National Cancer Institute (NCI). The collection of cancer incidence data used in this study was supported by the California Department of Public Health pursuant to California Health and Safety Code Section 103885; Centers for Disease Control and Prevention (CDC) National Program of Cancer Registries, under cooperative agreement 5NU58DP006344, and the NCI SEER Program under contract HHSN261201800015I awarded to the University of Southern California. The collection of cancer incidence data in Georgia was supported by contract HHSN261201800003I, task order HHSN26100001 from the NCI and cooperative agreement 5NU58DP003875-04 from the CDC.
Role of the Funder/Sponsor: The funding organizations had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.
Disclaimer: The ideas and opinions expressed herein are those of the authors and endorsement by the State of California and State of Georgia Departments of Public Health, the NCI, and the CDC or their contractors and subcontractors is neither intended nor should be inferred.
Meeting Presentations: This work was presented at the 2018 American Head & Neck Society Meeting; April 18, 2018; National Harbor, Maryland; and the 2018 Michigan Otolaryngologic Society Meeting; July 28, 2018; Thompsonville, Michigan.