At what point does increasing annual surgeon total thyroidectomy volume begin to be associated with lower complication rates?
In this cohort study of 10 546 patients who underwent total thyroidectomy, a generalized additive model showed that occurrence rates of vocal cord paralysis and hypoparathyroidism began to decrease at an annual surgeon procedure volume of 18 thyroidectomies. A generalized linear mixed-effects model showed that small but statistically significant decreases in complication rates were associated with an increase in the annual volume.
Information about the point at which complication rates begin to decrease and the magnitude of this decrease in association with increase in annual surgeon volumes may enable system-level planning about care patterns for patients with thyroid nodules.
Although the association between annual surgeon total thyroidectomy volume and clinical outcomes is well established, published methods typically group surgeons into volume categories. The volume-outcomes association is likely continuous, but little is known about the point at which the annual surgeon procedure volumes begin to be associated with a decrease in complication rates.
To model the volume-outcomes association as a continuous function and identify the point at which increasing surgeon volume begins yielding better outcomes.
Design, Setting, and Participants
A retrospective cohort study was conducted in 2018 to 2019 on 10 546 patients from 2 Kaiser Permanente regions (Northern and Southern California), who underwent total thyroidectomy from January 1, 2008, through December 31, 2015, and were followed up through December 31, 2017. The association between annual surgeon procedure volume and outcomes was modeled with analyses that accounted for an association of unknown form and surgeon-specific effects, after adjusting for sociodemographics, prior-year utilization, and multiple comorbidities. Data were analyzed from October 2018 to April 2019.
Main Outcomes and Measures
Presence or absence of transient and permanent hypoparathyroidism and vocal cord paralysis (VCP) in relation to surgeon volume of total thyroidectomies.
Of 10 546 patients in this study, 8500 (80.6%) were male and 4877 (46.2%) aged 45 to 64 years. Surgeons with annual volumes of 1 to 9 total thyroidectomies operated on 2912 patients (27.7%), those with an annual volume of 10 to 19 operated on 3404 (32.6%), and those with an annual volume of 20 or more operated on the remaining 4232 (40.6%). During 2008-2015, a mean of 53.5 (range, 46-198) thyroidectomies were performed each year by surgeons with an annual volume of 40 or more procedures. A generalized additive model showed that the occurrence rates of VCP and hypoparathyroidism began to decrease at annual surgeon procedure volumes of 18.2 (95% CI, 15.0-21.5) and 18.1 (95% CI, 13.8-21.3) procedures per year, respectively. The model revealed a subsequent increase in complication rates for transient VCP. With the use of a refined model, statistically significant decreases were noted in the occurrence rates of complications as annual surgeon volumes increased. Among all 10 546 patients who underwent total thyroidectomy, 632 (6.0%) experienced transient hypoparathyroidism and 170 (1.6%) experienced permanent hypoparathyroidism, whereas 440 (4.2%) experienced transient VCP and 182 (1.7%) experienced permanent VCP. Absolute decreases in complication rates when all surgeons had modeled minimum annual procedure volumes greater than 40 were low, ranging from 0.6% for permanent VCP and hypoparathyroidism to 1.5% for transient hypoparathyroidism.
Conclusions and Relevance
In this study, occurrence rates of transient and permanent hypoparathyroidism and VCP appeared to decrease as the annual surgeon procedure volume increased, but the absolute decrease may be modest if the affected health system already has low complication rates. Shifting patients to higher-volume surgeons to realize these reductions may be of variable attractiveness in systems with low baseline complication rates.
Thyroid disease is increasingly common, with prevalence estimates for thyroid nodules among adults reaching 67% for detection using high-resolution ultrasonography.1,2 In 2011, US surgeons performed approximately 130 000 surgical procedures related to thyroid nodules and, based on 5-year trends, an estimated 169 000 in 2016.3 Complications of thyroidectomy include transient or permanent hypoparathyroidism, vocal cord paralysis (VCP), and hematoma. Many studies demonstrate that when high-volume surgeons perform thyroidectomy, patients experience fewer complications and have shorter hospital stays and lower readmission rates.4-11
However, the number of annual procedures that constitutes high volume is not clear. Previous studies identified high-volume surgeons as those performing as few as 6 and as many as 100 procedures per year, using volume categories chosen to ensure comparable numbers of surgeons across categories or to minimize bias.4,5,7-14 Dichotomizing relatively small populations and demonstrating that higher surgeon volume is associated with improved outcomes is not the same as identifying a volume cut point above which the outcomes improve.15 This is not merely an academic issue; US hospital systems have pledged to prevent selected complex elective surgical procedures from being performed by low-volume surgeons.16 The Leapfrog Group, a nonprofit consortium of large US health care purchasers, consumer advocacy organizations, and regional business coalitions on health, has introduced a new measure addressing the minimum hospital and surgeon volumes that are associated with improved patient outcomes for 10 procedures; as of 2018, hospitals’ scores are publicly reported on these volume standards.17 Although the initial standards do not include thyroidectomy, quality standards for hospital and surgeon volume are likely to extend to other procedures over time.
A related factor is the difficulty of implementing surgeon volume cut points.18 Surgeon acceptance may be limited when volume categories are defined arbitrarily or for methodologic purposes. It is difficult to defend the notion that there is an abrupt change in outcomes between performing, for instance, 99 and 100 thyroidectomies per year. The association between surgeon volume and outcomes is almost certainly continuous.19 Thus, in the present study, we aimed to examine whether the association between surgeon volume and selected outcomes for total thyroidectomy is continuous, testing the null hypothesis that no statistically significant cut point would emerge from the data.
Study Design, Setting, and Oversight
The retrospective observational study was conducted between October 2018 and April 2019 in 2 Kaiser Permanente regions, Northern and Southern California, having a collective total of approximately 8.5 million members as of March 2017. Thyroidectomy is performed at 43 Kaiser Permanente–owned medical centers by nearly 200 surgeons, including general surgeons and otolaryngologists with and without specialty training. Of the total thyroidectomies performed during 2008-2015, 8014 (73%) were performed by head and neck surgeons and 2938 (27%) by general surgeons. The Kaiser Permanente Northern California Institutional Review Board approved this study and waived the need for patient consent because all data were obtained by retrospective medical record review.
The study population consisted of patients who underwent a single thyroidectomy between January 1, 2008, and December 31, 2015, and were followed up through December 31, 2017. Patients were identified by the International Classification of Diseases, Ninth Revision (ICD-9) codes for total thyroidectomy (complete [06.4], complete with sternotomy [06.39]).
Outcomes were transient and permanent hypoparathyroidism and VCP, measured dichotomously as present or not. Transient outcomes occurred 1 to 30 days postoperatively, whereas the permanent outcomes occurred 366 to 730 days postoperatively for hypoparathyroidism20 and 180 to 540 days postoperatively for VCP. Hypoparathyroidism was identified by a parathyroid hormone level less than 10 pg/mL, serum calcium level less than 8.0 mg/dL, or serum calcium level less than 10.5 mg/dL unless furosemide use occurred within 30 days of testing or active use of calcitriol or high-dose (50 000 IU) ergocalciferol (vitamin D2) was documented. (Conversion of the parathyroid hormone level to nanograms per liter is 1:1; to convert calcium levels to millimoles per liter, multiply by 0.25.) Vocal cord paralysis was identified by ICD-9 codes (478.3*, 478.4, 478.5, 784.49, 874.01, 908.0, 959.09, 874.11, and 906.0).
Independent Variable and Covariates
We measured surgeon volume as the number of total thyroidectomies each surgeon performed in individual calendar years. Other variables measured dichotomously included region (Northern or Southern California), sex, race/ethnicity, and past or present comorbid conditions (Hashimoto disease, Graves disease, thyroid cancer, obstructive sleep apnea, visual impairment, hearing loss, disseminated cancer, hypertension, chronic obstructive pulmonary disease, congestive heart failure, hyperlipidemia, bleeding disorders, diabetes, asthma, dementia, depression, pregnancy, radiotherapy within the past 90 days, chemotherapy within the past 30 days, dyspnea, corticosteroid medication use, anticoagulant therapy, dialysis, acute myocardial infarction, stroke, and seizures; eTable 1 in the Supplement provides the ICD-9 codes for these conditions). Categorical variables were created for age, body mass index, distance to nearest medical office building, Charlson comorbidity index, year of procedure, and DxCG risk score (Verisk Health, Inc).21 Emergency department, outpatient, and inpatient care use during the previous 12 months were also included. All data were obtained from electronic health records.
Data analysis was performed between October 2018 and April 2019. Categorical variables were created as follows: age: younger than 18, 18 to 44, 45 to 64, and 65 years or older; distance to nearest medical office building: less than 8, 8 to 32, and 32 km or more; body mass index (calculated as weight in kilograms divided by height in meters squared): lower than 18.5, 18.5 to 24.9, 25.0 to 29.9, and 30 or higher; Charlson comorbidity index: 0, 1 to 4, 5 to 9, and 10 or higher (higher values indicate greater burden of illness); and DxCG score: lower than 1, 1 to 4.99, and 5 or higher (higher values indicate greater risk). Missing values were replaced with the median value and a missing indicator was included. Among the cases, 150 (1.4%) were missing values for the same 5 variables: asthma, hyperlipidemia, depression, diabetes, and heart failure; these cases were assigned a missing variable group indicator. We did not perform sensitivity analyses to evaluate the impact of the few missing data.
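The handling of missing values described above (median replacement plus a missing indicator) can be illustrated with a minimal Python sketch. This is not the study's code, which was written in R; the function name `impute_with_indicator` and the example values are hypothetical and serve only to show the general technique.

```python
from statistics import median

def impute_with_indicator(values):
    """Replace None with the median of the observed values.

    Returns (filled, flags), where flags[i] is 1 if values[i] was missing.
    Hypothetical helper for illustration; not the authors' implementation.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: no observed values")
    fill = median(observed)
    filled = [fill if v is None else v for v in values]
    flags = [1 if v is None else 0 for v in values]
    return filled, flags

# Hypothetical body mass index values; None marks a missing record.
bmi = [22.5, None, 31.0, 27.4, None]
bmi_filled, bmi_missing = impute_with_indicator(bmi)
```

Carrying the indicator into the regression alongside the filled value lets the model estimate a separate effect for records with missing data rather than treating imputed values as observed.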
A generalized additive logistic regression model (GAM) was fitted to examine the association of surgeon volume with the rates of transient and permanent hypoparathyroidism and VCP, controlling for all covariates (eTable 2 in the Supplement). The variables included in the model were selected by clinical judgment. The GAM provides a flexible method for examining the association of a continuous risk factor with an outcome when the nature of the association (ie, whether it is linear) is unknown.22 The model is flexible enough to approximate quadratic, cubic, and other nonlinear shapes without prespecifying the functional form and works well for estimates of hierarchical models. The functional form of the volume-outcomes association was calculated nonparametrically, with the value of the smoothing parameter calculated to optimize the penalized likelihood function.
The optimal surgeon procedure volume was calculated by observing the point at which the GAM intersected with y = 0 (mean risk). The upper and lower bounds of the 95% CIs were determined by multiplying the SE provided in the GAM function by 1.96 and similarly observing at what surgeon volumes the CIs intersected the y = 0 line.
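The crossing-point calculation described above can be sketched in Python as follows. This is an illustration only: it assumes the fitted smooth and its SE are available on a grid of volumes, it uses linear interpolation between grid points (an assumption; the text says only that the crossings were observed), and all numbers are invented rather than taken from the study.

```python
def zero_crossings(x, y):
    """Return x-values where the piecewise-linear curve through (x, y) meets y = 0."""
    crossings = []
    points = list(zip(x, y))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 == 0:
            crossings.append(x0)
        elif y0 * y1 < 0:
            # Linear interpolation between the two grid points that bracket zero.
            crossings.append(x0 + (0 - y0) * (x1 - x0) / (y1 - y0))
    if points and points[-1][1] == 0:
        crossings.append(points[-1][0])
    return crossings

# Hypothetical smooth-term estimates (centered so that 0 is the mean risk) and SEs.
volumes = [5, 10, 15, 20, 25, 30]
effect = [0.30, 0.20, 0.08, -0.04, -0.10, -0.15]
se = [0.05] * len(volumes)

upper = [e + 1.96 * s for e, s in zip(effect, se)]
lower = [e - 1.96 * s for e, s in zip(effect, se)]

point_estimate = zero_crossings(volumes, effect)[0]
bound_from_lower = zero_crossings(volumes, lower)[0]
bound_from_upper = zero_crossings(volumes, upper)[0]
```

With these invented inputs the curve crosses zero between 15 and 20 procedures, and the two confidence bands cross on either side of that point, which is how an interval around the crossing volume would be read off.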
The generalized additive mixed-effects modeling (GAMM) was used to address the nonindependence of observations from the same surgeons over multiple years. The advantage of GAMM is the use of the GAM smoothing parameter to address nonlinear associations between independent and dependent variables, with the additional inclusion of surgeon-specific random effects. The disadvantages are computational intensity and limits to potential model variables. In the GAMM analysis, sex, region, year of surgery, hyperlipidemia, DxCG score category, Charlson score, dyspnea, acute myocardial infarction, dialysis, seizures, stroke, anticoagulant use, thyroid cancer, and emergency department and outpatient encounters in the previous 12 months were included as covariates.
Finally, generalized linear mixed-effects modeling (GLMER), in which the linear predictor contains random effects, was used with the full range of covariates used in GAM. There is no smoothing parameter in the equation. The benefit of GLMER is the inclusion of surgeon-specific random effects. The results are expressed as mean estimated risk and relative changes in mean estimated risk of transient and permanent hypoparathyroidism and VCP for the entire population if surgeons with 1 to 9, 1 to 19, 1 to 29, and 1 to 39 annual procedures performed the minimum number of procedures in the next-highest volume category; for example, if all surgeons with annual volumes of 1 to 19 procedures performed 20 procedures per year.
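The counterfactual scenarios described above amount to raising each lower-volume surgeon's annual volume to a floor and re-averaging the predicted risk. The Python sketch below shows that idea under strong simplifications: a single-predictor logistic curve with hypothetical coefficients (`intercept`, `slope`), no covariates, and no surgeon-specific random effects. It is not the fitted GLMER model.

```python
import math

def predicted_risk(volume, intercept=-2.5, slope=-0.02):
    """Logistic risk as a function of annual volume (coefficients are hypothetical)."""
    return 1 / (1 + math.exp(-(intercept + slope * volume)))

def mean_risk_with_floor(volumes, floor):
    """Mean predicted risk if every volume below `floor` were raised to `floor`."""
    return sum(predicted_risk(max(v, floor)) for v in volumes) / len(volumes)

# Hypothetical annual volumes of the surgeon performing each procedure.
procedure_volumes = [3, 8, 15, 22, 45]

observed = mean_risk_with_floor(procedure_volumes, floor=0)    # volumes unchanged
scenario_20 = mean_risk_with_floor(procedure_volumes, floor=20)

absolute_change = scenario_20 - observed
relative_change = absolute_change / observed
```

Procedures already performed by surgeons at or above the floor keep their original predicted risk, so only the lower-volume procedures contribute to the change.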
The software environment R, version 3.5.0 (The R Foundation) was used for the analysis. The R packages mgcv version 1.8-24, gamm4 version 0.2-5, and lme4 version 1.1-18-1 were used for GAM, GAMM, and GLMER, respectively. P values were calculated from a Wald-type test using the Bayesian covariance matrix. All tests were 2-sided with α = .025.
A total of 10 546 total thyroidectomy procedures were included in the analyses. Of these, 8500 patients (80.6%) were male and 4877 (46.2%) were aged 45 to 64 years. Additional patient characteristics are summarized in eTable 3 in the Supplement. Surgeons with annual volumes of 1 to 9 total thyroidectomies operated on 2912 patients (27.7%), those with an annual volume of 10 to 19 operated on 3404 patients (32.6%), and those with an annual volume of 20 or more operated on the remaining 4232 patients (40.6%). During 2008-2015, a mean of 53.5 (range, 46-198) thyroidectomies were performed each year by surgeons with an annual volume greater than or equal to 40 procedures (after excluding the outlier high-volume surgeon).
Among all patients who underwent total thyroidectomy, 632 (6.0%) experienced transient hypoparathyroidism, and 170 (1.6%) experienced permanent hypoparathyroidism, whereas 440 (4.2%) experienced transient VCP and 182 (1.7%) experienced permanent VCP. Absolute decreases in complication rates when all surgeons had modeled minimum annual procedure volumes greater than 40 were low, ranging from 0.6% for permanent VCP and hypoparathyroidism to 1.5% for transient hypoparathyroidism.
Figure 1 depicts both outcomes modeled with GAM. In Figure 1A, we calculated the point at which the curve crosses 0 on the y-axis as occurring at 18.2 (95% CI, 15.0-21.5) procedures per year, representing the annual surgeon procedure volume at which the rate of transient hypoparathyroidism begins to decrease. Although the curve paradoxically crosses the line again at higher annual volumes, this association was not statistically significant. A similar curve was obtained for permanent hypoparathyroidism (Figure 1C). In Figure 1B, we calculated the point at which the curve first crosses 0 on the y-axis as occurring at 18.1 (95% CI, 13.8-21.3) procedures per year, representing the point at which the rate of transient VCP begins to decrease below the mean. However, the curve in Figure 1B crosses 0 again at 56.7 (95% CI, 53.8-58.3) procedures per year, indicating that a higher surgeon procedure volume was associated with transient VCP. This paradoxical effect was not observed for permanent VCP (Figure 1D), although the point at which the curve first crosses the line was similar.
To understand surgeon-specific random effects within the data, we then applied GAMM, the results of which are depicted in Figure 2. The curves for transient and permanent hypoparathyroidism (Figure 2A and C) again suggest a cut point for a volume just below 20 procedures per year and a paradoxical association between surgeon volume and outcomes that did not reach statistical significance. However, the conical nature of the curves for transient (Figure 2B) and, particularly, permanent VCP (Figure 2D) around the y = 0 line suggests that a smoothed model was not the appropriate choice for these outcomes and that the smoothing parameter was unnecessary.
Attempting to improve model performance, we reviewed the outcomes data for surgeons at the high end of the volume range and identified that a single surgeon accounted for observed procedure volumes approaching 100 and the associated higher complication rates. This surgeon was the only outlier in terms of both volume and outcomes. We excluded this surgeon from further analyses, noting that the only substantial change in the overall complication rates was a decrease from 4.2% (440 complications) to 3.4% (346 complications) in the rate of transient VCP. All curves subsequently modeled with GAMM (Figure 3) displayed the same conical shape, indicating that the smoothing parameter was unnecessary and the association between volume and outcomes was indeed linear.
Consequently, we modeled the association between volume and outcomes with GLMER to incorporate surgeon-specific random effects in a linear relationship. Table 1 provides the number of procedures that were modeled as being performed by a surgeon with the minimum annual volume in the next-highest volume group. The proportion of these procedures was 29.0% (2912 procedures) for surgeons with annual volumes of 1 to 9 procedures (modeled as performed by surgeons with a volume of 10 annual procedures) and 95.8% (9644 procedures) for surgeons with annual volumes of 1 to 39 (modeled as performed by surgeons with a volume of 40 annual procedures). The estimated risk of transient and permanent hypoparathyroidism and VCP decreased steadily to a statistically significant degree as annual volumes increased (Table 2). The decrease in risk when procedures performed by surgeons with an annual volume of 1 to 19 thyroidectomies were assumed to be completed by those with an annual volume of 20 was substantially larger than the decrease in risk when the assumed increase in annual surgeon thyroidectomy volume was from 1 to 9 to 10. This was consistent with the finding of the GAM analysis that outcomes begin to improve at approximately 18 procedures per year. The relative changes in estimated risk were substantial (Table 2), although the overall low rates for these complications imply that the associated absolute changes in risk were small. For instance, if all surgeons with annual volumes less than 40 performed at least 40 procedures per year, the risk of permanent VCP would decrease by 36.5%, but the absolute reduction in risk would be 0.6%.
We used multiple modeling approaches to investigate the form of the association between annual surgeon procedure volumes and 2 transient and permanent complications of total thyroidectomy. After adjusting for multiple covariates and excluding a single high-volume surgeon with outlier complication rates, we found that the association was linear. However, the very low overall rates of these complications in our system limit the degree to which the statistically significant findings may have practical utility.
Many previous studies have examined thyroidectomy outcomes for predetermined categories of annual surgeon procedure volumes. In contrast, Adam et al23 examined the association between volume and outcomes as a continuous function, using multiple logistic regression with restricted cubic splines, and found a corresponding inflection point at approximately 25 thyroidectomies per year. However, the results are not comparable because those authors examined only in-hospital complications. Many reports, including that by Adam et al,23 have also relied on national surgical or inpatient databases to examine the complications after thyroidectomy, limiting the observation period to either the inpatient stay alone or the first 30 days after surgery.5,7,9-14 Rates of permanent VCP and hypoparathyroidism that extend past the first 30 postoperative days have been reported less often. Among 4 studies examining permanent rates of VCP and hypoparathyroidism, 2 also identified permanent complications as occurring at 6 months after surgery, although the study follow-up period was also 6 months.24,25 The reported rates of permanent VCP were 2.1% and 3.1%,24,25 to which our rate of 1.7% for permanent VCP 6 to 18 months after surgery compares favorably. The other 2 studies identified permanent VCP as occurring after 1 year and reported rates of 1.0% to 1.1%.26,27 A systematic review of studies reporting permanent VCP rates after 3 months to 1 year found a mean rate of 2.3% (range, 0.3%-18.6%).28 Reported rates of permanent hypoparathyroidism were 1.7% to 2.7% during follow-up periods that ranged from 6 months to 5 years.24-26
Our findings suggest a statistically significant linear association between volume and outcomes. However, the clinical significance in our setting is unclear owing to low baseline complication rates and the number of patients who must be treated by higher-volume surgeons to avoid 1 additional complication. During 2008-2015, a mean of 53.5 (range, 46-198) thyroidectomies were performed each year by surgeons with an annual volume greater than or equal to 40 procedures (after excluding the outlier high-volume surgeon). Avoiding a single case of permanent VCP per year would require redirecting an additional 167 patients to surgeons performing 40 or more thyroidectomies per year. However, only approximately 25% of the patients with unilateral or bilateral VCP require subsequent interventions;29 therefore, up to 668 patients would need to be moved each year from lower-volume surgeons to those performing 40 or more thyroidectomies annually to avoid an additional case of VCP requiring intervention.
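The figures of 167 and 668 patients follow from the 0.6% absolute risk reduction for permanent VCP and the approximately 25% of VCP cases that require intervention. A short Python sketch of that arithmetic is given below; the function name `patients_needed` is hypothetical, and rounding up to whole patients is an assumption.

```python
import math

def patients_needed(absolute_risk_reduction):
    """Number of patients who must be redirected to avoid 1 event (rounded up)."""
    return math.ceil(1 / absolute_risk_reduction)

arr_permanent_vcp = 0.006          # 0.6% absolute reduction reported in the text
share_needing_intervention = 0.25  # approximately 25% of VCP cases

nnt_any_vcp = patients_needed(arr_permanent_vcp)
nnt_vcp_with_intervention = math.ceil(nnt_any_vcp / share_needing_intervention)
```

Dividing by the share of cases that need intervention scales the first figure by 4, which is why the second estimate is stated as an upper bound ("up to 668 patients").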
Our approach has been to set the goal for surgeons performing total thyroidectomies to aim for at least 20 procedures per year, beginning in 2017. Between 2017 and 2018, the proportion of thyroidectomies performed by surgeons in the Northern California region with an annual volume of more than 20 procedures increased from 64% (930 of 1455) to 82% (1307 of 1593), representing 377 additional patients cared for by higher-volume surgeons. By the end of 2018, surgeons performed a mean (SD) of 24.8 (19.5) thyroidectomies per year (unpublished data).
Disparities in clinical outcomes for patients undergoing thyroidectomy have been documented.30 Racial/ethnic differences exist in access to experienced surgeons, and uninsured status is associated with undertreatment among patients with differentiated thyroid cancer.30-32 Across our organization, all patients are insured and substantial effort is directed at reducing racial/ethnic care disparities.33 These may account, in part, for the observed low decreases in absolute risk with increase in surgeon volume. Additional factors that may play a role include the fact that surgeons are salaried and lack financial incentives to increase operative volumes or build a referral base that may motivate less experienced surgeons to perform thyroidectomies. Primary care physicians and endocrinologists refer patients for surgery, and surgeons decide whether they can appropriately care for individual cases. Finally, in our integrated care delivery system, multidisciplinary collaboration is common in the care of patients with thyroid nodules.
Strengths and Limitations
The strengths of our study include the large population and multiyear data that include occurrence rates of permanent complications. The use of multiple modeling approaches allowed us to investigate the form of the association between annual procedure volumes and 2 important outcomes of interest. In addition, we adjusted for multiple observations from the same surgeons across 8 years.
Limitations of our study include the fact that, although we included a robust list of covariates, unmeasured confounders may have affected our findings. When electronic health record data are used, a potential for coding errors and misclassification exists; we did not validate coding accuracy. We did not risk adjust the data, and our study took place in an integrated health care delivery system. Although we located a single report of a paradoxical association between greater surgeon experience in years and patient outcomes,24 previous studies have concluded that surgeon procedure volumes are uniformly associated with better patient outcomes,5,7,9-14,34 which appears to confirm our assumption that the single surgeon we observed with higher complication rates represented an outlier and should be excluded. We note that the observation period preceded a concerted effort across Kaiser Permanente to consolidate care for patients with thyroid nodules to higher-volume surgeons and the dissemination of multidisciplinary workflows based on the 2015 American Thyroid Association guidelines.35,36
Although our results suggest that the impact of surgeon volume on important complications of thyroidectomy may be less than that previously reported, they should be generalized with great caution to settings where access issues and fragmented care may be more likely. We suggest that future research should explore the nature of the association between annual thyroidectomy volumes and important quality outcomes in a variety of settings by using longitudinal patient data.
Accepted for Publication: May 22, 2019.
Corresponding Author: Charles Meltzer, MD, The Permanente Medical Group, 401 Bicentennial Way, Santa Rosa, CA 95403 (firstname.lastname@example.org).
Published Online: July 25, 2019. doi:10.1001/jamaoto.2019.1752
Author Contributions: Dr Meltzer and Ms Hull had full access to all the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis.
Concept and design: Meltzer, Adams.
Acquisition, analysis, or interpretation of data: Hull, Sundang, Adams.
Drafting of the manuscript: Meltzer, Sundang.
Critical revision of the manuscript for important intellectual content: Meltzer, Hull, Adams.
Statistical analysis: Meltzer, Hull, Adams.
Administrative, technical, or material support: Sundang.
Conflict of Interest Disclosures: None reported.
MA. Profile of a clinical practice: thresholds for surgery and surgical outcomes for patients with primary hyperparathyroidism: a national survey of endocrine surgeons. J Clin Endocrinol Metab. 1998;83(8):2658-2665. doi:10.1210/jcem.83.8.5006
E. Association of socioeconomic status, race, and ethnicity with outcomes of patients undergoing thyroid surgery. JAMA Otolaryngol Head Neck Surg. 2014;140(12):1173-1183. doi:10.1001/jamaoto.2014.1745
E. Association of surgeon volume with outcomes and cost savings following thyroidectomy: a national forecast. JAMA Otolaryngol Head Neck Surg. 2016;142(1):32-39. doi:10.1001/jamaoto.2015.2503
et al. American Association of Clinical Endocrinologists and American College of Endocrinology disease state clinical review: postoperative hypoparathyroidism—definitions and management [published correction appears in Endocr Pract. 2015;21(10):1187]. Endocr Pract. 2015;21(6):674-685. doi:10.4158/EP14462.DSC
R. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. 2nd ed. New York, NY: Springer; 2009. doi:10.1007/978-0-387-84858-7
et al; CATHY Study Group. Influence of experience on performance of individual surgeons in thyroid surgery: prospective cross sectional multicentre study. BMJ. 2012;344:d8041. doi:10.1136/bmj.d8041
DF. Epidemiology of vocal fold paralyses after total thyroidectomy for well-differentiated thyroid cancer in a Medicare population. Otolaryngol Head Neck Surg. 2014;150(4):548-557. doi:10.1177/0194599814521381
et al. Disparities in the care of differentiated thyroid cancer in the United States: exploring the national cancer database. Am Surg. 2017;83(7):739-746.
B. Population care management and team-based approach to reduce racial disparities among African Americans/blacks with hypertension. Perm J. 2016;20(1):53-59.
et al. Evidence-based workflows for thyroid and parathyroid surgery. Perm J. 2016;20(3):16-035.
et al. American Head and Neck Society Endocrine Section clinical consensus statement: North American quality statements and evidence-based multidisciplinary workflow algorithms for the evaluation and management of thyroid nodules. Head Neck. 2019;41(4):843-856. doi:10.1002/hed.25526