Figure. Study selection flowchart.
McAteer JP, LaRiviere CA, Drugas GT, Abdullah F, Oldham KT, Goldin AB. Influence of surgeon experience, hospital volume, and specialty designation on outcomes in pediatric surgery: a systematic review. JAMA Pediatrics. Published online March 25, 2013. doi:10.1001/jamapediatrics.2013.25.
eTable. Characteristics of Individual Studies
McAteer JP, LaRiviere CA, Drugas GT, Abdullah F, Oldham KT, Goldin AB. Influence of Surgeon Experience, Hospital Volume, and Specialty Designation on Outcomes in Pediatric Surgery: A Systematic Review. JAMA Pediatr. 2013;167(5):468-475. doi:10.1001/jamapediatrics.2013.25
Author Affiliations: Division of Pediatric General and Thoracic Surgery, Seattle Children's Hospital, Seattle, Washington (Drs McAteer, Drugas, and Goldin); Department of Surgery, University of Washington School of Medicine, Seattle (Drs McAteer, Drugas, and Goldin); Department of Surgery, Louisiana State University, New Orleans (Dr LaRiviere); Department of Surgery, The Johns Hopkins University, Baltimore, Maryland (Dr Abdullah); and Division of Pediatric Surgery, Children's Hospital of Wisconsin, Milwaukee (Dr Oldham).
Importance Analyses of volume-outcome relationships in adult surgery have found that hospital and physician characteristics affect patient outcomes, such as length of stay, hospital charges, complications, and mortality. Similar investigations in children's surgical specialties are fewer in number, and their conclusions are less clear.
Objective To review the evidence regarding surgeon or hospital experience and their influence on outcomes in children's surgery.
Evidence Review A MEDLINE and EMBASE search was conducted for English-language studies published from January 1, 1980, through April 13, 2012. Titles and abstracts were screened in a standardized manner by 2 reviewers. Studies selected for inclusion had to use a measure of hospital or surgeon experience as a predictor variable and had to report postoperative outcomes as dependent response variables. Included studies were reviewed with regard to methodologic quality, and study results were extracted.
Findings Sixty-three studies were reviewed. Significant heterogeneity was detected in exposure definitions, outcome measures, and risk adjustment, with the greatest heterogeneity seen in appendectomy studies. Various exposure levels were examined: hospital level in 43 (68%) studies, surgeon level in 11 (17%), and both in 9 (14%). Nineteen percent of studies did not adjust for confounding, and 57% did not adjust for sample clustering. The most consistent methods and reproducible results were seen in the pediatric cardiac surgical literature. Forty-nine studies (78%) showed positive correlation between experience and most primary outcomes, but differences in outcomes and exposure definitions made comparisons between studies difficult. In general, hospital-level factors tended to correlate with outcomes for high-complexity procedures, whereas surgeon-level factors tended to correlate with outcomes for more common procedures.
Conclusions and Relevance Data on experience-related outcomes in children's surgery are limited in number and vary widely in methodologic quality. Future studies should seek both to standardize definitions, making results more applicable, and to differentiate procedures affected by surgeon experience from those more affected by hospital resources and system-level variables.
Hospital and surgeon characteristics (eg, operative volume, institution designation, and fellowship training) have often been implicated in influencing outcomes. These measures are generally thought to serve as proxy measures of provider experience and resources. Numerous investigations have examined the outcomes associated with surgical experience, with most studies focusing on operative volume. Although the conclusions of these studies do not necessarily establish a causal relationship, several have established a strong enough association to guide practice and shape policy.1-5 Studies in adults have been numerous and generally consistent in methodology, but the quality and quantity of similar data in children are less consistent.
Although reviews of the volume-outcome relationship in adults have shown some methodologic shortcomings, the larger numbers of studies and of investigators engaged in adult research have made these data fairly reliable.6 Similar data in the children's surgical literature are sparse, despite the need for such data to help guide practice. Children's surgical practices, however, unlike those for adults, generally do not subspecialize in the performance of rare, technically demanding procedures. Moreover, the total numbers of rare pediatric procedures (eg, esophageal atresia repair and portoenterostomy) are far fewer than similar high-risk procedures in adults (eg, esophagectomy and pancreaticoduodenectomy). The relationship between surgical frequency and complexity as they affect outcomes in children must be understood better to provide the best care to these patients. Although guidelines have been developed by expert opinion highlighting the importance of involving specialists in children's surgical care, more data are needed to define the specific situations for which referral is indicated.7
To date, to our knowledge, no comprehensive review has examined the data on the correlation between hospital and surgeon characteristics and outcomes in children's surgery. To examine the association between such characteristics and outcomes in pediatric surgical patients, we reviewed observational studies in children assessing the relationship between prespecified measures of surgical experience and clinical outcomes.
We systematically reviewed English-language studies published since 1980 that focused on pediatric patients undergoing operative procedures. Included studies were required to have a measure of hospital or surgeon experience (operative volume, hospital designation, or surgeon subspecialty) as a predictor variable, and any clinical outcome (eg, mortality, complications, length of stay, and readmission) as a response variable. Studies that evaluated only patient characteristics at presentation rather than outcomes of care were excluded.
Studies were identified by searching the MEDLINE and EMBASE databases, with the most recent search performed on April 13, 2012. Our search process included the terms volume, regionalization, designation, specialty, pediatric, children, surgery, outcomes, outcome assessment, and volume-outcome. Bibliographies were reviewed, and experts were consulted about missed studies. Eligibility assessment was performed independently in a standardized manner by 2 of us (J.P.M. and A.B.G.) screening titles and abstracts. Articles were selected according to the aforementioned eligibility criteria, and disagreements were resolved through consensus.
Information was extracted from each study on (1) procedure of interest, (2) exposure level (hospital, surgeon, or both), (3) exposure definition, (4) primary outcome measure(s), (5) association between exposure and primary outcome(s), (6) database used (administrative, clinical, or none), (7) confounding adjustment, and (8) sample sizes. As in other systematic reviews organizing results by procedure, we elected to look at each separate procedure of interest in a single article as a stand-alone study.6
Validity within studies was evaluated by examining whether studies adjusted results for confounding. The adjustment method was recorded (eg, logistic regression), as well as whether authors reported model diagnostics (eg, goodness-of-fit tests). Authors' approach to risk adjustment was also recorded if the adjustment method accounted for disease severity or patient complexity to control for differences in case mix across hospitals and surgeons. Adjustment for demographic factors alone was not considered risk adjustment. Data were also gathered regarding whether authors adjusted for clustering and nonindependence of sampling. If the study found a statistically significant advantage (P < .05) in the primary outcomes for more “experienced” institutions or providers, results were considered positive. If only some outcomes were improved or if positive results were found only in certain subgroups, results were considered mixed. If no association was seen, results were considered negative. A uniform summary measure could not be applied due to the variation in outcome measures.
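The positive, mixed, and negative labels described above amount to a simple decision rule. The following is a minimal sketch of that rule, assuming each primary outcome has already been reduced to a yes/no flag for a statistically significant advantage (P < .05) among more experienced providers. The function name and arguments are illustrative and do not come from the review.

```python
from typing import Sequence


def classify_study(primary_outcomes_improved: Sequence[bool],
                   subgroup_only: bool = False) -> str:
    """Label a study as 'positive', 'mixed', or 'negative'.

    primary_outcomes_improved: one flag per primary outcome, True when the
        study reported a statistically significant advantage (P < .05) for
        the more experienced hospitals or surgeons.
    subgroup_only: True when the advantage was found only in certain
        subgroups of patients.
    """
    if not primary_outcomes_improved:
        raise ValueError("at least one primary outcome is required")
    if not any(primary_outcomes_improved):
        return "negative"   # no association seen for any primary outcome
    if all(primary_outcomes_improved) and not subgroup_only:
        return "positive"   # advantage in the primary outcomes overall
    return "mixed"          # only some outcomes, or only some subgroups


print(classify_study([True, True]))                # positive
print(classify_study([True, False]))               # mixed
print(classify_study([True], subgroup_only=True))  # mixed
print(classify_study([False, False]))              # negative
```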
Our systematic review generated 1964 articles, of which 51 were selected after initial screening. After addition of 21 articles identified by bibliography review and personal communication, 72 articles were selected for further review. Of these, 10 articles that did not examine a measure of hospital or surgeon experience as a predictor and 8 articles that did not explicitly examine clinical outcomes were excluded. Of the remaining 54 articles, 6 examined more than 1 procedure, generating a total of 63 individual studies (Figure).
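The selection counts can be re-derived with a few lines of arithmetic. The sketch below uses only the numbers stated in the preceding paragraph; the variable names are illustrative. The final quantity is inferred rather than reported: it is the number of studies contributed by multi-procedure articles beyond one study per article.

```python
# Counts reported in the study selection paragraph.
records_identified = 1964
selected_after_screening = 51
added_from_bibliographies_and_experts = 21
excluded_no_experience_measure = 10
excluded_no_clinical_outcome = 8
individual_studies = 63

# Articles carried forward for further review.
articles_reviewed = (selected_after_screening
                     + added_from_bibliographies_and_experts)

# Articles remaining after both exclusion criteria are applied.
articles_included = (articles_reviewed
                     - excluded_no_experience_measure
                     - excluded_no_clinical_outcome)

# Inferred, not reported: studies beyond one per included article.
extra_studies = individual_studies - articles_included

print(articles_reviewed, articles_included, extra_studies)
```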
Studies were quite heterogeneous in methodology. We found variability in the exposure level analyzed, exposure definition, outcome measures, and, for volume studies, categorization of volume variables (eTable). Sample sizes were generally reasonable, with 62% (39 studies) using study populations larger than 1000 patients (Table 1). Only 2 studies8,9 evaluated sample sizes smaller than 100, and both reviewed rare conditions (congenital diaphragmatic hernia [CDH] and biliary atresia). Most studies (68%; 43 studies) focused on hospital-level characteristics. Studies focusing on surgeon-level characteristics tended to investigate more common, less resource-intensive procedures, such as appendectomy and pyloromyotomy.10-16 Outcome measures varied according to diagnosis or procedure of interest, with most studies of high-acuity conditions (eg, congenital heart disease) considering primarily mortality and most studies of lower-acuity conditions considering combinations of other outcomes (eg, length of stay, readmission, charges, and complications).
Of the studies reviewed, 57% (36 studies) used a single specific exposure definition (annual hospital or surgeon operative volume for the procedure of interest), and the rest used a variety of definitions; 72% (31 of 43) of hospital-level studies used the annual volume definition, compared with only 45% (5 of 11) of surgeon-level studies. We also noted differences in definitions between specialties and procedures. Whereas 93% (13 of 14) of included studies on congenital heart disease used annual hospital volume as their volume definition, only 33% (3 of 9) of appendicitis studies used annual surgeon or hospital volume as a definition, the remainder considering primarily surgeon subspecialty or hospital designation.
In addition to the variability in exposure definition in general, studies specifically using volume as an exposure also varied in their treatment of the volume variable itself. Of studies using volume measures, 17% (9 of 52) treated volume as a continuous variable, and the rest categorized the variable in a myriad of ways that showed little consistency, even across studies of similar procedures (eTable). Similarly, the methods for delineating volume categories were not consistent. Some authors defined cut points a priori, but many used data-driven categories based on case number equality within groups or percentiles.
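The difference between a priori and data-driven volume categories can be made concrete with a small example. The sketch below is illustrative only: the caseloads, cut points, and labels are invented and are not taken from any reviewed study. It shows how the same annual caseload can land in different categories when cut points are derived from the percentiles of each sample.

```python
from bisect import bisect_right
from statistics import quantiles
from typing import Sequence

LABELS = ("low", "medium", "high")


def categorize(volume: float, cut_points: Sequence[float]) -> str:
    """Return the label of the band that `volume` falls into.

    A volume equal to a cut point is placed in the higher band.
    """
    return LABELS[bisect_right(cut_points, volume)]


# Hypothetical annual caseloads for two samples of hospitals.
sample_a = [4, 9, 15, 22, 40, 75]
sample_b = [30, 45, 60, 90, 150, 300]

# A priori cut points, fixed before looking at the data (made-up values).
a_priori_cuts = [10, 50]

# Data-driven cut points: terciles of whichever sample is being analyzed.
terciles_a = quantiles(sample_a, n=3)
terciles_b = quantiles(sample_b, n=3)

caseload = 40
fixed_label = categorize(caseload, a_priori_cuts)
label_in_a = categorize(caseload, terciles_a)
label_in_b = categorize(caseload, terciles_b)

print(fixed_label, label_in_a, label_in_b)
```

With these invented numbers, a hospital performing 40 cases per year is "medium" under the fixed cut points regardless of sample, but "high" by the terciles of sample A and "low" by the terciles of sample B.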
The quality of risk-adjustment methods varied widely (eTable). Eighty-one percent (51 of 63) of the studies used some form of adjustment, but 12% (6 of 51) of those studies failed to adjust for disease severity (Table 1). Only 43% (27 of 63) of studies adjusted for sample clustering. Although multivariate regression models were common across studies, only 4 studies reported model diagnostics, and 3 of them were in the congenital heart literature.
Regarding the type of database used, most studies (n = 37) used administrative data. Of 10 nondatabase studies, 5 (50%) used surgeon subspecialty or hospital designation rather than a volume measure as the exposure of interest, and 5 (50%) were appendicitis studies.
Although methodologic quality and results varied across studies, the most consistent literature was seen in congenital heart surgery (Table 2). Fourteen studies were evaluated, all of which uniformly used a volume measure as exposure. Twelve studies found significant correlation between increasing hospital volume and positive outcomes. All studies evaluated in-hospital mortality except 1 study17 that evaluated postdischarge mortality and found no association. Nearly all studies used rigorous risk-adjustment models, and most adjusted for clustering. These studies generally focused purely on hospital volume, although 1 study considered both surgeon and hospital volume and found significantly improved outcomes for both.18 All studies used well-established databases.
Four neurosurgery studies19,30-32 were included, all using volume measures as exposures. Three studies focused on ventriculoperitoneal shunt placement, and the other evaluated craniotomies for tumor resection. Three studies used hospital or surgeon annual caseload as their volume definition, while 1 used the annual number of procedure-specific admissions. Two studies considered both surgeon and hospital volume, 1 surgeon volume alone, and 1 hospital volume alone. Risk-adjustment models were rigorous, and the studies were based on large administrative samples. Three studies found a strong positive association between high surgeon and hospital volume and improved outcomes, but the study focusing on hospital volume alone found no association.
Three studies19,33,34 were included in otolaryngology, 2 focusing on tracheotomy and 1 on cleft lip repair. Both tracheotomy articles considered only hospital volume, and the study on cleft lip repair evaluated both surgeon and hospital volume. All 3 studies measured volume by annual caseload and used robust risk-adjustment models. Results were inconsistent across studies. One tracheotomy study found positive associations across all outcomes, and the other found no positive associations. Although the cleft lip repair study found generally positive results, the associations for various outcomes differed depending on whether the focus was on hospital or surgeon volume. Only high surgeon volume was correlated with a decreased rate of complications.
Two studies19,35 in orthopedics were identified, both examining spinal fusion. Both studies analyzed hospital volume and used administrative databases. Results were largely negative, because the only improvement seen at high-volume institutions was a lower reoperation rate.
Two abdominal transplantation studies36,37 were identified, 1 on liver and 1 on renal transplantation. Both studies considered hospital volume and used a United Network for Organ Sharing (UNOS) clinical database. Each investigation found a strong positive association between center volume and improved outcomes after adjustment.
Three urology studies met criteria. One study38 examined hospital volume as an exposure in patients undergoing bladder exstrophy repair, and 2 studies39,40 examined both hospital and surgeon volume in patients undergoing ureteral reimplantation. Strong risk-adjustment models were used. Results were mixed for hospital volume, but a positive association between surgeon volume and outcomes was observed for ureteral reimplantation.
Studies in pediatric general surgery varied widely in quality and results. The most consistent results were found in studies on CDH and biliary atresia. Similar to cardiac studies, hospital volume was consistently defined as annual caseload and mortality was the primary outcome. Three CDH studies8,49,50 used a well-established Canadian database, and the fourth study51 used a national US administrative database. These studies generally reported significantly decreased mortality at higher-volume hospitals. One study49 that did not report a significant association did not adjust for risk.
The biliary atresia studies describe the French and United Kingdom experience and used either national health registries or provider surveys as data sources. Both United Kingdom studies9,54 showed a strong association between increasing hospital volume and survival, with national outcomes improving after regionalization. The earlier French study revealed a similar association, but this association was no longer statistically significant after implementation of national practice standards without formal regionalization.55,56
Most studies in the general surgical literature focused on appendectomy and pyloromyotomy, and these studies displayed the most significant variability in methods. The appendectomy literature in particular presented few common threads between studies. Only 3 studies analyzed volume measures,11,41,42 and only 4 used databases.16,41,42,44 Studies that did not use volume measures used hospital designation and/or surgeon subspecialty as exposures. Four studies reported positive results, 4 mixed, and 1 negative. The most consistent result was a lower negative appendectomy rate for more experienced surgeons and hospitals. Results for other outcomes were inconsistent. Rigorous risk adjustment was uncommon. Although surgeon-level characteristics generally showed stronger associations with outcomes than hospital-level characteristics, the heterogeneity in definitions and lack of risk adjustment made this finding difficult to interpret.
Pyloromyotomy studies generally used better risk-adjustment methods than appendectomy studies and typically demonstrated a strong association between surgeon experience and improved outcomes. Specifically, postoperative complications (eg, duodenal perforation) were noted to be lower and length of stay shorter for high-volume and pediatric surgeons.13-15,46,47 Less association was reported for hospital-level characteristics, although 3 studies44,46,48 reported significantly decreased lengths of stay at high-volume centers and designated children's hospitals. In studies that looked at both surgeon volume and surgeon subspecialty, the effect of subspecialty on complications disappeared when surgeon volume was controlled for.15 Similar results illustrating the greater importance of surgeon volume vs subspecialty were noted in large-database studies of inguinal herniorrhaphy, cholecystectomy, and thyroid surgery.57-60
The literature for other procedures was sparse. Two studies44,61 of intussusception reduction were included and showed generally negative results, although neither study adjusted adequately for disease severity or reduction modality. Only 2 cancer articles52,53 were reviewed, both of which assessed survival for Wilms tumor and neuroblastoma and found no association with hospital volume. Rather, these studies found hospital characteristics, such as Children's Oncology Group membership and use of specific chemotherapy regimens, to be strongly associated with outcomes.
To our knowledge, this systematic review is the first critical analysis of the available evidence for the relationships between outcome and volume or experience in children's surgery. Our review included 63 studies evaluating 25 distinct procedures. There is significant heterogeneity in study methodology and definitions. The limitations of these data relate both to heterogeneous definitions that impair comparisons across studies and to limited risk adjustment that impairs the internal validity of certain studies. Although this heterogeneity precludes a formal meta-analysis, some important findings can be extracted from the cumulative results of these articles once the limitations are understood.
First, volume-related research is limited by the types of outcomes that can be reliably identified. This research relies on large samples and is therefore limited to existing large databases, which have their own limitations.62 Given the concerns regarding the validity of clinical and administrative databases, combined approaches using both data types may prove useful, as long as standard definitions and methods are used.63,64 Identifying appropriate outcomes is a complex process and is a limiting factor in study design and in the validity of conclusions. In-hospital mortality is frequently used as a primary outcome measure in adult studies because it is easily and reliably identified, but this outcome is too rare for most pediatric procedures (eg, appendectomy). The single field in which mortality has been useful in children's surgery is congenital heart surgery research. There is strong evidence for a positive association between hospital volume and improved survival in this area. These articles highlight the importance of quality and consistency in study design, definitions, data sources, and risk adjustment in outcomes research. Although the exact mechanism of improved outcomes across the congenital cardiac literature remains unclear, these data have led to policy changes that have improved outcomes.65 The CDH literature shows a similar association, but the number and quality of studies have not been sufficient to effect such policy changes. Further research with larger numbers and more thorough adjustment models will help to solidify these findings.
Another limitation in the current literature is the variability in risk adjustment. Nineteen percent of studies did not adjust for any covariates, and of those that did, several adjusted only for demographic factors. It is essential to adjust for case mix because such factors have a strong influence on outcome disparities. Ideally, models should adjust for factors specifically related to the procedures and outcomes of interest. An excellent example is the Risk Adjustment for Congenital Heart Surgery score used in several cardiac studies.66 Other risk measures, such as diagnosis-related group scores, are somewhat less specific, and study results for such adjustment modalities in our review are largely negative.19,35 Adjustment for clustering is another important consideration, and only 43% of studies (n = 27) accounted for this. Outcomes investigations must use statistical methods that account for nonindependence of sampling because such clustering can affect risk estimates and standard errors.
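One way to see why clustering matters for standard errors is the textbook design effect for equally sized clusters, 1 + (m - 1) x ICC, where m is the cluster size and ICC is the intraclass correlation. The sketch below is an illustration under that simplifying assumption. The review does not prescribe this or any other method, and the patient counts and ICC are invented.

```python
import math


def design_effect(cluster_size: float, icc: float) -> float:
    """Design effect for equally sized clusters: 1 + (m - 1) * ICC."""
    return 1.0 + (cluster_size - 1.0) * icc


def effective_sample_size(n_patients: int, cluster_size: float,
                          icc: float) -> float:
    """Number of independent observations the clustered sample is worth."""
    return n_patients / design_effect(cluster_size, icc)


# Hypothetical study: 1000 patients treated at 20 hospitals (50 per hospital),
# with an assumed intraclass correlation of 0.05 for the outcome.
n_patients = 1000
patients_per_hospital = 50
icc = 0.05

deff = design_effect(patients_per_hospital, icc)
n_eff = effective_sample_size(n_patients, patients_per_hospital, icc)
se_inflation = math.sqrt(deff)  # factor by which a naive SE is too small

print(f"design effect: {deff:.2f}")
print(f"effective sample size: {n_eff:.0f}")
print(f"standard error inflation: {se_inflation:.2f}")
```

Under these assumed values, an analysis that treats the 1000 patients as independent would report standard errors that are too small, which is the concern raised above for the studies that did not account for clustering.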
Conclusions within the literature are also limited by the heterogeneity of exposure definitions. Exposures in these studies aim to capture some measure of hospital and surgeon experience, measured by operative volume, surgeon subspecialty, or hospital designation. The first conceptual question addresses why there should be a relationship between volume and outcome or between surgeon subspecialty or hospital designation and outcome and what the nature of that relationship is. Other authors have addressed theories behind this,67 but here we address only whether volume and specialty or designation are merely associated with improved outcome or are causally related to it. It is important to understand these underlying theories because exposure definition affects conclusions. In general, the surgeon-volume definition of experience argues for a causal relationship, the “practice makes perfect” theory. Hospital volume is the definition that has been most consistently applied in the literature and is often assumed to be a proxy for surgeon volume, and therefore again assumed to be describing a causal relationship. Specialty or designation definitions also imply a causal relationship: expertise and experience lead to improved outcomes. With different exposure definitions, one may not necessarily be superior to the other, but some of the most informative studies considered both volume and specialty or designation measures.11,15,46,57,58 Similarly, studies assessing both surgeon- and hospital-level exposures are able to assess the relative contribution of each. Regarding volume measures specifically, investigators should try to define volume categories a priori and standardize volume cutoffs for specific procedures. This enhances generalizability and limits bias. Another option is to treat volume as a continuous variable, but this can impair interpretability and limit the ability to make clinical or policy recommendations.
A fourth limitation in these studies is the a priori temporal relationship between volume and outcome defined in the study design. If the assumption is that increased volume is causally related to improved outcome, then the exposure must precede the outcome. All of the volume measure studies in our review define volume within the same time period as the outcomes being measured. In this case, there is no time for a “practice makes perfect” relationship to be established. Preferably, researchers should define the volume-outcome relationship using hospital or surgeon caseloads from prior time periods to define volume strata for subsequent time periods. For example, 1 study (C.A.L., J.P.M., J. Huaco, MD, MPH, M. Garrison, PhD, J. Avansino, MD, T. Koepsell, MD, MPH, K.T.O., and A.B.G., June 2011, unpublished data) used the average annual hospital volume during a 2-year period to define the volume category to be associated with outcomes observed in the third year. This can help to decrease exposure misclassification. By the same token, only outcomes that temporally follow the exposure of interest should be assessed. For example, several studies assessed perforation rate as an outcome in appendectomy studies, even though this is a factor determined at presentation.68,69
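The lagged exposure definition described above, in which caseloads from prior years define the volume category applied to a later year's outcomes, can be sketched as follows. The caseloads, cut points, and function name are hypothetical. Only the idea of averaging 2 prior years to classify the third comes from the text.

```python
from bisect import bisect_right
from statistics import mean
from typing import Mapping, Sequence

LABELS = ("low", "medium", "high")


def lagged_volume_category(annual_cases: Mapping[int, int],
                           outcome_year: int,
                           cut_points: Sequence[float],
                           lookback_years: int = 2) -> str:
    """Assign the volume category for `outcome_year` from prior years only.

    The exposure is the mean annual caseload over the `lookback_years`
    immediately preceding `outcome_year`, so it precedes the outcome.
    Raises KeyError if a prior year is missing from `annual_cases`.
    """
    prior_years = range(outcome_year - lookback_years, outcome_year)
    prior_mean = mean(annual_cases[year] for year in prior_years)
    return LABELS[bisect_right(cut_points, prior_mean)]


# Hypothetical hospital whose caseload grew sharply in the outcome year.
cases_by_year = {2008: 12, 2009: 18, 2010: 60}
cut_points = [10, 50]  # made-up a priori thresholds (cases per year)

lagged = lagged_volume_category(cases_by_year, 2010, cut_points)

# Same-period definition, for contrast: volume measured in the outcome year.
same_period = LABELS[bisect_right(cut_points, cases_by_year[2010])]

print(lagged, same_period)
```

In this invented example the hospital is "medium" volume by its prior 2-year average but "high" volume if the exposure is measured in the same year as the outcomes, which illustrates the exposure misclassification the lagged design is meant to reduce.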
The most important point manifested by the results of these studies is that hospital-level characteristics are often strongly associated with improved outcomes in less common, more complex problems (eg, CDH and congenital heart surgery), whereas surgeon-level factors appear to be more important in more common, less resource-demanding procedures (eg, appendectomy, pyloromyotomy, ureteral reimplantation, and cleft lip repair) as well as in procedures commonly encountered in adult surgery (thyroidectomy, inguinal herniorrhaphy, and cholecystectomy). These results highlight the importance of surgeon- vs system-level factors, depending on the condition of interest. Although surgeon volume and subspecialty probably represent measures of practitioner experience and competence, hospital volume and designation may serve as proxies for other characteristics of large centers of care. This point is highlighted by the findings of studies on solid tumors,53 showing improved care at Children's Oncology Group–designated centers, regardless of volume, and of the French studies in patients with biliary atresia,56 showing improved outcomes after nationwide standardization but no formal regionalization policy. Ideally, future research should evaluate surgeon- and hospital-level measures simultaneously, which will provide more information than evaluating either independently.
The system-level factors at play in high-complexity procedures include numerous multidisciplinary resources. Infants and children with complex surgical diagnoses require high-intensity therapies from many sources. Nurses, neonatologists, intensivists, anesthesiologists, and subspecialty consultants are essential components of the system necessary to care for the sickest children. High hospital volume, therefore, may not be the key exposure that is being captured in complex procedures. Rather, the system of care associated with high volume (or a children's hospital designation) is the key. Therefore, policies designed to ensure that children requiring resource-demanding procedures are cared for in fully equipped environments may improve outcomes.
One possible way to produce this improvement would be to define and standardize the hospital-level resources (eg, pediatric surgeons, anesthesiologists, and nurses) necessary to care for pediatric patients with low-, medium-, or high-complexity surgical conditions. Several specialty guideline articles70-73 have been written to suggest system-level requirements needed to create the safest environment for pediatric surgical patients, much like the United States trauma system that has led to clear improvements in trauma outcomes.74 With the adoption of national guidelines for hospital-level resources needed to care for low-, medium-, and high-acuity patients, many hospital-level confounders will be removed and comparisons between hospitals and physicians will become more robust and effective. This will address many of the limitations of research using existing databases and may facilitate the creation and use of national disease-specific outcome registries.
Although conclusions regarding experience and outcomes in pediatric surgery can be drawn in some areas, as already highlighted, a great deal more work must be done to fully elucidate this relationship and the circumstances under which it is most important. Children's surgery presents special challenges in assessing the influence of hospital- and surgeon-level factors on patient outcomes. This fact, coupled with the ever-increasing societal focus on outcomes in medicine, makes it essential that 2 things occur in parallel. First, high-quality clinical research must be conducted to identify the key hospital and surgeon factors that affect patient outcomes, with an initial focus on high-acuity resource-intensive procedures, such as intracavitary neonatal procedures. Second, given the amount of time that it will take to complete such research, we should define and consider implementing resource standards for specific conditions that reflect current data and expert consensus opinion. Our review of the literature highlights a number of strengths and limitations in the present evidence base, and clarifying these may help inform the design of future studies that would be more amenable to comparison and meta-analysis.
Correspondence: Jarod P. McAteer, MD, Division of Pediatric General and Thoracic Surgery, Seattle Children's Hospital, 4800 Sand Point Way NE, Seattle, WA 98105 (email@example.com).
Accepted for Publication: December 21, 2012.
Published Online: March 25, 2013. doi:10.1001/jamapediatrics.2013.25
Author Contributions: Dr McAteer had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis. Study concept and design: McAteer, LaRiviere, Oldham, and Goldin. Acquisition of data: McAteer and Goldin. Analysis and interpretation of data: All authors. Drafting of the manuscript: McAteer and Goldin. Critical revision of the manuscript for important intellectual content: LaRiviere, Drugas, Abdullah, Oldham, and Goldin. Statistical analysis: McAteer and Goldin. Administrative, technical, and material support: Drugas and Goldin. Study supervision: Abdullah, Oldham, and Goldin.
Conflict of Interest Disclosures: None reported.