PELD indicates Pediatric End-stage Liver Disease.
Orange shading indicates the 95% CI for the reduced cohort; gray shading, 95% CI for the full cohort.
eTable 1. Primary diagnosis classification and the UNOS end-stage liver disease codes
eTable 2. Actual and predicted 90-day pretransplant mortality for all and by PELD category
Chang CH, Bryce CL, Shneider BL, et al. Accuracy of the Pediatric End-stage Liver Disease Score in Estimating Pretransplant Mortality Among Pediatric Liver Transplant Candidates. JAMA Pediatr. 2018;172(11):1070–1077. doi:10.1001/jamapediatrics.2018.2541
Do Pediatric End-stage Liver Disease scores adequately estimate 90-day pretransplant mortality among pediatric patients?
In this study of 4298 patients with chronic liver disease identified using the United Network for Organ Sharing pediatric waiting list, Pediatric End-stage Liver Disease scores and mortality were concordant; however, the estimated risk using the Pediatric End-stage Liver Disease score significantly underestimated actual 90-day mortality.
The Pediatric End-stage Liver Disease system may be flawed and may disadvantage children awaiting liver transplant when used to adjudicate organ allocation decisions; a new system that reflects actual 90-day mortality should be developed.
Fair allocation of livers between pediatric and adult recipients is critically dependent on the accuracy of mortality estimates afforded by the Pediatric End-stage Liver Disease (PELD) and Model for End-stage Liver Disease, respectively. Widespread reliance on exceptions for pediatric recipients suggests that the 2 systems may not be comparable.
To evaluate the accuracy of the PELD score in estimating 90-day pretransplant mortality among pediatric patients on the United Network for Organ Sharing (UNOS) waiting list.
Design, Setting, and Participants
Patients who were listed from February 27, 2002, to March 31, 2014, for primary liver transplant were included in this retrospective analysis and were followed up for at least 2 years through June 17, 2016. The study analyzed 2 cohorts using the UNOS Standard Transplant Analysis and Research data files. The full cohort comprised 4298 patients (<18 years of age) who had chronic liver disease (excluding cancer). The reduced cohort (n = 2421) excluded patients receiving living donor transplantation or PELD exception points.
Main Outcomes and Measures
Observed and expected 90-day pretransplant mortality rates evaluated at 10-point interval PELD levels.
Among the 4298 patients in the full cohort (mean [SD] age, 2.5 [4.2] years; 2251 [52.4%] female; 2201 [51.2%] white), PELD scores and mortality were concordant (C statistic, 0.8387 [95% CI, 0.8191-0.8584] for the full cohort and 0.8123 [95% CI, 0.7919-0.8327] for the reduced cohort). However, the estimated 90-day mortality using the PELD score underestimated the actual probability of death by as much as 17%.
Conclusions and Relevance
With use of the PELD score, the ranking of risk among children was preserved, but direct comparisons between adult and pediatric candidates were not accurate. Children with chronic liver disease who are in need of transplant may be at a disadvantage compared with adults in a similar situation.
The Pediatric End-stage Liver Disease (PELD) score has been used to allocate livers for transplant in children since February 27, 2002, when the Organ Procurement and Transplantation Network (OPTN) adopted a prioritization algorithm based on risk of 90-day pretransplant death (ie, death without receiving a transplant). Because previous prioritization systems were considered to be subjective, the PELD score was developed to be a transparent, objective method to prioritize transplant candidates according to illness severity, just as the Model for End-stage Liver Disease (MELD) score is designed for adults.
Similar to the MELD score, the PELD score was derived from biological markers of liver function (albumin level, bilirubin level, and international normalized ratio for prothrombin time) and growth failure. However, because the liver allocation process is not a closed system, the PELD score also plays a crucial role in adjudicating allocation decisions between adults and children, when MELD and PELD scores for both candidates are compared and an organ is allocated to the individual with the higher estimated 90-day probability of death. Consequently, MELD and PELD scores must accurately and comparably reflect the probability of 90-day pretransplant death.
There is indirect evidence that the transplantation community questions the usefulness of the PELD score and is concerned about underestimating pretransplant mortality risk among children,1-5 especially those younger than 2 years.6 Shortly after the PELD score was implemented, Shneider et al4 and Salvalaggio et al7 found that fewer than half of all transplant decisions relied on a PELD score alone. Instead, exception requests were frequently applied, serving to augment calculated PELD scores when a child’s illness appeared worse than the calculated PELD score might suggest. Exception requests were more common in regions with relative scarcity of pediatric donors, thereby necessitating competition between children and adults for adult donor organs. In 2005, use of the PELD score was restricted to children younger than 12 years, and older children received MELD scores.8 Despite this change, exceptions are still commonplace,9 and other countries also apply adjustments routinely to PELD scores.1
PELD was developed on a relatively small cohort of 884 patients enrolled in the Studies of Pediatric Liver Transplantation (SPLIT) database, of whom 41 (4.6%) died without a transplant.10 Most children in this SPLIT database (the original PELD cohort) received transplants (552 [62.4%]), potentially confounding the analyses with informative censoring. The Kaplan-Meier method and Cox proportional hazards regression model assume noninformative censoring, in other words, the presumption that transplant provides no information about risk of death. In practice, however, transplant recipients tend to be sicker than nonrecipients and have a higher risk of death, meaning that censoring by transplantation is informative.
This study evaluated the accuracy of the PELD score in estimating 90-day pretransplant mortality among children on the United Network for Organ Sharing (UNOS) liver transplant waiting list. We first analyzed a broadly defined pediatric cohort and then analyzed a more narrowly defined subgroup. Finally, we conducted additional analyses to examine PELD score performance within specific disease categories.
With use of the UNOS Standard Transplant Analysis and Research (STAR) data file for the liver transplant waiting list, the data set contained 6822 pediatric patients first listed with a PELD score from February 27, 2002, to March 31, 2014. We excluded patients who previously received a solid organ transplant of any type (n = 3); patients who were inactive (ie, status 7) at listing (n = 69); patients who received multiorgan transplants (n = 1157); patients with acute liver failure (n = 406); patients listed as status 1 (n = 295), status 1A (n = 113), or status 1B (n = 131) because their level of urgency placed them outside the PELD system; patients with cancer (n = 342); and listed patients who had received transplants or died on the same day (n = 8). The final data set included 4298 children listed for primary liver transplant from February 27, 2002, to March 31, 2014 (Figure 1). The data are part of a publicly available data set and are deidentified. Study approval was provided by the University of Pittsburgh Institutional Review Board.
We used a 2-stage approach to evaluate the accuracy of the PELD score. Of note, we used the calculated PELD score at time of listing, not augmented scores that received exception points. For the initial analysis, we included all 4298 pediatric patients (full cohort), which served as a kind of naive cohort intended to be broadly generalizable. We then restricted the sample further and excluded patients who received living donor transplants (n = 464) and those granted exception points (n = 1413). In both cases, decision to transplant potentially reflected considerations beyond the PELD score. This approach gave us a reduced cohort of 2421 patients. Figure 1 depicts this process.
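The cohort derivation described above can be checked with simple arithmetic. The following is a minimal sketch, assuming the reported exclusion counts are mutually exclusive (ie, each patient is counted under a single exclusion reason as the criteria were applied in sequence); the dictionary labels and variable names are illustrative and not taken from the UNOS STAR file.

```python
# Counts are taken from the text; the labels are illustrative only.
initial_listed = 6822

exclusions = {
    "prior solid organ transplant": 3,
    "inactive (status 7) at listing": 69,
    "multiorgan transplant": 1157,
    "acute liver failure": 406,
    "status 1": 295,
    "status 1A": 113,
    "status 1B": 131,
    "cancer": 342,
    "transplant or death on day of listing": 8,
}

# Full cohort: everyone remaining after the listed exclusions.
full_cohort = initial_listed - sum(exclusions.values())

# Reduced cohort: additionally drop living donor transplants and
# patients granted exception points.
reduced_exclusions = {
    "living donor transplant": 464,
    "granted exception points": 1413,
}
reduced_cohort = full_cohort - sum(reduced_exclusions.values())

print(full_cohort, reduced_cohort)
```

Under that assumption, the totals reproduce the reported cohort sizes of 4298 and 2421.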
Among pediatric patients, the original UNOS data file included more than 50 diagnosis codes for classifying the cause of end-stage liver disease, which were initially combined into 10 groups based on recommendations of the Pediatric Acute Liver Failure (PALF) Study Group Steering Committee. Because of small sample sizes in some groups, we further combined them into 6 categories: biliary atresia, acute liver failure (excluded from this analysis), autoimmune disorder, cancer (excluded from this analysis and examined separately11), metabolic disease, and other liver disease. eTable 1 in the Supplement maps the original UNOS codes into 10 disease groups and 6 final groups.
The PALF Study Group is a National Institutes of Health–funded prospective, multicenter case study formed in 1999 to develop a database of demographic, clinical, laboratory, and short-term outcome data for children from birth to 18 years with acute liver failure. All study policies and quality assurance measures were approved by the PALF Study Group Steering Committee.
We used mean (SD) and median (range) for continuous variables and number (percentage) for categorical variables to describe our 2 cohorts. Because the cohorts were interrelated (the reduced cohort was a subset of the full), we did not statistically test for differences. Instead, we compared distributions of each variable and checked for clinically meaningful differences.
We also compared demographics and health-related variables for the cohorts in this study with the original PELD cohort.10 Because of the small numbers of events, the original PELD cohort had 2 forms in development of the PELD model: one cohort used death as the end point for children with chronic liver disease undergoing their first transplant (n = 884), and the second cohort used death or transfer to the intensive care unit (ICU) as the end point (n = 779).
To calculate the observed probability of 90-day mortality, we categorized children into 5 levels of medical urgency using their calculated PELD scores at time of listing: less than 0, 0 to 10, 11 to 20, 21 to 30, and greater than 30. For each child, we assessed survival outcome during the follow-up period until the end of the study (June 17, 2016). Patients without a confirmed date of death at the end of the study were censored; reasons included receipt of transplant, removal from waiting list for reasons other than death (eg, too sick, too healthy), or alive and still waiting for transplant on the last date of study follow-up. Because the data include censored observations, we estimated the observed probability of 90-day pretransplant mortality using the Kaplan-Meier method.
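As an illustration of this step, the sketch below bins PELD scores into the 5 levels and computes a Kaplan-Meier estimate of 90-day mortality. It is a simplified, pure-Python version and not the SAS or Stata code used in the study; the function names, the assumption of integer PELD scores, and the toy data are all hypothetical.

```python
def peld_category(score):
    """Map a calculated PELD score (assumed integer) to one of 5 urgency levels."""
    if score < 0:
        return "<0"
    if score <= 10:
        return "0-10"
    if score <= 20:
        return "11-20"
    if score <= 30:
        return "21-30"
    return ">30"


def km_mortality(times, died, horizon=90):
    """Kaplan-Meier estimate of the probability of death by `horizon` days.

    times: days from listing to death or censoring (transplant, removal,
           or end of follow-up)
    died:  True if the patient died at that time, False if censored
    """
    # Distinct death times that fall within the horizon.
    event_times = sorted({t for t, d in zip(times, died) if d and t <= horizon})
    survival = 1.0
    for t in event_times:
        # Patients still under observation just before time t.
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, d in zip(times, died) if d and u == t)
        survival *= 1.0 - deaths / at_risk
    return 1.0 - survival


# Hypothetical toy data for one PELD level (not from the study).
toy_times = [10, 20, 30, 100, 120]
toy_died = [True, False, True, False, False]
print(peld_category(25), km_mortality(toy_times, toy_died))
```

Treating transplant as ordinary censoring, as here, carries the noninformative-censoring assumption discussed elsewhere in the article.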
We estimated the expected probability of 90-day pretransplant mortality corresponding to the calculated PELD score from 2 sources: a study by McDiarmid et al,12 derived from the original PELD cohort, and a study by Horslen,13 calculated by UNOS based on PELD performance during its first 2 years. Estimations were obtained using the Cox proportional hazards regression models.
The performance of the PELD score in estimating 90-day pretransplant mortality was assessed using 2 methods: the concordance of the score with the actual probability of death to measure consistency in the rankings and the accuracy of the score to measure similarity in estimated and actual probabilities. In the former, we estimated the concordance statistic (C statistic), which ranges from 0.5 (no agreement) to 1.0 (perfect agreement). In the latter, we checked accuracy of the expected pretransplant mortality by comparing it with the observed mortality.
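To make the concordance measure concrete, the sketch below computes a pairwise concordance index for censored survival data in the style of Harrell's C. The article does not state which estimator was used, so this is an assumption for illustration; the function name and toy inputs are hypothetical.

```python
def c_statistic(scores, times, died):
    """Pairwise concordance between risk scores and observed survival.

    A pair (i, j) is usable when patient i died and did so before
    patient j's death or censoring time. The pair is concordant when
    the patient who died earlier also had the higher score; tied
    scores count as half.
    """
    concordant = 0.0
    usable = 0
    n = len(scores)
    for i in range(n):
        for j in range(n):
            if died[i] and times[i] < times[j]:
                usable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs: at least one death is required")
    return concordant / usable


# Hypothetical toy data: higher scores die sooner, so ranking is perfect.
print(c_statistic([30, 20, 10], [5, 50, 90], [True, True, False]))
```

Because the index depends only on the ordering of scores, any monotone rescaling of the score leaves it unchanged, which is why good concordance can coexist with inaccurate absolute probabilities.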
Finally, we examined whether the PELD score performs differently within specific disease categories, based on diagnosis at listing, using the same method. All data management and analyses were performed in SAS software, version 9.3 (SAS Institute Inc) and Stata software, version 13.1 (StataCorp).
The UNOS STAR data file was extracted on June 17, 2016. A total of 4298 patients who were listed before March 31, 2014, and met study inclusion criteria were included (mean [SD] age, 2.5 [4.2] years; 2251 [52.4%] female; 2201 [51.2%] white). All eligible patients were followed up until June 17, 2016, providing at least 2 years of follow-up for each patient.
Table 1 gives demographic information for the 2 cohorts; it also includes growth failure and laboratory values, mean and median values for the PELD score at listing, blood type, transplant status, primary diagnosis, and insurance type. In the full cohort, 2364 (55.0%) were younger than 1 year, 2097 (48.8%) were nonwhite, and 1481 (34.5%) had growth failure; these rates were slightly higher in the reduced cohort. During follow-up, 3323 (77.3%) in the full cohort and 1522 (62.9%) in the reduced cohort received transplants.
For laboratory values measured at the time of listing, the full cohort had a mean (SD) total bilirubin level of 9.4 (8.8) mg/dL (to convert to micromoles per liter, multiply by 17.104), a mean (SD) albumin level of 3.2 (0.7) g/dL (to convert to grams per liter, multiply by 10), a mean (SD) serum creatinine level of 0.3 (0.5) mg/dL (to convert to micromoles per liter, multiply by 88.4), and a mean (SD) international normalized ratio of 1.5 (2.3). Corresponding values for the reduced cohort are also given in Table 1; except for the bilirubin level, mean (SD) values were almost identical to those in the full cohort. The mean (SD) total bilirubin level was higher in the reduced cohort (11.0 [9.5] vs 9.4 [8.8] mg/dL). The presence of ascites was similar between the 2 cohorts. Severe encephalopathy was rare in both cohorts (33 [0.8%] in the full cohort vs 19 [0.8%] in the reduced cohort). The reduced cohort retained more patients with public insurance (1294 [53.5%] vs 2141 [49.8%] in the full cohort). Biliary atresia was the most prevalent diagnosis in both cohorts, whereas metabolic disease was less prevalent in the reduced cohort (168 [6.9%] vs 600 [14.0%]), consistent with approved exceptions for metabolic disorders.
Table 1 also compares our study cohorts with the original PELD cohort. Our study cohorts included more infants (<1 year of age: 2364 [55.0%] in the full cohort and 1375 [56.8%] in the reduced cohort vs 401 [45.4%] with death as end point and 346 [44.4%] with death or transfer to ICU as end point in the PELD cohorts), had more patients with biliary atresia (2779 [64.7%] in the full cohort and 1666 [68.8%] in the reduced cohort vs 408 [46.2%] with death as end point and 385 [49.4%] with death or transfer to ICU as end point in the PELD cohorts), and had higher levels of mean total bilirubin level (≥9.4 vs ≤8.5 mg/dL) compared with the original PELD cohort. The cohorts in our study and the original PELD cohort had similar distributions for laboratory values and other demographics. Table 2 gives the distribution of PELD scores by diagnosis, showing that children with metabolic disorder had lower PELD scores than children with other diagnoses.
eTable 2 in the Supplement gives the Kaplan-Meier estimates for actual probability of 90-day mortality and its 95% CI, estimated probability calculated from PELD scores based on the study by McDiarmid et al,12 and estimated probability calculated from the PELD scores and back-calculated baseline survival probabilities by 10-point interval PELD levels based on the study by Horslen.13 The 90-day predicted survival probabilities from the study by Horslen13 can be calculated as $0.99305225^{\exp(\mathrm{PELD} \times 0.0992608)}$, where 0.99305225 and 0.0992608 are the 90-day baseline survival probability and the regression coefficient for the PELD score, respectively.13
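The Horslen-based calculation can be sketched as follows, reading the expression as a Cox-type survival function in which the baseline survival is raised to the power exp(coefficient × PELD). The baseline value used is the one given in the "where" clause (0.99305225); the constant and function names are illustrative.

```python
import math

# Values quoted in the text for the Horslen-based estimate.
BASELINE_SURVIVAL_90D = 0.99305225  # 90-day baseline survival probability
PELD_COEFFICIENT = 0.0992608        # regression coefficient for the PELD score


def predicted_90day_survival(peld):
    """Predicted 90-day pretransplant survival for a calculated PELD score."""
    return BASELINE_SURVIVAL_90D ** math.exp(PELD_COEFFICIENT * peld)


def predicted_90day_mortality(peld):
    """Predicted 90-day pretransplant mortality (1 minus survival)."""
    return 1.0 - predicted_90day_survival(peld)


# Print the expected mortality at a few illustrative PELD scores.
for peld in (0, 10, 20, 30, 40):
    print(peld, round(predicted_90day_mortality(peld), 4))
```

At a PELD score of 0 the exponent is 1, so the predicted survival equals the baseline value; predicted mortality then rises monotonically with the score.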
For the cohorts in our study, the observed probability of 90-day pretransplant death monotonically increased with the PELD score. Observed probabilities were slightly higher in the reduced cohort except for those among patients with PELD scores greater than 30. In general, expected 90-day mortality rates obtained from McDiarmid et al12 or Horslen13 were lower than the observed rates, and the differences became more pronounced with increasing PELD score.
Figure 2 depicts actual and estimated 90-day pretransplant death by 10-point intervals of the PELD score; estimated mortality based on the probabilities defined by Horslen13 or McDiarmid et al12 was lower than actual mortality. Figure 3 stratifies this information by major diagnosis categories.
Both figures provide comparisons between actual and estimated 90-day mortality, which can be used to assess accuracy. The absolute difference between actual and estimated values was large, indicating poor estimation. Disease-specific graphs showed similar findings, with estimated mortality being lower than actual mortality in all disease categories (Figure 3). Estimated mortality within specific disease categories is not available.
Despite inaccuracy in the estimation of actual probabilities, our data showed good concordance (C statistics of 0.8387 [95% CI, 0.8191-0.8584] for the full cohort and 0.8123 [95% CI, 0.7919-0.8327] for the reduced cohort) between the actual and estimated 90-day pretransplant mortality, indicating that higher PELD scores were associated with higher probabilities of 90-day pretransplant mortality. If organ allocation decisions for adults and children were independent, concordance between PELD scores and risk of death would suffice. In practice, however, the PELD and MELD systems are compared directly for the same liver donor pool without an appropriate correction or normalization of the numeric scores. Therefore, concordance of the PELD score with a higher risk of death is insufficient for children because the PELD score significantly underestimates actual 90-day mortality compared with the MELD score when available donor organs are allocated.
The PELD score was developed to objectively prioritize pediatric liver transplant candidates. The model used in developing the PELD score was derived from the SPLIT cohort, a small sample that had to adjust its end point because observed mortality was low and the model could not be estimated reliably using deaths alone.
Our goal was to assess accuracy of the PELD score in estimating 90-day pretransplant mortality using national data spanning more than a decade. We analyzed 2 cohorts of children, first testing PELD scores in the broadest group possible, that is, all children awaiting liver transplant with non–cancer-related chronic liver failure. Knowing that this group might exceed the intended scope of PELD, we then restricted the cohort to those who met the defined variables of the PELD score to provide the PELD score with every advantage to demonstrate its usefulness in estimating 90-day pretransplant mortality. To avoid limited information on pretransplant deaths, we followed up the cohort until June 17, 2016, ensuring at least 2 years of follow-up for everyone on the waiting list.
As shown in Figure 2 and Figure 3, the PELD score preserved the rank ordering of 90-day pretransplant mortality, with satisfactory concordance measures (C statistic >0.80 for both study cohorts). This finding compares favorably with the findings by Wiesner et al14 on MELD score performance, reporting a C statistic of 0.87 (95% CI, 0.82-0.92) in the original MELD cohort and values ranging from 0.78 to 0.87 in the 3 other cohorts. Despite good concordance, however, the scores underestimated true pretransplant mortality by as much as 17%. Even after applying the suggested calibration method, the estimated and observed probabilities differed significantly.
In most cases, donor organs are not earmarked strictly for children or adults15; instead, OPTN/UNOS must decide among all candidates and adjudicate between MELD and PELD scores in allocating donor organs to those who need them most. For this reason, the actual probability of 90-day mortality and not just the rank ordering matters. The systematic underestimation of mortality with the PELD score leads to misinformed decisions and must be corrected if the score is to be used as an allocation tool between adults and children.
There are several potential limitations of our study. First, as Table 1 indicates, there were differences between our cohorts and the original development and validation cohorts, which may partly explain the difference in the performance of the PELD score. However, we consider the differences to be justifiable in that our analyses used national data of all children with chronic liver disease listed for transplant. The cohort in our study was a more heterogeneous group than the original cohort, but it was precisely the group that is prioritized using the PELD system. Given that the PELD scoring system is intended for widespread use among pediatric candidates with chronic liver disease, the cohort in our study is representative of current practice. Second, in calculating the observed probability of death, we treated censoring attributable to transplantation as regular noninformative censoring. Depending on the correlation between transplantation and mortality, the bias may be nonnegligible. Competing risks and methods of adjustments have been discussed widely in the transplant community.16-21 Any future reestimation of the PELD score or a PELD-like model needs to account for competing risks. Third, we only used risk factors measured at the time of listing to calculate the estimated probability of death, which could be problematic if a risk factor changes substantially in a short period. Fourth, the removal of patients in creating the reduced cohort may have introduced bias in ways that we did not measure (eg, insurance provider). We removed these subgroups because, as noted, the decision to perform transplants in these patients (or not) was not based on their clinically computed PELD score.
More than 50% of the cohort were infants (<1 year of age) at listing, suggesting a possible need for 2 distinct pediatric systems. Additional work should stratify the cohort by age and examine the results to determine whether different systems are necessary.
Our study found that, despite good concordance, the current PELD system significantly underestimated the true risk of death. This finding may be well understood anecdotally or within single centers,3,22 but our analysis is the first, to our knowledge, to provide systematic evidence derived from national, long-term data. Children may be disadvantaged in the current organ allocation system compared with adults, a situation that most of society would find untenable.23,24 In addition, we found the precision of the PELD score to be poor, especially for sicker children (PELD score >20). For children with PELD scores greater than 30, the 95% CI for the likelihood of dying in 90 days spanned from 23% to 43%. Building a more accurate PELD model is a natural next step.
Researchers might ask whether a truly accurate new PELD score can be created given the heterogeneity of diagnoses, the vagaries of deceased donors, geographic and sociopolitical disparities, differences in medical expertise and experience, and secular trends over time. By way of comparison, the MELD scoring system has been validated in several different cohorts and is considered to be a better scoring system than the PELD system.25 Regardless of whether adults or children are prioritized in allocation policies, the transplant community should develop a risk estimation tool for children that allows for the direct comparison of risk of death with that among adults.
Accepted for Publication: June 4, 2018.
Corresponding Author: Mark S. Roberts, MD, MPP, Department of Health Policy and Management, University of Pittsburgh, 130 De Soto St, Ste A621, Pittsburgh, PA 15261 (firstname.lastname@example.org).
Published Online: September 17, 2018. doi:10.1001/jamapediatrics.2018.2541
Author Contributions: Dr Roberts had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Concept and design: Chang, Bryce, Shneider, Donnell, Squires, Roberts.
Acquisition, analysis, or interpretation of data: Chang, Shneider, Yabes, Ren, Zenarosa, Tomko, Donnell, Squires.
Drafting of the manuscript: Chang, Bryce, Ren, Squires, Roberts.
Critical revision of the manuscript for important intellectual content: All authors.
Statistical analysis: Chang, Yabes, Ren, Zenarosa, Tomko, Donnell.
Obtained funding: Squires.
Administrative, technical, or material support: Bryce, Zenarosa, Donnell, Roberts.
Supervision: Chang, Bryce, Squires, Roberts.
Conflict of Interest Disclosures: None reported.
Funding/Support: This work was funded by grants U01 DK072146 and R21 DK084201-01 from the National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health. This work was supported in part by contract 234-2005-37011C from the Health Resources and Services Administration.
Role of the Funder/Sponsor: The National Institutes of Health provided funds for coauthors to design and conduct the study, which includes data management, analysis, interpretation of the data and analysis results, and preparation of the manuscript for publication. The sponsor had no direct role in these activities.
Disclaimer: The content is the responsibility of the authors alone and does not necessarily reflect the views or policies of the US Department of Health and Human Services, and mention of trade names, commercial products, or organizations does not imply endorsement by the US government.