Plotting the locally weighted scatterplot smoothing (LOWESS; a statistical technique for fitting a smooth line through a set of points) line permits a graphical assessment of calibration across the range of predicted values. The diagonal reference line denotes the line of perfect calibration. Deviation of the LOWESS line from the diagonal line indicates a lack of calibration. Overprediction (ie, LOWESS line below the diagonal line) and underprediction (ie, LOWESS line above the diagonal line) can be identified across different risk strata. For example, in ABCD-10 in 196 patients, mortality in patients with expected probability of death of less than 0.4 is underpredicted, whereas mortality in patients with expected probability more than 0.6 is overpredicted. ABCD-10 indicates age, bicarbonate, cancer, dialysis, 10% body surface area; AUC, area under the receiver operating characteristic curve; CITL, calibration in the large; E:O, ratio of the expected to observed events.
Koh HK, Fook-Chong S, Lee HY. Assessment and Comparison of Performance of ABCD-10 and SCORTEN in Prognostication of Epidermal Necrolysis. JAMA Dermatol. 2020;156(12):1294–1299. doi:10.1001/jamadermatol.2020.3654
Does the ABCD-10 score (age, bicarbonate, cancer, dialysis, 10% body surface area), a new risk prediction model for in-hospital mortality in epidermal necrolysis, show better performance than the Score of Toxic Epidermal Necrolysis (SCORTEN)?
In this cohort study of 196 patients in an Asian national referral center for epidermal necrolysis, the discrimination of ABCD-10 did not differ from that of SCORTEN, but ABCD-10 had poorer calibration in prognostication of in-hospital mortality. Dialysis before admission, the most heavily weighted factor in ABCD-10, was a weaker predictor in this cohort.
The findings of this study suggest that SCORTEN should be used for mortality prognostication in epidermal necrolysis instead of ABCD-10.
Importance
Epidermal necrolysis is a rare severe cutaneous drug reaction associated with high mortality. The ABCD-10 score (age, bicarbonate, cancer, dialysis, 10% body surface area), a new prognostic score for mortality in epidermal necrolysis, was recently developed and validated in the US. However, to our knowledge, it remains to be externally validated in other cohorts.
Objective
To assess ABCD-10 among patients in a contemporary Asian cohort, compare its performance with the Score of Toxic Epidermal Necrolysis (SCORTEN), and study the associations of time and immunomodulatory therapy with the performance of ABCD-10 and SCORTEN.
Design, Setting, and Participants
This retrospective cohort study was conducted over a 17-year period from January 2003 to March 2019 and included 196 patients with epidermal necrolysis who were recruited from Singapore General Hospital, the national referral center for epidermal necrolysis.
Main Outcomes and Measures
In-hospital mortality. Discrimination and calibration of each risk score were assessed and compared using the area under the receiver operating characteristic curve and calibration plot, respectively.
Results
Among 196 patients (median [interquartile range] age, 56 [42-70] years; 116 women [59.2%]), 45 (23.0%) did not survive to discharge. All risk factors in ABCD-10 were significantly associated with in-hospital mortality. However, dialysis before admission, the most heavily weighted factor in ABCD-10, was a weaker predictor in this cohort (odds ratio, 3.7; 95% CI, 1.0-13.2; P = .04). Although the discrimination of ABCD-10 and SCORTEN did not differ (area under the curve: ABCD-10, 0.78; 95% CI, 0.72-0.85; vs SCORTEN, 0.77; 95% CI, 0.69-0.84; P = .53), the calibration of ABCD-10 was poorer than that of SCORTEN. From graphical analysis of the calibration plots, ABCD-10 showed mortality underestimation at lower score ranges and overestimation at higher score ranges. By contrast, SCORTEN was generally well calibrated, although at higher score ranges mortality may be overestimated. Assessment of calibration plots showed increasing overestimation of mortality by SCORTEN during the later period or when immunomodulatory therapy was used compared with supportive care alone. Calibration of ABCD-10 remained poor even during the later period or among patients treated with immunomodulatory therapy.
Conclusions and Relevance
In this cohort of patients, the performance of SCORTEN was superior to ABCD-10 in mortality prognostication in epidermal necrolysis. However, it did display time-associated deterioration in calibration leading to overestimation of mortality risk. Future studies may consider revising the existing SCORTEN given its current good discrimination.
Epidermal necrolysis (EN), the unified denomination for Stevens-Johnson syndrome (SJS), toxic epidermal necrolysis (TEN), and SJS-TEN overlap, is a rare severe cutaneous adverse reaction typically induced by medications.1 Mortality rates are high, ranging from 10% for SJS to as much as 50% for TEN.2
The Score of TEN (SCORTEN) is an SJS/TEN-specific severity-of-illness score widely used to predict in-hospital mortality.3 SCORTEN comprises 7 clinical and biological parameters, with the predicted probability of mortality ranging from 3.2% to 90.0%. It has demonstrated good performance during the first 5 days of hospitalization.4 Apart from its primary role in prognostication, SCORTEN has been used as an internal control in therapeutic studies to assess the efficacy of immunomodulatory treatments5 and as a quality of care benchmark across centers worldwide.6
SCORTEN has been validated in earlier studies in various centers.7 However, advancements, particularly in supportive and intensive care, as well as different treatment protocols may affect its prognostic ability and generalizability. Recently, a new risk prediction model, the ABCD-10 score (age, bicarbonate, cancer, dialysis, 10% body surface area [BSA]), was developed and validated in a multicenter contemporary cohort in the US.8 It not only consists of fewer parameters, but also introduces a new variable previously not present in SCORTEN: dialysis before admission. The study reported that the discrimination of ABCD-10 was similar to SCORTEN. However, to our knowledge ABCD-10 remains to be validated in other populations. This study therefore aims to assess and compare the performance of SCORTEN and ABCD-10 in a contemporary Asian cohort treated in an EN referral center.
This retrospective cohort study was conducted over a 17-year period from January 2003 to March 2019 at Singapore General Hospital, the national referral center for SJS-TEN. This study was approved by the Singhealth institutional review board (CIRB Ref 2014/2011). A waiver of consent was provided for patients recruited until 2017 as the collected data were anonymized and deidentified; however, with the institution of the Human Biomedical Research Act in Singapore in 2017, all subsequent patients provided written consent. Patients with an involved BSA of greater than 10% are referred to this center. Diagnosis and classification of EN was based on established consensus criteria with supportive histological evidence.9 Patients were treated with supportive care and immunomodulatory therapy (if there were no contraindications),10,11 which consisted of intravenous immunoglobulin and cyclosporine. Patients with an involved BSA of greater than 10% were generally treated in the burns unit under the dermatology service. Intensive care facilities are available onsite within the burns unit.
Patient demographic characteristics, comorbidities, and parameters pertinent to the computation of SCORTEN and ABCD-10 were abstracted on admission. In-hospital mortality was recorded for all patients. Data were presented as medians for continuous data and percentages for categorical data. There were 4 patients (2.0%) whose serum bicarbonate levels could not be retrieved retrospectively. Following the methods of the original SCORTEN article,3 the 4 missing values were assumed to be within the normal range.
Discrimination refers to the ability to differentiate between patients with a higher risk of an event and those with a lower risk. For example, a clinical score with good discrimination would give more points to a patient at higher risk of mortality and lower points to a patient with a lower risk of mortality. Calibration refers to the accuracy of the risk estimates and the agreement between the predicted and observed number of events. For example, in a poorly calibrated score, patients with higher points may be predicted to have a higher mortality rate than the actual mortality rate. For this reason, calibration is important in prognostic models like SCORTEN and ABCD-10, as overestimation or underestimation of the chance of mortality would make the risk score clinically unacceptable.
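The distinction between the two properties can be illustrated with a minimal sketch. It is not the authors' analysis: the outcomes and risks below are hypothetical, and the function names `auc` and `expected_to_observed` are illustrative. The second set of predictions halves every risk, which preserves the ranking of patients (so discrimination is unchanged) but halves the expected number of deaths (so calibration worsens).

```python
def auc(predicted, died):
    """Concordance probability: the chance that a randomly chosen
    non-survivor has a higher predicted risk than a randomly chosen
    survivor (ties count as half)."""
    pos = [p for p, d in zip(predicted, died) if d == 1]
    neg = [p for p, d in zip(predicted, died) if d == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def expected_to_observed(predicted, died):
    """E:O ratio: sum of predicted risks divided by observed deaths.
    Values above 1 suggest overprediction, below 1 underprediction."""
    return sum(predicted) / sum(died)


# Hypothetical cohort of 8 patients (1 = died in hospital).
died = [0, 0, 0, 1, 0, 1, 1, 1]
risk_a = [0.05, 0.10, 0.20, 0.30, 0.40, 0.60, 0.75, 0.80]
# Same ranking of patients, but every risk is halved.
risk_b = [r / 2 for r in risk_a]

print("AUC a:", auc(risk_a, died), "E:O a:", expected_to_observed(risk_a, died))
print("AUC b:", auc(risk_b, died), "E:O b:", expected_to_observed(risk_b, died))
```

Both sets of predictions give the same AUC, whereas the E:O ratio of the second set is half that of the first. This is why two scores can have similar discrimination and still differ in calibration.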
The area under the receiver operating characteristic curve (AUC) was calculated to assess the discrimination of the scores. The calibration curve was used to evaluate the agreement between the number of observed deaths and the number of expected deaths predicted by SCORTEN or ABCD-10. The mortality equation of each score was used in the analysis of the AUC and calibration curve. An AUC between 0.7 and 0.9 indicates fair to good discrimination, while a calibration curve close to the ideal y = x line indicates good calibration. Locally weighted scatterplot smoothing (LOWESS) was performed, which permitted a graphical assessment of calibration across the range of predicted values. In addition, a secondary analysis of the performance of SCORTEN and ABCD-10 was performed, stratified by (1) period (2003-2010 vs 2011-2019) and (2) treatment (supportive care alone vs immunomodulatory therapy).
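As an illustration of how a LOWESS calibration curve is obtained, the following is a simplified sketch rather than the procedure run in SPSS or Stata. It assumes tricube weights, a local linear fit, no robustness iterations, and a bandwidth fraction `frac` chosen arbitrarily; the predicted risks and outcomes are hypothetical.

```python
def lowess(x, y, frac=0.67):
    """Simplified LOWESS: at each x0, fit a weighted straight line to the
    nearest frac * n points using tricube weights, and return the fitted
    value at x0. No robustness iterations are performed."""
    n = len(x)
    k = min(n, max(2, round(frac * n)))  # points in each local window
    fitted = []
    for x0 in x:
        dists = sorted(abs(xi - x0) for xi in x)
        h = dists[k - 1] or 1e-12  # bandwidth: distance to k-th neighbor
        w = [(1 - min(1.0, abs(xi - x0) / h) ** 3) ** 3 for xi in x]
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted


# Hypothetical predicted mortality risks and outcomes (1 = died).
predicted = [0.05, 0.05, 0.10, 0.10, 0.20, 0.20,
             0.40, 0.40, 0.60, 0.60, 0.80, 0.80]
died = [0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0]

smoothed = lowess(predicted, died)
for p, s in zip(predicted, smoothed):
    if s > p:
        label = "underpredicted (curve above diagonal)"
    elif s < p:
        label = "overpredicted (curve below diagonal)"
    else:
        label = "on the diagonal"
    print(f"predicted {p:.2f}  smoothed observed {s:.2f}  {label}")
```

Plotting the smoothed observed values against the predicted risks, together with the y = x line, gives the kind of calibration plot described above.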
Statistical analyses were performed using SPSS, version 22 (IBM) and Stata, version 14 (StataCorp). Statistical significance was defined as P ≤ .05.
The study included 196 patients, comprising 66 patients with SJS (33.7%), 61 with SJS/TEN overlap (31.1%), and 69 with TEN (35.2%). The median age was 56 years (interquartile range, 42-70 years). The baseline characteristics of patients are summarized in Table 1. The overall in-hospital mortality rate was 45 of 196 (23.0%).
A bivariate analysis of the prognostic factors reported in ABCD-10 indicated that age older than 50 years (odds ratio [OR], 4.9; 95% CI, 2.1-11.8; P = .0003), BSA greater than 10% (OR, 10.0; 95% CI, 3.0-33.8; P = .0002), cancer (OR, 2.3; 95% CI, 1.0-4.9; P = .04), dialysis before admission (OR, 3.7; 95% CI, 1.0-13.2; P = .04), and serum bicarbonate levels less than 20 mEq/L (to convert to millimoles per liter, multiply by 1; OR, 2.2; 95% CI, 1.1-4.5; P = .03) were significantly associated with mortality (Table 2). Prior dialysis, the most heavily weighted factor in ABCD-10, was a weaker predictor in this cohort, whereas a BSA greater than 10% was the strongest predictor of mortality.
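For readers unfamiliar with how such odds ratios arise, the sketch below computes an OR and a Wald 95% CI from a 2 × 2 table. The counts are hypothetical (the per-factor counts are not given in this text), and the Wald interval is an assumption; the article does not state which interval method was used.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a Wald confidence interval from a 2 x 2 table.
    a: factor present, died      b: factor present, survived
    c: factor absent, died       d: factor absent, survived"""
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, for illustration only.
or_, lo, hi = odds_ratio_ci(a=10, b=20, c=5, d=40)
print(f"OR {or_:.1f} (95% CI, {lo:.1f}-{hi:.1f})")
```

Because the standard error depends on the reciprocal of each cell count, a factor present in only a few patients yields a wide interval, which is consistent with the wide intervals reported for the less common factors.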
ABCD-10 and SCORTEN both had good discrimination, with no significant difference between them (AUC: ABCD-10, 0.78; 95% CI, 0.72-0.85; vs SCORTEN, 0.77; 95% CI, 0.69-0.84; P = .53) (Figure, A and B). However, ABCD-10 was more poorly calibrated (Figure, B). SCORTEN was generally well calibrated, although mortality may be overestimated at higher score ranges. By contrast, ABCD-10 underestimated mortality at lower score ranges and overestimated it at higher score ranges.
Across periods (2003-2010 and 2011-2019), ABCD-10 and SCORTEN had good discrimination that did not differ significantly. For patients recruited from 2003 to 2010, the AUC for ABCD-10 and SCORTEN was 0.81 (95% CI, 0.73-0.89) and 0.78 (95% CI, 0.68-0.87), respectively (P = .40). For patients recruited from 2011 to 2019, the AUC for ABCD-10 and SCORTEN was 0.75 (95% CI, 0.66-0.87) and 0.77 (95% CI, 0.65-0.89), respectively (P = .88).
SCORTEN was well calibrated when patients recruited from 2003 to 2010 were analyzed (Figure, C). However, from 2011 to 2019, SCORTEN tended to overestimate mortality across all score ranges, with increasing overestimation at higher scores (Figure, D).
In patients recruited from 2003 to 2010, ABCD-10 underestimated mortality at lower score ranges and overestimated mortality at higher score ranges (Figure, E). In patients recruited from 2011 to 2019, while the underestimation at lower scores was attenuated, the overestimation at higher scores increased (Figure, F). The quadratic shape of the calibration curve remained, although it was shifted downward, suggesting possible systematic overestimation across all scores compared with 2003 to 2010.
In both treatment subgroups, ABCD-10 and SCORTEN had good discrimination that did not differ significantly. For patients treated with supportive care alone, the AUC for ABCD-10 and SCORTEN was 0.77 (95% CI, 0.66-0.89) and 0.75 (95% CI, 0.59-0.89), respectively (P = .57). For patients treated with immunomodulatory therapy, the AUC for ABCD-10 and SCORTEN was 0.78 (95% CI, 0.71-0.87) and 0.77 (95% CI, 0.68-0.85), respectively (P = .59).
In patients treated with supportive care, SCORTEN is generally well calibrated, although it tends to overestimate mortality at higher score ranges (Figure, G). In patients treated with immunomodulatory therapy, SCORTEN tends to overestimate mortality across all score ranges (Figure, H).
In patients treated with supportive care, ABCD-10 underestimated mortality at lower score ranges and overestimated mortality at higher score ranges (Figure, I). The same trend was observed in patients treated with immunomodulatory therapy (Figure, J).
Predictive scoring systems such as SCORTEN and ABCD-10 are vital in evaluating and treating patients with EN. They are useful in prognosticating mortality and serve as an internal control in therapeutic trials, as well as a quality of care benchmark in various centers worldwide. The performance of such scores depends on 2 factors: (1) calibration and (2) discriminatory ability. In this study, we have demonstrated that while SCORTEN and ABCD-10 both have good discriminatory ability, the calibration of SCORTEN appears superior.
In the developmental cohort of ABCD-10, good calibration was based on the Hosmer-Lemeshow test. However, the limitations of this test include inadequate power and the inability to demonstrate the magnitude or direction of miscalibration.12 Calibration is best assessed graphically.13 In the study cohort, the calibration was deemed suboptimal because of the underestimates at lower score ranges and overestimates at higher score ranges (Figure, B).
The suboptimal calibration of ABCD-10 may be explained in part by the fact that prior dialysis, although significant, was not the strongest predictor in the cohort despite being the most heavily weighted factor in the score, and by multicollinearity in the model between dialysis and serum bicarbonate levels. Kidney impairment is associated with SJS/TEN and is a poor prognostic factor, both as a comorbidity (chronic kidney disease2,14,15 and prior dialysis8) and as a complication of the disease (acute kidney injury and emergent dialysis).16,17 Its association is indirectly reflected in SCORTEN (urea and serum bicarbonate levels) as well as ABCD-10 (prior dialysis and serum bicarbonate levels). Although multicollinearity between chronic kidney disease, dialysis, serum creatinine levels, and serum blood urea nitrogen levels was evaluated in the developmental cohort of ABCD-10, the collinear association between prior dialysis and serum bicarbonate levels was not explicitly investigated.8 A potential for “double-hit” or “double-miss” exists, and this may explain the overestimation of mortality at high scores and the underestimation at low scores.
Our findings for SCORTEN are consistent with a recent meta-analysis of its prognostic accuracy.18 The meta-analysis demonstrated that while SCORTEN is a reliable predictor of mortality, the confidence interval of the pooled estimates suggests a degree of overestimation. Torres-Navarro et al18 also compared the standardized mortality ratio (SMR) of SCORTEN between Asia, North America, and Europe and did not demonstrate any statistical difference in the SMR between these 3 main regions. The overall results for Asia and North America were found to be very close.
SCORTEN was originally constructed and validated among patients recruited from 1979 to 1998 who were treated with supportive care alone at the French referral center.3 One of the concerns has been the performance of SCORTEN over time, with improvements in dermatological and intensive care unit care, as well as the potential role of immunomodulatory agents. Interestingly, our results show that SCORTEN showed good calibration from 2003 to 2010, but from 2011 to 2019, SCORTEN overestimated mortality across all score ranges, particularly in the higher scores (Figure, D). Coincidentally, these periods corresponded to a general shift in our departmental protocol from intravenous immunoglobulins (until 2010) to cyclosporine (from 2011 onwards). To further clarify the role of immunomodulation, a subanalysis was performed comparing those treated supportively and those treated with immunomodulation. In the supportive group, SCORTEN overestimated mortality at the higher scores, whereas in the immunomodulation group, SCORTEN tended to overestimate mortality across all score ranges. It remains to be seen if these findings provide indirect evidence that immunomodulation and improvements in supportive care over time improve mortality (hence the overestimation of SCORTEN). We attempted to clarify further with a subgroup analysis looking at periods, individual immunomodulatory agents, and supportive care; however, the small sample size precluded meaningful analysis.
Prognostic scores, such as SCORTEN and ABCD-10, are useful in therapeutic studies primarily to define and control for the severity of disease in the patients being studied. The rarity of SJS/TEN poses a challenge to conducting randomized clinical trials. A widely used alternative is to use SCORTEN as an internal control, comparing the expected mortality with the mortality observed with the intervention.11,19 Our study findings raise concerns with such an approach. Mortality is overestimated at higher score ranges for SCORTEN, even among patients treated with supportive care alone, resulting in a lower SMR at higher score ranges. Caution must therefore be taken in interpreting the SMR of each therapy, and the distribution of SCORTEN within the study cohort should be considered as well. All other things being equal, in 2 cohorts treated with the same immunomodulatory agent, the cohort with a higher median SCORTEN is likely to have a lower SMR than the cohort with a lower median SCORTEN because of systematic error. With SCORTEN overestimating at higher scores, survival benefits may be erroneously attributed to any proposed intervention.
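The case-mix effect described above can be made concrete with a small sketch. All numbers are hypothetical and are not SCORTEN's published risks: `predicted` stands for the risk a score assigns at each score value, and `actual` for the true mortality under identical care, with the score assumed to overestimate only at the higher values.

```python
# Hypothetical risk per score value: what the score predicts vs what
# actually happens under identical care (no treatment effect at all).
predicted = {2: 0.10, 3: 0.30, 4: 0.60, 5: 0.90}
actual = {2: 0.10, 3: 0.30, 4: 0.45, 5: 0.60}


def smr(case_mix):
    """Standardized mortality ratio = observed deaths / expected deaths.
    case_mix maps a score value to the number of patients with it."""
    observed = sum(n * actual[s] for s, n in case_mix.items())
    expected = sum(n * predicted[s] for s, n in case_mix.items())
    return observed / expected


# Two hypothetical cohorts of 100 patients receiving the same care.
low_severity = {2: 60, 3: 30, 4: 10, 5: 0}
high_severity = {2: 10, 3: 30, 4: 40, 5: 20}

print(f"SMR, lower-severity cohort:  {smr(low_severity):.2f}")
print(f"SMR, higher-severity cohort: {smr(high_severity):.2f}")
```

Under these assumptions the higher-severity cohort has the lower SMR even though both cohorts received identical care, which is the systematic error that could be mistaken for a treatment benefit.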
The same holds true for using SCORTEN as a quality of care benchmark. Referral centers typically handle more patients with more severe SJS/TEN who may have higher SCORTEN. Because of the case mix, referral centers may have an SMR more optimistic than the truth because of the instrument inaccuracy of SCORTEN.
The retrospective nature of the study, with its inherent flaws, as well as the potential for referral bias in a national referral center, are limitations. While it would be helpful to separate the associations of period and treatment, the small number of patients in the divided subgroups (eg, patients recruited from 2003-2010 who were treated with supportive care alone) precluded further statistical analysis, and these analyses were not included in this study. Overfitting is a potential limitation, as the sample size of this study is smaller than the development cohort of ABCD-10. Unfortunately, SJS/TEN is a rare disease, which poses difficulties in recruiting large numbers of patients. This study was also performed in an Asian population, different from the European and North American cohorts used to develop SCORTEN and ABCD-10, respectively. The difference in demographic characteristics may limit the generalizability of the study.
Nonetheless, there were strengths to this study. To our knowledge, this is the first study evaluating the external validity of ABCD-10 and comparing it with SCORTEN. This was a single-center study over a substantial period, which allowed us to evaluate how the performance of each score held up over time.
Increasing attention has been placed on the validity and accuracy of SCORTEN in the contemporary population. This increasing attention is valid because SCORTEN displays time-associated deterioration in calibration leading to overestimation of mortality risk with the implications as described previously. While the development of ABCD-10 is an encouraging step toward developing a better prognostic score, the calibration is suboptimal, which limits clinical use. Like the periodic updating of the APACHE score (Acute Physiology and Chronic Health Evaluation), future studies may consider revising the existing SCORTEN given its current good discrimination.
Accepted for Publication: July 28, 2020.
Corresponding Author: Haur Yueh Lee, MBBS, MRCP (UK), MMed (Int Med), Department of Dermatology, Singapore General Hospital, Outram Rd, Singapore 169608 (firstname.lastname@example.org).
Published Online: October 21, 2020. doi:10.1001/jamadermatol.2020.3654
Author Contributions: Dr Hui Kai Koh and Ms Stephanie Fook-Chong had full access to all the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis.
Concept and design: Koh, Lee.
Acquisition, analysis, or interpretation of data: All authors.
Drafting of the manuscript: Koh.
Critical revision of the manuscript for important intellectual content: All authors.
Statistical analysis: Koh, Fook-Chong.
Conflict of Interest Disclosures: None reported.