eFI indicates Electronic Frailty Index; HFRS, Hospital Frailty Risk Score; mFI-5, 5-Factor Modified Frailty Index; RAI, Risk Analysis Index.
eTable 1: Description of Abdominal Surgical Groupings
eTable 2: Calculation of Hospital Frailty Risk Score (HFRS)
eTable 3: Calculation of Electronic Frailty Index (eFI)
eTable 4: Calculation of Five-Factor Modified Frailty Index (mFI-5)
eTable 5: Calculation of Risk Analysis Index (RAI)
eTable 6: Predictive Discrimination of Electronic Frailty Metrics Within Select Elective Surgical Procedure Types, for 30-Day Morbidity, Mortality, and Readmission
eTable 7: Stratified Predictive Discrimination of Multivariate Models for 30-Day Mortality
eTable 8: Stratified Predictive Discrimination of Multivariate Models for 30-Day Readmission
Le ST, Liu VX, Kipnis P, Zhang J, Peng PD, Cespedes Feliciano EM. Comparison of Electronic Frailty Metrics for Prediction of Adverse Outcomes of Abdominal Surgery. JAMA Surg. 2022;157(5):e220172. doi:10.1001/jamasurg.2022.0172
How do electronic frailty metrics compare and what is their predictive value for adverse surgical outcomes?
In this cohort study among 37 186 patients who underwent abdominal surgery, the Hospital Frailty Risk Score (HFRS), adapted Electronic Frailty Index, 5-Factor Modified Frailty Index, and Risk Analysis Index demonstrated low to moderate correlation. The HFRS had the highest predictive value for mortality, readmission, and major complications, and augmented a multivariate preoperative risk model.
The 4 electronic frailty metrics often disagree and exhibit heterogeneous discrimination for adverse surgical outcomes; the HFRS demonstrated the strongest overall predictive value, which suggests its potential utility for automated screening for high-risk or frail patients.
Importance
Electronic frailty metrics have been developed for automated frailty assessment and include the Hospital Frailty Risk Score (HFRS), the Electronic Frailty Index (eFI), the 5-Factor Modified Frailty Index (mFI-5), and the Risk Analysis Index (RAI). Despite substantial differences in their construction, these 4 electronic frailty metrics have not been rigorously compared within a surgical population.
Objective
To characterize the associations between 4 electronic frailty metrics and to measure their predictive value for adverse surgical outcomes.
Design, Setting, and Participants
This retrospective cohort study used electronic health record data from patients who underwent abdominal surgery from January 1, 2010, to December 31, 2020, at 20 medical centers within Kaiser Permanente Northern California (KPNC). Participants included adults older than 50 years whose abdominal surgical procedures at KPNC during this period were sampled for reporting to the National Surgical Quality Improvement Program.
Main Outcomes and Measures
Pearson correlation coefficients between pairs of electronic frailty metrics, and the area under the receiver operating characteristic curve (AUROC) of univariate models and multivariate preoperative risk models for 30-day mortality, readmission, and morbidity (defined as a composite of mortality and major postoperative complications).
Results
Within the cohort of 37 186 patients (mean [SD] age, 67.9 [10.6] years; 19 127 [51.4%] female), correlations between pairs of metrics ranged from 0.19 (95% CI, 0.18-0.20) for mFI-5 and RAI to 0.69 (95% CI, 0.68-0.70) for HFRS and eFI. Only 1085 of 37 186 patients (2.9%) were classified as frail based on all 4 metrics. In univariate models for morbidity, HFRS demonstrated higher predictive discrimination (AUROC, 0.71; 95% CI, 0.70-0.72) than eFI (AUROC, 0.64; 95% CI, 0.63-0.65), mFI-5 (AUROC, 0.58; 95% CI, 0.57-0.59), and RAI (AUROC, 0.57; 95% CI, 0.57-0.58). Adding HFRS to multivariate models with age, sex, comorbidity burden, and procedure characteristics improved their predictive discrimination for all 3 adverse surgical outcomes.
Conclusions and Relevance
In this cohort study, the 4 electronic frailty metrics demonstrated heterogeneous correlation and classified distinct groups of surgical patients as frail. However, HFRS demonstrated the highest predictive value for adverse surgical outcomes.
Frailty is a state of decreased physiologic reserve and multiple system impairments that increases vulnerability to stressors, such as surgery. National institutions and specialty societies have strongly recommended preoperative assessment for frailty as a critical component of surgical care,1-5 as frailty is a powerful determinant of complications, nonhome discharge, and new disability.6-10 However, implementation of frailty assessments has been limited, in part owing to time-consuming frailty assessment methods that require direct clinician evaluation.11-13 Given the association between frailty and morbidity following low-risk procedures,14,15 the aging surgical population,16 and the prevalence of frailty among nonelderly adults,17 the number of patients for whom frailty assessment may be relevant is potentially large. Practical methods for assessing frailty are needed.
The most established frailty instruments require direct manual assessment by a clinician,18-20 which has raised concerns about the feasibility of using these tools across populations.21-23 This has led to the development of several electronic frailty metrics, which leverage readily available data within the electronic health record (EHR) to measure frailty. Therefore, these metrics have strong potential to streamline preoperative assessments. However, these metrics differ substantially in the data they incorporate, their construction, and their conceptual approach to frailty. Prior studies have demonstrated univariate associations between individual electronic frailty metrics and adverse outcomes within various surgical populations,24-33 but to our knowledge, no studies have systematically compared and characterized the predictive value of multiple metrics alongside other preoperative risk factors. This information is critical for clinicians and health systems seeking to improve surgical care and decision-making by implementing one of these tools into clinical workflows.
In this external validation of electronic frailty metrics, we used EHR data from surgical procedures that were performed at a large, integrated health care system to compare 4 existing electronic frailty metrics: the Hospital Frailty Risk Score (HFRS), the adapted Electronic Frailty Index (eFI), the 5-Factor Modified Frailty Index (mFI-5), and the Risk Analysis Index (RAI).34-37 We address the following questions: First, do these metrics agree in their measurement and classification of frailty? Second, what is the predictive value of each metric for adverse surgical outcomes, including mortality, readmission, and major postoperative complications? Third, do these metrics improve prediction of adverse surgical outcomes independent of other readily measurable preoperative risk factors?
This study was approved by the Kaiser Permanente Institutional Review Board. A waiver of informed consent was provided because the research complied with Exempt Research Category 4 requirements. We followed Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) reporting guidelines for prediction model validation.38
Within Kaiser Permanente Northern California (KPNC), all hospitals and clinics use the same information systems and share a single EHR system. Our study population consisted of 37 186 adult patients older than 50 years who required inpatient admission for an abdominal surgical procedure at KPNC facilities between January 1, 2010, and December 31, 2020. We excluded cases in which the patient had undergone abdominal surgery within the preceding 30 days. Within KPNC, procedure types are labeled using an internal coding system. We identified abdominal surgical cases based on procedure codes that were reviewed by 2 surgeons (S.T.L. and P.D.P.). In the absence of an accepted definition of abdominal surgery and given the variety of procedures included in prior investigations,39-42 we selected a broad range of procedures performed within the abdomen, excluding transplant, trauma, obstetric, and endoscopic procedures. To ensure standardized ascertainment of postoperative complications, we limited our study sample to cases that were manually reviewed for reporting to the National Surgical Quality Improvement Program (NSQIP). Under the NSQIP sampling strategy, a dedicated surgical clinical nurse reviews each sampled case and abstracts data on specific risk factors and 30-day postoperative complications.43
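The 30-day exclusion rule can be sketched as a single pass over each patient's operations in date order. This is a minimal illustration rather than the study's code: the `Case` record and the `exclude_recent_prior_surgery` helper are hypothetical, and counting a prior operation exactly 30 days earlier as inside the window is an assumption, since the text does not say whether the boundary is inclusive.

```python
from datetime import date
from typing import Iterable, List, NamedTuple


class Case(NamedTuple):
    """Hypothetical minimal record for one abdominal surgical case."""
    patient_id: str
    surgery_date: date


def exclude_recent_prior_surgery(cases: Iterable[Case], window_days: int = 30) -> List[Case]:
    """Keep a case only if the same patient had no abdominal surgery in the
    preceding `window_days` days (boundary treated as inclusive, an assumption)."""
    kept: List[Case] = []
    last_surgery = {}  # patient_id -> date of that patient's most recent earlier surgery
    for case in sorted(cases, key=lambda c: (c.patient_id, c.surgery_date)):
        prior = last_surgery.get(case.patient_id)
        if prior is None or (case.surgery_date - prior).days > window_days:
            kept.append(case)
        # Every operation, kept or excluded, counts as a prior surgery for later cases.
        last_surgery[case.patient_id] = case.surgery_date
    return kept


cases = [
    Case("A", date(2015, 1, 1)),
    Case("A", date(2015, 1, 20)),  # 19 days after the previous operation: excluded
    Case("A", date(2015, 3, 1)),   # 40 days after the previous operation: kept
    Case("B", date(2016, 5, 5)),
]
kept = exclude_recent_prior_surgery(cases)
```

Because an excluded operation still updates `last_surgery`, a third operation shortly after an excluded second one is also excluded; if only retained cases were meant to count, the update would move inside the `if` branch.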
We captured a set of characteristics within the EHR that are known to be associated with adverse surgical outcomes: age, sex, comorbidity burden, and procedure information. We quantified comorbid disease burden with the Charlson Comorbidity Index,44,45 based on patients’ medical diagnoses within the 12 months preceding the date of surgery.46 We considered a case to be urgent if the surgeon indicated on the form submitted to schedule the case that it was an add-on case or needed to be performed within 48 hours. We classified the operative approach as minimally invasive if the primary procedure code indicated that the procedure was performed laparoscopically, robotically, or endoscopically, and as open otherwise. Additionally, to model the risks of adverse surgical outcomes specific to different surgical procedures, we grouped cases by procedure code using a hierarchical scheme, similar to how NSQIP previously grouped surgical procedures to model procedure-specific risk.47,48 To ensure adequate sample size, similar groups were further aggregated into 24 groups based on the number of cases during the study period (eTable 1 in the Supplement).
The development of HFRS, eFI, mFI-5, and RAI has been previously reported in the literature.34-37,49-51 The HFRS was derived from a cluster analysis of hospitalized older adults and uses diagnosis codes to predict frailty.35 The eFI is conceptually based on the cumulative deficits model of frailty and incorporates diagnoses, laboratory values, vital signs, medications, social history, and functional status.52 The mFI-5 is also based on the cumulative deficits model of frailty and incorporates factors captured in NSQIP data.53 The RAI is based on a model to predict 30-day mortality among older adult surgical patients using NSQIP data.34,51 We calculated the 4 electronic frailty metrics from EHR data available before surgery; details on our adaptation and calculation of these metrics are provided in eTables 2 through 5 in the Supplement. The eFI was deemed not calculable for 5237 patients (14.1%), based on previously published requirements to exclude patients with insufficient data to calculate eFI.
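Under the cumulative deficits model used by eFI and mFI-5, a frailty index is the share of an instrument's deficits that a patient has. The sketch below assumes a fixed denominator, so deficits that are not recorded count as absent. The item names are placeholders, not the actual eFI or mFI-5 item lists (see eTables 3 and 4 in the Supplement). With 5 items, a single deficit gives 0.2. This suggests, though the text does not state it, that the mFI-5 frailty threshold of 0.2 used in this study corresponds to at least 1 of 5 deficits.

```python
from typing import Iterable, Set


def cumulative_deficit_index(present: Set[str], instrument_items: Iterable[str]) -> float:
    """Share of an instrument's deficits that are present.

    present: deficit names recorded for the patient.
    instrument_items: every deficit the (hypothetical) instrument counts.
    Deficits not in `present` are treated as absent (fixed denominator).
    """
    items = set(instrument_items)
    if not items:
        raise ValueError("instrument must define at least one deficit")
    unknown = set(present) - items
    if unknown:
        raise ValueError(f"not items of this instrument: {sorted(unknown)}")
    return len(set(present) & items) / len(items)


# Placeholder 5-item instrument (NOT the real mFI-5 item list).
FIVE_ITEMS = ["deficit_1", "deficit_2", "deficit_3", "deficit_4", "deficit_5"]
```

A variable-denominator variant would instead divide by the number of deficits that could actually be assessed for the patient. Which convention the adapted eFI uses is specified in the cited derivation studies, not here.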
Our primary outcome was 30-day postoperative morbidity, defined as a composite of 30-day mortality and 30-day major postoperative complications. Mortality was identified using KPNC health system records. Major postoperative complications included cardiac arrest, myocardial infarction, pulmonary embolism, sepsis, septic shock, deep or organ space surgical site infection, unplanned intubation, deep vein thrombosis, progressive kidney insufficiency or kidney failure, and cerebrovascular accident. The complications and the composite outcome were selected based on prior studies of major NSQIP complications.48,54 Secondary outcomes included 30-day mortality from the day of surgery and 30-day readmission from the day of discharge.
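The outcome definitions above reduce to date-window flags: mortality and major complications are counted from the day of surgery, readmission from the day of discharge, and morbidity is the union of mortality and major complications. This is a minimal sketch under stated assumptions. The function names are hypothetical, each event is assumed to carry a single date, and day 30 is treated as inside the window. NSQIP abstraction rules for ascertaining complications are not modeled.

```python
from datetime import date
from typing import Dict, Iterable, Optional


def within_window(event: Optional[date], start: date, window_days: int = 30) -> bool:
    """True if `event` falls 0..window_days days after `start` (inclusive, an assumption)."""
    return event is not None and 0 <= (event - start).days <= window_days


def thirty_day_outcomes(
    surgery: date,
    discharge: date,
    death: Optional[date] = None,
    major_complications: Iterable[date] = (),
    readmissions: Iterable[date] = (),
) -> Dict[str, bool]:
    """Hypothetical outcome flags following the definitions in the Methods."""
    mortality = within_window(death, surgery)  # counted from the day of surgery
    complication = any(within_window(d, surgery) for d in major_complications)
    readmission = any(within_window(d, discharge) for d in readmissions)  # from discharge
    return {
        "mortality": mortality,
        "major_complication": complication,
        "morbidity": mortality or complication,  # composite primary outcome
        "readmission": readmission,
    }
```

For example, a procedure on January 1 with discharge on January 5, a complication on January 10, and a readmission on February 3 (29 days after discharge) is flagged for morbidity and readmission but not mortality.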
We assessed the correlation of the electronic frailty metrics with Pearson correlation coefficients, considering values of 0 to 0.19 as very weak, 0.20 to 0.39 as weak, 0.40 to 0.59 as moderate, 0.60 to 0.79 as strong, and 0.80 to 1.00 as very strong.55 P values for the Pearson correlation between each pair of metrics were computed from 2-tailed, single-hypothesis tests of the null hypothesis that the correlation is 0. We quantified the number of patients who might be considered frail based on each electronic frailty metric. For HFRS, RAI, and mFI-5, patients were classified as frail based on published thresholds: 15 or higher for HFRS, 35 or higher for RAI, and 0.2 or higher for mFI-5. Because greater availability of data within a health system can increase eFI, patients were classified as frail based on eFI if eFI was calculable and in the top quartile. We then compared the proportions of patients classified as frail by the different metrics.
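The correlation analysis can be sketched with the standard library. The text does not state how the 95% CIs for the correlations were computed. The Fisher z transformation used below is a common choice and an assumption here. With r = 0.19 and n = 37 186, it gives roughly 0.180 to 0.200, consistent with the reported 0.18-0.20 for mFI-5 and RAI. The strength bands follow the scale above and treat each band's lower bound as inclusive, so a value such as 0.195 is labeled very weak.

```python
import math
from typing import Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample Pearson correlation coefficient."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_z_ci(r: float, n: int, z_crit: float = 1.959963984540054) -> Tuple[float, float]:
    """Approximate 95% CI for r via the Fisher z transformation (assumed method)."""
    se = 1.0 / math.sqrt(n - 3)
    z = math.atanh(r)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


def strength_label(r: float) -> str:
    """Map |r| onto the bands used in the Methods (lower bounds inclusive)."""
    a = abs(r)
    for lower, label in ((0.80, "very strong"), (0.60, "strong"),
                         (0.40, "moderate"), (0.20, "weak")):
        if a >= lower:
            return label
    return "very weak"
```

Under this scale, the reported pairwise values of 0.19, 0.33, 0.44, and 0.69 are labeled very weak, weak, moderate, and strong, respectively.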
To characterize the predictive value of these metrics for adverse surgical events while avoiding the potential for overfitting, we prespecified models and compared their predictive discrimination. We first assessed the predictive discrimination of each individual metric by fitting, for each outcome, a logistic regression model with that frailty metric alone as a continuous predictor. In the models with eFI, patients whose eFI was not calculable owing to insufficient data were flagged, and this flag was entered into the model as an additional binary predictor. Model discrimination was evaluated using the area under the receiver operating characteristic curve (AUROC).56 As a sensitivity analysis, we evaluated the predictive discrimination of the electronic frailty metrics for adverse surgical outcomes within a variety of elective surgical procedure types. We then assessed the AUROC of multivariate logistic models to measure the incremental predictive value of electronic frailty metrics alongside other risk factors for postoperative adverse outcomes. Of note, chronologic age and comorbidity are conceptualized as constructs that are associated with, but distinct from, frailty;18,57 we therefore considered the contribution of frailty, as quantified by the metrics, to surgical outcomes to be independent of age and comorbidity. We evaluated the discrimination of base multivariate preoperative risk models that included age, sex, Charlson Comorbidity Index, procedure type, and urgency. Age was entered into the models as a restricted cubic spline with 3 knots. We examined the predictive discrimination of multivariate models that included those base factors plus each individual frailty metric, and we report the P value comparing the AUROC of each such model with that of the base model.
Additionally, we stratified the predictive discrimination of these models in subgroups based on age, elective vs urgent, and open vs minimally invasive surgical approach.
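Entering age as a restricted cubic spline with 3 knots adds 2 columns to the design matrix: age itself and one nonlinear term. That term is cubic between the knots and linear beyond the outer knots. The sketch below uses one common Harrell-style parameterization, scaled by (t3 - t1)^2. The text does not state the knot locations. Placing them at the 10th, 50th, and 90th percentiles is a frequently recommended default for 3 knots and an assumption here, as is the use of `statistics.quantiles` with its default method to find them.

```python
import statistics
from typing import List, Sequence, Tuple


def rcs3_knots(values: Sequence[float]) -> Tuple[float, float, float]:
    """Knots at the 10th, 50th, and 90th percentiles (an assumed placement)."""
    deciles = statistics.quantiles(values, n=10)  # 9 cut points
    return deciles[0], deciles[4], deciles[8]


def rcs3_basis(x: float, knots: Tuple[float, float, float]) -> List[float]:
    """Restricted cubic spline basis with 3 knots: [linear term, nonlinear term].

    The nonlinear term is zero below t1, cubic between the knots, and linear
    above t3; dividing by (t3 - t1)**2 keeps it roughly on the scale of x.
    """
    t1, t2, t3 = knots
    if not t1 < t2 < t3:
        raise ValueError("knots must be strictly increasing")

    def pos_cube(u: float) -> float:
        return max(u, 0.0) ** 3

    nonlinear = (
        pos_cube(x - t1)
        - pos_cube(x - t2) * (t3 - t1) / (t3 - t2)
        + pos_cube(x - t3) * (t2 - t1) / (t3 - t2)
    ) / (t3 - t1) ** 2
    return [x, nonlinear]
```

The two coefficients fitted on these columns describe a smooth age effect that can bend in the middle of the age range but cannot swing wildly at the extremes. This is the usual reason for preferring a restricted spline over an unrestricted cubic.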
All statistical tests were 2-tailed, and statistical significance was accepted at the P < .05 level. Statistical analyses were performed using SAS version 9.4 (SAS Institute), R version 4.0.2 (The R Foundation), and Stata version 16 (StataCorp).
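The AUROC used throughout equals the probability that a randomly chosen patient with the outcome has a higher predicted risk than a randomly chosen patient without it, with ties counted as one-half. It can be computed from ranks via the Mann-Whitney identity. With a single continuous predictor and a positive coefficient, a univariate logistic model's predicted probability is a monotone function of the metric. That model's AUROC therefore equals the AUROC of the raw metric. For the eFI models with the not-calculable flag and for multivariate models, the fitted probabilities would be passed in instead. The sketch below evaluates fixed predictions within each subgroup. The text does not say whether models were refit within subgroups, and the sketch computes neither CIs nor the P value comparing AUROCs. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC via the Mann-Whitney rank-sum identity (ties get average ranks)."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(1 for y in labels if y)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one event and one non-event")
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def stratified_auroc(scores: Sequence[float], labels: Sequence[int],
                     strata: Sequence[Hashable]) -> Dict[Hashable, float]:
    """AUROC of the same predictions evaluated separately within each subgroup."""
    groups: Dict[Hashable, Tuple[List[float], List[int]]] = defaultdict(lambda: ([], []))
    for s, y, g in zip(scores, labels, strata):
        groups[g][0].append(s)
        groups[g][1].append(y)
    return {g: auroc(s, y) for g, (s, y) in groups.items()}
```

The rank-based form runs in O(n log n), which matters at this cohort's size: comparing every event with every non-event would require well over 100 million pairs for morbidity alone.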
The analytic cohort consisted of 34 945 patients (male, 18 059 [48.6%]) who underwent 37 186 abdominal surgical procedures. Mean (SD) age was 67.9 (10.6) years (Table 1). Within this cohort, 21 283 (57.2%) were designated American Society of Anesthesiologists class III or greater, 18 427 (50.2%) of the surgical procedures were performed using an open approach, and 12 260 (32.9%) were urgent. The most common procedure types were colectomies without low-pelvic anastomosis (11 283 [30.3%]), cholecystectomy (4006 [10.8%]), and nephrectomy (2317 [6.2%]). The rates of 30-day morbidity, mortality, and readmission were 4573 (12.3%), 923 (2.5%), and 4125 (11.4%), respectively.
The RAI had very weak to weak correlation with the other 3 metrics, ranging from 0.19 (95% CI, 0.18-0.20) with mFI-5 to 0.33 (95% CI, 0.32-0.34) with eFI (Table 2). The mFI-5 and eFI were moderately correlated (0.44; 95% CI, 0.43-0.45; P < .001), while eFI and HFRS were strongly correlated, the highest correlation of any pair of metrics (0.69; 95% CI, 0.68-0.70; P < .001). The proportion of patients classified as frail by each individual metric was 7496 (20.2%) by mFI-5, 9360 (25.2%) by HFRS, 9664 (26.0%) by eFI, and 10 788 (29.0%) by RAI. The overlap between patients classified as frail by pairs of metrics ranged from 2711 (7.3%) classified as frail by both mFI-5 and RAI to 5899 (15.9%) classified as frail by both HFRS and eFI; 1085 (2.9%) patients were classified as frail by all 4 metrics.
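The frailty classification rules from the Methods and the overlap tallies reported above can be sketched as set operations. The thresholds are those stated in the Methods. The patient-record layout and function names are hypothetical. Locating the eFI top-quartile cutoff among calculable patients with `statistics.quantiles` (default method) is an assumption, because the study's quartile convention is not described. Patients whose eFI is not calculable are never classified as frail by eFI.

```python
import statistics
from itertools import combinations
from typing import Dict, FrozenSet, Optional, Sequence, Set, Tuple

# Published thresholds as listed in the Methods (frail if value >= threshold).
THRESHOLDS = {"HFRS": 15, "RAI": 35, "mFI-5": 0.2}
METRICS = ("HFRS", "eFI", "mFI-5", "RAI")


def efi_top_quartile_cutoff(efi_values: Sequence[Optional[float]]) -> float:
    """75th percentile of eFI among patients with a calculable eFI (None = not calculable)."""
    calculable = [v for v in efi_values if v is not None]
    return statistics.quantiles(calculable, n=4)[2]


def classify_frail(patients: Dict[str, Dict[str, Optional[float]]],
                   efi_cutoff: float) -> Dict[str, Set[str]]:
    """Return, for each metric, the set of patient IDs classified as frail."""
    frail: Dict[str, Set[str]] = {m: set() for m in METRICS}
    for pid, values in patients.items():
        for metric, threshold in THRESHOLDS.items():
            if values[metric] >= threshold:
                frail[metric].add(pid)
        efi = values["eFI"]
        if efi is not None and efi >= efi_cutoff:  # not-calculable eFI is never frail
            frail["eFI"].add(pid)
    return frail


def overlap_counts(frail: Dict[str, Set[str]]) -> Tuple[Dict[FrozenSet[str], int], int]:
    """Patients frail by each pair of metrics, and by all 4 metrics at once."""
    pairs = {frozenset(pair): len(frail[pair[0]] & frail[pair[1]])
             for pair in combinations(METRICS, 2)}
    all_metrics = len(set.intersection(*(frail[m] for m in METRICS)))
    return pairs, all_metrics
```

Dividing each count by the cohort size gives proportions like those reported above, for example 1085 of 37 186 (2.9%) frail by all 4 metrics.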
Of the electronic frailty metrics, HFRS had the highest individual predictive discrimination in univariate models for morbidity (AUROC, 0.71; 95% CI, 0.70-0.72), mortality (AUROC, 0.82; 95% CI, 0.81-0.84), and readmission (AUROC, 0.64; 95% CI, 0.63-0.65), followed by eFI for morbidity (AUROC, 0.64; 95% CI, 0.63-0.65), mortality (AUROC, 0.73; 95% CI, 0.71-0.75), and readmission (AUROC, 0.61; 95% CI, 0.60-0.62) (Table 3). The mFI-5 and RAI demonstrated lower predictive discrimination for all 3 outcomes; for example, the AUROC of RAI for morbidity was 0.57 (95% CI, 0.57-0.58). Among the preoperative surgical risk factors, procedure factors demonstrated the highest discrimination for morbidity (AUROC, 0.70; 95% CI, 0.69-0.70), mortality (AUROC, 0.77; 95% CI, 0.75-0.79), and readmission (AUROC, 0.61; 95% CI, 0.60-0.62). Within select elective surgical procedure types, HFRS demonstrated higher discrimination for adverse surgical outcomes than mFI-5 and RAI; for example, among pancreatectomies, the AUROC of HFRS for mortality was 0.77 (95% CI, 0.68-0.87) compared with 0.64 (95% CI, 0.49-0.78) for RAI and 0.56 (95% CI, 0.44-0.68) for mFI-5 (eTable 6 in the Supplement).
Among multivariate models, those including HFRS consistently demonstrated the highest predictive discrimination for all 3 outcomes, and adding HFRS significantly improved the discrimination of the base risk models with demographic, comorbidity, and procedure factors (Figure). For example, the AUROC of the base model was 0.72 (95% CI, 0.71-0.73), and including HFRS improved the AUROC to 0.76 (95% CI, 0.75-0.77; P < .001). In subgroup analyses, the incremental predictive value of HFRS for adverse outcomes was seen in both the 50 to 74 years and the 75 years and older age groups, in both elective and urgent surgical procedures, and in both open and minimally invasive surgical procedures. For example, including HFRS improved the AUROC of the base model for morbidity from 0.64 (95% CI, 0.63-0.65) to 0.70 (95% CI, 0.69-0.71) among elective surgical procedures and from 0.72 (95% CI, 0.71-0.73) to 0.76 (95% CI, 0.75-0.77) among urgent surgical procedures (Table 4). The incremental predictive value of HFRS was seen for mortality (eTable 7 in the Supplement) and readmission (eTable 8 in the Supplement) across all subgroups, whereas including eFI, mFI-5, or RAI as predictors added little predictive value.
The substantial effect of frailty on surgical outcomes and decision-making has accentuated the need for objective and standardized frailty assessments. The need to assess frailty at scale has led to the development of frailty metrics that are readily calculable from available EHR data. Within a large abdominal surgical population, the 4 electronic frailty metrics demonstrated low to moderate correlation with one another and classified different groups of patients as frail. However, of the 4 electronic frailty metrics, HFRS demonstrated the highest predictive discrimination across 3 adverse surgical outcomes, and HFRS augmented the performance of a multivariate preoperative risk model that included standard risk factors. The eFI demonstrated lower predictive discrimination and did not augment the discrimination of a standard preoperative risk model, while RAI and mFI-5 demonstrated relatively poor predictive discrimination for adverse surgical outcomes.
Prior studies have characterized the predictive validity of electronic frailty metrics by measuring univariate associations between the metrics and adverse surgical outcomes, but this approach fails to adequately consider risk factors for adverse surgical outcomes that are conceptually distinct from frailty. By comparing the predictive discrimination of various multivariate models, we sought to disentangle the degree to which HFRS, eFI, mFI-5, and RAI measure frailty vs simply measuring age or comorbidity burden.58 We found that RAI and mFI-5 did not augment the predictive value of models that included age and the Charlson Comorbidity Index. Given that the Charlson Comorbidity Index is readily calculable within many comprehensive EHR systems, the incremental utility of calculating RAI and mFI-5 is questionable.59-61
Ultimately, preoperative risk, comorbidity, and frailty assessments should be considered distinct objectives for surgical health systems.62,63 Preoperative risk assessment is concerned with predicting surgical outcomes, and several well-established tools have been developed to predict the risk of various adverse surgical outcomes,64,65 including automated tools embedded into the EHR.66,67 Comorbidity is defined as the presence of multiple medical conditions, and tools for the measurement of comorbidity44,68,69 can potentially identify patients with conditions that are amenable to specific medical treatments. Frailty assessments seek to identify patients with a syndrome that is amenable to frailty-specific interventions, such as exercise prehabilitation,70,71 nutritional optimization, or anxiety-relief strategies, or who may require additional support in surgical decision-making or postoperative recovery. Guidelines for preoperative assessment of older surgical patients from the American College of Surgeons and American Geriatrics Society1,3 strongly recommend the assessment of baseline frailty using the Fried frailty score,18,72 but this score requires manual examination of walking speed and grip strength, as well as additional patient input to determine the presence of exhaustion, low physical activity, and weight loss. Further research is needed to determine whether electronic frailty metrics can appropriately and efficiently identify patients for additional frailty-specific evaluations or interventions.
Health systems that are considering implementing electronic frailty metrics into their workflows must consider the challenges of adapting these metrics for calculation and real-time deployment within their information systems. The mFI-5 and RAI models are derived from NSQIP data, which are limited to a sample of all surgical cases performed at participating institutions. Adapting these measures for the EHR would require the identification of sources or replacements for the relevant NSQIP variables within the EHR. Compared with the other metrics, eFI incorporates the largest breadth of patient data, which may not be readily available within all information systems, and a proportion of patients may have insufficient data to calculate eFI. The fragmentation of different health care environments and information systems may influence the calculation of these metrics.73
Our study has several limitations. First, our study did not assess the association between these electronic frailty metrics and other frailty instruments based on in-person clinician assessment, such as the FRAIL Scale, the Clinical Frailty Scale, or the Edmonton Frail Scale. Further study is needed to consider the correlation and to compare the predictive validity of electronic frailty metrics and clinician-based frailty assessments. Second, our analysis was performed within an integrated health system with a single EHR system. Health systems with more fragmented processes and data systems may have more difficulty with standardized calculation of scores. Third, our calculations of the electronic frailty metrics necessitated minor deviations from what was originally published. For example, 6 of the 109 International Classification of Diseases codes used to calculate HFRS are not used within our health system, we used diagnosis codes to replace several NSQIP variables that are no longer available after 2015 to calculate RAI,74 and we calculated a published version of the eFI without functional status deficits owing to lack of availability of structured data from Medicare Annual Wellness Visits. Furthermore, the accuracy of the diagnostic codes used by the electronic frailty metrics likely varies across health systems with different coding practices. We view potential issues with diagnostic coding accuracy as inherent challenges with deriving a standardized frailty metric that can be transported across health care delivery systems and time periods. Fourth, while we evaluated the association between electronic frailty metrics and major NSQIP complications, mortality, and readmissions, we did not evaluate the association between promising electronic frailty metrics and important procedure-specific complications, such as endoleaks after endovascular aortic aneurysm repair.
Additionally, while we investigated various subpopulations based on specific procedure types, urgency, age, and operative approach, further investigation is needed into the predictive value of these frailty metrics and ultimately their utility in guiding care within specific populations and clinical contexts.
The focus on leveraging the EHR to generate a measure of frailty reflects the need for objective and standardized metrics to support preoperative assessments without creating excess work for frontline clinicians. Health systems that are considering how to embed electronic frailty metrics into clinical workflows must consider the performance characteristics of these tools for identifying frailty, the value of these metrics above and beyond other tools for preoperative risk stratification, and the feasibility of adapting and calculating these metrics within their information systems. Additional study is needed to develop and evaluate measures of frailty that can be implemented at scale and to develop electronic measures of conditions underlying frailty, such as sarcopenia, malnutrition, and functional limitation.
Accepted for Publication: December 31, 2021.
Published Online: March 16, 2022. doi:10.1001/jamasurg.2022.0172
Corresponding Author: Elizabeth M. Cespedes Feliciano, ScD, MSc, Division of Research, Kaiser Permanente Northern California, 2000 Broadway, 5th Floor, Oakland, CA 94612 (firstname.lastname@example.org).
Author Contributions: Drs Le and Zhang had full access to all of the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis.
Concept and design: Le, Liu, Peng, Cespedes Feliciano.
Acquisition, analysis, or interpretation of data: Le, Liu, Kipnis, Zhang, Cespedes Feliciano.
Drafting of the manuscript: Le, Liu.
Critical revision of the manuscript for important intellectual content: All authors.
Statistical analysis: Le, Kipnis, Zhang.
Obtained funding: Cespedes Feliciano.
Administrative, technical, or material support: Liu, Peng, Cespedes Feliciano.
Supervision: Liu, Kipnis, Peng, Cespedes Feliciano.
Conflict of Interest Disclosures: Dr Liu reported grants from National Institutes of Health (R35GM128672) during the conduct of the study. Dr Cespedes Feliciano reported grants from the National Institute on Aging during the conduct of the study and grants from the National Cancer Institute outside the submitted work. No other disclosures were reported.
Funding/Support: This work was supported by National Institute on Aging/National Institutes of Health (R01AG065334). Dr Le received funding from The Permanente Medical Group Delivery Science Fellowship Program. Dr Liu was supported in part by National Institutes of Health (R35GM128672).
Role of the Funder/Sponsor: The National Institutes of Health and The Permanente Medical Group had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.