Invited Commentary
August 11, 2020

Guiding Clinical Decisions Through Predictive Risk Rules

Author Affiliations
  • 1Department of Biostatistics and Bioinformatics, Duke University, Durham, North Carolina
  • 2Department of Medicine, Duke University, Durham, North Carolina
JAMA Netw Open. 2020;3(8):e2013101. doi:10.1001/jamanetworkopen.2020.13101

The ubiquity of integrated electronic health record systems has greatly facilitated the development of clinical decision support tools.1 For both clinical and methodological reasons, the hospital setting is now one of the most appealing domains for developing these tools. Clinically, the inpatient setting is a high-stakes arena where serious events, such as septic shock, cardiac arrest, or death, may occur. Methodologically, the inpatient setting is a closed system with little loss to follow-up. Moreover, inpatient data are rich in both the volume of data points and the variety of data domains. As these tools have proliferated in the hospital setting, clinicians have grown accustomed to interacting with them daily.

The study by Churpek et al2 presents a risk model for in-hospital acute kidney injury (AKI). AKI is generally defined by consensus guidelines based on changes in serum creatinine level or urine output. The outcomes of in-hospital AKI are well known and include prolonged hospital stays, the need for renal replacement therapy, and increased mortality risk.3 Although most patients with AKI experience partial or complete renal recovery, observational studies have shown increased risk for progressive loss of renal function.4 Churpek et al validated their models internally and externally using a combined data set covering 3 hospital systems and almost 500 000 patients. The results were impressive, with an area under the curve as high as 0.92 for AKI and as high as 0.97 for the need for renal replacement therapy. Results were consistent across the internal and external validation sites.

The next question is how to implement clinical decision support as a clinically useful tool. The simplest approach is to present a patient’s individual risk, which allows the clinician to incorporate risk into the larger clinical assessment. While presenting risk is intuitively appealing, in our experience this approach is often not clinically useful. For instance, an estimated risk of 7% for stage 2 AKI may not raise any clinical concern, as only approximately 1 of 15 such patients would be expected to experience the event. However, given the low baseline prevalence, this represents roughly a 2-fold increase over baseline risk. Without knowledge of baseline risk, a displayed absolute risk is difficult to interpret. In our internal work, we have also explored displaying relative risk, but even clinicians with strong statistical backgrounds found relative risk confusing.
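
The arithmetic behind the 2-fold figure can be sketched as follows. This is an illustrative calculation only: the function name is hypothetical, and the baseline of 3.2% is an assumption, taken from the overall rate of at least stage 2 AKI reported for the study rather than from any site-specific figure.

```python
def relative_risk(estimated_risk: float, baseline_risk: float) -> float:
    """Ratio of a patient's estimated risk to the baseline (population) risk."""
    if baseline_risk <= 0:
        raise ValueError("baseline_risk must be positive")
    return estimated_risk / baseline_risk


# Assumption: baseline taken as the study-wide rate of at least stage 2 AKI.
BASELINE_RISK = 0.032

patient_risk = 0.07  # the 7% example from the text
print(f"absolute risk: {patient_risk:.0%}")
print(f"relative to baseline: {relative_risk(patient_risk, BASELINE_RISK):.1f}-fold")
```

The same 7% reads very differently once it is placed next to the baseline, which is why a bare absolute risk is hard for a clinician to act on.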

A more useful approach would be a decision rule that guides the clinician. However, attempting to create such a rule exposes a key challenge in estimating inpatient events: rare outcomes. For example, when Churpek et al2 estimated the risk of stage 2 AKI for the Loyola University Medical Center group, a risk probability threshold of 0.010 resulted in a sensitivity of 90.1% and a positive predictive value (PPV) of 10.1%. This means that a decision rule that captures at least 90% of patients who go on to experience the event will be correct only approximately 10% of the time (ie, approximately 90% of alerts will be false alarms). Decision rules like these are associated with alert fatigue5 and ultimately may influence clinicians to ignore the clinical decision support tool.6
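
To make the alert burden concrete, the PPV can be converted into the number of false alarms a clinician sees for each correct alert. The helper below is a minimal sketch with a hypothetical name; it uses only the PPV of 10.1% quoted above.

```python
def false_alarms_per_true_alert(ppv: float) -> float:
    """Expected number of false alerts for every correct alert, given the PPV."""
    if not 0.0 < ppv <= 1.0:
        raise ValueError("ppv must be in (0, 1]")
    return (1.0 - ppv) / ppv


# PPV of 10.1% at the 0.010 threshold, as quoted in the text.
burden = false_alarms_per_true_alert(0.101)
print(f"about {burden:.1f} false alarms per true alert")  # about 8.9
```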

Conversely, if an alert is accurate approximately 50% of the time (PPV, 48.6%), that alert will capture only 17% of all patients who experience the event (sensitivity, 17.1%), missing most true events. This is a common scenario and does not reflect a poor model, but rather a difficult, almost impossible problem. Mathematically, for a given sensitivity and specificity, the PPV falls as an event becomes rarer, and the PPV is arguably the most important metric for clinical utility, as it reflects how often an alert is correct. In the study by Churpek et al,2 the rate of at least stage 2 AKI was 3.2%, making it a rare outcome.
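
The dependence of PPV on event rate follows from Bayes' rule. The sketch below holds sensitivity and specificity fixed and varies only the prevalence. The sensitivity and specificity values are hypothetical, chosen for illustration, and are not taken from the study by Churpek et al.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value from Bayes' rule."""
    true_positive_rate = sensitivity * prevalence
    false_positive_rate = (1.0 - specificity) * (1.0 - prevalence)
    return true_positive_rate / (true_positive_rate + false_positive_rate)


# Hypothetical test characteristics, held constant while prevalence changes.
SENSITIVITY = 0.90
SPECIFICITY = 0.75

for prevalence in (0.20, 0.10, 0.032, 0.01):
    value = ppv(SENSITIVITY, SPECIFICITY, prevalence)
    print(f"prevalence {prevalence:.1%} -> PPV {value:.1%}")
```

With the model's discrimination unchanged, each step down in prevalence lowers the PPV, which is the sense in which a low PPV reflects the problem rather than the model.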

This low PPV raises an important question: what is the best way to perform clinical decision support when outcomes are rare? One of the simplest approaches is to make the outcome less rare. This can be achieved by creating a composite outcome.7 Churpek et al2 did this in some of their analyses by estimating at least stage 2 AKI, a combination of stages 2 and 3. Interestingly, although not surprisingly, this composite outcome produced a worse model. The area under the curve for stage 3 AKI ranged from 0.91 to 0.92, while the area under the curve for at least stage 1 AKI ranged from 0.67 to 0.72. The former range presents a very strong risk model; the latter is barely clinically useful.

Therefore, instead of a single decision rule, we prefer a multitiered decision rule that guides the clinician through the decision-making process. When implementing inpatient clinical decision support tools in our own work, we have created a red/yellow/green-light system.8 Red represents a high-risk group with a known (and high) PPV. A clinician can be relatively certain that these patients will experience the event. Green is a low-risk group defined by a cutoff with high sensitivity and, accordingly, a high negative predictive value. A clinician can be confident that these patients will not experience the event. Yellow represents those patients for whom there is less certainty and clinical judgment is needed. By framing the high-risk group around PPV, we can help protect clinicians from an onerous number of false alerts. Anticipated workload could be a secondary means for choosing these thresholds. If patients in the red-light group need direct assessment, this group should be small enough to avoid overwhelming clinicians.
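
One way such cutoffs might be chosen from validation data is sketched below: the green/yellow boundary is the highest cutoff that still meets a target sensitivity, and the yellow/red boundary is the lowest cutoff whose flagged group meets a target PPV. This is an illustration under stated assumptions, not the procedure used in the cited work; the function names, the targets, and the toy data are all hypothetical.

```python
from typing import Optional, Sequence, Tuple


def sensitivity_at(threshold: float, risks: Sequence[float], events: Sequence[int]) -> float:
    """Share of all events whose predicted risk is at or above the threshold."""
    total_events = sum(events)
    captured = sum(1 for r, e in zip(risks, events) if e and r >= threshold)
    return captured / total_events if total_events else 0.0


def ppv_at(threshold: float, risks: Sequence[float], events: Sequence[int]) -> float:
    """Share of patients at or above the threshold who experience the event."""
    flagged = [e for r, e in zip(risks, events) if r >= threshold]
    return sum(flagged) / len(flagged) if flagged else 0.0


def choose_cutoffs(
    risks: Sequence[float],
    events: Sequence[int],
    target_sensitivity: float,
    target_ppv: float,
) -> Tuple[Optional[float], Optional[float]]:
    """Return (green/yellow cutoff, yellow/red cutoff); None if a target is unreachable."""
    candidates = sorted(set(risks))
    low_cut = max(
        (t for t in candidates if sensitivity_at(t, risks, events) >= target_sensitivity),
        default=None,
    )
    high_cut = min(
        (t for t in candidates if ppv_at(t, risks, events) >= target_ppv),
        default=None,
    )
    return low_cut, high_cut


# Toy validation set (hypothetical): predicted risks and observed events (1 = event).
toy_risks = [0.01, 0.02, 0.05, 0.10, 0.20, 0.40, 0.60, 0.80]
toy_events = [0, 0, 0, 1, 0, 1, 1, 1]
print(choose_cutoffs(toy_risks, toy_events, target_sensitivity=1.0, target_ppv=0.9))
```

In practice the two targets can conflict (the low cutoff may land above the high cutoff), and anticipated workload may force a higher red threshold than the PPV target alone would suggest, so the output is a starting point for discussion rather than a final rule.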

Using the data from Loyola University Medical Center, one could create a low/medium-risk cutoff of 0.01 and a medium/high-risk cutoff of 0.057. In this scenario, the high-risk category would have a PPV of 20.4%, which is reasonably high for so rare an outcome, and the combined medium/high-risk category would have a sensitivity of 90.1%. The low-risk category would have a negative predictive value of 99.6%. Daily, we would expect approximately 5 patients to be high risk (10% of all patients) and approximately 8 patients to be medium risk (15% of patients), a reasonably small number for clinicians to evaluate.
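
Applied to an individual predicted risk, the two cutoffs above reduce to a simple lookup. The sketch below is illustrative: the names are hypothetical, and treating a risk exactly equal to a cutoff as belonging to the higher tier is an assumption, since the text does not specify how ties are handled.

```python
from enum import Enum


class Tier(Enum):
    GREEN = "low risk"
    YELLOW = "medium risk"
    RED = "high risk"


# Cutoffs quoted in the text for the Loyola University Medical Center data.
LOW_MEDIUM_CUTOFF = 0.01
MEDIUM_HIGH_CUTOFF = 0.057


def assign_tier(
    risk: float,
    low_cut: float = LOW_MEDIUM_CUTOFF,
    high_cut: float = MEDIUM_HIGH_CUTOFF,
) -> Tier:
    """Map a predicted probability to a red/yellow/green tier."""
    if not 0.0 <= risk <= 1.0:
        raise ValueError("risk must be a probability between 0 and 1")
    if risk >= high_cut:  # assumption: ties go to the higher tier
        return Tier.RED
    if risk >= low_cut:
        return Tier.YELLOW
    return Tier.GREEN


# Hypothetical predicted risks for three patients.
for predicted in (0.004, 0.03, 0.12):
    print(f"{predicted:.3f} -> {assign_tier(predicted).name}")
```

Keeping the cutoffs as parameters rather than hard-coded branches makes it straightforward to revise them as workload or workflows change.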

Such an approach would guide the clinician through the clinical decision support tool. Clinicians would know they need to focus on the patients at high risk and could safely devote less attention to the patients at low risk. The patients at moderate risk would require clinical judgment. Moreover, such thresholds would allow the clinical decision support developer to relay expected performance to the clinicians. When clinicians know that a high-risk alert represents a 20% risk of an event (meaning that no event is still the more likely result), they can manage their expectations. In our experience, relaying such metrics is an important component of avoiding alert fatigue. These thresholds can also be easily adapted as workflows evolve.

Clinical decision support tools for in-hospital events constitute an important component of clinical care. As the work by Churpek et al2 illustrates, these models can be of very high quality. However, to make the models clinically useful, we need to be thoughtful in how we implement them. Multiple decision rules that guide the clinician through the tool are important in ensuring that the tools are used properly and do not lead to clinician distrust.

Article Information

Published: August 11, 2020. doi:10.1001/jamanetworkopen.2020.13101

Open Access: This is an open access article distributed under the terms of the CC-BY License. © 2020 Goldstein BA et al. JAMA Network Open.

Corresponding Author: Benjamin A. Goldstein, PhD, Department of Biostatistics and Bioinformatics, Duke University, 2424 Erwin Rd, Ste 9023, Durham, NC 27705 (ben.goldstein@duke.edu).

Conflict of Interest Disclosures: None reported.

References

1. Goldstein BA, Navar AM, Pencina MJ, Ioannidis JPA. Opportunities and challenges in developing risk prediction models with electronic health records data: a systematic review. J Am Med Inform Assoc. 2017;24(1):198-208. doi:10.1093/jamia/ocw042
2. Churpek MM, Carey KA, Edelson DP, et al. Internal and external validation of a machine learning risk score for acute kidney injury. JAMA Netw Open. 2020;3(8):e2012892. doi:10.1001/jamanetworkopen.2020.12892
3. Liangos O, Wald R, O’Bell JW, Price L, Pereira BJ, Jaber BL. Epidemiology and outcomes of acute renal failure in hospitalized patients: a national survey. Clin J Am Soc Nephrol. 2006;1(1):43-51. doi:10.2215/CJN.00220605
4. Coca SG, Yusuf B, Shlipak MG, Garg AX, Parikh CR. Long-term risk of mortality and other adverse outcomes after acute kidney injury: a systematic review and meta-analysis. Am J Kidney Dis. 2009;53(6):961-973. doi:10.1053/j.ajkd.2008.11.034
5. van der Sijs H, Aarts J, Vulto A, Berg M. Overriding of drug safety alerts in computerized physician order entry. J Am Med Inform Assoc. 2006;13(2):138-147. doi:10.1197/jamia.M1809
6. Bedoya AD, Clement ME, Phelan M, Steorts RC, O’Brien C, Goldstein BA. Minimal impact of implemented early warning score and best practice alert for patient deterioration. Crit Care Med. 2019;47(1):49-55. doi:10.1097/CCM.0000000000003439
7. Wentzensen N, Eldridge RC. Invited commentary: clinical utility of prediction models for rare outcomes—the example of pancreatic cancer. Am J Epidemiol. 2015;182(1):35-38. doi:10.1093/aje/kwv028
8. O’Brien C, Goldstein BA, Shen Y, et al. Development, implementation, and evaluation of an in-hospital optimized early warning score for patient deterioration. MDM Policy Pract. 2020;5(1):2381468319899663. doi:10.1177/2381468319899663