May 27, 2022

Prognostic Modeling and Major Dataset Shifts During the COVID-19 Pandemic: What Have We Learned for the Next Pandemic?

Author Affiliations
  • Department of Public Health Sciences, The University of Virginia, Charlottesville
JAMA Health Forum. 2022;3(5):e221103. doi:10.1001/jamahealthforum.2022.1103

At the beginning of the COVID-19 pandemic, it was clear that existing recommendations for allocating scarce resources in large disasters were ill-suited for a worldwide respiratory-based pandemic.1 Yet, more than 2 years and 100 published models later, no consensus has emerged on a modeling approach to determine SARS-CoV-2 mortality risk or progression to severe disease. The methodologic shortfalls of these efforts have been well described,2 but there are additional important factors to consider.

Challenges With Prognostic Models for COVID-19 Outcomes

A major challenge was dataset shift, a concept from artificial intelligence.3 A dataset shift occurs when a predictive model performs poorly on new data because those data differ from the development dataset that was used to create the model. Three sources of major dataset shift arose during the COVID-19 pandemic: (1) random variations in the types of medical care provided that influenced outcomes; (2) the introduction of new effective therapies; and (3) new variations in the characteristics of patients treated that altered their risk profiles (Figure).
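The second source of shift can be illustrated with a small simulation. The following is a minimal sketch, not any published model: the risk equation, the age range, and the assumption that a new therapy halves mortality are all invented for illustration. It shows a frozen model whose predictions agree with outcomes in development-like data and then overestimate risk once the therapy arrives.

```python
import math
import random


def fixed_model_risk(age: float) -> float:
    """Hypothetical frozen risk model (coefficients invented for illustration)."""
    return 1.0 / (1.0 + math.exp(-(-6.0 + 0.07 * age)))


def simulate_cohort(n: int, risk_multiplier: float, rng: random.Random):
    """Return (predicted, observed) lists for n simulated patients.

    risk_multiplier scales the true event probability, e.g. 0.5 mimics a
    new therapy that halves mortality after the model was developed.
    """
    predicted, observed = [], []
    for _ in range(n):
        age = rng.uniform(40, 90)
        p_model = fixed_model_risk(age)
        p_true = p_model * risk_multiplier
        predicted.append(p_model)
        observed.append(1 if rng.random() < p_true else 0)
    return predicted, observed


def observed_to_expected(predicted, observed) -> float:
    """O/E ratio: 1.0 means predicted and observed event counts agree."""
    return sum(observed) / sum(predicted)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
dev_pred, dev_obs = simulate_cohort(20000, 1.0, rng)   # development-like data
new_pred, new_obs = simulate_cohort(20000, 0.5, rng)   # after the assumed therapy

oe_development = observed_to_expected(dev_pred, dev_obs)
oe_after_shift = observed_to_expected(new_pred, new_obs)
print(f"O/E in development-like data: {oe_development:.2f}")
print(f"O/E after assumed new therapy: {oe_after_shift:.2f}")
```

An observed-to-expected ratio well below 1 in the later cohort signals that the model, although unchanged, no longer matches the patients it is applied to.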

Figure.  Challenges to Prognostic Modeling During the COVID-19 Pandemic

ECMO denotes extracorporeal membrane oxygenation machines; M-CURES refers to the Michigan Critical Care Utilization and Risk Evaluation System; and QCOVID models were commissioned by the UK New and Emerging Respiratory Virus Threats Advisory Group using the QResearch database (a collaboration of the University of Oxford and Egton Medical Information Systems). Photographs reproduced with purchased permission through an extended license from iStockphoto LP.

Most data collection efforts for prognostic modeling occurred during the initial 6 to 8 months of the COVID-19 pandemic when there were major random and severe disruptions to the routine delivery of hospital and intensive care. These disruptions ranged from the failure to fully recognize the serious consequences of the virus to worldwide shortages of personal protective equipment, ventilators, extracorporeal membrane oxygenation machines, and trained medical staff.

One of the most dramatic consequences of these shortfalls occurred in Bergamo, Italy, where a large onslaught of COVID-19 cases overwhelmed the health care system. Medical staff were rapidly exhausted; some died.4 While the tragedy in Bergamo was extreme, the COVID-19 pandemic created similar disruptions worldwide that had major repercussions on the quality of care provided and the outcomes achieved.

Next, health care protocols for treating patients with severe cases of COVID-19 changed rapidly as knowledge expanded. At first, intubation and mechanical ventilation were used. Then, it was recognized that progression to severe lung dysfunction was more prolonged in COVID-19 than in other types of respiratory failure. Patients were managed for longer durations with noninvasive ventilation techniques, such as continuous positive airway pressure and high-flow oxygen therapy. The prone position began to be used for severe pneumonia. During this time, a few important risk factors for poor outcomes consistently emerged: age, race and ethnicity, the severity and number of comorbid conditions, and, for inpatients, the degree of difficulty with oxygenation.5

Features of Useful COVID-19 Prognostic Models

Two recently published models, 1 for hospitalized patients and another for populations, have used these features to predict COVID-19 outcomes more effectively.6,7 Both of these models may be instructive for how to approach modeling in the next pandemic. The Michigan Critical Care Utilization and Risk Evaluation System (M-CURES) aims to predict clinical deterioration (mortality or the need for mechanical ventilation, vasopressors, or high-flow nasal cannula) for COVID-19 hospital admissions.6 This model uses age, 6 physiologic variables associated with oxygenation, and 2 surrogate severity measures (head-of-bed position and position of the patient during blood pressure measurement). It was developed with data from 956 COVID-19 hospital admissions at an academic medical center during the first year of the pandemic. It was tested and externally validated with 8335 COVID-19 admissions at 12 other US academic medical centers in different regions. The M-CURES demonstrated reasonable discrimination and calibration, both in subgroups and over time when new therapies, such as monoclonal antibodies and antiviral agents, were introduced (Figure). Because 95% of hospital admissions did not deteriorate, a suggested use for the model was to conserve resources by encouraging early discharges among low-risk patients.
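Discrimination and calibration, the two properties reported for M-CURES, can be made concrete with a short sketch. The functions and the toy predictions below are illustrative assumptions and are not taken from the M-CURES study. Discrimination is computed as the area under the receiver operating characteristic curve, and calibration as mean predicted risk against observed event rate within risk groups.

```python
def auroc(predicted, observed):
    """Probability that a random event case is ranked above a random
    non-event case (ties count one half)."""
    events = [p for p, y in zip(predicted, observed) if y == 1]
    non_events = [p for p, y in zip(predicted, observed) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    wins = 0.0
    for pe in events:
        for pn in non_events:
            if pe > pn:
                wins += 1.0
            elif pe == pn:
                wins += 0.5
    return wins / (len(events) * len(non_events))


def calibration_by_group(predicted, observed, n_groups=2):
    """Return (mean predicted risk, observed event rate) per risk group.

    Patients are sorted by predicted risk and split into n_groups groups
    of equal size (the last group absorbs any remainder).
    """
    pairs = sorted(zip(predicted, observed))
    size = len(pairs) // n_groups
    rows = []
    for g in range(n_groups):
        start = g * size
        end = (g + 1) * size if g < n_groups - 1 else len(pairs)
        chunk = pairs[start:end]
        mean_pred = sum(p for p, _ in chunk) / len(chunk)
        obs_rate = sum(y for _, y in chunk) / len(chunk)
        rows.append((mean_pred, obs_rate))
    return rows


# Toy data, invented for illustration only.
predicted = [0.02, 0.03, 0.05, 0.10, 0.20, 0.40, 0.60, 0.80]
observed = [0, 0, 0, 0, 1, 0, 1, 1]

c_statistic = auroc(predicted, observed)
groups = calibration_by_group(predicted, observed, n_groups=2)
print(f"AUROC: {c_statistic:.3f}")
for mean_pred, obs_rate in groups:
    print(f"mean predicted {mean_pred:.2f} vs observed {obs_rate:.2f}")
```

Repeating both checks within subgroups and across calendar periods is one way to see whether performance holds as therapies and case mix change.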

The methodology used to develop M-CURES is attractive—it uses a small number of variables from electronic health records that are consistently defined across multiple medical centers. Data collection for M-CURES stopped in February 2021, before the widespread introduction of vaccines and the emergence of variants, ie, the more severe Delta variant and the milder but more communicable Omicron variant. Thus, the M-CURES model may need modification to adjust for these dramatic changes in the case mix of COVID-19 hospital admissions.8

The QCOVID models were commissioned by the UK’s New and Emerging Respiratory Virus Threats Advisory Group and used data from the QResearch database (University of Oxford and Egton Medical Information Systems), which included 10.5 million patients in the UK.7 The QCOVID-1 model provided very accurate risk stratification for the likelihood of COVID-19 infection with a severe outcome (hospital admission or death). This model used a proportional hazards model and widely available risk factors, such as age, a measure of socioeconomic status, and the number and severity of comorbid conditions.
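To show how a proportional hazards model turns risk factors into an absolute risk, the sketch below uses a generic Cox-type form in which risk by a time horizon equals 1 minus the baseline survival raised to the power of the exponentiated linear predictor. The baseline survival, the variable names, and every coefficient are hypothetical placeholders chosen for illustration; they are not the published QCOVID values.

```python
import math


def absolute_risk(baseline_survival: float, coefficients: dict, patient: dict) -> float:
    """Risk by the horizon = 1 - S0(t) ** exp(linear predictor)."""
    linear_predictor = sum(coefficients[name] * patient[name] for name in coefficients)
    return 1.0 - baseline_survival ** math.exp(linear_predictor)


# Hypothetical values, NOT the published QCOVID coefficients.
BASELINE_SURVIVAL = 0.999  # survival at the horizon for a reference patient
COEFFICIENTS = {
    "age_decades_over_50": math.log(2.0),   # assumed: hazard doubles per decade
    "deprivation_quintile": math.log(1.1),  # assumed socioeconomic gradient
    "n_comorbidities": math.log(1.5),       # assumed effect per comorbidity
}

reference_patient = {name: 0 for name in COEFFICIENTS}
older_patient = {
    "age_decades_over_50": 3,
    "deprivation_quintile": 2,
    "n_comorbidities": 2,
}

risk_reference = absolute_risk(BASELINE_SURVIVAL, COEFFICIENTS, reference_patient)
risk_older = absolute_risk(BASELINE_SURVIVAL, COEFFICIENTS, older_patient)
print(f"reference patient risk: {risk_reference:.4f}")
print(f"older patient risk:     {risk_older:.4f}")
```

Because each factor multiplies the hazard, a handful of widely available variables can separate individuals whose absolute risks differ by an order of magnitude, which is what makes such models usable for population-level risk stratification.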

The QCOVID-1 model has already influenced public policy. In 2021, it was used to identify individuals in the UK at high risk of severe COVID-19 outcomes, adding 1.5 million individuals to a shielded patient list that prioritized them for vaccination.7 Subsequently, QCOVID-2 (unvaccinated) and QCOVID-3 (vaccinated) models adjusted for the substantial effect that vaccines had on the risk of dying from COVID-19. In this postvaccine modeling effort, it was necessary to include deaths occurring after only 1 vaccine dose because deaths became so infrequent after 2 doses.9 This experience is a dramatic example of how introduction of new successful interventions can render even the best modeling efforts obsolete.

Implications for the Next Pandemic

What are the implications for the next pandemic? It will be important to prioritize the quality of the data collected over its volume. The COVID-19 pandemic encouraged the formation of new consortiums of investigators that rapidly collected data in large electronic databases from health care systems worldwide. Many of these efforts used numerous features that were not always well defined. In contrast, the M-CURES approach used clinical knowledge and data-driven selection to focus on a few consistently defined variables, and the QCOVID models relied on well-established comorbidity descriptions and standard demographic variables.

Together, the M-CURES and QCOVID models provided evidence of accurate and useful risk stratification approaches for hospital admissions and general populations, respectively. Both approaches should continue to be adapted and updated so they can be rapidly deployed and used during the next pandemic.

Article Information

Published: May 27, 2022. doi:10.1001/jamahealthforum.2022.1103

Open Access: This is an open access article distributed under the terms of the CC-BY License. © 2022 Knaus WA. JAMA Health Forum.

Corresponding Author: William A. Knaus, MD, Department of Public Health Sciences, The University of Virginia, 142 River Ranch Rd, Edwards, CO 81632 (wknaus@virginia.edu).

Conflict of Interest Disclosures: None reported.

References

1. Maves RC, Downar J, Dichter JR, et al; ACCP Task Force for Mass Critical Care. Triage of scarce critical care resources in COVID-19: an implementation guide for regional allocation. Chest. 2020;158(1):212-225. doi:10.1016/j.chest.2020.03.063
2. Wynants L, Van Calster B, Collins GS, et al. Prediction models for diagnosis and prognosis of COVID-19: systematic review and critical appraisal. BMJ. 2020;369:m1328. doi:10.1136/bmj.m1328
3. Finlayson SG, Subbaswamy A, Singh K, et al. The clinician and dataset shift in artificial intelligence. N Engl J Med. 2021;385(3):283-286. doi:10.1056/NEJMc2104626
4. Horowitz J. The lost days that made Bergamo a Coronavirus tragedy. The New York Times. Published Nov. 29, 2020. Accessed April 27, 2022. https://www.nytimes.com/2020/11/29/world/europe/coronavirus-bergamo-italy.html
5. Gupta RK, Marks M, Samuels THA, et al; UCLH COVID-19 Reporting Group. Systematic evaluation and external validation of 22 prognostic models among hospitalised adults with COVID-19: an observational cohort study. Eur Respir J. 2020;56(6):2003498. doi:10.1183/13993003.03498-2020
6. Kamran F, Tang S, Otles E, et al. Early identification of patients admitted to hospital for COVID-19 at risk of clinical deterioration: model development and multisite external validation study. BMJ. 2022;376:e068576. doi:10.1136/bmj-2021-068576
7. Clift AK, Coupland CAC, Keogh RH, et al. Living risk prediction algorithm (QCOVID) for risk of hospital admission and mortality from Coronavirus 19 in adults: national derivation and validation cohort study. BMJ. 2020;371:m3731. doi:10.1136/bmj.m3731
8. Habib AR, Lo NC. Predicting COVID-19 outcomes. BMJ. 2022;376:o354. doi:10.1136/bmj.o354
9. Hippisley-Cox J, Coupland CAC, Mehta N, et al. Risk prediction of COVID-19 related death and hospital admission in adults after COVID-19 vaccination: national prospective cohort study. BMJ. 2021;374(2300):n2244. doi:10.1136/bmj.n2244