At the beginning of the COVID-19 pandemic, it was clear that existing recommendations for allocating scarce resources in large disasters were ill-suited for a worldwide respiratory-based pandemic.1 Yet, more than 2 years and 100 published models later, no consensus has emerged on a modeling approach to determine SARS-CoV-2 mortality risk or progression to severe disease. The methodologic shortfalls of these efforts have been well described,2 but there are additional important factors to consider.
Challenges With Prognostic Models for COVID-19 Outcomes
A major challenge was dataset shift, a concept from artificial intelligence.3 Dataset shift occurs when a predictive model performs poorly on new data because those data differ from the development dataset that was used to create the model. Three reasons for major dataset shifts arose during the COVID-19 pandemic: (1) random variations in the types of medical care provided that influenced outcomes; (2) the introduction of new effective therapies; and (3) new variations in the characteristics of patients treated that altered their risk profiles (Figure).
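As a minimal illustration of the second cause of dataset shift, the Python sketch below freezes a risk model at its development-era form and then applies it to a later cohort in which a new therapy lowers the true risk. Everything here is an assumption made for illustration: the logistic coefficients, the age range, the halving of risk, and the function names (`predicted_risk`, `simulate_cohort`, `observed_to_expected`) are hypothetical and do not come from any published COVID-19 model.

```python
import math
import random

def predicted_risk(age):
    """Hypothetical frozen model: logistic risk of death as a function of age."""
    return 1.0 / (1.0 + math.exp(-(-6.0 + 0.07 * age)))

def simulate_cohort(n, therapy_effect, rng):
    """Simulate n patients; therapy_effect multiplies the true risk (1.0 = no change)."""
    cohort = []
    for _ in range(n):
        age = rng.uniform(30, 90)
        true_risk = predicted_risk(age) * therapy_effect
        died = rng.random() < true_risk
        cohort.append((age, died))
    return cohort

def observed_to_expected(cohort):
    """Observed deaths divided by the deaths the frozen model expects."""
    observed = sum(died for _, died in cohort)
    expected = sum(predicted_risk(age) for age, _ in cohort)
    return observed / expected

rng = random.Random(0)  # fixed seed so the sketch is reproducible
development_era = simulate_cohort(20000, therapy_effect=1.0, rng=rng)
later_era = simulate_cohort(20000, therapy_effect=0.5, rng=rng)  # assumed therapy halves risk

oe_development = observed_to_expected(development_era)
oe_later = observed_to_expected(later_era)
print(f"O/E ratio, development era: {oe_development:.2f}")
print(f"O/E ratio, after new therapy: {oe_later:.2f}")
```

An observed-to-expected ratio near 1 indicates that the model's predictions match outcomes on average. In the simulated later era the ratio falls well below 1, because the unchanged model overestimates risk once the assumed therapy is in use.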
ECMO denotes extracorporeal membrane oxygenation machines; M-CURES refers to the Michigan Critical Care Utilization and Risk Evaluation System; and QCOVID models were commissioned by the UK New and Emerging Respiratory Virus Threats Advisory Group using the QResearch database (a collaboration of the University of Oxford and Egton Medical Information Systems). Photographs reproduced with purchased permission through an extended license from iStockphoto LP.
Most data collection efforts for prognostic modeling occurred during the initial 6 to 8 months of the COVID-19 pandemic when there were major random and severe disruptions to the routine delivery of hospital and intensive care. These disruptions ranged from the failure to fully recognize the serious consequences of the virus to worldwide shortages of personal protective equipment, ventilators, extracorporeal membrane oxygenation machines, and trained medical staff.
One of the most dramatic consequences of these shortfalls occurred in Bergamo, Italy, where a large onslaught of COVID-19 cases overwhelmed the health care system. Medical staff were rapidly exhausted; some died.4 While the tragedy in Bergamo was extreme, the COVID-19 pandemic created similar disruptions worldwide that had major repercussions on the quality of care provided and the outcomes achieved.
Next, health care protocols for treating patients with severe cases of COVID-19 changed rapidly as knowledge expanded. At first, intubation and mechanical ventilation were used. Then, it was recognized that progression to severe lung dysfunction was more prolonged in COVID-19 than in other types of respiratory failure. Patients were managed for longer durations with noninvasive ventilation techniques, such as continuous positive airway pressure and high-flow oxygen therapy. The prone position began to be used for severe pneumonia. During this time, a few important risk factors for poor outcomes consistently emerged: age, race and ethnicity, the severity and number of comorbid conditions, and, for inpatients, the degree of difficulty with oxygenation.5
Features of Useful COVID-19 Prognostic Models
Two recently published models, 1 for hospitalized patients and another for populations, have used these features to predict COVID-19 outcomes more effectively.6,7 Both of these models may be instructive for how to approach modeling in the next pandemic. The Michigan Critical Care Utilization and Risk Evaluation System (M-CURES) aims to predict clinical deterioration (mortality or the need for mechanical ventilation, vasopressors, or high-flow nasal cannula) for COVID-19 hospital admissions.6 This model uses age, 6 physiologic variables associated with oxygenation, and 2 surrogate severity measures (head-of-bed position and position of the patient during blood pressure measurement). It was developed with data from 956 COVID-19 hospital admissions at an academic medical center during the first year of the pandemic. It was tested and externally validated with 8335 COVID-19 admissions at 12 other US academic medical centers in different regions. The M-CURES demonstrated reasonable discrimination and calibration, both in subgroups and over time when new therapies, such as monoclonal antibodies and antiviral agents, were introduced (Figure). Because 95% of hospital admissions did not deteriorate, a suggested use for the model was to conserve resources by encouraging early discharges among low-risk patients.
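Discrimination and calibration, the two properties reported for M-CURES, can be illustrated with a short Python sketch. It is a generic illustration, not the M-CURES evaluation code: the predicted risks and outcomes are made-up values for 8 hypothetical admissions, and the function names are assumptions. Discrimination is computed as the c-statistic (area under the receiver operating characteristic curve), and calibration as the simplest summary, the difference between the observed event rate and the mean predicted risk.

```python
def c_statistic(risks, outcomes):
    """Share of (event, non-event) pairs in which the event has the higher
    predicted risk; tied pairs count as 0.5."""
    events = [r for r, y in zip(risks, outcomes) if y == 1]
    non_events = [r for r, y in zip(risks, outcomes) if y == 0]
    concordant = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                concordant += 1.0
            elif e == n:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))

def calibration_in_the_large(risks, outcomes):
    """Observed event rate minus mean predicted risk (0 = calibrated on average)."""
    return sum(outcomes) / len(outcomes) - sum(risks) / len(risks)

# Made-up predicted risks and outcomes (1 = deteriorated) for 8 hypothetical admissions.
risks = [0.02, 0.05, 0.10, 0.20, 0.30, 0.40, 0.60, 0.80]
outcomes = [0, 0, 0, 0, 1, 0, 1, 1]

c_stat = c_statistic(risks, outcomes)
cal_gap = calibration_in_the_large(risks, outcomes)
print(f"c-statistic: {c_stat:.3f}")
print(f"observed rate minus mean predicted risk: {cal_gap:.3f}")
```

In this toy data, 14 of the 15 event and non-event pairs are ranked correctly. Repeating both calculations within subgroups and within successive calendar periods is one way to check whether performance holds up as therapies and case mix change.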
The methodology used to develop M-CURES is attractive: it uses a small number of variables from electronic health records that are consistently defined across multiple medical centers. Data collection for M-CURES stopped in February 2021, before the widespread introduction of vaccines and the emergence of variants, ie, the more severe Delta variant and the milder but more communicable Omicron variant. Thus, the M-CURES model may need modification to adjust for these dramatic changes in the case mix of COVID-19 hospital admissions.8
The QCOVID models were commissioned by the UK’s New and Emerging Respiratory Virus Threats Advisory Group and used data from the QResearch database (University of Oxford and Egton Medical Information Systems), which included 10.5 million patients in the UK.7 The QCOVID-1 model provided very accurate risk stratification for the likelihood of COVID-19 infection with a severe outcome (hospital admission or death). This model used a proportional hazards model and widely available risk factors, such as age, a measure of socioeconomic status, and the number and severity of comorbid conditions.
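For readers unfamiliar with the term, a proportional hazards model can be written in the generic textbook form below. This is a sketch of the model class only; the symbols are assumptions for illustration, and the actual QCOVID covariates, coefficients, and fitting details are not reproduced here.

```latex
% Hazard for individual i at time t, given risk factors x_{i1}, ..., x_{ip}
% (for example age, a socioeconomic measure, and comorbid conditions):
h_i(t) = h_0(t)\,\exp\!\left(\beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip}\right)

% The baseline hazard h_0(t) cancels when two individuals are compared,
% so their hazard ratio does not depend on time:
\frac{h_i(t)}{h_j(t)} = \exp\!\left(\sum_{k=1}^{p} \beta_k \,(x_{ik} - x_{jk})\right)
```

The time-independent hazard ratio is what allows individuals to be ranked by risk from their risk factors alone, which is the basis for the kind of risk stratification described above.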
The QCOVID-1 model has already influenced public policy. In 2021, it was used to identify individuals in the UK at high risk of severe COVID-19 outcomes, adding 1.5 million individuals to a shielded patient list that prioritized them for vaccination.7 Subsequently, QCOVID-2 (unvaccinated) and QCOVID-3 (vaccinated) models adjusted for the substantial effect that vaccines had on the risk of dying from COVID-19. In this postvaccine modeling effort, it was necessary to include deaths occurring after only 1 vaccine dose because deaths became so infrequent after 2 doses.9 This experience is a dramatic example of how introduction of new successful interventions can render even the best modeling efforts obsolete.
Implications for the Next Pandemic
What are the implications for the next pandemic? First, it will be important to prioritize the quality of the data collected over their volume. The COVID-19 pandemic encouraged the formation of new consortiums of investigators that rapidly collected data in large electronic databases from health care systems worldwide. The M-CURES approach used clinical knowledge and data-driven selection to focus on a few consistently defined variables, whereas other approaches used many features that were not always well defined. The QCOVID models also relied on well-established comorbidity descriptions and standard demographic variables.
Together, the M-CURES and QCOVID models provided evidence of accurate and useful risk stratification approaches for hospital admissions and general populations, respectively. Both approaches should continue to be adapted and updated so they can be rapidly deployed and used during the next pandemic.
Published: May 27, 2022. doi:10.1001/jamahealthforum.2022.1103
Open Access: This is an open access article distributed under the terms of the CC-BY License. © 2022 Knaus WA. JAMA Health Forum.
Corresponding Author: William A. Knaus, MD, Department of Public Health Sciences, The University of Virginia, 142 River Ranch Rd, Edwards, CO 81632 (email@example.com).
Conflict of Interest Disclosures: None reported.
Knaus WA. Prognostic Modeling and Major Dataset Shifts During the COVID-19 Pandemic: What Have We Learned for the Next Pandemic? JAMA Health Forum. 2022;3(5):e221103. doi:10.1001/jamahealthforum.2022.1103