Numerous mathematical models are being produced to forecast the future of coronavirus disease 2019 (COVID-19) epidemics in the US and worldwide. These predictions have far-reaching consequences regarding how quickly and how strongly governments move to curb an epidemic. However, the primary and most effective use of epidemiological models is to estimate the relative effect of various interventions in reducing disease burden rather than to produce precise quantitative predictions about extent or duration of disease burdens. For predictions, “models are not crystal balls,” as Ferguson noted in a recent overview of the role of modeling.1
Nevertheless, consumers of epidemiological models, including politicians, the public, and the media, often focus on the quantitative predictions of infections and mortality estimates. Such measures of potential disease burden are necessary for planners who consider future outcomes in light of health care capacity. How then should such estimates be assessed?
Although relative effects on infections associated with various interventions are likely more reliable, accompanying estimates from models about COVID-19 can contribute to uncertainty and anxiety. For instance, will the US have tens of thousands or possibly even hundreds of thousands of deaths? The main focus should be on the kinds of interventions that could help reduce these numbers because the interventions undertaken will, of course, determine the eventual numerical reality. Model projections are needed to forecast future health care demand, including how many intensive care unit beds will be needed, where and when shortages of ventilators will most likely occur, and the number of health care workers required to respond effectively. Short-term projections can be crucial to assist planning, but it is usually unnecessary to focus on long-term “guesses” for such purposes. In addition, forecasts from computational models are being used to establish local, state, and national policy. When is the peak of cases expected? If social distancing is effective and the number of new cases that require hospitalization is stable or declining, when is it time to consider a return to work or school? Can large gatherings once again be safe? For these purposes, models likely only give insight into the scale of what is ahead and cannot predict the exact trajectory of the epidemic weeks or months in advance. According to Whitty, models should not be presented as scientific truth; they are most helpful when they present more than what is predictable by common sense.2
Estimates that emerge from modeling studies are only as good as the validity of the epidemiological or statistical model used; the extent and accuracy of the assumptions made; and, perhaps most importantly, the quality of the data to which models are calibrated. Early in an epidemic, the quality of data on infections, deaths, tests, and other factors is often limited by underdetection or inconsistent detection of cases, reporting delays, and poor documentation, all of which affect the quality of any model output. Simpler models may provide less valid forecasts because they cannot capture complex and unobserved human mixing patterns and other time-varying characteristics of infectious disease spread. On the other hand, as Kucharski noted, “complex models may be no more reliable than simple ones if they miss key aspects of the biology. Complex models can create the illusion of realism, and make it harder to spot crucial omissions.”3 A greater level of detail in a model may provide a more adequate description of an epidemic, but the outputs of such a model are sensitive to changes in parametric assumptions and are particularly dependent on external preliminary estimates of disease and transmission characteristics, such as the length of the incubation and infectious periods.
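The sensitivity to parametric assumptions described above can be illustrated with a minimal sketch. The code below is a textbook deterministic SIR (susceptible-infectious-recovered) model, not any of the models discussed in this article, and every number in it (transmission rate, population size, candidate infectious periods) is a hypothetical value chosen only for illustration. It holds the transmission rate fixed and varies the assumed infectious period to show how the projected peak and cumulative infections shift.

```python
def simulate_sir(beta, infectious_days, population=1_000_000,
                 initial_infected=10, days=365, dt=0.1):
    """Integrate a basic SIR model with forward Euler steps.

    beta            -- transmission rate per day (hypothetical value)
    infectious_days -- assumed mean infectious period in days
    Returns (peak_day, peak_infected, cumulative_infected).
    """
    gamma = 1.0 / infectious_days          # recovery rate per day
    s = float(population - initial_infected)
    i = float(initial_infected)
    peak_infected, peak_day = i, 0.0
    for step in range(int(days / dt)):
        new_infections = beta * s * i / population * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        if i > peak_infected:
            peak_infected, peak_day = i, (step + 1) * dt
    return peak_day, peak_infected, population - s


# Same transmission rate, four different assumptions about the infectious period.
BETA = 0.4  # hypothetical
results = [(d, *simulate_sir(BETA, d)) for d in (4, 5, 6, 7)]

for infectious_days, peak_day, peak_infected, cumulative in results:
    print(f"infectious period {infectious_days} d: "
          f"peak near day {peak_day:.0f}, "
          f"peak infected {peak_infected:,.0f}, "
          f"cumulative infected {cumulative:,.0f}")
```

In this toy setting, changing a single assumed input by a few days moves both the timing and the size of the projected epidemic, which is why preliminary estimates of such quantities matter so much for model output.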
In predicting the future of the COVID-19 pandemic, many key assumptions have been based on limited data. Models may capture aspects of epidemics effectively while neglecting to account for other factors, such as the accuracy of diagnostic tests; whether immunity will wane quickly; whether reinfection could occur; or population characteristics, such as age distribution, percentage of older adults with comorbidities, and risk factors (eg, smoking, exposure to air pollution). Some critical variables, including the reproductive number (the average number of new infections associated with 1 infected person) and social distancing effects, can also change over time. However, many reports of models do not clearly state the key assumptions that have been included or the sensitivity of the results to errors in these assumptions.
Predictive models for large countries, such as the US, are even more problematic because they aggregate heterogeneous subepidemics in local areas. Individual characteristics, such as age and comorbidities, influence risk of serious disease from COVID-19, but population distributions of these factors vary widely in the US. For example, the population of Colorado is characterized by a lower percentage of comorbidities than many southern states. The population in Florida is older than the population in Utah. Even within a state, key variables can vary substantially, such as the prevalence of important prognostic factors (eg, cardiovascular or pulmonary disease) or environmental factors (eg, population density, outdoor air pollution). Social distancing is more difficult to achieve in urban than in suburban or rural areas. In addition, variation in the accuracy of disease incidence and prevalence estimates may occur because of differences in testing between areas. Consequently, projections from various models have resulted in a wide range of possible outcomes. For instance, an early estimate suggested that COVID-19 could account for 480 000 deaths in the US,4 whereas later models quoted by the White House Coronavirus Task Force indicated between 100 000 and 240 000 deaths, and more recent forecasts (as of April 12) suggest between 60 000 and 80 000 deaths.
A recent model from the Institute for Health Metrics and Evaluation has received considerable attention and has been widely quoted by government officials.5 On the surface, the model yields specific predictions of the day on which COVID-19 deaths will peak in each state and the cumulative number of deaths expected over the next 4 months (with substantial uncertainty intervals). However, caveats in these projections may not be widely appreciated by the public or policy makers because the model has some important but opaque limitations. For instance, the predictions assumed similar effects from social distancing as were observed elsewhere in the world (particularly in Hubei, China), which is likely optimistic. The projected fatality model was not based on any epidemiological science and depended on the previously reported, increasing number of fatalities in each region, data that are widely acknowledged to be undercounted and poorly reported,6 and did not consider the possibility of any second wave of infections. Although the Institute for Health Metrics and Evaluation is continuously updating projections as more data become available and they adapt their methods,7 long-term mortality projections already have shown substantial volatility; in New York, the model predicted a total of 10 243 COVID-19 deaths on March 27, 2020, but the projected number of deaths had increased to 16 262 by April 4, 2020, an increase of nearly 60% in a matter of days. Some original projections were quickly at the edge of earlier uncertainty bands that were apparently not sufficiently wide.
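As a quick check of the volatility described above, the relative change between the two New York projections can be computed directly. The two death counts are taken from the text; the function name is only illustrative.

```python
def percent_change(old, new):
    """Relative change from old to new, expressed as a percentage."""
    return (new - old) / old * 100.0


projected_march_27 = 10_243  # projected total deaths, New York, March 27, 2020
projected_april_4 = 16_262   # projected total deaths, New York, April 4, 2020

change = percent_change(projected_march_27, projected_april_4)
print(f"{change:.1f}%")  # 58.8%
```

The exact figure is about 59%, consistent with the rounded value of roughly 60% quoted in the text.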
Models can be useful tools but should not be overinterpreted, particularly for long-term projections or subtle characteristics, such as the exact date of a peak number of infections. First, models need to be dynamic rather than fixed so that they can allow for important and unanticipated effects, which makes them useful only in the short term if accurate predictions are needed. To paraphrase Fauci: models do not determine the timeline, the virus makes the timeline.
Second, necessary assumptions should be clearly articulated and the sensitivity to these assumptions must be discussed. Other factors that are already known or thought to be associated with the pandemic, but not included in the model, should be delineated together with their qualitative implications for model performance. Third, rather than providing fixed, precise numbers, all forecasts from these models should be transparent by reporting ranges (such as CIs or uncertainty intervals) so that the variability and uncertainty of the predictions are clear. It is crucial that such intervals account for all potential sources of uncertainty, including data reporting errors and variation and effects of model misspecification, to the extent possible. Fourth, models should incorporate measures of their accuracy as additional or better data become available. If the projection from a model differs from other published predictions, it is important to resolve such differences. Fifth, the public reporting of estimates from these models, in scientific journals and especially in the media, must be appropriately circumspect and include key caveats to avoid the misinterpretation that these forecasts represent scientific truth.
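The third recommendation, reporting a range rather than a single number, can be sketched as follows. This is an illustrative Monte Carlo exercise on a textbook SIR model, not the method of any model cited here. The parameter ranges (reproductive number, infectious period, infection fatality ratio) are hypothetical placeholders rather than estimates, and the resulting interval reflects only uncertainty in those three inputs. It omits data reporting errors and model misspecification, so a real interval should be wider.

```python
import random


def projected_deaths(r0, infectious_days, fatality_ratio,
                     population=1_000_000, days=365, dt=0.25):
    """Cumulative deaths from a basic SIR model (forward Euler integration)."""
    gamma = 1.0 / infectious_days   # recovery rate per day
    beta = r0 * gamma               # transmission rate implied by r0
    s, i = population - 10.0, 10.0  # start with 10 infected people
    for _ in range(int(days / dt)):
        new_infections = beta * s * i / population * dt
        s -= new_infections
        i += new_infections - gamma * i * dt
    return (population - s) * fatality_ratio


def percentile(sorted_values, q):
    """Nearest-rank percentile of an already sorted list, q in [0, 1]."""
    index = round(q * (len(sorted_values) - 1))
    return sorted_values[index]


rng = random.Random(2020)  # fixed seed so the sketch is reproducible

# Draw each uncertain input from a hypothetical plausible range.
draws = sorted(
    projected_deaths(
        r0=rng.uniform(1.5, 3.0),
        infectious_days=rng.uniform(4.0, 8.0),
        fatality_ratio=rng.uniform(0.003, 0.01),
    )
    for _ in range(500)
)

low, median, high = (percentile(draws, q) for q in (0.025, 0.5, 0.975))
print(f"median {median:,.0f} deaths "
      f"(95% uncertainty interval {low:,.0f} to {high:,.0f})")
```

Reporting the interval alongside the median makes it visible that modest uncertainty in a few inputs can translate into a severalfold spread in projected deaths.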
Models should also seek to use the best possible data for local predictions. It is unlikely that epidemics will follow identical paths in all regions of the world, even when important factors such as age distribution are considered. Local data should be used as soon as those data become available with reasonable accuracy. For projections of hospital needs, data on clinical outcomes among patients in local settings are likely to enable more accurate conclusions than poorly reported mortality data from across the world.
At a time when numbers of cases and deaths from COVID-19 continue to increase with alarming speed, accurate forecasts from mathematical models are increasingly important for physicians; epidemiologists; politicians; the public; and, most importantly, for individuals responsible for organizing care for the populations they serve. Given the unpredictable behavior of severe acute respiratory syndrome coronavirus 2, it is best to acknowledge that short-term projections are the most that can be expected with reasonable accuracy. Always assuming the worst-case scenario at state and national levels will lead to inefficiencies and competition for beds and supplies and may compromise effective delivery and quality of care, while assuming the best-case scenario can lead to disastrous underpreparation.
Modeling studies have contributed vital insights into the COVID-19 pandemic, and will undoubtedly continue to do so. Early models pointed to areas in which infection was likely widespread before large numbers of cases were detected; contributed to estimating the reproductive number, case fatality rate, and how long the virus had been circulating in a community; and helped to establish evidence that a significant amount of transmission occurs prior to symptom onset. Mathematical models can be profoundly helpful tools to make public health decisions and ensure optimal use of resources to reduce the morbidity and mortality associated with the COVID-19 pandemic, but only if they are rigorously evaluated and valid and their projections are robust and reliable.
Corresponding Author: Nicholas P. Jewell, PhD, Department of Medical Statistics, London School of Hygiene & Tropical Medicine, London, United Kingdom (email@example.com).
Published Online: April 16, 2020. doi:10.1001/jama.2020.6585
Conflict of Interest Disclosures: Drs Jewell, Lewnard, and Jewell reported currently having a paid contract with Kaiser Permanente to advise them regarding hospital demand associated with coronavirus disease 2019 cases. No other disclosures were reported.
Jewell NP, Lewnard JA, Jewell BL. Predictive Mathematical Models of the COVID-19 Pandemic: Underlying Principles and Value of Projections. JAMA. 2020;323(19):1893–1894. doi:10.1001/jama.2020.6585