April 16, 2020

Predictive Mathematical Models of the COVID-19 Pandemic: Underlying Principles and Value of Projections

Author Affiliations
  • 1Department of Medical Statistics, London School of Hygiene & Tropical Medicine, London, United Kingdom
  • 2Division of Epidemiology & Biostatistics, School of Public Health, University of California, Berkeley
  • 3MRC Centre for Global Infectious Disease Analysis, Abdul Latif Jameel Institute for Disease and Emergency Analytics, and Department of Infectious Disease Epidemiology, Imperial College, London, United Kingdom
JAMA. 2020;323(19):1893-1894. doi:10.1001/jama.2020.6585

Numerous mathematical models are being produced to forecast the future of coronavirus disease 2019 (COVID-19) epidemics in the US and worldwide. These predictions have far-reaching consequences regarding how quickly and how strongly governments move to curb an epidemic. However, the primary and most effective use of epidemiological models is to estimate the relative effect of various interventions in reducing disease burden rather than to produce precise quantitative predictions about extent or duration of disease burdens. For predictions, “models are not crystal balls,” as Ferguson noted in a recent overview of the role of modeling.1

Nevertheless, consumers of epidemiological models, including politicians, the public, and the media, often focus on the quantitative predictions of infections and mortality estimates. Such measures of potential disease burden are necessary for planners who consider future outcomes in light of health care capacity. How then should such estimates be assessed?

Although relative effects on infections associated with various interventions are likely more reliable, accompanying estimates from models about COVID-19 can contribute to uncertainty and anxiety. For instance, will the US have tens of thousands or possibly even hundreds of thousands of deaths? The main focus should be on the kinds of interventions that could help reduce these numbers because the interventions undertaken will, of course, determine the eventual numerical reality. Model projections are needed to forecast future health care demand, including how many intensive care unit beds will be needed, where and when shortages of ventilators will most likely occur, and the number of health care workers required to respond effectively. Short-term projections can be crucial to assist planning, but it is usually unnecessary to focus on long-term “guesses” for such purposes. In addition, forecasts from computational models are being used to establish local, state, and national policy. When is the peak of cases expected? If social distancing is effective and the number of new cases that require hospitalization is stable or declining, when is it time to consider a return to work or school? Can large gatherings once again be safe? For these purposes, models likely only give insight into the scale of what is ahead and cannot predict the exact trajectory of the epidemic weeks or months in advance. According to Whitty, models should not be presented as scientific truth; they are most helpful when they present more than what is predictable by common sense.2

Estimates that emerge from modeling studies are only as good as the validity of the epidemiological or statistical model used; the extent and accuracy of the assumptions made; and, perhaps most importantly, the quality of the data to which models are calibrated. Early in an epidemic, the quality of data on infections, deaths, tests, and other factors is often limited by underdetection or inconsistent detection of cases, reporting delays, and poor documentation, all of which affect the quality of any model output. Simpler models may provide less valid forecasts because they cannot capture complex and unobserved human mixing patterns and other time-varying characteristics of infectious disease spread. On the other hand, as Kucharski noted, “complex models may be no more reliable than simple ones if they miss key aspects of the biology. Complex models can create the illusion of realism, and make it harder to spot crucial omissions.”3 A greater level of detail in a model may provide a more adequate description of an epidemic, but outputs are sensitive to changes in parametric assumptions and are particularly dependent on external preliminary estimates of disease and transmission characteristics, such as the length of the incubation and infectious periods.
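The sensitivity to parametric assumptions described above can be illustrated with a minimal sketch. The code below is a textbook discrete-time SIR (susceptible, infectious, recovered) model, not any of the models cited in this article; the function names, the population size, the reproductive number of 2.5, and the candidate infectious periods are all hypothetical values chosen only for illustration.

```python
def simulate_sir(r0, infectious_days, population=1_000_000,
                 initial_infected=10, days=365):
    """Daily-step SIR model; returns the number of infectious people each day.

    All default values are illustrative placeholders, not estimates.
    """
    gamma = 1.0 / infectious_days   # daily recovery rate
    beta = r0 * gamma               # daily transmission rate implied by r0
    s = float(population - initial_infected)
    i = float(initial_infected)
    infectious = []
    for _ in range(days):
        new_infections = beta * s * i / population
        new_recoveries = gamma * i
        s -= new_infections
        i += new_infections - new_recoveries
        infectious.append(i)
    return infectious


def peak_day_and_size(curve):
    """Return (day index of the peak, number infectious at the peak)."""
    peak = max(curve)
    return curve.index(peak), peak


# Hold the reproductive number fixed and vary only the assumed infectious period.
for infectious_days in (4, 6, 8):
    day, size = peak_day_and_size(simulate_sir(2.5, infectious_days))
    print(f"infectious period {infectious_days} d: "
          f"peak on day {day}, about {size:,.0f} infectious")
```

Under these assumptions, changing only the assumed infectious period shifts the projected peak by weeks, which is one way to see why preliminary estimates of such characteristics matter so much for the exact timing a model reports.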

In predicting the future of the COVID-19 pandemic, many key assumptions have been based on limited data. Models may capture aspects of epidemics effectively while neglecting to account for other factors, such as the accuracy of diagnostic tests; whether immunity will wane quickly; whether reinfection could occur; or population characteristics, such as age distribution, percentage of older adults with comorbidities, and risk factors (eg, smoking, exposure to air pollution). Some critical variables, including the reproductive number (the average number of new infections associated with 1 infected person) and social distancing effects, can also change over time. However, many published reports of models do not clearly state the key assumptions that have been included or the sensitivity of the results to errors in these assumptions.

Predictive models for large countries, such as the US, are even more problematic because they aggregate heterogeneous subepidemics in local areas. Individual characteristics, such as age and comorbidities, influence risk of serious disease from COVID-19, but population distributions of these factors vary widely in the US. For example, the population of Colorado is characterized by a lower percentage of comorbidities than many southern states. The population in Florida is older than the population in Utah. Even within a state, key variables can vary substantially, such as the prevalence of important prognostic factors (eg, cardiovascular or pulmonary disease) or environmental factors (eg, population density, outdoor air pollution). Social distancing is more difficult to achieve in urban than in suburban or rural areas. In addition, variation in the accuracy of disease incidence and prevalence estimates may occur because of differences in testing between areas. Consequently, projections from various models have resulted in a wide range of possible outcomes. For instance, an early estimate suggested that COVID-19 could account for 480 000 deaths in the US,4 whereas later models quoted by the White House Coronavirus Task Force indicated between 100 000 and 240 000 deaths, and more recent forecasts (as of April 12) suggest between 60 000 and 80 000 deaths.

A recent model from the Institute for Health Metrics and Evaluation has received considerable attention and has been widely quoted by government officials.5 On the surface, the model yields specific predictions of the day on which COVID-19 deaths will peak in each state and the cumulative number of deaths expected over the next 4 months (with substantial uncertainty intervals). However, caveats in these projections may not be widely appreciated by the public or policy makers because the model has some important but opaque limitations. For instance, the predictions assumed similar effects from social distancing as were observed elsewhere in the world (particularly in Hubei, China), which is likely optimistic. The projected fatality model was not based on any epidemiological science and depended on current data on the reported prior increasing number of fatalities in each region (data that are widely acknowledged to be undercounted and poorly reported6) and did not consider the possibility of any second wave of infections. Although the Institute for Health Metrics and Evaluation is continuously updating projections as more data become available and they adapt their methods,7 long-term mortality projections already have shown substantial volatility; in New York, the model predicted a total of 10 243 COVID-19 deaths on March 27, 2020, but the projected number of deaths had increased to 16 262 by April 4, 2020, an increase of roughly 60% in a matter of days. Some original projections were quickly at the edge of earlier uncertainty bands that were apparently not sufficiently wide.
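The size of the New York revision can be checked directly from the two projections quoted above. The short calculation below uses only those two figures; the variable names are illustrative.

```python
# Projected total COVID-19 deaths in New York, as quoted in the text.
projection_march_27 = 10_243
projection_april_4 = 16_262

absolute_change = projection_april_4 - projection_march_27
percent_change = absolute_change / projection_march_27 * 100

print(f"absolute change: {absolute_change:,} deaths")
print(f"relative change: {percent_change:.1f}%")  # about 58.8, which rounds to roughly 60
```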

Models can be useful tools but should not be overinterpreted, particularly for long-term projections or subtle characteristics, such as the exact date of a peak number of infections. First, models need to be dynamic and not fixed to allow for important and unanticipated effects, which makes them only useful in the short term if accurate predictions are needed. To paraphrase Fauci: models do not determine the timeline, the virus makes the timeline.

Second, necessary assumptions should be clearly articulated and the sensitivity to these assumptions must be discussed. Other factors that are already known or thought to be associated with the pandemic, but not included in the model, should be delineated together with their qualitative implications for model performance. Third, rather than providing fixed, precise numbers, all forecasts from these models should be transparent by reporting ranges (such as CIs or uncertainty intervals) so that the variability and uncertainty of the predictions are clear. It is crucial that such intervals account for all potential sources of uncertainty, including data reporting errors and variation and effects of model misspecification, to the extent possible. Fourth, models should incorporate measures of their accuracy as additional or better data become available. If the projection from a model differs from other published predictions, it is important to resolve such differences. Fifth, the public reporting of estimates from these models, in scientific journals and especially in the media, must be appropriately circumspect and include key caveats to avoid the misinterpretation that these forecasts represent scientific truth.
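One simple way to report a range rather than a single number is to propagate uncertainty in an input through the projection and summarize the resulting distribution. The sketch below is an assumption-laden illustration, not the method of any model discussed here: it uses a deliberately naive exponential growth projection, a hypothetical current count of 1000, and a hypothetical daily growth rate drawn from a normal distribution with mean 0.05 and SD 0.02. It captures only parameter uncertainty and ignores reporting errors and model misspecification.

```python
import random
import statistics


def projection_interval(current_count, growth_mean, growth_sd,
                        horizon_days, draws=10_000, seed=0):
    """Return (2.5th percentile, median, 97.5th percentile) of a projection.

    The projection is current_count * (1 + g) ** horizon_days, where the daily
    growth rate g is uncertain. All inputs are hypothetical.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    outcomes = []
    for _ in range(draws):
        g = max(rng.gauss(growth_mean, growth_sd), 0.0)  # no negative growth
        outcomes.append(current_count * (1.0 + g) ** horizon_days)
    cut_points = statistics.quantiles(outcomes, n=40)  # steps of 2.5%
    return cut_points[0], statistics.median(outcomes), cut_points[-1]


for horizon in (7, 28):
    low, mid, high = projection_interval(1000, 0.05, 0.02, horizon)
    print(f"{horizon:>2} days ahead: median {mid:,.0f} "
          f"(95% interval {low:,.0f} to {high:,.0f})")
```

Even in this toy setting the interval widens sharply as the horizon lengthens, which is consistent with the point that only short-term projections can be expected to be reasonably accurate.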

Models should also seek to use the best possible data for local predictions. It is unlikely that epidemics will follow identical paths in all regions of the world, even when important factors such as age distribution are considered. Local data should be used as soon as those data become available with reasonable accuracy. For projections of hospital needs, data on clinical outcomes among patients in local settings are likely to enable more accurate conclusions than poorly reported mortality data from across the world.

At a time when numbers of cases and deaths from COVID-19 continue to increase with alarming speed, accurate forecasts from mathematical models are increasingly important for physicians; epidemiologists; politicians; the public; and, most importantly, for individuals responsible for organizing care for the populations they serve. Given the unpredictable behavior of severe acute respiratory syndrome coronavirus 2, it is best to acknowledge that short-term projections are the most that can be expected with reasonable accuracy. Always assuming the worst-case scenario at state and national levels will lead to inefficiencies and competition for beds and supplies and may compromise effective delivery and quality of care, while assuming the best-case scenario can lead to disastrous underpreparation.

Modeling studies have contributed vital insights into the COVID-19 pandemic, and will undoubtedly continue to do so. Early models pointed to areas in which infection was likely widespread before large numbers of cases were detected; contributed to estimating the reproductive number, case fatality rate, and how long the virus had been circulating in a community; and helped to establish evidence that a significant amount of transmission occurs prior to symptom onset. Mathematical models can be profoundly helpful tools to make public health decisions and ensure optimal use of resources to reduce the morbidity and mortality associated with the COVID-19 pandemic, but only if they are rigorously evaluated and valid and their projections are robust and reliable.

Article Information

Corresponding Author: Nicholas P. Jewell, PhD, Department of Medical Statistics, London School of Hygiene & Tropical Medicine, London, United Kingdom (nicholas.jewell@lshtm.ac.uk).

Published Online: April 16, 2020. doi:10.1001/jama.2020.6585

Conflict of Interest Disclosures: Drs Jewell, Lewnard, and Jewell reported currently having a paid contract with Kaiser Permanente to advise them regarding hospital demand associated with coronavirus disease 2019 cases. No other disclosures were reported.

References
1. Adam D. Special report: the simulations driving the world’s response to COVID-19. Nature. 2020;580(7803):316-318.
2. Whitty CJ. What makes an academic paper useful for health policy? BMC Med. 2015;13:301.
3. @AdamJKucharski. Indeed, as this (aptly titled) piece suggests, complex models may be no more reliable than simple ones if they miss key aspects of the biology. Complex models can create the illusion of realism, and make it harder to spot crucial omissions https://www.pnas.org/content/103/33/12221. April 1, 2020. Accessed April 13, 2020. https://twitter.com/AdamJKucharski/status/1245336665691807744
4. Lawler J. What healthcare providers need to know: preparing for the COVID-19. American Hospital Association webinar. February 26, 2020. Accessed April 13, 2020.
5. Murray CJL; IHME COVID-19 health service utilization forecasting team. Forecasting COVID-19 impact on hospital bed-days, ICU-days, ventilator-days and deaths by US state in the next 4 months. MedRxiv. Preprint posted March 30, 2020. doi:10.1101/2020.03.27.20043752
6. Foresti CCL. The real death toll for COVID-19 is at least 4 times the official numbers. Corriere della Sera. March 26, 2020. Accessed March 31, 2020. https://www.corriere.it/politica/20_marzo_26/the-real-death-toll-for-covid-19-is-at-least-4-times-the-official-numbers-b5af0edc-6eeb-11ea-925b-a0c3cdbe1130.shtml?refresh_ce-cp
7. COVID-19 resources. Institute for Health Metrics and Evaluation website. Updated April 10, 2020. Accessed April 13, 2020. http://www.healthdata.org/covid
2 Comments for this article
Prof Gunachandran, M. Sc., | Prof. of Forensic Science and Forensic Practitioner, Tamil Nadu India
The authors have rightly approached the issues related to the reliability of predictions based on statistical approaches to disease epidemics. For example, the biological affinity of the virus for human cells, the in vitro immunological behaviour of the virus, and the degree of immune resistance in individuals are the most likely intrinsic factors influencing the spread of the disease. Age, morbidity, proximity, influx of international passengers, community culture of distancing or isolation, and environment are some other factors affecting the trajectory of the virus spread. In the absence of reliable scientific or medical data on this virus, a model will bear only a weak resemblance to the reality of the epidemic. Normally such predictions may serve as an alert to policy makers and health administrators, but in the case of COVID-19, they have triggered more panic and anxiety among citizens worldwide, as evidenced by the panic buying of household goods and medicines. This panic and anxiety may promote rumours and depression and thus may affect people's confidence, physiology, and immune system. Statistical interpretation is unavoidable in understanding research or trial findings, but modeling studies in the context of a viral epidemic cannot be viewed as being as reliable as election, stock market, or business demand predictions.
Ignores Computer Models and Assurances They are Reliable
Harold Thimbleby, PhD | Swansea University, Wales
This article perpetuates a worrying blind spot. Despite their central role, computers — and the crucial point, whether they are reliable — are ignored.

For instance, the authors say, “Estimates that emerge from modeling studies are only as good as the validity of the epidemiological or statistical model used; the extent and accuracy of the assumptions made; and, perhaps most importantly, the quality of the data to which models are calibrated.”

Most importantly? In fact, the data is the easy bit.

What does the computer program do with the data? The program is the key part of the process that produces actual predictions. It’s strange, therefore, for programs not to have a central place in any discussion of models and their uses.

Programs contain assumptions and coded-in data. Programs are often not easy to follow; they are not as transparent as data.

Almost all programs have bugs, and how does the program — and the programmer — manage the errors they will introduce into model results?

Given that it must be very tempting to “debug” programs by tinkering with them until they just produce the results one expects, a very serious question all epidemiologists must face is: what rigorous methods did you use to develop the programs to ensure they have integrity? Otherwise, you may have “debugged” your programs to get the results you wanted rather than the true results.

Very few epidemiological models have been made available for scrutiny, and those that have that I have seen (downloaded from refereed papers) are really quite amateurish. For instance, all of them have undocumented "magic numbers" in the code that should have been reviewed as data. The unfortunate conclusion is that current epidemiological model results, and advice based on them, aren’t really trustworthy, regardless of the data they use.
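As a hedged illustration of the point about "magic numbers," the sketch below shows one possible way to lift assumed values out of the body of a program and into a single named, documented, and validated structure that can be reviewed as data. The class name, field names, and numerical values are hypothetical placeholders, not taken from any published model.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class ModelParameters:
    """Every assumed value in one reviewable place.

    The defaults are placeholders for illustration; in real use each field
    would cite its source and the date it was last reviewed.
    """
    basic_reproduction_number: float = 2.5  # placeholder, not an estimate
    infectious_period_days: float = 6.0     # placeholder, not an estimate
    incubation_period_days: float = 5.0     # placeholder, not an estimate

    def __post_init__(self):
        # Reject impossible values at construction instead of deep in the model.
        for name, value in asdict(self).items():
            if value <= 0:
                raise ValueError(f"{name} must be positive, got {value}")


def daily_transmission_rate(params: ModelParameters) -> float:
    """Derived quantity computed from named parameters, not a literal in the code."""
    return params.basic_reproduction_number / params.infectious_period_days


params = ModelParameters()
print(asdict(params))                   # the full set of assumptions, printable for review
print(daily_transmission_rate(params))
```

Printing or archiving the parameter set alongside each model run is one way to make the assumptions as open to scrutiny as the input data.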

Making programs available for scrutiny is essential for science, but it is not sufficient just to make code available. Just having the code is like being asked to fly an airplane without instructions or manuals: it won’t work! Indeed, most documentation is no more helpful than the few signs found on aircraft (EXIT, SEAT BELTS) are to helping fly a plane! In addition to the code, then, there must be adequate documentation. There needs to be an explanation why the code works, and how to use it. To put it bluntly, if even the authors can't explain their own code, why should anyone believe that it works as they claim?

Given the very serious nature of modelling the COVID pandemic and informing public policy appropriately, JAMA might like to take an initiative, like requiring open source development and full repository information in published papers, including adequate documentation. This will help epidemiologists develop more rigorous models, and help us work together to overcome COVID sooner.

- Harold Thimbleby, harold@thimbleby.net