Angulo FJ, Finelli L, Swerdlow DL. Estimation of US SARS-CoV-2 Infections, Symptomatic Infections, Hospitalizations, and Deaths Using Seroprevalence Surveys. JAMA Netw Open. 2021;4(1):e2033706. doi:10.1001/jamanetworkopen.2020.33706
Accounting for underreporting, what is the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) disease burden in the US?
In this cross-sectional study using data from public health surveillance of reported coronavirus disease 2019 cases and seroprevalence surveys, an estimated 46 910 006 SARS-CoV-2 infections, 28 122 752 symptomatic infections, 956 174 hospitalizations, and 304 915 deaths occurred in the US through November 15, 2020.
Findings of this study suggest that although more than 14% of the US population was infected with SARS-CoV-2 by mid-November, a substantial gap remains before herd immunity can be reached.
Estimates of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) disease burden are needed to help guide interventions.
To estimate the number of SARS-CoV-2 infections, symptomatic infections, hospitalizations, and deaths in the US as of November 15, 2020.
Design, Setting, and Participants
In this cross-sectional study of respondents of all ages, data from 4 regional and 1 nationwide Centers for Disease Control and Prevention (CDC) seroprevalence surveys (April [n = 16 596], May, June, and July [n = 40 817], and August [n = 38 355]) were used to estimate infection underreporting multipliers and symptomatic underreporting multipliers. Community serosurvey data from randomly selected members of the general population were also used to validate the underreporting multipliers.
Main Outcomes and Measures
SARS-CoV-2 infections, symptomatic infections, hospitalizations, and deaths. The median underreporting multipliers derived from the 5 CDC seroprevalence surveys in the 10 states that participated in 2 or more surveys were applied to surveillance data of reported coronavirus disease 2019 (COVID-19) cases for 5 respective time periods to derive estimates of SARS-CoV-2 infections and symptomatic infections, which were summed to estimate SARS-CoV-2 infections and symptomatic infections in the US. Estimates of infections and symptomatic infections were combined with estimates of the hospitalization ratio and fatality ratio to derive estimates of SARS-CoV-2 hospitalizations and deaths. External validity of the surveys was evaluated with the April CDC survey by comparing results with those of 5 serosurveys (n = 22 118) that used random sampling of the general population. Internal validity of the multipliers from the 10 specific states was assessed in the August CDC survey by comparing multipliers from the 10 states with those from all states. A sensitivity analysis was conducted using the interquartile range of the multipliers to derive a high and low estimate of SARS-CoV-2 infections and symptomatic infections. The underreporting multipliers were then used to adjust the reported COVID-19 infections to estimate the full SARS-CoV-2 disease burden.
Adjusting reported COVID-19 infections using underreporting multipliers derived from CDC seroprevalence studies in April (n = 16 596), May (n = 14 291), June (n = 14 159), July (n = 12 367), and August (n = 38 355), there were estimated medians of 46 910 006 (interquartile range [IQR], 38 192 705–60 814 748) SARS-CoV-2 infections, 28 122 752 (IQR, 23 014 957–36 438 592) symptomatic infections, 956 174 (IQR, 782 509–1 238 912) hospitalizations, and 304 915 (IQR, 248 253–395 296) deaths in the US through November 15, 2020. An estimated 14.3% (IQR, 11.6%–18.5%) of the US population was infected with SARS-CoV-2 as of mid-November 2020.
Conclusions and Relevance
The SARS-CoV-2 disease burden may be much larger than reported COVID-19 cases owing to underreporting. Even after adjusting for underreporting, a substantial gap remains between the estimated proportion of the population infected and the proportion infected required to reach herd immunity. Additional seroprevalence surveys are needed to monitor the pandemic, including after the introduction of safe and efficacious vaccines.
Estimates of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infections are needed to understand how interventions can be titrated to reopen society.1 Seroprevalence data provide an estimate of the proportion of the population who has been infected, and these data can be used for monitoring progress toward herd immunity. The Centers for Disease Control and Prevention (CDC) indicates that there have been 10 846 373 reported coronavirus disease 2019 (COVID-19) cases and 244 810 deaths in the US through November 15, 2020 with 1 037 962 reported cases within the last 7 days of that date (an average of 148 280 reported cases per day).2 The number of reported cases is an underestimate of the true number of persons with infection because many persons with symptomatic COVID-19 either do not seek medical care or are not tested and therefore are not included in tallies of COVID-19 infections reported to public health authorities.3 Furthermore, an estimated 40% of individuals with SARS-CoV-2 infection are asymptomatic and unlikely to be tested and reported.4 In this study, data from seroprevalence surveys were used to adjust for underreporting of COVID-19 infections and thereby derive estimates of the number of SARS-CoV-2 infections, symptomatic infections, hospitalizations, and deaths in the US as of November 15, 2020.
Data were used from 2 types of cross-sectional seroprevalence studies that tested blood specimens for SARS-CoV-2 antibodies: community serosurveys with blood samples collected in April 2020 from randomly selected members of the general population5-9 and 5 seroprevalence surveys conducted by CDC that tested residual diagnostic blood specimens from large commercial laboratories.10,11 The CDC seroprevalence surveys used a SARS-CoV-2–specific enzyme-linked immunosorbent assay that has a reported specificity of more than 99% and sensitivity of 96% for detection of antibodies against the prefusion-stabilized form of SARS-CoV-2 spike protein.12 This study did not involve primary data collection or patient interviews and was determined not to involve human subjects; therefore, per US Department of Health and Human Services regulation under the 45 CFR 46 Common Rule, the study was exempt from institutional review. This study followed the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) reporting guideline for cross-sectional studies.
The first 4 CDC seroprevalence surveys were conducted with specimens from persons in 10 specific states (California, Connecticut, Florida, Louisiana, Minnesota, Missouri, New York, Pennsylvania, Utah, and Washington); the fifth seroprevalence survey was conducted nationwide. For each community serosurvey and for each state participating in the CDC seroprevalence surveys, the seroprevalence estimate was multiplied by the population to estimate the number of infections, and the proportion of reported infections was calculated by dividing the reported cases by the estimated infections. The infection underreporting multiplier is the inverse of the proportion of infections that were reported. The CDC estimated symptomatic proportion (60%) was used to derive the symptomatic underreporting multiplier for each seroprevalence survey.4 The median and interquartile range (IQR) of the underreporting multipliers were calculated for the community serosurveys and each CDC seroprevalence survey.
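The multiplier calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names and all input values are hypothetical. Treating the symptomatic multiplier as the infection multiplier times the 60% symptomatic proportion is an assumption, although it is consistent with the multiplier pairs reported in this article (eg, 10.8× and 6.5×). The quartile method used by the authors is not stated; the sketch uses the default of Python's statistics.quantiles.

```python
import statistics

SYMPTOMATIC_PROPORTION = 0.60  # CDC estimate cited in the text


def infection_multiplier(seroprevalence, population, reported_cases):
    """Inverse of the proportion of estimated infections that were reported."""
    estimated_infections = seroprevalence * population
    proportion_reported = reported_cases / estimated_infections
    return 1 / proportion_reported


def symptomatic_multiplier(infection_mult,
                           symptomatic_proportion=SYMPTOMATIC_PROPORTION):
    """Assumed relation: scale the infection multiplier by the symptomatic share."""
    return infection_mult * symptomatic_proportion


def median_and_iqr(multipliers):
    """Median and (Q1, Q3) of the state-level multipliers in one survey."""
    q1, q2, q3 = statistics.quantiles(multipliers, n=4)
    return q2, (q1, q3)


# Hypothetical state: 5% seroprevalence, 1 000 000 residents, 5000 reported cases.
example_mult = infection_multiplier(0.05, 1_000_000, 5_000)
example_symptomatic = symptomatic_multiplier(example_mult)

# Hypothetical multipliers for 5 states in a single survey.
example_median, example_iqr = median_and_iqr([8.0, 9.0, 10.0, 11.0, 12.0])
```

In this hypothetical state, 1 in 10 estimated infections was reported, so the infection multiplier is 10 and the symptomatic multiplier is 6.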
Both external and internal validity of the multipliers derived from the CDC seroprevalence surveys were assessed. The median and IQR of the underreporting multipliers from the April community serosurveys were compared with the April CDC seroprevalence survey to assess external validity. Underreporting multipliers derived using data from the August CDC seroprevalence survey restricted to the 10 specific states were compared with multipliers derived using the data from the August survey from all states to assess internal validity.
The median underreporting multipliers derived from seroprevalence survey results from the 10 specific states in each of the 5 CDC seroprevalence surveys were used for 5 time periods for the pandemic in the US that aligned with dates of the 5 surveys conducted in 2020: January 21 to April 30, May 1 to May 31, June 1 to June 30, July 1 to July 31, and August 1 to November 15. The underreporting multipliers derived from the 10 specific states were multiplied by the number of COVID-19 cases reported in the US during the respective time periods to derive estimates of the number of persons with SARS-CoV-2 infection and symptomatic SARS-CoV-2 infection for each period, which were then summed to estimate the overall number of infections and symptomatic infections during the pandemic in the US.
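The period-by-period adjustment can be sketched as follows. The median infection multipliers are those reported in the Results of this article, but the reported case counts per period are HYPOTHETICAL placeholders (the actual counts are in Table 3 and are not reproduced here), and the function name is illustrative. The published totals were presumably computed from unrounded multipliers, so this sketch would not reproduce them exactly even with the real counts.

```python
# Median infection multipliers from the 5 CDC surveys (rounded values from this article).
INFECTION_MULTIPLIERS = {
    "Jan 21-Apr 30": 10.8,
    "May 1-May 31": 4.5,
    "Jun 1-Jun 30": 5.4,
    "Jul 1-Jul 31": 3.9,
    "Aug 1-Nov 15": 3.2,
}


def estimate_infections(reported_by_period, multipliers):
    """Multiply reported cases by the period multiplier, then sum across periods."""
    per_period = {
        period: reported_by_period[period] * multipliers[period]
        for period in multipliers
    }
    return per_period, sum(per_period.values())


# HYPOTHETICAL reported cases: 100 000 in every period, for illustration only.
hypothetical_reported = {period: 100_000 for period in INFECTION_MULTIPLIERS}

per_period, total_infections = estimate_infections(
    hypothetical_reported, INFECTION_MULTIPLIERS
)
```

The same function applies to symptomatic infections when the symptomatic multipliers are passed instead.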
The number of COVID-19 hospitalizations was estimated by multiplying the estimated symptomatic infections by the CDC-estimated symptomatic case hospitalization ratio of 3.4%, and the number of COVID-19 deaths was estimated by multiplying the estimated infections by the CDC-estimated infection fatality ratio of 0.65%.4 A sensitivity analysis was conducted using the IQR of the underreporting multipliers (rather than the median) from each CDC seroprevalence survey to derive high and low estimates of the underreporting multipliers and high and low estimated numbers of SARS-CoV-2 infections and symptomatic SARS-CoV-2 infections.
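As a check on this step, the short sketch below applies the two ratios to the infection estimates reported in the Results of this article. It takes the death estimate to be the estimated infections multiplied by the infection fatality ratio of 0.65%, which is the reading that reproduces the reported figure. The variable names are illustrative.

```python
# CDC planning ratios cited in the text.
SYMPTOMATIC_CASE_HOSPITALIZATION_RATIO = 0.034
INFECTION_FATALITY_RATIO = 0.0065

# Median estimates reported in the Results of this article.
estimated_infections = 46_910_006
estimated_symptomatic = 28_122_752

# Hospitalizations scale with symptomatic infections; deaths scale with all infections.
hospitalizations = round(estimated_symptomatic * SYMPTOMATIC_CASE_HOSPITALIZATION_RATIO)
deaths = round(estimated_infections * INFECTION_FATALITY_RATIO)

print(hospitalizations, deaths)  # 956174 304915
```

Both values match the hospitalization and death estimates given in the Results.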
Community serosurveys were conducted in California, Florida, Georgia, Indiana, and New York with blood samples collected from 22 118 randomly selected participants from April 10 to May 3, 2020 (Table 1). The median infection underreporting multiplier for the 5 community serosurveys was 10.6× (IQR, 9.1× to 16.9×), and the median symptomatic underreporting multiplier was 6.4× (IQR, 5.5× to 10.1×).
The 5 CDC seroprevalence surveys tested residual diagnostic blood specimens collected from March 23 to May 3 (survey 1), April 20 to June 7 (survey 2), May 19 to June 27 (survey 3), July 3 to July 17 (survey 4), and July 9 to August 12 (survey 5), 2020 (Table 2). Of the 10 specific states, 7 participated in all 5 surveys, 8 in at least 4 surveys, 9 in at least 3 surveys, and 10 in at least 2 surveys. In the first 4 surveys, there was a mean of 1689 (range, 824-3264) residual blood specimens tested per state in each survey. Survey 1 included 16 596 blood specimens from the 10 specific states, survey 2 included 14 291 from 9 of the 10 specific states, survey 3 included 14 159 from 8 of the 10 specific states, and survey 4 included 12 367 from 7 of the 10 specific states. Survey 5 of the CDC seroprevalence surveys included 38 355 blood specimens collected in all states except Hawaii, South Dakota, and Wyoming from July 9 to August 12, 2020; 40 states, including 9 of the 10 specific states, collected specimens from July 28 to August 12, 2020. There was a median of 850 specimens (range, 107-1005 specimens) collected per state in the 47 states.
Three of the 5 community serosurveys were conducted in states (California, Florida, and New York) that were among the 10 specific states. The median infection multiplier from the community serosurveys in April was 10.6× (IQR, 9.1× to 16.9×), and the median symptomatic multiplier from the community serosurveys for the same time was 6.4× (IQR, 5.5× to 10.1×), similar to estimates of the infection multiplier (10.8× [IQR, 9.4× to 11.7×]) and the symptomatic multiplier (6.5× [IQR, 5.6× to 7.0×]) from the first CDC seroprevalence survey conducted predominantly in April.
When looking at the multipliers across the other CDC seroprevalence surveys, the median infection multiplier in survey 2 conducted predominantly in May was 4.5× (IQR, 3.9× to 9.9×) and the symptomatic multiplier was 2.7× (IQR, 2.4× to 5.9×). In survey 3 conducted predominantly in June, the median infection multiplier was 5.4× (IQR, 3.7× to 6.5×) and the symptomatic multiplier was 3.2× (IQR, 2.2× to 3.9×). In survey 4 conducted in July, the median infection multiplier was 3.9× (IQR, 3.4× to 5.5×) and the symptomatic multiplier was 2.4× (IQR, 2.1× to 3.3×). For the 10 specific states in survey 5 conducted predominantly in August, the median infection multiplier was 3.2× (IQR, 2.5× to 4.0×) and the symptomatic multiplier was 1.9× (IQR, 1.5× to 2.4×). The infection and symptomatic multipliers for all 47 states that participated in survey 5 were 4.0× and 2.4×, respectively, similar to the median multipliers for the 10 specific states in survey 5. Across the 5 seroprevalence surveys, there was a decrease in the underreporting multipliers and an increase in uniformity of the underreporting multipliers between the states later in the pandemic.
We used the underreporting multipliers from the 5 CDC seroprevalence surveys to adjust public health surveillance data of reported COVID-19 cases for 5 time periods (Table 3). There were an estimated 46 910 006 SARS-CoV-2 infections, 28 122 752 symptomatic infections, 956 174 hospitalizations, and 304 915 deaths in the US through November 15, 2020 (Figure); within the last 7 days of that date, there were an estimated 3 321 478 infections. In the sensitivity analysis using the IQR of the multipliers, the ranges of estimates were 38 192 705 to 60 814 748 SARS-CoV-2 infections, 23 014 957 to 36 438 591 symptomatic infections, 782 509 to 1 238 912 hospitalizations, and 248 253 to 395 296 deaths. These data indicate that 14.3% (range, 11.6%-18.5%) of the US population (ie, 328 239 523) was infected with SARS-CoV-2 and 8.6% (range, 7.0%-11.1%) had a symptomatic infection, with an infection hospitalization ratio of 2.0% (range, 1.6%-2.5%) and symptomatic fatality ratio of 1.1% (range, 0.8%-1.3%) through November 15, 2020.
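The derived percentages in this paragraph follow directly from the counts given in the text, as the sketch below shows. All inputs are figures stated in this article; the helper name pct is illustrative. Applying the survey 5 infection multiplier (3.2×) to the 1 037 962 cases reported in the final 7 days is an assumption about how the 7-day estimate was obtained, although it reproduces the reported value.

```python
US_POPULATION = 328_239_523

# Median estimates and sensitivity range reported in the text.
infections = 46_910_006
infections_low, infections_high = 38_192_705, 60_814_748
symptomatic = 28_122_752
hospitalizations = 956_174
deaths = 304_915


def pct(numerator, denominator):
    """Percentage rounded to 1 decimal place."""
    return round(100 * numerator / denominator, 1)


pct_infected = pct(infections, US_POPULATION)
pct_infected_range = (pct(infections_low, US_POPULATION),
                      pct(infections_high, US_POPULATION))
pct_symptomatic = pct(symptomatic, US_POPULATION)
infection_hospitalization_ratio = pct(hospitalizations, infections)
symptomatic_fatality_ratio = pct(deaths, symptomatic)

# Assumed derivation of the 7-day estimate: reported cases x survey 5 multiplier.
reported_last_7_days = 1_037_962
estimated_last_7_days = round(reported_last_7_days * 3.2)
```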
By mid-November of 2020, a substantial proportion of the US population was infected with SARS-CoV-2. Given that the critical proportion for herd immunity for SARS-CoV-2 (ie, the proportion of the population that needs to have SARS-CoV-2 antibodies to disrupt ongoing transmission) is approximately 60% based on an estimated SARS-CoV-2 reproduction number of 2.5,4 the US population remains a long way from herd immunity even with millions of new infections each week. The number of estimated COVID-19 deaths is also substantially higher than the number of reported deaths in the US through November 15, 2020, supporting the conclusion that approximately 35% of COVID-19 deaths are not reported.13
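The approximately 60% threshold is consistent with the standard expression for the critical immune proportion under a simple homogeneous-mixing model. The article does not state the formula, so the following is offered as a likely derivation rather than the authors' stated method, with p_c denoting the critical proportion and R_0 the reproduction number:

```latex
p_c = 1 - \frac{1}{R_0} = 1 - \frac{1}{2.5} = 0.60
\qquad\text{and}\qquad
p_c - 0.143 \approx 0.457
```

Under this assumption, the estimated 14.3% infected leaves a gap of roughly 46 percentage points before the threshold would be reached.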
Reported COVID-19 cases do not represent the full SARS-CoV-2 disease burden.14 Case reports are dependent on patients seeking health care, availability and type of care (eg, telemedicine), and testing availability. Using data from seroprevalence surveys and surveillance is a common strategy for estimating underreporting and disease burden. This approach was used successfully throughout the 2009 novel influenza A pandemic.15 For example, in a simple model combining several sources of surveillance data, the CDC estimated 55 million symptomatic infections of 2009 pandemic influenza A (pH1N1) had occurred in the US by December 2009.16 This finding was consistent with an estimate from a seroprevalence survey for the same period estimating 59 million infections (including asymptomatic infections).17
There are several methodologic issues, including time between infection, antibody development, antibody waning, and reporting of laboratory-confirmed infections, that must be considered when using seroprevalence surveys to derive underreporting multipliers to adjust surveillance data to estimate disease burden. In the New York statewide serosurvey, Rosenberg et al9 used the number of reported cases 1 week before the start of the survey to estimate the underreporting multiplier. In the Indiana statewide serosurvey, Menachemi et al8 included polymerase chain reaction test results to account for recent infections. Neither of these approaches (using an earlier reported number of cases to account for the antibody development lag or using current reported number with seroprevalence and polymerase chain reaction results to account for the antibody development lag and recent infections) accounts for the inherent delays in reporting cases. A sensitivity analysis of the CDC seroprevalence surveys suggested that using the number of reported cases at the end of the survey period provides a useful estimate of the underreporting multipliers, particularly early in the pandemic.10 Therefore, the number of reported cases on the last day of the seroprevalence survey was used in the COVID-19 disease burden estimation in this study.
Multipliers will change over time as the proportion of persons with infection tested, diagnosed, and reported changes, so a conservative approach was used in this disease burden estimation. In the 10 specific states that participated in the early and late CDC seroprevalence surveys, the median infection underreporting multiplier declined from 10.8× to 3.2×, and the symptomatic underreporting multiplier declined from 6.5× to 1.9×. Therefore, 5 different sets of multipliers based on the 5 CDC seroprevalence surveys were used in this estimation. Using the early (and larger) underreporting multipliers later in the pandemic would have resulted in an overestimation of the number of infected and symptomatic people and increased the estimated number of hospitalizations and deaths.
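To illustrate the size of the overestimation, the following rough calculation applies the survey 1 infection multiplier (10.8×) to all 10 846 373 cases reported through November 15, 2020. It is a back-of-the-envelope comparison based on the rounded multiplier in this article, not an analysis performed by the authors:

```latex
10\,846\,373 \times 10.8 \approx 117.1 \times 10^{6}
\quad\text{vs}\quad
46.9 \times 10^{6}\ \text{(period-specific multipliers)}
```

Under that assumption, the estimated number of infections would be roughly 2.5 times the estimate obtained with the 5 period-specific multipliers.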
This study has limitations. All seroprevalence surveys should be evaluated for selection bias.18 Conducting serosurveys with a random sampling design of the general population may be difficult to do in a pandemic, but a random sampling design of the general population yields a seroprevalence estimate least likely to be affected by selection bias. The CDC seroprevalence surveys used residual diagnostic blood specimens from large commercial diagnostic laboratories. The CDC seroprevalence surveys therefore rely on convenience sampling, which can contribute to selection bias; for example, more severely ill people (those hospitalized or visiting health care professionals) may be more likely to be tested. The external validity of the CDC seroprevalence surveys was evaluated by comparing the multipliers derived from the first seroprevalence survey with available contemporaneous community serosurveys that used random sampling designs of the general population that minimize selection bias. Of note, there were 3 states in common between the 5 community serosurveys and the 10 states that participated in the first CDC seroprevalence survey. The similarity of the median and IQR of the multipliers derived from the CDC seroprevalence survey to those derived from the community serosurveys (ie, symptomatic underreporting multiplier 6.4× [IQR, 5.5× to 10.1×] compared with 6.5× [IQR, 5.6× to 7.0×]) supports the conclusion of limited selection bias in the CDC seroprevalence surveys.
An additional limitation of the approach to estimate the COVID-19 disease burden in the US using underreporting multipliers derived from the 10 specific states is that seroprevalence results from these states may not be nationally representative. The internal validity of using the 10 specific states to represent national data was therefore evaluated using the fifth CDC seroprevalence survey. The national infection and symptomatic underreporting multipliers derived from the 47 states that participated in the fifth survey (4× and 2.4×, respectively) were similar to, although higher than, the median multipliers derived from the 10 specific states in the fifth survey (3.2× and 1.9×, respectively), suggesting that the multipliers derived from the 10 specific states are nationally representative and conservative (resulting in a lower disease burden).
Another limitation of this analysis is that the derivation of the multipliers used summary data from the seroprevalence surveys; the data sets of the seroprevalence surveys were not available. For each seroprevalence survey, medians of the multipliers of the states that participated in the survey were used to calculate the number of infections. Standard errors of the medians were too large to warrant derivation of CIs around the median. To explore the impact of the variation of the multipliers in each survey on the estimates of infections, a sensitivity analysis was conducted using the multipliers’ IQR (rather than median multiplier). A key finding of the sensitivity analysis is that the multipliers’ IQR became increasingly narrow as the pandemic progressed. This finding, coupled with the progressively diminishing multipliers with each passing month, probably reflects more widespread access to SARS-CoV-2 testing (and therefore smaller underreporting multipliers with less geographic variation) in later months. The narrowing of the multipliers’ IQR later in the pandemic is particularly important during the final time period of this study (eg, the IQR for the symptomatic underreporting multiplier is 1.5× to 2.4× after August 1) because 58% of the almost 11 million reported COVID-19 infections in the US by mid-November occurred after August 1. The small variability in the underreporting multipliers derived from the 10 specific states in the August seroprevalence survey used to adjust 58% of the reported infections provides increased confidence in the overall disease burden estimates.
Finally, using seroprevalence surveys to derive estimates of underreporting of COVID-19 infections assumes that all persons infected with SARS-CoV-2 will have detectable antibodies at the time of the seroprevalence survey. The CDC seroprevalence surveys used a laboratory test with a high accuracy for detection of SARS-CoV-2 antibodies,12 reducing the likelihood of failing to identify antibodies in persons infected. Furthermore, the seroprevalence results were adjusted for laboratory test characteristics.10 However, the CDC seroprevalence surveys may have underestimated the number of infected persons due to waning antibodies. Although the longevity of antibodies in persons infected with SARS-CoV-2 is not fully understood, waning antibodies have been reported in some persons infected,19,20 indicating that seroprevalence surveys may not detect some persons previously infected, particularly later in the pandemic, which would result in underestimation of the COVID-19 disease burden in this report.
In this cross-sectional study, estimates of underreporting multipliers were derived and combined with surveillance data to adjust reported surveillance data for underreporting. Results suggest that although more than 14% of the US population may have been infected with SARS-CoV-2 as of mid-November 2020, there remains a substantial gap between the estimated proportion of the population infected and the proportion infected that is required for herd immunity. Additional seroprevalence surveys are warranted to monitor the pandemic, including after the development of safe and efficacious vaccines.
Accepted for Publication: November 24, 2020.
Published: January 5, 2021. doi:10.1001/jamanetworkopen.2020.33706
Open Access: This is an open access article distributed under the terms of the CC-BY-NC-ND License. © 2021 Angulo FJ et al. JAMA Network Open.
Corresponding Author: Frederick J. Angulo, DVM, PhD, Medical Development and Scientific/Clinical Affairs, Pfizer Vaccines, 4024 NE Alameda St, Portland, OR 97212 (firstname.lastname@example.org).
Author Contributions: Dr Angulo had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Concept and design: All authors.
Acquisition, analysis, or interpretation of data: All authors.
Drafting of the manuscript: All authors.
Critical revision of the manuscript for important intellectual content: All authors.
Statistical analysis: All authors.
Obtained funding: Angulo.
Administrative, technical, or material support: Angulo, Finelli.
Supervision: All authors.
Conflict of Interest Disclosures: Dr Angulo reported being employed by Pfizer Vaccines and owning stock and stock options in Pfizer. Dr Finelli reported being employed by Merck & Co Inc and may own stock in the company. Dr Swerdlow reported being employed by Pfizer Vaccines and owning stock and stock options in Pfizer, as well as providing overviews of severe acute respiratory syndrome (SARS) and Middle East respiratory syndrome epidemiology to a consulting firm for a minimal honorarium.
Funding/Support: This work was supported by Pfizer and Merck.
Role of the Funder/Sponsor: Pfizer and Merck had a role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, and approval of the manuscript; and approved the decision to submit the manuscript for publication.