Association of Social and Demographic Factors With COVID-19 Incidence and Death Rates in the US

Key Points
Question: Are population-level social factors associated with coronavirus disease 2019 (COVID-19) incidence and mortality?
Findings: In this cross-sectional study including 4 289 283 COVID-19 cases and 147 074 COVID-19 deaths, county-level sociodemographic risk factors as assessed by the Social Vulnerability Index were associated with greater COVID-19 incidence and mortality.
Meaning: These findings suggest that to address inequities in the burden of the COVID-19 pandemic, these sociodemographic risk factors and their root causes must be addressed.


Introduction
While some have referred to coronavirus disease 2019 (COVID-19) as "the great equalizer," early reports from hard-hit areas in the US suggest that the disease has a disproportionate burden associated with the longstanding social determinants of health, including racial/ethnic and socioeconomic disparities. [1][2][3] In Michigan, one of the first states to report COVID-19 data by race/ethnicity and demographic characteristics, African American individuals initially accounted for 31% of the state's 57 397 cases despite representing only 14% of the state's population. 4 Mortality rates are also higher among African American, Hispanic, and Native American individuals with COVID-19. 3,[5][6][7] In New York, New York, the early epicenter of the US COVID-19 outbreak, rates of hospitalizations and deaths were highest in the Bronx, the borough with the highest proportion of members of racial/ethnic minority groups and households living in poverty. 8 Emerging data on COVID-19 disparities are concerning. What underlying factors can explain this inequitable burden of the pandemic on low-income and minority communities? Myriad factors have been posited, from biological factors, to medical risk factors such as diabetes and lung disease, to social risk factors such as low socioeconomic status, crowded housing, and necessary use of public transportation. 3,9 As states and localities vary in whether and how frequently they report COVID-19 data by race/ethnicity and other sociodemographic characteristics, [1][2][3]5,8,9 our knowledge of risk factors at the individual level is mostly limited to anecdotal reports and ecological studies.
Prior to the outbreak of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), individuals from racial/ethnic minority groups in the US were already more likely to live in areas with significant social disadvantage, characterized by high unemployment and poverty rates, unaffordable housing, and poor health care infrastructure, which may explain the higher disease burden in these communities. Data on these types of social risks or vulnerabilities are readily available and may be examined at the level of small geographic areas, such as counties or census tracts.
Researchers have begun to explore the association between sociodemographic risk factors at the individual and community levels and COVID-19 incidence. 10,11 Our study objective was to extend this work by assessing which sociodemographic risk factors were associated with COVID-19 outcomes over time.

Methods
In this cross-sectional study, we investigated the association between county-level social risk factors and COVID-19 cases and deaths, as well as weekly changes in cumulative incidence and mortality, using publicly available data sets as of July 29, 2020. As a start date, we used January 20, 2020, the date of the first documented case of COVID-19 in the US. The University of Michigan institutional review board deemed the study not regulated and exempt from informed consent, as it involved only analysis of publicly available deidentified data. We followed the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) reporting guideline. eTable 1 in the Supplement details each data element and source.

Social Vulnerability Index
The Social Vulnerability Index (SVI) was developed by the Centers for Disease Control and Prevention to provide a composite measure of community susceptibility to adversities in the face of health shocks, including disease outbreaks. 13 The SVI comprises 4 subindices created using American Community Survey data from 2014 to 2018 on socioeconomic status (including poverty rate, unemployment rate, per capita income, and educational attainment); household composition and disability (including percentages of persons aged ≥65 years or ≤17 years, civilian noninstitutionalized population with disability, and single-parent households with children aged <18 years); racial/ethnic minority status and language (including percentages of individuals who are members of racial/ethnic minority groups [Hispanic, African American, American Indian or Alaska Native, Asian, Native Hawaiian or other Pacific Islander, or other race] and individuals with limited English-speaking ability); and housing type and transportation (including percentages of housing in structures with 10 or more units, mobile homes, occupied housing units with more people than rooms, households without vehicles, and persons in institutionalized group quarters). Each subindex is a percentile rank. The overall SVI is calculated by summing the 4 subindices and converting the summed score into a percentile rank, ranging from 0 to 1, with higher values indicating greater vulnerability to a natural disaster; we rescaled the SVI by multiplying by 10 to aid interpretation. Thus, a reported 1-unit change corresponds to a change of 0.1 on the original scale.
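The sum-then-rank construction described above can be sketched in a few lines of Python. This is a minimal illustration, not the CDC's implementation: the percentile-rank formula (rank - 1) / (n - 1), the tie handling, and the five example counties are all assumptions made for the example.

```python
def percentile_ranks(values):
    """Percentile rank in [0, 1] using (rank - 1) / (n - 1).

    Assumption for this sketch: tied values share the lowest rank.
    Requires at least 2 values.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least 2 values to rank")
    first_rank = {}
    for position, value in enumerate(sorted(values), start=1):
        first_rank.setdefault(value, position)  # keep the lowest rank for ties
    return [(first_rank[v] - 1) / (n - 1) for v in values]


def overall_svi(subindex_rows):
    """Sum the 4 subindex percentile ranks per county, then re-rank the sums."""
    sums = [sum(row) for row in subindex_rows]
    return percentile_ranks(sums)


def rescale(svi_scores, factor=10):
    """Rescale so that a 1-unit change equals 0.1 on the original 0-1 scale."""
    return [factor * s for s in svi_scores]


# Hypothetical subindex values for 5 counties (not study data):
# [socioeconomic, household/disability, minority/language, housing/transport]
counties = [
    [0.1, 0.2, 0.0, 0.3],
    [0.5, 0.5, 0.5, 0.5],
    [0.9, 1.0, 0.8, 0.7],
    [0.3, 0.1, 0.4, 0.2],
    [0.7, 0.8, 1.0, 1.0],
]
svi = overall_svi(counties)   # [0.0, 0.5, 0.75, 0.25, 1.0]
svi_rescaled = rescale(svi)   # [0.0, 5.0, 7.5, 2.5, 10.0]
print(svi, svi_rescaled)
```

Because the overall score is itself a rank, it conveys only a county's relative position among the counties ranked, not an absolute level of disadvantage.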

Other Social Risk Factors
Other county-level sociodemographic measures not included in the SVI were obtained from 2014 to 2018 American Community Survey 5-year estimates. These included more detailed race/ethnicity data (percentages of individuals who are African American, Hispanic/Latinx, American Indian/Alaskan Native, or Asian); percentage of persons younger than 65 years without health insurance; percentage of individuals 16 years or older using public transport for work commuting; and the Gini index of income inequality (range from 0, indicating complete equality, to 1, indicating complete inequality). Data on food insecurity, defined as the percentage lacking access to a reliable source of food, were obtained from the Robert Wood Johnson Foundation's 2020 county health rankings. 14
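To make the 0-to-1 range of the Gini index concrete, the sketch below computes it from a list of individual incomes using the standard sorted-rank formula. It is illustrative only: the study used published American Community Survey estimates, which are not derived from a simple list like this, and the example incomes are hypothetical.

```python
def gini(incomes):
    """Gini index of a list of non-negative incomes.

    Uses G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n,
    where x_i are the incomes sorted ascending and i runs from 1 to n.
    """
    xs = sorted(incomes)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total <= 0:
        raise ValueError("need at least one income and a positive total")
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# Hypothetical examples (not study data):
equal = gini([5, 5, 5, 5])        # everyone earns the same: complete equality
unequal = gini([0, 0, 0, 100])    # one person holds all income
print(equal, unequal)
```

With this finite-sample formula, the maximum for n incomes is (n - 1) / n, which approaches 1 as the population grows.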

Population Health and Health Care Covariates
County-level numbers of primary care physicians and hospital beds were obtained from the Health Resources and Services Administration's Area Health Resources File. 15 Number of intensive care unit beds was obtained from the Henry J. Kaiser Family Foundation. 16 County-level mean life expectancy and percentage of the adult population (age ≥20 years) with obesity, defined as a body mass index (calculated as weight in kilograms divided by height in meters squared) greater than or equal to 30, were obtained from the Robert Wood Johnson Foundation's 2020 county health rankings. 14

Population and Testing Covariates
To calculate population density, data on county area in square miles were collected from the US Census Cartographic Boundary File. Urbanicity was defined using the 2013 Rural-Urban Continuum (Beale) codes from the US Department of Agriculture. 17 The statewide total number of COVID-19 tests performed as of July 29, 2020, was obtained from The COVID Tracking Project. 18

Statistical Analysis
County-level COVID-19 incidence and mortality are reported per 100 000 population. Heat maps display total incidence and mortality rate as of July 29, 2020, superimposed over SVI for each county.
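The per-100 000 rates and the population density covariate are simple ratios; a minimal sketch follows. The function names and the example county are hypothetical and are not taken from the study's data.

```python
def rate_per_100k(count, population):
    """Cumulative cases or deaths per 100 000 population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return count * 100_000 / population


def population_density(population, area_sq_miles):
    """Persons per square mile."""
    if area_sq_miles <= 0:
        raise ValueError("area must be positive")
    return population / area_sq_miles


# Hypothetical county: 50 000 residents, 400 square miles, 250 cases, 5 deaths.
incidence = rate_per_100k(250, 50_000)
mortality = rate_per_100k(5, 50_000)
density = population_density(50_000, 400)
print(incidence, mortality, density)
```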
Cross-sectional data on the total number of COVID-19 cases and deaths reported as of July 29, 2020, were used to estimate the association of county characteristics with incidence and mortality on this date. We used mixed-effects negative binomial regression to model the COVID-19 case count.
A mixed-effects zero-inflated negative binomial regression was used to model death counts, since 29% of the counties reported zero deaths. The number of cases per 100 000 population was used to estimate excess zeros in the zero-inflated negative binomial regression that modeled mortality, with the assumption that counties with smaller incidence rates were more likely to experience no COVID-19 deaths. All cross-sectional models included a random intercept for state. Initial bivariate analyses estimated whether incidence and mortality were associated with population density, urbanicity, and COVID-19 testing rate. In subsequent analyses, each individual county characteristic was included in a separate regression model to avoid potential collinearity issues.
Subsequently, serial cross-sectional data on the total number of COVID-19 cases and deaths, reported weekly between March 25 and July 29, 2020, were used to estimate the association between county SVI and weekly change in incidence and death rates. Similar to the single cross-sectional models, we used a mixed-effects negative binomial regression to model changes in weekly COVID-19 cases and a mixed-effects zero-inflated negative binomial regression to model weekly changes in COVID-19 deaths. Serial cross-sectional models included a random intercept accounting for county-level repeated measures and an interaction between centered time (in weeks) and SVI (overall or subindices) to assess whether sociodemographic risk factors were associated with weekly changes in incidence and mortality, with a separate regression for each index.
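One way to write out the models described above is sketched below. The notation is ours and several details are assumptions that the text does not state: a log link with a log-population offset, the quadratic-variance negative binomial parameterization (dispersion written as \phi), and a logit link for the excess-zero probability.

```latex
% Illustrative notation (not from the article): c = county, s = state, t = centered week,
% N = population, x = county characteristic (e.g., SVI), z = adjustment covariates
% (population density, urbanicity, statewide testing rate).

% Cross-sectional case-count model with a state random intercept u_s.
\begin{align*}
Y_{cs} &\sim \operatorname{NegBin}(\mu_{cs}, \phi), \\
\log \mu_{cs} &= \log N_{cs} + \beta_0 + \beta_1 x_{cs}
  + \boldsymbol{\gamma}^{\top} \mathbf{z}_{cs} + u_s,
  \qquad u_s \sim \mathcal{N}(0, \sigma_u^2).
\end{align*}

% Zero-inflated death-count model: \pi_{cs} is the excess-zero probability,
% r_{cs} the cases per 100 000 population, \lambda_{cs} the negative binomial mean.
\begin{align*}
\Pr(D_{cs} = 0) &= \pi_{cs} + (1 - \pi_{cs})
  \left( \frac{\phi}{\phi + \lambda_{cs}} \right)^{\phi}, \\
\operatorname{logit}(\pi_{cs}) &= \alpha_0 + \alpha_1 r_{cs}.
\end{align*}

% Serial cross-sectional model: county random intercept v_c and a time-by-SVI interaction.
\begin{align*}
\log \mu_{ct} &= \log N_{c} + \beta_0 + \beta_1 x_{c} + \beta_2 t
  + \beta_3 (x_{c} \cdot t) + \boldsymbol{\gamma}^{\top} \mathbf{z}_{c} + v_c,
  \qquad v_c \sim \mathcal{N}(0, \sigma_v^2).
\end{align*}
```

Under this sketch, \exp(\beta_1) is the incidence rate ratio for a 1-unit difference in the county characteristic, and \beta_3 captures how the weekly change in the outcome varies with SVI.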
Incidence rate ratios (IRRs) and estimated probabilities were used to interpret associations. All models controlled for population density, urbanicity, and COVID-19 testing rate, based on a priori conceptual framework and initial bivariate analyses.
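As a reading aid for the IRRs, the sketch below converts a log-scale regression coefficient into an IRR and a percentage change. The coefficient value is purely hypothetical and is not a result from this study; the only detail taken from the text is that SVI was rescaled so that 1 unit equals 0.1 on the original 0-1 scale.

```python
import math


def irr(beta, delta=1.0):
    """Incidence rate ratio for a `delta`-unit change in the predictor."""
    return math.exp(beta * delta)


def percent_change(beta, delta=1.0):
    """Percentage change in the expected rate for a `delta`-unit change."""
    return (irr(beta, delta) - 1.0) * 100.0


# Hypothetical coefficient per 1 unit of rescaled SVI (0.1 on the original scale).
beta_svi = 0.10

irr_one_unit = irr(beta_svi)              # 0.1-point difference in original SVI
irr_full_range = irr(beta_svi, delta=10)  # lowest vs highest SVI (0 to 1)
print(round(irr_one_unit, 3), round(irr_full_range, 3))
print(round(percent_change(beta_svi), 1), "% higher expected rate per unit")
```

Because the model is multiplicative on the rate scale, the IRR for a 10-unit difference equals the 1-unit IRR raised to the 10th power rather than 10 times the 1-unit IRR.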
We conducted sensitivity analyses excluding data from counties in the top 5 states with the highest COVID-19 incidence rate as of July 29, 2020. For serial cross-sectional models, we checked for robustness by adding state fixed effects to regressions.
Analyses were performed using R statistical software (glmmTMB 19 and ggeffects 20 packages) version 3.6.2 (R Project for Statistical Computing). Bonferroni adjustment was applied for multiple comparisons, considering 2-sided P < .001 statistically significant.
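A Bonferroni adjustment divides the nominal significance level by the number of comparisons. The sketch below shows the arithmetic; the number of comparisons is not stated in this section, so the value of 50 used here is purely hypothetical, chosen only because 0.05 / 50 reproduces a threshold of .001. The P values are likewise made up for illustration.

```python
def bonferroni_threshold(alpha, n_comparisons):
    """Per-test significance threshold under a Bonferroni adjustment."""
    if n_comparisons < 1:
        raise ValueError("n_comparisons must be at least 1")
    return alpha / n_comparisons


def flag_significant(p_values, alpha, n_comparisons):
    """Return True for each P value below the adjusted threshold."""
    threshold = bonferroni_threshold(alpha, n_comparisons)
    return [p < threshold for p in p_values]


# Hypothetical inputs: nominal alpha of .05 and 50 comparisons (an assumption).
threshold = bonferroni_threshold(0.05, 50)
flags = flag_significant([0.0005, 0.002, 0.04], alpha=0.05, n_comparisons=50)
print(threshold, flags)
```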

Results
As of July 29, 2020, there were 4 289 283 COVID-19 cases and 147 074 deaths due to COVID-19 in 3137 US counties. The top 5 states with highest incidence rates were Arizona, Connecticut, Delaware, the District of Columbia, and Rhode Island. Figure 1 shows a US heat map with the color gradient representing the overall SVI overlaid with COVID-19 incidence rate and death rate for each county. County-level incidence and mortality rates showed a different pattern than state rankings. The county with the highest incidence rate in the country, Trousdale County, Tennessee (16 348 cases per 100 000 population), had an overall SVI score of 0.52, and the county with the highest death rate, Hancock County, Georgia (398 deaths per 100 000 population) had an overall SVI score of 0.80 (eTable 2 in the Supplement).
In initial bivariate analyses, demographic factors (population density and urbanicity), but not COVID-19 testing rate, were significantly associated with COVID-19 incidence and mortality (eTable 3 in the Supplement). All subsequent analyses adjusted for these 3 covariates per our a priori conceptual framework.
Counties with greater SVI scores, or greater sociodemographic disadvantage, had higher COVID-19 incidence and mortality rates. For example, a midsize metropolitan county with SVI score

Association of SVI With Weekly Change in Cumulative COVID-19 Incidence and Mortality
In serial cross-sectional analyses, we found significant associations between SVI (overall and all 4 subindices) and weekly cumulative change in COVID-19 incidence and mortality. Higher SVI score (range, 0-1) indicates greater socioeconomic disadvantage. We also plotted estimated weekly cumulative incidence and mortality for a midsize metropolitan county with an overall SVI score of 0.1, 0.5, or 1.0 to investigate how differing levels of sociodemographic disadvantage were associated with different COVID-19 growth curves (Figure 2). The graph shows that counties with higher SVI scores had greater rates of increase in weekly cumulative incidence and mortality rates.

Individual Sociodemographic Characteristics and COVID-19 Incidence and Mortality Rate
We also assessed the cross-sectional association of individual sociodemographic measures with COVID-19 incidence and mortality.

Association of Population Health and Health Care Measures With COVID-19 Incidence and Mortality Rate
Most of the population health or health care availability measures were not significantly associated with COVID-19 incidence.
Figure legend: Incidence and mortality are presented per 100 000 population from March 25 to July 29, 2020, in a county with a Rural-Urban Continuum Code mean of 6, defined as urban population of 2500 to 19 999, adjacent to a metropolitan area. Graphs present the estimated incidence and mortality per 100 000 with population density and statewide COVID-19 testing rate fixed at their means, and county random effect fixed at 0. For estimated mortality, the deaths per 100 000 was fixed at the mean. Solid lines show the association between time and cumulative COVID-19 incidence and mortality. Analytic sample excluded the counties spanning New York, New York.

Sensitivity Analyses
In sensitivity analyses excluding the 5 states with the highest COVID-19 incidence rates, the overall SVI and subindices remained associated with COVID-19 incidence and mortality rates, as did most of the individual sociodemographic measures examined. Similarly, in serial cross-sectional analyses, inclusion of state fixed effects did not alter our findings.

Discussion
In this cross-sectional study of US county-level sociodemographic risk factors conducted 6 months into the country's SARS-CoV-2 outbreak, we found significant associations with COVID-19 incidence and mortality across all social domains. The SVI, particularly the minority status and English language proficiency subindex, and many sociodemographic factors examined were associated with incidence and mortality rates. Our analysis revealed that racial/ethnic minority status was significantly associated with COVID-19 incidence and mortality, reinforcing anecdotal evidence on disparities from the field and those preliminarily reported by state health departments. 2,3,5,6,8 Our serial cross-sectional analysis also found that social factors were associated with the rate of increase of COVID-19 cases and deaths in US counties. This extends the work of previous cross-sectional studies that examined the association of social risk factors earlier in the COVID-19 pandemic. 10,11 These findings suggest that a significant component of COVID-19 racial/ethnic disparities is associated with community-level social factors. That is, the racial/ethnic disparities apparent in descriptive statistics reveal underlying disparities in myriad social factors at the macro and mezzo levels known to be associated with disparities in health outcomes, including structural racism.
Communities with socioenvironmental conditions, such as crowded housing and reliance on public transportation, are at higher risk of disease transmission owing to difficulty maintaining social distancing. 21 Residents of low-income and racial/ethnic minority communities are also more likely to have essential worker occupations, which put them at higher risk of person-to-person SARS-CoV-2 exposure and transmission. 22,23 In addition, while certain racial/ethnic minority groups are known to have higher rates of medical risk factors for COVID-19 morbidity and mortality, such as diabetes, hypertension, and lung disease, 24,25 prior literature indicates that it is the upstream social determinants of health that are the underlying factors associated with observed disparities in these chronic health conditions. 26

Policy Implications
Our county-level analysis also offers a different picture of which geographic areas are associated with higher risk for COVID-19, as the counties with the highest incidence rates were not necessarily in the states with the highest number of cases and were often rural counties with predominantly White populations. This underscores that COVID-19 is not simply a problem of people of certain races/ethnicities or socioeconomic position living in certain cities. Longstanding social inequities, and the living and working conditions they generate, are associated with many aspects of the pandemic, including its severity and timing.
Furthermore, the profound economic impact of the pandemic, including the stay-at-home orders and other public health emergency policies necessary to contain SARS-CoV-2, will likely only exacerbate inequities in the social determinants of health, including unemployment, poverty, food insecurity, poor quality education, lack of health insurance or paid sick leave, and lack of access to the internet. These social conditions, driven by public policy choices, in turn may create a vicious cycle of increasing disease transmission and perpetuation of inequities in COVID-19 outcomes if underlying social factors are not considered and addressed.
SARS-CoV-2 neither created the conditions for health disparities nor revealed previously unrecognized social inequality. Rather, this pandemic has exacerbated longstanding racial/ethnic, social, political, and economic inequities in the US to once again ensure that the most marginalized and underresourced communities experience the worst outcomes. 27 The difficult but crucial task for local, regional, and national policy makers will be to address, in addition to downstream health care issues, the numerous upstream and midstream social factors associated with health and health care disparities. 27 Our findings demonstrate that the SVI and other sociodemographic measures may be used to target counties or other small geographic areas for expanded COVID-19 testing and treatment sites, as well as to identify sociodemographic risk factors for the purpose of linkage to resources, such as housing or food assistance. As limited English proficiency was significantly associated with COVID-19 outcomes, public service announcements in other languages could be used to disseminate public health guidelines regarding mask use, social distancing, and other mitigation strategies. Local and state public health departments should also coordinate with community-based social service organizations to assess and address social needs that might contribute to SARS-CoV-2 transmission.

Additionally, social welfare policies aimed at increasing income, housing, and food security are essential for addressing COVID-19 along with other health disparities.

Limitations
Our study has several limitations. First, our county-level analysis investigates population-level risk but may obscure important individual-level risk factors, including social and medical risks. Second, disease incidence is dependent on testing, and to our knowledge, there is no definitive research on whether disparities in access to testing exist in the US. We adjusted for state-level testing rates, but nationwide county-level data on testing were not available for analysis. Third, our assessment of mortality is dependent on accurate reporting and likely underestimates the full number of COVID-19-related deaths. 5,[28][29][30] If underresourced counties with greater social risk factors had lower access to testing and thus fewer deaths attributed to COVID-19, this would likely bias our results toward the null.
Fourth, our analysis captures only the first 6 months of the COVID-19 pandemic in the US. Longer-term analysis might reveal different findings regarding the association between COVID-19 outcomes and county-level social factors. Fifth, we were not able to incorporate into our study design factors that may have influenced early COVID-19 spread in the US, such as international travel, although these factors were likely less relevant at the time of our analysis 6 months into the US pandemic. Sixth, some early reports suggested that essential workers, such as health care, emergency services, law enforcement, grocery store, transportation, and other workers, were at a higher risk of COVID-19. 23 We did not incorporate this important labor market consideration, as it was out of scope for our analysis of county-level social risk factors. Additionally, we focused our analysis at the county level owing to data availability, but important variation in social risks exists within counties. Future research should examine smaller geographic areas to more closely identify communities with greater social vulnerability and risk of COVID-19. Furthermore, we focused on disease incidence and mortality, but the association of social factors with hospitalizations and other outcomes is worthy of examination. 31,32

Conclusions
This cross-sectional study found that a wide range of social factors, including socioeconomic status, racial/ethnic minority status, family or household composition, and environmental factors, were significantly associated with COVID-19 incidence and mortality. These factors are also largely considered the driving forces associated with the racial/ethnic and social disparities that are already apparent in the COVID-19 pandemic in the US. To truly bend the curve of disparities in COVID-19 and any future epidemics or pandemics, these social risk factors and their root causes must be addressed through bold policy action and societal investment.