Association of Social Distancing, Population Density, and Temperature With the Instantaneous Reproduction Number of SARS-CoV-2 in Counties Across the United States

Key Points
Question: How is the instantaneous reproduction number of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) associated with social distancing, wet-bulb temperature, and population density in counties across the United States?
Findings: In this cohort study of 211 counties in 46 states, social distancing, temperate weather, and lower population density were associated with a decrease in the instantaneous reproduction number of SARS-CoV-2. Of these county-specific factors, social distancing appeared to have the most substantial association with a reduction in SARS-CoV-2 transmission.
Meaning: In this study, the instantaneous reproduction number of SARS-CoV-2 varied substantially among counties; the associations between the reproduction number and county-specific factors could inform targeted policies to reduce SARS-CoV-2 transmission in heterogeneous communities.


Introduction
Coronavirus disease 2019 (COVID-19) is caused by the novel severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). By May 1, 2020, this virus had caused a pandemic with more than 3.2 million cases of COVID-19 and more than 236 000 deaths worldwide. By the same date, there were more than 1 million individuals with COVID-19 in the United States and more than 64 000 deaths. The rapid evolution of this pandemic led to the widespread implementation of social distancing measures across most of the United States and the world.
The transmissibility of SARS-CoV-2, like that of other viral pathogens, is estimated by the reproduction number (R). An infectious pathogen's R value represents the average number of people infected by each individual who has the infection. When R exceeds 1, the number of incident cases increases, because each individual with the infection transmits it to more than 1 other individual on average. When R is below 1, transmission of the pathogen will eventually cease, because each patient transmits infection to fewer than 1 other person on average. The R value is therefore an important measure when attempting to predict the evolution of an outbreak. R is often assumed to be constant for each pathogen; however, R varies by location and over time, and this time-varying value is referred to as the instantaneous R (R t). [1][2][3] At the individual level, variation in R t likely depends on time spent in environments where exposure risk is high or prolonged, such as among high-exposure workers in health care or mass transit settings or among families in densely crowded living conditions. At the community level, variation in R t may also reflect population density (as a proxy for the likelihood of crowded conditions), temperature and/or humidity (given their effects on viral propagation), policies such as social distancing, and the number of susceptible individuals.
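To make the threshold at R = 1 concrete, the following sketch assumes a simplified discrete-generation model in which each generation of cases is R times the size of the previous one. The function name and the model are illustrative assumptions, not the estimation approach used in this study; real outbreaks also deplete susceptible individuals and have overlapping generations.

```python
def project_cases(initial_cases, r, generations):
    """Expected cases per generation when each case infects r others on average.

    Hypothetical simplified model: cases[g + 1] = r * cases[g]. It ignores
    depletion of susceptible individuals and timing within a generation.
    """
    cases = [float(initial_cases)]
    for _ in range(generations):
        cases.append(cases[-1] * r)
    return cases


growing = project_cases(100, r=1.5, generations=5)    # R > 1: incidence rises
shrinking = project_cases(100, r=0.8, generations=5)  # R < 1: incidence falls
print([round(c) for c in growing])
print([round(c) for c in shrinking])
```

Under this simplification, the sign of log(R) alone decides whether incidence grows or shrinks, which is why R t crossing 1 is the quantity of policy interest.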
Models that rely on fixed assumptions for R t are unlikely to capture local heterogeneity in transmission. Our objective was to examine how time-varying changes in social distancing and weather within counties of different population densities might be associated with changing R t values across counties in the United States. Understanding how these time-varying factors might influence the R t of SARS-CoV-2 could allow policy makers to implement targeted interventions to decrease R t in heterogeneous communities.

Methods
Because this study used publicly available deidentified data, it was determined to be exempt from institutional review board review 4 and from the requirement for informed consent. This study followed the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) reporting guideline for cohort studies. 5

Setting and Participants
We selected 211 counties, representing 178 892 208 of 326 289 971 US residents (54.8%), based on the following characteristics: the county had at least 1 case of COVID-19 as of February 25, 2020, and contained either at least 1 city with a population exceeding 100 000 residents or the state capital. For states with no county containing a city exceeding 100 000 residents, the most populous county in that state was selected. We excluded counties with average daily case rates of less than 5 and counties with fewer than 3 days with daily case rates of more than 5 during the analysis period of February 25 to April 23, 2020. We considered time-0 for each county to be the date on which it reached the minimum threshold of disease activity. A total of 211 counties, representing 46 states and the District of Columbia, met these criteria.
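The county exclusion rule can be sketched as follows. This is a minimal illustration under stated assumptions: "daily case rates" are read as daily reported case counts, the county data are invented, and defining time-0 as the first day with more than 5 cases is a hypothetical stand-in, since the exact threshold is not specified here.

```python
from datetime import date, timedelta


def meets_activity_threshold(daily_cases, min_mean=5.0, min_days=3, day_threshold=5):
    """Keep a county only if its mean daily count is at least `min_mean`
    and at least `min_days` days exceed `day_threshold` cases.

    Assumption: "daily case rates" are interpreted as daily case counts.
    """
    if not daily_cases:
        return False
    mean_cases = sum(daily_cases) / len(daily_cases)
    busy_days = sum(1 for c in daily_cases if c > day_threshold)
    return mean_cases >= min_mean and busy_days >= min_days


def time_zero(daily_cases, start, day_threshold=5):
    """Hypothetical time-0: first date whose count exceeds `day_threshold`."""
    for offset, c in enumerate(daily_cases):
        if c > day_threshold:
            return start + timedelta(days=offset)
    return None


# Invented example counties with daily counts starting February 25, 2020
counties = {
    "County A": [0, 2, 6, 9, 12, 15, 20],
    "County B": [0, 0, 1, 0, 2, 1, 0],
}
start = date(2020, 2, 25)
kept = {
    name: time_zero(cases, start)
    for name, cases in counties.items()
    if meets_activity_threshold(cases)
}
print(kept)
```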

Exposures
There were 3 a priori exposure variables: social distancing practice, population density, and daily mean wet-bulb temperature. Social distancing was measured using a data set of daily cellular telephone movement, provided by Unacast, which makes it possible to examine how social distancing policies are associated with individuals' movement within a county. 7,8 Based on an a priori assumption, confirmed by preliminary analyses that included cellular telephone measurement of overall distance traveled, we used a social distancing variable that measured the percentage change in visits to nonessential businesses (eg, restaurants, hair salons) within each county compared with visits during a 4-week baseline period between February 10 and March 8, 2020. We used a rolling average of the percentage of visits 3 to 14 days before time-0, based on the lag observed between changes in social distancing and mean R t estimates across the counties (eFigure 1 in the Supplement) and on an incubation period of at least 3 days. 9 Population density of each county was obtained from US Census data and is expressed as the number of people per square mile. Because density was substantially skewed by the largest cities, it was log-transformed to approximate a normal distribution.
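A minimal sketch of the 2 exposure transformations follows. It assumes that the 3-to-14-day window is anchored on each analysis day (rather than fixed at a single time-0), uses invented input values, and the function names are hypothetical. The density function also applies the standardization after log transformation described for the mixed model; after standardization, the choice of logarithm base no longer matters, because changing the base only rescales the values.

```python
import numpy as np


def lagged_rolling_mean(values, min_lag=3, max_lag=14):
    """For each day t, average values[t - max_lag] .. values[t - min_lag].

    Assumption: the window is anchored on each analysis day t.
    Days without a full window of history are returned as NaN.
    """
    x = np.asarray(values, dtype=float)
    out = np.full(len(x), np.nan)
    for t in range(max_lag, len(x)):
        out[t] = x[t - max_lag : t - min_lag + 1].mean()
    return out


def standardized_log_density(people_per_sq_mile):
    """Natural-log transform, then z-score, to reduce right skew from large cities."""
    logd = np.log(np.asarray(people_per_sq_mile, dtype=float))
    return (logd - logd.mean()) / logd.std()


visits_change = np.arange(20.0)          # invented daily % change in visits
distancing = lagged_rolling_mean(visits_change)
z_density = standardized_log_density([471, 1022, 1846, 3951])
print(distancing[14:], z_density.round(2))
```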
It has been demonstrated that humidity and temperature both play a role in the seasonality of influenza; viral transmission is most efficient at lower humidity and temperature. It has been proposed that colder, drier air damages respiratory mucosa and thickens secretions, impairing the protective capacity of mucociliary clearance, and that higher humidity physically limits the distance respiratory droplets can travel. 10,11 For this reason, the primary weather variable we used was wet-bulb temperature, a metric that captures the complex thermodynamic relationship between temperature and humidity, has been shown to predict human health events more precisely than temperature and humidity separately, and avoids the collinearity that arises when both are entered as separate variables. 12,13 Wet-bulb temperatures were obtained from the National Oceanic and Atmospheric Administration Local Climatological Data.
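The study obtained wet-bulb temperatures directly from NOAA Local Climatological Data. Where only air temperature and relative humidity are available, one common empirical approximation is that of Stull (2011), sketched below. This is a swapped-in technique for illustration, not the study's method; the fit is intended for near-sea-level pressure and moderate conditions, and the function name is hypothetical.

```python
import math


def wet_bulb_stull(temp_c, rh_percent):
    """Approximate wet-bulb temperature (deg C) from air temperature and RH (%).

    Empirical fit from Stull (2011), intended for sea-level pressure and roughly
    5%-99% RH and -20 to 50 deg C; it degrades for cold, very dry air.
    """
    t, rh = temp_c, rh_percent
    return (
        t * math.atan(0.151977 * math.sqrt(rh + 8.313659))
        + math.atan(t + rh)
        - math.atan(rh - 1.676331)
        + 0.00391838 * rh ** 1.5 * math.atan(0.023101 * rh)
        - 4.686035
    )


print(round(wet_bulb_stull(20.0, 50.0), 1))
```

A single wet-bulb value combines both inputs, which illustrates why one wet-bulb term can stand in for separate, correlated temperature and humidity terms.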

Covariates
County-level covariates that may confound the association between the exposures of interest and R t were considered; these included demographic factors (eg, age distribution, insurance status, and socioeconomic status) and health-related factors associated with COVID-19 severity (eg, proportion of individuals with hypertension, obesity, or diabetes and proportion of individuals who smoke). 14 Demographic and health characteristics were abstracted from the US Census, American Community Survey, Behavioral Risk Factor Surveillance System, Esri Business Analyst, and Multi-Resolution Land Characteristics Consortium. [15][16][17][18] Among 70 candidate covariates, we examined the correlation between each pair of factors and calculated the variance inflation factor to quantify multicollinearity (eFigure 2 in the Supplement). Among highly correlated variables, we chose covariates based on their potential association with viral transmission and/or with the probability that an individual with infection would become symptomatic and obtain a diagnostic test. The final covariates included the proportions of residents older than 65 years, with incomes less than 200% of the poverty level, and with diabetes. Variables for obesity, smoking, and uninsured population were removed because of collinearity with other variables.
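The variance inflation factor used to screen the covariates can be computed by regressing each covariate on all the others. The sketch below uses simulated data rather than the study's 70 covariates, and the function name is hypothetical. It shows how a nearly redundant variable inflates the VIF of every variable it overlaps with.

```python
import numpy as np


def variance_inflation_factors(X):
    """VIF_j = 1 / (1 - R^2_j), where R^2_j comes from an OLS regression of
    column j of X on all other columns plus an intercept."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ beta
        centered = y - y.mean()
        r2 = 1.0 - (resid @ resid) / (centered @ centered)
        vifs[j] = 1.0 / (1.0 - r2)
    return vifs


rng = np.random.default_rng(0)
a, b = rng.normal(size=(2, 500))
independent = np.column_stack([a, b])
collinear = np.column_stack([a, b, a + b + 0.01 * rng.normal(size=500)])
print(variance_inflation_factors(independent).round(2))
print(variance_inflation_factors(collinear).round(0))
```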

Statistical Analysis
Our analysis followed a 2-step procedure. First, we calculated the R t for SARS-CoV-2 using the method of Cori et al 1 with a moving average window of 3 days. This method has been applied in the dynamic estimation of R t from Wuhan, China. 19 The generation time of SARS-CoV-2 was assumed to follow a γ distribution, with mean (SD) of 7.5 (3.4) days according to a previous epidemiological survey of the first 425 cases in Wuhan, China. 9 In the early days of a county's outbreak, when the ratio of cases to tests was unstable, kernel smoothing with a box kernel and bandwidth of 7 days was performed to account for the likelihood that cases from prior days would accumulate when testing capacity increased. 20, 21 We stopped smoothing 1 day after the county's ratio of daily cases to tests first fell in the interval of 5% to 90% or after March 20, 2020, whichever came first, to avoid overprocessing the data.
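The following sketch illustrates the steps described above: early box-kernel smoothing with the stated stopping rule, followed by the Cori et al posterior mean of R t over a 3-day window with a gamma generation time (mean 7.5 days, SD 3.4 days). It is a simplified reimplementation, not the EpiEstim code. The generation-time discretization is a simple renormalized density, the Gamma(1, 5) prior (mean 5, SD 5) is assumed to match EpiEstim's default, and all function names and inputs are hypothetical.

```python
import math


def discretized_gamma_si(mean=7.5, sd=3.4, max_days=30):
    """Daily weights w_1..w_max_days for a gamma generation time.

    Simplification (assumption): the gamma density is evaluated at each
    integer day and renormalized; EpiEstim discretizes differently.
    """
    shape, scale = (mean / sd) ** 2, sd ** 2 / mean
    dens = [k ** (shape - 1) * math.exp(-k / scale) for k in range(1, max_days + 1)]
    total = sum(dens)
    return [d / total for d in dens]


def smoothing_stop(cases, tests, cutoff_index):
    """Index where smoothing stops: 1 day after cases/tests first falls in
    [0.05, 0.90], or after `cutoff_index` (eg, March 20), whichever is first."""
    for i, (c, n) in enumerate(zip(cases, tests)):
        if n > 0 and 0.05 <= c / n <= 0.90:
            return min(i, cutoff_index) + 1
    return cutoff_index + 1


def early_box_smooth(cases, stop, bandwidth=7):
    """Centered box-kernel (moving-average) smoothing applied only before `stop`."""
    half = bandwidth // 2
    out = [float(c) for c in cases]
    for t in range(min(stop, len(cases))):
        window = cases[max(0, t - half) : t + half + 1]
        out[t] = sum(window) / len(window)
    return out


def cori_rt(incidence, si, window=3, prior_shape=1.0, prior_scale=5.0):
    """Posterior mean R_t over a sliding window ending on each day t.

    With a Gamma(a, b) prior, R over the window is
    Gamma(a + sum I_s, 1 / (1/b + sum Lambda_s)), where
    Lambda_s = sum_k I_{s-k} w_k is the total infectiousness on day s.
    """
    n = len(incidence)
    lam = [
        sum(incidence[s - k] * si[k - 1] for k in range(1, min(s, len(si)) + 1))
        for s in range(n)
    ]
    rt = [None] * n
    for t in range(window, n):
        days = range(t - window + 1, t + 1)
        total_lam = sum(lam[s] for s in days)
        if total_lam > 0:
            shape = prior_shape + sum(incidence[s] for s in days)
            rt[t] = shape / (1.0 / prior_scale + total_lam)
    return rt


si = discretized_gamma_si()                              # mean 7.5, SD 3.4 days
flat = [1000] * 60                                       # steady incidence
growing = [round(10 * 1.1 ** t) for t in range(60)]      # ~10% daily growth
stop = smoothing_stop(growing, [c * 4 for c in growing], cutoff_index=20)
rt = cori_rt(early_box_smooth(growing, stop), si)
print(round(cori_rt(flat, si)[-1], 3), round(rt[-1], 2))
```

Steady incidence yields a posterior mean near 1, while sustained growth yields R t above 1, matching the threshold interpretation in the Introduction.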
Next, we fit a hierarchical linear mixed-effects model with random intercepts for each county and metropolitan area to evaluate the association between the exposures and log-transformed R t, adjusting for covariates. Population density was standardized after log transformation because of its highly skewed distribution. Temperature associations were estimated using a distributed lag nonlinear model, which considers bidimensional exposure-lag-response associations between wet-bulb temperature and log(R t). [22][23][24] We considered a lag period of 4 to 14 days before case identification to reflect the incubation period of SARS-CoV-2 and to reduce bias introduced by daily weather affecting an individual's decision to seek a test. The final cross-basis term included natural cubic splines defined by 3 internal knots at the 10th, 75th, and 90th percentiles of the temperatures observed during the period, corresponding to 1°C, 13°C, and 19°C, and 2 knots in the lag dimension at 7 and 11 days. The number and placement of spline knots were based on an a priori assumption of a relatively simple association between temperature and SARS-CoV-2 transmission and on minimization of the Akaike information criterion. The temperature knots permitted flexibility in the model's nonlinearity at higher and lower temperatures within the moderate range observed in this study. 25 We included an interaction term between population density and temperature, on the assumption that temperature would influence transmissibility differently in densely populated counties. 26 The relative change in R t was expressed as the cumulative exposure response relative to 11°C. Interactions between population density and social distancing were also included in the linear mixed-effects model, given the hypothesis that the association of social distancing with R t might be greater in highly dense areas. We also controlled for potential time effects using a cubic polynomial of days in the outbreak.
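The distributed lag structure can be illustrated with a minimal cross-basis. To keep the sketch short, it substitutes simple polynomial bases for the natural cubic splines described above (temperature knots at 1°C, 13°C, and 19°C; lag knots at 7 and 11 days), centers the temperature basis at the 11°C reference, and does not reproduce the dlnm package's crossbasis() interface. Function names and coefficients are hypothetical. The cumulative log ratio for a temperature held constant over lags 4 to 14 sums the lag-specific contributions; exponentiating it gives a relative R t versus the reference.

```python
import numpy as np


def lag_matrix(x, min_lag=4, max_lag=14):
    """Row t holds x[t - min_lag], ..., x[t - max_lag] (requires len(x) > max_lag).

    Rows without a full lag history are left as NaN.
    """
    x = np.asarray(x, dtype=float)
    lags = np.arange(min_lag, max_lag + 1)
    q = np.full((len(x), len(lags)), np.nan)
    for j, lag in enumerate(lags):
        q[lag:, j] = x[: len(x) - lag]
    return q, lags


def var_basis(temp, ref=11.0, degree=3):
    """Polynomial in (temp - ref); every column is 0 at the reference temperature."""
    d = np.asarray(temp, dtype=float) - ref
    return np.stack([d ** p for p in range(1, degree + 1)], axis=-1)


def lag_basis(lags, degree=2):
    """Polynomial in lag, rescaled to [0, 1] for numerical stability."""
    s = (lags - lags.min()) / (lags.max() - lags.min())
    return np.stack([s ** p for p in range(degree + 1)], axis=-1)


def cross_basis(temp, ref=11.0, min_lag=4, max_lag=14):
    """Sum over lags of var_basis(temperature at that lag) x lag_basis(lag)."""
    q, lags = lag_matrix(temp, min_lag, max_lag)
    b = var_basis(q, ref)                        # (n_days, n_lags, n_var)
    c = lag_basis(lags)                          # (n_lags, n_lagbasis)
    w = np.einsum("tlv,lk->tvk", b, c)           # contract over lags
    return w.reshape(len(q), -1)                 # (n_days, n_var * n_lagbasis)


def cumulative_log_ratio(z, theta, ref=11.0, min_lag=4, max_lag=14):
    """Log relative R_t when temperature z is sustained over every lag, vs ref."""
    lags = np.arange(min_lag, max_lag + 1)
    b = var_basis(np.full(len(lags), z), ref)    # (n_lags, n_var)
    c = lag_basis(lags)                          # (n_lags, n_lagbasis)
    coef = np.asarray(theta, dtype=float).reshape(b.shape[1], c.shape[1])
    return float(np.einsum("lv,lk,vk->", b, c, coef))


# Hypothetical usage: theta would come from regressing log(R_t) on cross_basis(...)
temps = np.linspace(-5.0, 25.0, 40)
W = cross_basis(temps)
theta = np.linspace(-0.01, 0.01, W.shape[1])
print(W.shape, np.exp(cumulative_log_ratio(0.0, theta)))
```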
Statistical significance of the associations was determined using the likelihood ratio test at a nominal significance level of P < .05. All tests were 2-tailed.
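A likelihood ratio test compares nested models by the statistic 2(ℓ_full − ℓ_reduced), referred to a χ² distribution with degrees of freedom equal to the number of dropped parameters. The sketch below uses only the standard library and integer degrees of freedom; the log-likelihood values and function names are hypothetical. For fixed effects in mixed models, the compared fits should use maximum likelihood rather than REML, because REML likelihoods are not comparable across different fixed-effect structures.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable, integer df >= 1.

    Uses the recurrence Q(k + 2) = Q(k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1),
    starting from Q(1) = erfc(sqrt(x/2)) or Q(2) = exp(-x/2).
    """
    if x <= 0:
        return 1.0
    if df % 2 == 1:
        q, k = math.erfc(math.sqrt(x / 2)), 1
    else:
        q, k = math.exp(-x / 2), 2
    while k < df:
        q += (x / 2) ** (k / 2) * math.exp(-x / 2) / math.gamma(k / 2 + 1)
        k += 2
    return q


def likelihood_ratio_test(loglik_full, loglik_reduced, df):
    """LR statistic and P value for nested models fit by maximum likelihood."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    return stat, chi2_sf(stat, df)


# Hypothetical log-likelihoods for models with and without one exposure term
stat, p = likelihood_ratio_test(-1520.4, -1531.9, df=1)
print(f"LR = {stat:.1f}, P = {p:.2g}")
```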
We ran 3 sensitivity analyses. First, the model was re-estimated every 2 weeks (a total of 3 times) over a 1-month period to check the stability of the estimated associations for the primary covariates. Second, model fit was evaluated by calculating in-sample R 2 in a randomly selected 70% of counties over 100 replicates. Finally, to address concerns about potential bias from excluding counties with later outbreaks or lower population density, we relaxed our inclusion criteria to permit counties with active outbreaks and a total population of at least 100 000 residents (as opposed to containing a city with at least 100 000 residents), and we re-estimated the associations of social distancing and wet-bulb temperature with R t during the study period. Given concerns about limited representation of temperatures at the upper and lower ends of the range across counties of different population densities, we did not include interaction terms between population density and temperature in this analysis. Analyses were performed with R version 3.6.0 (R Project for Statistical Computing) using the EpiEstim and dlnm packages. 27
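The second sensitivity analysis, repeated model fitting in a random 70% of counties, can be sketched as follows. To stay self-contained, this version substitutes ordinary least squares for the hierarchical mixed-effects model and uses simulated data; the function name and inputs are hypothetical.

```python
import numpy as np


def subsample_r2(X, y, county, frac=0.7, n_reps=100, seed=0):
    """In-sample R^2 from fits on a random `frac` of counties, repeated n_reps times.

    Simplification: ordinary least squares stands in for the mixed-effects model.
    """
    X, y, county = np.asarray(X, float), np.asarray(y, float), np.asarray(county)
    rng = np.random.default_rng(seed)
    ids = np.unique(county)
    n_pick = int(round(frac * len(ids)))
    scores = np.empty(n_reps)
    for r in range(n_reps):
        rows = np.isin(county, rng.choice(ids, size=n_pick, replace=False))
        design = np.column_stack([np.ones(rows.sum()), X[rows]])
        beta, *_ = np.linalg.lstsq(design, y[rows], rcond=None)
        resid = y[rows] - design @ beta
        centered = y[rows] - y[rows].mean()
        scores[r] = 1.0 - (resid @ resid) / (centered @ centered)
    return scores


# Simulated data: 50 hypothetical counties x 10 days, 2 exposures
rng = np.random.default_rng(1)
county = np.repeat(np.arange(50), 10)
X = rng.normal(size=(500, 2))
y = X @ np.array([1.0, -0.5]) + 0.5 * rng.normal(size=500)
scores = subsample_r2(X, y, county)
print(f"mean R^2 = {scores.mean():.2f} (range {scores.min():.2f}-{scores.max():.2f})")
```

Sampling whole counties, rather than individual county-days, keeps each county's repeated observations together, which mirrors the county-level design of the analysis.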

Results
Geographic locations and characteristics of the 211 counties are shown in Figure 1 and Table 1. We estimated R t over a total of 6588 county-days for the 211 counties; 17 outlier county-days were removed because of extremely low estimates of R t at low case thresholds (ie, R t < 0.05). The estimated R t varied greatly by county but tended to be highest earlier in the epidemic, reaching a peak of 7.8 before declining later in the period (eFigure 1 in the Supplement). After adjustment for county-level covariates, social distancing, population density, and temperature were associated with R t (Table 2). The estimated R t in the context of a 50% decrease in visits to nonessential businesses was 54% (95% CI, 51%-57%; P < .001) of the R t in the setting of normal visit intensity, corresponding to a 46% decrease in the overall R t. Compared with counties in the bottom quartile of population density, the 21 counties in the top decile of density had a 15% increase (95% CI, 9%-22%; P < .001) in relative R t.
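Because the model is fit on the log(R t) scale, an exposure change Δ multiplies R t by exp(βΔ). The sketch below back-calculates a hypothetical per-percentage-point coefficient from the reported ratio of 0.54 for a 50-point decrease in visits and converts it back. The paper does not report β itself; the sketch assumes visits enter the model linearly, and because the model includes a density-by-distancing interaction, the conversion holds only at the reference density. The function name is hypothetical.

```python
import math


def rt_ratio(beta, delta, se=None, z=1.96):
    """Multiplicative change in R_t for exposure change `delta`, given a
    coefficient `beta` (and optional standard error `se`) on the log(R_t) scale."""
    log_ratio = beta * delta
    if se is None:
        return math.exp(log_ratio)
    half_width = z * se * abs(delta)
    return (
        math.exp(log_ratio),
        math.exp(log_ratio - half_width),
        math.exp(log_ratio + half_width),
    )


# Hypothetical coefficient per percentage-point drop in visits, back-calculated
# from the reported ratio of 0.54 for a 50-point drop (beta is not reported).
beta = math.log(0.54) / 50
ratio = rt_ratio(beta, 50)
print(f"R_t ratio = {ratio:.2f}; relative decrease = {1 - ratio:.0%}")
```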
The nonlinear association of temperature lagged 4 to 14 days is reported in Table 2 and Figure 2. Compared with the minimum estimated R t at 11°C, relative R t increased across the coldest temperatures, reaching 2.13 (95% CI, 1.89-2.40) at 0°C. A smaller peak in relative R t of 1.61 (95% CI, 1.41-1.84) was estimated at 20°C, before relative R t declined again at higher temperatures.
These findings were robust to the addition of 183 less densely populated counties with at least 100 000 residents (eTable and eFigure 3 in the Supplement).
The standardized associations of social distancing, population density, and temperature with R t appear in Figure 3. Assuming social distancing of 35% (ie, halfway between estimates during the US shelter-in-place phase and normal activity), 2 counties (0.9%) were estimated to have an R t less than 1.0 at 2°C, and 114 counties (54.0%) were estimated to have an R t less than 1.0 at 11°C. When visits to nonessential businesses were reduced to the national mean of 70%, the number of counties estimated to have an R t less than 1.0 increased to 63 (29.9%) at 2°C and 202 (95.7%) at 11°C. At this 70% reduction in visits to nonessential businesses, 28 and 52 of 53 counties

Table 2 notes. Abbreviations: NA, not applicable; R t, instantaneous reproduction number.
a Estimates and variation were obtained through mixed-effects linear models using log-transformed R t, log population density, and distributed lag nonlinear models to estimate temperature effects. Postestimation was performed to convert variables into meaningful units of change. Ratios of R t are compared with the reference groups, adjusting for the proportions of residents older than 65 years, with incomes less than 200% of the poverty level, and with diabetes. Marginal R 2 was 0.50; conditional R 2 was 0.61.
b Visits to nonessential businesses were obtained from Unacast. The referent value was the average visits to nonessential businesses before March 9, 2020.
c Population density was categorized at the 25th, 50th, 75th, and 90th percentiles, which corresponded to 471, 1022, 1846, and 3951 people per square mile.

Figure 2. Cumulative exposure-response association between mean daily wet-bulb temperatures and instantaneous reproduction number using a lag period of 4 to 14 days before case identification. The line represents the estimated instantaneous reproduction number at each point along the temperature range compared with 11°C. The shaded areas represent the 95% CIs. The wet-bulb temperature range for the counties included in the analysis was −9°C to 25°C.

JAMA Network Open | Infectious Diseases

Discussion
In this analysis of 211 US counties, changes in social distancing, population density, and daily wet-bulb temperature were associated with the rate of SARS-CoV-2 transmission within a county, as measured by estimated R t. Our analysis indicates that, of these 3 factors, social distancing had the strongest association with reduced transmission. In addition, the mitigating associations of increased social distancing and moderate increases in daily wet-bulb temperature were most dramatic in counties with higher population density, which had high R t values, consistent with higher R estimates from around the world. 28 The mechanism underlying the associations of social distancing and population density with the estimated R t for SARS-CoV-2 likely involves increased droplet transmission, and potentially airborne transmission, when individuals are in closer proximity to one another. 29,30 However, the association of population density with COVID-19 outcomes may not be limited to transmission. The densest counties also had the highest number of deaths per 100 000 people. More severe disease in higher-density areas has been hypothesized to reflect the inoculum effect, which proposes that individuals exposed to a higher viral load at the time of infection have more severe illness; this effect is supported by epidemiologic studies of other viruses, 31,32 in particular SARS-CoV-1. [33][34][35] These data support the concept that, during a pandemic, people living in highly dense counties are more likely to transmit SARS-CoV-2 and to be exposed to higher inoculums of SARS-CoV-2, which could translate not only into more cases of COVID-19 but also into a higher case fatality rate. Data assessing the inoculum effect for SARS-CoV-2 are needed to confirm this hypothesis.
Our analysis, which used well-established methods developed to examine the association of temperature with human health, is also among the first to consider the association of temperature and humidity with SARS-CoV-2 transmission. 22,25 We found that the combined association of temperature and humidity (proxied by wet-bulb temperature) with R t was nonlinear and was insufficient on its own to reduce R t below 1 in the absence of considerable social distancing. As hypothesized, R t ratios decreased as wet-bulb temperatures increased to 11°C. Beyond 11°C, there was a modest increase in relative R t before the ratios began declining again at higher temperatures.
The nonlinear associations we observed, particularly within the temperate range, are consistent with the inverse relationship between temperature and transmission in animal models of influenza and other coronaviruses. 10,36 Those studies also found that higher humidity was associated with increased viability and transmission of influenza through fomites, even as aerosol transmission was mitigated at warmer temperatures. If SARS-CoV-2 behaves similarly to influenza at higher humidity, this could explain the association we observed between wet-bulb temperatures above 11°C and a modest increase in R t. Beyond the direct association of temperature and humidity with virus stability, viability, and propagation, changes in temperature and humidity may also alter human activity. For example, at higher temperatures people may be more likely to congregate in public locations, such as beaches and festivals. Therefore, an increase in R t at higher temperatures could also be explained in part by a decrease in social distancing not captured by our social distancing variable. Regardless of the mechanism, we remain cautious in interpreting the association of higher temperature and humidity with R t beyond the temperate range observed in this analysis. The coming months will allow additional assessment of R t during more prolonged periods of higher temperature and humidity; these additional observations will help to confirm or refute the attenuated associations at higher temperatures observed in this study.
To date, projections of COVID-19 outcomes have considered large areas, such as countries, provinces, and US states. These models have provided useful information to set expectations for deaths in the United States, to identify potential gaps in health services, such as intensive care unit beds and ventilators, and to guide initial wide-ranging social distancing recommendations from federal and state governments. Our county-level analysis allowed us to better examine the relative contributions of social distancing, population density, and seasonal weather changes to a given county's R t. This approach provides valuable information on transmission risk to inform area-specific public policy decisions. It will be important to examine whether incorporating these associations into models can accurately estimate the likelihood of future viral transmission, given that these factors will continue to change. To the degree that such predictive modeling efforts succeed, they would lend some support to this approach, which incorporates random effects for counties alongside seasonal changes, as a tool to inform responses to future epidemics.

Limitations
This study has limitations. First, generalizability remains a concern, particularly given our focus on larger counties. The 45% of US residents not captured in our analysis resided in smaller, rural counties; as such, our models may not apply to these areas. It is reassuring that the inclusion of an additional 183 more geographically dispersed and less densely populated counties replicated our findings. Second, the temperature associations we observed might have been confounded by time period, given that outbreaks occurred during spring in parallel with changing weather. However, adding time to the model did not appreciably change our results. Third, increases in testing capacity might have biased the models by inflating the total cases reported within each county. Differences in diagnostic test availability could contribute to the variation detected by the random effects across counties. However, our estimate of R t depended on the rate of change in cases rather than on the absolute number of cases, and during the period in which this analysis was conducted, test positivity rates, a proxy for testing capacity, were flat. 37 Furthermore, we smoothed early outbreak case incidence to account for limited early access to diagnostic tests. We intentionally did not include testing capacity as a covariate, so as not to overfit the model (eg, by controlling for a factor that was itself associated with rising viral transmission). Fourth, because the random county and metropolitan area intercepts explained additional variation, there are likely other unmeasured county factors that we did not capture. Fifth, our proxy for social distancing used cellular telephone records and may not have captured all movement and gathering within a county; it requires further validation as a proxy for the distancing associations we were measuring.
Other unmeasured factors might include commuter automobile traffic, public transportation usage, and domestic and international flights, which decreased during the study period. Early local epidemics were clearly seeded by international travel, which contributed to early transmission in some locations. 38 Further investigation will be needed as communities reopen to examine the association of these additional time-varying factors with the risk of SARS-CoV-2 transmission.