Key Points
Question
Are county-level cell phone location data associated with the rate of change of coronavirus disease 2019 (COVID-19) cases?
Finding
In this cohort study, greater reductions in cell phone activity in the workplace, transit stations, and retail locations and greater increases in activity at the residence were associated with a lower incidence of COVID-19 cases 5, 10, and 15 days later.
Meaning
Using county-level cell phone location data may aid in assessing activities that may presage increases or decreases in COVID-19 cases.
Importance
It is unknown how well cell phone location data portray social distancing strategies or if they are associated with the incidence of coronavirus disease 2019 (COVID-19) cases in a particular geographical area.
Objective
To determine if cell phone location data are associated with the rate of change in new COVID-19 cases by county across the US.
Design, Setting, and Participants
This cohort study incorporated publicly available county-level daily COVID-19 case data from January 22, 2020, to May 11, 2020, and county-level daily cell phone location data made publicly available by Google. It examined the daily cases of COVID-19 per capita and daily estimates of cell phone activity compared with the baseline (where baseline was defined as the median value for that day of the week from a 5-week period between January 3 and February 6, 2020). All days and counties with available data after the initiation of stay-at-home orders for each state were included.
Exposures
The primary exposure was cell phone activity compared with baseline for each day and each county in different categories of place.
Main Outcomes and Measures
The primary outcome was the percentage change in COVID-19 cases 5 days from the exposure date.
Results
Between 949 and 2740 US counties and between 22 124 and 83 745 daily observations were studied depending on the availability of cell phone data for that county and day. Marked changes in cell phone activity occurred around the time stay-at-home orders were issued by various states. Counties with higher per-capita cases (per 100 000 population) showed greater reductions in cell phone activity at the workplace (β, −0.002; 95% CI, −0.003 to −0.001; P < .001), areas classified as retail (β, −0.008; 95% CI, −0.011 to −0.005; P < .001) and grocery stores (β, −0.006; 95% CI, −0.007 to −0.004; P < .001), and transit stations (β, −0.003; 95% CI, −0.005 to −0.002; P < .001), and a greater increase in activity at the place of residence (β, 0.002; 95% CI, 0.001-0.002; P < .001). After adjustment for county-level and state-level characteristics, counties with the greatest decline in workplace activity, transit stations, and retail activity and the greatest increases in time spent at residential places had lower percentage growth in cases at 5, 10, and 15 days. For example, counties in the lowest quartile of retail activity had a 45.5% lower growth in cases at 15 days compared with the highest quartile (SD, 37.4%-53.5%; P < .001).
Conclusions and Relevance
Our findings support the hypothesis that greater reductions in cell phone activity in the workplace and retail locations, and greater increases in activity at the residence, are associated with lesser growth in COVID-19 cases. These data provide support for the value of monitoring cell phone location data to anticipate future trends of the pandemic.
The first case of coronavirus disease 2019 (COVID-19) was reported from Wuhan, China, in December 2019. The city of Wuhan was placed under lockdown. Based on available reports, the rate of case increase started to decline within 2 to 3 weeks. Outbreaks in other major Chinese cities were also successfully contained with such lockdowns.1
COVID-19 subsequently spread to other parts of the world.2 The US federal government declared a state of emergency in early 2020. Stay-at-home measures were advised to a different degree and at different times across the US. As of May 25, 2020, the US was the nation with the highest number of reported COVID-19 cases and deaths worldwide. It remains unclear how stay-at-home orders have affected human behavior in the US and what might affect the response to them.
In 2019, nearly 81% of US adults owned a smartphone.3 Smartphones are capable of transmitting location data, and many US prediction models are using, or plan to use, cell phone location data as a tool to help summarize human behavior as a means to understand disease spread and inform policy. However, evidence to support cell phone location data as a marker of decline in growth rate of cases is lacking. Initial work described associations between aggregate location data and social distancing; however, to our knowledge, a study of associations with disease growth rates while considering regional confounding factors is lacking.4,5
We evaluated associations between county-specific and state-specific characteristics and the percentage change in each county in cell phone activity in multiple categories of place (ie, workplace, residence) during the period after stay-at-home measures were advised. We estimated the associations between cell phone activity on a given day and the rate of growth in new cases 5, 10, and 15 days later. We hypothesized that counties that demonstrated a greater decline in cell phone activity in the workplace and retail locations and a greater increase in residential places would show a slower percentage increase in cases 5 days later.
We conducted a cohort study that incorporated publicly available data, including daily reported cases of COVID-19 from January 22, 2020, to May 11, 2020. We accessed the number of new daily reported COVID-19 cases per capita in each US county from the Coronavirus COVID-19 Global Cases dashboard hosted by the Center for Systems Science and Engineering at Johns Hopkins University (Baltimore, Maryland).6
Google has made available aggregated and anonymized cell phone location data from users with a Google account on their cell phone who opted in to have their location data available to Google Location History. These data are reported by Google as a percentage change from baseline activity in which the baseline is defined as the median value for that day of the week from a 5-week period between January 3 and February 6, 2020. These data were analyzed by individual county, and residence location data represent the percentage change in the time spent at the residence compared with the baseline. Other location data (eg, workplace, retail) aim to represent the change in total visitors to such locations.7 We did not perform any calibration for individual county-level differences that might affect the accuracy of cell phone activity. For example, counties may vary by the association of seasonal changes with activity in certain categories of place and regarding the adequacy of representation of places of interest within the region.
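The baseline definition above can be illustrated with a minimal sketch. Google publishes the values already expressed as a percentage change, so this code is not part of the study's pipeline; it only shows, on hypothetical raw daily counts, how a day-of-week median baseline and the resulting percentage change could be computed. The function names and the input layout (a dictionary from date to a raw activity value) are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date
from statistics import median

# Baseline window described in the text: January 3 to February 6, 2020.
BASELINE_START = date(2020, 1, 3)
BASELINE_END = date(2020, 2, 6)


def weekday_baseline(daily_values):
    """Return the median value per weekday (0 = Monday) in the baseline window.

    daily_values: dict mapping datetime.date -> raw activity value
    (hypothetical input; the published data are already percentage changes).
    """
    by_weekday = defaultdict(list)
    for day, value in daily_values.items():
        if BASELINE_START <= day <= BASELINE_END:
            by_weekday[day.weekday()].append(value)
    return {wd: median(vals) for wd, vals in by_weekday.items()}


def percent_change_from_baseline(day, value, baseline):
    """Percentage change of one day's value against the same weekday's baseline."""
    base = baseline[day.weekday()]
    return 100.0 * (value - base) / base
```

Using the median rather than the mean keeps a single unusual day in the 5-week window (for example, a holiday) from shifting the baseline for that weekday.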
Counties were defined as rural based on data from the National Center for Health Statistics at the US Centers for Disease Control and Prevention (CDC).8 State-level data, including the proportion of self-reported race, age, and sex, were collected from 2018 American Community Survey from the US Census Bureau.9 Population density was calculated from data from the 2019 US Census Bureau population estimates.10 State obesity rates were obtained from the CDC’s Behavioral Risk Factor Surveillance System.11 Medical insurance data, health care resources, state spending on health care, median family income, and the percentage of people living in poverty were obtained from census data, publicly available data from other governmental bureaus, including the US Department of Education, and other publicly available sources, including the Kaiser Family Foundation database.12-14 The number of tests performed by each state was obtained from the COVID Tracking Project.15
The dates of stay-at-home orders for each state were determined from state health department websites. States that did not have formal stay-at-home orders (4 of 51 [8%]) were assumed to be following similar policy after April 7, 2020, based on the presence of policies across the US. This work was not considered human participants research by the University of Pennsylvania institutional review board and thus was exempt from approval.
Cell phone use in 6 categories of places (workplace, retail, transit stations, grocery stores, parks, and residences) was visualized over time by calculating the cross medians and then using the cross medians as knots to fit a cubic spline. The spline was graphed as a line plot to illustrate the changes in behavior throughout this period. County-level and state-level factors were explored to identify factors associated with higher and lower levels of cell phone activity in different categories of place compared with the baseline. Multivariable linear regression incorporating generalized estimating equations (exchangeable correlation matrices and robust estimators) was used to identify county-level and state-level factors independently associated with the change in cell phone activity compared with the baseline in each category of place, clustering on county to account for multiple observations on different days. Counties and days with insufficient cell phone location data were excluded. This accounts for the difference in observations studied in each category of place. This modeling was used to inform the selection of confounders to be included in subsequent analyses. Quartiles of cell phone activity compared with the baseline in different categories of place were defined for each county in the period after the stay-at-home orders were issued in that state. The percentage change in cell phone activity compared with baseline within each quartile was summarized, with the highest quartile representing the greatest activity in that category of place and the lowest quartile representing the lowest level of activity.
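The quartile grouping described above might be sketched as follows. The text does not specify how each county was summarized before quartiles were formed, so this sketch assumes one value per county (the mean percentage change from baseline over the post-order period); the function name and input layout are hypothetical, and the cut-point method is simply the default of Python's `statistics.quantiles`, not necessarily the one used in the analysis.

```python
from bisect import bisect_left
from statistics import mean, quantiles


def county_quartiles(post_order_activity):
    """Map each county to an activity quartile (1 = lowest, 4 = highest).

    post_order_activity: dict of county -> list of daily percentage changes
    from baseline, restricted to days after the stay-at-home order.
    """
    # Assumption: each county is summarized by its mean post-order value.
    county_means = {c: mean(v) for c, v in post_order_activity.items()}
    # Three cut points that split the county means into four groups.
    cuts = quantiles(list(county_means.values()), n=4)
    # bisect_left counts the cut points lying strictly below the county mean.
    return {c: bisect_left(cuts, m) + 1 for c, m in county_means.items()}
```

With this convention, quartile 1 holds the counties with the largest reductions in activity and quartile 4 the counties whose activity stayed closest to (or above) the baseline, matching the ordering used in the text.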
The primary outcome for the study was the percentage change in new cases per capita 5 days from the current date. Only days during which the case rate was at least 0.1 per 100 000 residents were analyzed. A lag of 5 days was based on prior publications suggesting this as the median incubation period.16 Additional analyses also explored a lag of 10 and 15 days. The outcome was log-transformed to approximate a normal distribution for linear models. Multivariable linear regression models incorporating generalized estimating equations were used to determine associations between cell phone activity in a single category of place on a given day and the percentage of growth in cases 5 days later, adjusting for a number of county-level and state-level confounders and clustering on county. To account for nonlinear associations with baseline case rates and time from the initiation of the stay-at-home orders, we included squared terms for these variables. Outcomes were estimated from regression models at 5, 10, and 15 days across quartiles, exponentiated, and then the differences in percentage growth by quartile were graphed. To determine the potential association of state testing capacity with estimates, sensitivity analyses were performed, adjusting for the total number of tests performed per capita on that day in that state. To test for an improvement in model fit with inclusion of the cell phone activity measures, we generated the quasi-likelihood under the independence model criterion (QIC) for each model to assess the model before and after the inclusion of each cell phone activity measure. Analyses were conducted using Stata, version 14.4 (StataCorp), and statistical significance was set at α = .05.
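The outcome construction can be illustrated with a short sketch. The exact transformation used in the analysis is not spelled out in the text, so the code assumes the outcome is the natural log of the ratio of the case rate at day t + lag to the case rate at day t, which is then exponentiated back to a percentage change. The function names, the input layout, and the handling of days with zero later cases are assumptions made for illustration only.

```python
import math

# Eligibility threshold from the text: at least 0.1 new cases per 100 000.
MIN_RATE = 0.1


def lagged_log_growth(rates, lag=5):
    """Return (day_index, log growth ratio) pairs for eligible days.

    rates: list of daily new-case rates per 100 000, in date order.
    Assumption: growth is measured as rate[t + lag] / rate[t] on a log scale.
    """
    out = []
    for t in range(len(rates) - lag):
        now, later = rates[t], rates[t + lag]
        # Skip days below the threshold; also skip zero later rates,
        # because log(0) is undefined (an assumption, not from the text).
        if now >= MIN_RATE and later > 0:
            out.append((t, math.log(later / now)))
    return out


def log_ratio_to_percent(log_ratio):
    """Convert a log growth ratio back to a percentage change."""
    return 100.0 * (math.exp(log_ratio) - 1.0)
```

Calling `lagged_log_growth(rates, lag=10)` or `lag=15` gives the longer lags explored in the additional analyses. On the log scale, a difference between two groups exponentiates to a ratio of growth rates, which is why model estimates are back-transformed before percentage differences are reported.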
Factors Associated With Cell Phone Activity in Different Places
Figure 1 represents activity in the previously described 6 categories of place before and after stay-at-home orders were issued in individual states. Marked changes in activities began shortly before stay-at-home orders were initiated, including a reduction in activity in locations outside the residence and an increase in activity inside the residence.
Several county-level characteristics were associated with changes in activity after the initiation of the stay-at-home orders. For example, counties and days with higher case rates (per 100 000) experienced greater reductions in cell phone activity at the workplace, retail stores, grocery stores, and transit stations and a greater increase in activity at the place of residence on the same day (Table 1; a positive coefficient represents greater activity in that category of place). As time from the stay-at-home orders increased, there was an increase in cell phone activity at the workplace, transit, retail, and grocery locations and a decrease in the activity at residence. For example, on average, there was a modest increase in retail activity of approximately 0.5% (95% CI, 0.48%-0.53%) per day from the time of the initial stay-at-home order. Rural counties demonstrated more modest percentage reductions in cell phone activity at the workplace (5.7%), transit (5.1%), retail (2.9%), and grocery (3.4%) locations compared with urban counties even after adjusting for county population, case rates, and other state-level characteristics. Rural counties demonstrated greater reductions in activity in parks (−25.3%) and less of an increase at the place of residence (−2.2%).
State-level factors were also associated with changes in cell phone activity compared with the baseline in different categories of place. For example, the reduction in workplace activity was more modest if the county was in a state that had a greater proportion of people older than 65 years, greater proportion of children younger than 18 years, higher proportion of African American individuals, higher proportion of people living in poverty, higher graduation rates, more hospital beds per capita, and lower population density.
Greater increases from the baseline in activity at places of residence were observed in states with a lower proportion of older adults, a higher population density, higher gross domestic product (GDP), lower rates of individuals without insurance, higher proportion of GDP spent on health care, and a lower proportion of people living in poverty. Factors associated with activity at retail and transit locations, grocery stores, and parks are summarized in Table 1.
Associations Between Place-Specific Cell Phone Activity and Case Growth Rates
While most counties demonstrated changes in activity after the initiation of the stay-at-home orders, there was some variability (Table 2). Mean (SD) activity in the highest quartile for workplace activity decreased by 25% (4.8%) compared with baseline activity. In contrast, the mean (SD) activity in the lowest quartile for workplace activity decreased by 51% (6.3%) from baseline, suggesting a more substantial change. The range between the highest and lowest quartile for transit activity was greater, with the highest quartile seeing a mean (SD) reduction of only 6.5% (9.6%) and the lowest quartile seeing a reduction of 58.5% (10.1%).
Counties with the least decline in workplace, transit station, and retail activity and the least increase in time spent at the residence had a higher percentage growth in cases at 5, 10, and 15 days (Figure 2; eTable 1 in the Supplement). For example, counties in the highest quartile of retail activity had a 44% higher growth rate at 15 days compared with those in the lowest quartile. The inclusion of cell phone activity in the regression models improved model fit, as demonstrated by a reduction in the QIC (eTable 1 in the Supplement).
Conversely, greater residential activity was associated with lower growth rates at 5, 10, and 15 days. Counties in the highest quartile of residential activity had a 19% lower growth rate at 15 days compared with counties in the lowest quartile. Activity in parks and grocery stores showed more modest associations with the change in the rate of new cases.
A higher percentage growth in new cases at 5 and 15 days was observed in counties that were more urban and highly populated. Higher growth was also associated with a lower baseline case rate, shorter time from initiation of the stay-at-home order, higher state population density, higher median family income, higher poverty rates, higher obesity rates, and lower rates of individuals lacking insurance (eTable 2 in the Supplement). Adjusting for the total number of tests performed by state per capita per day did not change the estimates (data not shown).
We observed that urban counties with higher populations and a higher number of cases per unit population saw a larger relative decline in activity outside of place of residence and a greater increase in residential activity after the institution of stay-at-home orders. An increase in workplace activity and reduction in residential activity over time since the institution of stay-at-home orders suggests waning adherence to the orders over time. Additionally, this study demonstrated that cell phone activity in different counties is associated with case growth rates at 5, 10, and 15 days later, suggesting that these measures may be useful in monitoring and identifying areas at risk for more rapid rates of growth in new cases during the epidemic.
Some of the factors associated with changes in cell phone activity are intuitive. For example, reductions in workplace activity and increases in residential activity were greater in counties and on days with a greater number of new cases, when there was likely a higher perceived risk of infection. Counties with a high rate of cases were likely to also have more restrictions in place. Even with adjustment for case rates, rural and less populated counties demonstrated a more modest change from baseline activities. This may be because of a perceived lower risk of infection in rural settings, differences in normal activities, and possibly individual beliefs and preferences.
Several state characteristics were associated with cell phone activity in this study. While it is difficult to infer causation, certain associations are notable. For example, states with lower poverty rates; greater population density; a lower proportion of older residents, children, and African American residents; and a higher percentage of their GDP spent on health care demonstrated greater declines in workplace activity. We might hypothesize that individuals in higher-income areas have more resources to enable them to stay at home or that these states house jobs that provide more opportunity to work at home. However, this observation may be confounded because early outbreaks were reported in the metropolitan areas of New York, New Jersey, and California.
Perhaps the most important observation of this study was that a decrease in activity at the workplace, transit stations, and retail locations and an increase in activity at the place of residence were associated with significantly lower growth in COVID-19 cases at 5, 10, and 15 days. The immediate implication of these results is that they support the use of cell phone data as a measure of adherence to stay-at-home advisories and as a prognostic measure that may help to identify areas at greatest risk for more rapid growth of the epidemic. Perhaps reassuringly, activity at grocery stores and in areas classified as parks was not strongly associated with rates of growth in cases. However, it is difficult to assess the direct effect of these individual activities.
While this study was not designed with the goal of identifying other factors associated with changes in growth rates, we did observe associations with several county-level and state-level factors. For example, higher percentage growth rates were observed in counties that had lower baseline case rates and were more urban. State factors associated with higher growth rates included higher population density, higher obesity rates, higher poverty rates, higher proportion of older adults, and lower rates of people without insurance. While it is difficult to infer causality from these individual associations, the findings suggest that there are county-level and state-level factors that are likely to affect growth rates. The associations between cell phone activity and growth rates were independent of these measured county-level and state-level characteristics.
A limitation of this study is the potential for selection bias, because only counties in which cell phone data were available could be studied. For example, Google derives data from users who have opted in to have their information available through Google Location History, and days may be missing when there were insufficient data to preserve privacy and for other reasons not specified.
How this type of selection bias might affect the observed associations with growth rates is not predictable. While our study was able to adjust for many confounders at the county and state level, there may be other differences at the county level, including other precautionary measures and restrictions, that were occurring at the same time as social distancing, such as more personal protective equipment use in health care settings. Some states also advised or mandated mask wearing during the study period. In addition, because no county-level calibration was performed for cell phone data, misclassification of activity in particular categories of place is possible, although we might expect this misclassification to be largely nondifferential. The reporting of cases of COVID-19 may vary by county, and other mechanisms of data collection and data dashboards could conceivably yield different results. Finally, while the study validates the use of cell phone data to follow the effective adherence to stay-at-home advisories in a county, it does not provide a clear sense of risk to an individual who is exposed to one of these activities. In addition, because all counties were under some sort of restriction, this study does not speak to the overall association of the stay-at-home mandates with growth rates of the epidemic, but rather perhaps the effect of greater and lesser adherence to these orders.
Future study may be warranted to better understand the associations of cell phone data with growth in cases during the period after the loosening of stay-at-home restrictions. A larger question remains as to whether collection and sharing of individual data are valuable for public health initiatives aimed to halt the spread of disease. The potential benefits of such data would need to be weighed against the risks of violation of an individual’s privacy.
Our findings support the hypothesis that cell phone location data for a given day, at the workplace, in retail locations, and at the residence are associated with the rate of growth in cases 5, 10, and 15 days later. These data help to demonstrate the use of cell phone location data as a way to monitor adherence to stay-at-home practices and their potential to predict future trends of the pandemic in association with the adequacy of social distancing across the US.
Accepted for Publication: July 8, 2020.
Corresponding Authors: Shiv T. Sehra, MD, Mount Auburn Hospital, 330 Mount Auburn St, Cambridge, MA 02138 (ssehra1@mah.harvard.edu); Joshua F. Baker, MD, MSCE, Hospital of the University of Pennsylvania, 3400 Spruce St, 5 White Bldg, Philadelphia, PA 19104 (joshua.baker@pennmedicine.upenn.edu).
Published Online: August 31, 2020. doi:10.1001/jamainternmed.2020.4288
Author Contributions: Drs Sehra and Baker had full access to all the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis.
Concept and design: Sehra, Wiebe, Baker.
Acquisition, analysis, or interpretation of data: Sehra, George, Fundin, Baker.
Drafting of the manuscript: Sehra, George, Baker.
Critical revision of the manuscript for important intellectual content: All authors.
Statistical analysis: Baker.
Administrative, technical, or material support: Sehra, Fundin.
Supervision: Sehra, Wiebe.
Conflict of Interest Disclosures: Dr George reported grants from Bristol-Myers Squibb and personal fees from AbbVie outside the submitted work. Dr Baker reported personal fees from Gilead and Bristol-Myers Squibb outside the submitted work. No other disclosures were reported.
Funding/Support: Dr Baker receives funding through a Veterans Affairs Clinical Science Research & Development Merit Award (I01 CX001703).
Role of the Funder/Sponsor: The funders had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.
Disclaimer: The contents of this work do not represent the views of the US Department of Veterans Affairs or the United States government.
Additional Contributions: We thank Criswell Lavery, BA, University of Pennsylvania, for her help with acquiring data. She did not receive any monetary compensation for her help with this article.
1. Lau H, Khosrawipour V, Kocbach P, et al. The positive impact of lockdown in Wuhan on containing the COVID-19 outbreak in China. J Travel Med. 2020;27(3):taaa037. doi:10.1093/jtm/taaa037
2. Lai CC, Shih TP, Ko WC, Tang HJ, Hsueh PR. Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and coronavirus disease-2019 (COVID-19): the epidemic and the challenges. Int J Antimicrob Agents. 2020;55(3):105924. doi:10.1016/j.ijantimicag.2020.105924
5. Gao S, Rao J, Kang Y, Liang Y, Kruse J. Mapping county-level mobility pattern changes in the United States in response to COVID-19. Accessed June 22, 2020. https://arxiv.org/abs/2004.04544
16. Lauer SA, Grantz KH, Bi Q, et al. The incubation period of coronavirus disease 2019 (COVID-19) from publicly reported confirmed cases: estimation and application. Ann Intern Med. 2020;172(9):577-582. doi:10.7326/M20-0504