The internet provides ready access to information related to coronavirus disease 2019 (COVID-19). With a simple web search, individuals can find symptom checkers, locate testing sites, and get tips for keeping themselves safe.
However, online information seeking related to COVID-19 may carry privacy risks. Prior research has shown that web pages visited by individuals seeking health information frequently contain code that initiates data transfers to third parties, such as online advertisers.1 These transfers often include URLs of visited pages and users’ IP addresses. When third parties have code on multiple web pages, they can build detailed profiles of specific individuals’ browsing behaviors and interests. This practice, known as “web tracking,” can reveal sensitive information about individuals’ health conditions and concerns to parties who wish to profit from it.1
To better understand the privacy risks of online information seeking related to COVID-19, we assessed the prevalence and characteristics of web tracking on COVID-19–related web pages.
To identify web pages likely to be visited by individuals seeking COVID-19–related information, we used Google Trends to identify the top 25 search queries related to COVID and coronavirus in the US on May 15, 2020. We retrieved the top 20 URLs for each query using nonpersonalized Google searches.
We visited each unique web page using webXray, an automated tool that detects third-party tracking on websites.1 For each web page, we recorded data requests from third-party domains, that is, domains other than that of the website being visited. These requests are significant because they initiate data transfers from a user’s computer to third parties. We also recorded the presence of third-party cookies, which are data stored on a user’s computer that often serve as persistent identifiers, allowing users to be tracked across multiple websites.
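The distinction between first-party and third-party requests described above can be illustrated with a minimal sketch. This is not webXray's implementation, which is not described here. The function names and example URLs are hypothetical, and the domain comparison is deliberately naive: it treats the last two labels of a hostname as the site's domain, whereas a careful implementation would likely consult the Public Suffix List.

```python
from urllib.parse import urlsplit


def base_domain(url: str) -> str:
    """Return a naive base domain: the last two labels of the hostname.

    Assumption: suffixes such as .co.uk are not handled; this is a
    simplification for illustration only.
    """
    host = (urlsplit(url).hostname or "").lower()
    labels = host.split(".")
    return ".".join(labels[-2:]) if len(labels) >= 2 else host


def third_party_requests(page_url: str, request_urls: list[str]) -> list[str]:
    """Keep only requests whose base domain differs from the page's own."""
    first_party = base_domain(page_url)
    return [
        url for url in request_urls
        if base_domain(url) not in ("", first_party)
    ]


# Hypothetical page and the requests it triggers when loaded.
page = "https://www.example.gov/covid"
requests_seen = [
    "https://static.example.gov/site.css",                        # same site
    "https://analytics.tracker-example.com/collect?page=/covid",  # third party
    "https://cdn.example.net/lib.js",                             # third party
]

flagged = third_party_requests(page, requests_seen)
for url in flagged:
    print(url)
```

In this sketch, the stylesheet served from a subdomain of the visited site is treated as first party, while the two requests to other domains are flagged. The second of these carries the visited path in its query string, the kind of URL disclosure noted earlier.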
We calculated the percentages and 95% confidence intervals of web pages that included any third-party data request or any third-party cookie and the median number of third-party data requests and third-party cookies per page, overall and by website type (categorized by top-level domain). We compared results across website types. Using webXray’s database of corporate owners of third-party domains, we calculated the most prevalent tracking entities. Analysis was conducted in R version 4.0.2 (R Foundation).
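As a worked illustration of the interval calculation, the sketch below computes a 95% confidence interval for a proportion in Python. The interval method used in the study is not stated, and the analysis itself was run in R, so the Wilson score interval and the function name `wilson_interval` are assumptions made only for illustration. The counts are those reported in the results.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    Assumption: z = 1.96 for a two-sided 95% interval; the study's
    actual interval method is not stated.
    """
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# Counts taken from the reported results (538 unique web pages).
request_ci = wilson_interval(535, 538)  # pages with a third-party data request
cookie_ci = wilson_interval(477, 538)   # pages with a third-party cookie

for label, (low, high) in [("request", request_ci), ("cookie", cookie_ci)]:
    print(f"{label}: {low:.1%} to {high:.1%}")
```

Under this assumption, the intervals round to 98%-100% and 86%-91%, matching the reported values. That agreement does not establish which method the authors used, since other interval methods may round to the same figures.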
Overall, 535 of 538 (99%; 95% CI, 98%-100%) unique web pages included a third-party data request, with no significant differences by website type, while 477 (89%; 95% CI, 86%-91%) included a third-party cookie (Table 1). Third-party cookies were slightly less common, although still highly prevalent, on government and academic web pages than on commercial web pages. However, the median numbers of third-party data requests and third-party cookies per page were both higher on commercial web pages (77 requests; 130 cookies) than on government (8 requests; 4 cookies), nonprofit (16 requests; 7 cookies), or academic (14 requests; 10 cookies) web pages.
Most (95%; 95% CI, 93%-97%) web pages included a data request from a third-party domain owned by Google, while 7 other companies received data from at least 40% of web pages studied (Table 2).
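Entity-level prevalence of this kind can be derived by mapping each third-party domain to its corporate owner and counting each owner at most once per page. The sketch below shows that logic with illustrative data only. The page names, the small ownership table, and the function name `owner_prevalence` are hypothetical and are not drawn from webXray's database or from the study data.

```python
from collections import Counter


def owner_prevalence(pages: dict[str, set[str]], owner_of: dict[str, str]) -> dict[str, float]:
    """Share of pages that contact at least one domain owned by each entity.

    pages:    page identifier -> set of third-party domains requested
    owner_of: third-party domain -> corporate owner (illustrative table)
    """
    counts = Counter()
    for domains in pages.values():
        # Use a set so an owner with several domains counts once per page.
        owners_on_page = {owner_of[d] for d in domains if d in owner_of}
        counts.update(owners_on_page)
    total = len(pages)
    return {owner: n / total for owner, n in counts.most_common()}


# Illustrative ownership table, not the study's database.
owner_of = {
    "google-analytics.com": "Google",
    "doubleclick.net": "Google",
    "facebook.net": "Facebook",
}

# Hypothetical pages and the third-party domains each one contacted.
pages = {
    "page-a": {"google-analytics.com", "doubleclick.net"},
    "page-b": {"google-analytics.com", "facebook.net"},
    "page-c": {"doubleclick.net"},
    "page-d": {"unlisted-cdn.example"},
}

prevalence = owner_prevalence(pages, owner_of)
for owner, share in prevalence.items():
    print(f"{owner}: {share:.0%}")
```

Counting owners rather than domains matters because a single company may operate many tracking domains. A per-domain tally would understate how much of the web one entity can observe.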
This study found that 99% of COVID-19–related web pages included a third-party data request, and 89% included a third-party cookie. By comparison, a prior study of 1 million popular web pages found that 91% included a third-party data request and 70% included a third-party cookie.2
Third-party tracking was pervasive even among government and academic COVID-19–related web pages, on which visitors might reasonably expect greater privacy protections. Decision-makers at these institutions may be unaware of third-party tracking on their websites because they do not realize that tools used to monitor website traffic transmit data to third parties.
This study had limitations. First, only 2 mechanisms of third-party tracking were investigated. Because other means of third-party tracking exist, including some designed to evade automated capture, these findings likely underestimate the extent of third-party tracking. Second, because this study was limited to web pages that appeared in the top 20 results for a given Google query, findings may not generalize to web pages with lower search rankings or searches performed using other search engines.
Amid debate and legislative activity focused on the privacy implications of COVID-19 contact-tracing apps, these findings suggest that attention should also be paid to privacy risks of online information seeking.3
Corresponding Author: Matthew S. McCoy, PhD, Department of Medical Ethics & Health Policy, Perelman School of Medicine, University of Pennsylvania, 423 Guardian Dr, Blockley Hall, Philadelphia, PA 19104 (email@example.com).
Accepted for Publication: August 10, 2020.
Published Online: September 8, 2020. doi:10.1001/jama.2020.16178
Author Contributions: Drs McCoy and Friedman had full access to all of the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis.
Concept and design: McCoy, Grande, Friedman.
Acquisition, analysis, or interpretation of data: All authors.
Drafting of the manuscript: McCoy.
Critical revision of the manuscript for important intellectual content: All authors.
Statistical analysis: Libert, Buckler, Friedman.
Administrative, technical, or material support: McCoy, Libert, Grande.
Supervision: McCoy, Friedman.
Conflict of Interest Disclosures: Dr McCoy reported being an uncompensated member of the University of Pennsylvania’s Data Ethics Working Group, funded in part through industry gifts to the university. Dr Libert reported receipt of grants from the Defense Advanced Research Projects Agency, CyLab Security and Privacy Institute, and Carnegie Mellon University and consulting with litigants and regulators on matters related to online privacy. No other disclosures were reported.
T. An automated approach to auditing disclosure of third-party data collection in website privacy policies. In: Proceedings of the 2018 World Wide Web Conference. International World Wide Web Conferences Steering Committee; 2018:207-216. doi:10.1145/3178876.3186087