Comment & Response
April 2017

Uncertainties in Big Data When Using Internet Surveillance Tools and Social Media for Determining Patterns in Disease Incidence—Reply

Author Affiliations
  • 1Francis I. Proctor Foundation, Department of Ophthalmology, University of California, San Francisco, San Francisco
  • 2Department of Epidemiology and Biostatistics, University of California, San Francisco, San Francisco
JAMA Ophthalmol. 2017;135(4):402-403. doi:10.1001/jamaophthalmol.2017.0140

In Reply Finding imperfections in big data is like “harpooning a blimp—it’s impossible to miss, and every thrust is likely to be fatal.”1 Dr Benke lists reasons why social media may not perfectly represent, for example, the true burden of a disease. There are several reasons to choose from, as the Sommer Editorial2 indicates. Existing streams of data, including Google searches and Tweets used in our report, are neither population-based nor unbiased.3 Even more troubling are the instances in which activity would pass current deep-learning filters but have no relation to the true outcome of interest. Interest in “pink eye” could have risen because of a 2008 movie of that name. No one is saying that increased activity surrounding “smallpox” in January 2005 suggested a true underlying epidemic of the disease.4
