JAMA Forum Archive, 2012-2019: Health policy commentary from leaders in the field

Risk, Benefit, and Fairness in a Big Data World

There is much hope and even more hype that research using big data derived from a large volume of electronic health records, in combination with claims and a variety of health-related sources, can improve access to health care, raise its quality, and lower its cost. The ability of computers to learn from large databases to the point that they can outperform clinicians could lower health care costs and improve access and quality. But patients could foil these efforts by refusing to share their data for research if they perceive more harms than benefits.

The major risk to patients is the exposure of their personal health information. Although much of the research can be done with deidentified data, the combining of information into large datasets increases the potential for the data to be reidentified and used in ways patients never would have intended.

Privacy Concerns

Exposure of information is a risk that is hard to quantify, both in how likely it is to occur and in what kind of negative consequences might ensue. Nevertheless, privacy is highly valued, and there is growing concern about the use of health data because of very real risks of discrimination in health insurance, life insurance, and employment. These practices are now largely limited by law, but the laws are being challenged and the sophistication of big data use by commercial entities suggests these concerns are not unfounded.

Current-day human subject protections for medical research emerged with the publication of the Belmont Report. It was written in the 1970s, when the potential benefits and harms of research using large health care–related databases could not have been anticipated. The Belmont Report identified 3 primary principles for ethical use of human subjects for research purposes: beneficence (do what is best for the person), autonomy (respect the individual’s own values and opinions, implemented by informed consent), and justice (ascertain that the risks of the study do not outweigh the benefits). The principles outlined in the report were subsequently codified in law as the standards for informed consent and approval by institutional review boards (IRBs). Institutional review boards are tasked with assessing the “risks” and “benefits” consistent with the beneficence and justice principles.

Research using big data is most similar to, but also distinct from, epidemiology studies. There is less potential for participants in epidemiology studies to directly benefit from the research than for participants in clinical trials in which they might gain timely access to new effective treatments. Participants in epidemiology studies may still derive some benefit by knowing that they have engaged in an altruistic act that can help future patients. While this benefit is minimal, so are the perceived harms, and IRBs often approve epidemiological studies without a high standard of informed consent.

In the case of research with big data, the potential benefits to participants may be even smaller than they are in epidemiological studies. Since patients whose information is included in research using big data are likely to be unaware of how their health care information is being used, they may be deprived of even the altruistic satisfaction that comes with contributing to research.

Potential for Harms

The potential for harm in big data research is also greater than it is in most epidemiological studies. This is related not only to the privacy risk and its ensuing consequences, but also to the potential for study participants to feel exploited.

The developers of products derived from the use of big data are aiming to sell them for large profits. In fact, much of the research published to date has been done by large private data companies such as Google, without any commitment to acknowledge the contributions of study participants or to make the derived products available to the public. Individuals might rightfully feel taken advantage of if they suspect that their health information is being monetized, without their consent, for the financial benefit of private investors.

There is little precedent for considering this kind of “harm,” but it was a central element of the highly visible case of Henrietta Lacks. The patient’s family sued on her behalf for harms it perceived related to her not being a party to the substantial financial benefits derived from her biopsied cervical tissue. While there may be differences in what it means to contribute one’s cells versus one’s health care data, patients can be exploited by the health care research process. This might explain the growing reluctance of many to participate in health care–related research.

If justice is to serve as a defining principle for research that uses big data, as it has for other forms of health care–related research, then it may be important to develop a model for distributing benefits in a way that includes study participants. Sharing the financial benefits available from the mining of big data in a fair, respectful, and inclusive way with patients could help to counterbalance perceived harms of data privacy invasion and exploitation. Providing a means for patients to financially benefit from research could also increase their willingness to participate.

Sharing financial benefits with patients requires resolving some practical matters, including determining what constitutes fair compensation and establishing a method for distributing the financial benefits. One approach might be to create an upfront license for consenting patients that results in a royalty payment every time their data are included in a research study. Alternatively, consenting patients could be offered an equity stake in products derived from their data. There are no doubt other options and permutations to consider, but the main point is that if big data research is to conform to the principles that apply to other forms of health care research, it needs to offer greater benefits to participants to offset the potential harms.

About the authors:
Christine Cassel, MD, is the Presidential Chair and Visiting Professor in the Department of Medicine at UCSF, where she is chairing the Ethics and Policy Advisory Committee for big data projects using machine learning to improve quality and safety. She is also working on projects in aging and longevity, the role of technology in health care, and biomedical ethics. From 2016 to 2018, Dr Cassel was Planning Dean for the new Kaiser Permanente School of Medicine. From 2013 to 2016 she was the President and CEO of the National Quality Forum; prior to that, she served as president and CEO of the American Board of Internal Medicine and the ABIM Foundation.
Andrew Bindman, MD, is professor of medicine and epidemiology & biostatistics based within the Philip R. Lee Institute for Health Policy Studies at the University of California, San Francisco (UCSF). He is a former Director of the Agency for Healthcare Research and Quality and a member of the National Academy of Medicine.