The promise that big data will transform health care has yet to be fulfilled. Even though essentially all medical care now leaves an electronic trail, data are underused as a way to create knowledge about the safety and effectiveness of medical treatments, the generalizability of care practices, and the effects of medical coverage policies. National leaders have called for greater use of electronic data to generate evidence.
For example, the National Institutes of Health’s strategic plan for data science1 anticipates an important role for clinical data, and the 21st Century Cures Act requires that the US Food and Drug Administration (FDA) increase its use of evidence from clinical practice settings. The National Academy of Medicine has proposed a virtual health data trust,2 asserting that clinical data should be a core utility. Yet these calls for increasing data use are occurring even as public concern is rising about secondary use of personal information.3
Fulfilling the promise will be challenging for 3 reasons: confidentiality and proprietary concerns, the cost and work required to make raw data usable for analyses, and the need to create incentives for data holders that outweigh the disadvantages. Patients and health systems—insurers, hospitals, medical groups, and integrated delivery systems that generate and hold data—have legitimate concerns that sharing data may lead to unintended uses by unauthorized parties. No data sharing agreement can guarantee against this outcome. Related to this, organizations may wish to avoid disclosing data that may be valuable to competitors. In addition, clinical and administrative data are rarely usable for analyses of large populations without extensive curation by individuals with deep knowledge of local clinical care and informatics systems.
It is often unnecessary to share individual-level data. Instead, health systems can create data enclaves, enabling the sharing of information derived from the data rather than sharing the actual data. The Centers for Medicare & Medicaid Services Virtual Research Data Center is an example within which investigators can conduct approved analyses of Medicare and some Medicaid claims without ever taking possession of the data.4 Many enclaves require that employees of the health system, rather than external investigators, conduct analyses (Figure). All enclaves release only results.
Data enclaves address 2 major barriers to data sharing. First, they allow health systems to protect patients’ interests and their own by maintaining physical and operational control, permitting the systems to opt in or out of proposed analyses. Second, they obviate the need to build new secure systems. Data enclaves enable involved parties to minimize the risks of data sharing and focus on the benefits of proposed analyses, rather than on negotiations about data reuse, security provisions, and related considerations.
Multiple enclaves from different health systems can be linked to create distributed data networks in which the systems format their data identically, then execute identical analytic programs on their own data. Typically, data enclaves in a network need only share aggregate results, such as counts or coefficients. Examples of distributed data networks (and their sponsors) include the Vaccine Safety Datalink (Centers for Disease Control and Prevention), Sentinel System (FDA), Cancer Research Network (National Cancer Institute), Mental Health Research Network (National Institute of Mental Health), Addiction Research Network (National Institute on Drug Abuse), Cardiovascular Research Network (National Heart, Lung, and Blood Institute), and the Patient-Centered Outcomes Research Network (Patient-Centered Outcomes Research Institute).
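The distributed pattern described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used by any of the networks named: the record fields (`exposed`, `outcome`), the function names `local_analysis` and `pool`, and the sample rows are all hypothetical. Each site runs the same program on identically formatted data and returns only counts; the coordinating center pools those counts without seeing individual-level rows.

```python
from collections import Counter

# Hypothetical records in a shared format (a stand-in for a common data model).
# All rows are invented for illustration only.
SITE_A = [
    {"patient_id": "a1", "exposed": True, "outcome": False},
    {"patient_id": "a2", "exposed": True, "outcome": True},
    {"patient_id": "a3", "exposed": False, "outcome": False},
]
SITE_B = [
    {"patient_id": "b1", "exposed": False, "outcome": False},
    {"patient_id": "b2", "exposed": True, "outcome": False},
    {"patient_id": "b3", "exposed": False, "outcome": True},
]


def local_analysis(rows):
    """Run inside one enclave: return aggregate counts only, never rows."""
    counts = Counter()
    for row in rows:
        group = "exposed" if row["exposed"] else "unexposed"
        counts[group] += 1
        if row["outcome"]:
            counts[group + "_events"] += 1
    return dict(counts)


def pool(site_results):
    """Run at the coordinating center: sum the per-site aggregate counts."""
    total = Counter()
    for result in site_results:
        total.update(result)  # Counter.update adds counts key by key
    return dict(total)


pooled = pool([local_analysis(SITE_A), local_analysis(SITE_B)])
```

Because `pool` receives only the dictionaries of counts, patient identifiers never leave the site that holds them. A production network would add further safeguards, such as suppressing small cell counts, which this sketch omits.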
Some of these data networks include the records of more than 100 million individuals. They have enabled important collaborations that would have been difficult or impossible if comprehensive individual-level data sets had to be shared. For example, distributed data from 3 national insurers were used to assess rotavirus vaccination in more than 500 000 infants, identifying a 1.5 per 100 000 population excess risk of subsequent intussusception.5
The need for extensive and costly data curation to make raw clinical and administrative data useful for population-level analyses is widely underappreciated. Data curation requires system- and question-specific knowledge. Activities range from simple tasks such as redacting patients’ names from scanned images to complex work such as detecting data anomalies that can lead to erroneous conclusions.
For example, in a multisite study of colorectal cancer, one health system initially appeared to have superior survival. Detailed investigation determined that this difference was an artifact related to censoring patients’ coverage at death, making them appear to have disenrolled from the study when they had actually died.
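A small sketch can show how this kind of artifact arises. It is an illustration only, not the method or data of the study described: the follow-up times are invented, and `kaplan_meier` is a hypothetical minimal helper written for this example. When deaths are recorded as disenrollment, they are treated as censored observations, and the estimated survival is inflated.

```python
def kaplan_meier(records):
    """Return the Kaplan-Meier survival estimate at the end of follow-up.

    records: list of (time, died) pairs; died=False means censored.
    """
    survival = 1.0
    at_risk = len(records)
    for time in sorted({t for t, _ in records}):
        deaths = sum(1 for t, died in records if t == time and died)
        leaving = sum(1 for t, _ in records if t == time)
        if deaths:
            survival *= 1 - deaths / at_risk
        at_risk -= leaving  # deaths and censored patients both leave the risk set
    return survival


# Hypothetical follow-up in months; True means a death was recorded.
correctly_coded = [(6, True), (12, False), (18, True), (24, False)]

# The same patients, with every death miscoded as disenrollment (censored).
miscoded = [(time, False) for time, _ in correctly_coded]

print(kaplan_meier(correctly_coded))  # 0.375
print(kaplan_meier(miscoded))         # 1.0
```

With identical patients, the miscoded site appears to have no deaths at all. Detecting this requires knowing how the local system records the end of coverage, which is the kind of system-specific knowledge that data curation depends on.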
Data enclaves do not solve the need to support the financial and opportunity costs of data curation. Each health system has a limited number of individuals with appropriate expertise for data curation, and curating data may detract from their primary responsibilities. System-based researchers with the requisite skills may consider expanding data access to outsiders as a competitive threat.
Given these challenges, prospective external users need to make participation worthwhile to health systems. A prime benefit for health systems is the clinical and methodological expertise of collaborators, which enhances the systems’ ability to use their own data.
For example, Health Care Systems Research Network groups have aligned their systems’ electronic data using a common data model with a standard format that enables multiple users to readily interpret data variables, conduct quality assurance, and share analytic code. Common data models provide platforms for creating useful clinical and operational tools. Systems that allow queries built on common data models have enabled medical groups to conduct rapid data analyses for quality improvement and to automatically fulfill public health reporting requirements.6,7
A key complementary approach to creating incentives for maintaining data enclaves will be to make information sharing a priority for patients, clinicians, and purchasers of health care. For example, these constituents could advocate for multicenter research on rare diseases or studies of alternative management approaches for common conditions such as hypertension or asthma. Health systems may be more willing to participate in information sharing if their constituents perceive clear benefits.
Advocates of better access to clinical data will engage data holders if their proposed work creates information that directly benefits health system leaders, clinicians, and patients. Wider adoption of data enclaves and distributed data networks could contribute to success. Efforts are more likely to succeed when they are collaborative, engage health system–based experts in interpreting the data, and yield clinical and operational enhancements. Patients, researchers, health systems, and the public all may stand to benefit from efforts to share information, even without sharing data.
Corresponding Author: Tracy Lieu, MD, MPH, Division of Research, Kaiser Permanente Northern California, 2000 Broadway, Oakland, CA 94612 (firstname.lastname@example.org).
Published Online: August 6, 2018. doi:10.1001/jama.2018.9342
Conflict of Interest Disclosures: Both authors have completed and submitted the ICMJE Form for Disclosure of Potential Conflicts of Interest and none were reported.
Platt R, Lieu T. Data Enclaves for Sharing Information Derived From Clinical and Administrative Data. JAMA. Published online August 06, 2018. doi:10.1001/jama.2018.9342