With the routine use of electronic health records (EHRs) in hospitals, health systems, and physician practices, there has been rapid growth in the availability of health care data over the last decade. In addition to the structured data in EHRs, new methods such as natural language processing can derive meaning from unstructured data, permitting the capture of substantial clinical information embedded in clinical notes. Furthermore, the growth in the availability of registries and claims data and the linkages between all these data sources have created a big data platform in health care, vast in both size and scope.