Original Investigation
July 25, 2019

Assessment of Deep Natural Language Processing in Ascertaining Oncologic Outcomes From Radiology Reports

Author Affiliations
  • 1 Division of Population Sciences, Dana-Farber Cancer Institute, Boston, Massachusetts
  • 2 Thoracic Oncology Program, Dana-Farber Cancer Institute, Boston, Massachusetts
  • 3 Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, Massachusetts
  • 4 Department of Imaging, Dana-Farber Cancer Institute, Boston, Massachusetts
  • 5 Department of Informatics, Dana-Farber Cancer Institute, Boston, Massachusetts
JAMA Oncol. 2019;5(10):1421-1429. doi:10.1001/jamaoncol.2019.1800
Key Points

Question  Can deep natural language processing of radiologic reports be used to measure real-world oncologic outcomes, including disease progression and response to therapy?

Findings  In a cohort study of 2406 patients with lung cancer, the findings suggested that deep learning models may estimate human curations of the presence of active cancer, cancer worsening/progression, and cancer improvement/response in radiologic reports with good discrimination (area under the receiver operating characteristic curve, >0.90). Statistically significant associations between these end points and overall survival were observed.

Meaning  Deep natural language processing may be able to extract clinically relevant oncologic end points from radiologic reports.


Abstract

Importance  A rapid learning health care system for oncology will require scalable methods for extracting clinical end points from electronic health records (EHRs). Outside of clinical trials, end points such as cancer progression and response are not routinely encoded into structured data.

Objective  To determine whether deep natural language processing can extract relevant cancer outcomes from radiologic reports, a ubiquitous but unstructured EHR data source.

Design, Setting, and Participants  A retrospective cohort study evaluated 1112 patients who underwent tumor genotyping for a diagnosis of lung cancer and participated in the Dana-Farber Cancer Institute PROFILE study from June 26, 2013, to July 2, 2018.

Exposures  Patients were divided into curation and reserve sets. Human abstractors applied a structured framework to radiologic reports for the curation set to ascertain the presence of cancer and changes in cancer status over time (ie, worsening/progressing vs improving/responding). Deep learning models were then trained to capture these outcomes from report text and subsequently evaluated in a 10% held-out test subset of curation patients. Cox proportional hazards regression models compared human and machine curations of disease-free survival, progression-free survival, and time to improvement/response in the curation set, and measured associations between report classification and overall survival in the curation and reserve sets.
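As an illustration of the held-out evaluation described above, the following minimal sketch assigns whole patients, rather than individual reports, to the training or test subset, so that no patient contributes reports to both. The function names, the fixed random seed, and the toy data are assumptions made for this example; the abstract does not describe the exact splitting procedure used in the study.

```python
import random


def split_patients(patient_ids, test_fraction=0.10, seed=0):
    """Randomly assign whole patients to train or test (hypothetical helper).

    Splitting by patient keeps every report from one patient on the same
    side of the split, which avoids leakage between training and testing.
    """
    unique_ids = sorted(set(patient_ids))
    rng = random.Random(seed)  # fixed seed is an assumption, for reproducibility
    rng.shuffle(unique_ids)
    n_test = max(1, round(len(unique_ids) * test_fraction))
    test_ids = set(unique_ids[:n_test])
    train_ids = set(unique_ids[n_test:])
    return train_ids, test_ids


def partition_reports(reports, test_ids):
    """Split (patient_id, report_text) pairs according to patient assignment."""
    train, test = [], []
    for patient_id, text in reports:
        (test if patient_id in test_ids else train).append((patient_id, text))
    return train, test


# Toy data: 20 patients with 3 reports each (not study data).
reports = [(pid, f"report {k} for patient {pid}") for pid in range(20) for k in range(3)]
train_ids, test_ids = split_patients([pid for pid, _ in reports])
train_reports, test_reports = partition_reports(reports, test_ids)
```

With a 10% test fraction, the toy example holds out 2 of 20 patients and all 6 of their reports.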

Main Outcomes and Measures  The primary outcome was area under the receiver operating characteristic curve (AUC) for deep learning models; secondary outcomes were time to improvement/response, disease-free survival, progression-free survival, and overall survival.
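For readers less familiar with the primary outcome, the AUC can be read as the probability that a randomly chosen positive report receives a higher model score than a randomly chosen negative report, with ties counted as one half. The sketch below computes it by direct pairwise comparison. The function name and the toy labels and scores are assumptions for illustration only and are not taken from the study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    labels: 1 for reports the human curator marked positive, 0 otherwise.
    scores: model-predicted probabilities for the same reports.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Toy example (not study data): 3 positive and 3 negative reports.
example_labels = [1, 0, 1, 1, 0, 0]
example_scores = [0.9, 0.2, 0.7, 0.4, 0.5, 0.1]
example_auc = roc_auc(example_labels, example_scores)  # 8 of 9 pairs ranked correctly
```

The pairwise form is quadratic in the number of reports, which is acceptable for a small test subset; a rank-based implementation would be preferable for large report collections.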

Results  A total of 2406 patients were included (mean [SD] age, 66.5 [10.8] years; 1428 [59.7%] female; 2170 [90.2%] white). Radiologic reports (n = 14 230) were manually reviewed for 1112 patients in the curation set. In the test subset (n = 109), deep learning models identified the presence of cancer, improvement/response, and worsening/progression with accurate discrimination (AUC >0.90). Machine and human curation yielded similar measurements of disease-free survival (hazard ratio [HR] for machine vs human curation, 1.18; 95% CI, 0.71-1.95); progression-free survival (HR, 1.11; 95% CI, 0.71-1.71); and time to improvement/response (HR, 1.03; 95% CI, 0.65-1.64). Among 15 000 additional reports for 1294 reserve set patients, algorithm-detected cancer worsening/progression was associated with decreased overall survival (HR for mortality, 4.04; 95% CI, 2.78-5.85), and improvement/response was associated with increased overall survival (HR, 0.41; 95% CI, 0.22-0.77).
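The hazard ratios above can be sanity-checked on the log scale. The sketch below assumes the intervals are Wald-type 95% CIs, symmetric around the log hazard ratio. This is a common convention for Cox models, but the abstract does not state how the intervals were computed. Under that assumption, the geometric midpoint of each interval should sit close to the reported HR, and an interval that includes 1 indicates no statistically significant difference. The function name and dictionary keys are hypothetical.

```python
import math


def log_scale_summary(hr, lower, upper, z_crit=1.96):
    """Summarize a reported HR and 95% CI on the log scale.

    Assumes a Wald interval: log(HR) +/- z_crit * SE (an assumption,
    not something the abstract states).
    """
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    midpoint = math.exp((math.log(lower) + math.log(upper)) / 2)
    return {
        "se_log_hr": se,                     # implied standard error of log(HR)
        "geometric_midpoint": midpoint,      # should be close to the reported HR
        "z": math.log(hr) / se,              # implied Wald statistic
        "excludes_1": lower > 1 or upper < 1,
    }


# (label, HR, lower, upper) as reported in the Results paragraph.
reported = [
    ("disease-free survival, machine vs human", 1.18, 0.71, 1.95),
    ("progression-free survival, machine vs human", 1.11, 0.71, 1.71),
    ("time to improvement/response, machine vs human", 1.03, 0.65, 1.64),
    ("overall survival, worsening/progression", 4.04, 2.78, 5.85),
    ("overall survival, improvement/response", 0.41, 0.22, 0.77),
]

summaries = {label: log_scale_summary(hr, lo, hi) for label, hr, lo, hi in reported}
```

In this reading, the three machine vs human comparisons have intervals that span 1, consistent with similar measurements, whereas both overall survival associations exclude 1.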

Conclusions and Relevance  Deep natural language processing appears to speed curation of relevant cancer outcomes and facilitate rapid learning from EHR data.