Plots show cumulative post count (y-axis) at each date (x-axis) for time-to-detection analysis. A and B, Papulopustular (acneiform) eruption and nail and finger changes were first described in association with erlotinib (Tarceva; Genentech) in case reports published in September 2005 and September 2006, respectively.4,5 Inspire posts for these reactions appeared 5 and 3 months in advance of publication, respectively. Collectively, for these epidermal growth factor receptor inhibitor (EGFRi)–associated reactions and for autoimmune blistering reactions and psoriasis flares on programmed cell death–1 inhibitor treatment, Inspire forum posts describing these ADRs preceded initial case reports by an average of 7 months (range, 3-9 months). C, Twenty-three distinct users described hypohidrosis in a causal relationship with erlotinib as early as 2006, with a significantly enriched proportional reporting ratio (1.90), implicating hypohidrosis as a novel, missed, rare ADR. The line at January 2017 indicates the initial clinical documentation. The vertical line at 2016 shows the last analyzed Inspire content.
Ransohoff JD, Nikfarjam A, Jones E, et al. Detecting Chemotherapeutic Skin Adverse Reactions in Social Health Networks Using Deep Learning. JAMA Oncol. 2018;4(4):581–583. doi:10.1001/jamaoncol.2017.5688
Adverse drug reactions (ADRs) occur in nearly all patients undergoing anticancer therapy, contributing to morbidity, therapy disruptions, and rising health care costs.1 Their identification and characterization are hampered by clinical trials that are underpowered to detect rare events, the division of patients across institutions, patient exclusion from trials, publication editorial delays, and lack of participation and planning in oncology clinical trials of medical disciplines outside of oncology. Postmarket drug surveillance platforms, such as US Food and Drug Administration (FDA) monitoring, rely on voluntary, spontaneous reporting and lack temporal advantage over literature. Early recognition of ADRs could substantially improve health outcomes and decrease societal costs. Internet community health forums provide a mechanism for several hundred million individuals to discuss current health concerns and may serve as a resource for computational detection of ADRs. However, the language in social media is highly informal, and expressed medical concepts are often nontechnical, descriptive, and challenging to extract using dictionary-based methods.
Herein, we demonstrate proof-of-principle early detection of chemotherapy-associated skin ADRs from social health networks. We use a deep learning–based signal generation pipeline to capture how patients describe cutaneous eruptions in their own words, and statistical methods to quantify the association strength of our target drug-ADR pairs.
We extracted mentions of common and rare cutaneous ADRs from 8 million posts in the Inspire health forum (https://www.inspire.com/) related to the epidermal growth factor receptor (EGFR) inhibitor, erlotinib, or the immune checkpoint programmed cell death–1 (PD-1) inhibitors, nivolumab and pembrolizumab.
To detect ADR mentions, we used DeepHealthMiner (DHM),2 a deep learning named entity recognition tool, and mapped extracted mentions to relevant concepts in the Unified Medical Language System (UMLS). To quantify the drug-ADR association strength, a proportional reporting ratio (PRR) was calculated and compared with drug-ADR pairs with no known associations to calibrate the threshold at which the PRR represents true ADR signal.3 To establish time-to-detection comparisons against literature, we reviewed extractions, excluding noncausal drug-ADR mentions, and compared the frequency and timing of these detections against published clinical reports. An institutional review board protocol was not required by Stanford University.
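To make the PRR step concrete, the following is a minimal sketch only. The letter does not spell out the formula used, so the code assumes the standard 2×2 contingency definition from pharmacovigilance, PRR = [a/(a+b)] / [c/(c+d)]. The function name, argument names, and any counts passed to it are hypothetical and are not taken from the study.

```python
def proportional_reporting_ratio(a, b, c, d):
    """Standard PRR from a 2x2 table of post counts (assumed definition).

    a: posts mentioning the target drug AND the target ADR
    b: posts mentioning the target drug WITHOUT the target ADR
    c: posts mentioning other drugs AND the target ADR
    d: posts mentioning other drugs WITHOUT the target ADR
    """
    if min(a, b, c, d) < 0:
        raise ValueError("counts must be non-negative")
    if a + b == 0 or c + d == 0:
        raise ValueError("each drug group needs at least one post")

    # Share of target-drug posts that mention the ADR.
    target_rate = a / (a + b)
    # Share of comparator-drug posts that mention the ADR.
    comparator_rate = c / (c + d)

    if comparator_rate == 0:
        # The ADR never appears with comparator drugs; the ratio is unbounded.
        return float("inf")
    return target_rate / comparator_rate
```

A PRR near 1 would mean the ADR is mentioned about as often with the target drug as with the comparators; larger values indicate disproportionate reporting.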
Our system achieved a microaverage precision of 0.90 for named entity recognition of our target ADRs by manual validation. We report the PRR for each target drug-ADR pair and the distribution of the PRR values for 81 drug-ADR pairs with negative associations (median, 0.12; mean, 0.2; maximum, 1.4), which served as experimental negative controls. The PRR for more than 95% of negative drug-ADR pairs is less than 0.82; thus, a drug-ADR pair with PRR greater than 1 is likely to be a true-positive.
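The calibration against negative controls can be illustrated with an empirical percentile of the negative-control PRR distribution. This is a sketch under stated assumptions: the letter reports only summary statistics for the 81 negative pairs, so the nearest-rank percentile method, the function names, and any input values are hypothetical choices made for illustration, not the authors' procedure.

```python
import math


def empirical_percentile(values, percentile):
    """Nearest-rank percentile of a list of PRR values (assumed method)."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < percentile <= 100:
        raise ValueError("percentile must be in (0, 100]")
    ordered = sorted(values)
    # Nearest-rank: smallest value with at least `percentile`% of data at or below it.
    rank = math.ceil(percentile * len(ordered) / 100)
    return ordered[rank - 1]


def exceeds_negative_controls(prr, negative_control_prrs, percentile=95):
    """True if a candidate PRR lies above the chosen negative-control percentile."""
    threshold = empirical_percentile(negative_control_prrs, percentile)
    return prr > threshold
```

In this framing, a candidate pair is flagged only when its PRR is higher than what nearly all pairs with no known association produce, which mirrors the reasoning that a PRR greater than 1 is likely to be a true positive.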
To temporally benchmark Inspire content against publications and clinical presentations, we compared causal drug-ADR mentions of erythematous eruption and nail changes with erlotinib, and psoriasis flares and blistering reactions with immune checkpoint inhibitors in the Inspire database with first-published clinical reports. Known ADRs were reported at frequencies comparable with those of published reports but with significantly enriched PRR scores (Table) and an average lead time of 7 months in advance of literature reporting4,5 (range, 3-9 months) (Figure, A and B). In addition, we detected 23 novel cases of hypohidrosis in patients receiving erlotinib (Figure, C) with an enriched PRR score of 1.90, which may represent a rare, missed ADR that has been present in online discussion for more than 11 years. EGFR is expressed in sweat glands and is involved in the hypohidrotic ectodermal dysplasia phenotype,6 suggesting a mechanism by which EGFR inhibition can produce hypohidrosis.
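The lead-time comparison reduces to counting calendar months between the first causal forum post and the first published report. The sketch below assumes whole-month granularity and ignores the day of the month; the function names are hypothetical, and any dates supplied are the caller's own inputs rather than values from the Inspire dataset.

```python
from datetime import date


def lead_time_months(first_post, first_publication):
    """Whole calendar months by which a forum post precedes a publication.

    Positive values mean the post came first. Day of month is ignored.
    """
    return (
        (first_publication.year - first_post.year) * 12
        + (first_publication.month - first_post.month)
    )


def mean_lead_time_months(pairs):
    """Average lead time over (first_post, first_publication) date pairs."""
    if not pairs:
        raise ValueError("pairs must not be empty")
    lead_times = [lead_time_months(post, pub) for post, pub in pairs]
    return sum(lead_times) / len(lead_times)
```

For example, a post dated April 2005 and a case report dated September 2005 give a lead time of 5 months, consistent with the 5-month figure stated for papulopustular eruption.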
Several hundred million individuals discuss health-related issues in online forums, offering a robust resource for drug safety surveillance.5 Our deep learning pipeline extracts mentions of cutaneous ADRs with high precision from the highly informal text in social health networks, detecting ADRs an average of 7 months ahead of clinical reports. In addition, it uncovered a previously unreported cutaneous ADR. We demonstrate the capacity of deep learning–based methods to detect ADRs from online health forums, offering the potential for real-time pharmacosurveillance with rapid discovery of ADRs preceding FDA detection and published clinical reports.
Corresponding Author: Kavita Sarin, MD, PhD, Department of Dermatology, Stanford University School of Medicine, 450 Broadway St, Pavilion C, Second Floor—MC5334, Redwood City, CA 94063 (email@example.com).
Accepted for Publication: December 13, 2017.
Published Online: March 1, 2018. doi:10.1001/jamaoncol.2017.5688
Author Contributions: Drs Sarin and Shah had full access to all the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis.
Study concept and design: Ransohoff, Nikfarjam, Kwong, Sarin, Shah.
Acquisition, analysis, or interpretation of data: All authors.
Drafting of the manuscript: Ransohoff, Nikfarjam, Sarin.
Critical revision of the manuscript for important intellectual content: All authors.
Statistical analysis: Ransohoff, Nikfarjam, Jones, Shah.
Obtained funding: Shah.
Administrative, technical, or material support: Nikfarjam, Loew.
Study supervision: Nikfarjam, Kwong, Sarin, Shah.
Conflict of Interest Disclosures: None reported.
Funding/Support: This work was partially supported by National Institutes of Health grant No. 5R01GM101430-05.
Role of the Funder/Sponsor: The funding source had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.
Additional Information: Ms Ransohoff and Dr Nikfarjam are co–first authors. Drs Kwong, Sarin, and Shah are senior authors.
Additional Contributions: We thank the Inspire team for making this dataset available to us for analysis: Peter Hartzler, AB, chief technical officer; Jeff Terkowitz, BA, senior director of product; and Kathryn Ticknor, MA, senior manager, research and insights. They were not compensated for their assistance.