Development and Validation of a Deep Learning Model for Detection of Allergic Reactions Using Safety Event Reports Across Hospitals

This cross-sectional study describes the development and evaluation of a deep learning model to identify allergic reaction in the free-text narrative of hospital safety reports.

This supplementary material has been provided by the authors to give readers additional information about their work.

eMethods. Efficiency and Productivity
We compared our deep learning approach with conventional keyword search in terms of manual review effort (efficiency) and positive case yield (productivity). To compare efficiency, we determined the number of reports requiring manual review under each approach. For the ADNN model, we used a false-negative rate of 0.5% as the cut-off for the number of reports, n, requiring manual validation; that is, we manually reviewed the top-n reports identified from Datasets II-IV until the last 200 reviewed reports contained only one positive case. For the keyword-search approach, we used the 101 expert-curated keywords to identify all possible cases from Datasets II-IV and calculated the number of cases requiring manual review. To compare productivity, we determined the number of true cases identified by each approach. For the ADNN model, this was the number of top-n reviewed reports manually labeled as allergic reactions. For the keyword search, we estimated the number of true cases as follows: for each of Datasets III and IV, we split all reports extracted by keyword search into two subsets, the first containing reports also identified by ADNN and the second containing reports identified only by keyword search. For the first subset, we used the number of positive cases found by manual review when evaluating the model. For the second subset, we estimated the number of positive cases from the precision of the keyword-search approach on 1000 randomly selected, manually reviewed reports. That is, we first manually reviewed 1000 randomly selected cases identified by keywords and calculated the precision. We then estimated the number of positive cases ($\hat{P}$) by multiplying the precision by the total number of cases within each subset: $\hat{P} = \frac{p}{1000} \times N$, where $p$ is the number of positive cases among the 1000 randomly selected cases and $N$ is the total number of cases within each subset.
Lastly, we summed the numbers of positive cases from the two subsets to obtain the number of positive cases extracted by keyword search. For Dataset II, the reports did not contain any keywords, so the number of positive cases retrieved by keywords was 0. In addition, we calculated the precision (i.e., the proportion of true positives among the identified cases) of each approach in identifying allergic reactions for each dataset. Finally, we conducted an error analysis for both approaches and investigated the major causes of errors.
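The review cut-off and the keyword-search estimate described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the function names, parameter names, and counts in the usage lines are hypothetical, and the stopping rule is read here as "at most one positive in the last 200 reviewed reports". Only the 200-report window, the single positive case, and the sample size of 1000 come from the text.

```python
from typing import Sequence


def reports_to_review(labels_ranked: Sequence[int], window: int = 200,
                      max_positives: int = 1) -> int:
    """Return n, the number of top-ranked reports to review manually.

    labels_ranked holds manual labels (1 = allergic reaction, 0 = not) in
    descending order of model probability. Review stops once the most recent
    `window` reports contain at most `max_positives` positives
    (1 / 200 = 0.5%). This reading of the rule is an assumption.
    """
    running = 0  # positives inside the current trailing window
    for i, label in enumerate(labels_ranked):
        running += label
        if i >= window:
            running -= labels_ranked[i - window]  # label leaving the window
        if i + 1 >= window and running <= max_positives:
            return i + 1
    return len(labels_ranked)  # rule never triggered: all reports reviewed


def estimate_keyword_positives(overlap_positives: int, sample_positives: int,
                               keyword_only_total: int,
                               sample_size: int = 1000) -> float:
    """Estimate true cases retrieved by keyword search in one dataset.

    overlap_positives:  validated positives among reports also found by the model
    sample_positives:   positives among the randomly sampled keyword reports (p)
    keyword_only_total: reports found only by keyword search (N)
    """
    # Equivalent to precision * N with precision = p / sample_size;
    # multiplying before dividing avoids rounding the precision first.
    estimated_keyword_only = sample_positives * keyword_only_total / sample_size
    return overlap_positives + estimated_keyword_only


# Hypothetical counts, for illustration only.
total = estimate_keyword_positives(overlap_positives=120, sample_positives=50,
                                   keyword_only_total=4000)
```

With these made-up counts the keyword-only subset contributes 50 / 1000 x 4000 = 200 estimated positives, which are added to the 120 validated positives from the overlapping subset.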

Interpretability and Keyword Extension
The ADNN attention layer assigned each input word a weight that measures how much attention the model gives to that word when predicting allergic reactions. To identify words with high attention weights, we selected reports with a predicted probability greater than 0.8 of being allergic reactions. We extracted the words with an attention weight at least two standard deviations above the average weight within that report and generated a list of "high attention keywords" detected by the model. We compared the "high attention keywords" with the 101 expert-curated keywords to identify new keywords extended by the model. We similarly identified a set of key phrases: for each selected report, we extracted consecutive words with attention weights at least one standard deviation above the average weight within the report and aggregated the key phrases across those reports.
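The thresholding described above can be illustrated with a short Python sketch. It is a hedged illustration rather than the authors' implementation: the function names, the toy report, its attention weights, and the two-word expert list are hypothetical. The choice of the population standard deviation and the requirement that a key phrase contain at least two words are assumptions the text does not settle.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def high_attention_words(words: Sequence[str], weights: Sequence[float],
                         n_sd: float = 2.0) -> List[str]:
    """Words whose attention weight is >= mean + n_sd * SD within one report."""
    threshold = mean(weights) + n_sd * pstdev(weights)  # population SD (assumed)
    return [w for w, a in zip(words, weights) if a >= threshold]


def key_phrases(words: Sequence[str], weights: Sequence[float],
                n_sd: float = 1.0, min_len: int = 2) -> List[str]:
    """Runs of consecutive words whose weight is >= mean + n_sd * SD."""
    threshold = mean(weights) + n_sd * pstdev(weights)
    phrases: List[str] = []
    current: List[str] = []
    for w, a in zip(words, weights):
        if a >= threshold:
            current.append(w)
            continue
        if len(current) >= min_len:  # close the run that just ended
            phrases.append(" ".join(current))
        current = []
    if len(current) >= min_len:  # run that reaches the end of the report
        phrases.append(" ".join(current))
    return phrases


# Toy report with made-up attention weights (they sum to 1).
words = ["pt", "developed", "facial", "swelling", "after",
         "the", "dose", "was", "given", "today"]
weights = [0.02, 0.02, 0.40, 0.44, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02]

keywords = high_attention_words(words, weights)   # two-SD rule
phrases = key_phrases(words, weights)             # one-SD rule
expert_keywords = {"rash", "hives"}               # hypothetical expert list
new_keywords = set(keywords) - expert_keywords    # keywords extended by the model
```

In this toy report only "swelling" clears the two-standard-deviation threshold, while "facial swelling" forms a key phrase under the looser one-standard-deviation rule.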

Extraction of Common Allergic Reactions
There were 2378 validated allergic events in total in Datasets II, III, and IV. We categorized these reactions into groups (Table 2). Each allergic reaction group includes one or more reaction keywords (Table 2, column "Included Keywords"). We calculated the frequency of each allergic reaction as follows: we counted the number of allergic events in which any keyword in the "Included Keywords" column was a high attention keyword (attention weight at least two standard deviations above the average weight within the report) and then divided that count by the number of validated true allergic events (i.e., 2378). We then ranked all the reactions and reported the 10 most common allergic reactions.
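The frequency calculation can be sketched as below. This is an illustrative sketch only: the function name, the two reaction groups, and the four toy events are hypothetical stand-ins for the groups in Table 2 and the 2378 validated events, and each event is assumed to be already reduced to its set of high attention keywords.

```python
from collections import Counter
from typing import Dict, List, Sequence, Set, Tuple


def reaction_frequencies(events: Sequence[Set[str]],
                         groups: Dict[str, Set[str]],
                         top_k: int = 10) -> List[Tuple[str, float]]:
    """Rank reaction groups by the share of events that mention them.

    events: one set of high attention keywords per validated allergic event
    groups: reaction group name -> its included keywords
    """
    if not events:
        return []
    counts: Counter = Counter()
    for high_attention in events:
        for group, keywords in groups.items():
            if high_attention & keywords:  # any included keyword is present
                counts[group] += 1         # count each event once per group
    total = len(events)
    ranked = sorted(((g, c / total) for g, c in counts.items()),
                    key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


# Hypothetical groups and events, for illustration only.
groups = {"rash": {"rash", "hives"}, "itching": {"itching", "pruritus"}}
events = [{"hives"}, {"rash", "itching"}, {"swelling"}, {"rash"}]
ranking = reaction_frequencies(events, groups)
```

An event that matches several keywords of the same group is counted once for that group, and an event can contribute to more than one group, so the frequencies need not sum to 1.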

eFigure 1. Overall Framework of the Deep Learning Model
Green, yellow and red circles represent the character-level embeddings, character-sequence representation, and word-level embeddings, respectively.
$\alpha_t$ is the attention weight of the t-th word. The character-level representation encodes the character sequence of each word with a one-layer CNN. The output of the character CNN is concatenated with the word embedding to build the word representation. The word representation is fed into a bidirectional LSTM to capture the context information of the sentence. An attention layer is applied on top of the word representation layer to calculate the attention weight for each word. The final report-level representation is the weighted sum of all the word vectors within the report, which takes the context of the report into account. The classifier was trained using the cross-entropy loss function and the stochastic gradient descent (SGD) optimizer. The output of the classifier is a vector representing the probability that a report contains an allergic reaction.
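The attention pooling step in this caption, in which the report-level representation is the attention-weighted sum of the word vectors, can be sketched in plain Python. This is an illustration only: the caption does not say how attention scores are computed, so the sketch assumes a dot product between each word vector and a hypothetical context vector, followed by a softmax. The function name and the toy vectors are not from the source.

```python
import math
from typing import List, Sequence, Tuple


def attention_pool(word_vectors: Sequence[Sequence[float]],
                   context: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Return (attention weights, report vector) for one report.

    word_vectors: one context-aware vector per word (e.g. BiLSTM outputs)
    context:      hypothetical scoring vector (assumed, not in the source)
    """
    # Score each word by its dot product with the context vector.
    scores = [sum(h_i * c_i for h_i, c_i in zip(h, context))
              for h in word_vectors]
    # Softmax, shifted by the maximum score for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    norm = sum(exps)
    weights = [e / norm for e in exps]
    # Report representation: weighted sum of the word vectors.
    dim = len(word_vectors[0])
    report = [sum(w * h[d] for w, h in zip(weights, word_vectors))
              for d in range(dim)]
    return weights, report


# Toy two-word report with two-dimensional word vectors.
weights, report = attention_pool([[1.0, 0.0], [0.0, 1.0]], context=[0.0, 0.0])
```

With a zero context vector every word receives the same score, so both toy words get a weight of 0.5 and the report vector is their plain average.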

eFigure 2. Allergy Keywords
This graph illustrates the importance and frequency of allergic reaction keywords created by clinical experts and detected by the model. The word frequency was calculated by dividing the number of occurrences of a keyword by the total number of words in all reports with greater than 0.8 predicted probability of being an event of allergic reaction. The word importance is the average of a keyword's attention weights in the reports with greater than 0.8 predicted probability in which it appeared. Green squares represent the overlapping keywords identified by both experts and the model. Yellow triangles represent keywords that were only included in the expert-curated list. Blue circles represent extended keywords only identified by the model's attention mechanism. Additional details about these keywords are listed in eTable 1.

Patient had bovie grounding pads on bilateral thighs. They were on the anterior-lateral portion of the thighs. Upon removal of the pads a reddened and purplish area was noted on the periphery where the grounding pads were. Area was not raised and patient wasn't complaining of any itching or pain.

Reasons for
3 Anti-allergy medications used for non-allergic reasons Pt was on a steroid taper of IV Solumedrol to be changed to prednisone po. Pt. recieved IV dose of Solumderol at 9AM and should have recieved po prednisone at 9pm. The order was approved but the next dose was scheduled for 9pm on 9/26/10 instead of the appropriate time of 9pm on 9/25/10. The patient then missed thier evening dose of prednisone.
3 Not an adverse event, but contains allergic reaction terms Patient stated no allergies to CT contrast dye and has had it in the past with no problems. No allergy was documented in the chart. Patient was given 17ml of iodinated contrast at 3:33pm and vomited immediately after. Patient drank water and said she felt ok and to continue with the exam. Radiologist believes this was a physiological reaction but wants the patient to wait 30 minutes after the exam to make sure he does not have a reaction.
…
10 Report mentions body part (e.g., throat) commonly involved in allergic reaction, but not allergic reaction During the closing count, discovered missing half of the throat pack. 1 whole throat pack opened during the case and cut into half. the other half is missing. MD notified and denied inserting a throat pack to the patient. x-ray without evidence of throat pack on the patient.

5 Adverse reaction, but not an allergic reaction Pt underwent a permanent pacemaker insertion; subsequently developed a rash to left axilla area; area red with yellow pustules; patient complaints of intermitten pain to affected area; physician assistant, physician notified.
2 Not an adverse event, but the report contained discharge instructions that listed some allergic reactions that the patient should watch for Pt extremely impatient to leave hospital once he learned that he was medically stable to be discharged. Pt unwilling to wait additional minute to sign discharge instructions. Pt agreed to return to hospital if he experiences fever, increase in diarrhea, or other new symptoms such as shortness of breath, fever, rash, and other symptoms.
1
a 100 failure cases were randomly selected and manually reviewed.
b Reports modified slightly to anonymize patient, provider, and institution. bx: biopsy. PCA: patient care assistant. Pt: patient. Sxs: symptoms. VSS: vital signs stable.
c The patient likely had shingles, a viral infection that causes a painful rash, and was treated with Neurontin.