Patients with limited English proficiency experience communication barriers to health care in English-speaking countries. Written communication improves comprehension,1 but pretranslated standard instructions cannot address patient-specific issues (eg, medication titration). Machine translation tools, including Google Translate (GT), have potential to improve communication with these patients, but prior studies showed limited accuracy; 1 study found that GT Spanish translations of patient education materials were 60% accurate, with 4% resulting in serious error.2
In 2017, GT changed its translation algorithm, claiming significant improvement.3 In this study, we assess the use of GT to translate emergency department (ED) discharge instructions into Spanish and Chinese.
We abstracted 100 free-texted ED discharge instructions and oversampled for medication changes and common complaints.4 We analyzed each sentence by content category; Flesch-Kincaid readability score; use of medical jargon,5 such as atypical use of normal words (eg, positive test result) or medical terminology; and presence of nonstandard English (spelling or grammar errors, abbreviations, colloquial English, proper nouns). Content categories included explanation of diagnosis and/or results, follow-up instructions, medication instructions, return precautions, and greeting.
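For reference, the Flesch-Kincaid grade level is 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. The following is a minimal Python sketch of that formula, not the tool used in this study (which is not specified here). The syllable counter is a rough vowel-group heuristic, and the function names are illustrative assumptions.

```python
import re


def count_syllables(word: str) -> int:
    """Approximate syllables as vowel groups, dropping a likely silent final 'e'.

    This is a heuristic (an assumption of this sketch); dictionary-based
    counters will give different results for some words.
    """
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith(("le", "ee")) and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Return the Flesch-Kincaid grade level of an English text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word")
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)


if __name__ == "__main__":
    instruction = "Take one tablet by mouth twice a day. Return if the pain worsens."
    print(round(flesch_kincaid_grade(instruction), 1))
```

Very short sentences built from one-syllable words can score below zero, so the result is best read as a relative measure rather than a literal school grade.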
Using GT, we translated the instructions into Spanish and Chinese; bilingual translators then translated the text back into English.
The primary outcome was sentence translation accuracy, assessed for overall content accuracy, not word-for-word accuracy, and coded as a binary outcome. Two clinicians coded accuracy independently; a third adjudicated disagreements. A second translator reviewed back-translations deemed inaccurate to ensure these were not back-translator error.
Potential for harm from inaccurate translations was assessed by 2 clinicians (with a third adjudicating) using an established rating system: clinically nonsignificant, clinically significant, and life-threatening potential harm.6 For analyses, we used a binary variable (clinically significant/life-threatening vs clinically nonsignificant/no harm).
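The two-rater coding with third-rater adjudication, and the collapse of harm ratings into a binary variable, can be sketched as follows. This is an illustration only: the label strings and function names are hypothetical, and the sketch assumes the adjudicator's rating is simply taken as final when the first 2 raters disagree.

```python
from typing import Optional

# Hypothetical labels; the cited rating system's exact wording may differ.
NO_HARM = "no harm"
NONSIGNIFICANT = "clinically nonsignificant"
SIGNIFICANT = "clinically significant"
LIFE_THREATENING = "life-threatening"


def adjudicate(first: str, second: str, third: Optional[str] = None) -> str:
    """Return the agreed rating, or the adjudicator's rating on disagreement."""
    if first == second:
        return first
    if third is None:
        raise ValueError("raters disagree; an adjudicating rating is required")
    return third


def has_potential_harm(rating: str) -> bool:
    """Collapse ratings into the binary variable used for analysis."""
    return rating in {SIGNIFICANT, LIFE_THREATENING}


if __name__ == "__main__":
    final = adjudicate(NONSIGNIFICANT, SIGNIFICANT, third=SIGNIFICANT)
    print(final, has_potential_harm(final))
```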
We used logistic regression analyses stratified by language to assess associations between sentence characteristics and accuracy and/or harm. Variables with significance of P < .20 in bivariate analyses were used in multivariable analyses.
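As a rough illustration of the screening step, the sketch below computes an unadjusted odds ratio with a Wald 95% CI from a 2 × 2 table (for a single binary predictor, this equals the odds ratio from a bivariate logistic regression) and then keeps variables with P < .20. It is not the study's analysis code: the counts, P values, and variable names are invented for the example, and the multivariable models themselves would require a statistics package.

```python
import math
from typing import Dict, List, Tuple


def odds_ratio_wald(a: int, b: int, c: int, d: int,
                    z: float = 1.96) -> Tuple[float, float, float]:
    """Unadjusted odds ratio and Wald CI from a 2 x 2 table.

    a: characteristic present, outcome present
    b: characteristic present, outcome absent
    c: characteristic absent, outcome present
    d: characteristic absent, outcome absent
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive for the Wald interval")
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


def screen_variables(p_values: Dict[str, float],
                     threshold: float = 0.20) -> List[str]:
    """Keep variables whose bivariate P value is below the threshold."""
    return [name for name, p in p_values.items() if p < threshold]


if __name__ == "__main__":
    # Invented counts, not study data.
    print(odds_ratio_wald(10, 20, 5, 40))
    # Invented P values, not study data.
    print(screen_variables({"spelling_or_grammar": 0.01,
                            "greeting": 0.65,
                            "medical_terminology": 0.15}))
```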
The 100 sets of patient instructions contained 647 sentences. Overall, GT accurately translated 594 sentences (92%) into Spanish and 522 (81%) into Chinese (Table 1). A minority of inaccurate translations had potential for clinically significant harm: in Spanish, 15 (28%) of 53 inaccuracies and 15 (2%) of 647 sentences; in Chinese, 50 (40%) of 125 inaccuracies and 50 (8%) of 647 sentences. Some inaccuracies were faithful translations of erroneous English instructions, but overall, content became inaccurate because of grammar or typographical errors in the English source (Table 2) that a reader of the English text would readily have overlooked or understood.
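The percentages above follow directly from the reported counts. A short Python check, using only the counts stated in the text (the variable and function names are illustrative):

```python
TOTAL_SENTENCES = 647
ACCURATE = {"Spanish": 594, "Chinese": 522}
POTENTIALLY_HARMFUL = {"Spanish": 15, "Chinese": 50}


def summarize(language: str) -> dict:
    """Recompute the reported counts and rounded percentages for one language."""
    accurate = ACCURATE[language]
    harmful = POTENTIALLY_HARMFUL[language]
    inaccurate = TOTAL_SENTENCES - accurate
    return {
        "accurate_pct": round(100 * accurate / TOTAL_SENTENCES),
        "inaccurate": inaccurate,
        "harm_pct_of_inaccurate": round(100 * harmful / inaccurate),
        "harm_pct_of_all": round(100 * harmful / TOTAL_SENTENCES),
    }


if __name__ == "__main__":
    for language in ACCURATE:
        print(language, summarize(language))
```

Rounded to whole percentages, the results match the figures reported for both languages.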
Only spelling and grammar anomalies were associated with inaccurate translations in multivariable analyses: Spanish (odds ratio [OR], 2.6; 95% CI, 1.1-5.8); Chinese (OR, 2.6; 95% CI, 1.3-5.0).
In multivariable analyses, potential harm was associated in Spanish with a Flesch-Kincaid reading level higher than eighth grade (OR, 4.0; 95% CI, 1.2-13.5) and follow-up instructions (OR, 3.5; 95% CI, 1.2-10.2); and in Chinese with medical terminology (OR, 2.4; 95% CI, 1.2-4.9), spelling or grammar anomalies (OR, 3.1; 95% CI, 1.4-7.2), and colloquial English (OR, 5.9; 95% CI, 1.4-24.7).
The new GT algorithm translated discharge instructions with higher accuracy and fewer seriously harmful inaccuracies than previously reported,2 yet 2% of Spanish and 8% of Chinese sentence translations had potential for significant harm. While GT can supplement (not replace) written English instructions, machine-translated instructions should include a warning about potentially inaccurate translations.
Clinicians using GT can reduce potential harm by having patients read translations while receiving verbal instructions; being vigilant about spelling and grammar; and avoiding complicated grammar, medical jargon (eg, fingerstick), and colloquial English.
Study limitations include assessment of only 2 languages (though our inclusion of Chinese is a strength, since non-European languages are often less accurately translated by machines); no assessment of translation readability; and no comparison to human translators.
Google Translate can be used to translate clinician-entered, patient-specific ED instructions for Spanish- and Chinese-speaking patients. Potential for harm can be minimized by using clear communication practices. We recommend including English instructions and automated warnings regarding the use of machine translation.
Accepted for Publication: November 13, 2018.
Corresponding Author: Elaine C. Khoong, MD, MS, Division of General Internal Medicine, Department of Medicine at Zuckerberg San Francisco General Hospital, University of California, San Francisco, 1001 Potrero Ave, 1M, San Francisco, CA 94122 (elaine.khoong@ucsf.edu).
Published Online: February 25, 2019. doi:10.1001/jamainternmed.2018.7653
Author Contributions: Dr Khoong had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Study concept and design: Khoong, Brown, Fernandez.
Acquisition, analysis, or interpretation of data: All authors.
Drafting of the manuscript: All authors.
Critical revision of the manuscript for important intellectual content: All authors.
Statistical analysis: Khoong, Brown.
Administrative, technical, or material support: Steinbrook, Fernandez.
Study supervision: Fernandez.
Conflict of Interest Disclosures: None reported.
1. Johnson A, Sandford J, Tyndall J. Written and verbal information versus verbal information only for patients being discharged from acute hospital settings to home. Cochrane Database Syst Rev. 2003;4(4):CD003716. doi:10.1002/14651858.CD003716
2. Khanna RR, Karliner LS, Eck M, Vittinghoff E, Koenig CJ, Fang MC. Performance of an online translation tool when applied to patient educational material. J Hosp Med. 2011;6(9):519-525. doi:10.1002/jhm.898
3. Wu Y, Schuster M, Chen Z, et al. Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation. https://arxiv.org/abs/1609.08144. Accessed January 17, 2019.