ChatGPT (generative pretrained transformer), an artificial intelligence–powered language model chatbot, has been described as an innovative resource for many industries, including health care.1 Lower health literacy and limited understanding of postoperative instructions have been associated with worse outcomes.2,3 While ChatGPT cannot currently supplant a human clinician, it can serve as a source of medical knowledge. This qualitative study assessed the value of ChatGPT in augmenting patient knowledge and generating postoperative instructions for use in populations with low educational or health literacy levels.
We analyzed postoperative patient instructions for 8 common pediatric otolaryngologic procedures: tympanostomy tube placement, tonsillectomy and adenoidectomy, inferior turbinate reduction, tympanoplasty, cochlear implant, neck mass resection, microdirect laryngoscopy and bronchoscopy, and tongue-tie release. The Stanford University Institutional Review Board deemed this study exempt from review and waived the informed consent requirement given the study design. We followed the SRQR reporting guideline.
Postoperative instructions were obtained from ChatGPT, Google Search, and Stanford University (hereafter, institution). This phrase was entered into ChatGPT: Please provide postoperative instructions for the family of a child who just underwent a [procedure]. Provide them at a 5th grade reading level. Similarly, this phrase was entered into Google Search: My child just underwent [procedure]. What do I need to know and watch out for? The first nonsponsored Google Search result for each procedure was used for analysis. Results were extracted and blinded; to enable adequate blinding, we standardized all fonts and removed audiovisuals (eg, pictures). Two of us (N.F.A., Y.-J.L.) scored the instructions.
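For illustration, the following minimal R sketch shows how the standardized prompt and query templates could be assembled for all 8 procedures; the object names are hypothetical, and in the study the prompts were entered manually into the ChatGPT and Google Search interfaces.

```r
# Minimal sketch (hypothetical helper, not part of the study workflow): build the
# standardized ChatGPT prompts and Google Search queries for the 8 procedures.
procedures <- c(
  "tympanostomy tube placement", "tonsillectomy and adenoidectomy",
  "inferior turbinate reduction", "tympanoplasty", "cochlear implant",
  "neck mass resection", "microdirect laryngoscopy and bronchoscopy",
  "tongue-tie release"
)

chatgpt_prompts <- sprintf(
  "Please provide postoperative instructions for the family of a child who just underwent a %s. Provide them at a 5th grade reading level.",
  procedures
)

google_queries <- sprintf(
  "My child just underwent %s. What do I need to know and watch out for?",
  procedures
)
```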
The primary outcome was the Patient Education Materials Assessment Tool–printable (PEMAT-P)4 score, which assessed the understandability and actionability of the instructions for patients of different backgrounds and health literacy levels. As a secondary outcome, instructions were scored on whether they addressed procedure-specific items; we generated a priori a list of 4 items specific to each procedure that were deemed important for the instructions to mention (see the Table 1 footnote for these items).
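As an illustration of PEMAT-P domain scoring (per the AHRQ user's guide, each item is rated agree, disagree, or not applicable, and the domain score is the percentage of applicable items rated agree), a minimal R sketch follows; the function name and example ratings are hypothetical.

```r
# Minimal sketch of PEMAT-P percent scoring: items rated 1 (agree), 0 (disagree),
# or NA (not applicable); the domain score is the share of applicable items
# rated "agree", expressed as a percentage.
pemat_score <- function(item_ratings) {
  applicable <- item_ratings[!is.na(item_ratings)]
  100 * sum(applicable) / length(applicable)
}

# Example: 13 understandability items, 2 marked not applicable
ratings <- c(1, 1, 0, 1, NA, 1, 1, 1, 0, 1, NA, 1, 1)
pemat_score(ratings)  # approximately 81.8
```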
Scores were compared using 1-way analysis of variance and Kruskal-Wallis tests, with η2 (90% CI) as the appropriate effect size.5 Analysis was performed on February 6, 2023, using R, version 4 (R Core Team).
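A minimal R sketch of this comparison is shown below, assuming a long-format data frame with one score per procedure and source; the data values are placeholders rather than study data, and a 90% CI for η2 would typically require an add-on package (eg, effectsize), which is not shown.

```r
# Minimal sketch of the score comparison; placeholder values, not study data.
set.seed(1)
scores <- data.frame(
  source = factor(rep(c("ChatGPT", "Google Search", "Institution"), each = 8)),
  value  = c(rnorm(8, 80, 5), rnorm(8, 82, 5), rnorm(8, 90, 3))  # arbitrary placeholders
)

fit <- aov(value ~ source, data = scores)    # 1-way analysis of variance
summary(fit)
kruskal.test(value ~ source, data = scores)  # rank-based alternative

# Eta squared effect size: between-group sum of squares over total sum of squares.
ss     <- summary(fit)[[1]][["Sum Sq"]]
eta_sq <- ss[1] / sum(ss)
eta_sq
```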
Overall, understandability scores ranged from 73% to 91%; actionability scores, 20% to 100%; and procedure-specific items, 0% to 100% (Table 1). ChatGPT-generated instructions were scored from 73% to 82% for understandability, 20% to 80% for actionability, and 75% to 100% for procedure-specific items.
Institution-generated instructions consistently had the highest scores (Table 2). Understandability scores were highest for institution (91%) vs ChatGPT (81%) and Google Search (81%) instructions (η2, 0.86; 90% CI, 0.67-1.00). Actionability scores were lowest for ChatGPT (73%), intermediate for Google Search (83%), and highest for institution (92%) instructions (η2, 0.22; 90% CI, 0.04-0.55). For procedure-specific items, ChatGPT (97%) and institution (97%) instructions had the highest scores and Google Search had the lowest (72%) (η2, 0.23; 90% CI, 0-0.64).
These findings suggest that ChatGPT can provide instructions that are usable by patients reading at a fifth-grade level or with varying health literacy. However, ChatGPT-generated instructions scored lower than institution instructions in understandability and lower than both Google Search and institution instructions in actionability, although they matched institution instructions on procedure-specific content. Nonetheless, ChatGPT may be beneficial for patients and clinicians, especially when alternative resources are limited.
Online search engines are common sources of medical information for the public: 7% of Google searches are health related.6 However, ChatGPT has advantages over search engines: it is free, can be customized to different literacy levels, and provides succinct information. ChatGPT provides direct answers that are often well written, detailed, and in if-then format, giving patients immediate information while they wait to reach a clinician.
Study limitations were that only a few procedures and resources were analyzed and that the analysis was performed only in English. ChatGPT limitations included a lack of citations; the inability of users to confirm the accuracy of the information or explore topics further; and a knowledge base ending in 2021, which excludes the latest data, events, and practices.
Accepted for Publication: March 8, 2023.
Published Online: April 27, 2023. doi:10.1001/jamaoto.2023.0704
Corresponding Author: Noel Ayoub, MD, MBA, Stanford University School of Medicine, Department of Otolaryngology–Head & Neck Surgery, 801 Welch Rd, 2nd Floor, Stanford, CA 94304 (nfa@stanford.edu).
Author Contributions: Dr Ayoub had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Concept and design: Ayoub, Lee, Balakrishnan.
Acquisition, analysis, or interpretation of data: All authors.
Drafting of the manuscript: Ayoub, Lee.
Critical revision of the manuscript for important intellectual content: All authors.
Statistical analysis: Lee.
Administrative, technical, or material support: Ayoub, Lee, Grimm.
Supervision: Balakrishnan.
Conflict of Interest Disclosures: Dr Balakrishnan reported receiving royalties from Springer Inc outside the submitted work. No other disclosures were reported.
Data Sharing Statement: See the Supplement.
4. Shoemaker SJ, Wolf MS, Brach C. The Patient Education Materials Assessment Tool (PEMAT) and User’s Guide. AHRQ Publication No. 14-0002-EF. Agency for Healthcare Research and Quality; 2013.