Original Investigation
April 28, 2023

Comparing Physician and Artificial Intelligence Chatbot Responses to Patient Questions Posted to a Public Social Media Forum

Author Affiliations
  • 1Qualcomm Institute, University of California San Diego, La Jolla
  • 2Division of Infectious Diseases and Global Public Health, Department of Medicine, University of California San Diego, La Jolla
  • 3Department of Computer Science, Bryn Mawr College, Bryn Mawr, Pennsylvania
  • 4Department of Computer Science, Johns Hopkins University, Baltimore, Maryland
  • 5Herbert Wertheim School of Public Health and Human Longevity Science, University of California San Diego, La Jolla
  • 6Human Longevity, La Jolla, California
  • 7Naval Health Research Center, Navy, San Diego, California
  • 8Division of Blood and Marrow Transplantation, Department of Medicine, University of California San Diego, La Jolla
  • 9Moores Cancer Center, University of California San Diego, La Jolla
  • 10Department of Biomedical Informatics, University of California San Diego, La Jolla
  • 11Altman Clinical Translational Research Institute, University of California San Diego, La Jolla
JAMA Intern Med. Published online April 28, 2023. doi:10.1001/jamainternmed.2023.1838
Key Points

Question  Can an artificial intelligence chatbot assistant provide responses to patient questions that are of comparable quality and empathy to those written by physicians?

Findings  In this cross-sectional study of 195 randomly drawn patient questions from a public social media forum, a team of licensed health care professionals compared physicians' and a chatbot's responses to those questions. The chatbot responses were preferred over physician responses and rated significantly higher for both quality and empathy.

Meaning  These results suggest that artificial intelligence assistants may be able to aid in drafting responses to patient questions.

Abstract

Importance  The rapid expansion of virtual health care has caused a surge in patient messages concomitant with more work and burnout among health care professionals. Artificial intelligence (AI) assistants could potentially aid in creating answers to patient questions by drafting responses that could be reviewed by clinicians.

Objective  To evaluate the ability of an AI chatbot assistant (ChatGPT), released in November 2022, to provide quality and empathetic responses to patient questions.

Design, Setting, and Participants  In this cross-sectional study, a public and nonidentifiable database of questions from a public social media forum (Reddit’s r/AskDocs) was used to randomly draw 195 exchanges from October 2022 where a verified physician responded to a public question. Chatbot responses were generated by entering the original question into a fresh session (without prior questions having been asked in the session) on December 22 and 23, 2022. The original question along with anonymized and randomly ordered physician and chatbot responses were evaluated in triplicate by a team of licensed health care professionals. Evaluators chose “which response was better” and judged both “the quality of information provided” (very poor, poor, acceptable, good, or very good) and “the empathy or bedside manner provided” (not empathetic, slightly empathetic, moderately empathetic, empathetic, and very empathetic). Mean outcomes were ordered on a 1 to 5 scale and compared between chatbot and physicians.
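
The scoring scheme described above can be illustrated with a short sketch. It maps the two sets of rating labels onto the 1 to 5 scale and averages the scores from the three evaluators of a single response. The averaging step, the function name `mean_score`, and the example ratings are assumptions made for illustration; the abstract states only that outcomes were ordered on a 1 to 5 scale and evaluated in triplicate.

```python
from statistics import mean

# Rating labels from the study design, ordered on a 1 to 5 scale.
QUALITY_SCALE = {
    "very poor": 1,
    "poor": 2,
    "acceptable": 3,
    "good": 4,
    "very good": 5,
}
EMPATHY_SCALE = {
    "not empathetic": 1,
    "slightly empathetic": 2,
    "moderately empathetic": 3,
    "empathetic": 4,
    "very empathetic": 5,
}


def mean_score(labels, scale):
    """Average the numeric scores of one response across its evaluators.

    Assumption: the triplicate ratings are combined by a simple mean.
    """
    return mean(scale[label] for label in labels)


# Hypothetical ratings from three evaluators for a single response.
quality = mean_score(["good", "very good", "acceptable"], QUALITY_SCALE)
empathy = mean_score(["empathetic", "empathetic", "very empathetic"], EMPATHY_SCALE)
print(quality, empathy)
```

Under this assumption, a response counts as good or very good quality, or as empathetic or very empathetic, when its score is at least 4.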

Results  Of the 195 questions and responses, evaluators preferred chatbot responses to physician responses in 78.6% (95% CI, 75.0%-81.8%) of the 585 evaluations. Mean (IQR) physician responses were significantly shorter than chatbot responses (52 [17-62] words vs 211 [168-245] words; t = 25.4; P < .001). Chatbot responses were rated of significantly higher quality than physician responses (t = 13.3; P < .001). The proportion of responses rated as good or very good quality (≥4), for instance, was higher for chatbot than for physicians (chatbot: 78.5%, 95% CI, 72.3%-84.1%; physicians: 22.1%, 95% CI, 16.4%-28.2%). This amounted to 3.6 times higher prevalence of good or very good quality responses for the chatbot. Chatbot responses were also rated significantly more empathetic than physician responses (t = 18.9; P < .001). The proportion of responses rated empathetic or very empathetic (≥4) was higher for chatbot than for physicians (chatbot: 45.1%, 95% CI, 38.5%-51.8%; physicians: 4.6%, 95% CI, 2.1%-7.7%). This amounted to 9.8 times higher prevalence of empathetic or very empathetic responses for the chatbot.
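
The arithmetic behind the reported figures can be checked with a short sketch. It reproduces the 585 evaluations (195 exchanges, each evaluated in triplicate) and the two prevalence ratios from the percentages given above. The function name `prevalence_ratio` is hypothetical, and the sketch does not attempt to reproduce the confidence intervals or t statistics, whose method the abstract does not describe.

```python
def prevalence_ratio(chatbot_pct, physician_pct):
    """Ratio of the chatbot's prevalence to the physicians' prevalence."""
    return chatbot_pct / physician_pct


# 195 exchanges, each evaluated in triplicate.
total_evaluations = 195 * 3

# Percent of responses rated good or very good quality (>= 4).
quality_ratio = prevalence_ratio(78.5, 22.1)

# Percent of responses rated empathetic or very empathetic (>= 4).
empathy_ratio = prevalence_ratio(45.1, 4.6)

print(total_evaluations)
print(round(quality_ratio, 1))
print(round(empathy_ratio, 1))
```

The rounded ratios agree with the 3.6 and 9.8 times higher prevalence stated in the Results.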

Conclusions  In this cross-sectional study, a chatbot generated quality and empathetic responses to patient questions posed in an online forum. Further exploration of this technology is warranted in clinical settings, such as using a chatbot to draft responses that physicians could then edit. Randomized trials could further assess whether using AI assistants might improve responses, lower clinician burnout, and improve patient outcomes.

5 Comments for this article
Chatbots can simulate empathy through pre-programmed responses, but they cannot truly understand emotions
Ediriweera Desapriya, PhD | Department of Pediatrics, Faculty of Medicine, UBC-BC Children's Hospital
Empathy, which is the ability to understand and share the feelings of others, is a complex emotional and cognitive process that involves more than just providing information. It involves active listening, genuine concern, and the ability to understand and respond to the emotional needs of patients. While chatbots can simulate empathy through pre-programmed responses, they cannot truly understand the emotions and needs of human users in the same way that a human healthcare professional can. While chatbots may not be able to fully replicate the human element of empathy, they can still be useful tools for training healthcare professionals and improving patient communication and engagement.

The study results suggest that longer responses from healthcare professionals are more popular and, therefore, there might be a correlation between the length of chatbot responses and their ratings. While it is true that longer responses may provide more information and be perceived as more informative, it is not necessarily true that longer responses are always better or more empathetic.

Furthermore, I wonder whether longer chatbot responses simply reflect the machine having more time to respond, rather than being more empathetic or informative. Machines have plenty of time compared with busy clinicians, so it is important to ensure that chatbot responses are not longer merely for the sake of length but instead provide relevant and useful information to researchers.
CONFLICT OF INTEREST: None Reported
Authors' Conflicts of Interest Disclosure
Fiore Mastroianni, MD

Several of the people who rated the responses are authors with financial conflicts of interest related to artificial intelligence or chat bot technology. They may be more likely to be able to recognize the kinds of responses produced by chat bots due to their work in the field. Moreover, they may stand to gain financially if the computer responses were found to be better.
CONFLICT OF INTEREST: None Reported
Intriguing investigation with significant limitations
Hong Sun, PhD | Principal Data Scientist, clinalytix department, Dedalus Healthcare
Thanks for reporting this interesting comparison! As a data scientist working with medical artificial intelligence and a big fan of ChatGPT, I am not surprised to see its encouraging performance in this report. Nevertheless, I also find that this article is being cited on some social media as evidence that chatbots are surpassing human physicians. Therefore, I would like to raise some limitations of this study:

Firstly, the answers from a Q&A forum are not representative of real clinical practice. In addition, the answer providers in the Q&A forum are providing short answers outside their clinical practice time, so their performance should not be considered the normal level of physicians.

Secondly, the answers from the chatbot are consistently long and detailed. They give detailed explanations and guidance compared with those from human physicians. Even the sensitivity test that takes physician responses longer than the 75th percentile (≥62 words) involves responses much shorter than the 211 [168-245] words from the chatbot. Given such a great gap in word counts, the evaluation of empathy is very biased.

The chatbot shows its potential to improve communication between physicians and patients. I wonder whether it would be an interesting experiment to give both the questions and the physicians' responses to the chatbot as inputs and ask it to generate a reply to the patients. This would allow an assessment of whether there is still added value from physicians in this Q&A forum setting.

CONFLICT OF INTEREST: None Reported
ChatGPT Training Sets
Catherine Mac Lean, MD, PhD | Hospital for Special Surgery, New York, NY
First off, kudos to the authors for an interesting, informative and timely article.

Can the authors comment on whether data from Reddit's r/AskDocs might have been included in ChatGPT's training set and, if so, how this should inform interpretation of the study results? I directed this question to ChatGPT and got this response:

"As an AI language model, I don't have direct access to any particular subreddit, including r/AskDocs. However, it is possible that some of the text from that subreddit was included in the diverse range of sources used to train me, along with text from many other websites and sources."

The authors may have better information.
CONFLICT OF INTEREST: None Reported
Chatbot versus physician performance.
Basil Fadipe, MBBS | Justin Fadipe Centre, Dominica
At one level the tentative results from this study are encouraging if not inspiring.
At another level, however, it may be unwise to jump to any definitive conclusions too soon. Much of the interaction between physician and patient in real life has as much to do with the physician taking cues not only from what the patient sitting in the room says but also from the unspoken words, which not infrequently carry even more clues than the spoken. The experienced clinician deploys all five senses (perhaps even a sixth) to mine the clinical dilemma. Can a chatbot ‘reach beneath the surface’ to similar effect?
CONFLICT OF INTEREST: None Reported