Association of the Usability of Electronic Health Records With Cognitive Workload and Performance Levels Among Physicians

Key Points
Question: Is enhanced usability of an electronic health record system associated with physician cognitive workload and performance?
Findings: In this quality improvement study, physicians allocated to perform tasks in an electronic health record system with enhancement demonstrated statistically significantly lower cognitive workload; those who used a system with enhanced longitudinal tracking appropriately managed statistically significantly more abnormal test results compared with physicians allocated to use the baseline electronic health record.
Meaning: Usability improvements in electronic health records appear to be associated with improved cognitive workload and performance levels among clinicians; this finding suggests that next-generation systems should strip away non–value-added interactions.


Introduction
The usability of electronic health records (EHRs) continues to be a major concern. 1-3 Usability challenges include suboptimal design of interfaces that have confusing layouts and contain either too much or too little relevant information as well as workflows and alerts that are burdensome.
Suboptimal usability has been associated with clinician burnout and patient safety events, and improving the usability of EHRs is an ongoing need. 4,5 A long-standing challenge for the US health care system has been to acknowledge and appropriately manage abnormal test results and associated missed or delayed diagnoses. [6][7][8][9][10][11] The unintended consequences of these shortcomings include missed and delayed cancer diagnoses and associated negative clinical outcomes (eg, 28% of women did not receive timely follow-up for abnormal Papanicolaou test results 8 ; 28% of women requiring immediate or short-term follow-up for abnormal mammograms did not receive timely follow-up care 9 ). Even in the EHR environment, with alerts and reminders in place, physicians often continue to manage abnormal test results inappropriately. [12][13][14][15][16][17][18][19][20][21] Some key remaining barriers to effective management of test results are suboptimal usability of existing EHR interfaces and the high volume of abnormal test result alerts, especially less critical alerts that produce clutter and distract from the important ones. 22,23 In addition, few organizations have explicit policies and decision support systems in their EHR systems for managing abnormal test results, and many physicians have developed processes on their own. [24][25][26] These issues are among the ongoing reasons to improve the usability of the EHR-based interfaces for the evaluation and management of abnormal test results.
We present the results of a quality improvement study to assess a relatively basic intervention to enhance the usability of an EHR system for the management of abnormal test results. We hypothesized that improvements in EHR usability would be associated with improvements in cognitive workload and performance among physicians.

Participants
This research was reviewed and approved by the institutional review board committee of the University of North Carolina at Chapel Hill. Written informed consent was obtained from all participants. The study was performed and reported according to the Standards for Quality Improvement Reporting Excellence (SQUIRE) guideline. 27 Invitations to participate in the study were sent to all residents and fellows in the school of medicine at a large academic institution, clearly stating the need for experience with using the Epic EHR software (Epic Systems Corporation) in reviewing test results to undergo the study's simulated scenarios. A $100 gift card was offered as an incentive for participation. Potential participants were given an opportunity to review and sign a consent document, which included information on study purpose, goals, procedures, and risks and rewards as well as the voluntary nature of participation and the confidentiality of data. Recruited individuals had the right to discontinue participation at any time. Forty individuals were recruited to participate, 2 of whom were excluded (eg, numerous cancellations), leaving 38 evaluable participants (Table 1).

Study Design
From April 1, 2016, to December 23, 2016, 38 participants were enrolled and prospectively and blindly allocated to a simulated EHR environment: 20 were assigned to use a baseline EHR (without changes to the interface), and 18 were assigned to use enhanced EHRs (with changes intended to enhance longitudinal tracking of abnormal test results in the system) (Figure). Abnormalities requiring an action included new abnormal test results and previously identified abnormal test results for patients who did not show up (without cancellation) for their scheduled appointment in which the findings would be addressed. The new abnormal test results included a critically abnormal mammogram (BI-RADS 4 and 5) and Papanicolaou test result with high-grade squamous intraepithelial lesion as well as noncritical results for rapid influenza test, streptococcal culture, complete blood cell count, basic metabolic panel, and lipid profile, among others. The previously identified critical test results that required follow-up included abnormal mammogram (BI-RADS 4 and 5), Papanicolaou test result with high-grade squamous intraepithelial lesion, chest radiograph with 2 × 2-cm lesion in the left upper lobe, pulmonary function test result consistent with severe restrictive lung disease, and pathologic examination with biopsy finding of ascending colon consistent with adenocarcinoma.
The simulated scenarios were iteratively developed and tested by an experienced physician and human factors engineer (C.M. and L.M.) in collaboration with an Epic software developer from the participating institution. The process included functionality and usability testing and took approximately 12 weeks to complete. The experimental design was based on previous findings that attending physicians use the EHR to manage approximately 57 test results per day over multiple interactions. 22,23 Given that residents often manage a lower volume of patients, the present study was designed such that participants were asked to review a total of 35 test results, including 8 or 16 abnormal test results evenly distributed between study groups, in 1 test session. By organizational policies and procedures, participants were expected to review all results, acknowledge and follow up on abnormal test results, and follow up on patients with a no-show status (without cancellation) for their scheduled appointment aimed at addressing their previously identified abnormal test result. Reminders. The enhanced EHR also clearly displayed information regarding patient status and policy-based decision support instructions for next steps (eg, "No show to follow-up appointment. Reschedule appointment in Breast Clinic").
The intervention was developed according to the classic theory of attention. 28 This theory indicates that cognitive workload varies continuously during the course of performing a task and that the changes of cognitive workload may be attributed to the adaptive interaction strategies of the operator exposed to task demands (eg, baseline or enhanced usability).

Main Outcomes and Measures

Perceived Workload
The NASA-Task Load Index (NASA-TLX) is a widely applied and valid tool used to measure workload, [29][30][31][32][33][34] including the following 6 dimensions: (1) mental demand (How much mental and perceptual activity was required? Was the task easy or demanding, simple or complex?); (2) physical demand (How much physical activity was required? Was the task easy or demanding, slack or strenuous?); (3) temporal demand (How much time pressure did you feel with regard to the pace at which the tasks or task elements occurred? Was the pace slow or rapid?); (4) overall performance (How successful were you in performing the task? How satisfied were you with your performance?); (5) frustration level (How irritated, stressed, and annoyed [compared with content, relaxed, and complacent] did you feel during the task?); and (6) effort.

A reduced blink rate has been found to occur in EHR-based tasks requiring more cognitive workload. 37 The fundamental idea is that blink rate slows down under visual task demands that require more focused attention and working memory load, but this association might vary with the type of visual task demands. [38][39][40] For each participant, the time-weighted mean blink rate measured during the participant's review of all abnormal test results was calculated and then considered for data analysis.
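As an illustration of a time-weighted mean, the following minimal sketch weights each blink-rate measurement by the duration of the interval over which it was recorded. The function name, the (rate, duration) input layout, and the example values are assumptions made for illustration; the source does not report how the review intervals were segmented.

```python
def time_weighted_mean_blink_rate(segments):
    """Return the duration-weighted mean blink rate.

    segments: iterable of (blinks_per_minute, duration_seconds) pairs.
    This input layout is hypothetical, not taken from the study.
    """
    segments = list(segments)  # allow generators; we iterate twice
    total_time = sum(duration for _, duration in segments)
    if total_time <= 0:
        raise ValueError("total duration must be positive")
    weighted_sum = sum(rate * duration for rate, duration in segments)
    return weighted_sum / total_time


# Hypothetical example: 30 s at 20 blinks/min, then 90 s at 10 blinks/min.
example_rate = time_weighted_mean_blink_rate([(20, 30), (10, 90)])
```

Weighting by duration keeps a brief interval from counting as much as a long one, which an unweighted mean of per-result rates would not do.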

Performance
For each participant, performance was quantified as the percentage of (new or previously identified) abnormal test results that were appropriately acted on (with possible scores ranging from 0% to 100%). Appropriate action on an abnormal test result was defined as the study participant ordering (compared with not ordering) a referral for further diagnostic testing (eg, breast biopsy for mass identified on an abnormal mammogram) to a subspecialty clinic (eg, breast clinic). In addition, per the policy and procedures of the institution in which the study took place, if patients missed their appointment for follow-up on critical test results, the participants were expected to contact (compared with not contact) schedulers to reschedule follow-up care. We also quantified the total amount of time that participants took to complete each simulated scenario.
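The performance score defined above can be sketched as a simple proportion. The function name and the boolean encoding of each abnormal result (True if appropriately acted on) are assumptions for illustration, and the example data are hypothetical rather than study results.

```python
def performance_percentage(acted_appropriately):
    """Percentage (0-100) of abnormal test results appropriately acted on.

    acted_appropriately: iterable of booleans, one per abnormal result
    (hypothetical encoding; True means the result was appropriately acted on).
    """
    flags = list(acted_appropriately)
    if not flags:
        raise ValueError("at least one abnormal test result is required")
    return 100.0 * sum(flags) / len(flags)


# Hypothetical participant who acted appropriately on 7 of 8 abnormal results.
example_score = performance_percentage([True] * 7 + [False])
```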

Secondary Outcome and Measure
Fatigue can affect perceived and physiological workload and performance and thus can confound study results. [41][42][43] Because of the possible confounding association of fatigue, participants were asked to evaluate their own state of fatigue immediately before each simulated session using the fatigue portion of the Crew Status Survey. 44 The fatigue assessment scale included these levels: 1 (fully alert, wide awake, or extremely peppy), 2 (very lively, or responsive but not at peak), 3 (okay, or somewhat fresh), 4 (a little tired, or less than fresh), 5 (moderately tired, or let down), 6 (extremely tired, or very difficult to concentrate), and 7 (completely exhausted, unable to function effectively, or ready to drop). The Crew Status Survey has been tested in real and simulated environments and has been found to be both reliable and able to discriminate between fatigue levels. 44,45

Statistical Analysis
On the basis of the anticipated rate of appropriately identified abnormal test results in the literature [12][13][14][15][16][17][18][19][20][21] and the anticipated magnitude of the association of the enhanced EHR, we required a sample size of 30 participants, each reviewing 35 test results, to achieve 80% power to detect a statistically significant difference in cognitive workload and performance. Specifically, we performed sample size calculations at α = .05, assuming that we could detect a mean (SD) difference of 10 (10) in NASA-TLX scores, a mean (SD) difference of 5 (10) in blink rate, and a mean (SD) difference of 10% (15%) in performance.
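For readers who want to see how such inputs translate into a sample size, the sketch below uses the standard normal-approximation formula for a 2-sided, 2-sample comparison of means, n per group = 2(z<sub>1-α/2</sub> + z<sub>power</sub>)²(SD/difference)². This is an assumption: the source does not state which formula or software routine was used, nor whether the reported SD is per group, so the output is illustrative and is not expected to reproduce the reported sample size.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(difference, sd, alpha=0.05, power=0.80):
    """Approximate participants per group for a 2-sided, 2-sample comparison.

    Normal approximation; treats `sd` as the common per-group SD, which is
    an assumption not confirmed by the source.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for 2-sided alpha
    z_power = z.inv_cdf(power)          # quantile for the desired power
    return ceil(2 * ((z_alpha + z_power) * sd / difference) ** 2)


# NASA-TLX inputs from the text: difference of 10 with SD of 10.
example_n = n_per_group(difference=10, sd=10)
```

A t-distribution-based calculation typically adds about 1 participant per group to the normal-approximation figure.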
Before data analyses, we completed tests for normality using the Shapiro-Wilk test and equal variance using the Bartlett test for all study variables (cognitive workload, performance, and fatigue).
Results indicated that all assumptions to perform parametric data analysis were satisfied (normality: all P > .05; equal variance: all P > .05).
We conducted a 2-sample t test to assess the association of enhanced usability of the EHR interface to manage abnormal test results with physician cognitive workload and performance. All data analyses were conducted from January 9, 2017, to March 30, 2018, using JMP 13 Pro software (SAS Institute Inc). Statistical significance level was set at 2-sided P = .05, with no missing data to report.
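The test statistic behind a 2-sample t test can be sketched with the standard library alone. The pooled-variance (Student) form is assumed here because equal variances were checked with the Bartlett test; the source does not state which variant was run in JMP. The function name and the example scores are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(group_a, group_b):
    """Return (t, degrees_of_freedom) for a pooled-variance 2-sample t test."""
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least 2 observations")
    df = n_a + n_b - 2
    # Pooled variance: sample variances weighted by their degrees of freedom.
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / df
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / standard_error, df


# Hypothetical scores, not study data.
t_value, dof = pooled_t_statistic([1, 2, 3], [2, 3, 4])
```

Converting t to a 2-sided P value requires the t distribution's cumulative distribution function, which the Python standard library does not provide; a statistics package would be needed for that step.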

Discussion
Participants in the enhanced EHR group indicated physiologically lower cognitive workload and improved clinical performance. The magnitude of the association of EHR usability with performance we found in the present study was modest, although many such improvements tend to have substantial value in the aggregate. Thus, meaningful usability changes can and should be implemented within EHRs to improve physicians' cognitive workload and performance. To our knowledge, this research is the first prospective quality improvement study of the association of EHR usability enhancements with both a physiological measure of cognitive workload and performance during physicians' interactions with the test results management system in the EHR.
The enhanced EHR was more likely to result in participants reaching out to patients and schedulers to ensure appropriate follow-up. Physicians who used the baseline EHR were more likely to treat the EHR (not treat the patient) by duplicating the referral, rather than to reach out to patients and schedulers to find out the issues behind the no-show. In the poststudy conversations with participants, most indicated a lack of awareness about policies and procedures for managing patients with a no-show status and justified their duplication of orders as safer medical practice. This result seems to be in line with findings from real clinical settings, suggesting that few organizations have explicit policies and procedures for managing test results and most physicians developed processes on their own. 25,26 The result from the baseline EHR group is in line with findings from real clinical settings that indicated physicians did not acknowledge abnormal test results in approximately 4% of cases. 19,20 The optimal performance in the enhanced EHR group is encouraging.
No significant difference was noted between the baseline and enhanced EHR groups in the time to complete simulated scenarios or in perceived workload, as quantified by the global NASA-TLX or by each dimension, although scores trended lower.

Limitations
This study has several limitations, and thus caution should be exercised in generalizing the findings.
First, the results are based on 1 experiment with 38 residents and fellows from a teaching hospital artificially performing a discrete set of scenarios. Larger studies could consider possible confounding factors (eg, specialty, training levels, years of EHR use, attendings or residents) and more accurately quantify the association of usability with cognitive workload and performance. Second, performing the scenarios in the simulated environment, in which the participants knew that their work was going to be assessed, may have affected participants' performance (eg, more or less attentiveness and vigilance as perceived by being assessed or by the possibility of real harm to the patient). To minimize this outcome, all participants were given a chance to discontinue their participation at any time and were assured that participant-specific findings would remain confidential. None of the participants discontinued participation in the study, although 2 participants were excluded from the study as they were not able to meet the scheduling criteria. Third, we acknowledge that the cognitive workload and performance scores were likely affected by the setting (eg, simulation laboratory and EHR) and thus might not reflect the actual cognitive workload and performance in real clinical settings. A laboratory setting cannot totally simulate the real clinical environment, and some activities cannot be easily reproduced (eg, looking up additional information about the patient using alternative software, calling a nurse with a question about a particular patient, or a radiologist or laboratory technician calling physicians and verbally telling them about abnormal images). We also recognize that the enhanced usability was not optimal as it was designed and implemented within the existing capabilities of the EHR environment used for training purposes.

JAMA Network Open | Health Informatics
Fourth, the intervention might have manipulated both the ease of access to information through a reorganized display and learning because it provided a guide to action by clearly showing information on patient status and policy-based decision support instructions for next steps. Future research could more accurately quantify the association of usability and learning with cognitive workload and performance. Nevertheless, the intervention provided the necessary basis to conduct this study. All participants were informed about the limitations of the laboratory environment before the study began.

Conclusions
Relatively basic usability enhancements to EHR systems appear to be associated with improving physician management of abnormal test results while reducing cognitive workload. The findings from this study support the proactive evaluation of other similar usability enhancements that can be applied to other interfaces within EHRs.