Figure 1. Flowchart of study sample selection. MEDVAMC indicates Michael E. DeBakey Veterans Affairs Medical Center (Houston, Tex).
Figure 2. Number of errors detected at various time points between index and return visits.
Singh H, Thomas EJ, Khan MM, Petersen LA. Identifying Diagnostic Errors in Primary Care Using an Electronic Screening Algorithm. Arch Intern Med. 2007;167(3):302-308. doi:10.1001/archinte.167.3.302
Diagnostic errors are the leading basis for malpractice claims in primary care, yet these errors are underidentified and understudied. Computerized methods used to screen for other types of errors (eg, medication related) have not been applied to diagnostic errors. Our objectives were to assess the feasibility of computerized screening to identify diagnostic errors in primary care and to categorize diagnostic breakdowns using a recently published taxonomy.
We used an algorithm to screen the electronic medical records of patients at a single hospital that is part of a closed health care system. A Structured Query Language–based program detected the presence of 1 of 2 mutually exclusive electronic screening criteria: screen 1, a primary care visit (index visit) followed by a hospitalization in the next 10 days; or screen 2, an index visit followed by 1 or more primary care, urgent care, or emergency department visits within 10 days. Two independent, blinded reviewers determined the presence or absence of diagnostic error through medical record review of visits with positive and negative screening results.
Among screen 1 and 2 positive visits, 16.1% and 9.7%, respectively, were associated with a diagnostic error. The error rate was 4% in control cases that met neither screening criterion. The most common primary errors in the diagnostic process were failure or delay in eliciting information and misinterpretation or suboptimal weighing of critical pieces of data from the history and physical examination. The most common secondary errors were suboptimal weighing or prioritizing of diagnostic probabilities and failure to recognize urgency of illness or its complications.
Electronic screening has potential to identify records that may contain diagnostic errors in primary care, and its performance is comparable to screening tools for other types of errors. Future studies that validate these findings in other settings could provide improvement initiatives in this area.
The complexity and severity of illness seen in primary care have increased in recent years, contributing to higher risk for medical errors in this setting.1 It is possible that diagnostic errors are the most common types of error in primary care1-3 and the most expensive,4,5 being the leading basis for malpractice claims.1,4,6 Despite their importance, diagnostic errors are an underemphasized, underidentified, and understudied area of patient safety,7 perhaps because they are difficult to detect.6 Therefore, strategies to help identify diagnostic errors are critical to improving the quality and safety of primary care delivery.
The effectiveness of techniques such as spontaneous reporting and random medical record reviews to identify diagnostic errors is limited.8,9 In recent years, computerized methods have been developed to identify adverse drug events and other adverse events in hospitalized patients.8,10-14 These methods include automated identification of specific events such as orders for antidotes, certain abnormal laboratory values, and unexpected stop orders as sentinels or triggers to initiate more detailed medical record review.13
Computerized techniques to selectively trigger medical record review have not been applied to diagnostic errors. However, the literature suggests candidate events that may be associated with diagnostic error and can be easily identified in the medical record. In emergency department studies, unscheduled return visits within 72 hours were associated with diagnostic errors in 20% of cases.15-17 The Veterans Affairs (VA) peer review process has also previously found quality of care issues linked to admissions within 3 days after unscheduled ambulatory care visits.18 In view of these findings, we designed and tested a computerized screening algorithm to electronically identify primary care visits associated with subsequent return visits or hospitalizations within 10 days. We chose a longer time frame, given the lower acuity of illness in patients generally seen in primary vs emergency care. Although this approach would still require medical record review to verify the error, it would be more selective and efficient than review of unscreened records selected at random.
Our primary objective was to assess the feasibility and usefulness of computerized screening to help identify diagnostic errors in primary care. A secondary objective was to apply a method of classifying these errors by the points in the diagnostic process at which they occurred.
We focused on the academic primary care clinics of the Michael E. DeBakey Veterans Affairs Medical Center (MEDVAMC) in Houston, Tex, from August 1, 2004, through September 30, 2005. The clinics are staffed by a rotating group of 130 internal medicine residents, who see patients in scheduled primary care follow-up visits and unscheduled walk-in visits, and 10 rotating staff physicians who supervise them. Of approximately 6000 patients, 1200 are assigned to staff for direct care and the remainder are assigned to residents. The study was approved by the Baylor College of Medicine Institutional Review Board and the MEDVAMC Research and Development Committee.
The VA uses an electronic medical record system for all of its health care delivery, and all appointments are electronically tracked. Because most VA patients continue to use the VA system as the source for their care, comprehensive follow-up data are available for primary care visits in most cases. The samples of visits in the present study were initially identified using the MEDVAMC data warehouse, a Structured Query Language (SQL) database (Microsoft Corp, Redmond, Wash) populated by monthly extracts from the electronic medical record. We selected visit records identified by “general medicine primary care” clinic stop codes within defined time periods. Using an SQL-based program, we electronically screened those records for events that could have been precipitated by a diagnostic error. All scheduled and unscheduled patient visits were evaluated for the presence of 1 of 2 mutually exclusive screening criteria: screen 1, a primary care visit (index visit) followed by a hospitalization in the next 10 days; and screen 2, a primary care visit (index visit) followed by 1 or more primary care visits, an urgent care visit, or an emergency department visit within 10 days, excluding index visits that were positive in screen 1. Control visits were a random sample of visits that did not meet screen 1 or screen 2 criteria.
Both screens excluded hospitalizations and return visits that occurred within 24 hours of the index visit. Our pilot work indicated that some visits, particularly from afternoon clinics, were often recorded as admissions only in early morning hours of the next day, and this exclusion may reduce false-positive results. A blinded reviewer performed a brief (<5 minutes) medical record review to exclude index visits during which a hospitalization was recommended for treatment or further workup as part of the diagnostic plan. Visits associated with a future elective hospitalization within 10 days were also excluded. Because we expected screen 1 to yield fewer screen positive visits than screen 2, we ran screen 1 on a larger time frame of primary care visits to yield roughly comparable sample sizes for visits identified by the 2 screening programs.
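The two screening criteria and the 24-hour exclusion can be sketched in a few lines of Python. This is only an illustration of the logic described above, not the SQL-based program used in the study: the `Event` record, the event kind labels, and the function name are hypothetical, and the treatment of events at exactly 24 hours or exactly 10 days is an assumption. The manual exclusions (recommended or elective hospitalizations) are not modeled, and "negative" here means only that a visit is eligible for the random control sample.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List

WINDOW = timedelta(days=10)   # follow-up window described in the text
GRACE = timedelta(hours=24)   # events within 24 hours of the index visit are ignored

RETURN_KINDS = {"primary_care", "urgent_care", "emergency"}


@dataclass
class Event:
    """Hypothetical record for one visit or admission."""
    patient_id: str
    kind: str        # "primary_care", "urgent_care", "emergency", or "admission"
    when: datetime


def classify_index_visit(index: Event, later_events: List[Event]) -> str:
    """Return 'screen1', 'screen2', or 'negative' for one index visit."""
    # Keep only the same patient's events that fall more than 24 hours
    # and at most 10 days after the index visit (boundary handling assumed).
    in_window = [
        e for e in later_events
        if e.patient_id == index.patient_id
        and GRACE < e.when - index.when <= WINDOW
    ]
    # Screen 1 is checked first, so the two screens stay mutually exclusive.
    if any(e.kind == "admission" for e in in_window):
        return "screen1"
    if any(e.kind in RETURN_KINDS for e in in_window):
        return "screen2"
    return "negative"


# Example with made-up dates: the 12-hour emergency visit is ignored,
# and the admission on day 6 makes the index visit screen 1 positive.
index = Event("p1", "primary_care", datetime(2005, 1, 3, 14, 0))
events = [
    Event("p1", "emergency", datetime(2005, 1, 4, 2, 0)),
    Event("p1", "admission", datetime(2005, 1, 9, 9, 0)),
]
print(classify_index_visit(index, events))  # screen1
```

Checking for an admission before checking for return visits mirrors the rule that index visits positive in screen 1 are excluded from screen 2.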
For the medical record review, we generated lists of primary care index visits that met screen 1, screen 2, and control criteria. Two chief residents and a third-year resident (all in internal medicine) from a local university medical center performed the medical record reviews. To reduce bias, we selected reviewers who were not affiliated with our institution and who were unfamiliar with its staff. The reviewers were blinded to the objectives of the study. A standardized data collection instrument was designed to minimize the likelihood of detection of study hypothesis and objectives, and the reviewers were trained during pilot testing to ensure comprehensive data collection and sound error assessment.
The reviewers determined the presence or absence of diagnostic error at the index visit by examining the electronic medical record for details about the index and subsequent visits. Documented information about non-VA visits was also evaluated during this review. Because medical records from 11 control patients lacked information about their outcomes, we contacted those patients by telephone to determine whether they subsequently received an alternative diagnosis at another health care institution. To assess the use of follow-up as a safety net for patients in primary care, we studied the characteristics of return primary care visits. We differentiated visits that were prearranged by the clinicians to follow up on uncertain diagnoses or for close monitoring from those that were patient-initiated unscheduled return visits. Although not the primary objective of the study, the reviewers were also asked to note other errors in clinical management.
Diagnostic errors were defined as occurrences for which diagnosis was unintentionally delayed (sufficient information was available earlier), wrong (another diagnosis was made before the correct diagnosis), or missed (no diagnosis was ever made), as judged from the eventual appreciation of more definitive information.19 We trained reviewers using several case examples illustrating presence or absence of a diagnostic error. Though we used this explicit definition from the literature, the process involved implicit judgments. Reviewers denoted errors only if they were near certain about their judgments.
Two physicians independently reviewed each case, unaware of each other's decisions. To reduce the problem of hindsight bias,20,21 we did not ask reviewers to make assessments of patient harm. They were also asked to make reasonable judgments of diagnostic performance, based strictly on data either already available or easily available at the time of the index clinic visit. For example, if a patient was seen because of uncomplicated low back pain and a diagnosis of musculoskeletal strain was made based on the results of clinical examination and x-ray films, even if a later magnetic resonance image showed advanced degenerative disk disease, the original decision would not be considered a diagnostic error. However, if further data were not gathered to rule out spinal cord compression in a patient with back pain along with numbness and weakness in the lower extremities and a magnetic resonance image confirmed cord compression a few days later, this would have been considered an error. All disagreements about errors were resolved by discussion among the reviewers at the end of data collection procedures. Using a taxonomy proposed recently by Schiff et al,7 one of the investigators (H.S.) categorized all identified diagnostic errors to better understand the etiologic factors that directly contributed to the error.
Data were analyzed using Excel (Microsoft Corp, Redmond, Wash) and SAS (SAS Institute, Cary, NC) software. Baseline variables and visit characteristics were compared using nonparametric (χ2 test, Fisher exact test, and Kruskal-Wallis test) and parametric (t test for 2 independent samples) methods. We accepted a type I error rate of .05 for tests of statistical significance. Agreement between raters for presence or absence of a diagnostic error on medical record review (before discussion to resolve differences) was assessed using the κ statistic. A κ value greater than .40 was used to denote good agreement. To assess comorbid illness burden in the patient sample, the Charlson index was calculated using VA administrative databases.22
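For readers unfamiliar with the κ statistic, the following is a minimal sketch of how agreement between 2 raters on a yes/no error judgment might be computed. It assumes the standard Cohen κ for 2 raters (the text does not name the variant), and the ratings shown are invented for illustration; they are not study data.

```python
from typing import Sequence


def cohen_kappa(rater_a: Sequence[bool], rater_b: Sequence[bool]) -> float:
    """Cohen kappa for two raters making a binary judgment on the same cases."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must judge the same non-empty set of cases")
    n = len(rater_a)
    # Observed proportion of cases on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's own rate of "error" calls.
    p_a = sum(rater_a) / n
    p_b = sum(rater_b) / n
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


# Invented ratings for 10 records (True = diagnostic error judged present).
a = [True, True, True, False, False, False, False, False, False, False]
b = [True, True, False, True, False, False, False, False, False, False]
print(f"kappa = {cohen_kappa(a, b):.2f}")  # kappa = 0.52
```

In this made-up example the raters agree on 8 of 10 records, but because most records are judged error-free by both, chance agreement is already 0.58 and κ is about 0.52, which would exceed the .40 threshold used in the study.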
A flowchart details the identification of visits in screened and control populations (Figure 1). Screen 1 was applied electronically to all 15 580 primary care visits, yielding 211 medical records (1.35%), of which 139 met criteria for detailed review. Screen 2 was applied to 5267 primary care visits, yielding 175 screen 2 positive visits (overall yield = 3.32%) for review after electronic exclusion of 58 screen 1 positive visits. A random sample of 199 control visits was chosen for review.
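The screening yields quoted above follow directly from the reported counts. The short check below uses only those counts; the function name is a hypothetical helper.

```python
def screen_yield(flagged: int, screened: int) -> float:
    """Percentage of screened visits that the electronic screen flagged."""
    if screened <= 0:
        raise ValueError("screened must be positive")
    return 100 * flagged / screened


print(f"screen 1: {screen_yield(211, 15580):.2f}%")  # screen 1: 1.35%
print(f"screen 2: {screen_yield(175, 5267):.2f}%")   # screen 2: 3.32%
```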
Age, race/ethnicity, and a history of psychiatric disease were similar across both screen and control populations. As expected, the Charlson index for 2 comorbidities or more was higher for screen 1, which represented the hospitalized cohort of patients (P = .005; data not shown). The overall positive predictive value (PPV) of screen 1 was 16.1% (34 confirmed diagnostic errors found on detailed medical record review after reaching consensus through reviewer discussion). When we applied the prespecified exclusion criteria (planned hospitalization found on brief medical record review), the PPV improved to 24.4%. Screen 1 also revealed 12 other clinical management errors, including errors in care such as inappropriate antibiotic use, failure to increase or decrease medication dosages, failure to prescribe a medication, and failure to monitor laboratory values. These errors were included only if unrelated to the primary diagnostic error. Review of screen 2 medical records yielded 17 confirmed diagnostic errors (PPV = 9.7%) and 13 other clinical management errors. There were 8 diagnostic errors (PPV = 4%) and 5 other clinical management errors in the control group. Consensus could not be reached in 2 cases (1 in each of the 2 screens), and these were not classified as diagnostic errors. No errors in diagnosis were discovered at telephone follow-up of 11 patients. The PPV for screen 1 and screen 2 to predict any error (diagnostic or clinical management related) was 21.8% (improved to 33% with exclusion criteria) and 17%, respectively, compared with 6.5% for the control group. Before discussion to resolve differences about errors, the κ values between reviewer pairs were 0.26, 0.39, and 0.19.
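The PPVs in this paragraph can be reproduced from the counts given in the text. The sketch below is a plain arithmetic check; the `ppv` helper and the dictionary labels are hypothetical names, and the "any error" counts are the sums of diagnostic and other clinical management errors reported above.

```python
def ppv(confirmed: int, reviewed: int) -> float:
    """Positive predictive value as a percentage of reviewed records."""
    if reviewed <= 0:
        raise ValueError("reviewed must be positive")
    return 100 * confirmed / reviewed


# (confirmed errors, records) taken from the counts reported in the text.
diagnostic = {
    "screen 1, all flagged": (34, 211),
    "screen 1, after exclusions": (34, 139),
    "screen 2": (17, 175),
    "control": (8, 199),
}
any_error = {
    "screen 1, all flagged": (34 + 12, 211),
    "screen 1, after exclusions": (34 + 12, 139),
    "screen 2": (17 + 13, 175),
    "control": (8 + 5, 199),
}

for title, table in (("diagnostic error", diagnostic), ("any error", any_error)):
    for label, (errors, total) in table.items():
        print(f"{title:16s} {label:27s} {ppv(errors, total):5.1f}%")
```

Rounded to 1 decimal place, these give 16.1%, 24.5%, 9.7%, and 4.0% for diagnostic errors and 21.8%, 33.1%, 17.1%, and 6.5% for any error. The value 34/139 is 24.46%, which the text reports as 24.4%.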
We found that 19.4% of return visits in screen 1 and 24.5% of return visits in screen 2 were prearranged by clinicians (P = .27) and that there was no difference in these visits between cases associated with diagnostic errors vs those that were not (21.5% vs 22.4%, respectively; P = .89). Table 1 gives patient and visit characteristics for the 314 screen-positive visits broken down by whether diagnostic errors were ultimately detected by detailed medical record review.
The mean time to return visit in screen 1 and screen 2 was not statistically different (6.1 vs 5.6 days, respectively; P = .07). The number of diagnostic errors identified by different intervals between index and return visits tended to plateau after the 7-day cutoff (Figure 2), lending some support to the period we chose for the screening technique. However, this 10-day cutoff would typically aid only in the detection of those primary care errors that manifest clinically within a short duration.
Table 2 gives examples of the types of diagnostic errors and other clinical management errors found in the screened population. Errors were found in a wide spectrum of diseases commonly seen in an outpatient general medical setting. Table 3 gives the primary and secondary factors found responsible for the error as discerned by medical record review using the taxonomy proposed by Schiff et al.7 The most common primary errors in the diagnostic process were failure or delay in eliciting information and misinterpretation or suboptimal weighing of critical pieces of data from the history and physical examination. The most common secondary errors were suboptimal weighing or prioritizing of diagnostic probabilities and failure to recognize urgency of illness or its complications. Other common primary and secondary errors included failure to order or delay in ordering needed tests or to follow up on test results.
We used an innovative computerized screening technique to identify occurrences of diagnostic errors in primary care visits at a large academic medical center. We designed and tested a simple algorithm that, when applied to numerous electronic medical records, was found to be a feasible and useful mechanism to select cases enriched in diagnostic errors for further medical record review. In addition, we used a recently proposed taxonomy of diagnostic errors to identify factors responsible for errors. Using this taxonomy, we found that problems with taking patient histories and physical examination findings were primarily responsible for most errors. Our study offers a new information technology–based tool that has potential future use in the identification and study of diagnostic errors in primary care.
Despite being frequent and expensive (accounting for about 40% of all malpractice payments, on average, about $300 000 in 2003),1,3,4 diagnostic errors have received little formal study until recently. There is a compelling need to understand their true nature and frequency. Our proposed screening tool can be used to select “high-risk” medical records as an alternative to other methods to detect these errors, most of which are limited in their usefulness.8,9,23 Electronic screening would require less reviewer time and expense compared with manual review systems such as random medical record reviews. Moreover, this technique takes advantage of routinely available administrative data in electronic form.
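One way to make the reviewer-time argument concrete is to express each group as the number of records a reviewer would read per confirmed diagnostic error. This figure is not reported in the study; it is derived here from the study's own counts as an illustration, and the function name is hypothetical.

```python
def records_per_error(reviewed: int, confirmed_errors: int) -> float:
    """Average number of records reviewed for each confirmed error."""
    if confirmed_errors <= 0:
        raise ValueError("confirmed_errors must be positive")
    return reviewed / confirmed_errors


groups = {
    "screen 1 (all flagged)": (211, 34),
    "screen 2": (175, 17),
    "unscreened control": (199, 8),
}
for label, (reviewed, errors) in groups.items():
    print(f"{label}: {records_per_error(reviewed, errors):.1f} records per error")
```

On these counts, a reviewer would read roughly 6 screen 1 records or 10 screen 2 records per confirmed diagnostic error, compared with about 25 control records.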
Most electronic screening strategies are focused on the study of adverse drug events and nosocomial infections.8,10,11,13,14,24-26 Computerized free text search for trigger words has also been proposed as a method for screening electronically stored discharge summaries for the presence of adverse events.12,27 Our study is one of few that focuses on error detection using a trigger method in outpatient care25,26 and, to our knowledge, is the only one to specifically address diagnostic errors. The prevalence of outpatient errors is underestimated,28 and a need to detect errors in diagnosis and clinical management using a trigger method has recently been highlighted.14 Despite low PPV (16.1% and 9.7% for screens 1 and 2, respectively), our screening technique compares favorably with other studies using administratively available electronic screens (PPV, 12%-34%).29,30 In the outpatient setting, it is superior to electronic screening for detection of adverse drug events, where the overall reported PPV for computerized screening has ranged from 7.5% to 8.8%.25,26 Use of electronic screening coupled with targeted medical record review could provide a foundation for understanding the epidemiology and prevention of diagnostic errors in primary care31 and lead to important advances in patient safety outside the setting of medication- and hospital-associated adverse events.
Most errors in our study were primarily attributed to problems in history taking and physical examination findings using a taxonomy of errors proposed by Schiff et al (Table 3).7 This may not be surprising because, despite impressive advances in diagnostic technology, numerous factors threaten the importance of the traditional history taking and physical examination.32 Although our study focused on trainees, it adds to the work of Graber et al,19 who found that faulty data synthesis was the most common problem identified in diagnostic errors. We believe that this new taxonomy likely categorized some of the defects in data synthesis elicited by Graber and colleagues into different anatomical breakdowns in history taking and physical examination, such as inaccurate interpretation or misinterpretation and suboptimal weighing of critical pieces of data from the history taken or the physical examination findings. We also found problems with assessment to be the most common secondary factors involved, which further supports their work.
This technique also helped identify other clinical management errors; thus its potential applications extend beyond aiding in diagnostic error identification. An added strength was our ability to maintain complete blinding of the reviewers to the study hypothesis. A poststudy questionnaire of all 3 data collectors confirmed that none of them detected the presence or absence of screens and that all were unaware of the study objectives.
Results of our study should be interpreted with caution. Because of the study population (eg, veterans, predominantly male) and setting (resident outpatient teaching clinic), the findings may not be generalizable outside the VA and to nonteaching settings. Future large-scale studies using different cutoff times are needed to validate the use of such screens before surveillance for errors can be implemented in systems outside the VA setting. Such screens will inevitably miss some errors (as seen by the presence of errors even in control cases), especially errors related to missed diagnosis of chronic conditions that are only diagnosed over a prolonged time (eg, cancer) and errors related to follow-up of patients. They will also underestimate the error rate if any patients sought medical care outside the VA setting. Another limitation is the low κ statistic; however, other investigators have reported similar κ values in studies using similar methods.33 Agreement for diagnostic errors tends to be much lower than for other types of errors. We addressed this problem by using 2 independent reviews followed by a discussion to resolve disagreements. To reduce hindsight bias,20,21 a well-known problem in the study of diagnostic errors, we focused on the presence of errors rather than adverse events. However, the study protocol did not prevent reviewers from identifying patient harm, which could affect their judgments. In addition, we did not interview providers as part of the analysis, which may have limited our ability to determine the breakdown point in the diagnostic process.
Electronic screening of primary care visits is a feasible method to select cases enriched in diagnostic errors for further medical record review and may offer benefits over previous methods. Its PPV is comparable to that of similar methods used to identify adverse drug events. Further studies are needed to demonstrate its usefulness in research studies, in quality measurement, and in identifying targets for quality improvement.
Correspondence: Hardeep Singh, MD, MPH, Michael E. DeBakey VA Medical Center (152), 2002 Holcombe Blvd, Houston, TX 77030 (firstname.lastname@example.org).
Accepted for Publication: October 17, 2006.
Author Contributions: Dr Singh had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis. Study concept and design: Singh, Thomas, and Petersen. Acquisition of data: Singh. Analysis and interpretation of data: Singh, Thomas, Khan, and Petersen. Drafting of the manuscript: Singh and Khan. Critical revision of the manuscript for important intellectual content: Singh, Thomas, Khan, and Petersen. Statistical analysis: Singh, Khan, and Petersen. Obtained funding: Singh and Petersen. Administrative, technical, and material support: Petersen. Study supervision: Thomas and Petersen.
Financial Disclosure: None reported.
Funding/Support: This study was supported by National Institutes of Health K12 Mentored Clinical Investigator Award grant K12RR17665 to Baylor College of Medicine (Dr Singh); by grant 1PO1HS1154401 from the Agency for Healthcare Research and Quality (Dr Thomas); by Robert Wood Johnson Foundation Generalist Physician Faculty Scholar grant 045444 (Dr Petersen); and by American Heart Association Established Investigator Award grant 0540043N (Dr Petersen).
Role of the Sponsors: The funders had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; or preparation, review, or approval of the manuscript.
Disclaimer: The views expressed in this article are those of the authors and do not necessarily represent the views of the Department of Veterans Affairs.
Previous Presentation: This study was presented as a poster at the 29th Annual Meeting of the Society of General Internal Medicine; April 28, 2006; Los Angeles, Calif.
Acknowledgment: We thank Julie de la Houssaye, BS, for assistance with obtaining data from the VA data warehouse, and Annie Bradford, MA, for assistance with technical writing.