Figure. Patient enrollment and randomization (primary outcome).
Apkon M, Mattera JA, Lin Z, Herrin J, Bradley EH, Carbone M, Holmboe ES, Gross CP, Selter JG, Rich AS, Krumholz HM. A Randomized Outpatient Trial of a Decision-Support Information Technology Tool. Arch Intern Med. 2005;165(20):2388–2394. doi:10.1001/archinte.165.20.2388
Decision-support information technology is often adopted to improve clinical decision making, but it is rarely rigorously evaluated. Congress mandated the evaluation of Problem-Knowledge Couplers (PKC Corp, Burlington, Vt), a decision-support tool proposed for the Department of Defense’s new health information network.
This was a patient-level randomized trial conducted at 2 military practices. A total of 936 patients were allocated to the intervention group and 966 to usual care. Couplers were applied before routine ambulatory clinic visits. The primary outcome was quality of care, which was assessed based on the total percentage of any of 24 health care quality process measures (opportunities to provide evidence-based care) that were fulfilled. Secondary outcomes included medical resources consumed within 60 days of enrollment and patient and provider satisfaction.
There were 4639 health care opportunities (2374 in the Coupler group and 2265 in the usual-care group), with no difference in the proportion of opportunities fulfilled (33.9% vs 30.7%; P = .12). Although there was a modest improvement in performance on screening/preventive measures, it was offset by poorer performance on some measures of acute care. Coupler patients used more laboratory and pharmacy resources than usual-care patients (logarithmic mean difference, $71). No difference in patient satisfaction was observed between groups, and provider satisfaction was mixed.
This study provides no strong evidence to support the utility of this decision-support tool, but it demonstrates the value of rigorous evaluation of decision-support information technology.
Medical decision making integrates patient-specific data with medical knowledge under conditions of uncertainty. Computerized decision-support information technology (DSIT) tools are programs that help caregivers close gaps between knowledge and performance1,2 using various approaches, including alerts, feedback, interpretation, prognostic tools, and diagnostic aids. Although these tools are proliferating,3 few studies4-6 have rigorously evaluated their impact on ambulatory care. The DSIT tools often escape the scrutiny applied to drugs or devices,7,8 yet they can be costly and can cause both harm9,10 and benefit.3,11
The Department of Defense is incorporating the DSIT tool Problem-Knowledge Couplers (PKC Corp, Burlington, Vt) into the second generation of its computerized medical record.12 To date, the Department of Defense has spent more than $15 million licensing this tool for use in pilot programs across the Military Health System,13 which provides health care to 8.9 million people in 75 military hospitals and more than 450 clinics. Couplers uses structured questions based on the patient’s chief complaint to elicit information from the patient and the provider. That information is linked to a proprietary database of medical knowledge that generates suggestions for appropriate patient care strategies, including options relevant to the patient’s diagnosis and treatment.14,15 To satisfy a congressional mandate to evaluate Couplers before widespread implementation, we conducted a randomized controlled trial (RCT) of the effect of this DSIT tool in ambulatory practice, examining its effect on quality, resource consumption, and patient and provider satisfaction.
A patient-level RCT was conducted to compare the impact of Couplers with that of usual care at 2 military treatment facilities (Ireland Army Community Hospital and Clinics, Fort Knox, Ky, and Mayport Branch Health Clinic, Mayport, Fla). We also evaluated historical control groups at each facility and a contemporaneous control group from a clinic not participating in the RCT. The facilities were chosen by the Department of Defense because of their strong leadership support for Couplers and ambulatory practices consistent with the Military Health System optimization model16 specifying 2 examination rooms and 3.5 support staff per primary caregiver.
Patients aged 18 years and older were eligible if they had scheduled appointments, could speak and read English, had not participated in Coupler sessions, were not scheduled for obstetric care, and had no emergency medical conditions. The study was approved by the institutional review boards of Walter Reed Army Medical Center, Washington, DC; National Naval Medical Center, Bethesda, Md; and Yale University. Informed consent was obtained from all participating patients and providers.
Couplers are available for a wide range of preventive health care needs and patient complaints, including common conditions. Patients randomized to use Couplers completed the one appropriate for their specific complaint or, when no condition-specific Coupler was appropriate, a generic History and Screening Coupler, replicating the Military Health System’s intended use for Couplers. Patients entered their medical histories into the Coupler tool with the assistance of a coordinator who was not associated with the study; 30 minutes was allocated for that process. Providers treating Coupler patients could enter additional information before reviewing Coupler outputs outlining diagnosis or treatment options. Patients in the usual-care group had no exposure to Couplers.
The primary study outcome, quality of care, was assessed using 24 health care process measures grouped into 2 categories: screening/prevention and acute/chronic disease management. Measures (“opportunities for quality care”) were selected before initiating the study and were drawn from recommendations of quality-focused national organizations, such as the Agency for Healthcare Research and Quality and the US Preventive Services Task Force. For each patient, the processes (or opportunities) indicated at the index visit were tabulated based on the patient’s medical characteristics. Structured medical record abstraction and data from the Military Health System’s electronic medical system (Composite Healthcare System) were used to determine whether each of these opportunities was satisfied (“fulfilled”) within 60 days of the index visit. The primary outcome was the overall proportion of opportunities fulfilled in each study group.
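As a minimal sketch of how the primary outcome is pooled, the following Python snippet computes the overall proportion of opportunities fulfilled in each group from opportunity-level records. The record layout, the function name `fulfillment_rate`, the measure labels, and the toy data are illustrative assumptions, not the study's data or software.

```python
from collections import defaultdict


def fulfillment_rate(records):
    """Return {group: fulfilled / total}, pooled over all opportunities.

    Each record is (patient_id, group, measure, fulfilled). A patient may
    contribute several opportunities, and each one counts equally
    (unweighted aggregation).
    """
    totals = defaultdict(int)
    fulfilled = defaultdict(int)
    for _patient_id, group, _measure, was_fulfilled in records:
        totals[group] += 1
        fulfilled[group] += int(was_fulfilled)
    return {group: fulfilled[group] / totals[group] for group in totals}


# Hypothetical toy data (not study data).
toy_records = [
    (1, "coupler", "exercise counseling", True),
    (1, "coupler", "tobacco screening", False),
    (2, "coupler", "dietary counseling", True),
    (3, "usual", "tobacco screening", False),
    (3, "usual", "exercise counseling", True),
    (4, "usual", "dietary counseling", False),
    (4, "usual", "blood pressure check", False),
]
rates = fulfillment_rate(toy_records)
```

Because the denominator is opportunities rather than patients, patients with more indicated processes weigh more heavily in the pooled proportion, which is why the analysis adjusts for clustering within patients.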
Resources consumed within 60 days of the index visit were determined from the Composite Healthcare System in 4 areas: ambulatory visits, laboratory testing, diagnostic imaging, and pharmacy use. We determined dollar values for the first 3 areas from the Centers for Medicare & Medicaid Services’ 2003 fee schedule using the relative value unit conversion rate of $36.7856.17 Where not coded in the Composite Healthcare System, Current Procedural Terminology codes were assigned for the midlevel service for each visit type. Drug values were based on the generic equivalent (when available), quantities dispensed, and average wholesale prices found in the 2003 Red Book database (Thomson Micromedex, Greenwood Village, Colo).
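The valuation of visits, laboratory tests, and imaging described above amounts to multiplying relative value units (RVUs) by the stated conversion rate and then adding pharmacy dollars. A minimal sketch follows; the RVU totals and drug cost are hypothetical, the function names are illustrative, and geographic adjustments in the fee schedule are ignored for simplicity.

```python
CONVERSION_RATE = 36.7856  # dollars per relative value unit, as stated in the text


def service_cost(total_rvus, conversion_rate=CONVERSION_RATE):
    """Dollar value of one service: RVUs times the conversion rate."""
    return total_rvus * conversion_rate


def patient_cost(rvus_by_service, drug_cost=0.0):
    """Sum the RVU-based service costs and add pharmacy dollars."""
    return sum(service_cost(rvus) for rvus in rvus_by_service) + drug_cost


# Hypothetical patient: one visit (1.0 RVU), one test (0.5 RVU), $12.00 of drugs.
example_total = patient_cost([1.0, 0.5], drug_cost=12.00)
```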
Patient satisfaction was assessed using a modification of an outpatient satisfaction survey developed by Press Ganey Associates Inc (South Bend, Ind)18 distributed to participants at their index visit in sealed, numbered envelopes to mask participants to the survey during the visit. Follow-up telephone calls were made when surveys were not returned within 10 business days. Patients who did not complete the survey within 30 days were considered “lost to follow-up” for assessing satisfaction.
We assessed provider satisfaction using a structured survey that asked whether they agreed that Couplers had a positive impact on 8 areas: (1) quality of care, (2) medical decision making (including impact on taking histories, conducting physical examinations, formulating diagnoses, and clinical management), (3) other benefits to patients, (4) patient satisfaction (as reported by providers), (5) patient-provider interaction, (6) time required for patient care, (7) quality of the medical knowledge base, and (8) software design and user interface.
We calculated the number of patients required to detect differences in fulfillment rates between the Coupler and usual care groups of 5.0% with a power of 80% at α = .05. Calculations used a method19 that accounted for clustering of outcome measures within patients assuming worst-case baseline fulfillment rates of 50%, opportunities found for each patient in a historical sample at the clinics, and an interpatient correlation of fulfillment rates of 0.3. We calculated that 1680 patients (840 per group) would be needed.
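The general shape of such a calculation can be sketched with the standard two-proportion sample size formula inflated by a design effect for clustering. This is not necessarily the cited method: it assumes equal cluster sizes, treats the stated correlation of 0.3 as the correlation among opportunities within a patient, and uses a hypothetical 3 opportunities per patient because the historical value is not reported here.

```python
import math
from statistics import NormalDist


def patients_per_group(p1, p2, alpha=0.05, power=0.80, opps_per_patient=3.0, icc=0.3):
    """Approximate patients per group for comparing two proportions
    when each patient contributes several correlated opportunities."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)
    # Opportunities per group if all opportunities were independent.
    n_independent = (
        (z_alpha + z_beta) ** 2
        * (p1 * (1 - p1) + p2 * (1 - p2))
        / (p1 - p2) ** 2
    )
    # Design effect for equal-sized clusters: 1 + (m - 1) * ICC.
    design_effect = 1 + (opps_per_patient - 1) * icc
    return math.ceil(n_independent * design_effect / opps_per_patient)


# Baseline 50% vs 55% (a 5-point difference), hypothetical 3 opportunities per patient.
n_clustered = patients_per_group(0.50, 0.55)
# Same comparison with one opportunity per patient (no clustering).
n_unclustered = patients_per_group(0.50, 0.55, opps_per_patient=1.0)
```

Under these assumptions the clustered estimate falls in the low 800s per group, in the neighborhood of the 840 reported; it is not expected to reproduce that figure exactly, since the true number of opportunities per patient and the exact method differ.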
To determine whether providers’ exposure to Couplers may have affected the care of usual-care group patients, end points were also evaluated in a separate but similar clinic at one of the sites (the “external control” clinic). In the external control clinic, providers were not exposed to Couplers, although they knew that their patients may have been included in the quality measurement study. We also assessed all 3 clinics’ performance before the study by examining the medical records of a sample of historical patients at each (“historical control” groups) to account for baseline differences in quality between the study and external control clinics. Patients in the external control and historical control groups were randomly selected based on appointment times.
We tested the primary hypothesis that fulfillment rates differed between study groups by comparing the likelihood of fulfillment using a Mantel-Haenszel χ² test of homogeneity, stratified by physician and adjusted for clustering by patient.20 Second, we evaluated differences in fulfillment rates between study groups for screening/prevention opportunities combined, for acute/chronic opportunities combined, and for individual opportunities. To account for clustering of outcomes within patients, we used cluster-adjusted χ² tests but stratified by clinic given the inadequate number of individual opportunity types per physician to stratify by physician.19
To account for differences in patient characteristics and to improve precision of estimates of effects, we tested the effectiveness of Couplers using a 3-level hierarchical logistic regression model incorporating opportunity type and patient and provider characteristics, including group assignment (Coupler or usual care) as a patient characteristic.
We evaluated differences in median resource use in each category and overall using the Wilcoxon rank sum test of equality of distribution. We developed a hierarchical linear regression model with the logarithm of total resource use as the outcome to account for a skewed resource distribution. We examined differences in patient satisfaction between groups using summary scales for 4 domains (during visit, care provider, personal issues, and overall assessment). We calculated reliability estimates (Cronbach α) for each domain scale and compared the scales across study groups using Wilcoxon rank sum tests of equality of distribution. Physicians were also surveyed about the tool. All statistical calculations were performed using Stata 8 (StataCorp, College Station, Tex) and HLM 5 (Scientific Software International Inc, Lincolnwood, Ill).
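For readers unfamiliar with the reliability estimate mentioned above, Cronbach α for a scale of k items is k/(k − 1) × (1 − sum of item variances / variance of the total score). A minimal sketch in plain Python follows; the survey responses are hypothetical and the function name is illustrative (the study itself used Stata).

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach alpha for a list of respondents, each a list of k item scores."""
    k = len(responses[0])
    items = list(zip(*responses))  # regroup scores by item
    item_variance_sum = sum(variance(scores) for scores in items)
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical 3-item domain scale answered by 4 patients (not study data).
toy_responses = [
    [4, 5, 4],
    [3, 3, 3],
    [5, 5, 4],
    [2, 3, 2],
]
alpha = cronbach_alpha(toy_responses)
```

Values near 1 indicate that the items in a domain move together, which supports summarizing them as a single scale before comparing groups.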
A total of 2769 patients were screened at the 2 sites between April 22, 2002, and December 31, 2002 (Figure). Of these patients, 611 (22.1%) did not meet the eligibility criteria, 252 (9.1%) refused participation, and 4 (0.1%) were removed from the study at their request, resulting in a total of 1902 patients (88.1% of 2158 eligible patients) enrolled and randomized: 936 (49.2%) in the Coupler group and 966 (50.8%) in the usual-care group. Site 1 enrolled 998 patients (480 in the Coupler group and 518 in the usual-care group), and site 2 enrolled 904 patients (456 in the Coupler group and 448 in the usual-care group). Not all patients had a valid opportunity type for inclusion in the primary analysis; those with at least 1 valid opportunity (721 in the Coupler group and 704 in the usual-care group) differed from those with none: they were younger (P<.01), were more likely to be female (P<.001), and more often had acute or routine visit types (P<.001). Patients were analyzed on an intention-to-treat basis. There were no reported adverse events associated with this study.
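The enrollment counts above can be cross-checked with simple arithmetic. The short sketch below uses only the numbers reported in this paragraph; the variable names are illustrative.

```python
screened = 2769
ineligible = 611
refused = 252
withdrew = 4

eligible = screened - ineligible          # 2158 eligible patients
enrolled = eligible - refused - withdrew  # 1902 enrolled and randomized
enrolled_pct = round(100 * enrolled / eligible, 1)

# Per-site allocations as (Coupler, usual care).
site_1 = (480, 518)
site_2 = (456, 448)
coupler_total = site_1[0] + site_2[0]
usual_total = site_1[1] + site_2[1]
```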
The characteristics of the 1902 patients by randomization status and site are given in Table 1. The mean age, sex, and visit type were similar between the Coupler and usual-care groups. There were slightly more active-duty status patients in the usual-care group (44.0% vs 38.6%). The proportion of missing or incomplete medical records was similar in both groups (16%) and was largely due to patients moving away from study sites or maintaining possession of their medical records. The mean number of opportunities per patient, the proportion of patients with at least 1 screening/prevention opportunity, and the proportion with at least 1 acute/chronic opportunity were similar in the 2 groups.
There were 4639 health care opportunities (2374 in the Coupler group and 2265 in the usual-care group) among study patients at their index visit. There was no significant difference in the proportion of health care opportunities fulfilled in the Coupler and usual-care groups (33.9% vs 30.7%; P = .12) (Table 2). The hierarchical multivariable analysis (adjusting for age, sex, military status, visit type, opportunity type, and site) also found no significant effect of assignment to the Coupler group on the probability of health care opportunity fulfillment (odds ratio for opportunities being fulfilled in the Coupler group compared with the usual-care group: 1.14; 95% confidence interval, 0.95-1.38; P = .16) (Table 3).
In the analysis by opportunity subgroup (screening/prevention or acute/chronic), there were slightly more screening/prevention opportunities fulfilled in the Coupler group vs the usual-care group (34.8% vs 30.4%; P = .03), with dietary and exercise counseling being primarily responsible for the difference (Table 2). In contrast to screening/prevention opportunities, there were slightly fewer acute/chronic opportunities fulfilled in the Coupler group vs the usual-care group (27.7% vs 32.6%), although this difference was not significant. In the analysis by opportunity, 5 opportunities showed significant differences (Table 2).
Comparisons between a sample of historical patients at the site 2 study clinic and historical patients at the site 2 external control clinic found no differences in patient characteristics, number of opportunities, or opportunity fulfillment. However, the proportion of total opportunities fulfilled in the period before the study was much lower for patients in the RCT and external control clinics (12.2% and 16.5%, respectively) compared with the study period RCT clinic usual-care group (42.5%) and external control clinic (45.7%). The proportion of total opportunities fulfilled in the previous period was also lower at site 1, although the increase during the study period was more modest than at site 2 (10.9%-18.6%).
Queries of the Composite Healthcare System database identified data for 92.5% of the study patients, with equal proportions of patients missing data across groups. No differences were found in the costs (in dollars) associated with ambulatory visits and radiographic evaluation (Table 4). However, Coupler group patients used more laboratory and pharmacy resources than usual-care group patients. Aggregate costs across the 4 categories were also higher for the Coupler group, with the median cost per Coupler group patient being $100 higher than per usual-care group patient. This estimate did not include direct expenses associated with Coupler use. The average Coupler session took approximately 18 minutes of employee time to coordinate. The logarithmic mean difference between the groups was $71. Multivariable analysis using logarithmic cost as the outcome showed a significant main effect of treatment, with Coupler group patients using a logarithmic mean difference of $46 more than usual-care group patients.
Satisfaction surveys were returned by mail or solicited by telephone for 1573 patients (82.7%): 72.2% at site 1 and 92.6% at site 2, with similar response rates between the Coupler and usual-care groups (83.4% vs 82.0%). There were no significant differences in patient satisfaction survey results in any domains of satisfaction as determined using the Wilcoxon rank sum test or multivariable models (Table 5).
We surveyed each provider in the study (n = 12 [8 physicians, 3 physician assistants, and 1 nurse practitioner; 8 at site 1 and 4 at site 2]) to assess his or her satisfaction with Couplers. The strongest level of perceived satisfaction related to information quality: 75% agreed or strongly agreed that Couplers provides high-quality information. The strongest level of dissatisfaction related to time use, with 83% disagreeing or strongly disagreeing that Coupler use involves acceptable amounts of time. More than half of the providers also disagreed with the statements of benefits for medical decision making (70%), improved provider-patient interactions (61%), and overall benefits to patients (70%).
In this study, implementing a specific DSIT tool in a primary care setting did not substantially improve quality of care, decrease resource consumption, or improve satisfaction of patients or providers. In secondary analyses, the modest improvement in screening/prevention opportunities was offset by lower rates of acute/chronic opportunities. The results indicate the importance of a thorough evaluation of DSIT tools before assuming that they are effective based on face validity.
The relevance of any DSIT tool evaluation depends on the intended use of the tool. We designed this study in coordination with the developers of the tool to ensure that we were evaluating meaningful outcomes. Measurement across these 24 specific process measures allowed us to examine quality in a setting of diverse patients, diagnoses, and activities typical of ambulatory medical practice. Our method of aggregating unweighted measurements was recently used by McGlynn et al21 to measure the effectiveness of health care delivery in large populations.
Few studies of DSIT tools exist, yet the use of these tools proliferates. With the expansion of electronic health records, the integration of such tools into practice will likely accelerate. Past studies have often focused on validating tools’ suggestions by comparing their performance with that of experts using case simulations22 or by examining the impact on the diagnostic accuracy of users.23 Several studies have shown that DSIT tools may improve a physician’s diagnostic performance,24,25 depending on characteristics of the case, the provider, and other factors.26 However, most studies focused on limited numbers of diagnoses or problems and evaluated a tool’s efficacy rather than its effectiveness influenced by human factors or its impact on other dimensions of care.
Several limitations of this study should be considered. Providers in the study cared for both intervention and control patients. However, using a historical control cohort and a concurrent control clinic, we found no evidence that the lack of effect was due to an effect on the care of the control patients. There was a significant improvement in care for the external control and study clinics at site 2. This could reflect the knowledge that performance was being measured, or it could reflect some other change in clinic operations. Improvement of this magnitude did not occur at site 1. Without the historical control, it would not have been possible to exclude the possibility that Coupler use in some patients affected the care of all patients. Without the external controls, the improvement compared with the historical controls could be erroneously used to corroborate that possibility. A related potential limitation of using only 2 study sites is reflected in the different underlying percentages of fulfilled opportunities. However, the effect of the intervention was consistent across sites. Although we documented a limited benefit of Couplers, we cannot exclude the possibility that it may have facilitated the diagnosis of rare conditions or enhanced the performance of less experienced physicians or nonphysicians in clinical roles. Finally, this study was conducted at military facilities, so the findings may not be generalizable. Nevertheless, we believe that this setting, with the strong administrative support and optimized clinical environment, would have been ideal to show the benefit of the intervention.
This study serves as a paradigm for measuring quality of care in a multidimensional manner across a diverse ambulatory practice and for evaluating the impact of a complex intervention, such as DSIT tool deployment, on that quality. Such a multidimensional evaluation is critical given that direct effects on clinical quality and process reliability are but one measure of the benefits and costs of interventions such as DSIT tool implementation, new protocol introduction, new educational initiatives, or workflow redesign efforts.4 Despite the belief that DSIT tools will improve quality and safety, their cost and potential for unintended effects support the need for rigorous evaluations demonstrating positive effects on care and outcomes. These evaluations will ensure that resources are best allocated and that the promise of new technologies is achieved.
Correspondence: Harlan M. Krumholz, MD, SM, 333 Cedar St, PO Box 208088, New Haven, CT 06520-8088.
Accepted for Publication: July 7, 2004.
Author Contributions: The authors had full access to all the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis.
Financial Disclosure: None.
Funding/Support: This study was funded by contract V549P-5897 as part of IAW Public Law 104-262 and the Veterans Health Care Eligibility Reform Act of 1996 (38 USC 8151-8153).
Role of the Sponsor: TRICARE approved the study design and reviewed the manuscript but was not involved in the conduct of the study; the collection, management, analysis, and interpretation of the data; or the preparation of the manuscript. TRICARE approval of the manuscript was not required.
Acknowledgment: We acknowledge the assistance of Colonel Kenneth Hoffman, project manager at TRICARE Management Activity, and the leadership, providers, and staff at Ireland Army Community Hospital and Clinics and Mayport Branch Health Clinic, who worked patiently with our staff to complete this study.