Flowchart of physicians through the trial. ECG indicates electrocardiography; GPs, general practitioners.
Effect of patient-specific ratings vs conventional guidelines on correct testing decisions based on 2 independent expert panels (between-arm comparisons). AHA/ESC indicates American Heart Association/European Society of Cardiology; CI, confidence interval; ECG, electrocardiography; and OR, odds ratio.
Effect within age, sex, and race/ethnicity of patient vignettes and of physician specialty. CI indicates confidence interval; ECG, electrocardiography; GPs, general practitioners; and OR, odds ratio.
Change in decision making before vs after the intervention, with P values for interactions. CI indicates confidence interval; ECG, electrocardiography; and OR, odds ratio.
Junghans C, Feder G, Timmis AD, Eldridge S, Sekhri N, Black N, Shekelle P, Hemingway H. Effect of Patient-Specific Ratings vs Conventional Guidelines on Investigation Decisions in Angina: Appropriateness of Referral and Investigation in Angina (ARIA) Trial. Arch Intern Med. 2007;167(2):195-202. doi:10.1001/archinte.167.2.195
Copyright 2007 American Medical Association. All Rights Reserved. Applicable FARS/DFARS Restrictions Apply to Government Use.
Conventional guidelines have limited effect on changing physicians' test ordering. We sought to determine the effect of patient-specific ratings vs conventional guidelines on appropriate investigation of angina.
Randomized controlled trial of 145 physicians receiving patient-specific ratings (online prompt stating whether the specific vignette was considered appropriate or inappropriate for investigation, with access to detailed information on how the ratings were derived) and 147 physicians receiving conventional guidelines from the American Heart Association and the European Society of Cardiology. Physicians made recommendations on 12 Web-based patient vignettes before and on 12 vignettes after these interventions. The outcome was the proportion of appropriate investigative decisions as defined by 2 independent expert panels.
Decisions for exercise electrocardiography were more appropriate with patient-specific ratings (819/1491 [55%]) compared with conventional guidelines (648/1488 [44%]) (odds ratio [OR], 1.57; 95% confidence interval [CI], 1.36-1.82). The effect was stronger for angiography (1274/1595 [80%] with patient-specific ratings compared with 1009/1576 [64%] with conventional guidelines [OR, 2.24; 95% CI, 1.90-2.62]). Within-arm comparisons confirmed that conventional guidelines had no effect but that patient-specific ratings significantly changed physicians' decisions toward appropriate recommendations for exercise electrocardiography (55% vs 42%; OR, 2.62; 95% CI, 2.14-3.22) and for angiography (80% vs 65%; OR, 2.10; 95% CI, 1.79-2.47). These effects were robust to physician specialty (cardiologists and general practitioners) and to vignette characteristics, including older age, female sex, and nonwhite race/ethnicity.
Patient-specific ratings, unlike conventional guidelines, changed physician testing behavior and have the potential to reduce practice variations and to increase the appropriate use of investigation.
Medical societies1 and governments2,3 have made a large investment in developing and implementing clinical practice guidelines to reduce variation in physician behavior and to make medical care more cost-effective. Despite this investment, the effect of conventional guidelines on changing physician behavior is modest when tested in controlled investigations4 and in the context of routine practice.5,6 In coronary heart disease, numerous guidelines address investigation strategies, but few data suggest that investigation guidelines change practice.7 Many patients with angina do not undergo adequate investigation, and the prognostic effect of inequitable underuse among women and older patients is a concern.8,9
A recent meta-analysis10 of decision support interventions identified the use of specific recommendations, a computerized format, and integration into the clinician work flow as 3 factors associated with changing physician behavior. Patient-specific ratings offer these features, as they are tailored to individual patient characteristics and are easily incorporated into the consultation in a computerized format (eg, as online prompts) with unambiguous recommendations that a test is appropriate or inappropriate for each patient. Patient-specific ratings of the appropriateness of investigation are derived using explicit methods11 that take into account the evidence base and incorporate the judgments of generalists and specialists.
The specific objective of the Appropriateness of Referral and Investigation in Angina (ARIA) Trial was to determine the extent to which patient-specific ratings change physician testing behavior compared with conventional guidelines. We randomized physicians to patient-specific ratings or to conventional guidelines, and recommendations for exercise electrocardiography (ECG) and angiography were assessed. We used patient vignettes to control for case-mix variation.12
The patient-specific ratings consisted of online prompts stating whether the specific vignette was considered appropriate or inappropriate for investigation. Access to detailed information on how the ratings were derived was included.
Physicians were eligible to participate in the trial if they were members of the British Cardiac Society (1032 cardiologists) or were general practitioners (2206 physicians, limited to 1 per practice) in the primary care trusts referring to 9 cardiothoracic centers in England and Scotland. Eligible physicians were invited to participate online. Physicians were randomized to an intervention of patient-specific ratings derived by 2 independent expert panels or by conventional American Heart Association13 and European Society of Cardiology14 guidelines. Each physician was asked to judge the appropriateness of exercise ECG or angiography in 12 Web-based patient vignettes before the intervention and then in 12 vignettes after the intervention (Figure 1). Physicians were told that this was a study of decision support but not that it was a randomized trial of different types of decision support. Physicians were reimbursed for their participation. At registration, physicians were asked about details of their practice (Table 1). The study was carried out between September 1, 2004, and June 29, 2005.
We developed 48 unique vignettes of patients with suspected or confirmed angina based on unique combinations of clinical factors (indications). Each indication had previously been rated for the appropriateness of exercise ECG and angiography by 2 expert panels independently, using the RAND/University of California at Los Angeles method.11 The panels each comprised 5 cardiologists, 1 cardiothoracic surgeon, and 5 general practitioners with an interest in cardiology. These 22 physicians were recruited from the same 9 centers on the basis of expertise (15 had published in the field) and peer nomination. They were not eligible for the trial (further details are available at http://www.ucl.ac.uk/ceg/studies). We chose only indications that were rated in the same appropriateness category by both panels, providing a more reliable definition of appropriateness. To resemble real-life patients, the vignette narratives described these clinical factors in free text (see the boxed example on page 198),15 included information on patient occupation and comorbidity, and represented a wide range of clinical and demographic characteristics, as summarized in Table 2.
We divided the 48 vignettes into 4 blocks of 12 and then created sequences of 2 blocks (1 preintervention and 1 postintervention) with evenly distributed clinical and demographic factors from all possible combinations of the 4 blocks (12 sequences of 2 blocks). The sequence of vignettes was random within each block. Participants in both arms were given 2 blocks of 12 vignettes following randomization (1 before the intervention and, on completion, 1 with the intervention). They were asked to make recommendations for investigation using a 5-point scale (1, definitely do; 2, probably do; 3, unsure; 4, probably do not do; or 5, definitely do not do).
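The count of 12 sequences is consistent with taking every ordered pair of 2 distinct blocks from the 4 available (4 × 3 = 12). The following is a minimal Python sketch under that reading; the block labels A to D are hypothetical and do not come from the trial materials.

```python
from itertools import permutations

# Hypothetical labels for the 4 blocks of 12 vignettes (not from the trial).
BLOCKS = ["A", "B", "C", "D"]


def block_sequences(blocks):
    """Return every ordered (preintervention, postintervention) pair of distinct blocks."""
    return list(permutations(blocks, 2))


sequences = block_sequences(BLOCKS)
print(len(sequences))  # number of possible 2-block sequences
for pre_block, post_block in sequences:
    print(pre_block, "->", post_block)
```

Because order matters (a block seen before the intervention differs from the same block seen after it), permutations rather than unordered combinations are used here.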
The unit of randomization was individual physicians. A research assistant randomized 363 physicians using minimization software to balance recruitment by the 9 centers and the 2 clinical specialties. Physicians received an intervention after they completed the first set of 12 vignettes without decision support. In the conventional guidelines arm, physicians were automatically provided online guideline paragraphs most relevant to each vignette, as well as links to the full-text guidelines. The physicians were free to use any of the guidelines, and their use was recorded. In the patient-specific ratings arm, physicians were automatically provided online ratings applicable to the vignette, ranging from 1 to 9 (1-3, inappropriate; 4-6, equivocal; or 7-9, appropriate) for exercise ECG or angiography. For example, if the rating for exercise ECG was 7 for that particular patient vignette, the electronic prompt to physicians was “the expert panels recommend exercise testing (rating 7)”. Physicians were also given access to detailed information on how the ratings were derived.
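The banding of the 1 to 9 ratings described above can be written as a small lookup. This is an illustrative Python sketch only; the function name is hypothetical, and the wording of the prompts shown to physicians for ratings other than the quoted example is not given in the text, so no prompt text is generated here.

```python
def rating_category(rating: int) -> str:
    """Map a 1-9 panel rating to the band described in the text.

    1-3 -> inappropriate, 4-6 -> equivocal, 7-9 -> appropriate.
    """
    if not 1 <= rating <= 9:
        raise ValueError("rating must be between 1 and 9")
    if rating <= 3:
        return "inappropriate"
    if rating <= 6:
        return "equivocal"
    return "appropriate"


# The example in the text: a rating of 7 for exercise ECG falls in the appropriate band.
print(rating_category(7))
```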
The primary outcome for all test-ordering decisions was agreement of physicians' recommendations with those made by the 2 independent expert panels. Agreement was defined by a physician recommending definitely or probably doing a test rated appropriate by the panels or by recommending definitely or probably not doing a test rated inappropriate. An unsure recommendation was interpreted as disagreement.
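The agreement rule can be sketched in a few lines of Python. This is an illustration of the definition as stated, not the trial's analysis code; the function name is hypothetical, the recommendation codes follow the 5-point scale (1, definitely do, through 5, definitely do not do), and the handling of any category other than appropriate or inappropriate is an assumption because the text does not describe it.

```python
def agrees_with_panel(recommendation: int, panel_category: str) -> bool:
    """Return True if a physician's recommendation agrees with the panel rating.

    recommendation: 1 definitely do, 2 probably do, 3 unsure,
                    4 probably do not do, 5 definitely do not do.
    panel_category: "appropriate" or "inappropriate".
    """
    if recommendation not in (1, 2, 3, 4, 5):
        raise ValueError("recommendation must be an integer from 1 to 5")
    if panel_category == "appropriate":
        return recommendation in (1, 2)
    if panel_category == "inappropriate":
        return recommendation in (4, 5)
    # Assumption: other categories are rejected, since the text does not say how they were scored.
    raise ValueError("panel_category must be 'appropriate' or 'inappropriate'")


# An unsure recommendation (3) counts as disagreement in both cases.
print(agrees_with_panel(3, "appropriate"), agrees_with_panel(3, "inappropriate"))
```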
As a secondary outcome, we investigated the agreement of physicians' recommendations with those of the American Heart Association and the European Society of Cardiology guidelines, which are based on the pretest probability of coronary artery disease. The guidelines recommend that exercise ECG be performed if the pretest probability is intermediate (20%-80%) and not be performed if the pretest probability is lower than 20% or higher than 80%. In the subset of 25 vignettes without confirmed coronary artery disease or previous exercise ECG results, we assessed the pretest probability of coronary artery disease using the Duke score16 based on the vignette characteristics of age, sex, smoking, cholesterol levels, resting ECG changes, typicality of chest pain, and presence of diabetes mellitus. The pretest probability score allowed each of these 25 vignettes to be categorized according to American Heart Association and European Society of Cardiology recommendations for exercise ECG.
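The guideline rule for exercise ECG reduces to a threshold check on the pretest probability. The Python sketch below assumes the probability has already been estimated (the Duke score calculation itself is not reproduced here) and treats the 20% and 80% boundaries as part of the intermediate range, following the wording above; the function name is hypothetical.

```python
def guideline_recommends_exercise_ecg(pretest_probability: float) -> bool:
    """Return True when the pretest probability is intermediate (20%-80%).

    pretest_probability is expressed as a fraction between 0 and 1.
    """
    if not 0.0 <= pretest_probability <= 1.0:
        raise ValueError("pretest_probability must be between 0 and 1")
    return 0.20 <= pretest_probability <= 0.80


for probability in (0.10, 0.50, 0.90):
    print(probability, guideline_recommends_exercise_ecg(probability))
```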
We estimated that 140 physicians were required in each arm to detect an odds ratio (OR) of 1.50 for the effect of patient-specific ratings vs conventional guidelines, assuming that agreement with the panel was found in 60% of decisions in the conventional guidelines arm, with 80% power and at 5% significance. Because individual recommendations may be correlated for individual physicians (ie, clustered) or for individual vignettes (ie, nonindependent), the effective sample size may be less than the total number of decisions analyzed. To take this into account in the sample size calculation, we assumed intracluster correlation coefficients of 0.06 between physicians and 0.04 between vignettes.
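The way clustering shrinks the effective sample size can be illustrated with the standard design effect, 1 + (m - 1) × ICC, where m is the number of decisions per cluster. The Python sketch below is not a reconstruction of the trial's sample size calculation, which also allowed for correlation between vignettes; it assumes, for illustration only, 12 decisions per physician and considers clustering by physician alone.

```python
def design_effect(cluster_size: float, icc: float) -> float:
    """Standard design effect for equal-sized clusters: 1 + (m - 1) * ICC."""
    return 1.0 + (cluster_size - 1.0) * icc


def effective_sample_size(n_decisions: float, cluster_size: float, icc: float) -> float:
    """Number of independent decisions that n clustered decisions are worth."""
    return n_decisions / design_effect(cluster_size, icc)


# Illustrative assumption: 140 physicians each contributing 12 decisions,
# with the between-physician ICC of 0.06 assumed in the text.
decisions_per_arm = 140 * 12
deff = design_effect(12, 0.06)
print(round(deff, 2))
print(round(effective_sample_size(decisions_per_arm, 12, 0.06)))
```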
The unit of analysis was physician recommendations on each vignette, and we used random-effects logistic regression analysis allowing for intracluster correlation. Between physicians, intracluster correlation coefficients for recommendations in agreement with patient-specific ratings were 0.06 for exercise ECG and less than 0.01 for angiography; for recommendations in agreement with conventional guidelines, they were less than 0.01. Between vignettes, intracluster correlation coefficients were negligible, so we allowed for clustering by physician only. We calculated ORs (95% confidence intervals [CIs]) comparing agreement between the recommendations made by trial physicians and the 2 outcomes between trial arms and within trial arms, using physicians as their own controls. We investigated whether effects differed in prespecified subgroups (physician specialty and age, sex, and race/ethnicity of the patient in the vignette) by fitting interaction terms. We excluded from the main analysis 71 physicians who did not complete the first 12 vignettes. We also examined the volume of appropriately recommended tests in both arms following the intervention. Analysis was conducted blind to the intervention using STATA 8.0 (StataCorp LP, College Station, Tex).
Cardiologists and general practitioners did not differ between trial arms with respect to practice characteristics (Table 1) or region. Physicians who registered but did not complete the first 12 vignettes did not differ in practice characteristics compared with those who did. The 4 blocks of 12 vignettes were equally distributed by intervention and by physician specialty. Missing recommendations represented 2% (134/6072) of exercise ECG decisions and 3% (194/6485) of angiography decisions and did not differ by intervention arm or by physician specialty. Therefore, we analyzed 5938 decisions for exercise ECG and 6291 decisions for angiography.
Before the intervention, decisions about the first 12 vignettes were no more appropriate in one arm than in the other: for exercise ECG, appropriate decisions were 619 (42%) of 1475 in the patient-specific ratings arm and 633 (43%) of 1484 in the conventional guidelines arm; for angiography, the corresponding figures were 1018 (65%) of 1557 and 1015 (65%) of 1563. Figure 2 shows that after the intervention physicians receiving patient-specific ratings were more likely to reach appropriate decisions about the second set of 12 vignettes for exercise ECG (54.9% vs 43.5%; OR, 1.57; 95% CI, 1.36-1.82) and for angiography (79.9% vs 64.0%; OR, 2.24; 95% CI, 1.90-2.62) compared with those receiving conventional guidelines. The effect of patient-specific ratings on changing test-ordering behavior was consistent across vignettes with different age, sex, and race/ethnicity variables and by physician specialty (Figure 3). Decisions in the patient-specific ratings arm vs the conventional guidelines arm did not differ (OR, 1.15; 95% CI, 0.93-1.41) for the secondary outcome (based on the Duke score for exercise ECG).
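As a rough arithmetic check, an unadjusted odds ratio can be computed directly from the postintervention counts given in the abstract (819/1491 vs 648/1488 for exercise ECG; 1274/1595 vs 1009/1576 for angiography). The Python sketch below uses the crude 2 × 2 odds ratio with a Woolf (log-based) confidence interval, which ignores clustering by physician; it is therefore not the random-effects model used in the trial, although the crude values fall close to the reported estimates.

```python
import math


def crude_odds_ratio(events_a, total_a, events_b, total_b, z=1.96):
    """Unadjusted odds ratio (arm A vs arm B) with a Woolf confidence interval."""
    a, b = events_a, total_a - events_a  # arm A: appropriate, not appropriate
    c, d = events_b, total_b - events_b  # arm B: appropriate, not appropriate
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Counts from the abstract: patient-specific ratings vs conventional guidelines.
exercise_ecg = crude_odds_ratio(819, 1491, 648, 1488)
angiography = crude_odds_ratio(1274, 1595, 1009, 1576)
print([round(value, 2) for value in exercise_ecg])
print([round(value, 2) for value in angiography])
```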
Overall, the use of patient-specific ratings led to an increase in the volume of appropriate tests ordered compared with the use of conventional guidelines. These values were 699 (84%) of 831 vs 541 (66%) of 817 for exercise ECG (P<.001) and 732 (81%) of 906 vs 604 (68%) of 890 for angiography (P<.001).
Figure 4 shows within-arm comparisons before and after the intervention. Physicians receiving patient-specific ratings made more appropriate decisions on exercise ECG (OR, 2.62; 95% CI, 2.14-3.22) and on angiography (OR, 2.10; 95% CI, 1.79-2.47) following the intervention, while conventional guidelines did not lead to an increase in appropriate decisions. Decisions within the patient-specific ratings arm (OR, 1.13; 95% CI, 0.91-1.37) and within the conventional guidelines arm (OR, 1.05; 95% CI, 0.87-1.26) did not change for the secondary outcome (based on the Duke score for exercise ECG). Before the intervention, among vignette patients in whom angiography was deemed appropriate by the expert panels, women seemed less likely than men to be recommended for angiography (OR, 0.72; 95% CI, 0.54-0.97). This apparent sex inequity was attenuated by the patient-specific ratings intervention (OR, 0.94; 95% CI, 0.67-1.32). A similar attenuation was seen in vignettes of nonwhite race/ethnicity, both for exercise ECG decisions (baseline: OR, 0.83; 95% CI, 0.58-1.19; postintervention: OR, 0.93; 95% CI, 0.70-1.23) and for angiography decisions (baseline: OR, 0.61; 95% CI, 0.40-0.92; postintervention: OR, 0.74; 95% CI, 0.45-1.12).
In this multicenter trial, physicians randomized to patient-specific ratings changed their test-ordering decisions markedly, whereas those given conventional guidelines did not. To our knowledge, this is the first trial comparing computerized patient-specific ratings vs conventional guidelines in investigation decisions or in any aspect of cardiovascular disease. Based on more than 10 000 investigation decisions in the management of suspected or confirmed angina, these findings suggest that patient-specific ratings are a promising novel technology for improving appropriate investigation among patients with angina in primary and secondary care.
Chest pain in ambulatory care is probably the most common initial presentation of coronary artery disease, and early accurate diagnosis is important to prevent progression to acute coronary syndromes.17 However, symptoms of angina in the general population are commonly not diagnosed18 or are treated empirically in the absence of confirmatory investigation.8 Even among patients being assessed by cardiologists, there is an appreciable coronary event rate among patients who are told that their pain is noncardiac.19 To our knowledge, there are no randomized controlled trials of different investigation strategies among patients with chest pain; hence, there are no gold standard recommendations.
The primary outcome used in the ARIA Trial was the agreement between trial physicians' recommendations and those made earlier by 2 independent expert panels. This outcome has the advantages of reliability (both panels agreed) and face validity (it incorporated the judgments of generalists and specialists). Furthermore, patient-specific ratings have been shown to have prognostic validity for various procedures such as knee or hip replacement20 and revascularization21 (patients had better outcomes if they received the appropriate intervention compared with those who did not).
A secondary outcome, applicable to a subset of decisions, was based on the guideline recommendation that exercise ECG should be carried out among patients with intermediate pretest probability of coronary disease. We chose this outcome for a sensitivity analysis to test the possibility that physicians in the conventional guidelines arm might be more likely to change when judged by the guideline standard, based on the Duke score. No such effect was observed. Instead, we observed a small (15%) nonsignificant advantage in the patient-specific ratings arm. This lack of effect on the outcome based on the Duke score is not surprising, as there was only moderate agreement with appropriateness based on expert panel ratings. For example, one third of 18 vignettes rated appropriate for exercise ECG had low or high pretest probability based on the Duke score. This lack of agreement is not unexpected: different variables are considered (the Duke score does not take into account angina severity), and (crucially) the Duke score was developed in a tertiary care setting and is known to overestimate the probability of disease in primary care.22
We observed a marked effect of patient-specific ratings on physician test-ordering behavior among cardiologists and general practitioners, suggesting that the ratings were universally applicable, clinically credible, and unambiguous. Our findings are consistent with a previous trial on paper-based decision support in treatment decisions for back pain.23 In our study, the effect of patient-specific ratings was consistent across vignettes of women, older patients, and different racial/ethnic groups. Among those in whom investigation was deemed appropriate by the expert panels, women and patients from ethnic minorities were less likely to be recommended for testing before the intervention. These differences were attenuated and became nonsignificant following the patient-specific ratings intervention. Taken together, these findings lend further support to the superiority of patient-specific ratings over conventional guidelines in changing physician behavior and offer a potential means to address inequitable underuse of investigation.
Previous interventions that included guidelines, audit, and feedback have had modest effects on test ordering and have focused largely on the overuse of investigation.24,25 The limited effect of guidelines on testing behavior may reflect difficulties in applying recommendations made among broad groups of patients to individuals, as well as the complexity and sheer volume of information contained in the guidelines. While in our study patient-specific ratings were readily available for each vignette in the patient-specific ratings arm, guidelines (even summarized and electronically accessible) had to be screened for information by participants in the conventional guidelines arm. Also, the same patient may appear under different sections of a single guideline, with potentially conflicting recommendations.
The lack of improvement in the management of patients with angina or asthma found in a randomized comparison of computerized guidelines vs no decision support was attributed to the poor uptake of the guidelines.7 Despite attempts on our part to enhance the uptake of guidelines, including providing relevant sections and readily accessible full-text guidelines, and despite 86% of participants accessing the guidelines during their decision-making process, we likewise found no change in management with the use of guidelines.
The strengths of our multicenter study lie in a rigorous trial design, the large number of clinicians from primary and secondary care, and a well-designed intervention applicable to everyday clinical decision making. The main limitation of our trial is the use of patient vignettes. However, they offer the advantage of standardizing the patient characteristics across trial arms, largely removing the possibility that differences in patients might confound the trial findings. The validity of patient vignettes has been established in the evaluation of diagnostic accuracy, physical examination, and actual clinical practice in outpatient settings and primary care.12 In addition, vignettes have been successfully used to explore physician behavior in ordering investigation and procedures in real-life practice.26,27
The positive findings in the ARIA Trial strengthen the case for randomized trials in real-life patient decision making, as we cannot exclude the possibility that the effectiveness of the intervention is altered by patient, physician, or system features in actual practice.28 We also cannot exclude the possibility that the physicians who volunteered to participate in the trial were an unrepresentative group and were more likely to change their practice with the patient-specific ratings intervention than physicians who did not volunteer. However, this is unlikely because their characteristics are representative of all physicians in participating centers. The electronic health record, with its capacity for embedded decision support, offers a route to implementing patient-specific ratings in routine patient care. For exercise ECG, 55% of decisions agreed with the expert panels in the patient-specific ratings group. Strengthening the empirical evidence base, particularly with studies in unselected patients in primary care, may further improve the validity of patient-specific ratings and agreement. Our findings revealed an increase in the number of appropriate tests ordered with patient-specific ratings, which could address the underuse of revascularization,21 for which angiography is a prerequisite in patients with angina. For policy makers, patient-specific recommendations provide a benchmark against which to measure health care inequalities and quality of care.
Patient-specific ratings substantially changed physician testing behavior, but conventional guidelines did not. Despite the challenge of an increasingly complex patient population29 and the inherent limitations of any form of clinical guidance, these patient-specific ratings represent (to our knowledge) the most meticulously tested and systematic attempt to guide physician practice, and they offer a promising intervention for implementation in routine care.
Correspondence: Harry Hemingway, FRCP, Department of Epidemiology and Public Health, University College London Medical School, 1-19 Torrington Pl, London WC1E 6BT, England (email@example.com).
Author Contributions: Drs Junghans and Eldridge had full access to all of the data; Drs Junghans and Hemingway take responsibility for the integrity of the data and the accuracy of the data analysis for this study. Study concept and design: Junghans, Feder, Timmis, Eldridge, Sekhri, Black, Shekelle, and Hemingway. Acquisition of data: Junghans and Feder. Analysis and interpretation of data: Junghans, Feder, Eldridge, Black, and Shekelle. Drafting of the manuscript: Junghans, Feder, and Hemingway. Critical revision of the manuscript for important intellectual content: Junghans, Feder, Timmis, Eldridge, Sekhri, Black, Shekelle, and Hemingway. Statistical analysis: Junghans and Eldridge. Obtained funding: Junghans, Feder, Timmis, Eldridge, Black, Shekelle, and Hemingway. Administrative, technical, and material support: Eldridge and Sekhri. Study supervision: Feder, Black, and Hemingway.
Financial Disclosure: None reported.
Funding/Support: This study was supported in part by the NHS Service Delivery and Organisation Research and Development Programme and the Department of Health. Dr Hemingway is supported by a Public Health Career Scientist Award from the Department of Health.
Role of the Sponsors: The sponsors had no input in the design or conduct of the study; the collection, management, analysis, or interpretation of the data; or the preparation, review, or approval of the manuscript.
Acknowledgment: We acknowledge the late Sarah Cotter, PhD, who contributed to the statistical analysis of this study. We thank Michael Kimpton and Steve Hayes for developing the ARIA Trial Web site and Melvyn Jones, MD, for back-translating the vignettes into indications for validation.