Receiver operating characteristic curves for the expert cardiologists' estimates, the amount of horizontal ST depression, and the average of the 3 scores used for consensus. Straight line represents no discriminating value.
Lipinski M, Do D, Froelicher V, Osterberg L, Franklin B, West J, Atwood E. Comparison of Exercise Test Scores and Physician Estimation in Determining Disease Probability. Arch Intern Med. 2001;161(18):2239-2244. doi:10.1001/archinte.161.18.2239
Copyright 2001 American Medical Association. All Rights Reserved. Applicable FARS/DFARS Restrictions Apply to Government Use.
The recent American College of Cardiology/American Heart Association exercise testing guidelines provided equations to calculate treadmill scores and recommended their use to improve the predictive accuracy of the standard exercise test. However, if physicians can estimate the probability of coronary artery disease as well as the scores can, there would be no reason to add this complexity to test interpretation. To compare the exercise test scores with physician's estimation of disease probability, we used clinical, exercise test, and coronary angiographic data to compute the recommended scores and print patient summaries and treadmill reports.
To determine whether exercise test scores can be as effective as expert cardiologists in diagnosing coronary disease.
Five hundred ninety-nine consecutive male patients without previous myocardial infarction with a mean ± SD age of 59 ± 11 years were considered for this analysis. With angiographic disease defined as any coronary lumen occlusion of 50% or more, 58% had disease. The clinical/treadmill test reports were sent to expert cardiologists and to 2 other groups, including randomly selected cardiologists and internists, who classified the patients as having high, low, or intermediate probability of disease and estimated a numerical probability from 0% to 100%.
Forty-five expert cardiologists returned estimates on 336 patients, 37 randomly chosen practicing cardiologists returned estimates on 129 patients, 29 randomly chosen practicing internists returned estimates on 106 patients, 13 academic cardiologists returned estimates on 102 patients, and 27 academic internists returned estimates on 174 patients. When probability estimates were compared, the scores were superior to all physician groups (0.76 area under the receiver operating characteristic curve to 0.70 for experts [P = .046], 0.73 to 0.58 for cardiologists [P = .003], and 0.76 to 0.61 for internists [P = .006]). Using a probability cut point of greater than 70% for abnormal, predictive accuracy was 69% for scores compared with 64% for experts, 63% to 62% for cardiologists, and 70% to 57% for internists.
Although most similar to the disease estimates of the presence of clinically significant angiographic coronary artery disease provided by the expert cardiologists, the scores outperformed the nonexpert physicians.
CLINICAL and exercise test scores can enhance the decision-making process regarding whether a patient with symptoms possibly due to coronary artery disease should undergo coronary angiography.1,2 The scores help make it less likely that individuals without coronary artery disease will be referred for unnecessary cardiac catheterization. By providing a second opinion, scores also help nonspecialists make more appropriate decisions regarding referral to a specialist. Thus, exercise test scores can provide a means to see that expensive technology is properly used and that access to specialized therapy is ensured.
Besides providing increased predictive accuracy, scores eliminate physician bias and lessen the variability of decision making.3,4 Physicians do not always follow a totally rational decision-making process, but often make clinical decisions based on personal experience and heuristics.5 By eliminating the intuitive aspect of decision making, scores can provide an unbiased evaluation. Although the value of exercise testing scores to help physicians has been documented,6 many physicians remain skeptical of the accuracy of scores and prefer to rely on the results of more expensive tests in making their decisions. This skepticism remains despite data demonstrating that scores improve the diagnostic characteristics of exercise tests and predict the presence of coronary disease as well as or better than echocardiographic or nuclear tests.7 To resolve this skepticism, we performed an analysis to compare the diagnostic accuracy of exercise scores with that of cardiologists and generalists.
Patients were selected from a database of the last 2000 consecutive male patients who underwent clinical evaluation, exercise testing, and coronary angiography at the Long Beach and Palo Alto Veterans Affairs medical centers in Long Beach and Palo Alto, Calif, respectively. Patients with previous cardiac surgery or interventions, valvular heart disease, left bundle branch block, more than 1 mm depression, or Wolff-Parkinson-White syndrome on their resting electrocardiogram were excluded from the study. Previous cardiac surgery was the predominant reason for exclusion of patients. We then selected all patients who were referred to evaluate chest pain possibly due to coronary disease and who had complete data and coronary angiography within 4 months of the exercise treadmill test. As is the case for clinical observational studies such as this, there was no attempt to remove workup bias. To avoid falsely increasing the accuracy of the exercise treadmill test, we excluded patients with a previous myocardial infarction by history or diagnostic Q wave, leaving a target population of 599 patients.
A thorough clinical history, including medications and risk factors, was recorded prospectively on computerized forms at the time of exercise treadmill testing by physicians who also examined the patients.8,9
Patients underwent symptom-limited treadmill testing with either the US Air Force School of Aerospace Medicine10 or an individualized ramp treadmill protocol.11 Before ramp testing, the patients were given a questionnaire to estimate their exercise capacity. This allowed most patients to reach maximal exercise at approximately 10 minutes.12 Visual ST-segment depression was measured at the J junction and corrected for pre-exercise ST-segment depression while standing; ST slope was measured during the following 60 milliseconds and classified as up-sloping, horizontal, or down-sloping. Slope was coded as 1 for horizontal, 2 for down-sloping, and 0 for normal slope (ie, up-sloping or ST-segment depression of less than 0.5 mm). The ST response considered was the most horizontal or down-sloping ST-segment depression in any lead except aVR, during exercise or recovery. An abnormal response was defined as 1 mm or more of horizontal or down-sloping ST-segment depression.
No test result was classified as indeterminate,13 medications were not withheld, and a maximal heart rate target was not used as an end point.14 The exercise tests were performed, analyzed, and reported per standard protocol with a computerized database (EXTRA; Mosby Publishers, Chicago, Ill).15 Decisions for cardiac catheterization were consistent with clinical practice.
Coronary artery narrowing was visually estimated and expressed as percentage of lumen diameter stenosis. Patients with 50% or greater narrowing in 1 or more of the following were considered to have significant angiographic coronary artery disease: the left anterior descending, left circumflex, or right coronary arteries or their major branches or the left main coronary artery. The 50% criterion was chosen to be consistent with the cooperative trialists' choice.16
Reports of the patient information and treadmill test were then generated from the database. The results of the coronary angiography were excluded from the data sheet to blind the physician interpreter. The patient data sheet provided the information traditionally used by physicians to assess whether a patient presenting with possible coronary artery disease should undergo coronary angiography.
The studies were randomly divided into 78 groups of 12 studies. Each reviewer was sent the data sheets, a return envelope, and a cover letter, which explained the goals of the experiment and guidelines on assigning a patient to the high, intermediate, or low probability group for any coronary disease. We selected 110 cardiologists who were considered experts on the basis of their authorship of exercise testing or angiographic studies. The experts were sent 12 studies each. A 40% response rate resulted in a total of 336 studies filled out by 45 expert cardiologists.
A similar approach was taken with random cardiologists. The random cardiologists were nonacademic practicing cardiologists selected at random from a current membership directory of the American College of Cardiology. The cardiologists were selected as random cardiologists if they were not associated with a university or hospital and were not fellows in training, to distinguish them from the expert cardiologists. To increase the rate of participation in the study, only 6 data sheets of the group of 12 were sent to each random cardiologist. Approximately 400 random cardiologists were sent a packet of studies; 37 cardiologists responded, for a return rate of approximately 10% with 129 studies returned.
A group of random internists was also included for comparison. They were nonacademic practicing internists selected from the 1997 and 1998 official American Board of Medical Specialties directory of board-certified medical specialists. Those associated with a university or hospital were excluded. The randomly selected internists were then sent the same group of studies that were sent to the random cardiologists. Approximately 400 random internists were sent a packet of studies; 29 internists responded, for a return rate of 8% with 106 studies returned.
The final 2 groups of physicians were the local academic cardiologists and the local academic internists. We recruited colleagues at the Palo Alto Veterans Affairs Health Care Center, Palo Alto, Calif, and William Beaumont Hospital, Royal Oak, Mich, to participate and asked them to complete 12 studies each. Thirteen cardiologists returned data sheets on 102 patients, and 27 internists returned data sheets on 174 patients. The response rates were 40% and 75%, respectively.
Physicians were asked to classify the patient as having a high probability, intermediate probability, or low probability of having clinically significant coronary disease. The physicians were requested to make this evaluation based on the following criteria for stratification: low probability: patient is reassured that symptoms are most likely not due to coronary disease; intermediate probability: other tests, even possibly angiography, indicated to clarify diagnosis and antianginal medications tried; and high probability: antianginal treatment indicated and angiography may be required if severe disease is likely and an intervention is clinically warranted. Physicians were also requested to provide a numerical percentage as their estimate of the probability that the patient had clinically significant coronary disease.
The clinical and exercise test data were input into the following equation to generate 3 probability estimates: $\text{Probability} = 1 / [1 + e^{-(a + bx + cy + \ldots)}]$, where e is the base of the natural logarithm, a is the intercept, b and c are β coefficients, and x and y are variable values.
The appropriate coefficients and variables were from the 3 equations included in the American College of Cardiology/American Heart Association exercise testing guidelines.1 The variables included age, symptoms, risk factors, and exercise test responses, which were derived from the Veterans Affairs (VA) Score, the Simplified VA Score, the Detrano Score, or the Morise Score.
The pre-exercise test equation,17 including the chosen variables and their coefficients and the constant, is as follows: −2.1 + (0.03 × Age) − (0.4 × Symptoms) + (0.8 × Diabetes) + (0.4 × Hypercholesterolemia) + (0.01 × Pack-years) + (0.7 × Resting ST Depression in Millimeters).
The postexercise test equation, including the chosen variables, their coefficients, and the constant, is as follows: −1.2 + (3.3 × Pretest) + (0.5 × Exercise ST Depression in Millimeters) + (0.6 × ST Slope) − (0.16 × Metabolic Equivalents) − (0.5 × Exercise Angina), where pretest is a number between 0 and 1 generated by the pretest equation.
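As an illustration only, the pre-exercise and postexercise equations above can be chained through the logistic transformation as sketched below. The function and parameter names are hypothetical. The text here does not state how Symptoms or Exercise Angina are coded for these equations, so the caller must supply values in whatever coding the original score uses; ST Slope is assumed to follow the 0/1/2 coding described for the exercise test.

```python
import math


def logistic(z):
    """Convert a linear predictor (intercept plus weighted variables) to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def va_pretest_probability(age, symptoms, diabetes, hypercholesterolemia,
                           pack_years, resting_st_mm):
    """Pre-exercise probability from the coefficients quoted in the text.

    The coding of `symptoms` is NOT given in this section; pass the value
    in the coding of the original score (an assumption left to the caller).
    """
    z = (-2.1
         + 0.03 * age
         - 0.4 * symptoms
         + 0.8 * diabetes
         + 0.4 * hypercholesterolemia
         + 0.01 * pack_years
         + 0.7 * resting_st_mm)
    return logistic(z)


def va_posttest_probability(pretest, exercise_st_mm, st_slope, mets, exercise_angina):
    """Postexercise probability; `pretest` is the 0-to-1 output of the pretest equation.

    `st_slope` is assumed to be 0 (normal), 1 (horizontal), or 2 (down-sloping).
    The coding of `exercise_angina` is not stated here and is left to the caller.
    """
    z = (-1.2
         + 3.3 * pretest
         + 0.5 * exercise_st_mm
         + 0.6 * st_slope
         - 0.16 * mets
         - 0.5 * exercise_angina)
    return logistic(z)
```

The pretest function returns a probability rather than the raw linear predictor because the postexercise equation takes a number between 0 and 1 as its Pretest term.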
The simplified VA score18 is as follows: (6 × Maximal Heart Rate) + (5 × ST Depression Code) + (4 × Age Code) + (Angina Pectoris Code) + (Hypercholesterolemia) + (Diabetes) + (Treadmill Angina Index).
Detrano et al19 included 3549 patients from 8 institutions in the United States and Europe who underwent exercise testing and angiography between 1978 and 1989. Disease was defined as greater than 50% diameter narrowing in at least 1 major coronary arterial branch, and the prevalence of disease according to this criterion was 64%. The selected Detrano equation components are as follows: 1.9 + (0.025 × Age) − (0.6 × Sex) − (0.1 × Symptoms) − (0.05 × Metabolic Equivalents) − (0.02 × Maximal Heart Rate) + (0.36 × Exercise-Induced Angina) + (0.6 × ST Depression in Millimeters).
Sex was coded as 1 for female and −1 for male. Symptoms were classified into the 4 categories of typical, atypical, nonanginal pain, and no pain and coded with the values 1, 2, 3, and 4, respectively. Exercise angina was coded as 1 for presence and −1 for absence.
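A minimal sketch of the Detrano components with the codings just described follows. It assumes the linear sum is passed through the same logistic equation used for the other scores; the function names and the symptom labels used as dictionary keys are hypothetical.

```python
import math

# Symptom categories coded as stated in the text: typical = 1 ... no pain = 4.
DETRANO_SYMPTOM_CODE = {"typical": 1, "atypical": 2, "nonanginal": 3, "none": 4}


def detrano_linear_predictor(age, is_female, symptoms, mets, max_heart_rate,
                             exercise_angina, st_depression_mm):
    """Weighted sum of the Detrano components quoted in the text."""
    sex = 1 if is_female else -1            # 1 for female, -1 for male
    angina = 1 if exercise_angina else -1   # 1 for presence, -1 for absence
    return (1.9
            + 0.025 * age
            - 0.6 * sex
            - 0.1 * DETRANO_SYMPTOM_CODE[symptoms]
            - 0.05 * mets
            - 0.02 * max_heart_rate
            + 0.36 * angina
            + 0.6 * st_depression_mm)


def detrano_probability(*args, **kwargs):
    """Apply the logistic transformation (assumed) to the linear predictor."""
    z = detrano_linear_predictor(*args, **kwargs)
    return 1.0 / (1.0 + math.exp(-z))
```

For example, `detrano_probability(60, False, "typical", 8.0, 140, True, 1.0)` evaluates the score for a 60-year-old man with typical angina, 8 metabolic equivalents, a maximal heart rate of 140/min, exercise-induced angina, and 1 mm of ST depression.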
Morise et al20 studied 915 consecutive patients without a history of previous myocardial infarction or coronary artery bypass surgery who were referred to the exercise laboratory at West Virginia University Medical Center, Morgantown, between June 1981 and December 1994 for evaluation of coronary disease. All patients had coronary angiography within 3 months of the exercise test. The patients were classified as having disease if there was at least a 50% lumen diameter narrowing in 1 or more vessels, and using this criterion, the prevalence of disease in their population was 41%. Morise et al generated both pre-exercise and postexercise logistic regression equations. The Morise pre-exercise test intercept and variables are as follows: −3.6 + (0.08 × Age) − (1.3 × Sex) + (0.6 × Symptoms) + (0.7 × Diabetes) + (0.3 × Smoking) − (1.5 × Body Surface Area) + (0.50 × Estrogen) + (0.3 × Number of Risk Factors) − (0.40 × Resting Electrocardiogram).
Sex was coded as 1 for female and 0 for male. Symptoms were classified into the 4 categories of typical, atypical, nonanginal pain, and no pain and coded with the values 4, 3, 2, and 1, respectively. Diabetes was coded as 1 if present and 0 if absent. Smoking was coded as 2 for current smoking, 1 for any previous smoking, and 0 for never smoked. Estrogen was coded as 0 for men and, for women, 1 for estrogen-negative (postmenopausal and no estrogen) and −1 for estrogen-positive (premenopausal or taking estrogen). Risk factors included history of hypertension, hypercholesterolemia, and obesity (body mass index [calculated as weight in kilograms divided by the square of height in meters], ≥27). Resting electrocardiogram was coded as 0 if normal and 1 if there were QRS or ST-T wave abnormalities.
The Morise posttest equation is as follows: −0.12 + (4.5 × Pretest) + (0.37 × ST Depression in Millimeters) + (1.0 × ST Slope) − (0.4 × Negative ST) − (0.016 × Maximal Heart Rate).
Pretest is the pretest probability (0 to 1) derived from the pretest equation. ST depression in millimeters was coded as 0 for women. ST slope was coded as 1 for down-sloping and 0 for up-sloping or horizontal. Negative ST was coded as 1 if ST depression was less than 1 mm of depression horizontal or down-sloping or if ST depression was less than 1.5 mm of up-sloping.
Our group previously validated a means to make predictive equations more portable and self-calibrating by requiring a consensus for patient classification as to risk of coronary disease.18-20 These studies used 2 thresholds of each of the computer-generated probability scores to separate the population into 3 groups: low probability (prevalence of coronary disease, <5%), intermediate probability (prevalence of coronary disease, 5%-70%), and high probability (prevalence of coronary disease, >70%). Patients were classified as having low or high probability if at least 2 of the 3 equations agreed, ie, there was a consensus by majority. This approach avoids difficulties due to differences in variable collection, test method, missing data, and disease prevalence.
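The consensus-by-majority rule can be sketched as follows. The per-score thresholds are not reported in this section, so the values in the usage line are purely illustrative placeholders, and the function names are hypothetical.

```python
def stratify(probability, low_threshold, high_threshold):
    """Place one score's probability into low, intermediate, or high.

    The two thresholds are specific to each score; their actual values
    are not given in the text and must be supplied by the caller.
    """
    if probability < low_threshold:
        return "low"
    if probability > high_threshold:
        return "high"
    return "intermediate"


def consensus(categories):
    """Return low or high only if at least 2 of the 3 scores agree on it."""
    for label in ("low", "high"):
        if categories.count(label) >= 2:
            return label
    return "intermediate"


def consensus_from_scores(probabilities, thresholds):
    """probabilities: 3 score outputs; thresholds: 3 (low, high) pairs, one per score."""
    categories = [stratify(p, lo, hi) for p, (lo, hi) in zip(probabilities, thresholds)]
    return consensus(categories)
```

For instance, `consensus_from_scores([0.85, 0.90, 0.40], [(0.2, 0.8)] * 3)` uses placeholder thresholds of 0.2 and 0.8 for all 3 scores; 2 scores fall above the upper threshold, so the patient is classified as high probability.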
The numerical probability must have a cut point to separate normal and abnormal results to calculate test diagnostic characteristics. A 70% probability was established as the cut point for the physician estimates, with predictions greater than or equal to 70% indicating that the patient has coronary disease and predictions less than 70% indicating that the patient does not have coronary disease.
The probability predictions for the 5 groups of physicians were entered into the database and compared against the angiographic results of the patients. Using the 70% cut point, the diagnostic accuracy of predicting angiographic coronary disease was established for each of the 5 groups of physicians. The predictive accuracy of a group of physicians is determined by adding the number of true-positive results and true-negative results and then dividing the sum by the total number of patients. A true-positive result in the probability percentage section results from the physician giving a probability greater than or equal to 70% for a patient with coronary disease. A true-negative result is when the physician gives a probability less than 70% for a patient without coronary disease. A false-positive result is when the physician gives a probability greater than or equal to 70% for a patient without coronary disease. A false-negative result is when the physician gives a probability less than 70% for a patient with coronary disease. The same was done with the average of the probability estimates generated by the 3 equations. The probabilities computed from the scores and their averages were also used to construct receiver operating characteristic (ROC) curves for comparison of their diagnostic (discriminatory) characteristics. P values were calculated from the standard errors generated for each area under the ROC curve.
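The accuracy and ROC calculations described above can be illustrated with a short sketch. The area under the curve is computed here as the proportion of diseased/nondiseased pairs ranked correctly, and the standard error uses the Hanley-McNeil approximation; the text does not say which method was used for the standard errors, so that choice is an assumption, as are all function names.

```python
import math


def confusion_counts(probabilities, has_disease, cut_point=0.70):
    """Count true/false positives and negatives; >= cut_point is called abnormal."""
    tp = fp = tn = fn = 0
    for p, diseased in zip(probabilities, has_disease):
        if p >= cut_point:
            if diseased:
                tp += 1
            else:
                fp += 1
        elif diseased:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def predictive_accuracy(probabilities, has_disease, cut_point=0.70):
    """(true positives + true negatives) / total number of patients."""
    tp, fp, tn, fn = confusion_counts(probabilities, has_disease, cut_point)
    return (tp + tn) / (tp + fp + tn + fn)


def roc_auc(probabilities, has_disease):
    """Area under the ROC curve via pairwise comparison (ties count one half)."""
    pos = [p for p, d in zip(probabilities, has_disease) if d]
    neg = [p for p, d in zip(probabilities, has_disease) if not d]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def auc_standard_error(auc, n_pos, n_neg):
    """Hanley-McNeil approximation (assumed; not named in the text)."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc * auc / (1.0 + auc)
    variance = (auc * (1.0 - auc)
                + (n_pos - 1) * (q1 - auc * auc)
                + (n_neg - 1) * (q2 - auc * auc)) / (n_pos * n_neg)
    return math.sqrt(variance)
```

Because each physician group and the scores were compared on the same patients, a formal comparison of two areas would also need to account for the correlation between them; the sketch stops at the standard error of a single area.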
Table 1 describes our patient population (N = 599). No significant differences were found in the 5 different physician data sheet samples.
Table 2 shows the predictive accuracy using the 70% probability cut point for the 5 groups of physicians compared with the average of the 3 scores in the same patients reviewed by each physician group.
Consensus provided a significantly higher predictive accuracy than did the physicians (69% vs 63%, P<.01). These data, however, do not allow comparison among the different groups, because each group of physicians provided probability percentages for different patients.
The next area of concern is the determination of the probability stratification of each patient. Per the definition provided to the reviewers, this demonstrates the accuracy with which a physician determines whether a patient should undergo further testing. The categories in which a patient could have been placed are high probability, intermediate probability, and low probability. Because a patient classified in the intermediate-probability group will undergo additional diagnostic tests to determine the presence of coronary disease, it is difficult to draw any conclusions from the data for patients in this category. However, high-probability and low-probability categories provide critical data. Patients who are in the low-probability group probably do not have cardiac catheterization or restriction of their activities. If the patient has coronary disease and was in the low-probability group, an incorrect assessment of the likelihood of disease may result in a cardiac event that could have been avoided. Also, if a physician considers a patient without coronary disease to have high probability of disease, that patient may undergo needless costly procedures that are not without risk. Table 3 provides a comparison among the 5 groups of physicians and scores in the number of patients with coronary disease who were considered to have low probability of disease divided by the total number of patients with coronary disease. The results show that scores performed better than did the physicians, with a sensitivity of 90% or more. A similar table was constructed for patients who did not have coronary disease but who were considered to have high probability of disease, and the scores outperformed the physicians, with a specificity of approximately 90%. In addition, the scores performed better than did all 5 groups of physicians by putting a higher percentage of patients with coronary disease into the high-probability category.
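The two proportions described above, diseased patients placed in the low-probability group and nondiseased patients placed in the high-probability group, can be computed as in the following sketch. The function name and category labels are hypothetical, and reading the reported sensitivity and specificity as 1 minus these proportions is our interpretation of the text.

```python
def misclassification_rates(categories, has_disease):
    """Return (missed, overcalled) for a list of low/intermediate/high labels.

    missed: diseased patients labeled low / all diseased patients.
    overcalled: nondiseased patients labeled high / all nondiseased patients.
    """
    diseased = [c for c, d in zip(categories, has_disease) if d]
    healthy = [c for c, d in zip(categories, has_disease) if not d]
    missed = diseased.count("low") / len(diseased)
    overcalled = healthy.count("high") / len(healthy)
    return missed, overcalled
```

Under this reading, a missed proportion of 0.10 or less corresponds to the sensitivity of 90% or more quoted for the scores.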
The ROC curves were plotted for the physician groups, the amount of horizontal ST depression, and the average of the 3 scores used for consensus (Figure 1). The area under the curve of the scores was significantly greater than that for the other methods for diagnosis. This can also be seen in Table 4.
Although scores based on exercise testing data have been advocated for years, only 3 previous studies have compared them with physician estimates of disease. Detrano et al21 performed one of the first such studies. They derived a score for estimating probabilities of significant and severe coronary disease, then they validated and compared it with the assessments of cardiologists. The score performed at least as well as did the physicians when the latter knew the identity of the patients. The clinicians were more accurate when they did not know the identity of the subjects but worked from tabulated objective data. Detrano and colleagues concluded that the application of scores or consultation with cardiologists not directly involved with patient management might result in more rational assessments and decision making. Hlatky et al22 validated 2 scores by comparing their diagnostic accuracy with that of cardiologists. Ninety-one cardiologists participated in the study; each evaluated the clinical summaries of 8 randomly selected patients who had complete evaluations, including coronary angiography. The scores outperformed these cardiologists. A third study23 considered scores for prognosis (rather than diagnosis), with clinical summaries for 100 patients sent to 5 senior cardiologists at 1 center. Again, the scores outperformed these cardiologists. Our study was larger and included different groups of physicians, validating the results of these earlier studies that showed that scores can predict angiographic results as well as physicians can.
We are not advocating that these scores replace physician judgment. The scores can be thought of as providing a readily available second opinion or consultation. The scores can also reassure nonspecialists that their decision to either manage the treatment of a patient themselves or refer that patient to a cardiologist is consistent with medical knowledge. Certainly, if a patient's symptoms recur or are not manageable, reassessment by the physician will lead to referral. In addition, by using the scores to objectively stratify probability, a management strategy becomes available that is more practical than the traditional interpretation of abnormal or normal exercise test results. Because the ability of scores based on any test to detect coronary disease remains imperfect, physicians will always be paramount in the decision process. However, decision making has been greatly improved in a wide range of endeavors through the use of scores.2
Although numerous studies have shown that scores enhance the discriminatory power of the standard, inexpensive, and widely available exercise test, scores have not been widely applied. There has always been a tendency to replace the old with the new in medicine despite evidence of only marginal improvement.24 Although imaging has certain advantages in patients with certain electrocardiogram abnormalities and for localization of ischemia, it is better applied in patients with chest pain when these advantages are the issues at hand. Certainly, most initial testing is best accomplished with the less expensive technology. We hope that, now that we have validated scores by comparing them with the interpretive abilities of a large number of expert cardiologists, physicians and other health care practitioners have the evidence to apply the strategy we have suggested. Although not designed as a cost-efficacy analysis, simple mathematical modeling allows demonstration of considerable savings. Also, our approach follows current health care mandates to empower nonspecialists yet ensure access to specialty care.
Legal liability and related issues are possible reasons that nonspecialists may refer patients to cardiologists, who may then recommend catheterization for those patients who have a low to intermediate probability of coronary disease. The decisions generated by the scores are not influenced by such liability or recent personal experiences with a rare bad outcome. Since the scores are objective and cited in the guidelines, they can shield physicians from such concerns.
Limitations to our study include its retrospective design, workup bias, lack of outcomes data, lack of women, and no formal cost-efficacy component. Another consideration is that the physicians did not interview or examine the patients themselves. However, we hope that our findings will encourage investigators to perform studies that overcome these limitations. At first, the low response rates seem to be a limitation, but if anything, they favor the physician groups, since those physicians with experience and confidence in their ability to diagnose coronary disease would be more likely to respond. Although the consensus of scores we applied could be considered complicated, a single score, such as our simple score,18 validated in a clinical setting, would be just as effective.
In our study, scores did as well as or better than physicians in estimating the probability of clinically significant angiographic coronary artery disease.
Accepted for publication January 18, 2001.
Presented as an abstract at the Scientific Sessions of the American Heart Association, Atlanta, Ga, November 8, 1999.
Corresponding author and reprints: Victor Froelicher, MD, Cardiology Division (111C), Veterans Affairs Palo Alto Health Care System, 3801 Miranda Ave, Palo Alto, CA 94304 (e-mail: firstname.lastname@example.org and Web site http://www.cardiology.org)