Kiefe CI, Allison JJ, Williams OD, Person SD, Weaver MT, Weissman NW. Improving Quality Improvement Using Achievable Benchmarks For Physician Feedback: A Randomized Controlled Trial. JAMA. 2001;285(22):2871-2879. doi:10.1001/jama.285.22.2871
Author Affiliations: Division of Preventive Medicine (Drs Kiefe, Williams, and Person), Center for Outcomes and Effectiveness Research and Education (Drs Kiefe, Allison, Williams, Person, Weaver, and Weissman), Division of General Internal Medicine (Dr Allison), School of Nursing (Dr Weaver), and Department of Health Services Administration (Dr Weissman), University of Alabama at Birmingham.
Context Performance feedback and benchmarking, common tools for health care
improvement, are rarely studied in randomized trials. Achievable Benchmarks
of Care (ABCs) are standards of excellence attained by top performers in a
peer group and are easily and reproducibly calculated from existing performance data.
Objective To evaluate the effectiveness of using achievable benchmarks to enhance
typical physician performance feedback and improve care.
Design Group-randomized controlled trial conducted in December 1996, with follow-up through 1998.
Setting and Participants Seventy community physicians and 2978 fee-for-service Medicare patients
with diabetes mellitus who were part of the Ambulatory Care Quality Improvement
Project in Alabama.
Intervention Physicians were randomly assigned to receive a multimodal improvement
intervention, including chart review and physician-specific feedback (comparison
group; n = 35) or an identical intervention plus achievable benchmark feedback
(experimental group; n = 35).
Main Outcome Measure Preintervention (1994-1995) to postintervention (1997-1998) changes
in the proportion of patients receiving influenza vaccination; foot examination;
and each of 3 blood tests measuring glucose control, cholesterol level, and
triglyceride level, compared between the 2 groups.
Results The proportion of patients who received influenza vaccine improved from
40% to 58% in the experimental group (P<.001)
vs from 40% to 46% in the comparison group (P = .02).
Odds ratios (ORs) for patients of achievable benchmark physicians vs comparison
physicians who received appropriate care after the intervention, adjusted
for preintervention care and nesting of patients within physicians, were 1.57
(95% confidence interval [CI], 1.26-1.96) for influenza vaccination, 1.33
(95% CI, 1.05-1.69) for foot examination, and 1.33 (95% CI, 1.04-1.69) for
long-term glucose control measurement. For serum cholesterol and triglycerides,
the achievable benchmark effect was statistically significant only after additional
adjustment for physician characteristics (OR, 1.40 [95% CI, 1.08-1.82] and
OR, 1.40 [95% CI, 1.09-1.79], respectively).
Conclusion Use of achievable benchmarks significantly enhances the effectiveness
of physician performance feedback in the setting of a multimodal quality improvement intervention.
Gaps between medical care as actually practiced and the recommendations
derived from evidence-based research are large and widespread.1-3
Because more complete use of these recommendations should result in the prevention
of considerable morbidity and mortality,4,5
research on methods to bridge these gaps is important. Quality improvement
approaches such as medical record audit and feedback, opinion leaders, academic
detailing, chart-based reminders, and computerized decision support have been
As explained recently by Samsa and Matchar,18
testing the general continuous quality improvement (CQI) approach to health
care in randomized controlled trials (RCTs) is rare and, perhaps of necessity,
inconclusive.19 Testing specific interventions
deriving from a CQI approach in RCTs is more common, but still not abundant.18 These RCTs represent efforts to examine improvement
activities with the same rigorous standards of evidence as those becoming
increasingly accepted in the practice of evidence-based medicine.20 Our study is an RCT that tests the addition of a
new tool, the Achievable Benchmarks of Care (ABC), to the "toolbox" for CQI.21-23
Audit and feedback methods, in which clinicians receive reports of their
performance and usually are compared to the mean performance of a peer group,
have been used and studied extensively, but few of these studies have been
RCTs.24,25 One underlying theory
holds that viewing personal performance within the context of peer performance
is a powerful motivator for change.26-28
However, researchers have reached few firm conclusions on the benefits of
such an approach.29 In general, only modest
benefits have been described and long-term sustainability has not been demonstrated.
Seeking a method to increase the effectiveness of using performance
feedback to clinicians, we developed the achievable benchmark method,30-32 which is calculated
from the performance of all members of a peer group and represents a realistic
standard of excellence attained by the top performers in that group. The achievable
benchmark method has desirable statistical characteristics and has been well
received by physicians.33-35
Our next step, after developing the method and confirming its face validity,
was to evaluate its effectiveness in improving care with the most rigorous
We enlisted the population of Alabama physicians participating in the
Ambulatory Care Quality Improvement Project (ACQIP) to test the effectiveness
of achievable benchmarks in provider feedback. ACQIP was designed by the Health
Care Financing Administration (HCFA) to improve quality of care for ambulatory
Medicare patients with diabetes mellitus and conducted by peer review organizations
(PROs) in Alabama, Iowa, and Maryland. To improve practice patterns, clinicians
received multimodal interventions, including feedback of baseline performance
data on quality measures. In this context, we performed an RCT to test the
hypothesis that achievable benchmark–enhanced feedback would result
in more improvements in care than the "usual" feedback that is part of the
multimodal interventions used nationwide in HCFA quality improvement projects.26
This group-randomized trial was conducted within the Alabama ACQIP,
a HCFA-sponsored demonstration project designed to improve outpatient care
of fee-for-service Medicare beneficiaries with diabetes mellitus.36 We first describe the ACQIP design and then illustrate
how we superimposed the achievable benchmark experiment on the ACQIP design
with 70 ACQIP physicians.
Physicians in ACQIP were given performance feedback based on several
quality measures. After feedback, the Alabama Quality Assurance Foundation
(AQAF) and the Alabama PRO partnered with physicians to develop and implement
quality improvement projects targeting the ACQIP performance measures. The
baseline data collection period reflected performance of participating physicians
from January 1, 1994, through June 30, 1995.37,38
After a structured and scheduled sequence of improvement efforts during 1996,
follow-up data on the performance of the same physicians from January 1, 1997,
through June 30, 1998, were collected.36 HCFA
has now implemented a national program of diabetes care improvement as part
of its sixth scope of work, and PROs are continuing to use methods similar
to those of ACQIP.39
Physician Selection. HCFA planned to recruit 100 physicians per state in 1995 from Alabama,
Iowa, and Maryland. All physicians practicing family medicine, internal medicine,
or endocrinology were identified from Medicare claims data. To be eligible,
each was required to have a minimum of 25 eligible diabetic patients enrolled
in a fee-for-service Medicare plan. From 561 eligible Alabama physicians,
HCFA generated a random sample that was used by AQAF to recruit for the ACQIP
study. Physicians were invited to participate and when 100 had accepted, enrollment
Of the 97 initial Alabama ACQIP participating physicians enrolled in
1995, 70 completed the study in 1998. The 27 physicians lost to follow-up
were practicing in a different environment, had retired, or were deceased.
Patient Selection. Eligible patients were identified from Medicare outpatient (Part B)
files based upon a billing diagnosis of diabetes mellitus (International Classification of Diseases, Ninth Revision: codes 250.00-250.9x).
Eligible patients were 65 years or older, had no end-stage renal disease,
had a residence other than a skilled nursing facility, and were alive at baseline.
AQAF then assigned each patient to a primary care physician (family practice,
general practice, internal medicine, osteopathy) or endocrinologist based
on the number of office visits and the number of billable Medicare services
For both baseline and follow-up assessment, we randomly selected and
reviewed an average of 20 patient medical records for each physician. To ensure
independence of baseline and follow-up observations, we planned to exclude
patients from follow-up who had had their records reviewed at baseline.36
Quality Measures. Through ACQIP and related projects, HCFA led the development of multiple
quality measures for ambulatory diabetic patients with contributions from
the American Diabetes Association (ADA), the National Committee for Quality
Assurance, the American Academy of Family Physicians, the American College
of Physicians, and the Veterans Health Administration.40,41
The indicators were designed to assess processes of care for quality improvement
and were not intended to serve as standards of care. All indicators were dichotomous
variables thought to be amenable to simple quality improvement measures. In
general, the quality indicators allow for a longer time frame to administer
the clinical intervention (vaccine, aspect of physical examination, or laboratory
test) than suggested by the ADA guidelines.42
This leniency means that performance for these indicators should be better
than for the ADA guidelines because decreasing the time frame for the clinical
intervention would probably decrease average indicator performance.
We ascertained from the medical record whether the following appropriate
care was performed for each eligible patient at least once during the 18-month
period: (1) measurement of long-term glucose control as reflected by at least
1 test of glycosylated hemoglobin (hemoglobin A1c) or fructosamine,
(2) measurement of serum cholesterol, (3) measurement of serum triglycerides,
(4) measurement of serum creatinine, (5) performance of in-office foot examination,
and (6) administration of influenza vaccine. Performance on an indicator was
quantified by dividing the number of eligible patients who received the item
by the total number of eligible patients.
During the study period, the guideline-recommended method for periodically
assessing renal function in the absence of previously diagnosed proteinuria
changed to the measurement of microalbuminuria.43
Because we did not have quantitative assessment of microalbuminuria at baseline,
it was not incorporated into the follow-up medical record abstraction. On
the other hand, because periodic serum creatinine measurement is no longer
recommended, we do not emphasize assessment of diabetic nephropathy screening
but focus on the other 5 indicators.
Chart Review. All data for the quality measures were obtained from chart review according
to methods previously described.44 Charts were
photocopied and abstracted centrally using MedQuest, publicly available software
developed for HCFA (http://www.hcfa.gov). The ACQIP investigators
developed a standardized chart review protocol and refined the protocol through
pilot testing. As part of the protocol, abstractors underwent intense training
with competency certification. The MedQuest chart review module contained
standard lists for variable synonyms, medications, diagnoses, and procedures.
Throughout the chart abstraction period, 5% of charts were randomly sampled
for dual abstraction and physicians evaluated chart abstractions for validity.
Validity and reliability of all key variables were at least 95%.
ACQIP Intervention. All ACQIP physicians participated in an intensive quality improvement
program in which they were informed of their individual performance on the
ACQIP indicators as well as of the mean performance of their peers (other
participating Alabama physicians). Each physician received this information
in mailings approximately 3 to 6 weeks apart during 1996, according to a schedule
developed by HCFA and AQAF.36 With assistance
from AQAF, physician offices developed quality improvement plans (QIPs), currently
on file at AQAF. The extensive and multimodal QIPs included formalized group
meetings, root cause analysis, and changes of care at the office level, such
as posting of patient educational material, use of chart interventions in
the practice environment, reminders, clinical "flow sheets," and standing
orders for appropriate administration of influenza vaccination. The QIPs were
developed and documented according to a standardized and reproducible template.
We superimposed a group-randomized trial on the basic ACQIP design (Figure 1). In December 1996 we randomized
the 97 Alabama ACQIP physicians to either the comparison or experimental achievable
benchmark group. Of these 97 physicians, 27 were lost to follow-up. All ACQIP
physicians not lost to follow-up agreed to participate in the achievable benchmark
experiment, which consisted of adding, to the standard ACQIP intervention
described above, an achievable benchmark for each indicator in the final report
that was mailed to the achievable benchmark physicians, but not in the final
report mailed to comparison physicians.33 AQAF
personnel who assisted the physician offices with developing the quality improvement
projects were not informed as to which physicians received achievable benchmark
feedback. Preintervention and postintervention changes in the experimental
vs the comparison groups provided the main test of achievable benchmark effectiveness.
The achievable benchmark is calculated for a specific indicator of care,
such as the percentage of eligible patients receiving influenza vaccination.
In essence, the achievable benchmark represents the average performance for
the top 10% of the physicians being assessed. In practice, adjustments are
made to account for differences in the numbers of patients per physician and
also to allow the inclusion of physicians with small numbers of eligible patients
without unduly distorting the overall performance assessment.45
Thus, an adjusted performance fraction (APF) is calculated for each physician
by dividing the number of patients receiving the vaccination plus 1 by the
number eligible for vaccination plus 2. The clinicians are then ranked, from
highest to lowest, according to this APF until at least 10% of the patients
for all the physicians have been included. The achievable benchmark calculation
is then based on all the eligible patients for these top-ranked physicians
and is the number of patients receiving the vaccination divided by the number
eligible for vaccination.
Details of the achievable benchmark method and its theoretical underpinnings
are published elsewhere.30-33
A computer program for achievable benchmark computation, accompanied by a
user manual, is posted on the Internet (http://www.main.uab.edu/show.asp?durki=11311) and will be provided upon request.
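The benchmark calculation described above can be sketched in a few lines of Python. This is a minimal illustration written from the prose description only, not the authors' posted program: the function name `achievable_benchmark`, the `top_fraction` parameter, the tie-breaking rule (input order), and the example counts are all assumptions made for this sketch.

```python
from typing import List, Tuple


def achievable_benchmark(counts: List[Tuple[int, int]],
                         top_fraction: float = 0.10) -> float:
    """Compute an achievable benchmark for one indicator.

    counts holds one (received, eligible) pair per physician.
    Hypothetical helper written for illustration only.
    """
    total_eligible = sum(eligible for _, eligible in counts)
    if total_eligible == 0:
        raise ValueError("no eligible patients")

    # Adjusted performance fraction (APF): (received + 1) / (eligible + 2).
    # Ties keep their input order (an assumption; the text does not say).
    ranked = sorted(counts,
                    key=lambda c: (c[0] + 1) / (c[1] + 2),
                    reverse=True)

    # Add physicians from the top of the ranking until they account for
    # at least top_fraction of all eligible patients.
    received_sum = 0
    eligible_sum = 0
    for received, eligible in ranked:
        received_sum += received
        eligible_sum += eligible
        if eligible_sum >= top_fraction * total_eligible:
            break

    # The benchmark is the pooled, unadjusted rate for the selected physicians.
    return received_sum / eligible_sum


# Made-up counts for five physicians (not study data).
example_counts = [(9, 10), (5, 5), (40, 60), (30, 75), (20, 50)]
benchmark = achievable_benchmark(example_counts)
print(f"achievable benchmark: {benchmark:.2f}")
```

In this made-up example the physician with 5 of 5 patients ranks first, but contributes only 5 of the 200 eligible patients, so the ranking continues until the 10% threshold is crossed. The APF is used only for ranking; the benchmark itself is the pooled raw rate of the selected physicians.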
The achievable benchmark experiment was a group-randomized trial, in
that patients were nested within physicians, with physicians the unit of randomization
and also the unit of some, but not all, of the analyses. To take full advantage
of the available information, we also conducted some analyses with the patient
as unit of analysis using techniques appropriate for the analysis of group-randomized trials.
We examined baseline demographics of the physicians and their patients.
We also compared study physicians with nonparticipating Alabama physicians.
Separate analyses were performed for each indicator. With the physician as
unit of analysis, we used paired t tests to compare
the mean baseline and follow-up performance of achievable benchmark intervention
physicians (n = 35) and then repeated this analysis for comparison physicians.
To evaluate the statistical significance and magnitude of the achievable benchmark
effect, ie, between-treatment differences in postintervention performance,
we used generalized linear models. These models considered nesting of the
2978 patients within physicians and contained baseline performance as a covariate
to adjust for any preintervention performance differences.48,49
We used a logit link to account for the binary nature of the response variable.
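One plausible way to write the model described in this paragraph is shown below. The notation is an assumption introduced for illustration: $Y_{ij}$ is 1 if patient $i$ of physician $j$ received the indicated care at follow-up, $\mathrm{ABC}_j$ is 1 for physicians in the experimental arm, and $\mathrm{Base}_j$ is physician $j$'s baseline performance on the same indicator. How the nesting of patients within physicians enters the fit (for example, through a within-physician correlation structure) is not specified here and is left out of the sketch.

```latex
\operatorname{logit}\,\Pr(Y_{ij} = 1)
  = \beta_0 + \beta_1\,\mathrm{ABC}_j + \beta_2\,\mathrm{Base}_j,
\qquad
\mathrm{OR}_{\mathrm{ABC}} = e^{\beta_1}
```

Under this form, the reported odds ratios correspond to $e^{\beta_1}$, the multiplicative change in the odds of receiving care associated with assignment to the achievable benchmark arm, holding baseline performance fixed.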
We also developed patient-level generalized linear models to estimate
the odds of receiving a recommended intervention according to study arm after
adjusting for physician characteristics. We did not adjust for patient characteristics
because each quality measure specified a group of patients who were ideal
candidates for the intervention. The applicability of each quality measure
in this study does not depend upon the patient characteristics. For example,
diabetic patients should receive influenza vaccination regardless of whether
they have hypertension, obesity, or coronary artery disease. Palmer50 cogently argues that process measures that carefully
identify ideal candidates for a procedure often do not require risk adjustment.
Therefore, overadjustment for patient characteristics would obscure important
findings. SAS Version 8.0 statistical software was used for the statistical
analyses (SAS Institute Inc).
In general, Alabama physicians completing the ACQIP study did not differ
significantly from all physicians eligible for participation in ACQIP or from
all physicians initially enrolled in ACQIP (Table 1). The physicians randomized to the experimental and comparison
arms of the achievable benchmark experiment were not significantly different
regarding years in practice, practice location, country of medical school
attended, and specialty (Table 2).
Patients of comparison and achievable benchmark physicians were similar in
age, race, and pertinent comorbidity both at baseline and at follow-up (Table 3). Contrary to initial HCFA plans,
313 of the 1360 patients studied at follow-up had been included in the baseline
group as well. To address this issue, we performed all analyses with and without
these patients. Our results did not change substantially, although there was
some loss of statistical significance with the reduced sample size.
The achievable benchmarks for each indicator were: (1) influenza vaccination,
82%; (2) foot examination, 86%; (3) long-term glucose control measurement,
97%; (4) cholesterol measurement, 99%; and (5) triglycerides measurement,
98%. Both groups of physicians had a mean preintervention influenza vaccination
rate of 40%; physicians receiving achievable benchmarks improved to a postintervention
rate of 58%, while comparison physicians improved to 46% (Figure 2). In addition, both experimental and comparison groups
improved significantly on foot examination (46% to 61% vs 32% to 45%) and
long-term glucose control measurement (31% to 70% vs 30% to 65%). For cholesterol
measurement, the experimental group improved significantly (66% to 72%) while
the comparison group did not (66% to 69%). The changes for triglyceride measurement
(61% to 65% vs 57% to 60%) were not significant.
Patients of achievable benchmark physicians had significantly higher
adjusted odds of receiving appropriate care at follow-up compared with patients
of comparison physicians for influenza vaccination, foot examination, and
measurement of long-term glucose control (Table 4).
After adjustment for the physician characteristics and baseline performance
using generalized linear models, patients of urban physicians tended to receive
more appropriate care (Table 5).
In addition, international medical graduates were more likely to perform foot
examinations. Physicians who graduated after 1970 were more likely to order
influenza vaccination but less likely to order lipid testing for their patients.
Family practitioners were more likely than internists to order influenza vaccination.
Finally, even after adjustment for multiple physician characteristics, patients
of physicians assigned to the experimental study arm had significantly higher
odds of receiving appropriate care at follow-up on all 5 measures.
In this RCT, we demonstrated that achievable benchmark feedback improved
clinician performance beyond the effect produced by an underlying improvement
intervention, which in itself was associated with significant overall improvement
for most quality measures. For influenza vaccination, foot examination, and
long-term glucose control measurement, physician receipt of achievable benchmark
feedback was associated with 33% to 57% higher odds of patients receiving
appropriate care at follow-up compared with patients of comparison physicians.
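The "33% to 57% higher odds" figure follows directly from the reported odds ratios, as the short sketch below shows. The helper names are hypothetical, and the second function is illustrative only: it applies an adjusted odds ratio to a raw reference proportion, which the adjusted models do not strictly license.

```python
def percent_higher_odds(odds_ratio: float) -> float:
    """Express an odds ratio as a percentage increase in odds."""
    return (odds_ratio - 1.0) * 100.0


def apply_odds_ratio(reference_probability: float, odds_ratio: float) -> float:
    """Probability implied by multiplying the reference odds by odds_ratio."""
    reference_odds = reference_probability / (1.0 - reference_probability)
    new_odds = odds_ratio * reference_odds
    return new_odds / (1.0 + new_odds)


# Odds ratios as reported for the three indicators.
reported_odds_ratios = {
    "influenza vaccination": 1.57,
    "foot examination": 1.33,
    "long-term glucose control measurement": 1.33,
}

for indicator, odds_ratio in reported_odds_ratios.items():
    print(f"{indicator}: {percent_higher_odds(odds_ratio):.0f}% higher odds")

# Illustration only: an odds ratio of 1.57 applied to the comparison
# group's 46% postintervention vaccination rate.
implied_rate = apply_odds_ratio(0.46, 1.57)
print(f"implied vaccination rate: {implied_rate:.3f}")
```

Higher odds are not the same as a higher probability: a 57% increase in odds applied to a 46% rate implies a rate of roughly 57%, not 72%.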
Interpreting the magnitude of the observed achievable benchmark effect
demands consideration of the relative harm, benefit, and cost of each clinical
intervention and of the incremental cost of adding an achievable benchmark
to existing feedback programs.51 Because achievable
benchmarks are easy to calculate from existing data, they are a simple tool
for enhancing audit and feedback approaches. Also, there is no foreseeable
harm from adding achievable benchmarks to clinician profiles. Although we
did not quantify the cost of using achievable benchmarks, it is likely modest given
that the tool requires no data collection beyond that necessary for the usual
audit and feedback process.
Because the quality measures we studied are backed by evidence and are
applicable to a high proportion of diabetic patients, using the achievable
benchmark in population-wide initiatives could benefit a substantial number
of patients.52 For example, the prevalence
of type 2 diabetes mellitus is estimated to be 15.6 million in the United
States.53 Between 1994 and 1997, some 20 000
to 40 000 deaths per year were attributable to influenza and subsequent
pneumonia.54 Of all patients dying with influenza
and pneumonia, approximately 18% are estimated to be diabetic,55
resulting in some 3600 to 7200 deaths per year in diabetic patients. A meta-analysis
revealed the efficacy of influenza vaccine in preventing death to be about
68%.56 Given that the influenza vaccination rate was 12 percentage points higher
among patients of physicians who received achievable benchmark feedback than
among patients of physicians who did not receive it,
and assuming that none of those dying from influenza were vaccinated, between
294 and 587 deaths, and substantially more episodes of influenza and pneumonia,
could be prevented per year.
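The arithmetic behind this estimate can be reproduced as follows. The function and parameter names are hypothetical; the inputs are the figures quoted in the paragraph above, and the calculation carries the same simplifying assumption that none of those dying from influenza were vaccinated.

```python
def deaths_prevented(annual_flu_deaths: float,
                     diabetic_share: float = 0.18,
                     extra_vaccinated: float = 0.12,
                     vaccine_efficacy: float = 0.68) -> float:
    """Estimated deaths prevented per year among diabetic patients.

    annual_flu_deaths: deaths attributable to influenza and pneumonia.
    diabetic_share: fraction of those deaths occurring in diabetic patients.
    extra_vaccinated: additional fraction vaccinated with benchmark feedback.
    vaccine_efficacy: efficacy of the vaccine in preventing death.
    """
    diabetic_deaths = annual_flu_deaths * diabetic_share
    return diabetic_deaths * extra_vaccinated * vaccine_efficacy


low_estimate = deaths_prevented(20_000)
high_estimate = deaths_prevented(40_000)
print(f"low: {low_estimate:.2f}, high: {high_estimate:.2f}")
```

The products come out near 293.8 and 587.5, in line with the range of 294 to 587 given in the text.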
A previously published survey of achievable benchmark physicians (81%
response rate) conducted after feedback provides some insight into possible
mechanisms of achievable benchmark influence on practice patterns.33,34 Of all respondents, 74% considered
specific new approaches to improve care, 63% identified new approaches specifically
for their office, and 55% actually implemented new approaches. The most frequently
reported process change was the incorporation of practice flow sheets and
the most frequently reported change in clinical practice was an increase in
office foot examinations.
Our study was conducted in the office setting, where the individual
physician has direct influence. We speculate that comparison of individual
performance with an achievable benchmark might have provided additional motivation
for change. This speculation is consistent with social cognition models of
change that emphasize provider perceptions and attitudes.57,58
However, we recognize the complexity and incomplete picture presented by the
literature on changing provider behavior.59
The concept of benchmarking is widely included in the improvement literature.
However, traditional definitions of benchmarks and benchmark providers have
been subjective and opinion driven, rather than data driven.60-63
Since our earlier publications of the achievable benchmark method,31-33 we have received
many requests for information regarding its use. For example, the achievable
benchmark method is now included in the toolkit published by the US Public
Health Service for states wishing to reach Healthy People 2010 goals64 and has also been adopted by PROs conducting HCFA-sponsored
quality improvement projects.35
Our finding that physicians in rural settings were less likely to improve
with feedback is intriguing. There is substantial literature on rural health
care delivery, yet little is available on improvement efforts based on quality
of care measurement in rural areas.65-69
Also, our data suggest that recent medical school graduates, foreign medical
graduates, and family practitioners may be more responsive to improvement
efforts. Number of years in practice and country of medical school have received
some peripheral attention in the literature on the relationship between physician
characteristics and quality, but no clear pattern of association has emerged.70-75
The relationship between physician specialty and quality of care has recently
come under intense scrutiny.76-78
Because the identification of physician characteristics predicting success
of quality improvement was not the focus of our analysis, we intend to study
this issue separately.
Our study has several limitations. First, we included volunteers from
a subset of Alabama physicians. However, physicians were chosen by a stratified
randomization process, and we did not find significant differences between
eligible physicians and physicians who completed this study. Because we were
investigating how to improve quality improvement methods, working with volunteer
physicians is consistent with our research objectives. In fact, many of the
current improvement efforts, including those promoted by HCFA and other organizations,
follow general principles of CQI and rely on voluntary participation of providers.
Second, the improvement attributable to achievable benchmark feedback
may or may not persist over time. This is a limitation of what is known about
audit and feedback, as well as about improvement efforts in general, and constitutes
an area of active research.4,24,25,29
Third, to inexpensively calculate an achievable benchmark, one needs
a readily available data source. Although we used existing chart-review data
in this project, we have demonstrated in previous publications that achievable
benchmarks may be calculated from other data sources such as the National
Health Interview Survey30 and Medicare administrative data.
Finally, questions regarding the achievable benchmark method itself
remain, including whether it can be shown to be equally effective in other
settings, such as in inpatient or managed care outpatient improvement efforts.
Also, we do not know whether adding the achievable benchmark method to a less-intense
quality improvement program would produce similar incremental benefit. However,
we note again that intensive multimodal quality improvement projects are frequently
used. HCFA has now implemented several national Medicare programs based on
the ACQIP model.39 Similar multimodal quality
improvement programs are also espoused by the Joint Commission on Accreditation
of Healthcare Organizations,23 and managed
care organizations frequently conduct multimodal quality improvement interventions
of equal intensity.
We believe that, as the measurement of quality becomes ever more important
in health care, it behooves researchers in this area to subject their approaches
to rigorous scrutiny where feasible. Our study represents a methodological
advance in that it demonstrates the effectiveness of a simple, new quality
improvement tool. In addition, we rigorously expand the available literature
on evaluating quality improvement tools.
In conclusion, with the achievable benchmark method we have added a
new tool to the "toolbox" for translating evidence into practice. In the RCT
reported here, we have demonstrated this tool's effectiveness in enhancing
audit and feedback based on medical record review in the ambulatory setting.
With the current imperative to improve health care delivery, any effective
addition to improvement efforts deserves close attention. When calculated
from existing data and used as an enhancement to an existing feedback program,
the achievable benchmark method has the advantages of being neither resource
intensive nor inherently hazardous. With high face validity, the peer-based,
data-driven achievable benchmark method has many advantages over subjectively
defined benchmarks and represents an advance in the methodology of quality
measurement and improvement.