AF indicates atrial fibrillation; AOM, acute otitis media; ARR, adjusted risk ratio; Ca, cancer; CV, cardiovascular; DRE, digital rectal examination; GPs, general practitioners; HRT, hormone replacement therapy; MI, myocardial infarction; PPV, positive predictive value; PSA, prostate-specific antigen; RRR, relative risk ratio; and TIA, transient ischemic attack.
a See study listing for Sadler37 in Table 2.
AEDs indicates antiepileptic drugs; AF, atrial fibrillation; Ba, barium; Ca, cancer; IV, intravenous; HRT, hormone replacement therapy; MI, myocardial infarction; MRI, magnetic resonance imaging; U/S, ultrasonography; and XR, radiograph.
a See study listing for Volk et al29 in Table 1.
eAppendix. Example of Search Strategy Used in MEDLINE (Ovid)
eTable 1. Details of Studies Which Assessed Clinicians’ Benefit and Harm Expectations of Treatment
eTable 2. Details of Studies Which Assessed Clinicians’ Benefit and Harm Expectations of Treatment Tests and Screening (Non-Radiological)
eTable 3. Summary of Studies Assessing Clinicians’ Benefit and Harm Expectations of Medical Imaging
eFigure 1. Flow of the Information Through the Phases of the Review
eFigure 2. Assessment of Benefit and Harm Expectations of Included Studies, by Intervention Topic
Hoffmann TC, Del Mar C. Clinicians’ Expectations of the Benefits and Harms of Treatments, Screening, and Tests: A Systematic Review. JAMA Intern Med. Published online January 9, 2017. doi:10.1001/jamainternmed.2016.8254
Do clinicians have accurate expectations of the benefits and harms of treatments, tests, and screening tests?
In this systematic review of 48 studies (13 011 clinicians), most participants correctly estimated 13% of the 69 harm expectation outcomes and 11% of the 28 benefit expectations. The majority of participants overestimated benefit for 32% of outcomes, underestimated benefit for 9%, underestimated harm for 34%, and overestimated harm for 5% of outcomes.
Clinicians rarely had accurate expectations of benefits or harms, with inaccuracies in both directions, but more often underestimated harms and overestimated benefits.
Inaccurate clinician expectations of the benefits and harms of interventions can profoundly influence decision making and may be contributing to increasing intervention overuse.
To systematically review all studies that have quantitatively assessed clinicians’ expectations of the benefits and/or harms of any treatment, test, or screening test.
A comprehensive search strategy of 4 databases (MEDLINE, EMBASE, Cumulative Index of Nursing and Allied Health Literature, and PsycINFO) from the start years to March 17-20, 2015, with no language or study type restriction, was performed. Searches were also conducted on cited references of the included studies, and experts and study authors were contacted. Two researchers independently evaluated methodologic quality and extracted participants’ estimates of benefit and harms and authors’ contemporaneous estimates.
Of the 8166 records screened, 48 articles (13 011 clinicians) were eligible. Twenty studies focused on treatment, 20 on medical imaging, and 8 on screening. Of the 48 studies, 30 (67%) assessed only harm expectations, 9 (20%) evaluated only benefit expectations, and 6 (13%) assessed both benefit and harm expectations. Among the studies comparing benefit expectations with a correct answer (total of 28 outcomes), most participants provided correct estimation for only 3 outcomes (11%). Of the studies comparing expectations of harm with a correct answer (total of 69 outcomes), a majority of participants correctly estimated harm for 9 outcomes (13%). Where overestimation or underestimation data were provided, most participants overestimated benefit for 7 (32%) and underestimated benefit for 2 (9%) of the 22 outcomes, and underestimated harm for 20 (34%) and overestimated harm for 3 (5%) of the 58 outcomes.
Conclusions and Relevance
Clinicians rarely had accurate expectations of benefits or harms, with inaccuracies in both directions. However, clinicians more often underestimated rather than overestimated harms and overestimated rather than underestimated benefits. Inaccurate perceptions about the benefits and harms of interventions are likely to result in suboptimal clinical management choices.
At a time of increasing worry about the escalating demand for and use of medical care,1 there is growing interest in its drivers.2,3 Of the intrinsic factors (those related to patients or clinicians), expectations can be powerful.3,4 Our systematic review5 of patients’ expectations found that patients generally overestimate the benefits and underestimate the harms of medical interventions. However, patient expectations are only one influence on the decision-making process, and patients cannot be assisted to make informed decisions if clinicians themselves do not have accurate expectations of intervention benefits and harms.
If clinicians’ expectations about intervention benefits are overly optimistic6,7 or their knowledge of harms is inadequate,8 they may oversell and overuse interventions.9 Overly optimistic expectations on the part of both clinician and patient may converge to contribute to a perfect storm of unnecessary testing and treatment. Conversely, if clinicians underestimate the likely benefit or overestimate the harms, appropriate interventions may not be offered. Neither situation is desirable, and in both, evidence-practice gaps occur.
Studies of clinicians’ expectations about the benefit or harm of various interventions are fragmented across the literature. We aimed to systematically review all studies that measured clinicians’ expectations of the benefit or harm of any medical treatment, test, or screen.
All quantitative primary study designs were eligible. We had no restrictions about participant eligibility: they could, but did not need to, be a provider of the intervention studied. We included studies in which participants were asked to provide a quantitative estimate of the expected benefits and/or harms of a treatment, test, or screen. We included outcomes that reported either the chance of benefit or harm or its effect size. We excluded outcomes that reported a descriptive estimate without quantification (eg, much better) or the risk of having or developing a condition.
We used a comprehensive search strategy (subject heading terms and free text), developed with a medical librarian experienced in systematic reviews, and searched MEDLINE (1946), EMBASE (1974), Cumulative Index of Nursing and Allied Health Literature (1981), and PsycINFO (1967) from these start years until March 17-20, 2015. Initial search terms were drawn from our patient expectation review5 and a few key studies. The strategy was built iteratively: running a search, scanning relevant retrieved articles for further indexing terms, and rebuilding the search strategy with additional terms and subject headings. The final search strategy for MEDLINE (eAppendix in the Supplement) was adapted for each database. To identify further published, unpublished, and ongoing studies, we (1) tracked relevant references through Web of Science’s cited reference search, (2) scanned the reference lists of identified studies, and (3) contacted experts and authors in the field.
One of us (T.C.H.) and a research assistant screened the titles and abstracts of articles identified and eliminated articles according to the inclusion criteria. We obtained the full texts of studies considered eligible from this process or for which eligibility was unclear. Both authors independently decided each study’s inclusion or exclusion, resolving disagreements by discussion.
Both authors independently extracted data and methodologic quality features (eTables 1-3 in the Supplement), with disagreements resolved by discussion. Since each included study used a survey, information relevant to key quality criteria was extracted for assessment of bias in surveys (clear research question, sufficiently large and representative sampling method, outcome measures and instruments used, and response rate, ideally ≥60%).10 Data for the outcomes and measures relevant to the review’s purpose were extracted. Where provided, we extracted authors’ contemporaneous correct answers about the benefits or harms. For several studies, we contacted the authors for additional data or to clarify details.
If studies provided a correct answer about benefits and harms, we extracted or calculated the proportion of participants who responded correctly or with an overestimation or underestimation. Meta-analysis was not possible because of the range of outcomes and response options.
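The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' procedure: the function names, the acceptable range, and the example estimates are all hypothetical, and the 50% cutoff simply mirrors the review's use of "most participants (≥50%)".

```python
from collections import Counter

def classify(estimate, low, high):
    """Label one participant's estimate against an acceptable range [low, high]."""
    if estimate < low:
        return "under"
    if estimate > high:
        return "over"
    return "correct"

def response_proportions(estimates, low, high):
    """Return the share of estimates that are correct, over, or under."""
    if not estimates:
        raise ValueError("at least one estimate is required")
    counts = Counter(classify(e, low, high) for e in estimates)
    n = len(estimates)
    return {label: counts[label] / n for label in ("correct", "over", "under")}

# Made-up percentage-risk estimates for illustration only;
# these are NOT data from any included study.
illustrative_estimates = [5, 10, 10, 20, 40, 50, 60, 80]
proportions = response_proportions(illustrative_estimates, low=10, high=20)

# Categories chosen by at least half of the participants.
majority = [label for label, p in proportions.items() if p >= 0.5]
```

Treating the correct answer as a range rather than a single point is a design choice in this sketch; setting `low` equal to `high` reduces it to an exact-match rule.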
We screened 8166 articles (after removing duplicates), discarded 8049 after screening by title and abstract, and screened 117 full-text studies, of which 48 were eligible (eFigure 1 in the Supplement).
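As a quick arithmetic check of the screening flow, the counts reported above can be reproduced as follows. The variable names are illustrative; the number of full-text exclusions is derived here by subtraction and is not stated in the text.

```python
# Counts reported in the text
records_screened = 8166          # after removing duplicates
excluded_title_abstract = 8049
eligible = 48

# Derived values
full_text_screened = records_screened - excluded_title_abstract
full_text_excluded = full_text_screened - eligible  # derived, not reported
```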
The studies were published between 1981 (reference 11) and 2015 (reference 12); came from 17 countries, with 16 studies from the United States; had sample sizes ranging from 30 (reference 11) to 1310 (reference 13); and included a total of 13 011 participating clinicians. Twenty studies addressed treatments12-30 (Table 1), 8 addressed tests or screening (5 cancer screening, 3 antenatal tests and screening)11,31-37 (Table 2), and 20 addressed medical imaging38-57 (Table 3). eTables 1-3 in the Supplement contain an expanded version of the tables presented herein, with participants’ responses to the expectation questions and the study authors’ correct estimates.
There were 181 outcomes relevant to this review: 41 (23%) assessed benefit expectations, 125 (69%) assessed harm expectations, and 15 (8%) in 3 studies evaluated expectations in which questions were posed neutrally (eFigure 2 in the Supplement). More studies assessed harm expectations alone (30 studies [67%]) than benefits alone (9 [20%]) or benefit and harm (6 [13%]). The 20 medical imaging studies assessed only harm expectations. All studies used survey methods (online, face-to-face, telephone, or postal). The response rate was 60% or more in 16 of the 34 studies (47%) reporting this.
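The percentages in this paragraph can be checked with a short Python sketch. The names are illustrative. One point is an inference from the numbers rather than something the text states: the study-level percentages (67%, 20%, 13%) reproduce when the denominator is the 45 studies that framed questions as benefit or harm, that is, the 48 included studies minus the 3 that posed questions neutrally.

```python
def pct(part, whole):
    """Percentage of whole, rounded to the nearest integer."""
    return round(100 * part / whole)

# Outcome-level counts reported in the text
outcomes = {"benefit": 41, "harm": 125, "neutral": 15}
total_outcomes = sum(outcomes.values())
outcome_pcts = {k: pct(v, total_outcomes) for k, v in outcomes.items()}

# Study-level counts reported in the text; the denominator of 45 is inferred
studies = {"harm only": 30, "benefit only": 9, "benefit and harm": 6}
framed_studies = sum(studies.values())
study_pcts = {k: pct(v, framed_studies) for k, v in studies.items()}

# Studies with a response rate of 60% or more, among those reporting it
response_rate_pct = pct(16, 34)
```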
Expectations of benefit were assessed in 15 studies. In 11 studies (total of 28 outcomes), participants’ responses were compared with the authors’ correct estimates (Figure 1), with most participants (≥50%) providing a correct estimate for only 3 outcomes (11%) (reduction in stroke risk with aspirin therapy for atrial fibrillation (AF),27 risk reduction of cardiovascular events with statin therapy,6 and reduction in breast cancer mortality from mammographic screening33). In studies with overestimation or underestimation data provided (22 outcomes), 50% or more of the participants overestimated benefit for 7 outcomes (32%) and underestimated benefit for 2 outcomes (9%).
Nineteen outcomes (from 5 studies) are not shown in Figure 1 because either the authors did not provide a correct estimate or the outcomes were measured in a way that precluded calculation of the proportion of participants who provided a correct estimation, overestimation, or underestimation (eg, a mean estimate of benefit; Table 1). For 11 of these outcomes, the authors drew no conclusion about participants’ responses. For 8 outcomes, the authors commented that participants overestimated the intervention’s benefit.
Expectations of harms were compared with authors’ correct estimates in 26 studies (69 outcomes), with most participants (≥50%) correctly estimating harm for only 9 outcomes (13%) (Figure 2).13,38,54,56,57 Most participants underestimated harm for 20 outcomes (34%) and overestimated harm for 3 outcomes (5%) of the 58 outcomes for which underestimation and overestimation data were available.
Twenty studies (37 outcomes) examined expectations (harm only) in medical imaging: abdominal computed tomography (CT),39,40,43-48,50,52-54,56,57 brain CT,43,57 any CT scan,38,41,49,55 radiograph,38,54,56,57 and other imaging tests.42,54,57 Harms were underestimated by most participants for 14 outcomes (38%), while the proportion who correctly responded ranged from 7% to 88%. Most participants correctly estimated the harm for only 8 outcomes (22%) and most did not overestimate harm for any outcomes.
Five studies examined screening for breast cancer,11,33,36 colorectal cancer,35 and prostate cancer.34 One study measured benefit and harm expectations,35 3 measured only benefit expectations,11,33,34 and 1 phrased the question neutrally.36 For the 8 benefit outcomes, the proportion of correct responses ranged from 17% to 56%, overestimation of benefit responses ranged from 1% to 56%, and underestimation responses ranged from 22% to 53%. For the 2 harm outcomes (risks of colonoscopy and flexible sigmoidoscopy35), the proportion of participants who underestimated harm (50% and 40% respectively) was higher than the proportion who gave correct responses (29% and 30%) and overestimated harm (10% and 18%).
Of the 7 studies (33 outcomes) on fetal and maternal medicine, 4 examined harm expectations of medications during pregnancy13,16,17,23 and 3 examined antenatal screening.31,32,37 Studies of medications assessed harm expectations only (28 outcomes) and studies of screening mostly assessed benefit expectations (4 outcomes), with the exception of 1 study,31 which evaluated both benefits and harms. Of the benefit outcomes, the proportion of participants who answered correctly was low (range, 11%-27%), with most people overestimating the accuracy of screening tests. For harm outcomes, the proportion of participants who correctly responded was low for most outcomes (range, 31%-36%, with the exception of 74% of the participants correctly responding about birth defect risk associated with paracetamol [acetaminophen] use in pregnancy13). Most harm outcomes were assessed by other means (eg, mean percentage risk), and teratogenic risk was consistently overestimated.
Five studies (11 outcomes) measured expectations about statins,6 warfarin for AF,20,25,27 aspirin for AF,27 or carotid endarterectomy.12 Six outcomes assessed benefit expectations: for 1 outcome (aspirin to reduce stroke risk in AF), 70% of the participants responded correctly27; for 1 outcome (carotid endarterectomy), 78% overestimated benefit12; and for another outcome (warfarin for AF), 47% underestimated benefit.27 For the 5 harm outcomes,20,25,27 58% of the participants correctly responded for 1 outcome (bleeding from warfarin).27 For 2 outcomes, the median risk of hemorrhage from warfarin was overestimated,20 and for the other 2 outcomes, a correct answer was not provided, although the authors commented that bleeding risk concerns seemed to prevail over stroke prevention.25
In addition to the cardiovascular surgery study12 reported above, 3 studies (25 outcomes) assessed expectations of surgery: liver transplant,29 prostatectomy,19 and 10 surgical procedures (Table 2).22 These studies evaluated only harm expectations. In the 2 questions about liver transplant, 90% or more of clinicians underestimated harm.29 In the study of 10 surgical procedures (death and a complication assessed for each), 27% of the estimates were considered correct, and underestimation and overestimation errors occurred about equally.22 One study (3 outcomes) presented expectations as mean percentage estimates of harm (range, 12%-93%), but did not compare this with the correct answer, only concluding that actual risks are low.19
In addition to the medication studies described above, 6 studies (46 outcomes) measured medication expectations: antibiotics for acute respiratory infections,21 psoriasis treatments,14 an antipsychotic,15 nonsteroidal anti-inflammatory drugs,26 and hormone replacement therapy.18,30 Two studies (16 outcomes) assessed benefit. For the 1 study on antibiotics for acute respiratory infections (6 outcomes) that provided a correct response, correct response rates ranged from 20% to 40%, with 50% or more of the participants overestimating benefit in 4 outcomes. For the 3 studies (14 outcomes) that assessed harm outcomes, only 1 study provided correct responses, and in all 3 outcomes, harm (risk of nonsteroidal anti-inflammatory drug–related gastrointestinal complications) was overestimated.26 For the other 2 studies, the authors concluded that there was variation in clinicians’ estimates of harm.14,15 Both studies of hormone replacement therapy expectations asked about the effect on various conditions.18,30 One study presented male and female clinician responses separately, concluding that participants overestimated the beneficial effects on heart disease and osteoporosis, and (female clinicians only) also Alzheimer disease.18 The other study concluded that physicians generally overestimated the risk of hormone replacement therapy, with the size of benefits and harms estimated correctly 28% of the time and overestimated 67% of the time.30
In addition to the studies on surgery described above, 1 study examined harm expectations of external beam radiation and brachytherapy for prostate cancer19; however, correct responses were not provided and the conclusions focused on comparative responses from radiation oncologists and urologists. One study assessed stem cell transplant; for 3 of the 4 benefit outcomes, benefit was overestimated, and for 9 of the 11 harm outcomes, harm was underestimated.24
To our knowledge, this is the first systematic review of clinician expectations of the benefits and harms of medical interventions. Clinicians rarely had accurate expectations of benefits or harms of the interventions, with inaccuracies in both directions, although they more often overestimated rather than underestimated benefits and underestimated rather than overestimated harms.
There are many possible explanations for these findings. Clinicians’ low awareness of actual benefits and harms of many interventions may reflect a preoccupation with pathophysiologic mechanisms of interventions58,59 rather than trial-derived effectiveness, or it may reflect not being taught, recalling, or keeping up-to-date with relevant evidence. However, being knowledgeable about contemporary evidence for interventions is difficult and compounded by the exponential growth in trials and systematic reviews,60 the scattering of evidence across hundreds of journals,61 the dynamic nature of evidence for many interventions, difficulties in extrapolating accurately from trial evidence to individual patients, and the inherent uncertainty that accompanies benefit and harm estimates. Considering an estimate’s imprecision (eg, using CIs) and being aware of the credible range surrounding a point estimate is important and necessary for decision making. Only a few included studies considered the correct response to be an acceptable range of answers. Although the proportion of participants who answered correctly would have increased if more included studies had taken this approach, the wide variation in clinicians’ benefit and harm estimates would remain.
The finding of more instances of clinicians underestimating harms and overestimating benefits than the opposite provides some support for the existence of therapeutic illusion (“an unjustified enthusiasm for treatment on the part of both doctors and patients”62(p1328)), which is a proposed contributor to the inappropriate use of interventions.63 Other potential contributors include the often-misleading portrayal of intervention benefits and absence of harms data in journal articles64 and information from commercial sources, such as pharmaceutical advertisements in medical journals.65,66 Clinicians may seek evidence that supports interventions they believe to be effective and already use in a possible illustration of confirmation bias.63 One study in this review compared 2 specialties, finding that clinicians overwhelmingly recommend the intervention that they provide.19 Conversely, awareness of an intervention’s benefits and harms may subsequently influence the interventions that clinicians provide. Some of the included studies examined clinicians’ recommendations, finding that those with higher expectations of an intervention’s harm were less likely to recommend it.20,26
However, awareness of evidence may not be sufficient. Clinicians may inappropriately recommend an intervention for several reasons, including a lack of time or reimbursement for explaining the rationale for not doing something,67 financial incentives for providing the intervention,68 medicolegal concerns,68,69 and cognitive biases (eg, anticipated regret [the fear of missing something] and commission bias [the tendency toward action rather than inaction]).70 Clinicians’ desire to reassure patients (and themselves) may override the evidence, especially for investigations,69 even though this may actually do little to reassure patients with a low pretest probability of serious illness.71
The consequences of clinicians’ misapprehension of benefits and harms are considerable. Inaccuracies are likely an important contributor to the massive overuse of tests and treatments, particularly if clinicians’ overoptimistic expectations synergistically compound those of the patients. Conversely, when clinicians underestimate benefits and overestimate harms, optimal patient care may be compromised as effective interventions are underutilized.
We found much more focus on assessing expectations about harm than benefit (67% of studies measured harm expectations only), in contrast to our review5 of patient expectations, in which most studies (63%) focused on benefit expectations. Clinicians may be more sensitive to harming patients than to merely failing to provide benefit, which may stem from a fundamental concern of primum non nocere: the primary duty of doing no harm. Medicolegal concerns may also influence clinicians to place greater emphasis on the risks of not doing something rather than the risk of harm from intervening.72 Yet paradoxically, there is less evidence available about intervention harms than there is about benefits, with harms less frequently measured and reported in primary studies and systematic reviews.64
Medical imaging studies dominated the investigations identified, yet none assessed benefit expectations. Perhaps researchers undertaking these surveys assumed the benefits were self-evident and indisputable. Our findings of low knowledge and mostly underestimated harms are supported by a systematic review of CT knowledge.73 Radiation harm has been termed the silent harm, and proposed reasons for its underestimation include the long delay before radiation-induced cancer develops, difficulty attributing the harm to a specific exposure, and lack of epidemiologic data specific to medical procedures.74 The harms examined were limited to radiation-induced cancer, with no consideration of other potential harms, such as those related to unnecessary testing, overdiagnosis, or overtreatment.
Few studies were balanced and assessed both benefit and harm expectations in this review and in our patient expectation review.5 Clinicians need accurate knowledge about both benefits and harms to enable unbiased discussion with patients. Presenting only one distorts informed decision making and can influence the decision. For example, older people’s willingness to receive medication for cardiovascular disease prevention is relatively insensitive to benefit, but highly sensitive to harms.75
Solutions for redress are not easy. Shared decision making is a logical mechanism for bringing evidence into consultations,5,76,77 but this requires clinicians to know the best current evidence about the benefits and harms of the interventions being contemplated. To facilitate discussions, clinicians need ready access to up-to-date, concise, and clear summaries of intervention benefits and harms. Some decision support tools, such as decision boxes,78 provide clinicians with evidence summaries. Of particular promise is the SHARE-IT (Sharing Evidence to Inform Treatment Decisions) tool from the MAGIC project, through which decision aids can be rapidly and semiautomatically generated from GRADE (Grading of Recommendations Assessment, Development, and Evaluation) evidence summaries in guidelines or systematic reviews.79 Of course, for most decisions, it is insufficient for clinicians to know just the benefits and harms since the benefit-harm tradeoff of each option varies and the best decision for each patient is one that is congruent with the individual’s values and preferences, necessitating that clinicians do not simply bombard patients with numbers, but rather encourage discussion and collaborative decision making.
This review has several strengths: no restrictions on language or study design, contact with authors for additional data, and diversity in the interventions, clinicians, and countries included. However, this diversity means heterogeneity, precluding calculation of summary estimates of the size of overestimates and underestimates. Some studies had insufficient information to enable calculation of proportions of participants providing correct estimates, overestimates, or underestimates; others did not provide the correct answers, although since they reported conclusions about the accuracy of participants’ estimates, they must have been referring to correct answers. We took authors’ correct estimates at face value and did not attempt to verify whether the answers were based on the best evidence available at the time of that study. Some studies had small and/or selective samples, and the measures used to assess clinicians’ expectations were largely unvalidated. Because risk perception accuracy can vary according to how it is assessed, exploring expectation accuracy and consistency for different formats and response options would be worthwhile.
Clinicians’ expectations of the benefits and harms of interventions can markedly influence the care that patients receive. If the benefits and harms are not known or communicated, effective interventions may be underused, low value interventions overused, and patients’ informed decision making hampered. Notwithstanding the challenges to doing so, addressing clinicians’ distorted perceptions about the benefits and harms of screening, tests, and treatments is critical to optimal patient care.
Corresponding Author: Tammy C. Hoffmann, PhD, Centre for Research in Evidence-Based Practice, Faculty of Health Sciences and Medicine, Bond University, Gold Coast, Queensland, Australia 4229 (firstname.lastname@example.org).
Accepted for Publication: October 10, 2016.
Published Online: January 9, 2017. doi:10.1001/jamainternmed.2016.8254
Author Contributions: Drs Hoffmann and Del Mar had full access to all data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis.
Study concept and design: Both authors.
Acquisition, analysis, or interpretation of data: Both authors.
Drafting of the manuscript: Both authors.
Critical revision of the manuscript for important intellectual content: Both authors.
Statistical analysis: Both authors.
Conflict of Interest Disclosures: Drs Hoffmann and Del Mar have received funding for research from the National Health and Medical Research Council of Australia and consultancy from the Australian Commission on Safety and Quality in Health Care, and Bupa, related to shared decision making.
Additional Contributions: Sarah Thorning, BSc, Grad Dip ILS (at Bond University at the time of the study), assisted with designing and conducting the searches of the electronic databases; Leanne McGregor, PhD (at Bond University at the time of the study), and John Rathbone, PhD (Bond University), assisted with screening search results; and Sharon Sanders, PhD, and Elizabeth Gibson, PhD (Bond University), checked data. There was no financial compensation. We thank the authors of the included studies who responded to our queries and provided additional details.