Figure 1. Comparison of 41 guidelines using percentage distribution of quality of evidence underlying individual recommendations. ART indicates antiretroviral therapy; HIV, human immunodeficiency virus; and OI, opportunistic infection. *Used a grading system that constituted a modification of the standard Infectious Diseases Society of America evidence-grading system.
Figure 2. Comparison of 5 recently updated guidelines with their respective previous versions. The total number of individual recommendations found is graphically depicted for each guideline pair and according to the quality of underlying evidence. See the Table footnote for an explanation of levels.
Lee DH, Vielemeyer O. Analysis of Overall Level of Evidence Behind Infectious Diseases Society of America Practice Guidelines. Arch Intern Med. 2011;171(1):18–22. doi:10.1001/archinternmed.2010.482
Clinical practice guidelines are developed to assist in patient care. Physicians may assume that following such guidelines means practicing evidence-based medicine. However, the quality of supporting literature can vary greatly.
We analyzed the strength of recommendation and overall quality of evidence behind 41 Infectious Diseases Society of America (IDSA) guidelines released between January 1994 and May 2010. Individual recommendations were classified based on their strength of recommendation (levels A through C) and quality of evidence (levels I through III). Guidelines not following this format were excluded from further analysis. Evolution of IDSA guidelines was assessed by comparing 5 recently updated guidelines with their earlier versions.
In the 41 analyzed guidelines, 4218 individual recommendations were found and tabulated. Fourteen percent of the recommendations were classified as level I, 31% as level II, and 55% as level III evidence. Among class A recommendations (good evidence for support), 23% were level I (≥1 randomized controlled trial) and 37% were based on expert opinion only (level III). Updated guidelines expanded the absolute number of individual recommendations substantially. However, few were due to a sizable increase in level I evidence; most additional recommendations had level II and III evidence.
More than half of the current recommendations of the IDSA are based on level III evidence only. Until more data from well-designed controlled clinical trials become available, physicians should remain cautious when using current guidelines as the sole source guiding patient care decisions.
For centuries, medicine was taught in a purely authoritarian manner and was practiced by following expert advice. Testing of treatment modalities for their efficacy was reported as early as the 11th century (see The Canon of Medicine by Ibn Sīnā1); nevertheless, interventions of clear benefit to patients were rarely endorsed. For example, despite Semmelweis' observation in 1847 that hand washing could abort outbreaks of puerperal fever, most physicians at the time did not adhere to the practice.2 Only through the pioneering efforts of A. Cochrane and a research group at McMaster University led by David Sackett, MD, FRCP, and Gordon Guyatt, MD, FRCP, in the 1950s has medicine since embraced evidence-based practices.
During the past half century, a deluge of publications addressing nearly every aspect of patient care has both enhanced clinical decision making and encumbered it owing to the tremendous volume of new information. Clinical practice guidelines were developed to aid clinicians in improving patient outcomes and streamlining health care delivery by analyzing and summarizing data from all relevant publications.3-5 Lately, these guidelines have also been used as tools for educational purposes, performance measures, and policy making.6
Interest has been growing in critically appraising not only individual clinical practice guidelines but also entire guideline sets of different medical (sub)specialties.7-10 We assessed the overall quality of evidence underlying recommendations outlined in existing guidelines from the Infectious Diseases Society of America (IDSA).
The IDSA guidelines use the IDSA–US Public Health Service grading system (henceforth referred to as the IDSA evidence-grading system).11 In this system, each recommendation is graded according to its strength and the underlying quality of evidence. Strength of recommendation includes levels A through C (ie, A indicates good evidence to support recommendation for use; B, moderate evidence to support recommendation; and C, poor evidence to support recommendation); some guidelines also included levels D (moderate evidence to support recommendation against use) and E (good evidence to support recommendation against use). These guidelines were mostly released before 2008. Quality of evidence ranges from level I through III (I indicates evidence from ≥1 properly randomized controlled trial; II, evidence from ≥1 well-designed clinical trial, without randomization, from cohort or case-controlled analytical studies or from dramatic results from uncontrolled experiments; and III, evidence from opinions of respected authorities based on clinical experience, descriptive studies, or reports of expert committees). For ease of analysis, we merged all recommendations labeled D and E (found mainly in pre-2008 guidelines) into categories B and A, respectively. We read and tabulated all current guidelines posted on the IDSA Web site (http://www.idsociety.org) as of May 2010. Designations of strength of recommendation and quality of evidence linked to each recommendation were extracted. Special attention was given to avoiding duplication in the counting of recommendations listed more than once (ie, listed in the summary section or in table format in addition to the body of a given guideline). Guidelines cosponsored by other societies were included in the analysis if the IDSA evidence-grading system was used with only minor modifications. Publications not adhering to the IDSA evidence-grading system were excluded from subsequent analysis to maintain objectivity of data compilation. 
For comparison of the overall quality of evidence between individual guidelines, we used relative proportions rather than absolute numbers because the latter varied greatly between guidelines.
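The tabulation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' actual procedure: the grade strings (such as "A-I"), the function names, and the sample data are hypothetical. The sketch applies the merge rule from the Methods (D into B, E into A) and then converts counts into relative proportions per evidence level.

```python
from collections import Counter

# Merge rule stated in the Methods: D is folded into B and E into A.
MERGE = {"A": "A", "B": "B", "C": "C", "D": "B", "E": "A"}
LEVELS = ("I", "II", "III")


def tabulate(grades):
    """Count recommendations by (strength, evidence level) after merging D/E."""
    counts = Counter()
    for grade in grades:
        strength, level = grade.split("-")
        if level not in LEVELS:
            raise ValueError(f"unknown evidence level: {level}")
        counts[(MERGE[strength], level)] += 1
    return counts


def evidence_proportions(counts):
    """Return the share (0 to 1) of recommendations at each evidence level."""
    total = sum(counts.values())
    return {
        lvl: sum(n for (_, l), n in counts.items() if l == lvl) / total
        for lvl in LEVELS
    }


# Hypothetical guideline with 8 graded recommendations (illustrative data only).
example = ["A-I", "A-II", "B-III", "D-III", "E-II", "C-III", "B-III", "A-III"]
counts = tabulate(example)
shares = evidence_proportions(counts)
```

Working with shares rather than raw counts keeps a guideline with 4 recommendations comparable to one with several hundred.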
To study the evolution of IDSA guidelines across time, we compared 5 recently published guidelines, updated between January 2008 and May 2010, with their respective older versions. We compiled data on the percentage of new references (ie, citations with a publication date after release of the earlier guidelines) and on the total number of individual recommendations and the number of level I, II, and III quality-of-evidence designations for each of the 5 guideline pairs.
Between January 1994 and May 2010, the IDSA released 90 guidelines covering a wide range of topics. As of May 2010, fifty-two current guidelines were listed on the IDSA Web site (http://www.idsociety.org). Of these guidelines, 41 (79%) used the IDSA evidence-grading system for individual recommendations,11 some with minor modifications (2 guidelines), and could, therefore, be analyzed in more detail (Figure 1).
Twenty-one of the 41 guidelines (51%) cover a new topic and 20 (49%) are updates of earlier publications. Two guidelines had been updated twice. The mean time between original and updated versions was 6.7 years (range, 1-15 years). The total number of individual recommendations per guideline ranged from 4 to 864, with a median of 48. A mean of 13 authors (range, 4-66) contributed to each guideline.
We identified 4218 individual recommendations, which we charted according to strength of recommendation and quality of evidence (Table). Forty-three percent of recommendations (n = 1796) were designated as strength A, 43% (n = 1819) as strength B, and 14% (n = 603) as strength C recommendations. Among all strength A recommendations (“good evidence to support a recommendation for use”), less than one-quarter were supported by level I quality of evidence (ie, designated as A-I recommendations). A global look at the overall distribution of the quality of evidence underlying all 4218 recommendations showed that only 14% were linked to level I, whereas more than half were supported by level III quality of evidence only (Table).
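As a quick arithmetic check, the reported percentages by strength of recommendation can be reproduced from the counts given above. The variable names are ours; the counts are those stated in the text.

```python
# Counts by strength of recommendation, as reported in the text.
strength_counts = {"A": 1796, "B": 1819, "C": 603}

total = sum(strength_counts.values())

# Percentage of all recommendations at each strength, rounded to whole percent.
percent = {k: round(100 * n / total) for k, n in strength_counts.items()}
```

The three counts sum to the 4218 recommendations reported, and the rounded shares come out as 43%, 43%, and 14%.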
We then compared the distribution of recommendation grading according to quality of evidence between individual guidelines (Figure 1). Again, level I quality of evidence was seen in only 1 of 6 recommendations per guideline (median, 15%; interquartile range, 6%-24%), whereas half of the recommendations (50%; 43%-64%) were supported by level III quality of evidence only. Guidelines on surgical prophylaxis (published in 1994) had the highest percentage of level I recommendations (46%), followed by guidelines on travel medicine (41%) (published in 2006) and asymptomatic bacteriuria (38%) (published in 2005). In contrast, for blastomycosis (published in 2008) and sporotrichosis (published in 2007), more than 80% of all recommendations were based on level III evidence only, and level I support was lacking entirely.
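A median and interquartile range of this kind could be computed as in the following sketch. The per-guideline percentages here are invented for illustration and are not the study data. The quantile method is also an assumption, because the article does not say which one was used.

```python
import statistics

# Hypothetical level I percentages for five guidelines (not the study data).
level_i_percent = [0, 10, 20, 30, 40]

median = statistics.median(level_i_percent)

# Quartiles with the "inclusive" method; the article does not specify a method,
# so this choice is an assumption made for the example.
q1, _, q3 = statistics.quantiles(level_i_percent, n=4, method="inclusive")
```

Reporting the median with its interquartile range, rather than a mean, limits the influence of outlying guidelines such as those with no level I recommendations at all.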
Finally, we looked at the extent to which the quality of evidence supporting IDSA guidelines improved across time. We selected 5 guidelines that had recently been updated and compared these with their respective earlier versions. In all but 1 guideline pair, the total number of cited articles increased in the newer guidelines, in 1 case 5-fold. On average, new publications constituted 53% (range, 34%-65%) of all referenced citations in the updated guidelines. For each guideline pair, the total number of recommendations increased with the update, with increases ranging from 20% to 400% (Figure 2). However, only 2 updated guidelines had a significant increase in the number of level I quality-of-evidence recommendations; most additional recommendations were supported by level II or III quality of evidence only.
In 1990, the Institute of Medicine proposed the development of guidelines to reduce inappropriate variation in the provision of health care by assisting patient and practitioner decision making.12 Since then, for the field of infectious diseases alone, 90 guidelines summarizing a large body of published data have been sponsored by the IDSA, and as of May 2010, fifty-two current guidelines can be found on their Web site.
In daily clinical work, practitioners sometimes assume that adhering to practice guidelines means practicing evidence-based medicine. However, individual recommendations within published guidelines may not be supported by high-quality evidence. We examined all 41 current IDSA guidelines that followed the IDSA evidence-grading system.11 Of the 4218 individual recommendations found, only 14% were supported by the strongest (level I) quality of evidence; more than half were based on level III evidence only. Although, on average, newly published studies made up roughly half of the reference lists of updated guidelines, and although updates contained substantially more individual recommendations, only 2 of 5 new guidelines had a significant increase in level I recommendations.
There are several possible reasons to explain these findings. Recently, Tricoci et al10 assessed the scientific evidence underlying the clinical practice guidelines in cardiology. They mainly analyzed how the American College of Cardiology and the American Heart Association guidelines have evolved across time and found results that are similar to the present data. In their study, only 11% of recommendations were supported by the highest level of evidence, whereas 48% were based on expert opinion, case studies, or standards of care only. Tricoci et al speculate that “guidelines in other medical areas in which large clinical trials are performed less frequently, may have an even weaker evidence-based foundation.”10(p835) Indeed, in contrast to other subspecialties of internal medicine, in the field of infectious diseases, relatively few large multicenter randomized controlled trials (RCTs) have been conducted, with the notable exception of antiretroviral therapy trials in human immunodeficiency virus (HIV) care. Many infectious diseases occur infrequently, present in a heterogeneous manner, or are difficult to diagnose with certainty. For others, an RCT would be impractical or wasteful or might be deemed unethical. Such examples might include the study of the usefulness of tick bite avoidance through physical distance for the prevention of Lyme disease, the utility of hand hygiene to reduce nosocomial infections, or the use of cesarean delivery over vaginal delivery to reduce the risk of vertical HIV transmission. Also, RCTs are costly, and the lack of resources and funding in the field clearly poses an obstacle to obtaining a level I recommendation for many management decisions.
Travel medicine guidelines had one of the highest percentages of recommendations supported by level I quality of evidence. This could be because vaccines and medications for prophylaxis and treatment of travel-related illnesses require RCTs before Food and Drug Administration approval and because there are global efforts to combat many of the diseases highly prevalent in the tropics. Conversely, guidelines for the management of endemic fungal infections lacked level I recommendations entirely. The infrequent occurrence of these diseases and the difficulty of an accurate diagnosis make it difficult to conduct RCTs.
A second reason for the scarcity of level I quality-of-evidence recommendations may be the use of the IDSA evidence-grading system.11,13 This system was originally proposed to evaluate the effectiveness of preventive health care interventions in Canada.14 It requires at least 1 supporting RCT for a level I recommendation. Many IDSA recommendations, however, address questions about diagnosis or prognosis (neither of which can be studied using an RCT and, thus, could never receive the highest-level recommendation). Other recommendations endorse obvious interventions, such as hand hygiene, for which no RCT will ever be conducted. Finally, not all RCTs leading to a level I designation are of the same quality. Some may have used surrogate markers as an outcome measure, some may have had small sample sizes, and others may have been poorly conducted. Well-designed nonrandomized studies, on the other hand, may yield solid information but nonetheless cannot lead to a level I recommendation using the current evaluation system. The GRADE (Grading of Recommendations Assessment, Development, and Evaluation) system,15-19 first proposed in 2004, offers a potential solution. It has 4 quality-of-evidence categories (high, moderate, low, and very low) that are based solely on the likelihood of further research being able to challenge the confidence in the estimate of effect. The label “high” could, thus, be allocated in the absence of an RCT. The GRADE system was recently applied for grading recommendations in the HIV management guidelines released by the World Health Organization in November 2009.20 We believe that standardization and use of a single grading system across all fields of medicine would make it easier for the busy clinician to interpret individual recommendations. For the current 52 guidelines endorsed by the IDSA, 6 different grading systems were used.
Ideally, clinical guidelines should state clearly and concisely all important decision options and outcomes; include information about diagnostic tests, prognosis, treatment, harm, and economic analyses whenever applicable; identify, validate, and correctly display all relevant underlying evidence; and be resistant to clinically sensible variations in practice.21-23 Publishers of guidelines themselves, however, do not universally follow guidelines on how to publish guidelines.9,24 Although quality analysis was not part of this study, we came across imprecisions on more than 1 occasion and for more than 1 guideline, including illogical, erroneous, or missing references for recommendations and their associated grades, including A-I recommendations. Inaccurate or wrong citations can be found relatively frequently in the medical literature, as has been reported previously.25,26 The creation of practice guidelines is labor and time intensive, and the final product becomes a frequently used reference guide. To improve the overall quality and, thus, usefulness of practice guidelines, we ponder how, in addition to a more standardized grading system, a more rigorous proofreading mechanism, especially for the cited articles underlying each recommendation's grade, could be implemented.
This study has several limitations. First, we did not include all current IDSA guidelines in the analysis. Excluded guidelines, however, did not use the IDSA evidence-grading system; hence, including them would have introduced subjectivity into the analysis. We believe that their inclusion would not have changed the conclusions. Second, the analysis was purely statistical and descriptive, and the primary cited literature was not evaluated. Third, although several national and international organizations have, likewise, published guidelines pertinent to the field of infectious diseases, we limited this study to guidelines put forth by the IDSA, considered the preeminent organization in the field worldwide.
In the era of managed care, guidelines are used increasingly not only for decision making in clinical practice but also as benchmarks in the appraisal of quality of health care provision. A thorough understanding of their inherent limitations is, therefore, critical. This analysis of IDSA guidelines showed a relative paucity of recommendations supported by level I quality of evidence and a rate of more than 50% of all recommendations supported by low-level evidence only. What are the implications of these findings for busy clinicians who manage patients with infectious diseases? We believe that the current clinical practice guidelines released by the IDSA constitute a great and reliable source of information that should be used. However, in circumstances when patient outcome is less than desirable, or when colleagues use diagnostic or therapeutic choices not included in the recommendations, it is prudent to remember that many of the individual recommendations are not supported by solid evidence. In such cases, we encourage reviewing the primary literature and using one's clinical judgment rather than relying solely on recommendations.
Guidelines can only summarize the best available evidence, which often may be weak. Thus, even more than 50 years since the inception of evidence-based medicine, following guidelines cannot always be equated with practicing medicine that is founded on robust data. To improve patient outcomes and minimize harm, future research efforts should focus on areas where only low-level quality of evidence is available. Until more data from such research in the form of well-designed and controlled clinical trials emerge, physicians and policy makers should remain cautious when using current guidelines as the sole source guiding decisions in patient care.
Correspondence: Ole Vielemeyer, MD, Division of Infectious Diseases and HIV Medicine, Department of Medicine, Drexel University College of Medicine, 245 N 15th St, NCB 6306, Mailstop 461, Philadelphia, PA 19102 (firstname.lastname@example.org).
Accepted for Publication: June 14, 2010.
Author Contributions: Study concept and design: Lee and Vielemeyer. Acquisition of data: Lee and Vielemeyer. Analysis and interpretation of data: Lee and Vielemeyer. Drafting of the manuscript: Lee and Vielemeyer. Critical revision of the manuscript for important intellectual content: Vielemeyer. Statistical analysis: Lee. Study supervision: Vielemeyer.
Financial Disclosure: None reported.
Previous Presentation: This study was presented in part at the 47th IDSA Annual Meeting; October 31, 2009; Philadelphia, Pennsylvania.
Additional Contributions: Diana Winters, BA, Drexel University College of Medicine Academic Publishing Services, provided invaluable input during the editing process of the manuscript.