eAppendix 1. Draft AGREE-REX vs AGREE-REX Version 1 (V1)
eAppendix 2. AGREE-REX: Recommendation Excellence User’s Guide
Brouwers MC, Spithoff K, Kerkvliet K, et al. Development and Validation of a Tool to Assess the Quality of Clinical Practice Guideline Recommendations. JAMA Netw Open. 2020;3(5):e205535. doi:10.1001/jamanetworkopen.2020.5535
Is it possible to create a tool to specifically evaluate the quality of clinical practice guideline recommendations?
In this cross-sectional study of 322 international stakeholders, the Appraisal of Guidelines Research and Evaluation–Recommendations Excellence (AGREE-REX) tool was developed to appraise guidelines for clinical practice. Participants rated the tool as usable and agreed that it represents a valuable addition to the clinical practice guidelines enterprise.
A panel of stakeholders agrees that the AGREE-REX tool may provide information about the methodologic quality of guideline recommendations and may help in the implementation of clinical practice guidelines.
Clinical practice guidelines (CPGs) may lack rigor and suitability to the setting in which they are to be applied. Methods to yield clinical practice guideline recommendations that are credible and implementable remain to be determined.
To describe the development of AGREE-REX (Appraisal of Guidelines Research and Evaluation–Recommendations Excellence), a tool designed to evaluate the quality of clinical practice guideline recommendations.
Design, Setting, and Participants
A cross-sectional study of 322 international stakeholders representing CPG developers, users, and researchers was conducted between December 2015 and March 2019. Advertisements to participate were distributed through professional organizations as well as through the AGREE Enterprise social media accounts and their registered users.
Between 2015 and 2017, each participant appraised 1 of 161 CPGs using the Draft AGREE-REX tool and completed the AGREE-REX Usability Survey.
Main Outcomes and Measures
Usability and measurement properties of the tool were assessed with 7-point scales (1 indicating strong disagreement and 7 indicating strong agreement). Internal consistency of items was assessed with the Cronbach α, and the Spearman-Brown reliability adjustment was used to calculate reliability for 2 to 5 raters.
A total of 322 participants (202 female participants [62.7%]; 83 aged 40-49 years [25.8%]) rated the survey items (on a 7-point scale). All 11 items were rated as easy to understand (with a mean [SD] ranging from 5.2 [1.38] for the alignment of values item to 6.3 [0.87] for the evidence item) and easy to apply (with a mean [SD] ranging from 4.8 [1.49] for the alignment of values item to 6.1 [1.07] for the evidence item). Participants provided favorable feedback on the tool’s instructions, which were considered clear (mean [SD], 5.8 [1.06]), helpful (mean [SD], 5.9 [1.00]), and complete (mean [SD], 5.8 [1.11]). Participants considered the tool easy to use (mean [SD], 5.4 [1.32]) and thought that it added value to the guideline enterprise (mean [SD], 5.9 [1.13]). Internal consistency of the items was high (Cronbach α = 0.94). Positive correlations were found between the overall AGREE-REX score and the implementability score (r = 0.81) and the clinical credibility score (r = 0.76).
Conclusions and Relevance
This cross-sectional study found that the AGREE-REX tool can be useful in evaluating CPG recommendations, differentiating among them, and identifying those that are clinically credible and implementable for practicing health professionals and decision makers who use recommendations to inform clinical policy.
Clinical practice guidelines (CPGs) are systematically developed statements informed by a systematic review of evidence and an assessment of the benefits and harms of care options designed to optimize patient care.1-3 The potential benefits of CPGs, however, are only as good as their quality. Appropriate methods and rigorous development strategies are important factors in the successful implementation of CPG recommendations.4-10 Not all CPGs are alike; their quality is variable and often falls short of reported goals.11-19
The Appraisal of Guidelines, Research and Evaluation revision (AGREE II) tool has become an accepted international resource to evaluate the quality of CPGs and to provide a methodologic framework to inform CPG development, reporting, and evaluation.5-7,20-22 The AGREE II tool targets the entire CPG development process and all components of the CPG report: the articulation of scope and purpose, who is involved, methods used, applicability, editorial independence, and clarity.
Since the release of AGREE II, studies have reported that high AGREE II scores do not guarantee that the resulting CPG recommendations are optimal.23-27 For example, Nuckols et al24 evaluated the technical quality and acceptability of 5 musculoskeletal CPGs. Use of the AGREE II tool resulted in high quality scores (eg, rigor domain scores >80%). However, participants reported that the CPGs omitted common clinical situations and contained recommendations of uncertain clinical validity. Similar results have been found with disability-related CPGs.26
These studies suggest that a distinction exists between user perceptions of a CPG report and the report’s recommendations. Hence, a barrier may exist if users rely solely on the AGREE II quality scores in making decisions about which CPG recommendations to implement or which CPGs to adapt to a specific context. For example, if a CPG provides insufficient information about the values of patients, health care professionals, and funders, or there is a lack of alignment across different viewpoints, that CPG may yield recommendations that are difficult to use and implement, even if the evidence base is solid or the methods used to create the CPG are of high quality. The CPGs that address controversial issues in which values clash (eg, medically assisted dying) may be especially susceptible to this concern. Inadequate consideration of different perspectives and of varied implementation concerns are common limitations of CPG appraisal tools.28
The development of AGREE II focused primarily on methodologic quality and internal validity of the CPG report and to a lesser extent on the external validity of the recommendations. A more thorough investigation of the implementation science literature and the usability and relevance of recommendations was warranted. Our international team of CPG developers and researchers created the AGREE-REX (Appraisal of Guidelines Research and Evaluation–Recommendations Excellence) tool to evaluate the quality of CPG recommendations specifically, defined as credible and implementable recommendations.
The development process used international standards of measurement design.29 Our first step required identification of candidate items. This step was completed and is described in previous studies.30,31 In brief, a realist review was conducted to identify attributes of CPGs associated with the implementation of their recommendations. The review resulted in the Guideline Implementability for Decision Excellence Model (GUIDE-M) that was vetted by the international CPG community.30 This multilevel model comprises 3 core tactics, 7 domains, and approximately 100 embedded components. The model was evaluated by 248 stakeholders from 34 countries and refined.
A core domain of the model (deliberations and contextualization) provided content coverage of our concept of CPG recommendation quality. The domain is composed of 3 subdomains, 11 attributes, and many subattributes and elements: clinical applicability (clinical, patient, and implementability relevance), values (perspectives of patient, health care professional, population, policy, developer), and feasibility (local, novelty, resources).
From these data, we derived candidate items, which 15 international CPG stakeholders then evaluated. We used this feedback to refine the content and create the Draft AGREE-REX used in this study (eAppendix 1 in the Supplement). The Draft AGREE-REX comprises 11 items (4 themes) and 2 overall items.
Three response scales were designed to rate each item of the Draft AGREE-REX. Two mandatory 7-point response scales (with 1 indicating strongly disagree and 7 indicating strongly agree) asked appraisers to rate the extent to which quality criteria are reported in the CPG (documentation scale) and then used to inform the CPG recommendations (consideration scale). An optional 7-point scale asked appraisers whether the documented and considered information aligned with, and was suitable for use in, their context (suitability scale). This scale was designed for use only when CPG recommendations from an authoring group are being considered for endorsement, adaptation, or implementation by another group. Two overall items asked appraisers for their overall ratings of the implementability of the CPG recommendations and their overall ratings of the clinical credibility of the CPG recommendations. Each item was answered according to a 7-point scale.
To test the Draft AGREE-REX tool, a cross-sectional study design was used. The CPG users, developers, researchers, or trainees were eligible to participate. Between December 2015 and March 2017, advertisements to participate were distributed through professional organizations (eg, the Guidelines International Network) as well as through the AGREE Enterprise social media accounts and their registered users. Given the nature of the recruitment strategy and the substantial number of cross-postings, an accurate number of individuals the advertisements reached is not available. Completion of the study implied consent and participants were offered a CAD$50 gift card. The study received ethics approval from the Hamilton Integrated Research Ethics Board.
The CPGs were selected from the National Guideline Clearinghouse of the Agency for Healthcare Research and Quality. Selection criteria were as follows: English language, published between 2013 and 2015, and length of core CPG document less than 50 pages.
The target sample size was calculated based on the interrater reliability outcome, assuming 2 raters per CPG, an intraclass correlation coefficient of 0.6, and a CI from 0.5 to 0.7. On the basis of these assumptions, 316 participants were required to appraise 158 CPGs. This study followed the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) reporting guideline for cross-sectional studies.
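The text does not name the sample size formula. The following Python sketch assumes Bonett's (2002) approximation for estimating an intraclass correlation coefficient (ICC) with a desired precision, and reads the 0.5 to 0.7 interval as a 95% CI of total width 0.2. Under these assumptions it returns 159 CPGs (318 participants). This is within 1 CPG of the reported 158, and the small gap may reflect rounding or a different method. The function name and parameters are illustrative.

```python
import math
from statistics import NormalDist


def bonett_icc_sample_size(rho: float, k: int, width: float, conf: float = 0.95) -> int:
    """Approximate number of targets (here, CPGs) needed so that the CI for an
    ICC has the desired total width, using Bonett's (2002) approximation.

    rho   -- planning value of the ICC
    k     -- number of raters per target
    width -- desired total width of the CI (upper bound minus lower bound)
    """
    if k < 2:
        raise ValueError("need at least 2 raters per target")
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # two-sided critical value, ~1.96
    numerator = 8 * z**2 * (1 - rho) ** 2 * (1 + (k - 1) * rho) ** 2
    denominator = k * (k - 1) * width**2
    return math.ceil(numerator / denominator + 1)


# Assumed inputs: ICC 0.6, 2 raters per CPG, 95% CI from 0.5 to 0.7 (width 0.2)
n_cpgs = bonett_icc_sample_size(rho=0.6, k=2, width=0.2)
n_participants = 2 * n_cpgs  # each CPG is appraised by 2 participants
print(n_cpgs, n_participants)  # prints 159 318
```

A wider acceptable interval or a higher planning ICC reduces the required number of CPGs, because both the (1 - rho) term and the width term enter the formula squared.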
Participants were required to read a single CPG, evaluate the entire set of recommendations with the Draft AGREE-REX, and complete the AGREE-REX Usability Survey. Individuals who responded to the advertisement were sent an email with an invitation letter, an electronic copy of the Draft AGREE-REX, the CPG to which they were randomly assigned, and access to LimeSurvey to submit AGREE-REX appraisal scores and to complete the AGREE-REX Usability Survey. Reminder emails were sent to nonrespondents at 2-week intervals up to 3 times.
Using 7-point scales, participants were asked to rate the items, the instructions, the response scale, their ability to apply the tool, and its usefulness. For each Draft AGREE-REX item, ratings on the documentation scale and the consideration scale were averaged across the 2 appraisers. The 2 rating scales were strongly and positively correlated (r > 0.90), and analyses using either scale produced identical patterns of results.
An overall AGREE-REX score was calculated by adding the mean item scores from the consideration scale and scaling the total as a percentage of the maximum possible score. These scores were used to assess the tool’s measurement properties. The AGREE-REX ratings of the CPGs appraised in the study have been reported.30
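The text describes the overall score as the sum of the per-item mean consideration ratings, expressed as a percentage of the maximum possible score. The Python sketch below implements that description literally. As an option, it also offers the AGREE II convention of scaling between the minimum and maximum possible scores, because the text does not say whether the minimum was subtracted. Treat that option as an assumption. The function name, parameters, and item means are illustrative.

```python
def overall_score(item_means: list[float], scale_min: int = 1, scale_max: int = 7,
                  subtract_minimum: bool = False) -> float:
    """Sum the per-item mean ratings and express the total as a percentage.

    subtract_minimum=False follows the description in the text (total / maximum).
    subtract_minimum=True applies AGREE II-style scaling,
    (obtained - minimum) / (maximum - minimum); this is an assumption here.
    """
    if not item_means:
        raise ValueError("need at least one item")
    n = len(item_means)
    total = sum(item_means)
    max_possible = scale_max * n
    if subtract_minimum:
        min_possible = scale_min * n
        return 100 * (total - min_possible) / (max_possible - min_possible)
    return 100 * total / max_possible


# Hypothetical mean consideration ratings for 11 items (sum = 49 of a possible 77)
scores = [5.5, 4.5, 6.5, 3.5, 2.5, 4.5, 5.5, 2.0, 3.5, 5.0, 6.0]
print(round(overall_score(scores), 1))                         # prints 63.6
print(round(overall_score(scores, subtract_minimum=True), 1))  # prints 57.6
```

The 2 conventions agree at the top of the scale (all 7s give 100%). They diverge at the bottom: all 1s give about 14.3% with the literal formula and 0% with min-max scaling.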
Two research staff members (K.S. and K.K.) with formal training and experience independently evaluated all the CPGs with the AGREE II tool. The AGREE II tool comprises 23 items within 6 domains. Each item is answered using a 7-point agreement scale with higher ratings indicating higher CPG quality.5 The AGREE II domain scores were used as part of the analytical framework to assess the performance of the Draft AGREE-REX.
Quantitative data were analyzed using SPSS software, version 24 (IBM Corp). Means and SDs for each of the items in the AGREE-REX Usability Survey were calculated. Cronbach α and correlations-if-item-deleted were calculated to assess the internal consistency of the items. Intraclass correlations were calculated for 2 to 5 appraisers using the Spearman-Brown reliability adjustment to assess the reliability of the overall AGREE-REX score.29,32,33 A 2-tailed P < .05 was considered statistically significant.
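The analyses were run in SPSS. As a language-neutral illustration of the internal consistency statistic, the Python sketch below computes Cronbach α from a matrix of ratings using the standard formula α = k/(k − 1) × (1 − Σ item variances / total-score variance). It also recomputes α with each item left out in turn, which is a common companion diagnostic. This is not necessarily identical to the correlations-if-item-deleted output named above. The ratings are hypothetical.

```python
from statistics import variance


def cronbach_alpha(scores: list[list[float]]) -> float:
    """Cronbach's alpha for a score matrix (rows = appraisals, columns = items)."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("need at least 2 items")
    columns = list(zip(*scores))  # transpose rows into per-item columns
    item_variance_sum = sum(variance(col) for col in columns)
    total_variance = variance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


def alpha_if_item_deleted(scores: list[list[float]]) -> list[float]:
    """Alpha recomputed with each item left out in turn."""
    k = len(scores[0])
    return [cronbach_alpha([row[:j] + row[j + 1:] for row in scores]) for j in range(k)]


# Hypothetical ratings: 5 appraisals x 4 items on a 1-7 scale
scores = [
    [6, 5, 6, 5],
    [4, 4, 5, 3],
    [7, 6, 6, 6],
    [3, 2, 4, 3],
    [5, 5, 5, 4],
]
print(round(cronbach_alpha(scores), 2))  # prints 0.96
print([round(a, 2) for a in alpha_if_item_deleted(scores)])
```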
Unlike the AGREE II tool, the AGREE-REX tool evaluates the quality of CPG recommendations, defined as the extent to which they are credible and implementable. Thus, to explore construct validity, correlations between the overall AGREE-REX score and the implementability score and the clinical credibility score were calculated, with the expectation that positive correlations would emerge. As an exploratory measure of discriminant validity, correlations between the overall AGREE-REX score and the AGREE II domain scores were also calculated, assuming mean scores across 4 raters and correcting for attenuation due to measurement error. The former correlations were expected to be larger than the latter. No standard for CPG recommendation quality currently exists; thus, measures of criterion validity were not appropriate.23,32,33
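The correction for attenuation mentioned above is usually Spearman's formula, r_true = r_xy / √(r_xx × r_yy). Here r_xx and r_yy are the reliabilities of the 2 measures. The Python sketch below applies it. The text does not state which reliability estimates were used, so all inputs in the usage line are hypothetical except the 0.64 AGREE-REX value for 4 raters, which is taken from the Results.

```python
import math


def disattenuate(r_xy: float, rel_x: float, rel_y: float) -> float:
    """Spearman's correction for attenuation.

    Estimates the correlation between true scores from the observed correlation
    r_xy and the reliabilities rel_x and rel_y of the two measures. With
    sampling error the result can exceed 1, so treat it as a rough estimate.
    """
    if not (0 < rel_x <= 1 and 0 < rel_y <= 1):
        raise ValueError("reliabilities must lie in (0, 1]")
    return r_xy / math.sqrt(rel_x * rel_y)


# Hypothetical inputs: observed r = 0.30; AGREE-REX reliability for the mean of
# 4 raters taken as 0.64 (the value reported in the Results); AGREE II domain
# reliability assumed to be 0.80
print(round(disattenuate(0.30, 0.64, 0.80), 2))  # prints 0.42
```

Because the correction always inflates the observed correlation, comparing corrected correlations on both sides is what keeps the discriminant comparison fair.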
Participants provided written feedback, and themes that emerged were noted. Formal thematic analysis was not undertaken.
Using the quantitative data and the written feedback from participants, the research team used an iterative process to refine the Draft AGREE-REX tool. This refinement was achieved through an in-person meeting, a feedback session with stakeholders at the 2017 Global Evidence Summit,34 and multiple teleconference meetings with the AGREE-REX team (2017-2019). Decisions were reached by consensus.
Of the 692 individuals who responded to the advertisement and were emailed a formal invitation, 322 (47.0%) completed the study. Of the 322 respondents, 202 (62.7%) were female, 252 (78.2%) had some experience with the AGREE II tool, 188 (58%) indicated that English was their first language, and 170 (53.8%) identified themselves as CPG developers (Table 1). Participants represented 6 geographic regions; 177 (55.0%) were from North America, 76 (24.0%) from Europe, 32 (10.0%) from South America, 24 (7.4%) from Asia, 7 (2.1%) from Africa, and 6 (2.0%) from Oceania.
As reported in Table 2 and Table 3, participants rated the survey items as easy to understand (with a mean [SD] ranging from 5.2 [1.38] for the alignment of values item to 6.3 [0.87] for the evidence item on the 7-point scale) and easy to apply (with a mean [SD] ranging from 4.8 [1.49] for the alignment of values item to 6.1 [1.07] for the evidence item on the 7-point scale). Participants rated the tool’s instructions on the 7-point scale as clear (mean [SD], 5.8 [1.06]), felt confident in applying the tool to a guideline (mean [SD], 5.1 [1.43]), regarded the tool as complete (mean [SD], 5.7 [1.18]), and agreed that the tool adds value to the CPG enterprise (mean [SD], 5.9 [1.13]). In addition, 229 (71%) of respondents intended to use the AGREE-REX tool for evaluation, 203 (63%) for endorsement, and 187 (58%) for development or reporting purposes.
Internal consistency of the items was high (Cronbach α = 0.94); deleting any single item did not alter this finding. Predicted interrater reliability was 0.47 for the mean of 2 raters, 0.57 for 3 raters, 0.64 for 4 raters, and 0.69 for 5 raters.
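The reported reliabilities follow the Spearman-Brown prophecy formula, r_m = m × r / (1 + (m − 1) × r), where r is single-rater reliability and m is the number of raters averaged. The Python sketch below assumes the reported 2-rater value as its starting point; the authors presumably estimated the single-rater ICC directly from the data. From that value it backs out the implied single-rater reliability (about 0.31) and reproduces the other reported values to 2 decimals.

```python
def spearman_brown(r: float, factor: float) -> float:
    """Predicted reliability when the number of raters is multiplied by `factor`."""
    return factor * r / (1 + (factor - 1) * r)


# Invert the formula for m = 2 to recover the implied single-rater reliability
single = 0.47 / (2 - 0.47)

for m in range(2, 6):
    print(m, round(spearman_brown(single, m), 2))
# prints 2 0.47, 3 0.57, 4 0.64, 5 0.69 (one pair per line)
```

The gains shrink as raters are added (+0.10, +0.07, +0.05), which is the usual trade-off when deciding how many appraisers to use.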
The overall AGREE-REX score correlated with the implementability score (r = 0.81) and with the clinical credibility score (r = 0.76). Both correlations were stronger than those between the overall AGREE-REX score and each of the AGREE II domain scores (eg, r = 0.10 for clarity of presentation and r = 0.43 for applicability) (Table 4).
Participants offered wording changes and editorial suggestions to help clarify concepts and ideas. Core themes emerged in the written feedback. Regarding the relationship between the Draft AGREE-REX and AGREE II, some participants articulated concerns about how to use both tools, potential redundancy, and a lack of instruction; some preferred keeping the tools separate, and others suggested integrating them. Regarding Draft AGREE-REX content and usability, participants described challenges in applying some items in the values theme and offered suggestions for clarity. Most participants either disliked the 2 mandatory response scales or could not distinguish the intent of one from the other.
Based on the study results and feedback from participants, changes were made to the tool. Table 5 lists the final items and criteria. eAppendix 1 in the Supplement compares the draft with the final version 1 of the tool and eAppendix 2 provides the entire AGREE-REX User’s Guide.
The original 11 items were edited to 9 items (2 items combined and 1 item deleted) and clustered into 3 conceptual categories: clinical applicability, values, and implementability.
The original 3 response scales were reduced to 2. The mandatory quality assessment scale asked appraisers to rate, on the 7-point scale, the overall quality of each item by considering whether the item criteria were addressed in the CPG and influenced the recommendations; for example, the extent to which data on the values and preferences of the various stakeholders were obtained and reported, and the extent to which these data were explicitly considered in formulating the recommendations.
The optional 7-point suitability for use scale is appropriate when a CPG is being considered for endorsement, adaptation, or implementation. This response scale considers whether the content of the criteria and its consequences for recommendations align with what would be expected in the context in which the CPG recommendations would be applied—for example, whether the potential users of a CPG perceive that the values and preferences of patients and policy makers collected and used to inform the CPG recommendations align with those in their own context. Appraisers are asked to rate the suitability for use in their setting/context.
In response to feedback, the 2 overall assessment questions (implementability and clinical credibility) were replaced by 2 new overall assessment questions to align with the AGREE II overall assessment items. The first new question (required) asked raters whether they would recommend the CPG for use in an appropriate context and the optional second new question asked raters whether they would recommend the CPG for use in their own context. A categorical response scale of yes, yes with modifications, and no is used to answer these assessment questions.
There was debate about whether to integrate the new items into the existing AGREE II tool or to create a separate AGREE-REX tool. A decision was made to create a separate tool to provide optimal flexibility to potential users. A resource to provide directions for use of the AGREE suite of tools has been written (M. C. Brouwers, PhD, unpublished data, 2020).
Overall, results of the study indicated that AGREE-REX is a usable, reliable, and valid tool to evaluate CPG recommendations. The AGREE-REX tool is a complement rather than an alternative to the AGREE II tool. The AGREE II tool focuses on the quality of the entire CPG process. The AGREE-REX tool focuses specifically on the quality of the CPG recommendations.
We believe that AGREE-REX will be a useful tool to evaluate CPG recommendations (single recommendations or bundles), differentiate among them, and identify those that are clinically credible and implementable for practicing health professionals and decision makers who use recommendations to inform clinical policy. Appraising a CPG with the AGREE II tool and the AGREE-REX tool may help provide information about the methodologic quality and the quality of the guideline recommendations. The appraisal step using both tools may help mitigate challenges in moving directly to costly and complex implementation commitments with CPGs that may lack rigor and suitability to the setting in which they are to be applied.
In addition to the evaluation version of the tool, we have created the AGREE-REX Reporting Checklist, which can be used to inform development and reporting standards. The criteria used for evaluation purposes are presented as quality concepts to be included and documented in the CPG as it is being developed and, moreover, to inform the development protocol. The checklist will help identify specific operational strategies to meet AGREE-REX quality criteria to incorporate from the outset. For example, the well-designed Evidence to Decision Framework reflects the utility of some of the AGREE-REX concepts.35 In addition, the checklist can help researchers identify where rigorous and feasible operational methods are lacking, so that efforts can be directed to address those gaps.
The recently released Clinical Practice Guidelines Applicability Evaluation (CPGAE-V1.0) also addresses this area. Designed to evaluate CPG applicability,36 the CPGAE-V1.0 has been used to assess traditional Chinese medicine guidelines but has not yet been tested by the international community, nor have its measurement properties been explored. Similarly, the recently released National Guideline Clearinghouse Extent of Adherence to Trustworthy Standards (NEATS instrument) is designed to measure CPG adherence to the Institute of Medicine standards for trustworthy guidelines.37 The methods of development and scope of these tools are different; nonetheless, investigating how the AGREE-REX tool and these tools complement each other may be a valuable area of inquiry.
Strengths of the AGREE-REX tool include the use of methodologic standards of measurement design in its development29,32,33; the use of multidisciplinary literature as a basis for the concepts underpinning AGREE-REX30,31; and its development by a multidisciplinary international research team and engagement of 322 internationally representative participants involved in CPGs. The participants reaffirmed the need for this tool, and their participation was vital to ensure that the resource was tailored to the needs of the international CPG communities.
This study has limitations. The measurement properties and usability surveys were performed with the penultimate draft version of the tool. Financial considerations prohibited the repetition of the studies to confirm that the changes made to the AGREE-REX tool were associated with improvements in measurement properties and usability. Nonetheless, we believe that the modifications were informed by evidence. Capturing information from in-the-field experiences on an ongoing basis will be essential in continuing to develop the evidence base to support use of the AGREE-REX tool. Additional supporting materials (eg, training tools) are being developed to improve interrater reliability of the tool. Another limitation is the criteria used to select the CPGs (<50 pages, English language only) and that the tool was applied to the whole set of recommendations in each report. Although the tool, and not the CPGs themselves, was the object of study, the criteria and unit of recommendation may affect the perceptions of the tool and its measurement properties. Continued application to a range of CPGs is required to better assess its generalizability.
The results of this study suggest that AGREE-REX is a reliable, valid, and usable tool designed to evaluate CPG recommendations specifically. It is a complement to the AGREE II tool.
Accepted for Publication: March 19, 2020.
Published: May 27, 2020. doi:10.1001/jamanetworkopen.2020.5535
Open Access: This is an open access article distributed under the terms of the CC-BY License. © 2020 Brouwers MC et al. JAMA Network Open.
Corresponding Author: Ivan D. Florez, MD, MSc, Department of Pediatrics, University of Antioquia, Calle 67, No. 53 – 108, Medellín 0500001, Colombia (email@example.com).
Author Contributions: Dr Brouwers had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Concept and design: Brouwers, Spithoff, Alonso-Coello, Burgers, Cluzeau, Férvers, Graham, Grimshaw, Kastner, Qaseem, Straus, Florez.
Acquisition, analysis, or interpretation of data: Brouwers, Spithoff, Kerkvliet, Burgers, Hanna, Kho, Qaseem, Straus, Florez.
Drafting of the manuscript: Brouwers, Burgers, Straus.
Critical revision of the manuscript for important intellectual content: All authors.
Statistical analysis: Brouwers, Kerkvliet, Alonso-Coello, Qaseem, Straus, Florez.
Obtained funding: Brouwers, Graham, Straus.
Administrative, technical, or material support: Kerkvliet, Straus, Florez.
Supervision: Brouwers, Spithoff, Burgers, Straus.
Other - International steering committee: Férvers.
Conflict of Interest Disclosures: Dr Brouwers reported receiving grants from the Canadian Institutes of Health Research during the conduct of the study. Mss Spithoff and Kerkvliet reported receiving grants from the Canadian Institutes of Health Research during the conduct of the study. Dr Burgers reported serving as Trustee of the AGREE Research Trust from 2004 to 2014. No other disclosures were reported.
Funding/Support: This project was funded by the Canadian Institutes of Health Research, grant 201209MOP-285689-KTR-CEBA-40598.
Role of the Funder/Sponsor: The funding source had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.
Additional Contributions: The authors thank the following individuals for their contributions, advice, and input into this project: Onil Bhattacharyya, MD, PhD, University of Toronto, Canada; George Browman, MD, MSc, FRCPC, Retired, Canada; Anna Gagliardi, PhD, University of Toronto, Canada; Peter Littlejohns, MD, FRCP, King’s College London, United Kingdom; Holger Schünemann, MD, PhD, McMaster University, Canada; Louise Zitzelsberger, PhD, Health Canada, Canada. Contributors advised on the concept and proposed protocol and the early stages of the development of the beta version of the tool. No contributor was financially compensated, and all contributors provided permission to be acknowledged.
Additional Information: The AGREE suite of tools is available on the AGREE Enterprise website (http://www.agreetrust.org).