Development and Validation of a Tool to Assess the Quality of Clinical Practice Guideline Recommendations

Key Points
Question: Is it possible to create a tool to specifically evaluate the quality of clinical practice guideline recommendations?
Findings: In this cross-sectional study of 322 international stakeholders, the Appraisal of Guidelines Research and Evaluation–Recommendations Excellence (AGREE-REX) tool was developed to appraise guidelines for clinical practice. All participants rated the tool as usable and agreed that it represents a valuable addition to the clinical practice guidelines enterprise.
Meaning: A panel of stakeholders agrees that the AGREE-REX tool may provide information about the methodologic quality of guideline recommendations and may help in the implementation of clinical practice guidelines.


OVERVIEW: AN INTRODUCTION TO THE AGREE-REX

BACKGROUND
Clinical practice guidelines are systematically developed statements informed by a systematic review of evidence and an assessment of the benefits and harms of alternative care options with the aim of optimizing patient care. They are informed by research evidence, values, and local/regional circumstances and inform decisions and judgements about health care at the clinical, management, and policy levels 1,2.
The AGREE II has become an international methodological resource to inform guideline development, reporting, and evaluation 3. Meeting rigorous methodological requirements is necessary but not sufficient to ensure that guideline recommendations are clinically credible or implementable. In response, and informed by research evidence and the participation of the international guideline community, the AGREE-REX (Appraisal of Guidelines REsearch and Evaluation–Recommendations EXcellence) was designed.
The AGREE-REX is a valid and reliable tool to assess the quality of guideline recommendations and a strategy to inform their development and reporting. The AGREE-REX aims to optimize the quality of guideline recommendations, defined as recommendations that are clinically credible, trustworthy, and implementable.

The AGREE-REX is a complement to the AGREE II.
The AGREE-REX addresses three factors that must be considered to ensure that guideline recommendations are of high quality. We define high quality recommendations as those that are clinically credible, trustworthy, and implementable. The three factors are:
• Clinical credibility of the recommendations based on the available evidence and its appropriateness for the target users, context, and patients/populations;
• Consideration of values of all relevant stakeholders in the formulation of the recommendations;
• Implementability of the recommendations.
The AGREE-REX can be applied to guidelines targeting any clinical or health topic and targeting any step in the health care continuum (health promotion, prevention, screening, diagnosis, treatment/intervention, and follow-up).

DEVELOPMENT OF THE AGREE-REX
Development of the AGREE-REX was led by an international team of practice guideline, knowledge translation, and methodology experts and researchers. A realist literature review was conducted to identify characteristics of guidelines that influence their implementability. The result of this work, the Guideline Implementability for Decision Excellence Model (GUIDE-M) 4,5, served as the basis for generating the AGREE-REX items. This was followed by a series of evaluations and refinements to establish the instrument's usability, reliability, and validity that involved hundreds of individuals in the guideline community worldwide.

AGREE-REX USERS
The AGREE-REX is intended for use by the following stakeholder groups:
• By guideline developers to evaluate existing guidelines to determine which are of adequate quality and appropriate for application or adaptation to their own context;
• By guideline developers to provide a methodological blueprint for de novo development that will yield high quality recommendations;
• By health care providers who wish to undertake their own assessment to ensure guideline recommendations are appropriate for adoption in their clinical setting;
• By policy makers, health care administrators, program managers, and professional organizations to help them decide if guideline recommendations are appropriate to inform clinical practice strategies and policy design;
• By researchers who wish to assess the quality of guideline recommendations in a particular topic area;
• By guideline database administrators to assess the quality of guideline recommendations before inclusion in their database;
• By educators to teach critical appraisal skills and core competencies in guideline recommendation development and reporting; and
• By any stakeholder interested in supporting the improvement of practice guideline recommendation development, reporting, and evaluation.

AGREE-REX DOMAINS, ITEMS, AND CRITERIA
The AGREE-REX consists of nine items organized within three theoretical domains (Table 1), each focusing on a different factor that influences the quality of guideline recommendations. Each of the nine items has an operational definition and a list of specific criteria that characterize the concept. The number of criteria per item ranges from 2 to 10.

HOW TO USE THE AGREE-REX: IN BRIEF
The AGREE-REX can be used for evaluation purposes to determine the degree to which guideline authors optimize the quality of the recommendations. It can also be used to inform guideline development and reporting requirements.

How To Use The AGREE-REX For Evaluation Purposes
The AGREE-REX includes two evaluation statements for each of the nine items. The first evaluation statement assesses whether the criteria that define each item were considered in formulating the recommendations and asks the user to rate the overall quality of this item. The second evaluation statement (optional) assesses the suitability or appropriateness of the guideline recommendations for a particular setting. Both statements are answered using a 7-point response scale (1 [lowest quality] to 7 [highest quality]).
Depending on the needs of the user, the AGREE-REX can be applied to each individual guideline recommendation (or a prioritized set of individual recommendations), once to a group of guideline recommendations (e.g., a cluster of recommendations addressing a similar topic), or once to all guideline recommendations as a whole. Decisions about the level of AGREE-REX assessment should be based on the user's judgement.

How To Use The AGREE-REX For Development and Reporting Purposes
The AGREE-REX item criteria can serve as a blueprint by identifying the quality concepts that should be considered and incorporated into the development process and reported in the final guideline document. Any criteria that are not relevant to a particular guideline project should be identified at the outset, and a rationale for these decisions should be provided in the final guideline document.

How To Use The AGREE-REX With Other AGREE Tools
The AGREE-REX is a complement to the AGREE II (and the AGREE Global Rating Scale [GRS]). Whereas the AGREE II and AGREE GRS consider the entire guideline process, the AGREE-REX focuses specifically on the development and reporting of guideline recommendations. While there is no standard or required way to use the AGREE tools in combination, our recommendations are provided below:
• A combination of the AGREE Reporting Checklist and the AGREE-REX Reporting Checklist is recommended to support guideline development and reporting goals.
• Application of either the AGREE II or the AGREE GRS, together with the AGREE-REX, is recommended to support evaluation goals.
• If the evaluation goals also include choosing or prioritizing among candidate guidelines, the following strategies are proposed to make the process more efficient:
1. Apply either the AGREE II or the AGREE GRS to narrow down a candidate list of guidelines that meet a minimum methodological threshold (e.g., a minimum of 50% on item or domain ratings) and then apply the AGREE-REX. This approach would be most appropriate if a user would not consider any guideline that did not meet minimum methodological development standards.
2. Apply the AGREE-REX to narrow down the list of guidelines that meet a minimum recommendation quality threshold (e.g., a minimum of 50% of the overall AGREE-REX score) and then apply the AGREE II or the AGREE GRS. This approach would be appropriate for a user who would not consider any guideline that did not meet a minimum recommendation quality score.
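The two prioritization strategies share one shape: screen every candidate with the first tool, then apply the second tool only to those that pass. The sketch below illustrates this under stated assumptions. The function name `two_stage_appraisal`, the guideline labels, and all scores are hypothetical, and the 50% threshold is only the example value mentioned above, not a required cutoff.

```python
def two_stage_appraisal(candidates, screen, follow_up, threshold=50.0):
    """Screen candidates with one tool, then appraise the shortlist with another.

    candidates: iterable of guideline labels
    screen:     callable returning the screening tool's score (%) for a label
    follow_up:  callable returning the second tool's score (%) for a label
    threshold:  minimum screening score (%) needed to stay on the shortlist
    """
    shortlist = [c for c in candidates if screen(c) >= threshold]
    # The second tool is applied only to guidelines that passed the screen.
    return {c: follow_up(c) for c in shortlist}


# Hypothetical scores (%), standing in for completed appraisals.
agree_ii_scores = {"Guideline A": 72.0, "Guideline B": 41.0, "Guideline C": 55.0}
agree_rex_scores = {"Guideline A": 48.0, "Guideline B": 80.0, "Guideline C": 66.0}
names = list(agree_ii_scores)

# Strategy 1: methodological screen first (AGREE II), then AGREE-REX.
strategy_1 = two_stage_appraisal(names, agree_ii_scores.get, agree_rex_scores.get)

# Strategy 2: recommendation-quality screen first (AGREE-REX), then AGREE II.
strategy_2 = two_stage_appraisal(names, agree_rex_scores.get, agree_ii_scores.get)

print(strategy_1)
print(strategy_2)
```

With these made-up numbers the two strategies shortlist different guidelines, which is why the order of the tools should follow from what the user considers non-negotiable.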

ADDITIONAL RESOURCES
The AGREE-REX has been developed with the assumption that the user is familiar with basic evidence-based practice principles and the key components of a clinical practice guideline. If you are new to practice guidelines and would like more information, foundational resources include:
• Appraisal of Guidelines Research and Evaluation (AGREE), www.agreetrust.org
• Grading of Recommendations Assessment, Development, and Evaluation (GRADE), www.gradeworkinggroup.org
• Guidelines International Network (G-I-N), www.g-i-n.net
Additional resources to assist with the application of the AGREE-REX will be made available on the AGREE Enterprise website at www.agreetrust.org as they are developed.

INSTRUCTIONS: AGREE-REX
These instructions have been designed to assist users in the application of the AGREE-REX and should be reviewed before applying the tool.

Review and Preparation
Before applying the AGREE-REX, a complete review of the guideline document and any additional supporting information within the document (e.g., tables, appendices) or published separately (e.g., methodological protocol) is required.

Level of Recommendation: Single, Cluster, or All
The AGREE-REX can be applied to assess the formation of a single (or prioritized) recommendation, a group or cluster of recommendations, or all the recommendations at once in a guideline document. A decision regarding the level of recommendation should be made a priori, before evaluation begins, and the rationale for the choice should be reported. Below is a list of considerations that can guide decisions about the level of recommendations to which the AGREE-REX should be applied.
Application of the AGREE-REX to a single recommendation or group of recommendations is most appropriate when:
• The AGREE-REX user believes that quality may vary between recommendations in the guideline being assessed; or,
• Only selected recommendations (or a single recommendation) are of interest and are being considered for adaptation, endorsement, or implementation.
Application of the AGREE-REX to all the guideline recommendations is most appropriate when:
• The AGREE-REX user believes that quality is consistent between recommendations in the guideline being assessed; or,
• All guideline recommendations are of interest and are being considered for adaptation, endorsement, or implementation; or,
• Resource and time constraints make it impractical to evaluate each recommendation (or group of recommendations) separately.

Rating Scale and Assessment Process
The AGREE-REX includes two evaluation statements for each item: one to assess overall quality (required) and one to assess suitability for use (optional). It also includes two overall assessment statements to apply to the whole guideline (again, one required and one optional).

Quality Assessment:
Rate the overall quality of this item. This evaluation statement should be applied to determine whether criteria to optimize clinical credibility, trustworthiness, and implementability were considered in formulating the recommendations. All items are rated using a 7-point scale (1 [lowest quality] to 7 [highest quality]).
• A score of 1 should be given if there is no information that is relevant to the AGREE-REX item's criteria or the item's criteria were not considered in the formulation of the guideline recommendations.
• A score of 7 should be given if all the item's criteria have been carefully and thoroughly considered in the formulation of the recommendation(s).
• A score between 2 and 6 should be given when some but not all of the item's criteria are considered in the formulation of the recommendation(s) and/or the link between the criteria and the recommendations is not optimal.
• The appraiser should provide their reasoning for the score in the comments box provided. This is useful for discussion with other appraisers.

Suitability for Use (Optional):
The overall quality and interpretation of the item criteria are appropriate for my context. This evaluation statement is optional and can be applied to the items if the goal of the evaluation is also to determine whether or not the guideline recommendations are appropriate for use in a particular setting. All items are rated using a 7-point scale (1 [strongly disagree] to 7 [strongly agree]).
• A score of 1 should be given when there is no information that is relevant to the AGREE-REX item's criteria or the interpretation of the item's criteria is not appropriate for the context in which the appraiser intends to use the guideline recommendations.
• A score of 7 should be given if the quality is excellent and the interpretation of the item's criteria is appropriate for the context in which the guideline will be used.
• A score between 2 and 6 should be given if some but not all of the interpretations of the item's criteria associated with the recommendation are appropriate for the context in which the guideline will be used.
• The appraiser should provide their reasoning for the score in the comments box provided.

Overall Assessment Statements:
The overall assessment statements require the appraiser to judge whether they would recommend the guideline recommendations for use (1) in the appropriate context and, if applicable, (2) in the appraiser's own context. The appraiser has three answer options: yes, yes with modifications, or no.
1. I would recommend these guideline recommendations for use in the appropriate context.

Yes | Yes, with modifications | No

Calculating AGREE-REX Scores
AGREE-REX results can be calculated and reported in various ways, including as item scores, domain scores, or an overall score. In addition, users must decide whether the scores will be calculated using individual scores from multiple appraisers or if appraisers will be required to reach consensus on scores.

Using Individual Appraisers' Scores vs. Consensus Scores
Using individual scores from multiple appraisers to calculate AGREE-REX scores preserves the variability and different perspectives of the appraisers. This approach is used when appraisers do not meet to discuss their scores. The reliability assessment of the tool was completed on its penultimate version; based on these data, five independent appraisers should be recruited if a consensus process will not be undertaken.
When there is an opportunity for multiple appraisers to meet to discuss scores, users may choose to use a consensus approach to reach agreement about AGREE-REX item scores. This method is also appropriate. The consensus scores should then be applied to the calculations described below.

Item Scores, Domain Scores, and Overall Score
Item scores
AGREE-REX item scores can be calculated by averaging the individual appraisers' scores (i.e., calculating the mean) on the 7-point scale (1=strongly disagree; 7=strongly agree) for each of the nine items. If a consensus approach is used to determine scores, then the consensus scores are the item scores. Advantages of reporting item scores are that no assumptions need to be made about the weighting or relative importance of the items, and it allows users to make observations or comparisons at the item level.
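As an illustration of the item-score calculation, the sketch below averages each item's ratings across appraisers. The item labels and ratings are hypothetical, and the range check simply enforces the 7-point scale described above.

```python
from statistics import mean


def item_scores(ratings):
    """Return the mean rating per item across appraisers.

    ratings: dict mapping an item label to a list of ratings (1 to 7),
             one rating per appraiser.
    """
    for item, scores in ratings.items():
        if not scores:
            raise ValueError(f"no ratings supplied for {item}")
        if any(not 1 <= s <= 7 for s in scores):
            raise ValueError(f"ratings for {item} must lie between 1 and 7")
    return {item: mean(scores) for item, scores in ratings.items()}


# Hypothetical ratings from five appraisers for two items.
example_ratings = {
    "Item 1": [5, 6, 4, 5, 5],
    "Item 2": [3, 4, 4, 2, 2],
}
print(item_scores(example_ratings))
```

If scores were reached by consensus, each list would hold a single value and the function would return it unchanged.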

Domain scores
AGREE-REX domain scores can be calculated by adding all the scores of the individual items in a domain (the sum of the item scores is referred to as the "obtained score" in the formula below) and by scaling the total as a percentage of the maximum possible score. If item scores are determined by consensus, the same formula can be used. Reporting domain scores allows users to make observations and comparisons based on domain themes (i.e., clinical applicability, values, and implementability). The limitation of this method is that the clustering of the nine items into the three domains is based on the face validity of the cluster, and not empirical evidence. In addition, there is no empirical evidence available to determine the weighting or relative importance of the items within the domains; in the formula below, all items are given equal weighting within a domain.
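The scaling formula itself does not appear in this text. The sketch below therefore assumes the min-max scaling used for AGREE II domain scores: (obtained score − minimum possible score) / (maximum possible score − minimum possible score) × 100. If the intended scaling is instead obtained score / maximum possible score, only the return line changes. The function name and all ratings are hypothetical.

```python
def scaled_domain_score(ratings, min_rating=1, max_rating=7):
    """Scale a domain's summed ratings to a percentage (assumed min-max scaling).

    ratings: list with one inner list per item in the domain; each inner list
             holds that item's ratings (one per appraiser, or a single
             consensus rating).
    """
    n_ratings = sum(len(item) for item in ratings)
    if n_ratings == 0:
        raise ValueError("at least one rating is required")
    obtained = sum(sum(item) for item in ratings)   # the "obtained score"
    minimum = min_rating * n_ratings                # every rating at the scale minimum
    maximum = max_rating * n_ratings                # every rating at the scale maximum
    # All items carry equal weight within the domain.
    return 100.0 * (obtained - minimum) / (maximum - minimum)


# Hypothetical three-item domain rated by five appraisers.
hypothetical_domain = [
    [5, 6, 4, 5, 5],
    [3, 4, 4, 2, 2],
    [6, 6, 5, 7, 6],
]
score = scaled_domain_score(hypothetical_domain)
print(f"{score:.1f}%")  # obtained 70, minimum 15, maximum 105 -> 61.1%
```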

Example:
If five appraisers give the following scores for Domain 1 (Clinical Applicability):

Overall score
An AGREE-REX overall score can be calculated by adding all nine item scores and using the formula above to scale the total as a percentage of the maximum possible score. If item scores are determined by consensus, the same formula can be used. Reporting an overall score provides a simple way to describe the quality of guideline recommendations overall and to compare between multiple guidelines. However, an overall score on its own does not provide precise information about the particular strengths and weaknesses of the guideline recommendations. In addition, an overall score assigns equal weighting to each of the nine items, but there is no evidence available to determine the relative importance of the items in determining the quality of guideline recommendations.
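The limitation noted above, that an overall score hides item-level strengths and weaknesses, can be shown with two hypothetical guidelines. The sketch assumes min-max scaling of the summed item scores, (obtained − minimum possible) / (maximum possible − minimum possible) × 100, which is an assumption carried over from AGREE II and not a formula confirmed by this text. The item scores are invented.

```python
def overall_score(item_scores, min_rating=1, max_rating=7):
    """Scale the sum of item scores to a percentage, weighting items equally."""
    n_items = len(item_scores)
    if n_items == 0:
        raise ValueError("at least one item score is required")
    obtained = sum(item_scores)
    minimum = min_rating * n_items
    maximum = max_rating * n_items
    return 100.0 * (obtained - minimum) / (maximum - minimum)


# Hypothetical item scores (consensus or mean) for the nine items.
guideline_a = [6, 6, 6, 2, 2, 2, 2, 5, 5]  # strong on some items, weak on others
guideline_b = [4, 4, 4, 4, 4, 4, 4, 4, 4]  # uniformly middling

print(overall_score(guideline_a))  # 50.0
print(overall_score(guideline_b))  # 50.0
```

Both guidelines receive the same overall score despite very different item profiles, so item or domain scores are worth reporting alongside it.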

Interpreting AGREE-REX Scores
At present, there are no empirical data to link specific quality scores (item scores, domain scores, or overall scores) with specific implementation outcomes (e.g., speed of adoption, spread of adoption) or specific clinical outcomes; this makes selection of quality thresholds to differentiate between high, moderate, or low quality guideline recommendations a challenge. In the absence of these data, we provide examples of approaches that can be used to set quality thresholds:
• Users could perform a tertile split of the overall scores (or domain scores) of the candidate guidelines being considered and classify documents as being higher quality, moderate quality, or lower quality.
• Users may determine threshold scores through consensus among stakeholders or appraisers. For example, guidelines with overall scores >70% may be defined as high quality, those with overall quality scores <30% lower quality, and all others moderate quality.
• Users might value one item or domain over the others for their decision-making purposes and create thresholds based on that item or domain.
• Users may use AGREE-REX scores as a continuous variable and conduct modelling exercises to determine what AGREE-REX scores predict certain outcomes and use that score as the threshold.
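The first two approaches can be sketched as follows. The cutoffs of 70% and 30% are the example values given above, not recommended thresholds. The tertile split here is a simple rank-based split into three groups of near-equal size, which is one of several reasonable ways to form tertiles. Guideline labels and scores are hypothetical.

```python
def classify_by_cutoffs(score, high=70.0, low=30.0):
    """Label a score using fixed cutoffs agreed by stakeholders."""
    if score > high:
        return "high"
    if score < low:
        return "lower"
    return "moderate"


def classify_by_tertiles(scores):
    """Rank guidelines by score and split them into three near-equal groups.

    scores: dict mapping a guideline label to its score (%).
    Ties are ordered arbitrarily by the sort, so tied guidelines at a
    boundary may land in different groups.
    """
    labels = ["higher", "moderate", "lower"]
    ranked = sorted(scores, key=scores.get, reverse=True)
    n = len(ranked)
    return {name: labels[rank * 3 // n] for rank, name in enumerate(ranked)}


# Hypothetical overall scores (%) for six candidate guidelines.
candidate_scores = {
    "G1": 82.0, "G2": 45.5, "G3": 28.0,
    "G4": 71.0, "G5": 60.0, "G6": 15.0,
}

by_cutoffs = {name: classify_by_cutoffs(s) for name, s in candidate_scores.items()}
by_tertiles = classify_by_tertiles(candidate_scores)
print(by_cutoffs)
print(by_tertiles)
```

A tertile split is relative to the candidate set, so the same guideline can change category when the set changes, whereas fixed cutoffs stay stable across sets.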
Any decisions about how to define minimum thresholds for quality or applicability should be made by a panel of all relevant stakeholders before beginning the AGREE-REX appraisals. Decisions should be guided by the context in which the practice guideline is to be used and by evaluating the importance of the different items and criteria in that context. For example, stakeholders can use scores to compare practice guideline documents and identify limitations of the guidance being considered, or to select high quality practice guidelines to implement.

Clarity of Presentation
When evaluating each AGREE-REX item, the following questions should also be considered:
• Is the information well written (i.e., clear and concise)?
• Is the information easy to find in the guideline?
• Does the guideline provide the user with an appropriate level of transparency?

Applicability of AGREE-REX Items
On occasion, some AGREE-REX items may not be applicable to the particular guideline under review.
There are different strategies to manage this situation, including skipping that item in the assessment process or rating the item as 1 (absence of information) and providing context about the score. Regardless of the strategy chosen, decisions should be made in advance and described in an explicit manner. As a principle, excluding items from the appraisal process is discouraged.
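The two strategies give different scores for the same appraisal, which is one reason to fix the choice in advance. The sketch below compares them using hypothetical item scores, with `None` marking a non-applicable item. It assumes min-max scaling of the summed scores, (obtained − minimum possible) / (maximum possible − minimum possible) × 100, which this text does not confirm.

```python
def overall_with_not_applicable(item_scores, strategy, min_rating=1, max_rating=7):
    """Scale summed item scores to a percentage, handling non-applicable items.

    item_scores: list of scores (1 to 7), with None for a non-applicable item.
    strategy:    "skip" drops non-applicable items from the calculation;
                 "rate_as_1" scores them at the scale minimum.
    """
    if strategy == "skip":
        rated = [s for s in item_scores if s is not None]
    elif strategy == "rate_as_1":
        rated = [min_rating if s is None else s for s in item_scores]
    else:
        raise ValueError("strategy must be 'skip' or 'rate_as_1'")
    if not rated:
        raise ValueError("no applicable items to score")
    n_items = len(rated)
    obtained = sum(rated)
    return 100.0 * (obtained - min_rating * n_items) / ((max_rating - min_rating) * n_items)


# Hypothetical scores for nine items; the fifth is judged not applicable.
scores = [6, 5, 6, 5, None, 6, 5, 6, 5]

skipped = overall_with_not_applicable(scores, "skip")
rated_low = overall_with_not_applicable(scores, "rate_as_1")
print(f"skip: {skipped:.1f}%  rate_as_1: {rated_low:.1f}%")
```

Rating the item as 1 lowers the score relative to skipping it, so the accompanying context about the score matters when results are reported.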

User's Judgement in Appraising
Both how the AGREE-REX is applied and the evaluation process itself require a level of judgement. Be explicit about choices and provide a rationale for the decisions made.