A systematic review of appraisal tools for clinical practice guidelines: multiple similarities and one common deficit

Joan Vlayen, Bert Aertgeerts, Karin Hannes, Walter Sermeus, Dirk Ramaekers
DOI: http://dx.doi.org/10.1093/intqhc/mzi027. Pages 235-242. First published online: 2 March 2005


Objective. To identify a critical appraisal tool for clinical practice guidelines that could serve as a basis for the development of an appraisal tool for clinical pathways.

Design. Systematic review of the literature and personal contacts. Databases searched were: Medline, Embase, and Cinahl. Search terms were: practice guidelines, appraisal, and evaluation. The items of the identified appraisal tools were examined and thematically grouped into 10 guideline dimensions. Content analysis and scoring of these domains by the appraisal tools was evaluated.

Results. Twenty-four different appraisal tools of practice guidelines were identified. None scored the evidence base of the clinical content of guidelines. Four tools scored all the guideline dimensions. The Cluzeau instrument is the only one of these four that has been validated. Of the three instruments based on the Cluzeau instrument, the AGREE instrument is the only validated instrument that uses a numerical scale.

Conclusions. Being a simplified version of the Cluzeau instrument, the AGREE instrument has the most potential to serve as a basis for the development of an appraisal tool for clinical pathways. However, important limitations will have to be dealt with when developing such a tool.

  • appraisal tool
  • clinical pathways
  • content analysis
  • practice guidelines

In 1990, the Institute of Medicine (IOM) defined clinical practice guidelines as systematically developed statements to assist practitioner and patient decisions about appropriate health care for specific clinical circumstances [1]. Great variability exists in the quality of clinical practice guidelines [2,3], and numerous appraisal instruments were therefore developed in an attempt to discriminate high-quality from lower-quality guidelines. A recent review of the literature identified 13 appraisal instruments of practice guidelines, but concluded that evidence at that time was insufficient to support the exclusive use of just one appraisal instrument [4].

Selection of a decent guideline is one thing, but the selected guideline still has to be implemented. One way to implement clinical practice guidelines into daily practice is through clinical pathways. These are sequences of standardized multidisciplinary processes or critical interventions that must occur for a specific population towards the desired outcomes within a defined time period [5]. Originally, nursing care, organization, and cost-effectiveness were the most important aspects to be covered by clinical pathways. However, they also provide a means of improving systematic collection and abstraction of clinical data for audit and of promoting change in practice [6]. Increasingly, questions on the ability of clinical pathways to improve clinical quality, patient satisfaction, and the satisfaction of care providers are being researched.

Furthermore, it is unclear whether a relationship exists between the methodological and content quality of clinical pathways on the one hand and clinical quality, defined by the judicious and explicit use of the evidence from clinical trials, on the other. Introducing diagnostic and therapeutic interventions into a local clinical pathway purely on the opinion of local specialists, without systematically considering the available up-to-date evidence for these interventions, can threaten the appropriateness and quality of care.

Clinical pathways show important similarities to clinical practice guidelines. They are both intended to provide appropriate and effective health care for specific clinical circumstances and to reduce variation in practice [1,6]. However, clinical pathways commonly integrate key aspects from several clinical practice guidelines, because they outline the care to be given for the patient’s entire clinical path rather than for one specific clinical situation [7]. Clinical practice guidelines are usually developed by government agencies, institutions, or expert panels, whereas clinical pathways are more local initiatives.

Unlike clinical practice guidelines, no (validated) instrument exists to assess the methodological quality of clinical pathways. A possible strategy to develop a critical appraisal tool for clinical pathways is to base it on an appraisal tool for clinical practice guidelines [7]. In the present article, the first phase of the development of such an appraisal tool for clinical pathways is described. In this first phase, we performed an update of the above-mentioned study of Graham et al. [4]. A systematic review of the literature was carried out to identify and compare existing critical appraisal tools for clinical practice guidelines. Possible scientific validation and international dissemination of the appraisal instruments were of particular concern in our review. In a later phase, one or more instruments will be selected and applied to clinical pathways, and serve as a basis for the development of an appraisal tool for clinical pathways.


Methods

Search strategy

A literature search of the English and non-English literature indexed in the Ovid–Medline database (1966–October 2003), Embase database (1990–October 2003), and Cinahl database (1982–October 2003) was conducted, using the following MeSH and text terms in combination: practice guidelines, appraisal, evaluation. Methodology filters were not used, in order to conduct a sensitive literature search and because the usefulness of such filters is unclear for this particular subject. A manual search of the references of relevant articles was conducted. We also contacted the developers of the instruments identified to determine whether they were aware of any other instruments to include in our review.

All articles that described the evaluation of clinical practice guidelines or the development of a guideline appraisal tool were included. Articles describing a practice guideline or the development process of a practice guideline were excluded. No restriction was placed on abstracts, conference proceedings, or language. One investigator (J.V.) assessed the selected papers and retrieved all the different tools that appraised the quality of clinical practice guidelines. Tools based completely on, or copied from, an existing tool were excluded.

Tools evaluation

The tools were compared for their general characteristics (source, items, scoring system). We also analysed whether items of specific importance in the development of clinical practice guidelines and clinical pathways were evaluated. One investigator (J.V.) therefore examined the questions/statements from all the instruments and thematically grouped them into separate guideline dimensions. Two other investigators (D.R., B.A.) independently performed a content analysis of the different instruments to assess whether or not they covered one or more items of each dimension. The scoring of the evidence base of the clinical content was also assessed. In cases of disagreement, the instruments were analysed and discussed in a small group (J.V., D.R., B.A.).


Results

The Embase search yielded 11 523 articles, of which 42 were selected. From the Medline search, another 41 of the 2863 retrieved articles were selected. The Cinahl database contained 1187 articles, of which four were selected. A manual search of the bibliographies of the articles retrieved yielded another 11 relevant articles.
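The search yield reported above can be tallied in a few lines. This is a minimal sketch using only the counts given in the text; the variable and function names are illustrative and not part of the original study.

```python
# Search yield as reported in the text: (articles retrieved, articles selected).
search_yield = {
    "Embase": (11523, 42),
    "Medline": (2863, 41),
    "Cinahl": (1187, 4),
}
# Articles found by hand-searching the reference lists of retrieved articles.
manual_search_selected = 11


def total_selected(yields, manual):
    """Sum the selected articles across databases plus the manual search."""
    return sum(selected for _, selected in yields.values()) + manual


def selection_rate(retrieved, selected):
    """Fraction of retrieved records that were selected for review."""
    return selected / retrieved


total = total_selected(search_yield, manual_search_selected)
print(f"Total articles selected: {total}")
for name, (retrieved, selected) in search_yield.items():
    print(f"{name}: {selection_rate(retrieved, selected):.2%} of records selected")
```

The total agrees with the 98 articles from which the appraisal tools were extracted. The very low selection rates reflect the decision to search without methodology filters.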

In the 98 articles retrieved, we identified 24 possible critical appraisal tools of guidelines. One tool was excluded from the review because it was an automated version [8] of another tool [9]. An additional instrument, not found through the systematic review of the literature or our personal contacts, was provided by one of the reviewers of this article [10].

General characteristics of the appraisal tools

Table 1 provides an overview of the characteristics of the 24 tools. The first appraisal tool was published in 1992 by Lohr and Field [11]. Before 1995, tools were published exclusively in North America [11–16]. Since 1995, appraisal tools have gained interest all over the world. Twenty-two tools were developed in eight different countries: six in the USA [11,12,15–18], five in Canada [13,14,19–21], four in the UK [9,22–24], two each in Australia [25,26] and Italy [27,28], and one each in France [29], Germany [30], and Spain [31]. Two tools were developed internationally [10,32].

Table 1

Characteristics of critical appraisal tools of guidelines

Author (a) | Date | Country of origin | Published in peer-reviewed literature | Validation | Scoring system (b) | No. of items
Institute of Medicine [11] | 1992 | USA | Yes | Not stated | Y/N/NA | 46
Hayward et al. [14] | 1993 | Canada | Yes | Not stated | None | 9
Selker [12] | 1993 | USA | Yes | Not stated | None | 7
Hayward et al. [13] | 1995 | Canada | Yes | Not stated | None | 10
Mendelson [15] | 1995 | USA | Yes | Not stated | None | 8
Woolf [16] | 1995 | USA | Yes | Not stated | None | 10
SIGN [24] | 1995 | UK | No | Not stated | Y/N | 52
Mutter-Pilson [29] | 1995 | France | Yes | Not stated | Y/N/NA | 18
Ward and Grieco [26] | 1996 | Australia | Yes | No | Scale | 18
Liddle et al. [25] | 1996 | Australia | No | Not stated | Scale | 14
Savoie et al. [21] | 1996 | Canada | No | Not stated | Y/N | 15
Calder et al. [19] | 1997 | Canada | Yes | No | Y/N | 24
Shaneyfelt et al. [9] | 1998 | UK | Yes | Yes | Y/N | 25
Helou and Ollenschlager [30] | 1998 | Germany | Yes | Not stated | Y/N/?/NA | 41
Apolone and Bamfi [27] | 1999 | Italy | Yes | Not stated | None | 6
Cluzeau et al. [22] | 1999 | UK | Yes | Yes | Y/N/?/NA | 37
Grilli et al. [28] | 2000 | Italy | Yes | Yes | Y/N | 3
Casi et al. [31] | 2000 | Spain | Yes | No | Y/N | 21
Marshall [20] | 2000 | Canada | Yes | Not stated | None | 9
Sanders et al. [18] | 2000 | USA | Yes | Not stated | Scale | 15
Reed et al. [17] | 2000 | USA | Yes | Not stated | Scale | 33
Hutchinson et al. [23] | 2003 | UK | Yes | Not stated | None | 5
AGREE collaboration [32] | 2003 | Europe | Yes | Yes | Scale | 23
Shiffman et al. [10] | 2003 | North America/UK | Yes | No | None | 18

(a) SIGN: Scottish Intercollegiate Guidelines Network; IMCARE: Internal Medicine Center to Advance Research and Education; APA: American Psychological Association; AGREE: Appraisal of Guidelines Research and Evaluation.

(b) Y: yes; N: no; NA: not applicable; ?: not sure.

Eleven instruments [9,10,13,18,21,22,24,26,27,29,32] were based on the instrument developed by the IOM [11], three instruments [9,13,19] referred to Hayward et al. [14], and another three instruments [23,30,32] referred to Cluzeau et al. [22].

The number of questions ranged from 3 to 52. Some questions were subdivided into two or more smaller questions. Nine tools used no specified scoring system, and 10 tools used a yes/no score with or without the possibility of answering ‘not sure’ or ‘not applicable’ (Table 1). Five instruments used some kind of scaling system [17,18,25,26,32]. The instrument developed by Sanders [18] and the AGREE instrument [32] use a numerical scale.

All but three instruments were published in peer-reviewed literature. Only four instruments have been subject to a validation study: the instruments developed by Shaneyfelt [9], Cluzeau [22], and Grilli [28], and the AGREE instrument [32].
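The summary counts in this section can be re-derived from Table 1. The sketch below transcribes the table rows into a list and tallies them; the tuple layout, the `scoring_family` helper, and the grouping of all Y/N variants into one family are illustrative assumptions, not part of the original analysis.

```python
from collections import Counter

# (tool, year, country, peer_reviewed, validation, scoring, n_items), from Table 1.
TOOLS = [
    ("Institute of Medicine [11]", 1992, "USA", "Yes", "Not stated", "Y/N/NA", 46),
    ("Hayward et al. [14]", 1993, "Canada", "Yes", "Not stated", "None", 9),
    ("Selker [12]", 1993, "USA", "Yes", "Not stated", "None", 7),
    ("Hayward et al. [13]", 1995, "Canada", "Yes", "Not stated", "None", 10),
    ("Mendelson [15]", 1995, "USA", "Yes", "Not stated", "None", 8),
    ("Woolf [16]", 1995, "USA", "Yes", "Not stated", "None", 10),
    ("SIGN [24]", 1995, "UK", "No", "Not stated", "Y/N", 52),
    ("Mutter-Pilson [29]", 1995, "France", "Yes", "Not stated", "Y/N/NA", 18),
    ("Ward and Grieco [26]", 1996, "Australia", "Yes", "No", "Scale", 18),
    ("Liddle et al. [25]", 1996, "Australia", "No", "Not stated", "Scale", 14),
    ("Savoie et al. [21]", 1996, "Canada", "No", "Not stated", "Y/N", 15),
    ("Calder et al. [19]", 1997, "Canada", "Yes", "No", "Y/N", 24),
    ("Shaneyfelt et al. [9]", 1998, "UK", "Yes", "Yes", "Y/N", 25),
    ("Helou and Ollenschlager [30]", 1998, "Germany", "Yes", "Not stated", "Y/N/?/NA", 41),
    ("Apolone and Bamfi [27]", 1999, "Italy", "Yes", "Not stated", "None", 6),
    ("Cluzeau et al. [22]", 1999, "UK", "Yes", "Yes", "Y/N/?/NA", 37),
    ("Grilli et al. [28]", 2000, "Italy", "Yes", "Yes", "Y/N", 3),
    ("Casi et al. [31]", 2000, "Spain", "Yes", "No", "Y/N", 21),
    ("Marshall [20]", 2000, "Canada", "Yes", "Not stated", "None", 9),
    ("Sanders et al. [18]", 2000, "USA", "Yes", "Not stated", "Scale", 15),
    ("Reed et al. [17]", 2000, "USA", "Yes", "Not stated", "Scale", 33),
    ("Hutchinson et al. [23]", 2003, "UK", "Yes", "Not stated", "None", 5),
    ("AGREE collaboration [32]", 2003, "Europe", "Yes", "Yes", "Scale", 23),
    ("Shiffman et al. [10]", 2003, "North America/UK", "Yes", "No", "None", 18),
]


def scoring_family(scoring):
    """Collapse Table 1 scoring labels into the three families used in the text."""
    if scoring == "None":
        return "none"
    if scoring == "Scale":
        return "scale"
    return "yes/no"  # Y/N, with or without the NA and ? options


families = Counter(scoring_family(row[5]) for row in TOOLS)
countries = Counter(row[2] for row in TOOLS)
validated = [row[0] for row in TOOLS if row[4] == "Yes"]
not_peer_reviewed = [row[0] for row in TOOLS if row[3] == "No"]
item_counts = [row[6] for row in TOOLS]

print(dict(families))
print("Validated:", validated)
print("Not peer reviewed:", not_peer_reviewed)
print("Item range:", min(item_counts), "to", max(item_counts))
```

Keeping the table as structured rows makes it straightforward to check each stated count against the data it summarizes.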

Content analysis

In total, a list of 469 questions or statements was generated from the 24 instruments. The common questions/statements were grouped into 50 different items (Table 2). These 50 items were then grouped into 10 guideline dimensions based on the work of the IOM [33] and the study of Graham et al. [4] (Table 2): validity, reliability/reproducibility, clinical applicability, clinical flexibility, multidisciplinary process, clarity, scheduled review, dissemination, implementation, and evaluation. All 50 items could be fitted into the 10 dimensions.

Table 2

Items of the appraisal instruments grouped into 10 guideline dimensions

Dimension | Item | Description
Validity | Decision making: how consensus was reached | Method(s) used to reach consensus about guideline recommendation; role of values
Validity | Decision making: how recommendations were made | Method(s) used in formulating recommendations
Validity | Evidence collection | How the evidence was obtained
Validity | Literature search | How the literature was searched, including search strategy
Validity | Sources of evidence | Sources of evidence (textbooks, periodical literature)
Validity | References cited | References for the evidence upon which the guideline was based
Validity | Literature selection | Criteria used to include and exclude literature from the data synthesis
Validity | Evaluation of evidence | How the evidence was graded, which may or may not include a statement about the strength of evidence
Validity | Synthesis of data | Method(s) by which the evidence was synthesized
Validity | Recommendations and their evidence | Recommendations consistent with each other and the evidence used to support them
Validity | Major recommendations | Differentiating major from other recommendations
Validity | Links strength of evidence to recommendation | Links strength of evidence to recommendation
Validity | Other guidelines | Existence of other guidelines relevant to guideline topic checked and compared
Validity | Consistent with policy of guideline development organization | Consistent with policy of guideline development organization
Validity | Alternatives | Alternative interventions to those recommended or dealt with by the guideline to deal with topic
Validity | Health benefits | Expected health benefits of guideline mentioned
Validity | Harms, risks | Potential harms or risks of guideline mentioned
Validity | Costs | Economic and other cost outcomes of guideline mentioned
Validity | Outcomes stated | Outcomes expected to result from guideline stated
Reliability/reproducibility | Independent review | Peer review; guideline sent to experts not involved in its development for review
Reliability/reproducibility | Pilot/pretesting | Guideline piloted or pretested in clinical setting before its dissemination
Reliability/reproducibility | Documentation | Process of guideline development documented
Clinical applicability | Purpose | Goal or objective of the guideline
Clinical applicability | Rationale | Rationale of or reason for the guideline
Clinical applicability | Guideline topic | Topic or health problem dealt with
Clinical applicability | Patient population | Patient population for whom the guideline is intended
Clinical applicability | Provider population | Group of health care providers to whom the guideline is directed or who should use the guideline
Clinical applicability | In-/outpatient | Discriminating between in- and outpatients
Clinical applicability | Ethical aspects | Ethical aspects
Clinical flexibility | Exceptions/flexibility | Flexibility in the application of the guideline, or situations in which guidelines may not apply
Clinical flexibility | Patient preferences considered | Whether patient choices and/or views were considered
Clarity | Unambiguous | Guideline is clearly worded
Clarity | Presentation | Guideline presentation is user friendly
Clarity | Ease of use | Guideline can be used in a straightforward manner
Clarity | Structured abstract | Structured abstract or summary provided
Clarity | Patient information | Patient information included
Scheduled review | Scheduled review | Date guideline becomes no longer valid or is scheduled for review
Scheduled review | Date of issue of guideline | Date of issue of guideline
Development team | Multidisciplinary process | All relevant disciplines involved
Development team | Composition of guideline development team | The individuals and/or disciplines, occupations, or organizations represented in the group who developed the guideline
Development team | Conflict of interest | Consideration of any (potential) bias, (potential) conflicts of interest related to the individuals developing the guideline
Development team | Funding and related bias | Sources of funding
Development team | Endorsers | Endorsement of guideline by official bodies
Development team | Guideline development organization | The organization or group who developed the guideline
Development team | Patient representatives involved | Patient representatives involved
Implementation | Implementation | Strategies to implement the guideline
Implementation | Feasibility | Policy and administrative implications of using the guideline
Dissemination | Dissemination | How the guideline is to be distributed to intended users
Evaluation | Evaluation | How the guideline is to be evaluated once it has been implemented
Evaluation | Adherence | Adherence to the guideline by the intended users

The two independent reviewers of the different instruments agreed completely on four instruments [9,22,23,26]. Disagreement existed on 0–5 dimensions per instrument (mean 2.2). The dimension clinical flexibility most frequently caused disagreement (10 instruments).

All the instruments evaluate the validity of guidelines with at least one item, and almost all evaluate the clinical applicability (Table 3). Approximately 75% of the instruments score the dimensions reliability/reproducibility, clinical flexibility, scheduled review, and multidisciplinary process. Fourteen tools address the clarity dimension. Only a minority score the dimensions dissemination, implementation, and evaluation. Three appraisal tools (Table 3) score all the dimensions mentioned above by using at least one question [22,24,30]. None of the instruments scores the evidence base of the clinical content of guidelines.

Table 3

Dimensions covered by the critical appraisal tools

Author | Validity | Reliability/reproducibility | Clinical applicability | Clinical flexibility | Clarity | Scheduled review | Development team | Implementation | Dissemination | Evaluation
Institute of Medicine [11]XXXXXXX
Hayward et al. [14]XXXXX
Selker [12]XXXX
Hayward et al. [13]XXXXX
Mendelson [15]XXXXXXX
Woolf [16]XXXXXXX
Mutter-Pilson [29]XXX
Ward and Grieco [26]XXXXXXX
Liddle et al. [25]XXX
Savoie et al. [21]XXXXXXXXX
Calder et al. [19]XXXXXXX
Shaneyfelt et al. [9]XXXXXX
Helou and Ollenschlager [30]XXXXXXXXXX
Apolone and Bamfi [27]XXX
Cluzeau et al. [22]XXXXXXXXXX
Grilli et al. [28]XX
Casi et al. [31]XXXXXXXXX
Marshall [20]XXXXXX
Sanders et al. [18]XXXXXXX
Reed et al. [17]XXXXXXX
Hutchinson et al. [23]XXX
AGREE collaboration [32]XXXXXXXXX
Shiffman et al. [10]XXXXXXXX


Discussion

In addition to the 13 instruments found by Graham et al. [4], we identified another 11 different tools for the critical appraisal of clinical practice guidelines. Comparison of these 24 instruments showed a wide variation in source, number of items, ways of scoring, and specific aspects that are scored. The questions used in the appraisal tools were grouped into 50 different items, which is slightly more than in the study of Graham et al. [4]. However, these 50 items could easily be fitted into the 10 guideline dimensions used by Graham et al.

Three appraisal tools were found to address all the guideline dimensions [22,24,30]. Of these, the Cluzeau instrument [22] is the only instrument that has been subject to a thorough validation study. It was originally based on the instrument developed by the IOM [11] and contains 37 items divided into three dimensions: rigour of development (questions 1–20), context and content (questions 21–32), and application (questions 33–37). A yes/no score is used to respond to each question.

Three additional instruments are based on the Cluzeau instrument [23,30,32]. Of these, the AGREE instrument [32] is the only instrument that has been validated. It uses a numerical scoring scale, making it easier to compare scores. It is more compact than the Cluzeau instrument, containing only 23 items divided into six domains: scope and purpose, stakeholder involvement, rigour of development, clarity and presentation, applicability, and editorial independence. Unlike the Cluzeau instrument, the dimension dissemination is not scored in AGREE. Because the AGREE instrument is a validated, easy-to-use, and transparent instrument, which was internationally developed and widely accepted, it can possibly serve as a basis for an instrument to evaluate the methodological quality of clinical pathways. English investigators have already reported on an appraisal tool, the Integrated Care Pathway Appraisal Tool (ICPAT) [7], which is based on the AGREE instrument, but is yet to be validated.
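The numerical scale is what makes AGREE scores comparable across guidelines. The sketch below shows one way a standardized domain score could be computed. It assumes the approach described in the AGREE user documentation rather than in this article: each item is rated on a 4-point scale (1 to 4) by several appraisers, and the domain total is rescaled between the minimum and maximum possible totals. The function name, parameter defaults, and example ratings are hypothetical.

```python
def standardized_domain_score(ratings, scale_min=1, scale_max=4):
    """Return a domain score as a percentage between 0 and 100.

    ratings: one list per appraiser, each holding that appraiser's rating
             for every item in the domain.
    scale_min, scale_max: bounds of the rating scale (assumed 1 to 4 here).
    """
    if not ratings or not ratings[0]:
        raise ValueError("at least one appraiser and one item are required")
    n_items = len(ratings[0])
    for appraiser in ratings:
        if len(appraiser) != n_items:
            raise ValueError("every appraiser must rate every item")
        for value in appraiser:
            if not scale_min <= value <= scale_max:
                raise ValueError(f"rating {value} is outside the scale")

    n_appraisers = len(ratings)
    obtained = sum(sum(appraiser) for appraiser in ratings)
    min_possible = scale_min * n_items * n_appraisers
    max_possible = scale_max * n_items * n_appraisers
    return 100.0 * (obtained - min_possible) / (max_possible - min_possible)


# Hypothetical ratings: two appraisers scoring a three-item domain.
example = [[3, 4, 2], [4, 4, 3]]
score = standardized_domain_score(example)
print(f"Standardized domain score: {score:.1f}%")
```

Rescaling by the possible range lets domains with different numbers of items or appraisers be compared on the same footing. The result is a relative measure for comparing guidelines, not a pass or fail threshold.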

There are some important limitations in the use of the AGREE instrument. Firstly, the domain scores are useful for comparing clinical practice guidelines, but it is not possible to set thresholds for the scores to classify a clinical practice guideline as ‘good’ or ‘bad’. Secondly, the AGREE instrument does not assess the clinical content of the clinical practice guideline nor the quality of evidence supporting the recommendations, which is a common deficit in all the existing appraisal tools. The use of a systematic methodology in the retrieval of evidence supporting guideline development is frequently scored by appraisal tools [9,11,13,14,16,17,19–22,24–26,30–32]. However, even by using a recent and advanced instrument such as the Cluzeau instrument or the AGREE instrument, the results of the search for evidence, the correct use of inclusion and exclusion criteria, and the critical appraisal of the retrieved evidence are not validated. Therefore, a major conclusion of this review is that in order to evaluate the quality of the clinical content and more specifically the evidence base of a clinical practice guideline, verification of the completeness and the quality of the literature search and its analysis has to be added to the process of validation by an appraisal instrument. Experience with the methodologies of evidence-based medicine such as literature search and critical appraisal is therefore essential for guideline validators to assure the quality of the appraisal process. Blind application of an appraisal instrument, even when validated and widely implemented, without particular attention to the evidence supporting the guideline, can threaten the credibility of these instruments and the current evolution in the international community to further elaborate the quality of guideline development [34].

Because of the differences between clinical pathways and clinical practice guidelines, the AGREE instrument cannot be applied to clinical pathways in its present version. Some items will have to be reformulated or removed, and new items will have to be included. For example, the language used in the AGREE instrument is clearly ‘guideline language’ and will have to be translated into ‘pathway language’: clinical pathways contain concrete interventions rather than recommendations, and they are implemented rather than published. At present, our team is composing a development group consisting of experts in clinical pathway development and experts in clinical practice guideline development and validation to create a version of the AGREE instrument applicable to clinical pathways. The development of this instrument will be described in a subsequent publication.

In conclusion, 24 different appraisal tools of clinical practice guidelines were identified. Of these tools, the Cluzeau instrument seems to be the most complete. Being a more compact version of the Cluzeau instrument and using a numerical scale, the AGREE instrument has the potential to serve as a basis for a critical appraisal tool for clinical pathways. However, some important limitations of the AGREE instrument will have to be dealt with when developing such a tool.

