Methods underpinning national clinical guidelines for hypertension: describing the evidence shortfall

Background: To be useful, clinical practice guidelines need to be evidence based; otherwise they will not achieve the validity, reliability and credibility required for implementation.
Methods: This paper compares the methods used in gathering, analysing and linking evidence to guideline recommendations in ten current hypertension guidelines.
Results: Several guidelines had failed to implement methods of searching for the relevant literature, critical analysis and linking to recommendations that minimise the risk of bias in the interpretation of research evidence. The more rigorous guidelines showed discrepancies in recommendations and grading that reflected different approaches to the use of evidence in guideline development.
Conclusion: Clinical practice guidelines as a methodology are clearly still an evolving health care technology.


Background
Clinical practice guidelines can provide building blocks for changing and improving health care [1] and are a useful means of bridging the gap between scientific research evidence and usual practice [2]. They are defined as 'systematically developed statements to assist physicians and patients about appropriate health care for specific clinical circumstances' [3]. To achieve their potential as effective tools for improving health care they need to maximise their validity, a feature related to the use of evidence within a guideline and development using a multidisciplinary process [4]. However, despite an apparently explicit methodology there are variations in what guidelines say and how they relate this to underlying evidence [3,5,6]. There is also concern that guideline development may be subject to external influence [7,8].
Like many other conditions, hypertension has been the subject of many different international guidelines. The World Health Organisation (WHO) has described hypertension, defined as a blood pressure of greater than 140/90 mmHg, as one of the ten leading risk factors influencing the global burden of disease [9]. It is a contributory factor in ischaemic heart disease and cerebrovascular disease, which account for 20% and 10% of all deaths in England and Wales respectively [10]. Reducing blood pressure levels leads to significant reductions in cardiovascular and cerebrovascular morbidity and mortality [11]. Nevertheless the management of hypertension remains suboptimal: for example, 40% of the adult population of England suffer from hypertension, but current levels of detection and treatment result in only 9% of sufferers having their hypertension controlled to <140/90 mmHg [12]. There is a clear need to improve the management of hypertension both in this country and worldwide. If guidelines are to play a role in this improvement, however, they will need to maximise their validity.
The aim of this study was to review how well 10 guidelines for hypertension addressed validity in terms of their methods and their use of published evidence.

Contributing guidelines
We reviewed the methods used in development and the key recommendations of ten current guidelines (see table 1) meeting the following criteria: they concerned the general management of hypertension, or the management of hypertension in specific populations; they were published in English; and they were nationally or internationally recognised. Guidelines developed before 1994 were excluded as they predated the publication and wide dissemination of the work by Field and Lohr, which was the first and seminal work on guideline methodology [1].
We used five guidelines (CMA [13], WHO [14], VHA [15], ICSI [16], SA [17]) meeting these criteria retrieved by the comprehensive search strategy employed by the German Guideline Clearing Report [7], whose search strategy covered 1990-1999. We updated the strategy by searching MEDLINE, EMBASE and OMNI from 1999 onwards using the thesaurus heading HYPERTENSION and limiting to guidelines or practice guidelines in English. This retrieved a further five guidelines (SIGN [18], ESH [19], JNC [20], BHS [21], NICE [22]) for consideration.

Table 1 (final rows only): [17] 2003 SA; British Hypertension Society [21] 2004 BHS; National Institute for Clinical Excellence [22] 2004 NICE.

Table footnotes (coding of guideline development methods):
a Y/N: Includes/does not include description of stakeholders involved in the development process, including patient representatives and any conflicts of interest. P: Partially; a list of names with institutional affiliations was provided.
b Y/N: Searches were/were not undertaken, at least through Medline.
c Y: Made extensive use of existing systematic reviews (referenced 7 or more). y: Made partial use of existing systematic reviews (referenced 6 or less). N: Did not use existing systematic reviews.
d Y/N: Reports/does not report explicit grading of the quality of the supporting evidence/recommendations.

Evaluation of guideline development methods
We evaluated the methods used to develop each guideline with particular reference to three dimensions that relate to the use of research evidence, as found in the full published report of each guideline:
• the construction of the guideline development group and its component stakeholders.
• the use of published literature and the strategy used in screening for the primary evidence; in particular, the use of existing systematic reviews or the performance of a new systematic review explicitly to answer questions posed by the guideline.
• the grading of evidence and recommendations: in particular, an explicit link between recommendations and supporting evidence.

Evaluation of recommendations and their underlying evidence
We compared recommendations on four areas that were common to all the guidelines: diagnosis of hypertension, lifestyle modification, criteria for initiation of antihypertensive drug therapies and initial recommended drug therapy. We also explored links between recommendation grades and citations and looked at how these differed in recommendations for drug therapy and salt intake.

Methods used to develop the guidelines
The measures used to assess the guideline development process are summarised in table 2.
Only three guidelines were constructed by multidisciplinary groups where the members' affiliations and conflicts of interest were described; these three guideline groups included patient representatives as well as key professional stakeholders. A further six guidelines provided only a list of names and institutional affiliations of members of the guideline development group. One further guideline gave no details of the guideline development group (see table 3).
Only one guideline conducted new systematic reviews to inform recommendations. Seven guidelines made extensive use of existing systematic reviews; three of these guidelines also stated that a search strategy based on MeSH search terms was used for identifying relevant research evidence. A further two guidelines made limited use of existing systematic reviews.
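The distinction between extensive and limited use of existing systematic reviews follows the coding given in the table footnotes (7 or more reviews referenced, 6 or fewer, or none). A minimal sketch of that rule is given below; the function name and the input counts are hypothetical and are not taken from the reviewed guidelines.

```python
def code_systematic_review_use(n_reviews_cited: int) -> str:
    """Code a guideline's use of existing systematic reviews.

    Thresholds follow the table footnote:
      'Y' = extensive use (7 or more reviews referenced)
      'y' = partial use (1 to 6 reviews referenced)
      'N' = no existing systematic reviews used
    """
    if n_reviews_cited < 0:
        raise ValueError("number of reviews cited cannot be negative")
    if n_reviews_cited >= 7:
        return "Y"
    if n_reviews_cited >= 1:
        return "y"
    return "N"


# Hypothetical counts, for illustration only (not data from the study).
example_counts = {"Guideline 1": 12, "Guideline 2": 4, "Guideline 3": 0}
example_codes = {name: code_systematic_review_use(n)
                 for name, n in example_counts.items()}
```

Writing the threshold down in this way makes clear that a guideline citing seven reviews and one citing six fall into different categories, so the coding is sensitive to small differences near the cut-off.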
The guidelines took different approaches to assessing the evidence on which a recommendation was based and to grading the recommendation itself (see table 3 for a description of the grading systems used in the guidelines). Three guidelines did not grade either the evidence they cited or their recommendations. Two coded the evidence on the basis of study design but did not link this to the recommendations. In contrast, five guidelines graded the evidence and explicitly linked this to the recommendations. However, these grading systems also differed from one another, using different criteria to assess the contributing studies. The grading systems used by two guidelines (NICE, CMA) allowed for the quality of relevant randomised controlled trials and meta-analyses and the strength of their results to be analysed. The grading systems used by other guidelines did not allow for this more sensitive assessment of the evidence. Differences and shortcomings in these grading systems can be confusing and impede effective communication [23]. The GRADE system was developed as a result of these shortcomings and recommends an approach which takes into account study design, quality, consistency and directness in judging the quality of evidence, and considers the benefit harm ratio, quality of evidence, applicability, and baseline risk when translating to recommendation [23].

Recommendations made by the guidelines

Diagnosis of hypertension
The guidelines were consistent in defining the threshold for hypertension as 140/90 mmHg and all agreed that blood pressure should be measured twice in a consultation on at least two separate occasions. The need for a full medical examination, clinical history and accurate blood pressure measurement was described in all of the guidelines. However, they differed on the recommended routine tests. All agreed that an electrocardiogram, blood chemistry, a complete blood cell count and urinalysis should be conducted during the initial assessment of hypertension, to assess broader cardiovascular risk. There was less agreement on the assessment of total cholesterol, lipid profile, blood glucose, creatinine, blood calcium, thyroid stimulating hormone, gamma-glutamyl transpeptidase and serum urate.

Lifestyle modifications
All of the guidelines addressed lifestyle modification as an integral part of the management of hypertension and as a first line treatment in mild hypertension, and made similar recommendations for weight loss, limiting alcohol and sodium intake, regular exercise and smoking cessation (see table 4). Guidelines typically recommended a target [...]
A clinical guideline will be unable to offer guidance on every consideration that must be made by caregivers and patients. Guidelines will reflect this complexity and are likely to vary in their scope and coverage of the decisions involved in the care pathway.

Criteria for initiation of antihypertensive drug therapy
Guidelines used a varying combination of blood pressure and other factors to establish a threshold for drug therapy (see Table 5). These factors included the presence of concomitant disease, target organ damage, cardiovascular risk factors, response to lifestyle changes and the patient's own personal preferences. For uncomplicated patients the relatively recent SA and JNC guidelines recommended lower thresholds. All guidelines (except CMA) recommended lower thresholds for patients with target organ damage, renal disease or diabetes. Some guidelines modified their recommendations for older age groups and specific ethnic groups.

First line drug therapies
Variations existed in thresholds for initiating drug treatment and initial drug therapy in typical patients (table 5). Five guidelines recommended the use of thiazides as initial therapy in non-black patients aged over 55-65 years (SIGN, CMA, ICSI, SA, NICE). The other five differed in their recommendations: one recommended thiazides or beta-blockers (VHA); one recommended low dose monotherapy selecting from a broad range of antihypertensive agents (WHO); one recommended low dose monotherapy or a combination of low dose antihypertensive agents (ESH); one recommended thiazides or calcium channel blockers (BHS); and one recommended thiazides alone or in combination with a range of antihypertensive drugs (JNC). The pattern of variation in these recommendations did not follow publication date of the guideline or relate to the research sources used in the development of the guideline.

Grading recommendations and links to the evidence base

Salt intake
Although all guidelines recommended restriction of salt intake, only four (SIGN, CMA, BHS, NICE) relied upon a similar and extensive body of work, either directly using the original data in a systematic review or indirectly sourcing the study via a previously published systematic review (see table 6). Nevertheless, these four guidelines were inconsistent in their grading of the recommendation: two guidelines (CMA, NICE) graded it 'B', suggesting that the pattern of care was recommended with caution and based upon research evidence subject to bias, while two graded it as 'A' (BHS, SIGN), indicating that the recommendation was based on strong research evidence not vulnerable to bias. Although the VHA guideline cited much less evidence than these four guidelines, it nevertheless graded salt restriction as 'I', equivalent to 'A' in other schemes. These disparities reflect the differences in the grading of recommendations in guideline development. Both the CMA and NICE guidelines adopted systems that required judgement about the quality of the RCT and the strength of its findings rather than a system that graded recommendations solely upon research design. The other five guidelines made similar recommendations about salt restriction, although they cited very limited evidence to support this.

Table key: Y, trial cited by guideline; SR, trial included in systematic review cited by guideline; Y(ex), trial cited by guideline but excluded from meta-analysis.

First line drug therapy
Overall the guidelines were relatively consistent in the studies that they cited as the evidence for the drug treatment recommendations (Table 7). One guideline (VHA) cited very little evidence; another (ICSI) did not cite any systematic reviews but referred to recent primary reports of trials; two guidelines (CMA, SA) relied almost exclusively on existing systematic reviews, whereas others (WHO, SIGN, ESH, JNC, BHS) supplemented citation of systematic reviews with citation of recent primary reports; and one guideline (NICE) performed its own systematic review. Recommendations for use of thiazides and/or beta-blockers as initial drug therapy in typical patients were graded as 'A' by three guidelines (NICE, SIGN and CMA). One guideline (BHS) recommended thiazides or calcium channel blockers, grading it 'C'. This recommendation was largely based on the ABCD algorithm, which in turn is based upon an extrapolation of how different drugs work rather than RCT findings; hence the evidence was graded as category III (descriptive studies, or evidence extrapolated from RCTs or quasi experimental studies), leading to the grade of 'C' for the recommendation.
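The disparity in grading of the salt restriction recommendation can be made explicit by tabulating the grades reported in the text and mapping them onto a common scale. The sketch below is illustrative only: the grades are those stated above, the single equivalence ('I' treated as 'A') is the one described for the VHA scheme, and the function names are hypothetical.

```python
from collections import defaultdict

# Grades for the salt restriction recommendation, as reported in the text.
SALT_GRADES = {"CMA": "B", "NICE": "B", "BHS": "A", "SIGN": "A", "VHA": "I"}

# The VHA grade 'I' is described as equivalent to 'A' in other schemes.
EQUIVALENT_GRADE = {"I": "A"}


def harmonise(grade: str) -> str:
    """Map a scheme-specific grade onto the common A/B/C scale."""
    return EQUIVALENT_GRADE.get(grade, grade)


def group_by_grade(grades: dict) -> dict:
    """Group guidelines by the harmonised grade they assigned."""
    groups = defaultdict(list)
    for guideline, grade in grades.items():
        groups[harmonise(grade)].append(guideline)
    return {grade: sorted(names) for grade, names in groups.items()}


def is_consistent(grades: dict) -> bool:
    """True only if every guideline assigned the same harmonised grade."""
    return len(group_by_grade(grades)) == 1


salt_groups = group_by_grade(SALT_GRADES)
```

Applied to the grades above, the grouping places CMA and NICE under 'B' and BHS, SIGN and VHA under 'A', so the same recommendation carries two different grades across guidelines.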

Discussion
Current guidelines are inconsistent in their handling of key methodologies that relate to the sourcing, interpretation and application of research evidence. Some cite a substantial body of evidence whereas others present little evidence. Some grade their recommendations (although the grading systems and grades used are not consistent) whereas others do not. These findings are consistent with other studies exploring the quality of guideline development [24-27]. Methodological failings may affect the quality of the guideline in several ways. A search that is insufficiently thorough may introduce bias into the summary of the evidence [28]. Systematic reviews have been described as the optimum method of summarising evidence of effectiveness within a clinical practice guideline [29]. In this study we found that most of the guidelines relied on previously published systematic reviews, despite the possible problems with this strategy. Firstly, systematic reviews may date quickly and not incorporate newer evidence. Secondly, the scope of the published reviews may not always match the remit of the guideline and so may not be relevant to the target population of the guideline. Thirdly, up-to-date high quality systematic reviews may not be available in all the areas covered by a guideline.

Table key: Primary report of trial cited by guideline. SR: trial included in systematic review cited by guideline; primary report not cited by guideline.
Despite the inconsistent approach in the guidelines to sourcing the evidence, interpreting it and applying it to recommendations, and the great variation in the volume of supporting evidence cited, the areas of consensus are substantial. Different hypertension guidelines made similar recommendations for many areas of management, notably recommendations for lifestyle changes and their role as first line interventions for patients in certain categories of risk. This level of agreement suggests the possibility either that the published guideline did not cite all the evidence which influenced the recommendations, or that guideline groups may develop guidelines that are heavily influenced by previously published guidelines or an implicit international consensus. Guidelines generated without a systematic search of the literature and without systematic review of all the supporting evidence would be more likely to reflect the biases of developers and it would not be surprising if they were congruent with other guidelines in the same area.
Only five guidelines graded the recommendations made. Failure to grade research evidence and the subsequent recommendations means that the decision making process is not explicit and does not inform the guideline user of the strength of evidence underpinning a particular pattern of care.
The inconsistent grading of the same recommendation in different guidelines also indicates varying approaches to interpreting and applying research evidence in guideline development. The process of evaluating the quality of research evidence and applying this to guideline recommendations using a system of grading is clearly inconsistent and currently an evolving area of guideline methodology.
As well as seeking research evidence, guidelines seek to elicit and incorporate the views of clinical experts and various stakeholders in interpreting the evidence or in offering expert opinion where objective evidence is sparse. Indeed this is an important feature used to assess the quality of a guideline [30]. Differences in guidelines, reflecting the differing views of individuals participating in the guideline development process, are therefore to be expected [29,31]. Herein lies a tension between the rigour needed to produce objective and unbiased statements and the need to be responsive to the views of participants. It is clear from other reviews of clinical guidelines that the composition of the development group is reflected in the recommendations. Savoie et al [31], in their critical appraisal of guidelines for cholesterol testing, found that the greater the involvement of clinical experts in the development process of the guideline, the less the recommendations reflected the research evidence. As only three of the guidelines which we considered fully reported the composition of the guideline development group, it is not possible to make inferences about its impact on recommendations for hypertension. Achieving evidence based guidelines while incorporating the views of the various stakeholders within the development group may create conflict and divergence that the final guideline may mask. The differences between the guidelines described here may reflect this tension.
Differences in recommendations may reflect not only differences in material sourced, differences in interpreting and grading the evidence but also different influences in moving from evidence to recommendations. Differing recommendations for first line drug therapy suggest this. The research base underpinning the recommendations for first line drug therapies is strong in terms of the number and quality of trials in the area (see table 6). Nevertheless, three guidelines (NICE, SIGN and CMA) recommended thiazides for the uncomplicated patient, grading this as 'A', while one (BHS) recommended thiazides or calcium channel blockers grading it 'C'. This suggests either a strong competing interest or the possibility that a less supported but broader recommendation is felt to hold greater clinical merit than older treatments which have accumulated a strong research base. This again may reflect a tension in the development of guidelines, between the restrictiveness of the conventional evidence based approach which inevitably relies on older, well researched therapies and the greater openness of an approach which allows newer, less well endorsed treatments.
Clinical guidelines are rarely based solely on the research evidence and incorporate the consensus views of experts. Raine et al. argue that current approaches to guideline development often lack sufficient transparency and reliability concerning how such consensus opinions are formed [32]. They highlight the possible influence of key individuals, unrepresentative decision making and the role of constraints of time and resources which limit the range of guidelines that can be generated and their need for updating. They propose an approach which makes reasons for disagreement and degree of consensus explicit, and suggest the inclusion of a survey stage to enhance reliability.
The potential influence of external pressures in the formation of guideline recommendations highlights the need for transparency in the declarations of conflicts of interest by authors of clinical practice guidelines. One investigation of panels that write clinical guidelines found that more than one-third of authors declared financial links to relevant drug companies, with around 70% of panels being affected [33]. Another study found that 87% of authors of clinical practice guidelines had some form of interaction with the pharmaceutical industry [7]. If authors have relationships that pose a potential conflict of interest, these need appropriate disclosure so that readers may evaluate the merit of those guidelines.

Conclusion
Many challenges exist to improve the use of evidence in all its forms in guideline development. Clinical practice guidelines remain a developing healthcare technology and if they are to fulfil their potential as a tool to improve standards of care these challenges need to be addressed. The requirements of future guidelines are clear if they are to inform clinicians and patients about appropriate healthcare. Authoritative and rigorously developed guidelines should (where possible) feature transparent and fully reported: guideline group methods and participation; involvement of stakeholders and sponsors; reporting and use of evidence and linking of recommendation to evidence; understanding of health care delivery, the policy context and narratives of patient experience.

Competing interests
The authors contributed to the development of one of the guidelines reviewed [26].

Authors' contributions
FC, HOD and JMM wrote the manuscript. ME critically revised the manuscript.