All nine included studies were cohort studies with historical controls. In addition to the methodological biases present in each individual study, these study designs introduce further confounders: population selection bias arising from the different time periods, staff selection bias arising from the different time periods, potential differences in care procedures as hospitals introduce new safety protocols over time, and unclear reporting and monitoring procedures for outcome assessment between periods, especially in the control period. More high-quality studies, such as RCTs, are needed in this area to raise the level of evidence.
Across the studies there were different populations, different time periods for patient enrolment, and different assessments. Most studies lacked detail on inclusion/exclusion criteria and baseline population characteristics; for example, four of the five ICU studies did not report specific patient selection criteria [4–7]. Without explicit a priori inclusion and exclusion criteria, investigator and staff discretion can bias population selection. The generalisability of the studies to non-ICU settings may also be limited. Only one study investigated the use of checklists across diverse socio-economic and surgical settings [10]. In that study the magnitude of the change in outcomes before and after the intervention varied between study locations, suggesting that setting may influence the effectiveness of patient safety checklists, and that locations with good baseline performance on the measured outcomes may have limited potential for improvement. The authors noted no effects of income level or surgery type clusters on the outcomes, but geographic location, resource levels, staff workloads, level of staff training, and other factors may have influenced the effectiveness of the intervention [10].
It is difficult to summarise trends linking checklists to outcomes, even across studies within the same clinical setting. As shown in Table 3, the combined variation in setting, checklist design, educational training provided, and outcomes made it unfeasible to accurately summarise trends across all studies. Even within a particular setting, synthesis is challenging. For example, in the ICU setting three studies used checklists and measured LOS outcomes [3, 6, 7], but they used different checklist designs, with one form [6] less descriptive and detailed than the other two [3, 7]. The training components of the intervention were not clearly described in many studies. These variations in study design and limitations in reporting preclude summarising trends in interventions and outcomes across studies.
It is also important to consider the influence of team and staff factors on the impact of checklists. One study noted that staff changed frequently during the study and were not the same between the control and intervention periods [5]. In many of the studies, staff changes during the study period could have affected the delivery of care, in particular by introducing differences in care between treatment groups. However, staff turnover is common in hospitals and may not be something that hospital-based studies can control. It is uncertain whether other factors (e.g. new policy directives, unit and organisational safety culture) could have influenced the behaviour of staff in caring for patients. One study stated that "there were other efforts during the study period to improve ventilator care and reduce catheter-related infections that may have contributed to reduced LOS" [7].
Most studies did not provide evidence of how they measured whether staff were using the checklists properly during their work. All studies used some form of staff training or education to increase compliance and proper use of the checklists, but it is unclear whether the education and training were effective, as this was not assessed in any of the studies. Apart from assessors directly observing staff at work using the checklists, which one study did throughout [5] and one did part of the time [10], or having staff check each other's checklists before proceeding with further actions [4, 10, 12], there were no other methods to ensure that staff were using the checklists properly. It is also unclear whether there is an optimal checklist design for specific tasks. In most studies the checklist itself was not validated prior to implementation. Validation of the checklist is important to ensure that the list contains all relevant items and no unnecessary ones, and that the included items are interpreted accurately and consistently by users. For example, one study states that "it is not clear that each element of the checklist needed to be there" [5].
Outcome measurement periods also differed between treatment groups. The longest interval between control and intervention assessments in any study was 12 months [3]. In one ED study there was a very large difference between the lengths of the control observation period (three months) and the intervention observation period (four weeks) [9].
Outcomes were not uniformly defined across all studies; even relatively well-accepted outcomes such as LOS were defined and measured in different ways [3, 6, 7]. Assessment of LOS is also complex: it is not usually normally distributed, as was assumed in some of the studies; in some studies it was measured differently after checklist implementation; and the link between LOS and other surrogate outcome measures and the outcome of patient safety is unclear [4, 6]. It is also questionable whether improvements in staff communication and protocol adherence translate directly into improvements in patient outcomes [4, 6, 9]. It may be incorrect to draw direct links between improved staff communication and protocol adherence and better patient outcomes from any of these studies, because we do not know all the characteristics of the patient populations studied or all other patient care factors.
It is also unclear how long after the introduction of a safety checklist outcome assessment should begin. One study defined proper checklist use as beginning once the intervention had been implemented for 60 days [11]. This is an arbitrary point, and it is unclear whether a 60-day post-implementation period can be validly used as a cut-off. A comprehensive education and training package could bring forward the optimal use of the intervention. Conversely, studies that measure outcomes too early may give a false impression of ineffectiveness. The dilemma for healthcare providers is to measure outcomes as soon as possible while ensuring that the intervention has first been properly integrated into clinical practice. It was unclear whether the effect of using a checklist changed over time, as most studies only assessed outcomes for a few months. Caution should be exercised when extrapolating the reported short-term outcomes to longer-term predictions about effectiveness. Longer outcome assessment periods, of perhaps at least one year or over several cycles of staff changes, may be needed to determine the sustainability of changes.
Strengths and limitations of this systematic review
This review has some limitations. Only comparative studies written in English and published since 1980 were considered, so potentially relevant studies published in other languages or before 1980 may have been missed. We also did not have the resources to hand-search information sources or to contact individual hospitals or experts for potentially relevant studies or evaluations of checklist programs. However, our search was broad, and we included a wide variety of checklists across a broad range of patient care settings.
The included studies were undertaken in a variety of settings, used varying methods, and evaluated differing interventions and outcomes. As a result we were unable to undertake a statistical meta-analysis; however, we believe our detailed quality appraisal and narrative synthesis highlight the strengths, weaknesses, and key messages of this complex body of literature.
Implications for further research
Some studies remarked that some of the items in their own checklists were probably unnecessary. To determine the most useful design and content of checklists, clinical trials comparing different checklist designs and content within the same settings are required.
There are Cochrane systematic review protocols to assess the evidence on the effect of computer-generated paper reminders [13] and paper reminders [14] on practice and healthcare outcomes, but these address a broader range of interventions and outcomes. A more recent evaluation of the WHO surgical checklist was published in 2010 [15]. A retrospective report of eight years of safety checklist use in neurosurgery has also been published [16], and we hope this will lead to more long-term comparative studies in this area, both retrospective and prospective. In conjunction with this systematic review, Southern Health designed and implemented a medical safety checklist for use by clinical staff and has been monitoring its effect on clinical outcomes. Results from this pilot work may help inform the body of evidence for using safety checklists to improve safety. Concurrently, Southern Health is considering piloting electronic checklists to improve patient safety. Southern Health plans to update this systematic review in 2012 and add an appraisal of the evidence for electronic checklists.
Implications for practice
Health services planning to implement safety checklists should use an evidence-based approach to selecting, or designing and validating, checklists and/or checklist items for their clinical improvement goals. Resource use should also be considered, including the staff time and funding required to provide adequate training and education in using the checklists. Health services piloting new checklists or using established checklists should be encouraged to develop an evaluation plan for their use of safety checklists and to publish their findings so that the body of evidence can grow.