Main findings
Reducing patient harm has been identified as one of the main areas of opportunity for improving both the outcomes and costs of healthcare. The US Department of Health and Human Services, in forming the Partnership for Patients initiative, specifically targeted a 40% reduction in preventable harm as one of its two key goals [13]. The US Food and Drug Administration's Safe Use Initiative likewise aims to "measurably reduce preventable harm from medications" [14]. Other international agencies and organizations are also highly interested in this area. To measure progress toward these goals, we need a validated metric that clearly incorporates both the definition of a medical harm event and the degree to which such an event is preventable. Using a structured search of the literature, we identified seven key "themes" in extant definitions of "preventable harm", the most common being (a) a harm with an identifiable and modifiable cause, (b) "harm where a reasonable adaptation to a process will prevent future recurrence", and (c) "harm where an existing guideline has not been adhered to". Our review revealed that, for the most part, these definitions were locally derived (67/127, 53%) and lack external validation. Interestingly, other definitions, such as "all harm is preventable", were less commonly used. This implies that the definition of a Never Event is somewhat vague, dynamic, and not standardized. Indeed, the 25 Never Events defined by the UK National Patient Safety Agency have numerous exceptions and modifications that highlight this challenge. We discuss the three most common definition domains in more detail below and suggest what we consider a suitable approach to advance this critical area of healthcare delivery.
1) A harm with an identifiable and modifiable cause
We have shown that the most common theme used to define preventability was "a harm with an identifiable and modifiable cause." In studies using this definition, the researchers apparently had confidence that they could identify a specific "cause" and conclude that it was preventable, either because there was a single correct and achievable management choice (e.g., one medication was ordered but another delivered, a medication error), or because the raters determined that most practitioners would have followed a different (more desirable) management plan. Such criteria risk becoming circular to some extent: a preventable cause is preventable. At a minimum, to reduce bias and inaccuracy, one would hope for assurance of the reproducibility and agreement of such judgments between observers (typically measured with a chance-adjusted statistic such as kappa). Other characteristics, such as criterion and convergent validity of the measure, would be desirable but are hard to achieve. The major threat to the validity of such a judgment is hindsight bias. To avoid this situation and preserve the validity of this definition, the direct cause of harm should be definitely detectable before it has an opportunity to cause harm (e.g., before administering the wrong drug to the patient). Some studies attempted to prevent hindsight bias (e.g., Hayward et al. [7] instructed reviewers of harm cases not to consider certain aspects of the patient history and process of care); however, this is always challenging in observational studies. This definition, together with protection from hindsight bias, requires an all-or-nothing cause-and-effect relationship of the kind associated with large effects.
For causes weakly associated with harm (e.g., low-quality handoffs among trainees), studies with rigorous protection against bias (e.g., randomized trials) would be needed to establish that harm is indeed preventable by affecting a specific causal pathway (e.g., improved communication tools to enhance resident handoffs). One challenge in using this definition is that it does not always include "near misses": cases in which harm did not occur but was likely to take place. Medication events were the most common type of harm described with this theme ("a harm with an identifiable and modifiable cause") (20 of 58 articles), and of those, only 7/20 reported agreement measures, with a median kappa of 0.69 (range 0.49-0.93), consistent with substantial agreement. We have no agreement data from the other 13 medication studies that embraced this definition of preventable harm. One might suspect it would be harder to assign preventability to non-medication events, where the process of care is more complex and less explicitly defined than for adverse medication events, in which pharmaceutical prescription, transcription, distribution, and ultimately administration are typically well worked out. Nevertheless, measures of agreement were reported in 8/38 studies addressing more complex causal chains under this definition, with a median kappa of 0.68 (range 0.23-0.81). Thus, under this definition, the locally developed processes led to good agreement in the determination of preventable harm when agreement was measured. It is important to recognize, however, that the included studies judged preventability a posteriori; thus, hindsight bias may well have inflated the observed agreement. There were no assessments of criterion or convergent validity.
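Because the kappa statistic anchors much of the agreement evidence discussed throughout this review, a minimal sketch of how a chance-adjusted agreement coefficient is computed from two reviewers' preventability ratings may be helpful. The function and the data below are illustrative only and are not drawn from the included studies:

```python
from collections import Counter

def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa: agreement between two raters, adjusted for chance."""
    assert len(ratings_a) == len(ratings_b) and ratings_a
    n = len(ratings_a)
    # Observed agreement: proportion of events both reviewers labelled identically.
    p_obs = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Expected chance agreement, from each reviewer's marginal label frequencies.
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    p_exp = sum(freq_a[c] * freq_b[c] for c in freq_a) / n ** 2
    if p_exp == 1:
        # Degenerate case (both raters used one identical label throughout):
        # kappa is strictly undefined; treat complete agreement as 1.0.
        return 1.0
    return (p_obs - p_exp) / (1 - p_exp)

# Hypothetical data: two reviewers rating 10 harm events as
# P(reventable) or N(ot preventable); they agree on 8 of 10.
reviewer_1 = list("PPPNNNPNPN")
reviewer_2 = list("PPNNNNPNPP")
print(cohen_kappa(reviewer_1, reviewer_2))  # 0.6
```

In this toy example the raw agreement is 0.8, but because both reviewers rate half the events preventable, agreement of 0.5 is expected by chance alone, yielding kappa = (0.8 - 0.5)/(1 - 0.5) = 0.6. Under commonly used benchmarks, values of 0.61-0.80 are read as substantial agreement.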
2) Harm where a reasonable adaptation to a process of care will prevent recurrence of harm
In some ways, the second most common definition ("harm where a reasonable adaptation to a process of care will prevent recurrence of harm") is an extension of the most common definition reviewed above. Under this definition, the root cause(s) of harm were identified and the reviewer(s) judged that modifying the processes of care could prevent this type of cause from recurring under the same or similar circumstances. Here, evaluations of drug events made up only 8/30 such studies, with the remainder examining a wide variety of medical events. Only one evaluated agreement between observers (kappa = 0.95) [15]. Among the non-medication studies using this definition, only two measured agreement, with Cohen's kappa ranging from 0.68 to 0.83, consistent with moderate to substantial agreement. In 20 studies, there was no assessment of agreement, and again, there were no external assessments of validity or controls for hindsight bias. Perhaps most clearly here, this definition would require rigorous study designs to establish with confidence that a given modification would indeed lead to harm avoidance.
3) Harm where an existing guideline has not been adhered to
The third most common definition of preventable harm was "harm where an existing guideline has not been adhered to". This definition introduces slightly different criteria from the prior ones for judging whether an event is preventable. It assumes that the existing guideline applies to the given situation and that, if the guideline was followed correctly, any harm associated with care should be considered not preventable. Conversely, any harm occurring in a situation in which the guideline was not followed is credited as "preventable", even if there is a possible disconnection between cause and effect. It is, in effect, a measure of the reliability of care in situations that can result in harm. Twenty studies used this criterion, and 4 reported agreement, with a median kappa of 0.57 (range 0.40-0.99). Event types were mostly related to venous thromboembolism (4), healthcare-associated infections (5), or adverse drug events (2): all events for which there are generally well-established care guidelines. It is not clear why agreement appeared to be lower (lower kappa) in these situations than under the other definition types. However, in several cases the care guidelines used were externally developed (e.g., the ACCP guidelines for venous thromboembolism prophylaxis) [16], so these data may reflect some measure of criterion validity. Other definitions were used less often, and most lacked assessments of agreement. It is also important to recognize that a significant proportion of guidelines either are of poor quality and lack rigor or are based on low-quality evidence [17].
The quality of evidence
As acknowledged in the reviewed literature, most measures of preventable harm were not created for comparison purposes; they were created locally to engage local caregivers in efforts to improve patient safety. Measures based on administrative data, though less "subjective" and hence more applicable for cross-platform comparison, suffer from their own validity problems. First, because such measures rely on administrative data, changes in documentation and coding can improve "performance measures" more readily than actual, bona fide improvements in safety and reliability. Second, administrative indices include adverse events that, by a reasonable standard given today's science and technology, are not truly preventable. For instance, the skin, like the liver, may at some point fail in the aging and critically ill patient. Even with the best bedding materials and operative tables, conscientious nursing skills, and diligent evidence-based care, some pressure ulcers are likely inevitable in contemporary practice. Failure to distinguish such events from those that may actually be preventable undermines the credibility of the measure with clinicians, who may then dismiss other aspects of the measures.
An ideal metric would be accurate, valid, and reproducible, and would have a high potential to engage caregivers in improvement work. As with any other causal inference, confidence in an estimate of a causal association between a modifiable event and the risk of harm depends on the validity of the evidence supporting that association. As with therapeutic associations, randomized trials are better suited to establish a preventable-harm association when the risk of harm is moderate to high and the association is small (when multiple contributing factors exist and noise and bias are of similar size to the signal or effect being detected). Large effects in a context where confounders are not problematic, compelling dose-response evidence, and protections against ascertainment bias of exposure and harm and against selection bias further strengthen the quality of this evidence. All-or-nothing associations involve large and idiosyncratic harms, and their causal link is obvious (e.g., wrong-side surgery). However, the chain of events leading to such an event may be complex and convoluted, and solutions to reduce the chance that the harmful process will repeat will likely require empirical evidence of efficacy. Given the known limitations of standards of care and guideline statements, their efficacy in preventing harm should also be empirically tested. Therefore, the overall quality of the current evidence in the field (strength of inference) remains quite limited.
Principles for forging ahead
Preventable harm, therefore, appears to be best defined by three criteria: (a) harm whose incidence can be reduced by detecting and intervening on, or preventing, a causal event or chain of events (an error, an error-prone process, a deviation from best practice); (b) the causal event or chain of events can, by its nature, be detected before the harm takes place; and (c) there is evidence that an intervention is efficacious in reducing or eliminating the harm by eliminating the offending cause or disrupting the harmful chain of events. In addition, events included in assessments of preventability ought to have clear and validated criteria for the level of harm. As we have seen, the available definitions offer elements of the first one or two of these criteria, with many affected by hindsight bias (a violation of the second criterion). Few push their definitions far enough to require empirical evidence of preventability.
Given the low level of confidence the data afford regarding measures of preventability, it may be prudent to add a modifier reflecting the extent of adherence to these standards, such as definite, probable, and plausible. Definitely preventable harms, for instance, would apply to wrong-treatment situations or error-prone complex processes with empirical evidence from a randomized trial that an intervention (e.g., a checklist) consistently reduces the risk of harm to a large extent. Definitely preventable harms are ready to become targets for improvement and accountability. Probably and plausibly preventable harms may require additional empirical work before they can become such targets. As discussed above, as the evidence evolves (new technologies appear, and new evidence supporting these technologies emerges), harms that were once plausibly preventable or not preventable may become definitely preventable. This classification therefore allows the safety community to work not only to reduce harm and improve safety but also to enhance the quality of the science of healthcare delivery. Future primary research in this area is needed to advance the field. Randomized or quasi-randomized trials (e.g., pre-post designs, comparative controlled cohort studies) will be needed to test the effectiveness of interventions that target harm associated with the delivery of healthcare. These trials will help determine a possible ceiling for improvement and define what is preventable. We believe that establishing a definition of preventable harm is desirable and may help guide quality improvement efforts and safety initiatives; however, a proposed definition would need to be sufficiently nuanced to reflect the setting and purpose of the preventability designation.
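Purely as an illustration, the definite/probable/plausible modifier proposed above can be read as a decision rule over the three criteria set out in the preceding section. The encoding below is our hypothetical sketch, not a validated instrument; the cut points between grades would need empirical work of their own:

```python
def preventability_grade(cause_identified: bool,
                         detectable_before_harm: bool,
                         trial_evidence: bool) -> str:
    """Hypothetical grading of a harm event against the three criteria:
    (a) an identifiable and modifiable cause, (b) detectability of that
    cause before harm occurs, (c) empirical (e.g., randomized-trial)
    evidence that an intervention prevents the harm."""
    if cause_identified and detectable_before_harm and trial_evidence:
        return "definitely preventable"   # all three criteria met
    if cause_identified and detectable_before_harm:
        return "probably preventable"     # awaits empirical evidence of efficacy
    if cause_identified:
        return "plausibly preventable"    # judgment remains open to hindsight bias
    return "not classifiable as preventable"

# A wrong-side surgery with a trial-proven checklist intervention:
print(preventability_grade(True, True, True))   # definitely preventable
# A harm attributed post hoc to a cause not detectable in advance:
print(preventability_grade(True, False, False)) # plausibly preventable
```

The design choice here is that the grades are strictly nested: each higher grade requires everything the grade below it requires, mirroring the paper's point that most published definitions satisfy only the first one or two criteria.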
Limitations
This systematic review has several limitations worth acknowledging. Judgments made by reviewers of the literature, despite good inter-reviewer agreement, remain subjective. The definitions are clearly correlated and do not represent independent constructs. We focused our literature search on the last decade, aiming to identify a contemporary view; however, earlier literature may provide relevant data. We searched using English text words, which may also have led to the omission of additional relevant information. The analysis in this systematic review was meta-narrative, i.e., non-quantitative; hence, providing more specific estimates of the operational characteristics of the definitions was not feasible. Lastly, this review focuses on preventability, not ameliorability. Others have distinguished between preventable adverse drug reactions (caused by an error in management) and ameliorable ones (whose severity could have been significantly reduced had healthcare delivery been optimal) [18]. The ameliorability concept is not applicable to all harms (e.g., mortality), and the current systematic review does not address its definitions.