Skip to main content

U.S. hospital performance methodologies: a scoping review to identify opportunities for crossing the quality chasm



Hospital performance quality assessments inform patients, providers, payers, and purchasers in making healthcare decisions. These assessments have been developed by government, private and non-profit organizations, and academic institutions. Given the number and variability in available assessments, a knowledge gap exists regarding what assessments are available and how each assessment measures quality to identify top performing hospitals. This study aims to: (a) comprehensively identify current hospital performance assessments, (b) compare quality measures from each methodology in the context of the Institute of Medicine’s (IOM) six domains of STEEEP (safety, timeliness, effectiveness, efficiency, equitable, and patient-centeredness), and (c) formulate policy recommendations that improve value-based, patient-centered care to address identified gaps.


A scoping review was conducted using a systematic search of MEDLINE and the grey literature along with handsearching to identify studies that provide assessments of US-based hospital performance whereby the study cohort examined a minimum of 250 hospitals in the last two years (2017–2019).


From 3058 unique records screened, 19 hospital performance assessments met inclusion criteria. Methodologies were analyzed across each assessment and measures were mapped to STEEEP. While safety and effectiveness were commonly identified measures across assessments, efficiency, and patient-centeredness were less frequently represented. Equity measures were also limited to risk- and severity-adjustment methods to balance patient characteristics across populations, rather than stand-alone indicators to evaluate health disparities that may contribute to community-level inequities.


To further improve health and healthcare value-based decision-making, there remains a need for methodological transparency across assessments and the standardization of consensus-based measures that reflect the IOM’s quality framework. Additionally, a large opportunity exists to improve the assessment of health equity in the communities that hospitals serve.

Peer Review reports


Today, hospital performance is increasingly important given growing demands to control healthcare costs [1, 2]. Hospitals are being reimbursed based on their ability to deliver high quality care and deliver value to patients [3], and patients are taking a more active role in their healthcare decisions [4]. Performance measurements are progressively being linked to reimbursement in pay-for-performance models [5]. Yet, quality metrics used in the measurement of value-based care may not optimally reflect the quality of care provided. Therefore, a need exists to balance quality initiatives with financial feasibility (i.e., value-based care).

Commonly used domains for understanding quality are the Institute of Medicine’s (IOM) framework (safety, timeliness, effectiveness, efficiency, equitable, and patient-centeredness; the acronym referred to as “STEEEP”). Using these domains may help balance quality with value for particular measures. Moreover, employing the domains of STEEEP may reduce variation in how care is delivered and practiced, revealing differences that exist across geographic, cost, and personal (e.g. racial) characteristics [6, 7].

IOM STEEEP proposes domains for quality care, but does not articulate specific quality indicators nor how to combine these measures to assess quality performance as a whole. Measures of performance are dependent upon the availability of data. Existing means of measuring hospital performance may include regulatory inspection or reporting, surveys, and statistical indicators which are often combined into composite scores. Although many measures exist, no clear consensus has been reached on which measures should be used for measuring hospital performance. For example, few common scores or standardized measures exist across the various national hospital ratings systems [8].

Yet, it is clear that better and worse methods of measuring hospital performance exist [9], such as consensus-driven and evidence-based indicators endorsed by the National Quality Forum (NQF) [10] and the Agency for Healthcare Research and Quality (AHRQ) [11]. Moreover the Donabedian framework can help guide how comprehensively quality is assessed across assessments using different performance measures. However, there are not clear guidelines for assessments to incorporate specific methodologies and appropriate measures to fit within the IOM’s STEEEP framework. Examining these aspects of existing hospital performance assessments are a first step toward developing more transparent and robust methods for determining how accurately and comprehensively hospitals provide quality care.

The purpose of the scoping review is to provide a comprehensive analysis of United States (US) methodologies used to assess hospital performance and their measures as they correspond to the IOM’s STEEEP quality framework. Using the STEEEP framework, quality domains and respective gaps were identified across currently available assessments using a systematic approach. Robustness (e.g. number of data sources and measures) and transparency of methodologies, to understand how measures were combined to assess hospitals, were evaluated. Additionally, in the context of informing policy to support value-based, patient-centered care, opportunities were identified for hospital assessments to “cross the quality chasm” [12].


Study design

A scoping review [13] was conducted in accordance with Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews (PRISMA-ScR) to identify studies that provide assessments of US-based hospital performance whereby the study cohort examined a minimum of 250 hospitals [14]. The review was designed to curate a comprehensive snapshot of recent and active methodologies regarding hospital performance in order to evaluate the current landscape. Therefore, inclusion criteria were limited to identifying published studies from 2017 to 2019 that included methodologies examining performance of 250 or more hospitals, which allowed for generalizable synthesis. Details of the methodology are provided in Additional File 1.

Search strategy

A systematic search query of MEDLINE via PubMed and the grey literature was conducted to identify references published or available online between September 1, 2017 to September 1, 2019. This timeframe supports the identification of recently published hospital performance assessments.

Screening process

Relevant references related to hospital performance assessment were screened and abstracted into standardized forms by independent dual review and conflict adjudication was provided by a third reviewer. Interrater reliability was determined by the kappa statistic [15].

Data extraction

The following criteria were abstracted into standardized forms for synthesis and evaluation: data source including origin of data, data linkage, availability, type, sample size, and observation period; cohort development including inclusion/exclusion criteria and data pre-processing; measure (see below); and score including composite calculation.

Measure characteristics

Beyond the data extracted from selected assessments as described above, specific measure or indicator characteristics were abstracted including name of measure, measure calculation, normalization, and explanation of why the measure was included, if any. Because measure characteristics were the focus, direct evaluation of the sensitivity of the measures was not conducted; however, data abstraction included how measures were chosen. Each measure was mapped, if possible, to categories within the Donabedian conceptual model of quality improvement which includes structural, outcome, and process categories, and the STEEEP framework for the domains of quality.

To determine if STEEEP mapped measures were supported by federal and non-profit organizations that lead consensus- and evidence-based measure reporting for healthcare quality, each measure was cross-referenced to AHRQ (prevention, inpatient, and patient safety quality categories) quality recommendations, and NQF endorsement.


Summary of included assessments

From 3058 unique records screened, 19 hospital performance assessments described in the literature met inclusion criteria (Fig. 1). Of those studies, five de novo assessments [16,17,18,19,20], six evaluations of organizations’ ratings [21,22,23,24,25], and eight organizations providing assessments (with shorthand designation noted in brackets) were identified [26,27,28,29,30,31,32,33]: (1) Consumer Reports® Hospital Ratings [Consumer Reports], (2) Healthgrades™ America’s Best Hospitals [Healthgrades], (3) The Centers for Medicare & Medicaid Services (CMS) Hospital Compare [Hospital Compare], (4) IBM® Watson Health® 100 Top Hospitals® [IBM], (5) Island Peer Review Organization (IPRO), Why Not The Best? [IPRO], (6) The Joint Commission America’s Hospitals [Joint Commission], (7) Leapfrog Top Hospitals [Leapfrog], and (8) U.S. News and World Report Best Hospitals Procedures and Conditions [US News].

Fig. 1
figure 1

Results of the literature search, Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) flow diagram

Assessment methodologies overview

Four types of hospital assessments were identified: ranking, rating, listing, and evaluation-based studies. Ranking (IPRO, Hamadi et al. (2019) [16], Odisho et al. (2018) [19], Walkey et al. (2018) [34], Yokoe et al. (2019) [18]) assessments denoted a system by which all hospitals are arranged in order of ascending performance. Rating (Consumer Reports, Healthgrades, Leapfrog, US News; Al-Amin et al. (2018) [17]) assessments placed hospitals into relative quality groups. Listing assessments (Hospital Compare, IBM, Joint Commission) indicated hospital quality without comparison to other hospitals. Lastly, evaluation-based studies [20,21,22,23,24,25] provided critical examinations of hospital performance assessment methodologies from Hospital Compare [21, 23, 24, 35] and US News [21, 25].

Most of the assessments explained why specific measures were chosen for their particular methodology including the Consumer Reports, Hospital Compare, IBM, US News, and the de novo and evaluation-based studies. Reasons for including specific measures were wide-ranging, but centered on existing evidence that an indicator is associated with an endorsed quality outcome, such as mortality. Clear descriptions for why specific measures or indicators were chosen were not identified for Healthgrades, IPRO, Joint Commission, and Leapfrog.

Rather than addressing overall hospital performance, some studies assessed specific quality domains such as patient safety (e.g., surgical site infections [18], surgical procedures [20]), effectiveness (e.g., 30-day readmission [20]; 30-day mortality [34], and patient-centeredness (e.g., Hospital Consumer Assessment of Healthcare Providers and Systems (HCAHPS)) measures [17].

Summary information about the data sources, cohort development, scoring, and model performance across assessments can be found in Additional File 2.

Performance measures

The kappa statistic for interrater reliability of data extraction was 0.69, including both Donabedian categorizations (e.g., structure, process, outcome) and STEEEP framework mapping. For simplicity of comparisons and to provide a subgroup analysis, this section will focus on the following eight organizations that provided overall hospital performance (i.e., reported and assessed information in more than one quality domain): Consumer Reports, Healthgrades, Hospital Compare, IBM, IPRO, Joint Commission, Leapfrog, and US News.

Most performance assessments used primarily outcome (n = 187) and secondarily process-driven indicators (n = 80) while three (IPRO, Leapfrog, US News) also included structural-based measures (n = 16) to assess quality according to the Donabedian conceptualization (Fig. 2a). Three assessments did not use multiple concepts in their methodologies; Healthgrades and IBM exclusively reported outcome measures while the Joint commission methodology was limited to process measures.

Fig. 2
figure 2

Frequency of (a) Donabedian categorizations and (b) percentage of STEEEP measures per assessment

Within the STEEEP quality framework, all assessments contained safety, five used timeliness, seven discussed efficiency, six used effectiveness, none explicitly reported equity, but five conducted risk- or disease severity-adjustments in models of other quality domains to address an equity-related issue (e.g., effectiveness and safety: race-adjusted mortality rate), and six included patient-centeredness indicators.

Across the assessments, measures were mapped (some to more than one domain); safety indicators (n = 168) were most commonly identified followed by effectiveness (n = 88), timeliness (n = 49), efficiency (n = 42), patient-centeredness (n = 33), and equity (n = 10) (using adjustments for equity-related variables) measures (Fig. 2b). Figure 2 summarizes the Donabedian conceptualization and STEEEP framework mapping of identified quality measures across assessments. Notably, some structural measures were unable to be mapped to STEEEP (e.g., adjusted operating profit margin, hospital-specific designations, percent of Medicare beneficiaries of all ages with diabetes or heart disease, and programs data).

Common themes among process and outcome measures mapped to the STEEEP framework were identified along with their respective weights to determine hospital scoring (Figs. 3 and 4). Large overlap or similarity of identified measures occurred in the following themes: safety and effectiveness domains included mortality, readmission, complications, and hospital acquired infections (HAIs); timely and efficient care regarded emergency department (ED) throughput and length of stay (LOS); and lastly, patient-centeredness was limited to patient experiences summarized by HCAPHS survey data (Fig. 3a). The weighting of these frequent STEEEP quality indicators varied widely across assessments or was not provided (Fig. 3b). Mortality weighting ranged from 2 to 50%; readmissions indicators contributed to roughly 20% of the score when weighted; complications weighting ranged from 10 to 50%; ED throughput weighting range was lowest with 4–10%; LOS was only weighted at 10% by one assessment; and HCAPHS survey data contributed 10–22% of the scoring. Figure 3b details the weights provided for other measures that were not commonly identified across assessments to demonstrate transparency of scoring, where possible.

Fig. 3
figure 3

Frequent quality domain (a) measure overlaps and (b) comparison of their weights among assessments a) Abbreviations: ED, emergency department; HCAHPS, Hospital Consumer Assessment of Healthcare Providers and Systems. b) Descriptions of “Other” across assessments. Consumer Reports: Other, efficient use of imaging process measures; Hospital Compare: Other, 4% efficient use of imaging and 4% effectiveness of care process measures (e.g., patients assessed and given influenza vaccination; percentage of patients who left the ED before being seen; percentage of patients who came to the ED with stroke symptoms who received brain scan results within 45 minutes of arrival; percentage of patients receiving appropriate recommendation for follow-up screening colonoscopy; percentage of patients with history of polyps receiving follow-up colonoscopy in the appropriate timeframe; percent of mothers whose deliveries were scheduled too early (1-2 weeks early), when a scheduled delivery was not medically necessary; percentage of patients who received appropriate care for severe sepsis and septic shock; patients who developed a blood clot while in the hospital who did not get treatment that could have prevented it; percentage of patients receiving appropriate radiation therapy for cancer that has spread to the bone). IBM: Other, 10% operating profit margin (no mapping) and 10% adjusted inpatient expense per discharge for efficiency. IPRO: Other, weight not provided for timely and effective 1) stroke care (thrombolytic therapy, antithrombolytic therapy by end of hospital day 2, VTE prophylaxis, discharged on antithrombolytic therapy, anticoagulation therapy for atrial fibrillation/flutter, discharged on statin medication, stroke education), and 2) blood clot prevention and treatment (VTE prophylaxis, intensive care unit VTE prophylaxis, incidence of potentially preventable VTE, anticoagulation overlap therapy, unfractionated heparin with dosages/platelet count monitoring, warfarin therapy discharge instructions; safety, early elective delivery rates; efficiency, spending per Medicare beneficiary and health care costs; structural HIT measures and imaging for efficiency and safety; efficiency, population health and utilization costs; structural measures from county health rankings data on health factors and health outcomes related to preventive care for safety. Joint Commission: Other, weight NP for process measures. Timely and effective 1) stroke care (thrombolytic therapy, antithrombolytic therapy by end of hospital day 2, VTE prophylaxis, discharged on antithrombolytic therapy, anticoagulation therapy for atrial fibrillation/flutter, discharged on statin medication, stroke education; assessed for rehabilitation, VTE discharge instructions, and 2) blood clot prevention and treatment (VTE prophylaxis, intensive care unit VTE prophylaxis, incidence of potentially preventable VTE, anticoagulation overlap therapy, unfractionated heparin with dosages/platelet count monitoring, warfarin therapy discharge instructions; safety, early elective delivery rates; safety and effectiveness of antenatal steroids; safety and effectiveness for inpatient psychiatric services (admission screening, physical restraint, seclusion, and justification for multiple antipsychotic medications); safety and effectiveness of preventive care for influenza immunization, tobacco use (screening, treatment provided or offered, treatment provided or offered at discharge), hearing screening, alcohol use (screening, brief intervention provided or offered, or other drug use treatment provide or offered at discharge); effectiveness of exclusive breast milk feeding; surgical care effectiveness and safety of urinary catheter removal and antibiotics within one-hour before first surgical cut; safety and effectiveness, children’s asthma care, home management plan of care; and timely acute myocardial infarction measures (fibrinolytic therapy within 30 minutes and primary percutaneous coronary intervention received within 90 minutes). Leapfrog: Other, 23.1% safety practice process measures (leadership structures and systems; culture measurement, feedback, and intervention; identification and mitigation of risks and hazards; nursing workforce; hand hygiene) and 11.5% HIT (computerized physician order entry and bar code administration) safety, timeliness, and efficiency. Notably, the weights provided by Leapfrog only sum to 97.3% rather than 100%. US News: Other, weight NP for process measures on effectiveness (patient flu immunization and worker flu immunization) and safety (noninvasive ventilation and transfusion); outcome measures on patient-centeredness and safety (discharge to location other than patient’s home); structural safety measures related to information on board certifications and specialties, number of patients (volume), nurse staffing, number of intensivists, and transparency (reporting of performance). Abbreviations: ED, emergency department; HCAHPS, Hosptial Consumer Assessment of Healthcare Providers and Systems; HIT, health information technology; NP, not provided; VTE, venous thromboembolism; -, not an included measure

Fig. 4
figure 4

Identifying hospital performance gaps in STEEEP across assessments (a) Summary of quality measures mapped to STEEEP across assessments. No specific measures for equity were identified, but risk adjustment may have been done using equity-based variables; (b) Equity measures represent an opportunity for improvement; additional considerations for new measures in quality are suggested. The color range key represents the number of measures identified across all assessments where the darker color of blue indicates a higher frequency. While these measures for equity consideration could also be considered measures of effectiveness, we suggest that the influence of variables such as race, ethnicity, gender, socioeconomic status, and geography (at a minimum) on health and healthcare delivery outcomes could be further examined here. This is an opportunity for stakeholders to evaluate relationships among various types of inequality with the hopes to identify mechanisms and possible interventions to promote health equity in their communities

With safety and effectiveness as overt priorities in hospital performance outcomes, 30-day mortality and 30-day readmission rates were commonly identified with the exception of Joint Commission and Leapfrog assessments; notably, Leapfrog used death rate of surgical inpatients with serious treatable conditions as a measure of mortality. These 30-day effectiveness of care measures identified varied in their risk- and severity- adjustments, as did patient conditions (acute myocardial infarction, chronic obstructive pulmonary disease, heart failure, pneumonia, and/or stroke) as components of these composite outcomes. Harm outcomes were also frequently represented across assessments (except Joint Commission) including medical and surgical complications and HAIs. Medical complications were occasionally grouped with HAIs when the AHRQ patient safety indicator (PSI) 90 was used; other medical complication measures examined pressure ulcer rates, iatrogenic pneumothorax rates, in-hospital falls and trauma, and venous thromboembolism (VTE) incidence. Surgical complications varied greatly, but the most frequently identified measures related to hip fracture treatment, hip and knee replacements, and postoperative respiratory failure and wound dehiscence rates. HAIs measures commonly included catheter-associated urinary tract infections (CAUTIs), Clostridium difficile (C. diff) infections, central-line associated bloodstream infections (CLABSIs), methicillin-resistant Staphylococcus aureus (MRSA) infections, severe sepsis and shock, and surgical site infections (SSIs).

Timely care outcomes that reduce wait times or harmful delays and efficient care outcomes that reduce cost and unnecessary resource utilization, as adapted from AHRQ and CMS definitions, were identified as common STEEEP domains. Four (Hospital Compare, IBM, IPRO, Joint Commission) assessments focused primarily on ED throughput measures, and LOS was examined by two assessments (IBM provided severity-adjusted LOS when compared to unadjusted LOS by US News). ED throughput measures considered median times from ED arrival to ED departure for both admitted and discharged ED patients as well as admit decision time, time to pain management, time to fibrinolytic therapy, and patients left without being seen.

Patient experience (patient-centeredness) outcomes were identified in most assessments except Healthgrades and Joint Commission. The results were derived from survey questions using HCAPHS data; most were a composite of multiple categories related to communication from provider, patient-provider relationships, receiving help when needed, controlling pain, cleanliness of room, quietness of room, likelihood to recommend hospital, and overall patient experience.

Equity-based measures were not stand-alone metrics to demonstrate the remediation of differences in the quality of health and healthcare across different populations in the communities that hospitals serve. Identified equity measures included risk- and disease severity-adjustments for covariates such as gender, geography, and socioeconomic status (e.g., Medicare/Medicaid dual eligibility as a proxy) and were used by five assessments (Consumer Reports, Healthgrades, IBM, IPRO, US News) in LOS, mortality, complications, and/or post-surgical infection measures (Fig. 4a).

Measure developers, both government and non-profit, provide endorsements using consensus and evidence-based review. These recommended measures allow comparisons of performance to recognized standards for the improvement of care and outcomes. Identified measures were mapped to AHRQ and NQF endorsements (Table 1). Using standardized quality indicators from the AHRQ as benchmarks, patient safety indicators were included by all assessments except the Joint Commission. AHRQ inpatient indicators were used by all assessments except IPRO, Joint Commission, and Leapfrog. AHRQ prevention measures were only used by IPRO. Upon examining NQF endorsements of AHRQ measures, all assessments used at least one measure endorsed by NQF in each AHRQ category.

Table 1 STEEEP quality domains according to the Donabedian framework with national endorsement mapping


Hospital performance is often assessed beyond the examination of quality measures, including financial health and employee health of the organizations being reviewed. This study intended to examine quality domains (i.e., STEEEP) and their use as part of hospital performance assessment, and identify relationships, if any, between the two. Coverage and weighting of measures mapped to the STEEEP framework varied across assessments, which indicates that there is limited consensus on how to best measure hospital quality. Moreover, disparate measures and methodological disagreement may foster cynicism and confusion [9] among stakeholders that include patients, providers, payers, purchasers, and policy makers. This does not mean quality assessments should be disregarded, but that they should be considered in the larger context of hospital performance.

Our identification of evaluation-based studies that critically examined assessment methodologies determined that ranked or rated hospitals do not necessarily associate with quality [21, 25]; high performing hospitals did not have better CMS-based outcomes compared to other low performers. A suggested reason for this difference is that performance may be skewed by factors not directly related to quality such as patient volume, where high-volume facilities had better ratings [23]. Moreover, a number of other hospital-level characteristics, such as academic tertiary care center status [35], have been associated with poor performance on CMS-calculated metrics [17, 35]. These examples demonstrate that no assessment methodology is perfect, but each has its own set of strengths to inform their intended audience for the improvement of care and clinical outcomes.

Process measures should reflect evidence-based practices that systematically improve care and prevent negative outcomes. Frequently identified process measures were primarily centered on the effectiveness of care; these protocols to reduce the variation in care and improve the safety and efficiency of healthcare delivery were focused on VTE prevention and treatment, communication practices related to education and discharge information for continued therapeutics, and preventive interventions including immunizations, screenings, and management of care. The majority of assessments included process measures in their methodology as they are tied to reimbursement, accreditation requirements, and/or state and federal mandates. However, providers are pushing for more outcome metrics specific to the patients they serve in addition to process metrics [36].

Develop patient-centered outcome measures along the continuum of care

Understanding patient outcomes is pivotal to provide value-based care and allows the opportunity to refine and improve care [37]. The majority of identified measures focused on outcomes across safety and effectiveness; currently, less consideration is provided to improve patient-centeredness, which should be equivalently emphasized in hospital assessment. Patient experience outcome measures were self-reported using HCAPHS surveys. These data cover a range of interactions that patients have with the healthcare system including care received from their doctors, nurses, and staff. These are particularly important measures as positive patient experiences are related to better health outcomes including lower readmission rates [38]; moreover, HCAPHS scores are commonly tied to value-based reimbursement. However, measurable value is generated by improving patient outcomes with particular conditions across the comprehensive continuum of care, and may involve multiple specialties at numerous care sites rather than on individual patient encounters.

Hospitals’ value-based performance depends upon health and healthcare received by its patient population prior to and following the care delivered within hospital walls. Measure developers should aim to incorporate outcome-oriented and patient-centered viewpoints, using a combination of clinical, claims, and patient-reported longitudinal data rather than using one dataset from a single site at one snapshot of time in the patient’s receipt of care. A significant roadblock to implementing these types of measures is the inability to easily exchange healthcare data, a lack of interoperability. Notably, the U.S. Department of Health and Human Services have recently implemented two rules (from the Cures Act and MyHealthEData initiative) (Available from: requiring both private and public entities to share information between patients and other parties in a private and secure manner. This access to health information will help resolve interoperability barriers when obtaining data required to support innovative patient-centered outcome measures.

Gaps in measuring hospital quality: equity and efficiency

As patient-centered models and value-based payments systems gain support, hospitals will need to monitor and evaluate services received outside their walls for reporting purposes and effective care management from the patient perspective. As such, hospitals will (and should) be held accountable for health outcomes outside of their facilities in the communities they serve. Although addressing health equity is difficult and will not be solved by hospitals alone, hospitals play an important role in community health and should attempt to address population health concerns in their area; moreover, this will require the reconsideration of how hospitals are incentivized to provide care.

Monitoring of health inequities, observing differences in health between subgroups within their community, is essential to achieve health equity. Recently Hamadi et al. (2019) demonstrated that healthcare quality measurement does not adequately adjust for the differences in serving these communities including access and affordability, prevention and treatment, and avoidable hospital use and cost; the level of minority presence and hospital reimbursement policies influence referral region health rankings [16]. Unfortunately there is limited consensus on direct measures of health equity, but opportunities exist to examine the relationship between population groups that exhibit disparities in health and healthcare delivery outcomes. Ideally, equity-related considerations should be a part of every quality domain (i.e., the remaining STEEP domains). Focusing on recommended equity measures, or examining other equity-related measures, such as hospital workforce diversity, are places to begin to address health equity. Healthcare is built on a foundation of rapport and trust, and both are garnered in part when providers emphasize cultural and linguistic competency in health and healthcare decision-making for their patients. Quality of care for a community can be improved by building a diverse workforce that represents the community.

Increasingly, social determinants of health (SDoH) are being recognized as important proxy measures of health equity as well as supportive of value-based care. Yet across assessments we find only one (IPRO) broached health equity issues related to SDoH using access to care data and percent inadequate social support. Preliminary return-on-investment analyses [39,40,41] and policy recommendations [42, 43] prioritize collection of actionable SDoH factors such as education, food, and housing to reduce costs using targeted community-based interventions. There are several US datasets, mostly public survey-based surveillance data, that can be used to extract insights on diverse populations (e.g., racial, ethnic, sexual/gender minority groups) and to assess health equity and/or social determinants [44,45,46,47]. However in order to develop richer and more diverse SDoH datasets, incentives to track and share SDoH data are needed in order to better achieve health equity in the context of value-based care.

This viewpoint, that hospitals should use SDoH data for the purposes of quality assessment of health equity, re-orients the traditional role of the hospital from an institution designed to cure, to a health system supporting population health. Arguably, this is not a simple task given the highly fragmented nature of healthcare today and payment incentives. However, there are some steps hospitals can take. For example, most hospitals complete a community needs assessment every three years to help identify resource needs of their community and develop programming, of which many relate to SDoH variables such as education, food, and housing; these gaps in care are addressed directly with local stakeholders to improve healthcare. By addressing the needs of the community, hospitals are taking a primary role to improve the outcomes of the communities that they serve. In doing so, there is greater value by addressing SDoH, which are associated with substantial costs to the healthcare system.

The transition towards measuring patient-centered outcomes will also depend on how well healthcare and social care can address the needs of those most at risk for poor outcomes. Expanded consideration of equitable care should influence the design of future quality measures, which will require increased development and testing of prevention and equity indicators. Potential publicly available equity measures (e.g., amenable mortality, life expectancy, or potential lives lost) are discussed in Fig. 4b. However, community surveillance-based measures will require standardized, accurate, and secure data platforms to access a comprehensive view into the health of individual patients. Additionally, the use of disaggregated datasets, from multiple sources, to represent inequity dimensions (e.g., socioeconomic, demographic, or geographic factors) can facilitate data-driven policies, programs, or practices to advance equity [48,49,50].

The ability of hospitals to support their communities through wellness initiatives is dependent upon fiscal solvency. Community benefit planning, including expansion of health equity programs, requires investment strategies dependent upon operating margins, which are linked to performance-based incentives. Only one assessment (IBM) provided insight into the financial health of hospitals using an outcome measure of operating profit margins, which was not mapped to STEEEP. Until policy incentives are shifted, equity assessments will likely be de-prioritized as hospitals follow suit with CMS and other guidelines for reimbursement.

An additional gap identified was fiscal insight into operating efficiency, how hospitals balance the need for quality with cost. These types of measures are increasingly important with the trend toward value-based care. This need is reinforced by the fact that 8% of US hospitals are at risk of closing and 10% are considered fiscally weak [51]. Only two assessments examined outcome measures related to cost of care (case-mix and wage-adjusted expense per discharge, IBM; and several cost reports related to Medicare reimbursement among several diagnosis related groups, IPRO).

Strategies to improve hospital performance assessment

Methodologies examining hospital performance were identified as limitations to many assessments in this review; the transparency and rigor of methodologies were frequently noted as low. Evaluation-based research of methodologies by peer-review to improve measure testing, model improvements, and scoring would be valuable for all assessments. Further, given the large quantity of data required to derive assessments; the use of data-driven analytics would increase study rigor. For example, a semi-supervised machine-learning algorithm applied to publicly-available quality measures for US hospitals provided a novel clustering and visualization method to identify differences in performance that are typically obscured by existing hospital rating systems [52].

Given the variety in types of hospitals examined, it is challenging to provide an apples-to-apples comparison for short-term acute care settings, so classification models to create more homogeneous hospital groupings would be advantageous. Additionally, artificial intelligence methods such as machine learning algorithms could be used to improve model performance, evaluate variables that are used to create hospital grouping classifications, or identify factors that are associated with high-performance.

Strengths and limitations

This review has several strengths. First, this is a novel review that examines assessments targeting a wide range of audiences; while other articles have compared hospital ratings [8, 9, 53, 54], none have objectively evaluated methodologies related to the IOM quality framework. Secondly, a rigorous, scoping methodology was applied in our approach. An exhaustive literature search was conducted for the time limit, including an evaluation of grey literature and web-based content that prioritized sensitivity over specificity, revealing a lack of peer-review for all assessments; however, some assessments were more transparent in their methodology than others. An opportunity exists for critical evaluation, including assessment of risk model performance and information on measure reliability and validity, by external reviewers to support credibility and trustworthiness of hospital performance measures.

Our results should be interpreted in the context of a few limitations. This scoping review search was limited in scope and comprehensiveness because only two years of articles were reviewed; however, we felt this search limit (2017–2019) was necessary in order to assess the current landscape of hospital performance methodologies. Additionally, performance methodologies that examined less than 250 hospitals were excluded to provide more generalizable results and as such, some relevant studies may have been missed. Measure mapping to Donabedian categories and the STEEEP framework had moderate interrater reliability (0.69), but the research team interpreted the measures using IOM definitions. As such, there were instances of disagreement between Donabedian categorizations made by the assessments and our review as well as mapping to STEEEP. Similarly, it was challenging to map measures exactly to AHRQ indicators. Reviewers used a standardized approach for labeling, to be consistent when measure definitions were not exact fits, but the domain and intent of the measure was the same. An additional limitation was the dependence on publicly available information during the chosen time frame; some assessments had removed web content, did not have timely web content updates, or lacked transparency to identify abstracted information and gaps in data. Notably, our US News evaluation was limited to the Procedures and Conditions rating report, but their Best Hospitals Honor Roll includes the Specialty ranking methodology; this approach allowed a more equivalent comparison to the acute care hospitals being examined, as no other assessments evaluated specialty hospitals.


There is a need for the standardization of consensus-derived quality measures that reflect the changing landscape of value-based care and patient-centered healthcare models. While safety and effectiveness were commonly measured quality indicators, there were less frequent uses of efficiency and timeliness, and no direct measures of equity were identified, only adjustments for covariates. Quality measure developers should consider patient-centered outcomes and include efficiency measures to assess cost and operating margins. Their impact on a hospital’s ability to support the expansion of equity-based programs, community-linked initiatives to address SDoH, and health disparities issues that greatly impact health and healthcare should be assessed.

Availability of data and materials

All supporting data has been provided in additional files.



Agency for Healthcare Research and Quality


The Centers for Medicare & Medicaid Services


emergency department


hospital acquired infections


Hospital Consumer Assessment of Healthcare Providers and Systems


International Business Machines


Institute of Medicine


Island Peer Review Organization


length of stay


National Quality Forum


Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews


Safety, Effectiveness, Efficiency, Equitable, and Patient-centered


venous thromboembolism


United States


  1. Hilsenrath P, Eakin C, Fischer K. Price-transparency and cost accounting: challenges for health care organizations in the consumer-driven era. Inquiry. 2015;52:0046958015574981.

  2. Martin AB, Hartman M, Washington B, Catlin A. National Health Care Spending In 2017: Growth Slows To Post-Great Recession Rates; Share Of GDP Stabilizes. Health Aff (Millwood). 2019;38(1):101377hlthaff201805085.

    Article  Google Scholar 

  3. Elliott MN, Beckett MK, Lehrman WG, Cleary P, Cohea CW, Giordano LA, et al. Understanding the role played by Medicare's patient experience points system in hospital reimbursement. Health Aff (Millwood). 2016;35(9):1673–80.

    Article  Google Scholar 

  4. Pomey MP, Ghadiri DP, Karazivan P, Fernandez N, Clavel N. Patients as partners: a qualitative study of patients' engagement in their health care. PLoS One. 2015;10(4):e0122499.

    Article  Google Scholar 

  5. Kahn CN 3rd, Ault T, Potetz L, Walke T, Chambers JH, Burch S. Assessing Medicare's hospital pay-for-performance programs and whether they are achieving their goals. Health Aff (Millwood). 2015;34(8):1281–8.

    Article  Google Scholar 

  6. Baicker K, Chandra A, Skinner JS. Geographic variation in health care and the problem of measuring racial disparities. Perspect Biol Med. 2005;48(1 Suppl):S42–53.

    Article  Google Scholar 

  7. Buchan HA, Duggan A, Hargreaves J, Scott IA, Slawomirski L. Health care variation: time to act. Med J Aust. 2016;205(10):S30–s3.

    PubMed  Google Scholar 

  8. Austin JM, Jha AK, Romano PS, Singer SJ, Vogus TJ, Wachter RM, et al. National hospital ratings systems share few common scores and may generate confusion instead of clarity. Health Aff (Millwood). 2015;34(3):423–30.

    Article  Google Scholar 

  9. Shahian DM, Normand SL, Friedberg MW, Hutter MM, Pronovost PJ. Rating the raters: the inconsistent quality of health care performance measurement. Ann Surg. 2016;264(1):36–8.

    Article  Google Scholar 

  10. Ferrell B, Connor SR, Cordes A, Dahlin CM, Fine PG, Hutton N, et al. The national agenda for quality palliative care: the National Consensus Project and the National Quality Forum. J Pain Symptom Manag. 2007;33(6):737–44.

    Article  Google Scholar 

  11. Nguyen MC, Moffatt-Bruce SD, Van Buren A, Gonsenhauser I, Eiferman DS. Daily review of AHRQ patient safety indicators has important impact on value-based purchasing, reimbursement, and performance scores. Surgery. 2018;163(3):542–6.

    Article  Google Scholar 

  12. Reid PP, Compton WD, Grossman JH, Fanjiang G. Crossing the Quality Chasm; 2005.

    Google Scholar 

  13. Grant MJ, Booth A. A typology of reviews: an analysis of 14 review types and associated methodologies. Health Inf Libr J. 2009;26(2):91–108.

    Article  Google Scholar 

  14. Tricco AC, Lillie E, Zarin W, O'Brien KK, Colquhoun H, Levac D, et al. PRISMA extension for scoping reviews (PRISMA-ScR): checklist and explanation. Ann Intern Med. 2018;169(7):467–73.

    Article  Google Scholar 

  15. McHugh ML. Interrater reliability: the kappa statistic. Biochem Med (Zagreb). 2012;22(3):276–82.

    Article  Google Scholar 

  16. Hamadi H, Apatu E, Loh CA, Farah H, Walker K, Spaulding A. Does level of minority presence and hospital reimbursement policy influence hospital referral region health rankings in the United States. Int J Health Plann Manag. 2019;34(1):e354–e68.

    Article  Google Scholar 

  17. Al-Amin M, Schiaffino MK, Park S, Harman J. Sustained hospital performance on hospital consumer assessment of healthcare providers and systems Survey measures: what are the determinants? J Healthc Manag. 2018;63(1):15–28.

    PubMed  Google Scholar 

  18. Yokoe DS, Avery TR, Platt R, Kleinman K, Huang SS. Ranking hospitals based on Colon surgery and abdominal hysterectomy surgical site infection outcomes: impact of limiting surveillance to the operative hospital. Clin Infect Dis. 2018;67(7):1096–102.

    Article  Google Scholar 

  19. Odisho AY, Etzioni R, Gore JL. Beyond classic risk adjustment: socioeconomic status and hospital performance in urologic oncology surgery. Cancer. 2018;124(16):3372–80.

    Article  Google Scholar 

  20. Fontana MA, Lyman S, Islam W, MacLean CH. When Stars Do Not Align: Overall Hospital Quality Star Ratings and the Volume-Outcome Association. JB JS Open Access. 2019;4(1):e0044.

  21. Lichtman JH, Leifheit EC, Wang Y, Goldstein LB. Hospital quality metrics: "America's best hospitals" and outcomes after ischemic stroke. J Stroke Cerebrovasc Dis. 2019;28(2):430–4.

    Article  Google Scholar 

  22. Emmert M, Meszmer N, Schlesinger M. A cross-sectional study assessing the association between online ratings and clinical quality of care measures for US hospitals: results from an observational study. BMC Health Serv Res. 2018;18(1):82.

    Article  Google Scholar 

  23. Frisch NB, Courtney PM, Darrith B, Della Valle CJ. Do higher-volume hospitals provide better value in revision hip and knee arthroplasty? Bone Joint J. 2017;99-b(12):1611–7.

    Article  CAS  Google Scholar 

  24. Shapiro JM. One-star rating for a five-star program: evaluating 'Hospital Compare'. Public Health. 2017;151:160–8.

    Article  CAS  Google Scholar 

  25. Wang DE, Wadhera RK, Bhatt DL. Association of Rankings with Cardiovascular Outcomes at top-ranked hospitals vs nonranked hospitals in the United States. JAMA Cardiol. 2018;3(12):1222–5.

    Article  Google Scholar 

  26. Consumer Reports [Available from: Accessed 1 Sep 2019.

  27. The Centers for Medicare & Medicaid Services (CMS) Hospital Compare [Available from: Accessed 1 Sep 2019.

  28. Healthgrades [Available from: Accessed 1 Sep 2019.

  29. IBM Watson Health [Available from: Accessed 1 Sep 2019.

  30. Why not the best IPRO [Available from: Accessed 1 Sep 2019.

  31. The Joint Commission [Available from: Accessed 1 Sep 2019.

  32. Leapfrog [Available from: Accessed 1 Sep 2019.

  33. US News and World Report [Available from: Accessed 1 Sep 2019.

  34. Walkey AJ, Shieh MS, Liu VX, Lindenauer PK. Mortality measures to profile hospital performance for patients with septic shock. Crit Care Med. 2018;46(8):1247–54.

    Article  Google Scholar 

  35. Hoyer EH, Padula WV, Brotman DJ, Reid N, Leung C, Lepley D, et al. Patterns of hospital performance on the hospital-wide 30-day readmission metric: is the playing field level? J Gen Intern Med. 2018;33(1):57–64.

    Article  Google Scholar 

  36. Squitieri L, Bozic KJ, Pusic AL. The role of patient-reported outcome measures in value-based payment reform. Value Health. 2017;20(6):834–6.

    Article  Google Scholar 

  37. Porter M. Value-Based Health Care Delivery. Annals of Surgery. 2008;248(4):503–09.

  38. Carter J, Ward C, Wexler D, Donelan K. The association between patient experience factors and likelihood of 30-day readmission: a prospective cohort study. BMJ Qual Saf. 2018;27(9):683–90.

    Article  Google Scholar 

  39. Ly A, Latimer E. Housing first impact on costs and associated cost offsets: a review of the literature. Can J Psychiatr. 2015;60(11):475–87.

    Article  Google Scholar 

  40. Martin SL, Connelly N, Parsons C, Blackstone K. Simply delivered meals: a tale of collaboration. Am J Manag Care. 2018;24(6):301–4.

    PubMed  Google Scholar 

  41. Ramon I, Chattopadhyay SK, Barnett WS, Hahn RA. Early childhood education to promote health equity: a community guide economic review. J Public Health Manag Pract. 2018;24(1):e8–e15.

    Article  Google Scholar 

  42. Dyakova M, Hamelmann C, Bellis MA, Besnier E, Grey CNB, Ashton K, et al. Investment for health and well-being: a review of the social return on investment from public health policies to support implementing the sustainable development goals by building on health 2020. 2017.

    Google Scholar 

  43. DeSalvo KBWY, Harris A, Auerbach J, Koo D, O’Carroll P. A call to action for public health to meet the challenges of the 21st century. Prev Chronic Dis. 2017;14:170017.

  44. National Health Interview Survey: National Center for Health Statistics; [Available from: Accessed 1 Sep 2019.

  45. Behavioral Risk Factor Surveillance System: Centers for DIsease Control and Prevention; [Available from: Accessed 1 Sep 2019.

  46. National Health and Nutrition Examiniation Survey: National Center for Health Statistics [Available from: Accessed 1 Sep 2019.

  47. American Community Survey: United States Census Bureau; [Available from: Accessed 1 Sep 2019.

  48. Corl K, Levy M, Phillips G, Terry K, Friedrich M, Trivedi AN. Racial and ethnic disparities in care following the New York state Sepsis initiative. Health Aff (Millwood). 2019;38(7):1119–26.

    Article  Google Scholar 

  49. Figueroa JF, Zheng J, Orav EJ, Epstein AM, Jha AK. Medicare program associated with narrowing hospital readmission disparities between black and white patients. Health Aff (Millwood). 2018;37(4):654–61.

    Article  Google Scholar 

  50. Bernheim SM, Parzynski CS, Horwitz L, Lin Z, Araas MJ, Ross JS, et al. Accounting for Patients' socioeconomic status does not change hospital readmission rates. Health Aff (Millwood). 2016;35(8):1461–70.

    Article  Google Scholar 

  51. Flanagan C. U.S. hospitals shut at 30-a-year pace, with no end in sight. Accessed 1 Sep 2019.

  52. Downing NS, Cloninger A, Venkatesh AK, Hsieh A, Drye EE, Coifman RR, et al. Describing the performance of U.S. hospitals by applying big data analytics. PLoS One. 2017;12(6):e0179603.

  53. Bardach NS, Asteria-Peñaloza R, Boscardin WJ, Dudley RA. The relationship between commercial website ratings and traditional hospital performance measures in the USA. BMJ Qual Saf. 2013;22(3):194–202.

    Article  Google Scholar 

  54. Hota B, Webb T, Chatrathi A, McAninch E, Lateef O. Disagreement between hospital rating systems: measuring the correlation of multiple benchmarks and developing a quality composite rank. Am J Med Qual. 2020;35(3):222–30.

Download references


We would like to thank Elisabeth Scheufele and Suwei Wang for data abstraction. We appreciate the feedback provided by Sarah Kefayati, Brett South, Gretchen Jackson, Henry Feldman, and Kathleen Vanenkevort during the preparation of the manuscript.


This research study was supported by IBM Watson Health.

Author information

Authors and Affiliations



Conceptualization: KJTC, JG, KR; Formal analysis: KJTC, MM, HH. Methodology: KJTC, MM. Project Administration: KJTC, MM; Supervision: KJTC; Validation: KJTC, MM, HH, EP; Writing – original draft: KJTC, MM. Writing – review and editing: all authors. The author(s) read and approved the final manuscript

Authors’ information

KJ Craig, PhD is an associate chief health officer at the Center for AI, Research, and Evaluation (CARE) within IBM Watson Health. She leads evidence-based practice strategies in the healthcare sector, particularly focusing on the roles of data and technology to improve health and healthcare. She obtained her PhD in Biochemistry and Molecular Biology from Georgetown University School of Medicine and the National Institutes of Health.

Mollie McKillop, PhD is a biomedical informatician at CARE within IBM Watson Health. Mollie has expertise in health care policy, public health, data science, and informatics.

She earned her PhD in Biomedical Informatics and MPH at Columbia University in the City of New York.

Hu Huang, PhD is a data scientist at CARE within IBM Watson Health. He received his Ph.D. in Bioinformatics and Computational Biology from the University of Minnesota, Twin Cities. His research interests lie in statistical machine learning and its applications in biomedical informatics.

Judy George, PhD is a data scientist at CARE within IBM Watson Health. She has worked in hospital quality and performance improvement for over 15 years in private and public sectors for the Agency for Healthcare Research and Quality, Veterans Health Administration, Department of Defense, and the Cleveland Clinic. She completed her PhD in Health Services Research and Masters in Healthcare Management and Policy and post graduate fellowship at the Cleveland Clinic Foundation.

Ekta Punwani, MHA provides leadership, direction, and guidance to health systems and hospitals in the areas of quality and patient safety, operations, performance improvement, and analytics to drive best practice implementation and achieve outcomes. She has served on the Board of Examiners for the National Baldrige Performance Excellence Program for multiple years and is also currently serving as a member of the NQF Measure Feedback Loop Committee.

Kyu Rhee, MD, MPP is vice president and chief health officer of IBM Corporation and IBM Watson Health. He has global responsibilities for transforming health through data, analytics, and AI for providers, payers, governments, and life science companies. Prior to joining IBM, he was Chief Public Health Officer at Health Resources & Services Administration. He served on and led national initiatives related to prevention, quality, and public health.

Corresponding author

Correspondence to Kelly J. Thomas Craig.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

All authors are employed by IBM Corporation. The authors have no conflicts germane to this study.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit The Creative Commons Public Domain Dedication waiver ( applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Thomas Craig, K.J., McKillop, M.M., Huang, H.T. et al. U.S. hospital performance methodologies: a scoping review to identify opportunities for crossing the quality chasm. BMC Health Serv Res 20, 640 (2020).

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: