
Interpretability, credibility, and usability of hospital-specific template matching versus regression-based hospital performance assessments: a multiple methods study


Background
Hospital-specific template matching (HS-TM) is a newer method of hospital performance assessment.

Objective
To assess the interpretability, credibility, and usability of HS-TM-based vs. regression-based performance assessments.

Research design

We surveyed hospital leaders (January-May 2021) and completed follow-up semi-structured interviews. Surveys included four hypothetical performance assessment vignettes, with method (HS-TM, regression) and hospital mortality randomized.

Subjects
Nationwide Veterans Affairs Chiefs of Staff, Medicine, and Hospital Medicine.

Measures
Correct interpretation; self-rated confidence in interpretation; and self-rated trust in assessment (via survey). Concerns about credibility and main uses (via thematic analysis of interview transcripts).

Results
In total, 84 participants completed 295 survey vignettes. Respondents correctly interpreted 81.8% of HS-TM vs. 56.5% of regression assessments, p < 0.001. Respondents “trusted the results” for 70.9% of HS-TM vs. 58.2% of regression assessments, p = 0.03. Nine concerns about credibility were identified: inadequate capture of case-mix and/or illness severity; inability to account for specialized programs (e.g., transplant center); comparison to geographically disparate hospitals; equating mortality with quality; lack of criterion standards; low power; comparison to dissimilar hospitals; generation of rankings; and lack of transparency. Five concerns were equally relevant to both methods, one more pertinent to HS-TM, and three more pertinent to regression. Assessments were mainly used to trigger further quality evaluation (a “check oil light”) and motivate behavior change.

Conclusions
HS-TM-based performance assessments were more interpretable and more credible to VA hospital leaders than regression-based assessments. However, leaders had a similar set of concerns related to credibility for both methods and felt both were best used as a screen for further evaluation.


Background
Benchmarking hospital performance is a cornerstone of hospital quality assessment [1]. However, differences in patient case-mix and illness severity must be accounted for in order to yield fair cross-hospital comparisons [1,2,3]. The most common approach to adjust for patient characteristics is to use regression models [3], but this approach has at least two key limitations. First, clinicians frequently question whether differences in patient populations have been sufficiently accounted for in regression models [4, 5], and this concern may limit the ability of regression-based performance assessments to drive positive change. Second, estimates from regression models are used to produce a standardized mortality ratio, which is a form of indirect standardization that compares an index hospital not directly to other hospitals, but to the other hospitals only if they were to admit hypothetical populations of patients similar to the index hospital [4]. Thus, no hospital is being judged against real patient care outcomes at other hospitals. As a result of these limitations, the National Academy of Medicine has recognized the need for greater transparency and interpretability of hospital benchmarking systems and called for dedicated research to improve the science of hospital performance assessment [6, 7].
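The standardized mortality ratio produced by indirect standardization can be sketched in a few lines. All numbers here are hypothetical, and the single predicted-risk column stands in for a full regression model:

```python
# Sketch of indirect standardization (hypothetical data, not the VA model).
# A risk model fit across all hospitals predicts each patient's probability
# of death; the standardized mortality ratio (SMR) is observed / expected.

# Each tuple: (died, predicted_risk_from_regression_model)
index_hospital_patients = [
    (1, 0.30), (0, 0.10), (0, 0.05), (1, 0.40), (0, 0.15),
]

observed = sum(died for died, _ in index_hospital_patients)
expected = sum(risk for _, risk in index_hospital_patients)

# SMR > 1 suggests more deaths than the model expects for this case-mix
smr = observed / expected
print(round(smr, 2))
```

Note that the expected count is a model-based counterfactual, which is why no hospital is directly compared to real outcomes at any other hospital.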

Hospital-specific template matching (HS-TM) was proposed by Silber et al. [4] as a fairer and more transparent approach for assessing hospital performance. In this method, a representative sample of hospitalizations is selected from the hospital under evaluation, and the outcomes of the sampled hospitalizations are compared to outcomes of matched hospitalizations from a set of comparator hospitals with sufficiently similar patient case-mix to the hospital under evaluation [4, 8]. The performance assessment is thus customized for each hospital, providing a potentially fairer assessment than regression-based performance assessment [4, 8]. Furthermore, because the quality of matching can be readily reported, HS-TM provides greater transparency than regression [4]. In prior work, we have shown that HS-TM is feasible for hospital performance assessment in the Nationwide Veterans Affairs (VA) Healthcare system [8]. Despite the case-mix variation [9], VA hospitals could each be matched to enough comparator hospitals to support performance assessment across the entire system [8].
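A minimal sketch of the matching idea follows. The severity scores are hypothetical, and a greedy nearest-neighbor rule stands in for the actual matching algorithm of Silber et al.:

```python
# Toy sketch of hospital-specific template matching (all data hypothetical).
# A "template" sample from the index hospital is matched one-to-one, without
# replacement, to the closest hospitalization (by severity score) at a
# comparator hospital; outcomes are then compared between the matched sets.

template = [  # (severity_score, died) for the index hospital's sample
    (0.2, 0), (0.5, 1), (0.8, 0),
]
comparator_pool = [  # (severity_score, died) at a comparator hospital
    (0.15, 0), (0.48, 0), (0.55, 1), (0.85, 1), (0.3, 0),
]

available = list(comparator_pool)
matched = []
for severity, _ in template:
    # greedy nearest-neighbor match on severity, without replacement
    best = min(available, key=lambda c: abs(c[0] - severity))
    available.remove(best)
    matched.append(best)

# compare observed mortality in the template vs. its matched comparators
index_rate = sum(d for _, d in template) / len(template)
matched_rate = sum(d for _, d in matched) / len(matched)
print(index_rate, matched_rate)
```

Because the comparator sets are real matched hospitalizations, the quality of each match (here, the severity-score distance) can be reported directly, which is the transparency advantage noted above.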

The statistical advantages and disadvantages of these two approaches have been explored in prior studies [4, 8]. However, while HS-TM has theoretical benefits over regression-based performance assessment and is feasible in the VA healthcare system [4, 8], it is unclear whether HS-TM is more interpretable, more credible, or more usable to end-users than the traditional regression-based performance assessments. Thus, in this study, we assessed the interpretability, credibility, and usability of HS-TM-based versus regression-based performance assessments among VA hospital leaders. To do this, we generated hypothetical hospital performance assessments using real VA patient data [8], then used surveys and semi-structured interviews of VA hospital leaders to assess the utility of HS-TM-based versus regression-based performance assessment. Interpretability and credibility were assessed quantitatively by survey. Actionability and specific concerns about credibility were assessed via semi-structured interviews.


Methods
The VA healthcare system is a large U.S. national integrated healthcare system for Veterans with approximately 130 hospitals, ranging from small rural hospitals to tertiary referral centers. VA has been a leader in the development and implementation of hospital performance assessment [2, 10, 11]. It was among the first healthcare systems to have an electronic health record and to measure and report risk-adjusted mortality [2, 10, 11]. VA mortality models are updated annually [12], and risk-adjusted 30-day mortality is a key outcome metric included in quarterly hospital performance assessments [12].

Study design

We used a multiple methods approach to assess the interpretability, credibility, and usability of HS-TM-based versus regression-based hospital performance assessments among end-users charged with maintaining and improving the quality of VA care. We first surveyed VA hospital leaders (Chiefs of Staff, Chiefs of Medicine, and Chiefs of Hospital Medicine) to assess their ability to correctly interpret hospital performance assessments of risk-adjusted 30-day mortality and to evaluate their confidence in interpretation and trust in the assessment. (While no single metric is sufficient to evaluate hospital quality, we selected 30-day mortality as the outcome of interest in this study because of its importance to performance assessment in the VA system as well as in other healthcare systems.)

Second, we completed semi-structured interviews with a subset of Chiefs of Medicine to further explore their concerns regarding credibility and the uses of HS-TM-based versus regression-based performance assessments. eTable 1 summarizes the target population, enrollment, research tools, sample size, and analysis methods for the survey and the interviews. The study was approved by the Ann Arbor VA Institutional Review Board with a waiver of written documentation of informed consent for the survey portion. All methods were performed in accordance with relevant guidelines and regulations.

Randomized survey

Chiefs of Staff, Chiefs of Medicine, and Chiefs of Hospital Medicine at approximately 130 nationwide VA hospitals were invited via group emails to complete an anonymous Qualtrics survey (Qualtrics, Provo, UT) from January through May 2021. Invitation emails were sent by VA leaders (e.g., VA Ann Arbor Chief of Staff) to promote participation, with reminder and final invitation emails sent by study staff. No compensation was provided for survey completion since we anticipated surveys would be completed during respondents’ VA tour of duty.

The full survey is provided in Additional Appendix 1; key aspects of the survey are presented in Table 1. The survey vignettes were developed using 2017 VA hospitalization data [8]. The survey language was adapted from a prior survey assessing the presentation of quantitative information [13] and refined iteratively, incorporating feedback from 5 study co-investigators, each of whom participated in a 1-h cognitive interview. The survey was then piloted by 7 MD-trained and 1 PhD-trained colleagues to determine the median time for completion (11 min) before deploying to hospital leaders.

Table 1 Six items included in each survey vignette

Each survey included four hypothetical performance assessments (for four hypothetical hospitals)—two using HS-TM and two using regression. Each survey included hypothetical hospitals across a range of 30-day mortality (one above-average, one high-average, one average, and one below-average risk-adjusted mortality). The order of performance methods (HS-TM versus regression) and mortality category (above-average, high-average, average, and below-average) were randomized. For each vignette, participants received a description of the performance assessment method, a table showing the characteristics of hospitalizations included in the performance assessment, and a figure displaying outcomes of their hospital relative to their comparator hospitals. Participants were asked to assess the hospital’s performance relative to their comparators (above-average, average (including high-average), and below average), then rate their confidence in interpretation and trust in the performance assessment on a Likert scale. At the end of the survey, participants were asked about their overall impressions of HS-TM versus regression-based performance assessment methods.
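The vignette assignment described above can be sketched as follows (a hypothetical helper, not the actual Qualtrics survey logic):

```python
import random

# Sketch of the randomized vignette design (hypothetical implementation):
# each survey pairs two HS-TM and two regression vignettes with the four
# mortality categories, with both pairings and order randomized.
rng = random.Random()  # fresh randomization per respondent

methods = ["HS-TM", "HS-TM", "regression", "regression"]
categories = ["above-average", "high-average", "average", "below-average"]

rng.shuffle(methods)
rng.shuffle(categories)

# four (method, mortality_category) vignettes for one respondent
survey = list(zip(methods, categories))
```

Randomizing both factors lets method effects on interpretation be separated from mortality-category effects in the analysis.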

Survey results are presented using standard descriptive statistics and chi-square tests to compare results of HS-TM vs regression-based vignettes. Second, a series of logistic regression models was fit to measure the association between the performance assessment approach (HS-TM vs regression) and correct interpretation. In the serial models, we additionally adjusted for the mortality category, the respondent’s self-rated statistical knowledge, and the respondent’s confidence in their response. The models included a random intercept for the respondent to account for repeated measures.
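The chi-square comparison can be reproduced with scipy from counts back-calculated from the reported percentages (81.8% of 148 HS-TM and 56.5% of 147 regression vignettes correctly interpreted), so treat the exact counts as illustrative:

```python
from scipy.stats import chi2_contingency

# Correct vs incorrect interpretations by method; counts are inferred from
# the reported percentages, not taken from the study dataset.
table = [
    [121, 148 - 121],  # HS-TM: correct, incorrect (121/148 = 81.8%)
    [83, 147 - 83],    # regression: correct, incorrect (83/147 = 56.5%)
]

chi2, p, dof, expected = chi2_contingency(table)
print(p < 0.001)  # consistent with the reported p < 0.001
```

Note this simple test ignores the clustering of vignettes within respondents, which is why the serial models above add a respondent-level random intercept.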

Semi-structured interviews

At the end of the survey, Chiefs of Medicine were asked to provide their contact information if amenable to participating in a confidential follow-up semi-structured interview. We invited only Chiefs of Medicine so that we would have just one interview participant per hospital. After completing written informed consent, Chiefs who expressed interest were invited for a 60-min semi-structured interview via video conference. The full interview guide is provided in Additional Appendix 2. During the interview, the participants were asked about two vignettes from their survey (one of each method), using an interview guide to elicit perceptions of credibility and usability. Additionally, we asked about interpretability, suggested improvements, and general impressions about performance evaluation. The interview guide was piloted with two physician colleagues and refined to improve clarity prior to use in the study.

Nine Chiefs of Medicine were interviewed via video conference. Interviews were audio-recorded, professionally transcribed, and redacted of identifying information. The sample size was guided by the criteria of “information power” [14]. We required fewer participants because the goal of the interviews was narrow; the participants were highly selected (limited to key leaders directly involved in evaluating hospital quality) [15, 16]; the feedback was anticipated to relate to known methodological limitations [3, 5,6,7]; and the interviews had high quality dialogue since they were conducted by an experienced, PhD-trained qualitative analyst (LT) with at least one quantitative expert (BMM and/or HCP) present to answer technical questions and probe responses as needed.

Interview transcripts were analyzed by LT, BMM, and HCP using content analysis [17]. We used preliminary codes (interpretability, credibility, usability, suggested improvements) based on the interview guide and allowed additional subcodes to emerge from the data. Transcripts were coded independently, then reconciled through discussion. Data were manually entered into separate code reports, which were reviewed and discussed as a team to finalize subcodes, summarize the key findings, and identify representative quotes.

Results
Eighty-four VA hospital leaders completed at least one survey vignette (a response rate of approximately 21.5%), including 70 (83.3%) who completed all four vignettes and provided demographic data. Respondents included 17 (20.2%) Chiefs of Staff, 31 (36.9%) Chiefs of Medicine, and 36 (42.9%) Chiefs of Hospital Medicine. Descriptive characteristics of the respondents are presented in eTable 2. Respondents were 65.7% male; 52.9% had been in their current role for 0–4 years, while 20.0% had been in their current role for ≥ 10 years. Length of time practicing medicine varied: 5.8% (0–9 years), 26.1% (10–19 years), 31.9% (20–29 years), and 36.2% (30 years or more). The majority (77.1%) rated their statistical knowledge as “Good” or “Fair”.

Respondents completed 148 vignettes using HS-TM, in which the hypothetical hospital under evaluation had below-average mortality (37, 25%), average mortality (39, 26.4%), high-average mortality (36, 24.3%), and above-average mortality (36, 24.3%). Respondents completed 147 vignettes using regression, in which the hypothetical hospital under evaluation had below-average mortality (37, 25.2%), average mortality (38, 25.9%), high-average mortality (39, 26.5%), and above-average mortality (33, 22.4%).


Respondents interpreted 81.8% of HS-TM vignettes vs. 56.5% of regression vignettes correctly, p < 0.001 (Fig. 1). Survey respondents determined the hospital’s performance correctly more often when the hospital’s mortality was above or below average (compared to being no different from average). For example, among HS-TM vignettes, respondents correctly interpreted 97.3% (36/37) of below-average mortality and 94.4% (34/36) of above-average mortality vignettes, compared to 74.4% (29/39) of average and 58.3% (21/36) of high-average HS-TM mortality vignettes (eTable 3, eFigure 1). For regression vignettes, respondents correctly interpreted 89.2% (33/37) of below-average mortality and 87.9% (29/33) of above-average mortality vignettes, compared to only 31.6% (12/38) of average mortality and 23.1% (9/39) of high-average mortality vignettes (eTable 3, eFigure 1). After adjusting for hospital mortality, the association of HS-TM with correct interpretation was even stronger (Table 2) and persisted after additionally adjusting for the respondent’s self-rated statistical knowledge and confidence in their interpretation (Table 2). Neither self-rated statistical knowledge nor confidence were associated with correct interpretation (Table 2). Overall, these analyses show that HS-TM-based performance assessments were more interpretable to the survey respondents than the regression-based assessments.

Fig. 1

Accuracy, Confidence and Trust in the HS-TM-based vs Regression-Based Performance Assessments. Accuracy indicates whether the participant correctly classified the hospital as lower than average, average, or higher than average mortality. Confidence indicates how confident they were in their rating: Highly Confident, Moderately Confident, Slightly Confident, or Not at all Confident. Confidence is then dichotomized into Not Confident (Not at all Confident, Slightly Confident) or Confident (Moderately Confident, Highly Confident) and the p-value is the significance level of the difference in the percent Confident for HS-TM versus regression. Trust indicates their level of agreement with the following statement: I trust that the results of this performance report accurately reflect the mortality at my hospital relative to other hospitals. (Strongly Agree, Agree, Somewhat Agree, Neither Agree nor Disagree, Somewhat Disagree, Disagree, Strongly Disagree). The p-value indicates the significance level of the difference in the percent that trust the rating (Strongly Agree, Agree, or Somewhat Agree) using HS-TM versus regression

Table 2 Serial logistic regression models assessing the association between approach (HS-TM vs regression) and correct interpretation of performance assessment vignettes


Survey respondents reported that they “trust that the results of the performance report accurately reflected the mortality at [their] hospital relative to other hospitals” in 70.9% of HS-TM vignettes versus 59.2% of regression vignettes, p = 0.03 for the difference (Fig. 1). Results stratified by mortality category are shown in eTable 3 and eFigure 2.

While survey respondents trusted most performance assessments (70.9% of HS-TM and 59.2% of regression vignettes), the interview participants voiced many concerns about the credibility of performance assessments—most of which were pertinent to both HS-TM and regression. The concerns, presented in Table 3, related to the following domains: (1) the inability to fully or correctly capture case-mix and illness severity from the electronic health record; (2) the inability to account for special hospital programs or referral centers (e.g., an organ transplant center where many patients with end-stage disease may be evaluated but not ultimately eligible for transplantation); (3) the comparison to hospitals elsewhere in the country, as opposed to VA or non-VA hospitals in the same geographic region; (4) the use of mortality as a measure of quality; (5) lack of a criterion or reference standard for acceptable or good performance; (6) small sample sizes and/or low event rates, such that assessments are under-powered and unstable; (7) the comparison to dissimilar hospitals (e.g., comparison of an urban referral hospital to a smaller rural hospital); (8) the generation of hospital rankings, particularly when hospitals are tightly clustered such that differences in rank do not necessarily reflect differences in outcomes; (9) the lack of transparency of performance assessments. Concerns 1–5 were equally relevant to both approaches. Concerns about small sample size were more pertinent to HS-TM, while concerns about ranking [8], lack of transparency, and comparison of dissimilar hospitals were more pertinent to regression. A fuller summary of interview responses related to fairness and credibility is presented in Additional Appendix 3.

Table 3 Concerns about the credibility of hospital performance assessments identified through thematic analysis of interview transcripts


Survey respondents agreed with the statement “Based on this performance report… I would convene a committee to determine where change is necessary to improve mortality at my hospital”, for 88.9% of HS-TM vignettes with above-average mortality, compared to 78.7% of regression vignettes with above-average mortality (p = 0.25 for difference)—suggesting similar actionability of HS-TM vs regression-based assessments.

Interview participants described two primary uses of performance assessments: (1) to trigger a deeper dive and (2) to motivate behavior change (Table 4, Additional Appendix 4). Interview participants reported that they would use both HS-TM-based and regression-based performance reports similarly, but several expressed that HS-TM may be more helpful for identifying a true problem, while the ranking generated by regression-based performance assessments may be more helpful for motivating behavior change (eTable 4).

Table 4 Usability of Performance Assessments

A common sentiment among interview participants was that “in and of itself, the data doesn't say you're good, bad, or indifferent”. Rather, above-average mortality was consistently viewed as a trigger for further evaluation, described by participants as “a flag or an indicator for something that that we might need to respond to”, a “trigger for a deeper dive”, a “red flag”, or a “check oil light”. Most interview participants felt the deeper dive should occur to confirm and understand the potential issues raised in performance assessment before sending it to clinical staff. As a first step, interview participants would explore whether deaths were occurring on a specific service (e.g., medical vs. surgical) or in subgroups of patients (e.g., ICU vs non-ICU), or even complete chart reviews of all deaths. They would consider unique circumstances related to their patient population or any specific care-related practices. In short, they would evaluate who died, why they died, and how they died to assess whether greater-than-average mortality was a one-time occurrence, a reflection of natural variation over time, or a marker of a broader problem. All interview participants felt it was inappropriate to use performance assessments for punishment or reward.

Besides serving as a trigger for a deeper dive, multiple interview participants reported that greater-than-average mortality can serve as strong motivation to improve processes and help one “get on it with a sense of urgency” and “impress upon certain stakeholders that this is indeed something that we need to devote some energy to… particularly if we find that there is a certain service line that seems to be over-represented in our mortality”. Finally, participants also noted that assessments indicating a mortality at or below the mean should not trigger complacency. Rather, hospitals should always look for opportunities to improve, although there is less urgency to do so when performance assessments suggest average or below average mortality.

Suggestions for improvement

Suggested improvements are presented in Additional Appendix 5. The most common suggestions were to: (1) use criterion standards rather than norm-reference (particularly since non-VA hospitals are not used to define the norm-reference) and (2) limit comparisons to similar hospitals, as defined by facility characteristics or geographic location.

Overall utility

When asked which method would be “more helpful for understanding mortality at your hospital relative to other hospitals”, most (72.5%, 50/69) survey respondents preferred HS-TM. Likewise, when asked which method would be “more helpful for driving change to improve care at your hospital”, most preferred HS-TM (78.3%, 54/69). Regarding distinctive features of these methods, 88.4% responded it was more important to be compared to hospitals treating similar patients (as in HS-TM) than to have all hospitalizations included in the performance assessment (as in regression). During semi-structured interviews, several participants expressed greater trust in HS-TM assessments, but participants nonetheless felt that—regardless of the method—they would primarily use performance assessments as a screen for doing a deeper dive. A summary of comments comparing the utility of HS-TM to regression is presented in eTable 4.

Discussion
Hospital performance assessment is a key tool for monitoring the quality of hospital care and incentivizing performance improvement. However, while the breadth and complexity of performance assessment have grown over the past few decades, there has been little assessment of the interpretability, credibility, or usability of performance assessments among the end-users charged with maintaining and improving the quality of hospital care [6]. Indeed, a National Academy of Medicine expert panel called for improving the robustness of performance assessment systems, including setting thresholds for interpretability such that assessments are understandable and usable by those with limited statistical knowledge and time [6].

We found that hospital performance assessments developed using hospital-specific template matching were more interpretable and more credible to VA hospital leaders than performance assessments developed using regression. The greater interpretability of hospital-specific template matching was robust to sensitivity analyses. Across a series of models including adjustment for additional factors including the mortality category of the hospital under evaluation, the respondent’s self-rated statistical knowledge, and the respondent’s self-rated confidence in their interpretation, HS-TM remained associated with increased likelihood of correct interpretation.

A second finding of this study was that hospital performance assessment served two key purposes in the perspective of VA hospital leaders: a trigger for further quality investigation and a tool for motivating behavior change. Among interview participants, HS-TM was generally considered to be a more reliable trigger, while hospital rankings generated by regression were considered more helpful for motivating behavior change. As a result of these differing strengths, HS-TM could be considered as a supplement or adjunctive method rather than a replacement for standard regression-based assessments. Importantly, the Chiefs of Medicine identified many potential threats to the credibility of both methods, and universally felt that further evaluation of the accuracy of performance assessments was needed before passing along the findings to front-line clinical staff.

This study extends the findings of prior studies of HS-TM. We previously showed that HS-TM was potentially feasible for use in the diverse VA healthcare system [8]. Each hospital could be matched to a sufficient number of comparison hospitals (median 38 hospitals) to detect standardized mortality ratios greater than 2.0 [8]. Here, we show that assessments generated via HS-TM are more interpretable and credible to VA hospital leaders. Our study also builds on limited prior work assessing clinician end-users’ ability to correctly interpret performance assessments. In a prior study examining clinicians’ interpretation of central line-associated bloodstream infection (CLABSI) quality data, clinicians answered questions testing increasingly difficult domains of interpretability: basic numeracy, risk-adjustment numeracy, and finally risk-adjustment interpretation [18]. Clinicians answered 82% of basic numeracy questions correctly, versus 70% of risk-adjustment numeracy and only 43% of risk-adjustment interpretation questions, underscoring the limited interpretability of risk-adjusted performance assessment among end-users [18]. Also concerning, respondents who accurately interpreted the data were more likely to view it as unreliable [19]. Our finding that HS-TM (which uses matching rather than regression adjustment to account for case-mix differences) was more interpretable than regression is consistent with this prior study showing limited interpretability of risk-adjusted data. However, reassuringly, HS-TM was not only associated with greater interpretability, but also with greater credibility.

Finally, our study is consistent with the broader literature on quantitative data interpretation. End-users have better comprehension and make better decisions when information is presented in a way that is easier to process and understand [20]. And, while the simplicity of data presentation is particularly important for individuals with low numeracy, even high numeracy individuals perform better when presented with simpler information. Indeed, our study showed no association between self-rated statistical knowledge and correct interpretation of the performance assessment vignettes.

Our study should be interpreted in the context of several limitations. First, our survey response rate was approximately 21.5%, and it is possible that survey respondents may not be representative of VA leaders at large. However, our survey sample population was highly selected and relatively homogenous (limited to Chiefs of Staff, Chiefs of Medicine, and Chiefs of Hospital Medicine), which may mitigate the risk of bias due to the lower response rate. Second, we interviewed leaders within the VA healthcare system only, so it is unclear whether hospital leaders in other healthcare systems or countries would have similar reactions to HS-TM vs regression. However, the VA is a large and diverse system, with both small rural hospitals and tertiary referral centers [9]; interview participants represented a range of hospital types. One key benefit of HS-TM is the ability to personalize the assessment to diverse hospitals; a second is its improved interpretability. In a healthcare system or country where similar patient populations are treated across all hospitals, the benefits of a personalized assessment may be less important. However, such homogeneity is rare. Third, survey respondents were provided hypothetical vignettes, and it is possible that impressions of credibility may differ if HS-TM were used in practice. We used hypothetical vignettes to randomize the hospital mortality category and differentiate the impact of the method vs mortality category on impressions of credibility, which would not have been possible using each respondent’s own hospital data. Fourth, we assessed only one quality outcome, mortality. Hospital quality is a complex and multi-faceted construct [21] which cannot be summarized by hospital mortality alone, or by any single metric.
However, mortality is a key performance indicator, and the methods of HS-TM and regression can be applied to other outcomes such that the findings of improved interpretability and credibility are not necessarily specific to mortality only.

Conclusions
In this multiple methods study of VA hospital leaders, HS-TM-based performance assessments were more interpretable and more credible than regression-based assessments. However, both types of assessments had several threats to credibility and would be used for similar purposes by hospital leaders. The differing interpretability and credibility across performance assessment methods underscores the importance of evaluating, understanding, and optimizing interpretability and credibility of performance assessments among end-users.

Availability of data and materials

The datasets generated and/or analysed during the current study are not publicly available due to the specifications of our IRB approval; however, anonymized data are available from the corresponding author on reasonable request.

References
  1. Escobar GJ, Greene JD, Scheirer P, Gardner MN, Draper D, Kipnis P. Risk-adjusting hospital inpatient mortality using automated inpatient, outpatient, and laboratory databases. Med Care. 2008;46(3):232–9.


  2. Render ML, Kim HM, Welsh DE, et al. Automated intensive care unit risk adjustment: results from a National Veterans Affairs study. Crit Care Med. 2003;31(6):1638–46.


  3. Silber JH, Rosenbaum PR, Ross RN, et al. Template matching for auditing hospital cost and quality. Health Serv Res. 2014;49:1446–74.


  4. Silber JH, Rosenbaum PR, Ross RN, et al. A hospital-specific template for benchmarking its cost and quality. Health Serv Res. 2014;49:1475–97.


  5. Iezzoni LI. The risks of risk adjustment. JAMA. 1997;278:1600–7.


  6. Austin JM, McGlynn EA, Pronovost PJ. Fostering transparency in outcomes, quality, safety, and costs. JAMA. 2016;316:1661–2.


  7. Pronovost PJ, Austin JM, Cassel CK, et al. Fostering transparency in outcomes, quality, safety, and costs: a vital direction for health and health care. National Academy of Medicine; 2016.

  8. Vincent BM, Molling D, Escobar GJ, et al. Hospital-specific template matching for benchmarking performance in a diverse multihospital system. Med Care. 2021;59(12):1090–8.

    Article  PubMed  Google Scholar 

  9. Molling D, Vincent BM, Wiitala WL, et al. Developing a template matching algorithm for benchmarking hospital performance in a diverse, integrated healthcare system. Medicine (Baltimore). 2020;99(24):e20385.

    Article  PubMed  PubMed Central  Google Scholar 

  10. Fihn SD, Francis J, Clancy C, et al. Insights from advanced analytics at the Veterans Health Administration. Health Aff (Millwood). 2014;33:1203–11.

    Article  Google Scholar 

  11. Render ML, Deddens J, Freyberg R, et al. Veterans Affairs intensive care unit risk adjustment model: validation, updating, recalibration. Crit Care Med. 2008;36(4):1031–42.

    Article  PubMed  Google Scholar 

  12. Prescott HC, Kadel RP, Eyman JR, et al. Risk-Adjusting Mortality in the Nationwide Veterans Affairs Healthcare System. J Gen Intern Med. Jan 13 2022;doi:

  13. Hawley ST, Zikmund-Fisher B, Ubel P, Jancovic A, Lucas T, Fagerlin A. The impact of the format of graphical presentation on health-related knowledge and treatment choices. Patient Educ Couns. 2008;73(3):448–55.

    Article  PubMed  Google Scholar 

  14. Malterud K, Siersma VD, Guassora AD. Sample size in qualitative interview studies: guided by information power. Qual Health Res. 2016;26(13):1753–60.

    Article  PubMed  Google Scholar 

  15. Marshall MN. Sampling for qualitative research. Fam Pract. 1996;13(6):522–5.

    Article  CAS  PubMed  Google Scholar 

  16. Hamilton AB, Finley EP. Qualitative methods in implementation research: an introduction. Psychiatry Res. 2019;280: 112516.

    Article  PubMed  PubMed Central  Google Scholar 

  17. Hsieh HF, Shannon SE. Three approaches to qualitative content analysis. Qual Health Res. 2005;15(9):1277–88.

    Article  PubMed  Google Scholar 

  18. Govindan S, Chopra V, Iwashyna TJ. Do Clinicians Understand Quality Metric Data? An Evaluation in a Twitter-Derived Sample. J Hosp Med. 2017;12(1):18–22.

    Article  PubMed  Google Scholar 

  19. Govindan S, Wallace B, Iwashyna TJ, Chopra V. Do Experts Understand Performance Measures? A Mixed-Methods Study of Infection Preventionists. Infect Control Hosp Epidemiol. 2018;39(1):71–6.

    Article  PubMed  Google Scholar 

  20. Peters E, Klein W, Kaufman A, Meilleur L, Dixon A. More Is Not Always Better: Intuitions About Effective Public Policy Can Lead to Unintended Consequences. Soc Issues Policy Rev. 2013;7(1):114–48.

    Article  PubMed  Google Scholar 

  21. Carini E, Gabutti I, Frisicale EM, et al. Assessing hospital performance indicators. what dimensions? evidence from an umbrella review. BMC Health Serv Res. 2020;20(1):1038.

    Article  PubMed  PubMed Central  Google Scholar 

Download references


Acknowledgements

We would like to thank the following individuals: Theodore Iwashyna, Thomas Valley, Andrew Admon, Elizabeth Viglianti, Michael Sjoding, and John Donnelly for piloting the survey; Elizabeth Viglianti and Max Wayne for piloting our semi-structured interview guide; Michael Shwartz, Marthe Moseley (former director of the VA Inpatient Evaluation Center), and Joseph Francis for providing insight throughout the study; and Mark Hausman, Sanjay Saint, and Melver Anderson for emailing the initial survey invitation to Chiefs of Staff, Medicine, and Hospital Medicine, respectively.


Funding

This work was supported by VA IIR 17–2019 (HCP) from the United States (U.S.) Department of Veterans Affairs, Health Services Research and Development Service. This manuscript does not represent the views of the Department of Veterans Affairs or the U.S. government.

Author information

Authors and Affiliations



Contributions

HCP, BMM, TPH, JBS, AKR, and AMR contributed to the design and conception of the study. HCP, LT, and CKH contributed to data acquisition. All authors interpreted the data and contributed to intellectual content development. HCP, BMM, and LT drafted the manuscript text. BMM and HCP prepared the tables and figures. All authors reviewed the manuscript, provided critical feedback, and approved the manuscript.

Corresponding author

Correspondence to Hallie C. Prescott.

Ethics declarations

Ethics approval and consent to participate

This study was approved by the Ann Arbor VA IRB. Written informed consent was obtained from interview participants; there was a waiver of written documentation of informed consent for survey respondents.

Consent for publication


Competing interests

The authors report no financial conflicts of interest. HCP and AKR serve on the methods advisory committee to the VA's Inpatient Evaluation Center.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article


Cite this article

McGrath, B.M., Takamine, L., Hogan, C.K. et al. Interpretability, credibility, and usability of hospital-specific template matching versus regression-based hospital performance assessments; a multiple methods study. BMC Health Serv Res 22, 739 (2022).
