### Setting

For this model analysis, the DHD group gave formal permission to use an anonymised LMR dataset from 89 Dutch hospitals covering the years 2005 to 2009. Nineteen hospitals were excluded from the calculations because the quality of their patient registration data was insufficient, leaving 70 hospitals.

This study involved no experimental research on human or animal subjects.

### The HSMR model used in this study

We performed HSMR calculations using two models, referred to as model 1 and model 2.

For *model 1*, we used the Dr Foster model [1] as applied in the Netherlands in 2010. This Dutch model included clinical admissions only (no day cases) and counted only *in-hospital* deaths. The Dutch and UK versions of the model differ in several respects. The Dutch model used 50 Clinical Classifications Software (CCS) groups based on ICD-9 coding, whereas the UK model used 56 CCS groups based on ICD-10 coding, 42 of which were the same as in the Dutch model. In addition, the UK model adjusted for palliative care and for the number of previous emergency admissions within one year; the Dutch model adjusted for neither.

### HSMR in brief

The HSMR ('hospital standardised mortality ratio') is the ratio of the observed number of in-hospital deaths to the predicted number of deaths, where the prediction is based on the hospital's patient casemix compared with the national average. The ratio is multiplied by 100, so that the national average equals 100: a value above 100 indicates higher mortality than average, a value below 100 lower mortality. When the calculation is applied to one of the 50 diagnostic groups, the result is called the SMR ('standardised mortality ratio').
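The ratio can be illustrated with a minimal sketch. It assumes that each admission already carries a predicted probability of death from the casemix model, so that the expected number of deaths is the sum of those probabilities; the `Admission` record and the `smr` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Admission:
    hospital: str
    ccs_group: int   # hypothetical CCS diagnostic group identifier
    died: bool       # observed in-hospital death
    p_death: float   # predicted probability of death from the casemix model


def smr(admissions, hospital, ccs_group=None):
    """Return 100 * observed / expected deaths for one hospital.

    With ccs_group=None all diagnostic groups are pooled (HSMR);
    otherwise only admissions in that group are used (SMR).
    """
    selected = [a for a in admissions
                if a.hospital == hospital
                and (ccs_group is None or a.ccs_group == ccs_group)]
    observed = sum(a.died for a in selected)      # count of deaths
    expected = sum(a.p_death for a in selected)   # sum of predicted risks
    return 100.0 * observed / expected


# Toy data: hospital "X" has 3 deaths where the model expected 2.
toy = [
    Admission("X", 1, True, 0.5),
    Admission("X", 1, True, 0.5),
    Admission("X", 2, True, 0.5),
    Admission("X", 2, False, 0.5),
]
print(smr(toy, "X"))     # HSMR -> 150.0
print(smr(toy, "X", 1))  # SMR for group 1 -> 200.0
```

In this toy example hospital X scores above 100, i.e. more deaths than its casemix predicts.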

The Dutch HSMR model of 2010 adjusts for the following **casemix properties**: year of discharge, sex, age at admission, admission type, comorbidity (Charlson index), social deprivation, month of admission, source of referral, diagnostic group and, in part, casemix at the level of the primary diagnosis.

Each **diagnostic group** is composed of a number of underlying diseases, i.e. ICD-9 codes (International Classification of Diseases, Ninth Revision), as determined by the Clinical Classifications Software (CCS). This tool clusters the many ICD-9 diagnoses into a manageable number of meaningful categories [12], referred to as 'CCS diagnostic groups'. The selected CCS diagnostic groups have relatively high mortality and together account for over 80% of all in-hospital deaths.
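The grouping step and the coverage criterion can be sketched as follows. The lookup table below is hypothetical and heavily abbreviated: the ICD-9 codes are examples and the group labels are illustrative, whereas a real analysis would use the full mapping distributed with the CCS tool. `death_coverage` is likewise a hypothetical helper that computes the share of deaths falling in a chosen set of groups.

```python
from collections import Counter

# Hypothetical, abbreviated ICD-9 -> CCS group lookup (illustration only).
ICD9_TO_CCS = {
    "410.71": "AMI",
    "410.91": "AMI",
    "486": "pneumonia",
    "V58.11": "chemotherapy",
}


def ccs_group(icd9_code):
    """Return the CCS group for an ICD-9 code, or None if it is unmapped."""
    return ICD9_TO_CCS.get(icd9_code)


def death_coverage(deaths_icd9, selected_groups):
    """Fraction of in-hospital deaths whose primary diagnosis lies in the selected groups."""
    per_group = Counter(ccs_group(code) for code in deaths_icd9)
    covered = sum(n for group, n in per_group.items() if group in selected_groups)
    return covered / len(deaths_icd9)


deaths = ["410.71", "410.91", "486", "V58.11", "486"]
print(death_coverage(deaths, {"AMI", "pneumonia"}))  # -> 0.8
```

A set of selected groups would meet the criterion in the text when this fraction exceeds 0.8 over all in-hospital deaths.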

For *model 2*, we took model 1 and added an adjustment for the frequency of readmission, as described by Van den Bosch et al. [10]. The authors showed that patients who are admitted frequently have lower mortality ratios *per admission*, which calls for an additional correction. In their publication, patients were grouped into 'patient view' classes P(m), and we applied the same grouping in this study. Here, the admission frequency m was the number of times a patient was admitted to the same hospital during the five-year study period.

For example, a patient admitted ten times to hospital X during the five-year period, for one or more conditions and possibly with different diagnoses, belonged to patient view class P(10) and to no other class. This patient contributed ten admissions to P(10), all of which were included in the regression calculation.

We derived each patient's number of admissions from the unique patient identification number used by all 70 included hospitals. To limit the number of categories in the regression calculation and to prevent categories from becoming too small, we grouped the patient view classes into eight categories: m = 1, m = 2, m = 3, m = 4, m = 5-6, m = 7-9, m = 10-20 and m > 20.
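The assignment of admissions to these eight categories can be sketched as below. This is a minimal illustration, assuming admissions are available as `(hospital, patient_id)` pairs for the review period; the function names are hypothetical. Every admission of a patient receives the same category, as in the P(10) example above.

```python
from collections import Counter


def patient_view_category(m):
    """Map admission frequency m to one of the eight categories of model 2."""
    if m < 1:
        raise ValueError("m must be at least 1")
    if m <= 4:
        return str(m)
    if m <= 6:
        return "5-6"
    if m <= 9:
        return "7-9"
    if m <= 20:
        return "10-20"
    return ">20"


def categorise(admissions):
    """Attach the patient's category to each admission.

    Admissions are counted per (hospital, patient_id), i.e. only admissions
    to the same hospital contribute to m.
    """
    counts = Counter(admissions)
    return [(h, pid, patient_view_category(counts[(h, pid)]))
            for h, pid in admissions]


# Patient p1 is admitted ten times to hospital X and once to hospital Y.
adm = [("X", "p1")] * 10 + [("X", "p2"), ("Y", "p1")]
print(categorise(adm)[0])   # ('X', 'p1', '10-20')
print(categorise(adm)[-1])  # ('Y', 'p1', '1')
```

In model 2 the resulting category would then enter the regression as an additional categorical covariate alongside the casemix properties of model 1.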

### Agreement between the two models

We assessed the extent to which the two models agreed in two ways:

1. *Relative change*. To what extent did $\mathrm{SMR}_{\text{model 1}}$ differ from $\mathrm{SMR}_{\text{model 2}}$ per CCS diagnostic group, per hospital? Using the formula $\mathrm{SMR}_{\text{delta}} = \left(\mathrm{SMR}_{\text{model 2}} / \mathrm{SMR}_{\text{model 1}} - 1\right) \times 100\%$, we calculated the shift for each of the 3500 SMRs (50 diagnostic groups × 70 hospitals) when changing from model 1 to model 2, and represented these shifts as a frequency distribution. We calculated and represented the 70 shifts in the HSMRs of the 70 hospitals in the same way. We also compared all pairs of per-admission death predictions from model 1 and model 2 by calculating coefficients of determination ($R^2$) at the SMR level and at the HSMR level. These metrics served as a measure of the statistical 'distance' between the two models.
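The relative change and the agreement measure can be sketched as follows. The text calls the measure $R^2$; this sketch assumes it is the squared Pearson correlation between paired per-admission predictions, which equals the $R^2$ of a simple linear regression of one model's predictions on the other's. The 5%-wide bins of `frequency_distribution` are an arbitrary choice, and all function names are hypothetical.

```python
import math
from collections import Counter


def smr_delta(smr_model1, smr_model2):
    """Relative change (in %) of an SMR or HSMR when moving from model 1 to model 2."""
    return (smr_model2 / smr_model1 - 1.0) * 100.0


def frequency_distribution(deltas, width=5.0):
    """Histogram of shifts: maps each bin's lower edge to the number of shifts in it."""
    return Counter(math.floor(d / width) * width for d in deltas)


def r_squared(pred_model1, pred_model2):
    """Squared Pearson correlation between paired per-admission death predictions."""
    n = len(pred_model1)
    m1 = sum(pred_model1) / n
    m2 = sum(pred_model2) / n
    sxy = sum((a - m1) * (b - m2) for a, b in zip(pred_model1, pred_model2))
    sxx = sum((a - m1) ** 2 for a in pred_model1)
    syy = sum((b - m2) ** 2 for b in pred_model2)
    return sxy * sxy / (sxx * syy)


print(smr_delta(120.0, 90.0))                     # -25.0
print(frequency_distribution([-3.0, 2.0, 4.9, 7.5]))
print(r_squared([1.0, 2.0, 3.0], [1.0, 3.0, 2.0]))
```

An $R^2$ close to 1 means the two models rank and scale the individual admissions almost identically, i.e. a small statistical distance.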

2. *Significance scores* of SMRs and HSMRs. For each SMR, we determined how many hospitals with a significantly high score according to model 1 were not significantly high according to model 2, and vice versa: how often a significantly high score according to model 2 was not significantly high according to model 1. We calculated the same differences for the HSMRs at the hospital level.
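The text does not state which significance test was used. The sketch below assumes a common choice, a two-sided 95% confidence interval for the observed death count using Byar's approximation to the Poisson distribution, and calls an (H)SMR significantly high when the lower limit of observed/expected exceeds 1. It then counts the discordant flags between the two models. All names are hypothetical.

```python
import math

Z95 = 1.959964  # two-sided 95% standard normal quantile


def byar_lower(observed):
    """Lower 95% confidence limit for a Poisson count (Byar's approximation)."""
    if observed == 0:
        return 0.0
    o = observed
    return o * (1 - 1 / (9 * o) - Z95 / (3 * math.sqrt(o))) ** 3


def significantly_high(observed, expected):
    """True if the lower confidence limit of observed/expected lies above 1."""
    return byar_lower(observed) / expected > 1.0


def discordance(flags_model1, flags_model2):
    """Count units flagged high by model 1 only and by model 2 only."""
    only1 = sum(a and not b for a, b in zip(flags_model1, flags_model2))
    only2 = sum(b and not a for a, b in zip(flags_model1, flags_model2))
    return only1, only2


print(significantly_high(150, 100.0))  # True  (ratio 150, lower limit about 127)
print(significantly_high(105, 100.0))  # False (ratio 105, lower limit about 86)
print(discordance([True, True, False, False], [True, False, True, False]))  # (1, 1)
```

Applied per diagnostic group (SMRs) and per hospital (HSMRs), the two counts returned by `discordance` correspond to the "model 1 but not model 2" and "model 2 but not model 1" comparisons described above.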

### Quality metrics of the two models

We calculated and compared the following quality metrics for both models:

1. *Discrimination*, expressed as c-statistics at the SMR level and at the HSMR level. The c-statistic indicates how well a regression model distinguishes between patients who died and patients who survived: the predicted outcome of each admission is compared with the observed outcome (died or survived). Equivalently, it is the probability that a randomly chosen patient who died had a higher predicted risk than a randomly chosen survivor. A c-statistic of 0.5 means the model discriminates no better than chance; values above 0.75 suggest good discrimination, and a value of 1 indicates perfect discrimination. The overall c-statistic for the Dutch HSMR (model 1) was above 0.85.

2. *Calibration*, according to Hosmer and Lemeshow. This goodness-of-fit test is frequently used for risk prediction models. It assesses whether the observed event rates match the expected event rates in subgroups of the model population, where the subgroups are formed from the deciles of fitted risk. Models in which the expected and observed event rates in the subgroups are similar are called well calibrated [13].

3. *Explanatory power*, using the Nagelkerke pseudo-$R^2$ statistic, to assess the degree to which the additional adjustment for readmission changes the unexplained variance in the data.
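The three quality metrics can be sketched in plain Python, given per-admission predicted probabilities `p` and observed outcomes `y` (1 = died). This is an illustrative sketch, not the software used in the study: the c-statistic uses an O(deaths × survivors) pairwise loop that is only practical for small samples, the Hosmer-Lemeshow function returns the statistic only (it would be compared with a chi-square distribution with groups - 2 degrees of freedom), and the Nagelkerke $R^2$ uses an intercept-only model that predicts the overall death rate as its baseline.

```python
import math


def c_statistic(p, y):
    """Share of (death, survivor) pairs in which the death has the higher predicted risk; ties count 1/2."""
    deaths = [pi for pi, yi in zip(p, y) if yi]
    survivors = [pi for pi, yi in zip(p, y) if not yi]
    score = 0.0
    for d in deaths:
        for s in survivors:
            score += 1.0 if d > s else 0.5 if d == s else 0.0
    return score / (len(deaths) * len(survivors))


def hosmer_lemeshow(p, y, groups=10):
    """Hosmer-Lemeshow chi-square statistic over groups of sorted fitted risk (deciles by default)."""
    pairs = sorted(zip(p, y))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        expected = sum(pi for pi, _ in chunk)
        observed = sum(yi for _, yi in chunk)
        mean_p = expected / len(chunk)
        if 0.0 < mean_p < 1.0:  # skip degenerate groups to avoid division by zero
            stat += (observed - expected) ** 2 / (len(chunk) * mean_p * (1 - mean_p))
    return stat


def log_likelihood(p, y):
    """Bernoulli log-likelihood of outcomes y under predicted probabilities p."""
    return sum(math.log(pi) if yi else math.log(1 - pi) for pi, yi in zip(p, y))


def nagelkerke_r2(p, y):
    """Cox-Snell R^2 rescaled by its maximum attainable value."""
    n = len(y)
    base = sum(y) / n                      # intercept-only model: overall death rate
    ll0 = log_likelihood([base] * n, y)
    ll1 = log_likelihood(p, y)
    cox_snell = 1 - math.exp(2 * (ll0 - ll1) / n)
    max_cox_snell = 1 - math.exp(2 * ll0 / n)
    return cox_snell / max_cox_snell


print(c_statistic([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]))  # 1.0
print(nagelkerke_r2([0.9, 0.1, 0.9, 0.1], [1, 0, 1, 0]))
```

Comparing these values for model 1 and model 2 shows whether the readmission adjustment improves discrimination, calibration or explanatory power.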

### Sensitivity of the HSMR model to adjustment for readmission

We investigated how sensitive the model was to the length of the review period, using three scenarios: a review period of one year (2009), two years (2008-2009) and five years (2005-2009). For each scenario, we examined the statistical distance between model 1 and model 2 and the three model quality metrics at the HSMR level.
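The review period matters because it changes each patient's admission frequency m. The text defines m for the five-year period only; the sketch below assumes that in the shorter scenarios m is counted within the chosen window. Admissions are assumed to be `(hospital, patient_id, year)` tuples, and the names are hypothetical.

```python
from collections import Counter

# The three review periods from the text (range end is exclusive).
SCENARIOS = {
    "1 year": range(2009, 2010),
    "2 years": range(2008, 2010),
    "5 years": range(2005, 2010),
}


def admission_frequencies(admissions, years):
    """Admission frequency m per (hospital, patient) counted within the review window only."""
    return Counter((h, pid) for h, pid, year in admissions if year in years)


# Patient p1 is admitted to hospital X in 2005, 2007 and 2009.
adm = [("X", "p1", 2005), ("X", "p1", 2007), ("X", "p1", 2009), ("X", "p2", 2008)]
for name, years in SCENARIOS.items():
    print(name, admission_frequencies(adm, years)[("X", "p1")])
# 1 year 1 / 2 years 1 / 5 years 3
```

The same patient thus falls in category m = 1 in the one- and two-year scenarios but in m = 3 in the five-year scenario, which is why the readmission adjustment can behave differently across the three scenarios.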