Study design and data
We conducted a retrospective cohort study of patients with sepsis admitted to non-federal general acute care hospitals in the Commonwealth of Pennsylvania in the United States during calendar years 2012 and 2013. First, we developed a de novo risk-adjustment model using 2012 administrative data. Next, we assessed the construct validity of our model by examining the stability of hospital rankings over time (comparing the 2012 administrative model to the 2013 administrative model) and after addition of clinical laboratory variables (comparing the 2012 administrative model to a 2012 clinical model with both administrative and laboratory data). In this context, a valid administrative model would produce relatively stable performance estimates over time (i.e., with few exceptions, hospitals that are high performers one year would be high performers the next year). A valid administrative model would also yield performance estimates similar to those from a more granular clinical model that better accounts for variation in risk.
We used the Pennsylvania Health Care Cost Containment Council (PHC4) database. PHC4 collects administrative data on all hospital admissions in Pennsylvania and makes them available for research, including both demographic information and International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) diagnosis and procedure codes. Unlike most administrative claims-based data sets, these data also contain a selection of laboratory values obtained on the day of admission, enabling us to create a clinical model in addition to the standard administrative model [8]. We augmented these data with the Pennsylvania Department of Health vital status records to capture post-discharge mortality.
Patients and hospitals
All encounters for patients meeting the “Angus” definition of sepsis (either an explicit ICD-9-CM code for sepsis or co-documentation of ICD-9-CM codes for an infection and an organ dysfunction) were eligible for the study [9, 10]. We chose the Angus definition because it is the broadest administrative definition of sepsis and has undergone rigorous clinical validation [10]. We excluded admissions to non-short-term and non-general acute care hospitals because these hospitals were not the focus of our study. We also excluded admissions of patients younger than 20 years of age, admissions for which gender or age was missing, and admissions at hospitals that were not continuously open and admitting patients for the duration of the study period. To maintain independence of observations, if a single patient had multiple encounters within a study year, we randomly selected a single encounter per year for inclusion.
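The random selection of one encounter per patient per year can be sketched as follows. This is a minimal illustration, not the study's code (the analysis was performed in Stata); the field names `patient_id`, `year`, and `encounter_id`, the fixed seed, and the toy records are assumptions made for the example.

```python
import random
from collections import defaultdict

def sample_one_encounter_per_year(encounters, seed=0):
    """Keep one randomly chosen encounter for each (patient_id, year) pair."""
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    groups = defaultdict(list)
    for enc in encounters:
        groups[(enc["patient_id"], enc["year"])].append(enc)
    # Iterate over sorted keys so the output order does not depend on input order.
    return [rng.choice(groups[key]) for key in sorted(groups)]

# Toy records (hypothetical): patient A has two sepsis encounters in 2012.
encounters = [
    {"patient_id": "A", "year": 2012, "encounter_id": 1},
    {"patient_id": "A", "year": 2012, "encounter_id": 2},
    {"patient_id": "A", "year": 2013, "encounter_id": 3},
    {"patient_id": "B", "year": 2012, "encounter_id": 4},
]
selected = sample_one_encounter_per_year(encounters)
```

Each patient contributes at most one observation to a given year's model, while the same patient may still appear once in 2012 and once in 2013.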
Base model for risk-adjusted mortality
We first created a base logistic regression model for risk-adjusted mortality using exclusively risk-adjustment variables that are available in administrative data. The primary outcome variable for this model was all-cause mortality within 30 days of the admission date, as determined using the Pennsylvania vital status records. The model was based on five categories of risk-adjustment variables hypothesized to be associated with sepsis outcomes based on prior work [9, 11, 12]: demographics, admission source, comorbidities, organ failures present on admission, and infection source.
Demographic variables were obtained directly from the claims and included age and gender. Gender was modeled as an indicator covariate, and age was modeled as a linear spline by age quintile. Admission source was obtained directly from the claims and modeled as an indicator covariate defined as admission through the emergency department versus admission from another source. Comorbidities were defined using ICD-9-CM codes in the manner of Elixhauser [13] and modeled as indicator covariates. Organ failures present on admission were defined in the manner of Elias [12] and modeled as indicator covariates. For comorbidities and organ failures present on admission, we excluded from the model any designation with a prevalence of less than 1% in our sample population.
Infection source was modeled as hierarchical infection categories in which we assigned each patient an infectious source category identified using ICD-9-CM diagnosis codes (see Additional file 1: Table S1). We created the categories from the Angus sepsis definition [9], which we further divided into 12 groups: septicemia, bacteremia, fungal infection, peritoneal infection, heart infection, upper respiratory infection, lung infection, central nervous system infection, gastrointestinal infection, genitourinary infection, skin infection, and other infection source. Patients with multiple ICD-9-CM codes indicating multiple infection sources were assigned the single infection source category associated with the highest unadjusted mortality. In ranking the infectious sources based on their unadjusted mortality, we used 2011 data in order to avoid model overfitting. The final variable was modeled as a series of mutually exclusive indicator covariates with upper respiratory infection as the reference category.
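One common way to encode a linear spline by quintile is to place knots at the 20th, 40th, 60th, and 80th percentiles and add one hinge term per knot, so that the slope of the age effect may change at each knot. The sketch below assumes that encoding and a nearest-rank percentile rule; the exact parameterization used in the study is not stated, and the toy ages are hypothetical.

```python
def quintile_knots(values):
    """Return the 20th, 40th, 60th, and 80th percentiles (nearest-rank rule)."""
    ordered = sorted(values)
    n = len(ordered)
    # (n * p + 99) // 100 is the integer ceiling of n * p / 100.
    return [ordered[(n * p + 99) // 100 - 1] for p in (20, 40, 60, 80)]

def linear_spline_basis(age, knots):
    """Age itself plus one hinge term max(0, age - knot) for each knot."""
    return [float(age)] + [max(0.0, float(age - k)) for k in knots]

# Toy sample (hypothetical): one patient at each age from 20 to 99.
ages = list(range(20, 100))
knots = quintile_knots(ages)
basis_for_70 = linear_spline_basis(70, knots)
```

The basis columns would enter the logistic regression as ordinary covariates; the coefficient on each hinge term is the change in slope at that knot.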
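The hierarchical assignment can be illustrated with a short sketch. It assumes the reference-year data are available as (infection source, died) pairs, one per patient-source combination; that layout, the function names, and the toy counts are assumptions for illustration and do not reflect the study's data.

```python
def rank_sources_by_mortality(reference_rows):
    """Order infection sources from highest to lowest unadjusted mortality."""
    deaths, totals = {}, {}
    for source, died in reference_rows:
        totals[source] = totals.get(source, 0) + 1
        deaths[source] = deaths.get(source, 0) + int(died)
    return sorted(totals, key=lambda s: deaths[s] / totals[s], reverse=True)

def assign_source(patient_sources, hierarchy):
    """Pick the patient's source that ranks highest in the hierarchy."""
    rank = {source: i for i, source in enumerate(hierarchy)}
    return min(patient_sources, key=rank.__getitem__)

# Toy reference-year rows (hypothetical counts, not study data).
reference_rows = [
    ("septicemia", True), ("septicemia", True), ("septicemia", False),
    ("lung infection", True), ("lung infection", False), ("lung infection", False),
    ("genitourinary infection", False), ("genitourinary infection", False),
]
hierarchy = rank_sources_by_mortality(reference_rows)
assigned = assign_source({"genitourinary infection", "lung infection"}, hierarchy)
```

Deriving the hierarchy from an earlier year keeps the ranking independent of the outcomes in the data used to fit the model.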
Augmented mortality model including laboratory variables
We next created an augmented logistic regression model for risk-adjusted mortality using all of the variables from the base model plus selected laboratory values obtained on the day of admission. The list of available laboratory values, including their units, frequency, averages, and ranges, is available in Additional file 1: Tables S2 and S3. Values outside the plausible range, such as negative data points for non-calculated laboratory values, were recoded as missing.
We used a multi-step process to determine not only which lab variables to include in our model but also their functional forms. First, we used locally weighted scatterplot smoothing to visually assess inflection points in the relationship between each numeric laboratory value and 30-day mortality [14]. Based on visual inspection of these plots and standard reference values from our hospital’s laboratory, we categorized each variable into between two and five categories, with one category representing a normal result and the other categories representing non-normal extremes: very low, low, high, and very high. For arterial pH and arterial pCO2, which are interdependent, we performed an additional step in which we created a single combined variable for which the categories were permutations of the non-normal categories defined for pH and pCO2, respectively, as previously performed [15].
For each patient, we assigned an appropriate category for every laboratory test based on the reported result. If the patient had more than one result available for a given laboratory test, we selected the value that would be included in the category associated with a higher mortality rate. When a laboratory test result was missing, we assumed it to fall into the normal range and assigned the normal category, as is standard in physiological risk-adjustment models [15].
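The assignment rule for a single laboratory test can be sketched as below. The cut points, category labels, and category mortality rates are hypothetical placeholders; in the study, categories were derived from the smoothed plots and laboratory reference ranges.

```python
import bisect

# Hypothetical cut points and category-level mortality for one laboratory test.
CUTS = [2.0, 4.0]  # boundaries between adjacent categories
LABELS = ["normal", "high", "very high"]
MORTALITY = {"normal": 0.10, "high": 0.20, "very high": 0.40}

def categorize(value):
    """Map a numeric result to its category; a value equal to a cut point
    falls into the higher category."""
    return LABELS[bisect.bisect_right(CUTS, value)]

def assign_category(results):
    """Choose the category with the highest mortality among a patient's
    results; treat a test with no result as normal."""
    observed = [v for v in results if v is not None]
    if not observed:
        return "normal"
    return max((categorize(v) for v in observed), key=MORTALITY.get)

two_results = assign_category([1.5, 4.5])
no_results = assign_category([])
```

Selecting the category by mortality rather than by numeric magnitude matters for tests where both low and high extremes are abnormal.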
Next, we used Bayesian information criterion (BIC)-based stepwise logistic regression to identify the laboratory value covariates to be included in the model. This regression included all the covariates in the claims-based model. Laboratory values that did not improve the model's BIC were excluded from the final model. Each laboratory value’s categories were assessed in the BIC regression as a group and ultimately either included in or excluded from the model as a group, so as not to partially remove categories for a given laboratory value. Laboratory values deemed contributory by the BIC regression entered the final model as categorical variables with the normal category as the reference group.
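Group-wise selection by BIC can be sketched as follows, using the usual definition BIC = k ln(n) - 2 ln(L), where lower values indicate a better trade-off between fit and complexity. The sketch assumes greedy forward selection (the direction of the stepwise procedure is not stated in the text) and a hypothetical `fit` callback that returns the log-likelihood and parameter count of the base model plus the selected groups. The stub fit and its numbers are invented so the example runs without a regression library.

```python
import math

def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion; lower is better."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood

def forward_select(candidate_groups, fit, n_obs):
    """Greedily add whole variable groups while doing so lowers the BIC."""
    selected = []
    best = bic(*fit(selected), n_obs)
    remaining = list(candidate_groups)
    while remaining:
        scores = {g: bic(*fit(selected + [g]), n_obs) for g in remaining}
        candidate = min(scores, key=scores.get)
        if scores[candidate] >= best:
            break  # no remaining group improves the BIC
        selected.append(candidate)
        remaining.remove(candidate)
        best = scores[candidate]
    return selected

# Stub standing in for a fitted logistic regression (hypothetical numbers):
# each group adds a fixed log-likelihood gain and a fixed number of parameters.
GAIN = {"lactate": (60.0, 2), "sodium": (3.0, 3)}

def stub_fit(selected):
    loglik = -5000.0 + sum(GAIN[g][0] for g in selected)
    n_params = 30 + sum(GAIN[g][1] for g in selected)
    return loglik, n_params

chosen = forward_select(["lactate", "sodium"], stub_fit, n_obs=10000)
```

Because each group is scored as a unit, all categories of a laboratory value enter or leave the model together.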
Risk-standardized mortality rates
Based on these models, we used mixed-effects logistic regression to create risk-standardized hospital-specific 30-day mortality rates. These rates account for variation in both risk and reliability across hospitals: they account for variation in risk in that they control for the different baseline characteristics of sepsis patients across hospitals; they account for reliability in that the rates for small hospitals, which are more susceptible to random variation than rates for large hospitals, are adjusted toward the state-wide mean [16].
We calculated hospital-specific risk-standardized mortality rates by dividing each hospital’s predicted mortality (using the base model plus a hospital-specific random effect) by each hospital’s expected mortality (using the base model without a hospital-specific random effect), generating a risk-standardized mortality ratio. Multiplying the risk-standardized mortality ratio by the mean 30-day mortality of the state-wide sample yielded a hospital-specific risk-standardized mortality rate.
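The ratio described above can be written out for a single hospital. The sketch assumes the fixed-effect linear predictor for each patient and the hospital's estimated random intercept are already available from a fitted mixed-effects model; the function names and toy values are hypothetical.

```python
import math

def expit(x):
    """Inverse logit: convert log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

def risk_standardized_rate(linear_predictors, hospital_effect, state_mean):
    """Predicted deaths (with the hospital effect) divided by expected deaths
    (without it), scaled by the state-wide mean mortality."""
    predicted = sum(expit(xb + hospital_effect) for xb in linear_predictors)
    expected = sum(expit(xb) for xb in linear_predictors)
    return predicted / expected * state_mean

# Toy hospital (hypothetical): three patients' fixed-effect log-odds of death.
patients = [-2.0, -1.0, 0.0]
average_hospital = risk_standardized_rate(patients, 0.0, state_mean=0.25)
worse_hospital = risk_standardized_rate(patients, 0.3, state_mean=0.25)
```

A hospital whose random intercept is zero receives exactly the state-wide mean, and a positive intercept yields a rate above it.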
We performed this process separately for 2012 and 2013 without laboratory data and then again for 2012 with laboratory data, resulting in three sets of hospital-specific mortality rates: 2012 administrative rates, 2013 administrative rates, and 2012 clinical rates.
Analysis
For all models, we assessed discrimination using the C-statistic and calibration using the slope and intercept of regression lines fit to the calibration plots. We assessed the validity of our administrative model by examining the consistency of hospital rankings over time and with the addition of laboratory data. As noted above, we assumed that a valid model would yield hospital rankings that did not markedly change between years or after the addition of laboratory values. We generated scatter plots to compare the hospital-specific risk-standardized mortality rates between the 2012 and 2013 administrative rates and between the 2012 administrative and clinical rates, and we calculated the coefficient of determination for each comparison. Additionally, for each of the three sets of hospital-specific mortality rates, we calculated performance quintiles, with the outer quintiles representing the highest and lowest performing 20% of hospitals, respectively. We compared the composition of the quintiles between the 2012 and 2013 administrative rates and then between the 2012 administrative and clinical rates. We considered hospital movement of one quintile or less between comparison groups to be a marker of stability.
Data management and analysis were performed using Stata version 14.0 (StataCorp, College Station, Texas). All aspects of this work were reviewed and approved by the University of Pittsburgh institutional review board.
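The quintile comparison and the coefficient of determination can be sketched as follows. The sketch assumes both sets of rates cover the same hospitals, assigns quintiles by rank, and computes the coefficient of determination as the squared Pearson correlation; hospital identifiers and rates are invented for illustration.

```python
def quintiles(rates):
    """Map each hospital to a quintile from 1 (lowest rate) to 5 (highest)."""
    ordered = sorted(rates, key=rates.get)
    n = len(ordered)
    return {h: i * 5 // n + 1 for i, h in enumerate(ordered)}

def share_stable(rates_a, rates_b, max_shift=1):
    """Fraction of hospitals moving at most max_shift quintiles."""
    qa, qb = quintiles(rates_a), quintiles(rates_b)
    stable = sum(abs(qa[h] - qb[h]) <= max_shift for h in qa)
    return stable / len(qa)

def r_squared(rates_a, rates_b):
    """Squared Pearson correlation between two sets of hospital rates."""
    hospitals = list(rates_a)
    x = [rates_a[h] for h in hospitals]
    y = [rates_b[h] for h in hospitals]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)

# Toy rates (hypothetical): ten hospitals whose ranking is fully reversed.
rates_2012 = {f"H{i}": 0.10 + 0.01 * i for i in range(10)}
rates_2013 = {f"H{i}": 0.19 - 0.01 * i for i in range(10)}
stable_same = share_stable(rates_2012, rates_2012)
stable_reversed = share_stable(rates_2012, rates_2013)
```

In the reversed toy example only the two middle-quintile hospitals stay within one quintile, even though the coefficient of determination is essentially 1, which shows why the two measures are reported side by side.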