Social determinants of health and hospital readmissions: can the HOSPITAL risk score be improved by the inclusion of social factors?

Background: The HOSPITAL Risk Score (HRS) predicts 30-day hospital readmissions and is internationally validated. Social determinants of health (SDOH) such as low socioeconomic status (SES) affect health outcomes and have been postulated to affect readmission rates. We hypothesized that adding SDOH to the HRS could improve its predictive accuracy.
Methods: Records of 37,105 inpatient admissions at the University of Chicago Medical Center were reviewed. The HRS was calculated for each patient. Census tract-level SDOH were then combined with the HRS, and the performance of the resultant "Social HRS" was compared against the HRS. Patients were then assigned to 1 of 7 typologies defined by their SDOH, and a balanced dataset of 14,235 admissions was sampled from the larger dataset to avoid over-representation of any 1 sociodemographic group. Principal component analysis and multivariable linear regression were then performed to determine the effect of SDOH on the HRS.
Results: The c-statistic for the HRS predicting 30-day readmission was 0.74, consistent with published values. However, the addition of SDOH to the HRS did not improve the c-statistic (0.71). Patients with unfavorable SDOH (no high school diploma, limited English, crowded housing, disabilities, and age > 65 years) had significantly higher HRS (p < 0.05 for all). Overall, SDOH explained 0.2% of the HRS.
Conclusion: At an urban tertiary care center, the addition of census tract-level SDOH to the HRS did not improve its predictive power. Rather, the effects of SDOH are already reflected in the HRS.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12913-020-05989-7.


Background
Hospital readmissions represent a significant expense within the US health system, accounting for $17 billion of preventable healthcare costs [1]. The Hospital Readmissions Reduction Program (HRRP) penalizes hospitals with higher excess 30-day readmission rates by reducing Medicare reimbursement. Thus, many groups have published predictive models to identify which patients are at high risk for readmission.
The HOSPITAL Risk Score (HRS), introduced by Donzé et al. in 2013, is a commonly used predictive model for readmissions that has been internationally validated [2,3]. The score consists of 7 factors which spell "HOSPITAL" acronymically: Hemoglobin, discharge from an Oncology service, Sodium level at discharge, any coded Procedure during the hospital stay, Index admission Type, number of previous Admissions in the prior year, and Length of stay. However, the score's ability to predict readmission risk in the real world has been questioned because it uses exclusively clinical factors and does not include social determinants of health (SDOH), which are known contributors to readmission risk [4]. Similarly, when developing metrics for expected readmission rates for individual hospitals, the Centers for Medicare and Medicaid Services (CMS) included only age and gender as non-clinical risk factors for readmission [5].
Authors have recommended adjusting readmission algorithms to include SDOH to improve their predictive accuracy [6]. By ignoring SDOH in the determination of readmission rates, they argue, CMS may unfairly penalize hospitals that care for the most vulnerable Americans [7]. To ameliorate this, CMS recently adjusted its algorithm to compare readmission rates only between hospitals with similar proportions of low-income patients [8]. This effort resulted in fewer safety-net hospitals being penalized. While this is a step in the right direction, hospitals that serve patients with unfavorable SDOH still lack a tool that reliably predicts which patients are at highest risk for readmission.
The purpose of this study was to assess whether the addition of SDOH to the HRS improves its predictive ability. We hypothesized that the HRS could be improved by integrating additional data on nonclinical SDOH intrinsic to patients or their communities.

Patient population and study design
We queried a dataset containing all adult patients admitted to our center from 2014 to 2016. As this study measured readmission back to our center, we sought to avoid confounding by excluding patients who lived outside of the city of Chicago, those who were discharged to any location aside from home, and those who lived in very sparsely populated areas of the city.

Data sources and measures
We calculated the HRS for each patient using data available in our electronic health record (EHR) according to the method described by Donzé [3]. Points were assigned and summed for the following components of the HRS: Hemoglobin < 12 g/dL (1 point), discharge from an Oncology service (2 points), Sodium level at discharge < 135 mEq/L (1 point), any coded Procedure during the hospital stay (1 point), urgent or emergent Index admission Type (1 point), number of previous Admissions in the prior year (0-1: 0 points; 2-5: 2 points; > 5: 5 points), and Length of stay ≥ 5 days (2 points). Additional variables extracted from the EHR included age, gender, race, ethnicity, laboratory values, vital signs, number of prior readmissions and emergency department (ED) visits, and comorbidities. The fifteen census tract-level SDOH variables that comprise the Social Vulnerability Index (SVI) were obtained from the CDC, and the census tract-level violent crime rate was obtained from the City of Chicago Data Portal. Neighborhoods were then grouped into sociodemographic clusters using classifications obtained from a published cross-sectional spatial analysis of US Census Bureau data [9]. This method provided an objective and evidence-based way to group neighborhoods by common SDOH. The Census block group-level Area Deprivation Index (ADI) was obtained from the University of Wisconsin Neighborhood Atlas [10], and the Census tract-level Hardship Index (HI) was obtained from the City of Chicago Data Portal.
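The point assignment described above can be sketched in a few lines. The following Python snippet is an illustration only and is not the code used in this study; the `Admission` class and its field names are hypothetical, and the thresholds are copied from the list in the preceding paragraph.

```python
from dataclasses import dataclass


@dataclass
class Admission:
    """Inputs for one index admission (hypothetical field names)."""
    hemoglobin_g_dl: float       # hemoglobin, g/dL
    oncology_discharge: bool     # discharged from an oncology service
    sodium_meq_l: float          # sodium level at discharge, mEq/L
    any_procedure: bool          # any coded procedure during the stay
    urgent_or_emergent: bool     # index admission type
    prior_year_admissions: int   # admissions in the prior year
    length_of_stay_days: float   # length of stay, days


def hospital_score(a: Admission) -> int:
    """Sum the HRS points using the thresholds listed in the text."""
    score = 0
    score += 1 if a.hemoglobin_g_dl < 12 else 0
    score += 2 if a.oncology_discharge else 0
    score += 1 if a.sodium_meq_l < 135 else 0
    score += 1 if a.any_procedure else 0
    score += 1 if a.urgent_or_emergent else 0
    # Prior admissions: 0-1 -> 0 points, 2-5 -> 2 points, > 5 -> 5 points.
    if a.prior_year_admissions > 5:
        score += 5
    elif a.prior_year_admissions >= 2:
        score += 2
    score += 2 if a.length_of_stay_days >= 5 else 0
    return score
```

Under these thresholds the score ranges from 0 to 13; for example, a patient meeting every criterion with more than five prior admissions would receive 1 + 2 + 1 + 1 + 1 + 5 + 2 = 13 points.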
We considered all admissions to our center within the study period as index admissions. Readmissions were defined as any additional admission to our center within 30 days of an index admission. As such, some admissions served as both an index themselves and a readmission for a prior index. The outcome of 30-day readmission was defined dichotomously as the presence or absence of one or more readmissions within 30 days of an index admission.
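As a hedged illustration of this outcome definition, the sketch below flags each admission according to whether it is followed by another admission for the same patient within 30 days. It is not the study's code. The tuple layout and the function name are hypothetical, and measuring the window from the discharge date of the index admission is our assumption, since the text does not state the reference date.

```python
from collections import defaultdict
from datetime import date


def readmitted_within_window(stays, window_days=30):
    """Flag each stay that is followed by a readmission within the window.

    stays: list of (patient_id, admit_date, discharge_date) tuples.
    Returns a list of booleans aligned with the input order.
    Assumption: the window is counted from the index discharge date.
    """
    by_patient = defaultdict(list)
    for idx, (pid, admit, discharge) in enumerate(stays):
        by_patient[pid].append((admit, discharge, idx))

    outcome = [False] * len(stays)
    for visits in by_patient.values():
        visits.sort()  # chronological order by admission date
        # Only the next admission needs checking: if it falls outside
        # the window, every later admission does too.
        for (_, discharge, idx), (next_admit, _, _) in zip(visits, visits[1:]):
            gap = (next_admit - discharge).days
            outcome[idx] = 0 <= gap <= window_days
    return outcome
```

Every admission is treated as an index here, so a stay that is itself a readmission can also receive its own outcome flag, matching the definition above.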

Statistical analysis
Continuous variables were summarized as median (interquartile range), a choice made after visualizing their distributions, while categorical variables were expressed as frequencies and percentages. A Spearman rank correlation test was performed to assess multicollinearity among the clinical and social variables (Fig. S1). Because the social variables were the variables of interest and showed multicollinearity, they were grouped into components using principal component analysis (PCA). Cronbach alpha tests were performed to confirm the internal consistency of each component; each component had a Cronbach alpha value above 0.57. Bartlett's test of sphericity (p < 0.001) and the Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy (MSA = 0.85) were used to further confirm adequate sample size and correlation between the social variables. Variables included in the PCA were percentages for poverty, unemployment, per capita income, disability, single parent households, minority, no vehicle, no high school diploma, age 65 years and above, limited English, crowding, multiunit living, and violent crime rate. After scaling the data, PCA with varimax rotation was performed on unsampled, randomly sampled, gender-stratified, and disease-stratified datasets. A scree plot was used to determine the optimal number of components needed to explain the total variance. A four-component PCA solution containing social variables with a loading cutoff of 0.62 was found to explain 80% of the total variance. In addition, a parallel analysis was used to confirm the use of a four-component solution. Spearman correlation analysis was performed on the components to confirm their independence. Based on the items in each component, the components were named as follows: low income; no high school diploma; no vehicle and multiunit living; and age 65 years and above or disabled.
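For readers unfamiliar with the internal-consistency check mentioned above, the following is a minimal sketch of the standard Cronbach alpha formula, alpha = k/(k-1) × (1 − Σ item variances / variance of the summed score), written in plain Python. The analysis in this study was run in R, so this is an illustration rather than the authors' code. The function name is hypothetical, and the sketch assumes the items in a component have already been scaled and oriented in the same direction.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach alpha for one component.

    items: list of k columns, each a list with one value per observation
    (for example, one value per census tract). All columns must have the
    same length. Assumes items are already scaled and oriented alike.
    """
    k = len(items)
    if k < 2:
        raise ValueError("Cronbach alpha needs at least two items")
    # Total score per observation = sum across the k items.
    totals = [sum(row) for row in zip(*items)]
    sum_item_variances = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - sum_item_variances / variance(totals))
```

Items that move together inflate the variance of the summed score relative to the sum of the individual variances, which pushes alpha toward 1; unrelated items push it toward 0.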
Since most patients at our center reside in the extreme poverty and suburban affluent clusters, we randomly sampled 600 patients from each of those clusters to create a balanced dataset. A multivariable linear regression was used to test for significant associations with the HOSPITAL score, with the four PCA-derived components (low income, no high school diploma, no vehicle and multiunit living, and age 65 years and above or disabled) and the following comorbidities as independent variables: congestive heart failure (CHF), valvular heart diseases, hypertension, diabetes mellitus, renal diseases, liver diseases, chronic obstructive pulmonary disease (COPD), atrial fibrillation (AF), dyslipidemia, and coronary artery disease. Receiver operating characteristic (ROC) curve analysis was used to determine the c-statistic for models with HRS and social variables and for HRS without social variables. Statistical significance was defined as a p-value < 0.05 for two-tailed tests. Data were analyzed using RStudio version 3.5.1 (RStudio: Integrated Development for R, RStudio, Inc. 2015, Boston, MA). Statistical models were fitted using these packages in R: psych (version 2.0.7), corrplot (version 0.84), FactoMineR (version 2.3), and ade4 (version 1.7).
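The balancing step can be illustrated with a short sketch. This is not the study's R code; the function and parameter names are hypothetical, and capping every cluster at a fixed number of randomly drawn records (keeping all records from clusters smaller than the cap) is our assumption about one reasonable way to limit over-representation.

```python
import random
from collections import defaultdict


def balanced_sample(records, cluster_of, per_cluster=600, seed=0):
    """Draw at most `per_cluster` records at random from each cluster.

    records: iterable of patient records (any type).
    cluster_of: function mapping a record to its cluster label.
    Assumption: clusters smaller than the cap are kept in full.
    """
    rng = random.Random(seed)  # fixed seed for a reproducible draw
    groups = defaultdict(list)
    for record in records:
        groups[cluster_of(record)].append(record)

    sample = []
    for label in sorted(groups):  # stable cluster order
        members = groups[label]
        n = min(per_cluster, len(members))
        sample.extend(rng.sample(members, n))  # without replacement
    return sample
```

Sampling without replacement within each cluster keeps each patient record at most once, so large clusters are down-weighted without duplicating records from small ones.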
Multiple sensitivity analyses were performed by repeating our analysis after replacing the SVI with the HI and again with the ADI. Variables within the HI include crowding, poverty, unemployed and age 16 and above, no high school diploma and age 25 and above, age 18 and under and 64 and above, and per capita income. A two-component PCA solution containing social variables with a loading cutoff of 0.6 was found to explain 83% of the total variance. Based upon the variables in each component, the components were named low income and no high school diploma. A multivariable linear regression was used to test for significant associations with the HOSPITAL score using low income, no high school diploma, CHF, valvular diseases, hypertension, diabetes mellitus, renal diseases, liver diseases, COPD, AF, dyslipidemia, and coronary artery disease as independent variables. ROC analysis was used to determine the c-statistic for each model.
For the ADI, a multivariable linear regression was used to test for significant associations with the HOSPITAL score using the state ADI ranking, CHF, valvular diseases, hypertension, diabetes mellitus, renal diseases, liver diseases, COPD, AF, dyslipidemia, and coronary artery disease as independent variables. ROC analysis was used to determine the c-statistic for each model.
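The c-statistic used to compare the models is the area under the ROC curve, which equals the probability that a randomly chosen readmitted admission receives a higher score than a randomly chosen non-readmitted admission, with ties counted as one half. The sketch below computes it directly from that definition. It is an illustration in Python rather than the R code used in this study, and the function name is hypothetical.

```python
def c_statistic(scores, outcomes):
    """Concordance (area under the ROC curve) for a risk score.

    scores: predicted risk or score per admission.
    outcomes: truthy if the admission was followed by a 30-day readmission.
    Compares every (readmitted, not readmitted) pair; ties count 0.5.
    """
    pos = [s for s, y in zip(scores, outcomes) if y]
    neg = [s for s, y in zip(scores, outcomes) if not y]
    if not pos or not neg:
        raise ValueError("both outcome classes are required")

    concordant = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                concordant += 1.0
            elif p == n:
                concordant += 0.5
    return concordant / (len(pos) * len(neg))
```

A value of 0.5 corresponds to chance and 1.0 to perfect discrimination. The pairwise loop is quadratic in the number of admissions, so a rank-based implementation would be preferable for a dataset of the size analyzed here.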

Discussion
In this study, we sought to determine if the predictive performance of the HRS could be improved by integrating SDOH into its structure (Social HRS). Surprisingly, we found that adding SDOH as variables did not improve the HRS' performance. Rather, it appears that patients with poor SDOH are clinically more ill and this increased illness is already captured in the HRS.
In support of this conclusion, we found that patients who had both unfavorable SDOH (such as older age, disability, low SES, no vehicle, and multiunit living) and chronic diseases (such as coronary artery disease (CAD), liver disease, and pulmonary disease) had significantly higher HRS. These conditions have high morbidity and mortality at baseline, both of which may be exacerbated by unfavorable SDOH, leading to more frequent readmissions. However, even in these populations, SDOH only explained 0.2% of the HRS. SDOH, by definition, are independently associated with health outcomes and life expectancy [11]. Patients with unfavorable SDOH tend to have more chronic medical conditions and present to the hospital with more advanced disease [12,13]. Thus, the clinical factors included in the HOSPITAL score, such as hemoglobin, number of admissions in the last year, and length of stay, likely already reflect the effects of SDOH. Therefore, addition of SDOH to the HRS does not appear to improve its predictive power. These findings are consistent with a study by Bernheim et al. in which adjusting for SES did not affect estimated readmission rates [14]. Similarly, a study out of Ontario found no link between SES and readmission [15]. Our study builds on this prior work by demonstrating for the first time that the HRS is objectively higher in patients with poor SDOH and that addition of SDOH to the HRS is not necessary for predictive accuracy.
Notably, programs such as the Coalition project that attempted to reduce admissions among high utilizers with interventions targeting SDOH have had limited impact on readmission rates [16]. The Ontario results were obtained in the context of a universal health care system, which may have mitigated issues with access to healthcare. While there are disparities in healthcare access in the United States and within the population our institution serves, our study focused specifically on patients who were admitted to the hospital and therefore did have access to healthcare. Further analysis could include patients without insurance or with otherwise limited access to healthcare.
These results do not imply that SDOH do not influence readmission rates. Multiple studies have demonstrated that SDOH such as race, socioeconomic status, and education contribute to a higher risk of readmission [17-20]. A study by Barnett et al. found that half of the difference in readmission rates between hospitals with the highest and lowest rates of readmission could be explained by patient characteristics outside of the hospital's control [6]. Additionally, a meta-analysis by Van Walraven et al. found that predictive models for readmission that included SDOH in their algorithms were able to identify twice as many avoidable readmissions as those that used only clinical factors [18,21]. These models have been found to be weaker when applied to patient populations with poor SDOH, which potentially makes models like the HRS less useful in safety-net hospitals [22].
The mechanisms by which SDOH influence readmissions are complex and difficult to define. For example, for the cross-section of unmarried men with low incomes, Social HRS was lower than HRS. Thus, even though having a low SES is considered an unfavorable SDOH, within this intersection, patients were less likely to be readmitted within 30 days. This may be because unmarried men are less likely to interface with doctors. The 2017 MENtion it Survey by Cleveland Clinic showed that only 61% of men go to their doctor even after developing symptoms that they describe as "unbearable," and that 83% of married women remind their husbands to attend annual checkups [23]. A qualitative study that interviewed physicians at seven hospitals with high readmissions rates found that most physicians asserted that readmissions were influenced by factors such as patient trust and willingness to participate as well as other social factors [24]. Additional patient attributes such as social support and personal resilience factors such as patient adaptability and biologic stress mechanisms also influence disease severity, which in turn influences readmission rates [25].
Our study has several limitations. First, we utilized census tract-level SDOH in this analysis. Individual-level SDOH are influenced but not entirely explained by neighborhood factors. Patient-level data may more accurately capture resilience factors and lead to a different conclusion. The SVI was used in this manuscript because it is readily available at the census-tract level, well validated, and included in other community-level tools. Our findings were also similar when the ADI or HI was substituted for the SVI. We acknowledge that other factors, such as legal status and environmental factors, may alter the results, and we believe further studies exploring these factors' potential contribution to readmission risk should be undertaken.
Additionally, participants were studied at a single tertiary care center that serves a large population of urban poor as well as patients with advanced illnesses. Patients seen at our institution who have more favorable SDOH likely traveled a longer distance to our center and may have been self-selected due to the severity of their illness. These patients may have been on a trajectory toward frequent readmissions and similarly would have a higher HRS. To address this, we sampled a balanced dataset and found similar results. However, our dataset lacks patients from outside a metropolitan area, and our findings may not generalize to hospitals that serve more rural populations. This could be an area for further research.
Although we tested for multicollinearity among variables, pairwise correlation does not fully characterize linear dependence: a variable can be a near-linear combination of several others without being strongly correlated with any one of them. Pairwise correlation also does not provide information about the relative importance of each variable. We acknowledge these limitations of our models. This study is further limited by the length of the lookback period (30 days). While similar results were obtained when the analysis was stratified by the presence or absence of an admission in the prior 30 days, it is possible that other lookback period lengths may produce different results.
Finally, this study examined patients admitted to our center and readmitted back to our center. We were not able to determine if patients were admitted to a different center and then readmitted here, or admitted here and then readmitted elsewhere. However, we have previously found that 95% of patients discharged from our center who require readmission are readmitted back to our center with only 5% readmitted elsewhere [26]. This ratio has been stable for many years at our center, including the time of the present study.

Conclusion
The addition of SDOH does not improve the predictive accuracy of the HRS. Rather, the effects of unfavorable SDOH manifest as overall worse health which is already captured in the HRS.
Additional file 1: Fig. S1. Correlation plot of HRS components and SDOH: Components of the HRS showed minimal collinearity with SDOH. Fig. S2. ROC for patients with and without admission 30 days before index: When stratified by the (a) presence or (b) absence of a prior admission within the prior 30 days, the addition of SDOH to the HRS did not improve its performance, similar to the unstratified dataset. Fig. S3. ROC for HRS and ADI + HRS or HRS and HI + HRS: Repeating our analysis by substituting the (a) ADI or (b) HI for the SVI produced similar results to our initial analyses; the addition of measures of SDOH did not improve the predictive performance of the HRS. Table S1. PCA Component Scores, all patients. Table S2. PCA Component Scores, randomly-sampled balanced dataset. Table S3. PCA Component Scores, patients with heart failure. Table S4. PCA Component Scores, patients with atrial fibrillation. Table S5. PCA Component Scores, patients with coronary artery disease. Table S6. PCA Component Scores, patients with COPD. Table S7. PCA Component Scores, patients with liver disease. Table S8. PCA Component Scores, patients with obesity. Table S9. PCA Component Scores, patients with pulmonary disease. Table S10. PCA Component Scores, patients with valvular heart disease. Table S11. PCA Component Scores, female patients. Table S12. PCA Component Scores, male patients. Table S13. Linear Regression Estimates, all patients. Table S14. Linear Regression Estimates, randomly-sampled balanced dataset. Table S15. Linear Regression Estimates, patients with heart failure. Table S16. Linear Regression Estimates, patients with atrial fibrillation. Table S17. Linear Regression Estimates, patients with coronary artery disease. Table S18. Linear Regression Estimates, patients with COPD. Table S19. Linear Regression Estimates, patients with liver disease. Table S20. Linear Regression Estimates, patients with obesity. Table S21. Linear Regression Estimates, patients with pulmonary disease. Table S22. Linear Regression Estimates, patients with valvular heart disease. Table S23. Linear Regression Estimates, female patients. Table S24. Linear Regression Estimates, male patients. Table S25. PCA Component Scores, all patients. Table S26. PCA Component Scores, randomly-sampled balanced dataset. Table S27. PCA Component Scores, patients with heart failure. Table S28. PCA Component Scores, patients with atrial fibrillation. Table S29. PCA Component Scores, patients with coronary artery disease. Table S30. PCA Component Scores, patients with COPD. Table S31. PCA Component Scores, patients with liver diseases. Table S32. PCA Component Scores, patients with obesity. Table S33. PCA Component Scores, patients with pulmonary disease. Table S34. PCA Component Scores, patients with valvular heart disease. Table S35. PCA Component Scores, female patients. Table S36. PCA Component Scores, male patients. Table S37. Linear Regression Estimates, all patients. Table S38. PCA Component Scores, randomly-sampled balanced dataset. Table S39. PCA Component Scores, patients with heart failure. Table S40. PCA Component Scores, patients with atrial fibrillation. Table S41. PCA Component Scores, patients with coronary artery disease. Table S42. PCA Component Scores, patients with COPD. Table S43. PCA Component Scores, patients with liver disease. Table S44. PCA Component Scores, patients with obesity. Table S45. PCA Component Scores, patients with pulmonary disease. Table S46. PCA Component Scores, patients with valvular heart disease. Table S47. Linear Regression Estimates, female patients. Table S48. Linear Regression Estimates, male patients.