Population and data sources
Data for this study came from the electronic medical record of Grady Health System (GHS) in Atlanta, Georgia [15, 16]. Cases and controls were identified through a master list of patients seen in the Emergency Department (ED) between 2011 and 2013. This list included a patient’s unique Medical Record Number (MRN) and their disposition from the ED (admitted, discharged, or deceased).
For the purpose of this study, high-utilizer patients (HUPs) were defined as those with three or more admissions to inpatient services in a single calendar year, a threshold corresponding to the 95th percentile of annual inpatient admissions. Between 2011 and 2013, over 3000 individual patients met this definition. A total of 250 cases were randomly selected from this initial cohort. Of these, three patients with end-stage renal disease receiving dialysis at GHS were excluded, because their admissions were attributable to policies requiring them to receive dialysis in the inpatient setting. This resulted in a final sample of 247 cases.
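As a minimal sketch of the case definition (not the authors' code, which is not described), the Python below counts inpatient admissions per patient per calendar year and labels patient-years with three or more admissions as HUP and those with one or two as control-eligible. The input format, a list of `(mrn, admission_date)` tuples, and the function name `classify_utilization` are hypothetical; the random draw of 250 cases and the dialysis exclusion are not modeled.

```python
from collections import Counter
from datetime import date


def classify_utilization(admissions, hup_threshold=3):
    """Label each (mrn, calendar year) as 'HUP' (>= hup_threshold admissions)
    or 'control-eligible' (1 to hup_threshold - 1 admissions).

    admissions: hypothetical list of (mrn, admission_date) tuples, one per
    inpatient admission.
    """
    counts = Counter((mrn, d.year) for mrn, d in admissions)
    return {
        key: "HUP" if n >= hup_threshold else "control-eligible"
        for key, n in counts.items()
    }


# Illustrative data: A1 has three admissions in 2012; B2 has one in 2012
# and two in 2013.
admissions = [
    ("A1", date(2012, 1, 5)), ("A1", date(2012, 4, 2)), ("A1", date(2012, 9, 30)),
    ("B2", date(2012, 3, 1)),
    ("B2", date(2013, 2, 7)), ("B2", date(2013, 8, 8)),
]
labels = classify_utilization(admissions)
```

Because labels are assigned per patient-year, a patient can be a HUP in one year and control-eligible in another; how such patients were handled is not stated above.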
Controls were defined as patients with one or two inpatient admissions in a single calendar year between 2011 and 2013. This control group was chosen to focus on differences between high utilizers and patients who require hospitalization but are not high utilizers, a comparison that would be helpful to both clinicians and policy makers. Controls were selected in a 1:1 ratio from the original master list and were matched to cases on age, sex, and year of high use. If more than one eligible control matched a case in a given year, a random number generator was used to select the control.
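The 1:1 matching step can be sketched as follows, assuming exact matching on age in years, sex, and year, with controls drawn without replacement; the source does not state whether age was matched exactly or within bands, or whether a control could be reused. The record fields (`id`, `age`, `sex`, `year`), the function name, and the seed are hypothetical.

```python
import random


def match_controls(cases, candidates, seed=2014):
    """Match each case 1:1 to a control with the same (age, sex, year).

    Assumptions: exact matching on all three keys, controls drawn without
    replacement, and random selection when several candidates qualify.
    Returns {case_id: control_id}; unmatched cases are omitted.
    """
    rng = random.Random(seed)  # seeded so the draw is reproducible
    pool = {}
    for c in candidates:
        pool.setdefault((c["age"], c["sex"], c["year"]), []).append(c)
    matches = {}
    for case in cases:
        options = pool.get((case["age"], case["sex"], case["year"]), [])
        if options:
            # Pick one eligible control at random and remove it from the pool.
            chosen = options.pop(rng.randrange(len(options)))
            matches[case["id"]] = chosen["id"]
    return matches
```

Cases with no eligible control are simply left unmatched here; a real implementation would need an explicit rule for them.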
IRB approval was obtained from Emory University and the Grady Hospital Research Oversight Committee, and all analyses and study procedures complied with HIPAA regulations.
Data collection
Data were collected from the GHS Electronic Medical Record (EMR) (Epic Systems Corporation, Verona, WI) via retrospective chart reviews completed between October 2014 and February 2015. Data abstraction was done in accordance with the study protocol (Additional file 1). Patient information was collected and managed using REDCap electronic data capture tools hosted at Emory University [17]. Unless otherwise noted, all data collected were restricted to the year of high utilization for cases or the matched year for controls.
Demographic data included sex, birthdate, race/ethnicity, street address, county of residence, zip code, and whether the patient was deceased. Patient age was calculated as of the index admission during the year of interest. For the logistic regression analysis, race was dichotomized as Black or non-Black. Patient income was estimated by grouping patients by zip code and assigning household income from the U.S. Census Bureau’s 2009–2013 American Community Survey 5-Year Estimates [18]. Patients were classified as deceased if death was documented in the medical chart at any point between 2011 and chart abstraction, or if they were discharged to home or inpatient hospice and had no further follow-up at GHS. Deaths outside these two situations were not captured.
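Two of the derived demographic variables are simple enough to sketch directly: age in completed years on the index admission date, and the Black/non-Black indicator used in the regression models. The function names and the exact race label being matched are assumptions about the coding, not taken from the study protocol.

```python
from datetime import date


def age_at(birthdate, index_date):
    """Age in completed years on the index admission date."""
    had_birthday = (index_date.month, index_date.day) >= (birthdate.month, birthdate.day)
    return index_date.year - birthdate.year - (0 if had_birthday else 1)


def race_binary(race):
    """Dichotomize recorded race: 1 = Black, 0 = non-Black.

    The label 'Black' is an assumed EMR coding; real data may need a
    mapping from several recorded values.
    """
    return 1 if race.strip().lower() == "black" else 0
```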
Variables abstracted included medical and social factors, selected based on a literature review [4, 7–12]. Medical information included all current and past diagnoses, reasons for admission, and the number of outpatient medications prescribed. Medical diagnoses and reasons for admission were abstracted from the history and physical (H&P) and discharge summaries. Medical conditions were recorded individually but grouped by organ system for analysis; a full list of diagnoses and their corresponding disease categories is provided in Additional file 2. Chronic disease was defined as any cardiac, neurological, hematological, renal, or endocrine diagnosis. Multiple chronic conditions (MCCs) were defined as having three or more of these chronic diagnoses. The number of outpatient medications was defined as the total count of unique outpatient prescriptions.
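The chronic disease and MCC definitions reduce to set membership and a count. The sketch below assumes each diagnosis has already been mapped to its organ-system category (the actual mapping is in Additional file 2) and counts distinct chronic diagnoses rather than distinct categories, which is one reading of the MCC definition above. The category strings, diagnosis names, and function names are hypothetical.

```python
CHRONIC_CATEGORIES = {"cardiac", "neurological", "hematological", "renal", "endocrine"}


def chronic_flags(diagnoses):
    """diagnoses: iterable of (diagnosis_name, category) pairs for one patient.

    Returns (any_chronic, mcc), where mcc means three or more distinct
    diagnoses falling in a chronic category.
    """
    chronic = {name for name, category in diagnoses if category in CHRONIC_CATEGORIES}
    return len(chronic) >= 1, len(chronic) >= 3


def count_outpatient_meds(prescriptions):
    """Number of unique outpatient prescriptions (duplicates counted once)."""
    return len(set(prescriptions))
```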
Weight and height were abstracted from the first H&P in which they were documented. Body mass index (BMI) was calculated as weight (kg)/[height (m)]².
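For completeness, the BMI formula as a small Python function; `bmi` is a hypothetical helper and assumes weight has already been converted to kilograms and height to meters.

```python
def bmi(weight_kg, height_m):
    """Body mass index: weight (kg) divided by height (m) squared."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2
```

For example, 70 kg and 1.75 m give 70 / 3.0625, about 22.9 kg/m².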
Non-clinical factors included social characteristics, “habit” variables, and insurance status and payers. Social characteristics included housing status, employment status, and current or past incarceration; these data were collected from H&Ps, administrative charts, or social worker (SW) notes. Tobacco, alcohol, and drug use were classified as “habit” variables and were collected from the same sources. Current use of tobacco, alcohol, or recreational drugs was defined as self-reported use or documentation of a positive urine drug or blood alcohol test in one of these sources. History of use was positive if a patient had either current use or a self-reported history of use. Use was classified as negative both when the patient explicitly denied use and when there was no statement or documentation of use. Substance use was considered only as a non-clinical characteristic; it was not grouped under the “psychiatric” disease category. Patients were considered insured if they had insurance coverage at any point during the year of interest; specific payer information was collected from discharge summaries and SW notes.
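The habit-variable rules form a small decision procedure. This sketch assumes chart review yields three yes/no flags per substance (the parameter names are hypothetical); explicit denial and absent documentation both fall through to negative use, as the definition above specifies.

```python
def classify_use(self_reported_current, positive_test, self_reported_past):
    """Return (current_use, history_of_use) for one substance.

    current_use: self-reported current use or a documented positive
    urine drug / blood alcohol test.
    history_of_use: current use or self-reported past use.
    Both False covers explicit denial and no documentation alike.
    """
    current = self_reported_current or positive_test
    history = current or self_reported_past
    return current, history
```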
Data on health care use included the number of ED visits and inpatient admissions, admission and discharge dates, and discharge disposition. The ED visit count included all ED visits, whether or not they resulted in admission. Disposition was categorized as home, nursing home, sub-acute rehabilitation (SAR), or hospice (home or inpatient). Admission and discharge dates and disposition were obtained from discharge summaries.
Analysis
This study included 247 cases (HUPs) and 247 controls. SAS software (SAS Version 9.2, Cary, NC) was used for all statistical analyses. Descriptive analyses were used to compare cases and controls’ demographic, social, and medical characteristics. Chi-square tests of homogeneity and independent t-tests were used to determine whether differences between cases and controls were statistically significant (p < 0.05) for categorical and continuous variables, respectively.
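The two bivariate tests can be reproduced for checking purposes with the Python standard library. This sketch assumes the Pearson chi-square without continuity correction for a 2×2 table and the pooled-variance form of the t-test; the text does not specify either choice, and the analyses themselves were run in SAS. The t-test function returns only the statistic and degrees of freedom, since its p-value requires the t distribution (available, for example, via `scipy.stats.ttest_ind`). Function names are hypothetical.

```python
import math
from statistics import mean, variance


def chi_square_2x2(table):
    """Pearson chi-square (no continuity correction) for [[a, b], [c, d]].

    Returns (statistic, p_value) with 1 degree of freedom.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = rows[i] * cols[j] / n
            stat += (observed - expected) ** 2 / expected
    # With 1 df, P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


def pooled_t(x, y):
    """Independent two-sample t statistic using the pooled variance.

    Returns (t, degrees_of_freedom).
    """
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    t = (mean(x) - mean(y)) / math.sqrt(sp2 * (1 / nx + 1 / ny))
    return t, nx + ny - 2
```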
We used multivariable logistic regression models to identify demographic, social, and medical factors associated with HUP status. HUP status was the outcome of interest, and the demographic, medical, and social variables were the exposures. Because cases and controls were matched on age and sex, these variables were not included in the main logistic regression models; in sensitivity analyses, age and sex were added to determine whether their exclusion affected the point estimates. First, each covariate was entered into a bivariate (unadjusted) logistic regression model to estimate its association with HUP status. Covariates were then grouped into categories: demographics, insurance payer, social history, habits, and medical history. Four logistic regression models were created that adjusted for each thematic group sequentially and then for all five covariate groups together. Alpha was set at 0.05. Odds ratios and confidence limits were used as measures of association, and R-squared values were used to estimate the variance explained by each model.
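To make the estimation step concrete, the sketch below fits a logistic regression by Newton-Raphson maximum likelihood and reports odds ratios with Wald 95% confidence limits. It is a standard-library illustration of the technique, not the authors' SAS code: it omits R-squared, has no safeguards against complete separation, and the function names are hypothetical.

```python
import math


def _solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                for k in range(col, n + 1):
                    M[r][k] -= f * M[col][k]
    return [M[i][n] / M[i][i] for i in range(n)]


def _score_info(beta, rows, y):
    """Gradient of the log-likelihood and the Fisher information matrix."""
    p = len(beta)
    grad = [0.0] * p
    info = [[0.0] * p for _ in range(p)]
    for x, yi in zip(rows, y):
        mu = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, x))))
        for j in range(p):
            grad[j] += (yi - mu) * x[j]
            for k in range(p):
                info[j][k] += mu * (1.0 - mu) * x[j] * x[k]
    return grad, info


def fit_logistic(X, y, max_iter=50, tol=1e-10):
    """Newton-Raphson logistic regression; returns (beta, se), intercept first."""
    rows = [[1.0] + [float(v) for v in x] for x in X]
    beta = [0.0] * len(rows[0])
    for _ in range(max_iter):
        grad, info = _score_info(beta, rows, y)
        step = _solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    # Standard errors: square roots of the diagonal of the inverse
    # information matrix evaluated at the final estimates.
    _, info = _score_info(beta, rows, y)
    p = len(beta)
    se = []
    for j in range(p):
        unit = [1.0 if i == j else 0.0 for i in range(p)]
        se.append(math.sqrt(_solve(info, unit)[j]))
    return beta, se


def odds_ratios(beta, se, z=1.96):
    """(OR, lower, upper) Wald 95% limits for each non-intercept term."""
    return [(math.exp(b), math.exp(b - z * s), math.exp(b + z * s))
            for b, s in zip(beta[1:], se[1:])]
```

With a single binary covariate, the fitted odds ratio equals the cross-product ratio ad/bc of the 2×2 table and its standard error equals sqrt(1/a + 1/b + 1/c + 1/d), which gives an easy correctness check.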
A parsimonious model was built from the fully adjusted logistic regression model using forward stepwise selection, with a significance level for entry of 0.10 [19]. Model fit was evaluated with the Hosmer-Lemeshow goodness-of-fit test, and R-squared values were used to estimate the total variance explained by the parsimonious model.
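The Hosmer-Lemeshow test compares observed and expected event counts across groups of patients ordered by predicted risk. The sketch below uses ten groups of near-equal size and the closed-form chi-square tail probability for even degrees of freedom (here 10 - 2 = 8). Its grouping rule is an assumption and may differ from how SAS handles tied predicted probabilities, so the statistic need not match SAS output exactly; the function names are hypothetical.

```python
import math


def chi2_sf_even(x, df):
    """Upper-tail chi-square probability; this closed form needs even df >= 2."""
    if df < 2 or df % 2:
        raise ValueError("closed form requires even degrees of freedom >= 2")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


def hosmer_lemeshow(probs, outcomes, groups=10):
    """Hosmer-Lemeshow statistic over near-equal groups ordered by risk.

    Returns (statistic, df, p_value) with df = groups - 2.
    """
    pairs = sorted(zip(probs, outcomes), key=lambda t: t[0])
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        n_g = len(chunk)
        observed = sum(y for _, y in chunk)
        expected = sum(p for p, _ in chunk)
        if 0 < expected < n_g:
            stat += (observed - expected) ** 2 / (expected * (1 - expected / n_g))
    df = groups - 2
    return stat, df, chi2_sf_even(stat, df)
```

A well-calibrated model gives a small statistic and a large p-value; a small p-value (for example, below 0.05) suggests poor fit.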
To determine how the representation of clinical diagnoses affected associations between covariates and HUP status, we created fully adjusted logistic regression models that varied in how medical history was entered: all diagnoses, any chronic diagnosis, or multiple chronic conditions. In this paper we present the multivariable model that includes multiple chronic conditions; the two additional models are provided in Additional file 3.
We conducted an additional sensitivity analysis that excluded deceased patients, to determine whether removing patients for whom death may have been imminent (a proxy for illness severity) affected the associations between exposures and HUP status.
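A simplified version of this sensitivity analysis compares an unadjusted odds ratio, with Woolf 95% confidence limits, before and after dropping deceased patients. The study re-ran its adjusted models; using a crude 2×2 estimate here is a simplification for illustration. The record keys (`hup`, `deceased`, and the exposure name passed in) are hypothetical, and the sketch assumes no zero cells in either table.

```python
import math


def crude_or(records, exposure):
    """Unadjusted OR of HUP status for a binary exposure, with Woolf 95% limits.

    Assumes every cell of the 2x2 table is non-zero.
    """
    a = sum(1 for r in records if r[exposure] and r["hup"])
    b = sum(1 for r in records if r[exposure] and not r["hup"])
    c = sum(1 for r in records if not r[exposure] and r["hup"])
    d = sum(1 for r in records if not r[exposure] and not r["hup"])
    log_or = math.log(a * d / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or), math.exp(log_or - 1.96 * se), math.exp(log_or + 1.96 * se)


def sensitivity_excluding_deceased(records, exposure):
    """Return (estimate in all patients, estimate after dropping deceased)."""
    survivors = [r for r in records if not r["deceased"]]
    return crude_or(records, exposure), crude_or(survivors, exposure)
```

A large shift between the two estimates would suggest that the association is partly driven by patients near the end of life.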
To investigate whether HUP status was associated with higher odds of mortality independent of other demographic, social, and medical factors, we pooled cases and controls and treated HUP status as the primary exposure and mortality as the outcome. For this analysis, we fit an additional logistic regression model using the methods described above. Age and sex were included in these models.