Data source
We used 2014 Centers for Medicare and Medicaid Services (CMS) claims for participants of the Health and Retirement Study (HRS), linked with sociodemographic data from each participant's most recent preceding HRS interview. In brief, the HRS is a nationally representative survey of individuals aged ≥ 51 years. The survey has been conducted biennially since 1992, with refreshment samples added every 6 years [17]. HRS survey data were linked to Medicare administrative data for age-eligible fee-for-service beneficiaries. The study protocol was approved by the Oregon Health and Science University Research Integrity Office Institutional Review Board (STUDY00017034).
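The linkage rule "most recent preceding interview" can be made concrete with a short sketch. It assumes that each participant has a list of HRS interview dates and a single reference date (for example, the date of their first 2014 claim). It also assumes that "preceding" means strictly earlier, so an interview on the reference date itself is not used. The function names and data layout are hypothetical and are not part of the HRS or CMS files.

```python
from bisect import bisect_left
from datetime import date

def most_recent_preceding(interview_dates, reference_date):
    """Return the latest interview date strictly before reference_date, or None."""
    ordered = sorted(interview_dates)
    # bisect_left gives the number of interviews dated strictly before reference_date.
    i = bisect_left(ordered, reference_date)
    return ordered[i - 1] if i > 0 else None

def link_interviews(interviews_by_id, reference_by_id):
    """Map each participant ID (hypothetical keys) to the interview linked to it."""
    return {
        pid: most_recent_preceding(interviews_by_id.get(pid, []), ref)
        for pid, ref in reference_by_id.items()
    }

# Toy example: a participant interviewed in 2010 and 2012, referenced to 1 Jan 2014.
example = link_interviews(
    {"p1": [date(2010, 5, 1), date(2012, 6, 1)]},
    {"p1": date(2014, 1, 1)},
)
print(example)
```

A participant with no earlier interview maps to `None`. Under this assumption, that participant would be unlinkable rather than silently matched to a later wave.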
Study samples
Participants (N = 4,993) were selected for the study of hospitalizations based on enrollment in fee-for-service Medicare Parts A and B in 2014, age ≥ 65 years, non-Hispanic (NH) White or NH Black race, and complete information on the contributing factors examined, including chronic conditions identified with the Chronic Condition Data Warehouse (CCW) algorithms. At least three years of enrollment in the Medicare fee-for-service program was required to identify conditions with the CCW algorithms [18, 19]. From this sample, the subset of participants with at least one hospitalization (N = 1,120) was defined for the study of potentially avoidable hospitalizations. Details of participant selection are available in Supplementary Figure S1.
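The inclusion criteria act as a chain of filters, and the potentially avoidable hospitalization sample is a subset of the hospitalization sample. The sketch below shows that logic. The record type and all of its field names are hypothetical stand-ins for the linked HRS-Medicare variables, and each criterion is reduced to a single precomputed field.

```python
from dataclasses import dataclass

@dataclass
class Beneficiary:
    # Hypothetical field names standing in for linked HRS-Medicare variables.
    bene_id: str
    age: int
    race: str                # e.g. "NH White", "NH Black", or another category
    ffs_parts_ab_2014: bool  # enrolled in fee-for-service Medicare Parts A and B in 2014
    ffs_years: int           # years of fee-for-service enrollment available to the CCW algorithms
    complete_data: bool      # no missing contributing factors
    n_hospitalizations: int  # inpatient stays in 2014

def hospitalization_sample(people):
    """Apply the inclusion criteria for the hospitalization analysis."""
    return [
        p for p in people
        if p.ffs_parts_ab_2014
        and p.age >= 65
        and p.race in {"NH White", "NH Black"}
        and p.ffs_years >= 3
        and p.complete_data
    ]

def avoidable_hospitalization_sample(sample):
    """Subset with at least one hospitalization, used for the second outcome."""
    return [p for p in sample if p.n_hospitalizations >= 1]
```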
Variables
Chronic conditions present up to the time of the HRS interview were the main exposure variables: hypertension, hyperlipidemia, anemia, rheumatoid arthritis/osteoarthritis, ischemic heart disease, heart failure, chronic kidney disease, diabetes, depression, chronic obstructive pulmonary disease (COPD), fibromyalgia/chronic pain/fatigue, atrial fibrillation, acquired hypothyroidism, Alzheimer’s disease and related disorders/senile dementia (ADRD), stroke/transient ischemic attack, anxiety disorders, osteoporosis, cancer, obesity, asthma, pressure and chronic ulcers, acute myocardial infarction, mobility impairments, substance abuse (both drug and alcohol), multiple sclerosis, spinal injury, hip fracture, autism spectrum disorder, bipolar disorder, hepatitis, HIV and schizophrenia. The chronic conditions were identified from linked Medicare beneficiary files and administrative claims using CCW algorithms [18, 19]. A description of the methodology used to ascertain each chronic condition is available on the CCW website [18, 19].
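CCW-style algorithms generally flag a condition when enough claims with qualifying diagnosis codes fall within a reference period. The sketch below shows a simplified version of that idea. The real claim-type rules, counts, reference periods and code lists differ by condition and are documented by CCW [18, 19]. Here the codes `"D1"`/`"D2"`, the minimum count and the look-back length are hypothetical. The window is assumed to end at the HRS interview date.

```python
from datetime import date, timedelta

def has_condition(claims, qualifying_codes, min_claims, lookback_days, as_of):
    """Flag a condition from (service_date, diagnosis_code) claim tuples.

    Counts claims carrying a qualifying code within the look-back window that
    ends on as_of (e.g. the HRS interview date), inclusive at both ends.
    This is a simplified, hypothetical rule, not an actual CCW algorithm.
    """
    window_start = as_of - timedelta(days=lookback_days)
    hits = sum(
        1 for service_date, code in claims
        if window_start <= service_date <= as_of and code in qualifying_codes
    )
    return hits >= min_claims

# Toy claims with hypothetical diagnosis codes.
claims = [
    (date(2013, 1, 10), "D1"),
    (date(2013, 6, 1), "D1"),
    (date(2010, 1, 1), "D1"),   # outside a two-year window ending 1 Jan 2014
    (date(2013, 7, 1), "D2"),   # not a qualifying code
]
print(has_condition(claims, {"D1"}, min_claims=2, lookback_days=730, as_of=date(2014, 1, 1)))
```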
Sociodemographic factors were obtained from HRS data and included sex (male as reference); race, categorized as NH White or NH Black; education, as a continuous variable of years of schooling; and wealth, as a continuous variable in increments of $10,000. Wealth was top-coded at its 95th percentile ($2,044,220); values above this threshold were set to $2,044,220. Age, as a continuous variable, was taken from the CMS claim for each participant's first event in 2014.
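Top-coding at the 95th percentile and rescaling to $10,000 units can be sketched as follows. The percentile here uses the linear-interpolation "inclusive" method of Python's `statistics.quantiles`. The source does not state which percentile definition was used, so the cap on real data could differ slightly. The toy wealth values are invented for illustration.

```python
import statistics

def top_code(values, percentile=95):
    """Cap values at the given percentile (linear interpolation, 'inclusive' method)."""
    cut_points = statistics.quantiles(values, n=100, method="inclusive")
    cap = cut_points[percentile - 1]  # cut_points[k - 1] is the k-th percentile
    return [min(v, cap) for v in values], cap

def per_10k(values):
    """Rescale dollars so a regression coefficient refers to a $10,000 increment."""
    return [v / 10_000 for v in values]

wealth = [i * 10_000 for i in range(101)]  # toy data: $0 to $1,000,000
capped, cap = top_code(wealth)
print(cap, max(per_10k(capped)))
```

Dividing by 10,000 changes only the units of the wealth coefficient. A one-unit change in the rescaled variable then corresponds to $10,000 of wealth.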
The outcomes were hospitalizations and potentially avoidable hospitalizations recorded in 2014 CMS claims. Hospitalization was defined as a binary indicator (no/yes) based on any inpatient hospital claim in the Medicare claims data. Potentially avoidable hospitalization was constructed as a binary indicator (no/yes) based on the definition by Segal et al. (2014), which was developed specifically for Medicare-Medicaid eligible beneficiaries [20] and has been widely used in studies of Medicare claims data [21,22,23,24]. Our potentially avoidable hospitalization variable used the classification for conditions applicable to both institutional and non-institutional settings, which comprises nine diagnosis categories identified in inpatient hospital claims with International Classification of Diseases, Ninth Revision (ICD-9) codes: COPD, chronic bronchitis and asthma; congestive heart failure; constipation, fecal impaction, and obstipation; dehydration, volume depletion including acute renal failure and hyponatremia; hypertension and hypotension; poor glycemic control; seizures; urinary tract infection; and weight loss (failure to thrive) and nutritional deficiencies. The ICD-9 codes are listed in Supplementary Table S1. Conditions defined for institutional settings only were not included in this study.
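The two binary outcomes can be derived from inpatient claims as sketched below. The sketch makes two assumptions. First, each claim is matched on a single (principal) diagnosis code; the source does not specify which diagnosis position was used. Second, the code sets per category are placeholders, because the actual ICD-9 lists are in Supplementary Table S1. Beneficiaries with no inpatient claim keep both indicators at 0.

```python
def outcome_flags(bene_ids, inpatient_claims, pah_code_sets):
    """Return {bene_id: (hospitalized, potentially_avoidable)} as 0/1 indicators.

    inpatient_claims: iterable of (bene_id, principal_dx) for 2014 inpatient claims.
    pah_code_sets: {category_name: set of ICD-9 codes}; contents here are hypothetical.
    """
    pah_codes = set().union(*pah_code_sets.values())
    flags = {b: [0, 0] for b in bene_ids}
    for bene_id, principal_dx in inpatient_claims:
        if bene_id not in flags:  # claim for someone outside the study sample
            continue
        flags[bene_id][0] = 1     # any inpatient claim counts as a hospitalization
        if principal_dx in pah_codes:
            flags[bene_id][1] = 1
    return {b: tuple(v) for b, v in flags.items()}

# Placeholder code set; not the Segal et al. code list.
example = outcome_flags(
    ["a", "b", "c"],
    [("a", "X01"), ("a", "Z99"), ("b", "Z99")],
    {"hypothetical_category": {"X01"}},
)
print(example)
```

By construction, a potentially avoidable hospitalization implies a hospitalization. This is consistent with restricting the second analysis to participants with at least one hospitalization.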
Statistical models
Descriptive statistics were reported as means with standard deviations (SDs) or medians with interquartile ranges (IQRs) for continuous variables and as frequencies with percentages for categorical variables, separately for each outcome.
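The descriptive summaries can be sketched with the standard library as follows. Continuous variables get a mean, SD, median and IQR, and categorical variables get counts and percentages, grouped by outcome level. The row format and variable names are hypothetical. Quartiles use the "inclusive" method of `statistics.quantiles`, which may differ slightly from other software defaults.

```python
import statistics
from collections import Counter

def describe_continuous(values):
    """Mean, SD, median and IQR (25th, 75th percentiles) of a numeric list."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),  # sample standard deviation
        "median": statistics.median(values),
        "iqr": (q1, q3),
    }

def describe_categorical(values):
    """Frequency and percentage for each category."""
    counts = Counter(values)
    total = sum(counts.values())
    return {k: (n, 100 * n / total) for k, n in counts.items()}

def by_outcome(rows, outcome, variable, describe):
    """Apply a summary function to `variable` within each level of `outcome`."""
    groups = {}
    for row in rows:
        groups.setdefault(row[outcome], []).append(row[variable])
    return {level: describe(vals) for level, vals in sorted(groups.items())}
```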
Statistical models were constructed in two steps. First, to identify the chronic conditions with the highest variable importance for predicting each outcome, we fitted conditional inference random forests using the R package ‘party’ [25]. This non-parametric machine learning method uses bootstrap aggregation to grow many decision trees, each considering a random subset of variables as split candidates at each node, and aggregates their results. Each tree performs recursive binary partitioning to explore the relationship between multiple explanatory variables and a single outcome. At each node, the tree tests the null hypothesis of independence between each explanatory variable and the outcome; if this hypothesis cannot be rejected, splitting stops. Otherwise, the variable with the strongest association with the outcome (smallest p-value) is selected and a binary split on that variable is performed. Each forest comprised 1,500 trees, and the analyses were repeated three times with different random seeds to confirm the robustness of the results. The number of variables randomly sampled as split candidates at each node was set to the default (the square root of the number of predictors in the model). From these analyses, we obtained a ranking of variable importance for predicting each outcome. The conditional inference random forest differs from the random forest implemented in the R package ‘randomForest’ in that it (1) provides unbiased variable selection when predictors are of different types and (2) offers a conditional permutation importance measure that helps evaluate the importance of correlated predictors [26].
Second, the three top-ranked chronic conditions were included in multivariable logistic regression models adjusted for sociodemographic factors to estimate adjusted odds ratios (aORs) with 95% confidence intervals (CIs), quantifying the association with each of the two outcomes.
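The two-step logic can be illustrated with a simplified Python sketch. It is not the ‘party’ implementation. Step one uses plain (marginal) permutation importance: shuffle one predictor, record the drop in accuracy, and rank predictors by that drop. The conditional importance measure in ‘party’ instead permutes within strata of correlated predictors, which this sketch does not attempt. The `predict` callable is a hypothetical stand-in for a fitted forest. Step two converts a fitted logistic regression coefficient β and its standard error into an aOR and a Wald-type 95% CI, exp(β ± 1.96·SE). The source does not say which CI method was used, so this is an assumption.

```python
import math
import random

def accuracy(predict, X, y):
    """Share of rows where predict(row) equals the observed outcome."""
    return sum(predict(row) == target for row, target in zip(X, y)) / len(y)

def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Mean accuracy drop after shuffling each predictor column (marginal, not conditional)."""
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = []
            for row, value in zip(X, column):
                new_row = list(row)
                new_row[j] = value  # replace only predictor j
                X_perm.append(new_row)
            drops.append(baseline - accuracy(predict, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances

def rank_predictors(names, importances):
    """Predictor names ordered from most to least important."""
    pairs = sorted(zip(names, importances), key=lambda p: p[1], reverse=True)
    return [name for name, _ in pairs]

def adjusted_odds_ratio(beta, se, z=1.959963984540054):
    """aOR and Wald 95% CI from a logistic regression coefficient and its SE."""
    return math.exp(beta), (math.exp(beta - z * se), math.exp(beta + z * se))
```

In the study's workflow, the three top-ranked conditions from step one would enter the regression in step two. The sketch leaves model fitting itself to a statistics package.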