Predictive ability of an expert-defined population segmentation framework for healthcare utilization and mortality - a retrospective cohort study

Background Population segmentation of patients into parsimonious and relatively homogeneous subgroups, or segments, based on healthcare requirements can aid healthcare resource planning and the development of targeted intervention programs. In this study, we evaluated the ability of a previously described expert-defined segmentation approach to predict 3-year hospital utilization and mortality. Methods We segmented all adult patients who had a healthcare encounter with Singapore Health Services (SingHealth) in 2012 using the SingHealth Electronic Health Records (SingHealth EHRs). Using the previously described expert-defined segmentation approach, patients were divided into six non-overlapping segments: Mostly Healthy, Stable Chronic, Serious Acute, Complex Chronic without Frequent Hospital Admissions, Complex Chronic with Frequent Hospital Admissions, and End of Life. Hospital admissions, emergency department (ED) attendances, specialist outpatient clinic (SOC) attendances and mortality in each segment were analyzed from 2013 to 2015. Results A total of 819,993 patients were included in this study. Patients in the Complex Chronic with Frequent Hospital Admissions segment were the most likely to have a hospital admission (IRR 22.7; p < 0.001) and an ED visit (IRR 14.5; p < 0.001) in the following 3 years compared with other segments. Patients in the End of Life and Complex Chronic with Frequent Hospital Admissions segments had the lowest three-year survival rates, 58.2 and 62.6% respectively, whereas all other segments had three-year survival rates above 90%. Conclusion In this study, we demonstrated the predictive ability of an expert-driven segmentation framework for longitudinal healthcare utilization and mortality. Electronic supplementary material The online version of this article (10.1186/s12913-019-4251-6) contains supplementary material, which is available to authorized users.


Background
Population segmentation of patients into parsimonious and relatively homogeneous subgroups, or segments, based on healthcare requirements can aid healthcare resource planning and the development of intervention programs targeted at specific patient subgroups [1,2]. With an understanding of the current and future healthcare requirements of each segment, more targeted and efficient care can be delivered to that segment. This is especially critical in Singapore, which has a rapidly ageing population and an increasing chronic disease burden [3]. Healthcare expenditure is predicted to increase exponentially, from Singapore Dollars (SGD) $4 billion (USD $2.98 billion) in 2011 to SGD $12 billion (USD $8.94 billion) in 2020 [4]. Healthcare in Singapore is mainly the responsibility of the Singapore Ministry of Health (MOH), which uses a mixed financing system that includes national healthcare insurance schemes and deductions from the Central Provident Fund (CPF), a compulsory savings plan for Singapore citizens and permanent residents [5]. To deliver effective, targeted care for an ageing population and cope with rising healthcare costs, a deep understanding of the population's health characteristics and healthcare needs is crucial. Population segmentation is a critical first step in developing effective healthcare policy: it gives policy makers more detailed information about the specific health characteristics and healthcare needs of each population segment, which allows health intervention programs to be tailored to each segment and, ultimately, leads to better decisions on healthcare resource allocation and planning.
There are two major approaches to population segmentation: (1) a data-driven approach, in which segmentation is done by applying statistical analysis (e.g. cluster analysis, latent class analysis, classification trees) to empirical health data, and (2) an expert-defined approach, in which segments are decided through experts' review of, and consensus on, current evidence in the literature. The two approaches are not mutually exclusive, and a hybrid approach may draw on both data and expert input. Examples of the data-driven approach include Lafortune's latent class analysis of trial data [6], Liu et al.'s study of Taiwan National Health Insurance survey participants [7] and Van der Laan et al.'s demand-driven segmentation model [8]. In these studies, health-related data, including medical, behavioral, functional and socio-demographic data, were used to derive segments and profile each segment's characteristics.
Alternatively, segments can be defined a priori through experts' review of, and consensus on, current evidence in the literature. Published expert-defined approaches include Lynn et al.'s Bridges to Health person-centered segmentation framework [9], Kaiser Permanente's Senior Segmentation Algorithm for persons aged 65 years or older [10] and the National Academy of Medicine Patient Taxonomy [11]. In our previous work, we assessed the feasibility of segmenting a general patient population into six segments defined by experts from the Singapore Health Services Regional Health System (SingHealth RHS) [12], and found the framework feasible as a proof of concept for identifying patient segments with distinct healthcare utilization and mortality patterns [12]. However, that study could not assess the ability of segment membership to predict long-term healthcare utilization and mortality. Validation and adjustment are needed before a segmentation approach is applied clinically or in policy within a healthcare system [13]; in our policy context, this means validating it against long-term healthcare utilization and mortality. This is also a critical gap in the literature: it is not clear whether population segments produced by expert-defined approaches differ in long-term healthcare utilization and mortality.
In this study, we aimed to address this gap by assessing the ability of our expert-defined segmentation approach to predict 3-year healthcare utilization (defined as hospital admissions, emergency department attendances and specialist outpatient clinic attendances) and mortality.

Study design
We conducted a retrospective study in which we segmented all adult patients (aged ≥ 21 years in 2012) who used healthcare services at SingHealth RHS in 2012; patients younger than 21 years were excluded. The study was approved by the SingHealth Centralized Institutional Review Board (CIRB 2016/2294). De-identified data from 2012 to 2015 were extracted from the electronic health records (EHRs) using Oracle Business Intelligence Enterprise Edition (OBIEE) software [14]. The extracted variables included socio-demographic data, chronic diseases, healthcare utilization (hospital admissions, emergency department attendances and specialist outpatient clinic attendances) and mortality.

Segmentation classification
A previously described segmentation framework was used [12]. The experts who developed the framework are senior health administrators with extensive experience in both health policy and clinical care, which was intended to ensure its policy and implementation relevance in our healthcare system. Patients were segmented into six non-overlapping subgroups: Mostly Healthy, Stable Chronic, Serious Acute, Complex Chronic without Frequent Hospital Admissions, Complex Chronic with Frequent Hospital Admissions, and End of Life. The definitions and examples of the segments are given in Additional file 1 and Additional file 2. We defined frequent hospital admissions as 3 or more hospital admissions in the past 12 months, a proxy for high-cost users [15][16][17][18].
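Because the six segments are non-overlapping, every patient must end up in exactly one of them. One way to guarantee this is a priority-ordered rule cascade, sketched minimally in Python below. Only the frequent-admission threshold (3 or more admissions in the past 12 months) comes from the text; the boolean flags (`end_of_life`, `complex_chronic`, `serious_acute`, `stable_chronic`) are hypothetical stand-ins for the actual segment criteria in Additional files 1 and 2, and the most-severe-first ordering is an assumption, not the published algorithm.

```python
from dataclasses import dataclass
from enum import Enum

# ">= 3 hospital admissions in the past 12 months" defines frequent admissions (from the text).
FREQUENT_ADMISSION_THRESHOLD = 3


class Segment(Enum):
    MOSTLY_HEALTHY = "Mostly Healthy"
    STABLE_CHRONIC = "Stable Chronic"
    SERIOUS_ACUTE = "Serious Acute"
    COMPLEX_CHRONIC_NO_FHA = "Complex Chronic without Frequent Hospital Admissions"
    COMPLEX_CHRONIC_FHA = "Complex Chronic with Frequent Hospital Admissions"
    END_OF_LIFE = "End of Life"


@dataclass
class PatientFlags:
    # Hypothetical pre-computed indicators standing in for the real criteria
    # described in Additional files 1 and 2.
    end_of_life: bool
    complex_chronic: bool
    serious_acute: bool
    stable_chronic: bool
    admissions_past_12m: int


def assign_segment(p: PatientFlags) -> Segment:
    """Return exactly one segment, testing the most severe rule first (assumed order)."""
    if p.end_of_life:
        return Segment.END_OF_LIFE
    if p.complex_chronic:
        # Split complex chronic patients on the frequent-admission threshold.
        if p.admissions_past_12m >= FREQUENT_ADMISSION_THRESHOLD:
            return Segment.COMPLEX_CHRONIC_FHA
        return Segment.COMPLEX_CHRONIC_NO_FHA
    if p.serious_acute:
        return Segment.SERIOUS_ACUTE
    if p.stable_chronic:
        return Segment.STABLE_CHRONIC
    # No rule fired: the default segment.
    return Segment.MOSTLY_HEALTHY
```

For example, `assign_segment(PatientFlags(end_of_life=False, complex_chronic=True, serious_acute=False, stable_chronic=True, admissions_past_12m=3))` returns `Segment.COMPLEX_CHRONIC_FHA`. Checking the most severe condition first is what makes the segments mutually exclusive: a patient who meets several criteria is resolved to a single segment.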

Statistical analysis
We first compared socio-demographics and hospital utilization in the baseline year (2012) across segments, using the Chi-square test for categorical variables and one-way ANOVA for continuous variables. Taking 1st January 2013 as the time of entry into the study for all patients, we calculated survival time as the number of days from entry to death for patients who died on or before 31st December 2015, and as 1094 days (the number of days from entry to 31st December 2015) for censored patients. Kaplan-Meier survival curves were plotted, and differences between the curves were analyzed using the log-rank test. To determine whether hospital utilization from 2013 to 2015 differed between segments, we first conducted bivariate analyses of population segment against hospital utilization using ANOVA or the Chi-square test. Because the utilization counts were over-dispersed, with most patients having zero utilization, we modelled hospital utilization with negative binomial regression, using the Mostly Healthy segment as the reference group and adjusting for age, gender, ethnicity and past hospital utilization. Survival time was used as the exposure variable in the negative binomial model. We also conducted two-degree-of-freedom Chi-square tests between each pair of segments to test for significant differences in hospital utilization. All analyses were performed in STATA/IC 13.1.
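The follow-up time and the reference-group comparison described above can be sketched in Python as follows. In a negative binomial model with a log link, the expected count for patient i is modelled as log E[Y_i] = log(t_i) + β0 + β_s + γ'x_i, where t_i is the survival time used as exposure, β_s is the coefficient for segment s and x_i holds the adjustment covariates; exp(β_s) is then the incidence rate ratio (IRR) relative to Mostly Healthy. Fitting that adjusted model needs a regression package (Stata's `nbreg` command with its `exposure()` and `irr` options is one such tool), so this sketch only computes follow-up days and a crude, unadjusted rate ratio, the quantity the adjusted IRR refines. The function names are hypothetical.

```python
from datetime import date
from typing import Optional, Sequence

STUDY_ENTRY = date(2013, 1, 1)  # time of entry for all patients
STUDY_END = date(2015, 12, 31)  # end of follow-up


def follow_up_days(death_date: Optional[date]) -> int:
    """Survival time in days: entry to death, or entry to 31 Dec 2015 if censored."""
    if death_date is not None and death_date < STUDY_ENTRY:
        raise ValueError("patient died before study entry")
    if death_date is not None and death_date <= STUDY_END:
        return (death_date - STUDY_ENTRY).days
    # Alive at 31 Dec 2015 (or died later): censored at 1094 days.
    return (STUDY_END - STUDY_ENTRY).days


def crude_rate_ratio(events_seg: Sequence[int], days_seg: Sequence[int],
                     events_ref: Sequence[int], days_ref: Sequence[int]) -> float:
    """Crude (unadjusted) incidence rate ratio of a segment vs the reference segment.

    Each rate is total events divided by total person-days of exposure. This is
    NOT the covariate-adjusted negative binomial IRR, only its unadjusted analogue.
    """
    rate_seg = sum(events_seg) / sum(days_seg)
    rate_ref = sum(events_ref) / sum(days_ref)
    return rate_seg / rate_ref
```

Using survival time as exposure means a patient who died after 200 days contributes 200 person-days rather than a full 1094, so segments with high mortality are not credited with follow-up time they never had.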

Patient baseline characteristics and acute hospital utilization
A total of 819,993 patients were included and assigned to the six segments in the proportions shown in Table 1. The overall mean age of the study population was 49.8 years (standard deviation [SD] 17.2). There were more female than male patients (58% vs. 42%). Differences in age and gender between the segments were statistically significant (p < 0.001).
There was a trend of increasing hospital utilization in 2012 moving from the Mostly Healthy segment to the Complex Chronic with Frequent Hospital Admissions segment (Table 1). The differences between the six segments were statistically significant (p < 0.001) for ED visits, SOC visits and hospital admissions. Not unexpectedly, patients in the Complex Chronic with Frequent Hospital Admissions segment had more frequent admissions, as this was a criterion for inclusion in the segment. However, the same pattern of increased utilization was seen for SOC and ED attendances, suggesting that this segment has increased healthcare utilization across multiple settings.

Bivariate analyses of segments and hospital utilization from year 2013 to 2015
The trend observed for hospital utilization from 2013 to 2015 was similar to that in 2012, with the number of ED visits, SOC visits and hospital admissions increasing from the Mostly Healthy segment to the Complex Chronic with Frequent Hospital Admissions segment. […] (Additional file 3). The log-rank test for equality of the six survival distributions showed a statistically significant difference between the six segments (p < 0.001).
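To make the survival comparison concrete, the following is a minimal pure-Python sketch of the Kaplan-Meier product-limit estimator and a two-group log-rank test. It is illustrative only: the study compared six segments, for which the log-rank statistic generalizes to a chi-square test with k - 1 = 5 degrees of freedom, and in Stata the standard commands for these analyses are `sts graph` and `sts test`. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[int], events: Sequence[bool]) -> List[Tuple[int, float]]:
    """Product-limit estimate S(t), reported at each distinct event (death) time."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Collect everyone (deaths and censorings) sharing time t.
        while i < len(data) and data[i][0] == t:
            deaths += int(data[i][1])
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed  # censored patients leave the risk set after t
    return curve


def logrank_two_group(times_a, events_a, times_b, events_b) -> Tuple[float, float]:
    """Two-sample log-rank test; returns (chi-square statistic, p-value) on 1 df."""
    pooled = [(t, bool(e), 0) for t, e in zip(times_a, events_a)]
    pooled += [(t, bool(e), 1) for t, e in zip(times_b, events_b)]
    o_minus_e = 0.0  # observed minus expected deaths in group A
    var = 0.0        # hypergeometric variance of that difference
    for t in sorted({tt for tt, e, _ in pooled if e}):
        n = sum(1 for tt, _, _ in pooled if tt >= t)
        n_a = sum(1 for tt, _, g in pooled if tt >= t and g == 0)
        d = sum(1 for tt, e, _ in pooled if tt == t and e)
        d_a = sum(1 for tt, e, g in pooled if tt == t and e and g == 0)
        o_minus_e += d_a - d * n_a / n
        if n > 1:
            var += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if var == 0:
        raise ValueError("log-rank variance is zero; groups cannot be compared")
    stat = o_minus_e ** 2 / var
    p_value = math.erfc(math.sqrt(stat / 2))  # upper tail of chi-square with 1 df
    return stat, p_value
```

Censored patients (here, everyone still alive at 1094 days) stay in the risk set up to their censoring time and then drop out without counting as deaths, which is what lets the survival curves use all patients rather than only those who died.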

Discussion
Our study supports the conclusion that our previously developed six-segment framework is predictive of long-term healthcare utilization and mortality. Healthcare utilization and mortality increased with the complexity of the segments, suggesting that our segmentation approach was able to discriminate between patients with varying healthcare needs and risk of mortality. Patients in the Complex Chronic with Frequent Hospital Admissions segment made up 0.5% of the study population but had the highest risk of hospital admissions and ED visits per patient, and the second-highest risk of SOC visits, in the 3 years (2013-2015) following the initial healthcare encounter in 2012. Moreover, about one in three patients in this segment died within the next 3 years (three-year survival 62.6%). This suggests that patients in this segment carry a high healthcare burden, which warrants further investigation into disease management, psychosocial environment and quality of community care within the segment. Equally worth noting is the End of Life segment, which had the highest number of SOC visits. This is likely due to the nature of patients in the End of Life segment: many of them have metastatic cancer with frequent outpatient appointments.
For the Mostly Healthy, Serious Acute and Stable Chronic segments, survival rates from 2013 to 2015 were similar, although healthcare utilization increased across these segments over the same period. This is important information for population health management, which considers not only survival but also healthcare resource consumption and service planning. In a healthcare system where rising healthcare spending is a particular concern, trends in healthcare resource consumption are of particular interest to our policy makers.
There are several strengths of our approach. Firstly, this simple categorization can easily be replicated in most healthcare systems, as the variables and healthcare utilization measures used in our study are commonly available in other healthcare systems. Some recently implemented segmentation frameworks, such as those used in British Columbia, Canada [19] and Northern London, UK [20], use similar domains of information to our framework. While our study identified six distinct segments with different long-term healthcare utilization and mortality, we are cognizant that patients within each segment may still have differing healthcare needs. The utility of the current segmentation approach lies less in treating a specific disease for a specific patient over a single healthcare encounter, which requires a management plan individualized by each patient-healthcare provider pair, and more at the policy level, in planning what types of health services each segment needs at the population level. Our segmentation framework is practical, with each segment corresponding to a predominant site of care and bundle of interventions.
Abbreviations: ED, Emergency Department; SOC, Specialist Outpatient Clinic. Numbers are presented as mean ± standard deviation or number (%), as appropriate. (a) Chi-square test for categorical variables and one-way ANOVA for continuous variables.
For example, people in the Mostly Healthy and Serious Acute segments mainly require community-based health promotion activities and lifestyle interventions. This will guide population health policy and direct more resources towards preventive services development and health promotion efforts. Patients in the Stable Chronic segment mainly require primary care to avoid progression to complications, while patients in the Complex Chronic with Frequent Hospital Admissions and Complex Chronic without Frequent Hospital Admissions segments may benefit from more aggressive, multi-disciplinary case management services. For the End of Life segment, hospice care is typically needed to manage symptoms and to avoid events, such as unnecessary hospitalizations, that may be expensive and potentially risky. Knowing that an End of Life segment exists, and what proportion of the entire patient population belongs to it, allows healthcare policy makers to allocate appropriate resources to developing advance care plans, shared care with appropriate specialists and/or team-based care, and community case coordinators to optimize quality of life.
Our study has several limitations. First, the variables in our dataset were restricted to those routinely collected in our EHRs. We were therefore unable to refine the segmentation using information on functional status and socioeconomic variables, which play important roles in influencing health-related behavior and health services utilization [21]. Second, our population database cannot account for healthcare utilization outside SingHealth or for out-of-hospital deaths.
Data-driven segmentation approaches also provide an attractive alternative for generating evidence-based insights into a population's health status. These approaches include unsupervised techniques such as cluster analysis and latent class analysis, and supervised techniques such as classification and regression. A key strength of data-driven approaches is their potential to group patients by their similarity across several dimensions or characteristics [22], so that non-apparent latent classes or clusters can be identified. Data-driven frameworks, although easy to standardize and explicit in methodology, may not always be relevant or practical at the policy and implementation level in a particular healthcare system. Expert-driven methods are likely to be feasible to implement and to carry clear policy implications, but may lack the rich insights available from large volumes of health data. It is for each healthcare system to decide whether to adopt an expert-driven, data-driven or hybrid approach, taking into consideration scientific evidence and its specific policy context and priorities.
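As a contrast with the expert-defined rules, the sketch below illustrates the simplest unsupervised clustering technique, k-means (Lloyd's algorithm), in plain Python, showing how patients can be grouped purely by similarity across several characteristics. It is not part of this study's method; the function name and the two-feature toy points are hypothetical, and centroids are seeded with the first k points only for reproducibility (practical analyses usually standardize features and use several random or k-means++ initializations).

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], k: int, max_iter: int = 100) -> List[int]:
    """Lloyd's k-means: return a cluster label (0..k-1) for each point."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Seed centroids with the first k points (deterministic, for illustration only).
    centroids = [list(p) for p in points[:k]]

    def nearest(p: Point) -> int:
        # Index of the centroid closest to p by Euclidean distance.
        return min(range(k), key=lambda c: math.dist(p, centroids[c]))

    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [nearest(p) for p in points]
        if new_labels == labels:
            break  # no point changed cluster: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its current members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels
```

With toy points `[(0, 0), (10, 10), (0, 1), (1, 0), (10, 11), (11, 10)]` and `k=2`, the three points near the origin share one label and the three near (10, 10) share the other. Unlike the expert-defined segments, the resulting clusters carry no predefined meaning and must be profiled afterwards, which is the policy-relevance trade-off described above.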