Predictive ability of an expert-defined population segmentation framework for healthcare utilization and mortality - a retrospective cohort study
BMC Health Services Research volume 19, Article number: 401 (2019)
Population segmentation of patients into parsimonious and relatively homogenous subgroups or segments based on healthcare requirements can aid healthcare resource planning and the development of targeted intervention programs. In this study, we evaluated the predictive ability of a previously described expert-defined segmentation approach on 3-year hospital utilization and mortality.
We segmented all adult patients who had a healthcare encounter with Singapore Health Services (SingHealth) in 2012 using the SingHealth Electronic Health Records (SingHealth EHRs). Patients were divided into non-overlapping segments defined as Mostly Healthy, Stable Chronic, Serious Acute, Complex Chronic without Frequent Hospital Admissions, Complex Chronic with Frequent Hospital Admissions, and End of Life, using a previously described expert-defined segmentation approach. Hospital admissions, emergency department attendances (ED), specialist outpatient clinic attendances (SOC) and mortality in different patient subgroups were analyzed from 2013 to 2015.
A total of 819,993 patients were included in this study. Patients in the Complex Chronic with Frequent Hospital Admissions segment were the most likely to have a hospital admission (IRR 22.7; p < 0.001) and an ED visit (IRR 14.5; p < 0.001) in the following 3 years compared with other segments. Patients in the End of Life and Complex Chronic with Frequent Hospital Admissions segments had the lowest three-year survival rates, of 58.2% and 62.6% respectively, whereas the other segments had survival rates above 90% after 3 years.
In this study, we demonstrated the predictive ability of an expert-driven segmentation framework on longitudinal healthcare utilization and mortality.
Population segmentation of patients into parsimonious and relatively homogenous subgroups or segments based on healthcare requirements can aid healthcare resource planning and the development of targeted intervention programs for a specific patient subgroup [1, 2]. With an understanding of the current and future healthcare requirements of each segment, more targeted and efficient care can be delivered to each patient segment. This is especially critical in Singapore, with its rapidly ageing population and increasing chronic disease burden. Healthcare expenditure is predicted to triple from Singapore Dollars (SGD) $4 billion (USD $2.98 billion) in 2011 to SGD $12 billion (USD $8.94 billion) in 2020. Healthcare in Singapore is mainly the responsibility of the Singapore Ministry of Health (MOH), which uses a mixed financing system that includes nationalized healthcare insurance schemes and deductions from the compulsory savings plan, the Central Provident Fund (CPF), for Singapore citizens and permanent residents. To deliver effective and targeted care for an ageing population and cope with increasing healthcare costs, it is crucial to have a deep understanding of the population’s health characteristics and healthcare needs. Population segmentation is a critical first step in the development of effective healthcare policy because it gives policy makers more detailed information about the specific health characteristics and healthcare needs of each population segment, which allows health intervention programs to be tailored to different segments. This eventually leads to better policy decisions on healthcare resource allocation and planning.
There are two major approaches to population segmentation: 1) a data-driven approach, where segmentation is done using statistical analysis (e.g. clustering analysis, latent class analysis, classification tree) on empirical health data, and 2) an expert-defined approach, where segments are decided via experts’ review and consensus on current evidence in the literature. These two approaches are not mutually exclusive, and a hybrid approach may draw on both data and expert input. Examples of a data-driven approach include Lafortune et al.’s latent class analysis of a trial’s data, Liu et al.’s study of Taiwan National Health Insurance survey participants and van der Laan et al.’s demand-driven segmentation model. In these studies, health-related data, including medical, behavioral, functional and socio-demographic data, were used to derive various segments and profile each segment’s characteristics.
Alternatively, segments can be defined a priori through experts’ review and consensus on current evidence in the literature. Examples of published expert-defined approaches include Lynn et al.’s Bridges to Health person-centered segmentation framework, Kaiser Permanente’s Senior Segmentation Algorithm for elderly persons aged 65 years or older and the National Academy of Medicine Patient Taxonomy. In our previous work, we assessed the feasibility of segmenting a general patient population into six segments defined by Singapore Health Services Regional Health System (SingHealth RHS) experts, and found the framework feasible as a proof of concept for identifying patient segments with distinct healthcare utilization and mortality patterns. However, in that study we were not able to assess the predictive ability of patient segment membership for long-term healthcare utilization and mortality. Validation and adjustment need to be pursued before clinical and policy application in a healthcare system. In our policy context, the segmentation approach needs to be validated against long-term healthcare utilization and mortality. This is also a critical gap in the literature, where it is not clear whether population segments defined by expert-defined approaches differ in long-term healthcare utilization and mortality.
In this study, we aimed to address this critical gap by assessing the predictive ability of our expert-defined segmentation approach for 3-year healthcare utilization (defined as hospital admissions, emergency department attendances, and specialist outpatient clinic attendances) and mortality.
We conducted a retrospective study to segment all adult patients (≥ 21 years of age in 2012) who utilized healthcare services at SingHealth RHS in 2012. Patients were excluded if they were below 21 years of age. This study was approved by the SingHealth Centralized Institutional Review Board (CIRB 2016/2294). De-identified data from 2012 to 2015 were extracted from the electronic health records (EHRs) using the Oracle Business Intelligence Enterprise Edition (OBIEE) software. The extracted variables included socio-demographic data, chronic diseases, healthcare utilization (hospital admissions, emergency department attendances and specialist outpatient clinic attendances) and mortality.
A previously described segmentation framework was used. The experts who developed the framework are senior health administrators with extensive experience in both health policy and clinical care, which ensures policy and implementation relevance in our healthcare system setting. Patients were segmented into six non-overlapping subgroups: Mostly Healthy, Stable Chronic, Serious Acute, Complex Chronic without Frequent Hospital Admissions, Complex Chronic with Frequent Hospital Admissions, and End of Life. The definitions and examples of the segments are elaborated in Additional file 1 and Additional file 2. We defined frequent hospital admissions as 3 or more hospital admissions in the past 12 months, which is a proxy for high-cost users [15,16,17,18].
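The frequent-admission criterion above can be illustrated with a minimal Python sketch. Only the threshold of 3 admissions comes from the text; the function name, the 365-day look-back window and the choice of reference date are assumptions made for illustration, and the full segment definitions remain those in Additional file 1.

```python
from datetime import date, timedelta

# From the text: "3 or more hospital admissions in the past 12 months".
FREQUENT_ADMISSION_THRESHOLD = 3


def is_frequent_admitter(admission_dates, reference_date, window_days=365):
    """Return True if at least 3 admissions fall within the look-back window.

    window_days=365 and the half-open window (start, reference_date] are
    assumptions for illustration, not the study's stated definition.
    """
    window_start = reference_date - timedelta(days=window_days)
    recent = [d for d in admission_dates if window_start < d <= reference_date]
    return len(recent) >= FREQUENT_ADMISSION_THRESHOLD


# Hypothetical patient with three admissions during 2012.
admissions = [date(2012, 2, 3), date(2012, 6, 18), date(2012, 11, 30)]
print(is_frequent_admitter(admissions, date(2012, 12, 31)))
```

A patient with only two admissions in the window would not meet the criterion under this sketch.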
We first compared the socio-demographics and hospital utilization in the baseline year 2012 between segments using the Chi-square test for categorical variables and one-way ANOVA for continuous variables. Using 1st January 2013 as the time of entry into the study for all patients, we calculated survival time as the number of days from entry to death (for patients who died on or before 31st December 2015) or 1094 days for censored patients (the number of days from entry to 31st December 2015). Kaplan-Meier survival curves were plotted and differences between the survival curves were analyzed using the log-rank test. To determine whether hospital utilization from 2013 to 2015 differed between segments, we first conducted bivariate analyses between population segment and hospital utilization using ANOVA or the Chi-square test. As the utilization count data were over-dispersed, with most patients having zero utilization, a negative binomial regression model was used to model hospital utilization, with the Mostly Healthy segment as the reference group and adjustment for age, gender, ethnicity and past hospital utilization. We used survival time as the exposure variable for the negative binomial regression model. We also conducted Chi-square tests with two degrees of freedom between each pair of segments to test for significant differences in hospital utilization. All analyses were performed in STATA/IC 13.1.
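As an illustration of the survival-time definition and the Kaplan-Meier estimate described above, the following Python sketch uses only the standard library. The study itself used STATA; the function names and the four-patient toy cohort are hypothetical, and the sketch assumes every patient is alive at study entry.

```python
from datetime import date

STUDY_ENTRY = date(2013, 1, 1)   # time of entry for all patients
STUDY_END = date(2015, 12, 31)   # administrative censoring date


def follow_up(death_date=None):
    """Return (days, died): days from entry to death, or to study end if censored."""
    if death_date is not None and death_date <= STUDY_END:
        return (death_date - STUDY_ENTRY).days, True
    return (STUDY_END - STUDY_ENTRY).days, False


def kaplan_meier(records):
    """Product-limit estimate from (days, died) pairs.

    Returns a list of (day, survival probability) at each distinct death time.
    """
    records = sorted(records)
    at_risk = len(records)
    survival = 1.0
    curve = []
    i = 0
    while i < len(records):
        day = records[i][0]
        deaths = 0
        leaving = 0
        # Group everyone whose follow-up ends on the same day.
        while i < len(records) and records[i][0] == day:
            deaths += records[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((day, survival))
        at_risk -= leaving
    return curve


# Toy cohort (hypothetical): two deaths and two censored patients.
cohort = [
    follow_up(date(2013, 4, 11)),
    follow_up(date(2014, 1, 1)),
    follow_up(None),
    follow_up(None),
]
print(follow_up(None))        # censored follow-up: 1094 days, as in the text
print(kaplan_meier(cohort))
```

The negative binomial model would then use the `days` value as the exposure, so that utilization is modelled as a rate per unit of follow-up time rather than as a raw count.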
Patient baseline characteristics and acute hospital utilization
A total of 819,993 patients were included and segmented into the six segments, with the proportions shown in Table 1. The overall mean age of the study population was 49.8 years, with a standard deviation (SD) of 17.2. There were more female than male patients (58% vs. 42%). The differences in age and gender between the segments were statistically significant (p < 0.001).
Hospital utilization in 2012 increased across the segments from Mostly Healthy to Complex Chronic with Frequent Hospital Admissions (Table 1). The differences between the six segments were all statistically significant (p < 0.001) for ED visits, SOC visits and hospital admissions. Not unexpectedly, patients in the Complex Chronic with Frequent Hospital Admissions segment had more frequent admissions, as this was a criterion for inclusion in this segment. However, this pattern of increased utilization was also seen for SOC and ED attendances, suggesting that this segment has increased healthcare utilization in multiple areas.
Bivariate analyses of segments and hospital utilization from year 2013 to 2015
The trend observed for hospital utilization from 2013 to 2015 was similar to that in 2012, with increasing numbers of ED visits, SOC visits and hospital admissions from the Mostly Healthy segment to the Complex Chronic with Frequent Hospital Admissions segment (Table 2). Patients in the End of Life segment had the most SOC visits (mean 43.2, SD 50.8) among all six segments but had significantly fewer ED visits (mean 0.88, SD 1.67) and hospital admissions (mean 1.33, SD 2.10) than patients in the Complex Chronic with Frequent Hospital Admissions segment (mean 4.00, SD 7.29 for ED visits; mean 4.49, SD 6.32 for hospital admissions). Hospital utilization differed significantly between the six segments (p < 0.001).
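The means and SDs reported here also show why a Poisson model would fit these counts poorly. The sketch below, in Python, compares the variance with the mean for ED visits in the Complex Chronic with Frequent Hospital Admissions segment and derives a rough method-of-moments dispersion under the common NB2 parameterization, Var = mean + alpha × mean². This is an illustration only: it uses unadjusted within-segment moments, the function name is hypothetical, and the resulting alpha is not the dispersion estimated by the fitted regression model, which the paper does not report.

```python
def nb2_dispersion(mean, sd):
    """Method-of-moments alpha under Var = mean + alpha * mean**2 (NB2 assumption)."""
    variance = sd ** 2
    return (variance - mean) / mean ** 2


# ED visits 2013-2015, Complex Chronic with Frequent Hospital Admissions segment.
mean_ed, sd_ed = 4.00, 7.29
variance_ed = sd_ed ** 2

# Under a Poisson model the variance equals the mean, so this ratio would be near 1.
print(variance_ed / mean_ed)
print(nb2_dispersion(mean_ed, sd_ed))
```

A variance many times larger than the mean is the over-dispersion that motivates the negative binomial model.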
Multivariable negative binomial regression on hospital utilization from year 2013 to 2015
Compared with the Mostly Healthy segment, patients in all other segments had significantly more ED visits (p < 0.001) after adjusting for age, gender, ethnicity and hospital utilization in 2012 (Table 3). Patients in the Complex Chronic with Frequent Hospital Admissions segment had 14.5 times as many ED visits (incidence rate ratio [IRR] 14.5, 95% confidence interval [CI]: 13.49–15.64) as patients in the Mostly Healthy segment. Patients in the End of Life segment also had a markedly higher rate of ED visits than patients in the Mostly Healthy segment (IRR 9.56, 95% CI: 8.51–10.75).
For SOC visits, all other segments had significantly higher utilization than the Mostly Healthy segment (all p < 0.001). After adjusting for the baseline variables and hospital utilization in 2012, patients in the End of Life segment had 11.50 times the SOC utilization of patients in the Mostly Healthy segment (95% CI: 10.68–12.39). Patients in the Complex Chronic with Frequent Hospital Admissions segment also had significantly higher utilization than patients in the Mostly Healthy segment (IRR 7.71, 95% CI: 7.31–8.13).
Lastly, compared with the Mostly Healthy segment, all other segments had significantly more inpatient admissions, with IRRs > 1 (p < 0.001). Patients in the Complex Chronic with Frequent Hospital Admissions segment had the highest IRR of 22.66 (95% CI: 21.07–24.37) for hospital admissions from 2013 to 2015, and patients in the End of Life segment had the second highest IRR of 16.18 (95% CI: 14.49–18.07).
For each model, the Chi-square tests showed that there are significant differences between all pair-wise segments with p < 0.001.
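To make the reported IRRs easier to interpret, the following Python sketch shows how an IRR and its confidence interval relate to a regression coefficient on the log scale. It assumes a Wald-type 95% interval (z = 1.96), which the paper does not state explicitly; the function names are hypothetical.

```python
import math


def log_scale_summary(irr, ci_low, ci_high, z=1.96):
    """Recover the log-scale coefficient and its standard error from an IRR and CI.

    Assumes a Wald interval, i.e. exp(beta +/- z * se).
    """
    beta = math.log(irr)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z)
    return beta, se


def irr_with_ci(beta, se, z=1.96):
    """Exponentiate a log-scale coefficient into (IRR, lower, upper)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Hospital admissions, Complex Chronic with Frequent Hospital Admissions segment
# versus Mostly Healthy: IRR 22.66 (95% CI: 21.07-24.37).
beta, se = log_scale_summary(22.66, 21.07, 24.37)
print(beta, se)
print(irr_with_ci(beta, se))
```

The interval is symmetric around the coefficient on the log scale, which is why it appears asymmetric around the IRR itself.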
Analysis of survival time
Day 0 was taken at 1st January 2013. At the end of 2013, the survival rates for patients in the End of Life and Complex Chronic with Frequent Hospital Admissions segments were 74.6 and 81.7% respectively, while the survival rates for Complex Chronic without Frequent Hospital Admissions, Stable Chronic, Serious Acute and Mostly Healthy segments were all > 95%.
At the end of the second year (2014), the survival rates for patients in the End of Life and Complex Chronic with Frequent Hospital Admissions segments were 64.6 and 71.0% respectively, while the survival rates for Complex Chronic without Frequent Hospital Admissions, Stable Chronic, Serious Acute and Mostly Healthy segments were all > 93%.
Overall, patients in the End of Life segment had the worst survival rate (58.2%), followed by patients in the Complex Chronic with Frequent Hospital Admissions (62.6%) at the end of 3 years (end of 2015). Throughout the 3 years 2013–2015, the survival rates for patients in the Mostly Healthy, Serious Acute and Stable Chronic segments were indistinguishable from each other and higher than the other three segments (Additional file 3). The log-rank test for equality of the six survival distributions showed statistically significant difference between the six segments (p < 0.001).
Our study shows that our previously developed six-segment framework is predictive of long-term healthcare utilization and mortality. Healthcare utilization and mortality increased with the complexity of the segments, suggesting that our segmentation approach was able to discriminate between patients with varying healthcare needs and risk of mortality. Patients in the Complex Chronic with Frequent Hospital Admissions segment represented 0.5% of the study population, but had the highest risk of hospital admissions and ED visits per patient, and the second highest risk of SOC visits, in the 3 years (2013–2015) following the initial healthcare encounter in 2012. Moreover, about one in three patients in this segment died within those 3 years. This suggests that patients in this segment carry a high healthcare burden that requires further investigation into disease management, psychosocial environment and quality of community care within the segment. Equally worth noting is the End of Life segment, which had the highest number of SOC visits. This is likely due to the nature of patients within the End of Life segment, many of whom have metastatic cancer with frequent outpatient appointments.
For the Mostly Healthy, Serious Acute, and Stable Chronic segments, survival rates were similar from 2013 to 2015 although there was an increasing gradient of healthcare utilization over the same period of time. This is important information in population health management which does not only consider survival but also healthcare resource consumptions and service planning. In a healthcare system where increasing healthcare spending is of particular concern, healthcare resource consumption trends are relevant and of particular interest to our policy makers.
There are several strengths of our approach. Firstly, this simple categorization can be easily replicated in most healthcare systems, as the variables and healthcare utilization measures used in our study are commonly available elsewhere. Some recently implemented segmentation frameworks, such as those used in British Columbia, Canada and North West London, UK, use similar domains of information to our framework. While our study successfully identified six distinct segments with different long-term healthcare utilization and mortality, we are cognizant that even within each segment, patients may have differing healthcare needs. The current segmentation approach is less about specific disease treatment for a specific patient over a single healthcare encounter, which requires individualization of the management plan by each patient-healthcare provider pair, and more about planning, at policy level, what types of health services each segment needs at population level. Our segmentation framework is practical, with each segment corresponding to a predominant site of care and bundle of interventions. For example, subjects in the Mostly Healthy and Serious Acute segments require mainly community-based health promotion activities and lifestyle interventions. This can guide population health policy and direct more resources to the development of preventive services and health promotion efforts. Patients in the Stable Chronic segment require mainly primary care to avoid progression to complications, while patients in the Complex Chronic with Frequent Hospital Admissions and Complex Chronic without Frequent Hospital Admissions segments may benefit from more intensive, multi-disciplinary case management services. For the End of Life segment, hospice care is typically needed to manage symptoms and to avoid events such as unnecessary hospitalizations that may be expensive and potentially risky.
By knowing that there is an End of Life segment and what proportion of the entire patient population belongs to it, healthcare policy makers can allocate appropriate resources to developing advance care plans and shared care with appropriate specialists and/or team-based care and community case coordinators to optimize quality of life.
Our study has several limitations. Firstly, variables in our dataset were restricted to those routinely collected in our EHRs. We were hence unable to refine the segmentation using information on functional status and socioeconomic variables, which play important roles in influencing health-related behavior and health services utilization. Secondly, our database is unable to account for use of healthcare services outside SingHealth or for out-of-hospital deaths.
Data-driven segmentation approaches provide an attractive alternative for generating evidence-based insights into a population’s health status. These approaches include unsupervised techniques such as clustering analysis and latent class analysis, and supervised techniques such as classification and regression. A key strength of data-driven approaches is the potential to group patients according to their similarity across several dimensions or characteristics. Non-apparent latent classes or clusters can then be identified based on these shared characteristics. Data-driven frameworks, although easy to standardize and explicit in methodology, may not always be relevant and practical at policy and implementation level in a particular healthcare system. Expert-driven methods are more likely to be feasible to implement and relevant to policy, but may lack the rich insights available from large volumes of health data. It is each healthcare system’s decision whether to adopt an expert-driven, data-driven or hybrid approach, taking into consideration the scientific evidence and its specific policy contexts and priorities.
In this study, we demonstrated the predictive ability of an expert-driven segmentation framework on longitudinal healthcare utilization and mortality.
Availability of data and materials
The dataset used for this study can be found at Harvard Dataverse: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/XTXCYD.
Abbreviations
CPF: Central Provident Fund
IRR: Incidence Rate Ratio
MOH: Ministry of Health
SingHealth: Singapore Health Services
SOC: Specialist Outpatient Clinic
Porter ME, Pabo EA, Lee TH. Redesigning primary care: a strategic vision to improve value by organizing around patients’ needs. Health Aff. 2013;32:516–25.
Porter ME. What is value in health care? N Engl J Med. 2010;363:1–3.
Department of Statistics Singapore. Population trends, 2017. Singapore: Ministry of Trade & Industry, Republic of Singapore; 2017. https://www.singstat.gov.sg/-/media/files/publications/population/population2017.pdf
Aziz IS. Healthcare spending to hit S$12 billion by 2020: Tharman. TODAY Newspaper. 2014. http://www.todayonline.com/singapore/healthcare-spending-hit-s12-billion-2020-tharman. Accessed 4 Mar 2018.
Costs and Financing | Ministry of Health. https://www.moh.gov.sg/content/moh_web/home/costs_and_financing.html. Accessed 12 May 2018.
Lafortune L, Béland F, Bergman H, Ankri J. Health status transitions in community-living elderly with complex care needs: a latent class approach. BMC Geriatr. 2009;9:6.
Liu LF, Tian WH, Yao HP. Utilization of health care services by elderly people with National Health Insurance in Taiwan: the heterogeneous health profile approach. Health Policy. 2012;108:246–55.
Eissens van der Laan MR, van Offenbeek MAG, Broekhuis H, Slaets JPJ. A person-centred segmentation study in elderly care: towards efficient demand-driven care. Soc Sci Med. 2014;113:68–76.
Lynn J, Straube BM, Bell KM, Jencks SF, Kambic RT. Using population segmentation to provide better health care for all: the “Bridges to Health” model. Milbank Q. 2007;85:185–208. https://doi.org/10.1111/j.1468-0009.2007.00483.x.
Zhou Y. Improving Care for Older Adults: a model to segment the senior population. Perm J. 2014:18–21. https://doi.org/10.7812/TPP/14-005.
Long P, Abrams M, Milstein A, Anderson G, Apton KL, Dahlberg ML, et al. Effective care for high-need patients opportunities for improving outcomes, value, and health. 2017. https://nam.edu/wp-content/uploads/2017/06/Effective-Care-for-High-Need-Patients.pdf. Accessed 7 Dec 2018.
Low LL, Kwan YH, Liu N, Jing X, Low ECT, Thumboo J. Evaluation of a practical expert defined approach to patient population segmentation: a case study in Singapore. BMC Health Serv Res. 2017;17:771.
Moons KGM, Altman DG, Vergouwe Y, Royston P. Prognosis and prognostic research: application and impact of prognostic models in clinical practice. BMJ. 2009;338:b606.
Low LL, Lee KH, Hock Ong ME, Wang S, Tan SY, Thumboo J, et al. Predicting 30-day readmissions: performance of the LACE index compared with a regression model among general medicine patients in Singapore. Biomed Res Int. 2015;2015.
Low LL, Liu N, Wang S, Thumboo J, Ong MEH, Lee KH. Predicting frequent hospital admission risk in Singapore: a retrospective cohort study to investigate the impact of comorbidities, acute illness burden and social determinants of health. BMJ Open. 2016;6:e012705.
Low LL, Liu N, Wang S, Thumboo J, Ong MEH, Lee KH. Predicting 30-day readmissions in an Asian population: building a predictive model by incorporating markers of hospitalization severity. PLoS One. 2016;11:e0167413.
Low LL, Tay WY, Ng MJM, Tan SY, Liu N, Lee KH. Frequent hospital admissions in Singapore: clinical risk factors and impact of socioeconomic status. Singap Med J. 2016;2016:1–16.
Saxena N, You AX, Zhu Z, Sun Y, George PP, Teow KL, et al. Singapore’s regional health systems—a data-driven perspective on frequent admitters and cross utilization of healthcare services in three systems. Int J Health Plann Manag. 2017;32:36–49.
British Columbia Ministry of Health. The Health System Matrix 6.1: understanding the health care needs of the British Columbia population through population segmentation. 2015. https://www2.gov.bc.ca/assets/gov/health/conducting-health-research/data-access/health_system_matrix_61_definitions.pdf. Accessed 30 Mar 2018.
North West London whole systems integrated care | what approach should we take? http://integration.healthiernorthwestlondon.nhs.uk/section/what-approach-should-we-take-. Accessed 2 Oct 2017.
National Center for Health Statistics, Centers for Disease Control and Prevention. Health, United States, 2011: With special feature on socioeconomic status and health; 2012. https://doi.org/10.1080/01621459.1987.10478476.
Low LL, Yan S, Kwan YH, Tan CS, Thumboo J. Assessing the validity of a data driven segmentation approach: a 4 year longitudinal study of healthcare utilization and mortality. PLoS One. 2018. https://doi.org/10.1371/journal.pone.0195243.
The authors would like to acknowledge the SingHealth Regional Health System Population Segmentation workgroup, Mr. Tan Wee Boon and Miss Chia Boong Kheng for their support.
This research received grant funding from SingHealth Foundation Health Services Research (Aging) Startup Grant SHF/HSRAg004/2015 and SingHealth Nurturing Clinician Scientist Award Academic Clinical Programme Funding FY 2016 Cycle 2. The funding bodies played no role in the design, execution, analysis, and write-up of the study.
Ethics approval and consent to participate
This study was approved for ethics with a waiver for patient consent by SingHealth Centralized Institutional Review Board (CIRB 2016/2294).
Consent for publication
Competing interests
The authors declare that they have no competing interests.
Cite this article
Low, L.L., Kwan, Y.H., Ma, C.A. et al. Predictive ability of an expert-defined population segmentation framework for healthcare utilization and mortality - a retrospective cohort study. BMC Health Serv Res 19, 401 (2019). https://doi.org/10.1186/s12913-019-4251-6