Profiles and predictors of healthcare utilization: using a cluster-analytic approach to identify typical users across conventional, allied and complementary medicine, and self-care

Introduction The identification of typologies of health care users and their specific characteristics can be performed using cluster analysis. This statistical approach aggregates similar users based on their common health-related behavior. This study aims to examine health care utilization patterns using cluster analysis, and the associations of health care user types with sociodemographic, health-related and health-system related factors. Methods Cross-sectional data from the 2012 National Health Interview Survey were used. Health care utilization was measured by consultations with a variety of medical, allied and complementary health practitioners or the use of several interventions (exercise, diet, supplementation etc.) within the past 12 months (used vs. not used). A model-based clustering approach based on finite normal mixture modelling was used, and several indices of cluster fit were determined. Health care utilization within each cluster was analyzed descriptively, and independent predictors of belonging to the respective clusters were analyzed using logistic regression models including sociodemographic, health- and health insurance-related factors. Results Nine distinct health care user types were identified, ranging from nearly non-use of health care modalities to over-utilization of medical, allied and complementary health care. Several sociodemographic and health-related characteristics were predictive of belonging to the respective health care user types, including age, gender, health status, education, income, ethnicity, and health care coverage. Conclusions Cluster analysis can be used to identify typical health care utilization patterns based on empirical data, and those typologies are related to a variety of sociodemographic and health-related characteristics.
These findings on individual differences regarding health care access and utilization can inform future health care research and policy regarding how to improve accessibility of different medical approaches.


Introduction
The use of health services is a very individual process, yet it is shaped by institutional, cultural and social circumstances. A large body of research has investigated determinants of health care use, including contextual characteristics and their influence on health care access [1], patient segmentation [2], digital interventions in relation to health behaviors [3], and the relation between physical activity and healthcare utilization [4]. This research seeks to predict health care utilization and to improve health care provision while at the same time addressing costs.
Research, however, often does not integrate complementary medicine usage or user typologies. So far, a comprehensive understanding of the patterns of health care use and their individual determinants is missing. Examining health care utilization patterns may assist in understanding over- and underutilization and related factors, and has the potential to shape health care provision, research and policy.
Analysis of health care utilization patterns has been advanced by new statistical methods and increasing computing power, which make it possible to employ more accurate and appropriate (big) data analysis methods such as model-based cluster analysis and classification. Such methods have been used before to analyze homeless shelter utilization patterns [5], dietary patterns [6] and other health-related behavior patterns [7].
Cluster analysis is a statistical technique used to recognize natural patterns among subjects, i.e. to group them in such a way that subjects in one group or cluster are more alike than subjects in different groups or clusters with regard to defined patterns, such as health care utilization. Similarity of subjects can be defined by various methods and indices, including the Euclidean distance between observations or the density estimation of the subject distribution. Using this approach, one can identify empirical health care utilization patterns, and define typologies of consumers and their specific characteristics based on the pattern.
To our knowledge, no study has determined general health care utilization patterns, that is, patterns of conventional medicine, allied medicine, complementary medicine, and self-care in a nationally representative sample. While there is a study on patterns of complementary medicine use in the US-based National Health Interview Survey (NHIS) (2012), it was limited to children with mental health issues [8].
This study aims to examine NHIS health care utilization patterns in adults using a cluster-analytic approach, and the associations of cluster patterns with sociodemographic, health-related and health-system related factors.

Methods
The nationally representative NHIS monitors the health status and health care access and utilization of the noninstitutionalized US population on a yearly basis, including the use of complementary and alternative medicine (CAM) therapies every 5 years. For this analysis, data from the Family Core, the Sample Adult Core, and the Adult Complementary and Alternative Medicine questionnaires from 2012 were merged. The more recent NHIS 2017 no longer assessed the totality of CAM modalities, but was limited to a few selected approaches. As a result, data on the use of common treatment modalities such as acupuncture, osteopathy, supplements or herbal medicine were missing. We therefore chose to investigate the more comprehensive 2012 dataset.
The Family Core and the Sample Adult Core questionnaires collected data on socio-demographic characteristics including age, gender, ethnicity, region, marital status, education, and annual household income; self-perceived general health status, diagnosed conditions and diseases; and health care coverage, access and utilization. The Adult Complementary and Alternative Medicine questionnaire collected data on the use of complementary and alternative medicine.
Health care use in the past 12 months was queried with several question designs, such as whether a practitioner was consulted or an intervention was used ("During the past 12 months, did you see a practitioner/ use/ practice …?"). Visits to the dentist and emergency room were queried differently (see Table 1). All items were coded as, or recoded into, binary variables (used vs. not used in the past 12 months) for the cluster analysis. A number of complementary and alternative therapies were combined to compensate for the small number of observations. For example, all queried herbal medicines were combined into one variable (used herbal medicine vs. did not use herbal medicine); the same was done for non-vitamin supplements, vitamins and minerals, native healers, osteopathy and craniosacral therapy, Tai Chi and qigong, all forms of meditation, exercises and medical diets (see Table 1).
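The recoding described above can be illustrated with a minimal Python sketch. It is not the code used in this study: the item names, the "yes"/"no" answer coding and the helper functions `to_binary` and `combine_any` are hypothetical, and the handling of missing answers is a simplification.

```python
# Hypothetical item names; the actual NHIS variable names are not reproduced here.
HERBAL_ITEMS = ["herb_echinacea", "herb_ginseng", "herb_garlic"]


def to_binary(answer):
    """Map a survey answer to 1 (used in the past 12 months) or 0 (not used).

    Assumption for this sketch: anything other than "yes" counts as not used.
    The study itself analysed only respondents with complete data.
    """
    return 1 if answer == "yes" else 0


def combine_any(record, items):
    """Return 1 if at least one of the listed items was used, otherwise 0."""
    return int(any(to_binary(record.get(item)) for item in items))


respondent = {
    "herb_echinacea": "no",
    "herb_ginseng": "yes",
    "herb_garlic": "no",
    "acupuncture": "no",
}

used_herbal = combine_any(respondent, HERBAL_ITEMS)      # combined indicator
used_acupuncture = to_binary(respondent["acupuncture"])  # single-item indicator
```

Combining sparse items with a logical "any" keeps the combined variable on the same used vs. not used scale as the single-item variables.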
A total of 42,366 households were eligible and 34,525 adults provided data (79.7% response rate) [9]. The final analysis was conducted on 32,017 (75.6%) adults who provided complete health care utilization data for all modalities. Population-based estimates were calculated using weights calibrated to the 2010 census-based population estimates for age, gender, and ethnicity of the US civilian non-institutionalized population. Using the population weights, the full dataset of 34,525 adults represents a total of 234.9 million US adults.
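How population weights translate a sample into population-based estimates can be sketched as follows. The three weights and usage indicators are invented for illustration and are not NHIS data; the function names are likewise hypothetical. The sketch covers point estimates only and ignores the variance estimation that a complex survey design requires.

```python
def population_total(weights):
    """Number of people represented by the sample: the sum of the weights."""
    return sum(weights)


def weighted_prevalence(used, weights):
    """Weighted share of the represented population that used a modality."""
    return sum(w for u, w in zip(used, weights) if u) / sum(weights)


# Invented toy values: each respondent stands for `weight` people.
weights = [7000.0, 5000.0, 8000.0]
used = [1, 0, 1]

total = population_total(weights)
prevalence = weighted_prevalence(used, weights)
```

In this toy example the three respondents represent 20,000 people, of whom 75% used the modality.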

Statistical analysis
Distributions of frequencies of health care use and sociodemographic data within each cluster are presented as relative percentages (%) in Tables 2 and 3, respectively. To identify possible user typologies and their utilization patterns, a cluster analysis approach was used. This method is able to handle complex data structures and designs in big data scenarios. The cluster analysis was performed using binary variables only, i.e. all modalities were coded as "used in the past 12 months" vs. "not used in the past 12 months", independent of the frequency of use. For ease of interpretation, cluster analyses based on probabilities are preferred over other models [10,11]; thus, model-based clustering using finite normal mixture modelling was chosen. This method provides functions for parameter estimation via the Expectation Maximization (EM) algorithm for normal mixture models, with the possibility of accounting for different covariance structures and the integration of the Bayesian Information Criterion (BIC) for model selection.
The cluster analysis was performed using the package mclust [12] for the statistical software R [13]; this package also handles sampling weights if needed. The fitted model was specified as "EII" (spherical, equal volume), a multivariate mixture in which each cluster has a spherical distribution with both volume and shape equal across clusters, enabling the parameters to be better estimated. These parameters consider the within-group covariance matrix.
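To make the idea of a spherical, equal-volume normal mixture concrete, the following Python sketch re-implements EM for a mixture in which all clusters share the covariance matrix λI. It is an illustration under stated assumptions, not the mclust implementation and not the code used in this study. The toy data, the farthest-point initialisation, the fixed number of iterations and the variance floor are all choices made for this sketch, and sampling weights are ignored. The BIC is written in the form where larger values indicate a better model.

```python
import math


def sqdist(x, m):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, m))


def init_means(data, k):
    """Farthest-point initialisation (a simplification chosen for this sketch)."""
    means = [list(data[0])]
    while len(means) < k:
        nxt = max(data, key=lambda x: min(sqdist(x, m) for m in means))
        means.append(list(nxt))
    return means


def e_step(data, weights, means, lam):
    """Return posterior membership probabilities and the log-likelihood."""
    d = len(data[0])
    resp, loglik = [], 0.0
    for x in data:
        logs = [math.log(w) - 0.5 * d * math.log(2 * math.pi * lam)
                - sqdist(x, m) / (2 * lam) for w, m in zip(weights, means)]
        top = max(logs)  # log-sum-exp trick for numerical stability
        total = top + math.log(sum(math.exp(v - top) for v in logs))
        loglik += total
        resp.append([math.exp(v - total) for v in logs])
    return resp, loglik


def fit_eii(data, k, n_iter=100, floor=1e-6):
    """EM for a normal mixture whose clusters all share the covariance lam * I."""
    n, d = len(data), len(data[0])
    weights, means, lam = [1.0 / k] * k, init_means(data, k), 1.0
    for _ in range(n_iter):
        resp = e_step(data, weights, means, lam)[0]
        # M-step: update mixing weights, cluster means and the shared variance.
        sizes = [max(sum(r[j] for r in resp), floor) for j in range(k)]
        weights = [s / sum(sizes) for s in sizes]
        means = [[sum(r[j] * x[c] for r, x in zip(resp, data)) / sizes[j]
                  for c in range(d)] for j in range(k)]
        lam = max(sum(r[j] * sqdist(x, means[j])
                      for r, x in zip(resp, data) for j in range(k)) / (n * d),
                  floor)
    resp, loglik = e_step(data, weights, means, lam)
    n_params = (k - 1) + k * d + 1  # mixing weights, means, one shared variance
    bic = 2 * loglik - n_params * math.log(n)
    return {"resp": resp, "means": means, "lam": lam, "bic": bic}


# Invented toy data: two groups of binary utilisation profiles with some noise.
data = ([[1, 1, 0, 0]] * 6 + [[1, 1, 1, 0]] * 2
        + [[0, 0, 1, 1]] * 6 + [[0, 1, 1, 1]] * 2)

fit = fit_eii(data, k=2)
bic_by_k = {k: fit_eii(data, k)["bic"] for k in (1, 2, 3)}
```

Each row of `fit["resp"]` holds the probabilities with which one observation belongs to each cluster, and `bic_by_k` allows solutions with different numbers of clusters to be compared.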
To determine the statistical fit of the respective cluster solutions, the following indices were independently determined: Bayesian Information Criterion (BIC) [14,15], the Dunn index [16], Silhouette [17], the Davies-Bouldin index [18], and the C-index [19]. The results of the cluster analysis were evaluated by these indices in the following order: Silhouette width, C-index and Dunn index, BIC and Davies-Bouldin index. The indices are presented in Table 5.
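Two of these indices, the mean silhouette width and the Dunn index, can be computed from pairwise Euclidean distances as in the sketch below. The four points and their labels are invented, and this plain Python version is meant only to show the definitions; it is not the implementation used for Table 5 and would be slow on a dataset of the present size.

```python
import math


def dist(x, y):
    """Euclidean distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def mean_silhouette(points, labels):
    """Average of (b - a) / max(a, b) over all points; needs >= 2 clusters."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:  # convention: a singleton cluster scores 0
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the own cluster
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(sum(dist(p, q) for q in other) / len(other)
                for key, other in clusters.items() if key != lab)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


def dunn_index(points, labels):
    """Smallest between-cluster distance divided by the largest diameter.

    Assumes at least two clusters and at least one cluster with two
    distinct points, so that the denominator is not zero.
    """
    items = list(zip(points, labels))
    between, within = [], []
    for i, (p, lp) in enumerate(items):
        for q, lq in items[i + 1:]:
            (within if lp == lq else between).append(dist(p, q))
    return min(between) / max(within)


# Invented toy example: two compact, well-separated clusters.
points = [(0, 0), (0, 1), (5, 0), (5, 1)]
labels = [0, 0, 1, 1]

silhouette = mean_silhouette(points, labels)
dunn = dunn_index(points, labels)
```

For both indices larger values indicate more compact and better separated clusters.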
The cluster solutions also contained information on the probability with which each person fit into each cluster. These probabilities are based on a classification matrix generated by the cluster analysis, which contains, for every observation, the probability of belonging to each cluster. From this classification matrix the uncertainty values were derived, and only those subjects who had a probability of at least 95% of belonging to the respective cluster (n = 30,251; 87.6%) were considered for further analysis.
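The selection step can be sketched in Python as follows. The classification matrix below is invented, the function name `classify` is hypothetical, and two assumptions are made: the uncertainty of an observation is taken as one minus its largest membership probability, and the 95% criterion is read as "at least 0.95".

```python
def classify(posteriors, threshold=0.95):
    """Assign each observation to its most probable cluster and flag
    whether the assignment is certain enough to be kept."""
    results = []
    for row in posteriors:
        best = max(range(len(row)), key=row.__getitem__)
        results.append({
            "cluster": best,
            "uncertainty": 1.0 - row[best],
            "keep": row[best] >= threshold,
        })
    return results


# Invented classification matrix: one row per person, one column per cluster.
posteriors = [
    [0.98, 0.01, 0.01],
    [0.60, 0.30, 0.10],
    [0.02, 0.03, 0.95],
]

assignments = classify(posteriors)
retained = [a for a in assignments if a["keep"]]
```

In this toy example the second person is excluded because no cluster reaches the required membership probability.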
Sociodemographic, health related characteristics and health care access were compared between the respective clusters. The variable classes were based on a similar study from NHIS [20]. The following sociodemographic predictors were considered: age (categories: 18-29; 30-39; 40-49; 50-64, 65 or older), gender (categories: female; male), ethnicity (categories: non-Hispanic White; Hispanic; African American; Asian; Other), US region (categories: West; Northeast; Midwest; South), marital status (categories: not in relationship; in relationship), education (categories: less than college; some college or more), employment (categories: employed, not employed) and annual household income (categories: less than $20,000; $20,000 to $34,999; $35,000-$64,999; $65,000 or more). Additionally, health related factors such as general health status (categories: excellent, very good, good, fair or poor), medical conditions/diseases (no chronic condition, one chronic condition, two chronic conditions, three or more chronic conditions), BMI (categories: < 18.5; 18.5-25; 25.5-30; 30.5 or more), health behaviors such as smoking (categories: nonsmoker, smoker), alcohol consumption (categories: alcohol abstainer; light drinker; regular or heavy drinker), and exercise behavior (categories: low level exerciser, moderate level exerciser, high level exerciser); health insurance coverage (categories: no insurance, public health insurance, private health insurance) and the affordability of prescription medication, mental care, dental care, eyeglasses and specialists (categories: could afford, could not afford) were also used as potential predictors.
Backward stepwise regression analyses employing a likelihood-ratio statistic were conducted for each cluster to determine predictors of belonging to that cluster, and adjusted odds ratios with 95% confidence intervals were calculated. All potential predictors were included in the logistic regression analyses, and a sample-size-adjusted weight was used to account for design effects. Statistical significance was set at p ≤ 0.05.
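The relation between a logistic regression coefficient and its odds ratio can be sketched as below. The coefficient and standard error are invented, the function name is hypothetical, and a Wald-type interval is assumed for illustration; the text above does not state how the reported intervals were computed.

```python
import math


def odds_ratio_ci(beta, se, z=1.96):
    """Odds ratio and Wald-type 95% confidence interval for a coefficient
    `beta` with standard error `se` (z = 1.96 for a two-sided 95% level)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Invented values for a single predictor.
odds_ratio, lower, upper = odds_ratio_ci(beta=0.405, se=0.10)
```

An interval that excludes 1 corresponds to a predictor that is statistically significant at the matching level.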
The regression models and distribution analyses were performed with the Statistical Package for Social Sciences software (IBM SPSS Statistics for Windows, release 22.0. Armonk, NY: IBM Corp.).

Results
A total of 32,017 of 34,525 subjects (92.7%) provided full data on health care use and were therefore selected for the cluster analysis; 30,251 (87.6%) were considered for the estimation of sample representativeness. Based on the population weights, this sample of 30,251 represents a total of 205.2 million US adults.
Based on the individual health care utilization patterns, the optimal cluster solution as per Silhouette and the C-index was identified as having 9 clusters (Table 5), with cluster size ranging from 2067 to 5117 observations. The model fit indices were 0.21 for Silhouette, and 0.24 for C-index, indicating sufficient statistical fit.
The prevalence of health care utilization within each cluster is shown in Table 2. Sociodemographic, health- and health-insurance related associations for belonging to each cluster can be found in Tables 3 and 4. A graphic visualization of health care utilization patterns within each cluster is presented in Fig. 1. The following clusters of health care user types have been identified and analyzed with regard to their characteristics. The respective frequencies are displayed in Table 3.
Cluster 2 - "High users of vitamins and mineral supplementation": Mid-aged (50-64 years old; 26.3%), female adults (56.7%) living in the Southern region of the country (35.3%) with one chronic condition (57.9%), and/or private health insurance (68.5%) more likely show average health care utilization in general. Relative to the general population they have lower consultation rates with GPs and specialists, and higher utilization of vitamins and mineral supplementation (Table 2). Ethnicity,    (Table 4).
Cluster 3 - "Underutilization of health care, GP, specialists, ER and dentists": Single (55%), male (56.1%) adults living in the South (41%) with high school but not college education (52.8%), small income (< $20,000; 36.3%), no chronic conditions (74.1%), and/or average health status (28%), but no health insurance (43.1%) more likely belonged to Cluster 3. Compared to Clusters 1 and 2, adults of Asian ethnicity also more likely belonged to this cluster (6.5%). Adults in this cluster underutilize conventional health care, including visits to GPs, specialists, ER and dentists, relative to the general population. In contrast, health care interventions and behavior not associated with visits to a health professional are used at an average level (Table 2). This is the only cluster where age was not a relevant variable for the statistical model, together with BMI and smoking and drinking habits (Table 4).
Cluster 6 - "Underutilization of CAM and self-care": Young or mid-aged (41.5%), male (54.4%), and/or non-White individuals (36.1%), not living in the West (81.5%), having at least high-school education (50.3%), with good to very good health (65.5%) despite one chronic condition (58.4%) more likely show an over-utilization of GPs and dentists, average utilization of other conventional and allied health care, and an under-utilization of CAM and self-care (Table 2). Region, health status and smoking habits were not relevant in the model (Table 4).
Cluster 7 - "Underutilization of every health care but dental care": Very young (35.9%), male (58.8%), and/or non-White individuals (39.8%), not living in the West (78.6%), with no chronic condition (78%) more likely show under-utilization of every health care modality with the exception of dental care (Table 2). Education, income, marital and health status, BMI, health insurance and smoking habits were not relevant (Table 4).
Cluster 8 - "Underutilization of every health care modality": Very young (29.8%), female adults (52.5%), living in the West (27.5%), with higher education (49.1%), very good health (77.5%) and/or no chronic conditions (77.5%) more likely show under-utilization of every health care modality, and over-utilization of dental care and vitamin/mineral supplementation (Table 2). Ethnicity, income, health insurance and alcohol consumption were not included in the final model (Table 4).
Cluster 9 - "Overutilization of GPs, ER and herbal medicines": Young to mid-aged (34.3%), male individuals (54.4%) not living in the West (82.4%), with no high school or college education (71.4%), and/or with very low income (39.6%), poor health status (26.8%) with at least one chronic condition (25.2%), and/or public health insurance (33.9%) more likely show over-utilization of GPs, emergency rooms, and herbal medicines, and under-utilization of all other health care modalities (Table 2). For Cluster 9, only ethnicity was not considered in the logistic model (Table 4).

Summary of results
The cluster analysis identified several types of health care users based on their health care utilization patterns. However, no single optimal cluster solution could be consistently favored by all indices. The preferred cluster solution identified 9 types of health care users, who showed significant differences in their utilization patterns, with substantial rates of over- and underutilization of certain health care modalities. The regression analysis further found sociodemographic, health and health insurance related factors predictive of being a member of a respective cluster. The presence or absence of multiple chronic conditions was the only variable identified in the logistic model as a significant predictor for all 9 clusters.

Model fit and indices
The cluster analysis was performed with different numbers of clusters until the final 9-cluster solution was identified. Because the dataset is relatively large, both in observations and in variables, and the method used (mclust) iterates over many parameters and runs different estimations, the run time per adjusted model was considerable (several days considering all models).
The final solution was chosen after evaluating both the statistical model properties (the cluster validation measures in Table 5) and the theoretical interpretation of the similarity of characteristics. Unfortunately, there was no single solution for which all five statistical indices were optimal.
In a review of clustering methods, mclust showed better overall cluster performance and a better ability to handle different data types [21]. This supports confidence in the chosen method. The method incorporates the complex sampling design of the NHIS, enabling better model adequacy.

Typology of health care users, and their impact
The cluster analysis identified 9 different types of health care users, who showed substantial differences in the utilization of certain health care modalities. Members of the first cluster, for example, showed a substantial overutilization of practically every health care modality, while those in the fifth cluster were using almost no health care at all. Several sociodemographic and health-related characteristics were predictive of belonging to the respective clusters, including age and gender, education, income, ethnic origin, health care coverage, and health status. Several findings deserve attention. Below we summarize the findings in groups of differences and commonalities between the profiles of health care utilization (i.e. the clusters).

Healthy aging
Members of the first cluster showed overutilization of health care, which might be explained by the fact that they were more likely to be above 50 years of age and to have multiple chronic conditions. Chronic medical conditions are increasingly prevalent among older adults [22]. However, health status alone cannot fully explain the overutilization of health care modalities. Members of cluster 4, for example, are also more likely to be in the same age range and to have chronic conditions. However, they use far fewer practitioner-based health care modalities than those in cluster 1. Clusters 1 and 4 probably reflect 'more successful' and 'less successful' pathways to aging. They also represent a substantial proportion of the US population, making this finding even more important given the aging of the population in the US (and in other industrialized nations). It has been known for some time that a small proportion of Americans account for the majority of healthcare expenditures, and there are concerted efforts to better manage this utilization pattern [23]. Additionally, lifestyle factors that contribute to healthy aging, such as non-smoking and social support [24], physical activity and diet quality [25], should be a focus of the coordination of public health care.
Members of clusters 2, 6 and 9 are mid-aged and are likely to suffer from one or two (early stage) chronic conditions; individuals in cluster 2 strongly use self-care, while those in clusters 6 and 9 strongly utilize more conventional practitioner-based health care. We cannot predict with certainty where these individuals will end up at an older age, and only taking action in time might prevent them from ending up in the over-user group.

CAM
Cluster 6 shows an underutilization of CAM while an overutilization could not be detected in any cluster. This might indicate that we need to have a closer look at the efficacy and dissemination of CAM in the population. A program that would encourage the use of evidence-based CAM approaches such as the one applied for veterans [26] could be applied.

Gender influence and self-care
We identified gender differences regarding health- and self-care behavior. Female individuals are more likely to use health care in general (Cluster 1) or specifically vitamin and mineral supplementation (Cluster 2). Males, on the other hand, often underutilize health care (Clusters 3 and 7) and self-care (Cluster 5). Research has shown consistently that men tend to neglect self-care [27] and engage less in health-related self-care behaviors [28]. From a public health perspective, it is essential to raise awareness of the need for self-care in male populations.
Several factors besides gender might limit self-care: education/health literacy, self-efficacy, access or costs. Barriers need to be identified, and attempts are needed to increase self-care utilization in order to improve overall health, prevent chronic conditions, and lower the costs associated with health care.

Healthcare coverage
A substantial proportion of participants (43.1% in Cluster 3 and 53.4% in Cluster 5) reported having no health insurance. Clusters 4 (74.5%), 6 (72.4%) and 8 (72.9%) have the largest proportions of private health insurance coverage. Other factors associated with significantly higher health care use, or with problems in serving those in need, should be identifiable. Health status is reported as very good or excellent by a large proportion of participants in clusters 7 (72.2%) and 8 (77.5%), indicating a probable association with health insurance type.

Preventive, curative or aesthetic intervention
Several health care interventions may not be related to a medical condition, but to preventive or aesthetic needs. Dentist visits, for example, are often for prevention or for aesthetic purposes. Men were found to value dental care (Clusters 3 and 7) despite appearing 'less caring' about other health care interventions. Dental care, however, is expensive and, not surprisingly, the participants who used dental care least frequently were those without health insurance (Clusters 3 and 5).

Implications for research
Cluster analyses have been used before in health research, for example in research of homeless shelter utilization patterns [5]. Several types of homeless people have been identified (e.g. transitionally homeless, episodically homeless, chronically homeless), and those types were associated with different usage patterns.
In this analysis, we used a model-based clustering approach based on finite normal mixture modelling, and the model was evaluated with several indices of cluster fit. This method has been shown to be effective in differentiating individuals and clustering them into 9 different groups, with similar characteristics within clusters and dissimilarities between clusters.
The findings of this cluster analysis may have important implications for health policy. They highlight distinct patterns of health care over- and underutilization associated with age, gender, socioeconomic, ethnic and regional differences. By understanding health care utilization, interventional programs and prevention campaigns may be better tailored to specific groups of individuals with specific health care use patterns. Specifically, social inequalities and barriers to health care access can be addressed by tailoring health care to these groups.
The usage of CAM and dental care, where gender differences in self-care behavior exist, is an area that requires attention. Greater awareness of the importance of health care, and program development to encourage and enable its utilization, are crucial.

Limitations
The findings identified through this analysis must be considered in light of the study limitations. The data were drawn from a cross-sectional survey; as such, the results can only suggest associations for a particular time point. A longitudinal survey would be necessary to document changes in health care utilization patterns over time.
Health care utilization was further queried using binary variables only, i.e. it was only assessed whether participants had used a certain health care intervention or not, and no information on the number of consultations or on out-of-pocket expenditure was analyzed. Utilization therefore entered the cluster analysis only as presence or absence, not as frequency, and the influence of the binary coding and of the distributions (low prevalence for several interventions) on the cluster analysis could not be measured. The survey is further based on self-report data, and as such there is a risk of recall bias or measurement error.
The decision in favor of the 9-cluster solution was based not only on statistical indices but on theoretical interpretation as well. The indices favored different solutions, indicating that there is not one solution but several possibilities, and that the selection of the cluster specification and its indices was user-determined. Unfortunately, a single optimal solution, in which all 5 indices fit best, could not be achieved.
Nevertheless, the US National Health Interview Survey is an internationally recognized epidemiological study, and the findings from this study provide useful first insights into the patterns of health care utilization.

Conclusions
In this analysis, we identified 9 types of health care utilization patterns and their characteristics based on the similarity of users' behavior. A model-based clustering approach based on finite normal mixture modelling was used, and cluster fit indices were determined.
The clusters differentiate between health care user types, ranging from nearly non-use of health care modalities to overutilization of medical, allied and complementary health care. Several sociodemographic and health-related characteristics were predictive of belonging to a respective cluster, including age and gender, health status, education, income, ethnic origin, and health care coverage.
In conclusion, cluster analysis may be useful to identify typical health-care utilization patterns based on empirical data; and those typologies appear to be related to a variety of sociodemographic and health-related characteristics. Those findings can inform future health research and policy.