
Effects of multiple chronic conditions on health care costs: an analysis based on an advanced tree-based regression model



To analyze the impact of multimorbidity (MM) on health care costs taking into account data heterogeneity.


Data come from a multicenter prospective cohort study of 1,050 randomly selected primary care patients aged 65 to 85 years suffering from MM in Germany. MM was defined as co-occurrence of ≥3 conditions from a list of 29 chronic diseases. A conditional inference tree (CTREE) algorithm was used to detect the underlying structure and most influential variables on costs of inpatient care, outpatient care, medications as well as formal and informal nursing care.


Irrespective of the number and combination of co-morbidities, only a limited number of factors influencing costs were detected. Parkinson’s disease (PD) and cardiac insufficiency (CI) were the most influential variables for total costs. Compared to patients not suffering from either of the two conditions, PD increases predicted mean total costs 3.5-fold to approximately € 11,000 per 6 months, and CI two-fold to approximately € 6,100. The high total costs of PD are largely due to costs of nursing care. Costs of inpatient care were significantly influenced by cerebral ischemia/chronic stroke, whereas medication costs were associated with COPD, insomnia, PD and Diabetes. Except for costs of nursing care, socio-demographic variables did not significantly influence costs.


Irrespective of any combination and number of co-occurring diseases, PD and CI appear to be most influential on total health care costs in elderly patients with MM, and only a limited number of factors significantly influenced cost.

Trial registration

Current Controlled Trials ISRCTN89818205



The concept of multimorbidity (also referred to as multiple chronic conditions) relates to the coexistence of several chronic diseases in an individual [1]. Especially among the aged, multiple chronic conditions are common [2, 3]. While there is no uniform cut-off point for multimorbidity, the coexistence of two or more and, alternatively, three or more chronic conditions are commonly used criteria [4]. In general, prevalence rates of multimorbidity among persons aged 65 and older have been widely reported to exceed 60% [5]. In a German study based on claims data from a large sample, van den Bussche et al. [6] found a prevalence of 62.1% for multimorbidity, defined as three or more conditions, among subjects aged 65 years or older and a mean number of 5.8 chronic conditions among these multimorbid subjects. As a result of demographic change, the prevalence of multimorbidity is expected to substantially increase in Germany and most other industrialized countries in the next decades.

Individuals with multiple chronic conditions consume a disproportionally large share of total health services. In a systematic literature review, Lehnert et al. [7] found ample evidence of a positive association between multimorbidity and health care costs. As a major result, the review reported that costs significantly increase with each additional chronic condition. Particularly physician visits, hospital use, and pharmaceuticals were found to elevate health care costs with each additional chronic condition. Yet the effect of additional chronic conditions on costs may depend on the number, type and combination of comorbidities with almost unlimited numbers of possible disease combinations. This heterogeneity can hardly be taken into account by traditional regression models.

This study aims to analyze the impact of multimorbidity on health care costs in Germany across all sectors of care. Instead of taking specific patterns of single disease combinations into account, our goal is to identify the most relevant diseases within arbitrary morbidity patterns influencing health care costs. Unlike other studies, it attempts to detect the underlying structure of cost data by using an improved tree-based graphical model. As a main advantage compared to traditional analytical methods, tree-based models make it possible to represent high-dimensional data in a simple manner and to interpret the results easily. Based on the method of automatic interaction detection (AID) introduced by Morgan and Sonquist [8], classification and regression trees (CART) in particular have been widely used in health care research, including the analysis of comorbidity [9]–[11]. Nonetheless, CARTs tend to overfit and to exhibit a selection bias toward covariates with a large number of possible splits. To overcome these weaknesses, advanced splitting algorithms like Chi-square automatic interaction detectors (CHAID) and conditional inference trees (CTREE) have been developed. As a limitation, CHAID requires categorical data and responses, while CTREE can deal with arbitrarily scaled variables [12, 13]. To our knowledge, this analysis is the first to use CTREEs for the analysis of cost data of multimorbid patients.


Sample and data

Data were collected within the MultiCare Cohort Study. Details regarding the methods of the study and the cohort have been published elsewhere [14]. The analyses presented here are based on data from the MultiCare baseline assessment. Briefly, the MultiCare Cohort Study is a multicenter, prospective cohort study of multimorbid primary care patients selected randomly from the databases of 158 general practitioners’ (GP) offices at 8 study centers across Germany. The study’s aims are to investigate multimorbidity patterns over time, to identify patients’ resources and risk factors that influence the course of these patterns, and to analyze the somatic, psychological and social consequences of these patterns for the patients’ quality of life and functional status. Inclusion criteria were age between 65 and 85 years, at least one visit to the GP within the last three-month period and multimorbidity, defined as the coexistence of at least 3 chronic conditions from a list of 29 conditions comprising alcohol-related disorders, anemia, anxiety disorders, atherosclerosis/peripheral artery occlusive disease (PAOD), asthma/chronic obstructive pulmonary disease (COPD), cancer, cardiac arrhythmias, cardiac insufficiency, cardiac valve disorders, cerebral ischemia/chronic stroke, chronic ischemic heart disease, depression, Diabetes mellitus, dizziness, intestinal diverticulosis, joint arthrosis, lower limb varicosis, migraine/chronic headache, neuropathies, osteoporosis, Parkinson’s disease, psoriasis, renal insufficiency, rheumatoid arthritis/chronic polyarthritis, severe hearing loss, severe vision reduction, somatoform disorders, thyroid dysfunction and urinary incontinence. Patients were excluded from the study if they were not regular patients of the GP. Other exclusion criteria were inability to participate due to medical reasons (such as blindness and deafness), insufficient German language skills, residence in a nursing home, inability to provide informed consent, and participation in another ongoing study. A diagnosis of dementia was therefore an exclusion criterion.

A total of 24,862 patients from the databases of the participating GP practices were checked for inclusion and exclusion criteria. 7,172 patients fulfilled the criteria and were contacted for informed consent to participate. 3,317 patients agreed to participate and were available for the baseline interview within a time frame of 16 months. In retrospect, 128 of these cases had to be excluded either because, in direct contact, exclusion criteria were found to apply or because the patient died before the baseline interview. Thus, a final number of 3,189 patients were included in the study.

Of these, 1,051 patients (i.e. approximately one third of the cohort) were randomized into a subsample in which a comprehensive assessment of healthcare resource use was conducted in addition to the standard MultiCare assessment battery. Due to a missing value for health insurance, one case was excluded from this subpopulation. Thus, the analyses presented here are based on this subsample of N = 1,050. Recruitment and baseline interviews took place between July 2008 and October 2009. The study was approved by the Ethics Committee of the Medical Association of Hamburg.


Multimorbidity was assessed by means of a standardized GP questionnaire which comprised 46 chronic conditions including the 29 conditions used as inclusion criteria. The list was newly compiled at the beginning of the MultiCare study with the aim of representing the most frequent chronic conditions in the population and is based on prevalence data (for details, see [6, 14]). Yet in order to ensure a wide range of diseases and syndromes, those with a prevalence of more than 25% were not used as inclusion criteria for the sample, as an unselected application of the three-condition criterion would have resulted in an overrepresentation of these very frequent diseases and a small number of disease patterns in the study population [14]. Nevertheless, these highly prevalent conditions are frequently combined with the less prevalent ones and are therefore still part of the sample.

In the 46 conditions, ICD-10 codes are classified together if diseases and syndromes are similar pathophysiologically or if ICD codes of related disorders are used ambiguously in practice. At the beginning of the baseline interviews, the compilation of the list had not yet been finalized, and for this reason 7 of the conditions were not part of the standardized baseline GP questionnaire, but were assessed by means of open questions. This applies to chronic gastritis, insomnia, allergies, obesity, hypotension, sexual dysfunction and tobacco abuse. The conditions assessed in a standardized fashion in the GP questionnaire at baseline comprise the 29 conditions used as inclusion criteria as well as chronic cholecystitis/gallstones, chronic low back pain, haemorrhoids, hypertension, lipid metabolism disorders, liver diseases, noninflammatory gynaecological problems, purine/pyrimidine metabolism disorders/gout, prostatic hyperplasia and urinary tract calculi. Dementia was also listed, but constituted an exclusion criterion at baseline.

Sociodemographic variables

Socio-economic status (education, income) was assessed with an established questionnaire [15]. The level of education was rated according to the international CASMIN classification [16]. Income is reported as net monthly income from all sources of income adjusted for household size (this is net income divided by the equivalized household size, for which a value of 1.0 is assigned to the householder, 0.5 is assigned to every other household member aged 15 or over, and 0.3 is assigned to every child under the age of 15).
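The household-size adjustment described above can be illustrated with a minimal sketch. The function name and the example household are illustrative assumptions and are not taken from the study; only the weights (1.0, 0.5, 0.3) come from the text.

```python
def equivalized_income(net_income, other_members_15_plus=0, children_under_15=0):
    """Net monthly household income divided by the equivalized household size.

    Weights follow the text: 1.0 for the householder, 0.5 for every other
    member aged 15 or over, 0.3 for every child under 15.
    """
    if other_members_15_plus < 0 or children_under_15 < 0:
        raise ValueError("member counts must be non-negative")
    size = 1.0 + 0.5 * other_members_15_plus + 0.3 * children_under_15
    return net_income / size


# Hypothetical example: a couple with a combined net income of EUR 2,400
couple = equivalized_income(2400, other_members_15_plus=1)
print(round(couple, 2))  # 1600.0
```

A single-person household keeps its net income unchanged, since its equivalized size is 1.0.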

Resource use

Resource use was recorded by means of a questionnaire administered as part of the MultiCare assessment battery. The resource use questionnaire was developed by our working group. It is based on versions used in previous investigations (e.g., [17]–[19]) and is available from the authors upon request. The questionnaire covers inpatient treatment, outpatient physician treatment, pharmaceuticals, other kinds of outpatient treatment (such as physical or occupational therapy), medical supplies and dental prostheses, nursing home care, professional nursing services and other paid help as well as informal care (Table 1). The items for informal care are based on an instrument by Neubauer et al. [20]. Assessment was retrospective and covered a period of 3 months, except for inpatient treatment and nursing home care, for which the period was 6 months. The questionnaire contains lists of common resources and services in order to minimize recall bias.

Table 1 Documented resource use and unit costs applied for calculation of costs

Health care costs

We adopted a societal perspective; therefore all resources and services used were recorded, regardless of whether they were covered by health or nursing insurance or paid for out-of-pocket. The cost categories analyzed in this study are direct costs of illness arising from the use of resources. We did not evaluate indirect costs due to lost productivity because of the advanced age of the subjects. Healthcare costs were calculated for a 6-month period, multiplying resource use by two in sections which covered a 3-month period. Costs were calculated from resource use as recorded in the questionnaire by means of unit costs. Resource categories and sources of unit costs are listed in Table 1. Informal care was valued using the replacement cost approach, i.e. it was assumed that the same amount of care by professional nursing services would have had to be paid for in the absence of an informal caregiver. Accordingly, hours of informal care were valued using the same hourly wage rate as for professional home care (see van den Berg [21] for an overview of methods for the valuation of informal care).

Costs were calculated in € at 2009 price levels. Unit costs that were unavailable at year 2009 values were inflated or deflated to year 2009 price levels by means of the consumer price index [22].
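As a rough sketch of the costing steps described in this section, the following example values reported resource use with unit costs, scales items with a 3-month recall period to the 6-month horizon, and adjusts a unit cost with a price index. All item names, unit costs and index values are hypothetical placeholders; the unit costs actually applied are those listed in Table 1.

```python
# Hypothetical unit costs in EUR and recall periods in months; the real values
# are those listed in Table 1 and are not reproduced here.
UNIT_COSTS = {
    "gp_visit": {"unit_cost": 20.0, "recall_months": 3},
    "hospital_day": {"unit_cost": 500.0, "recall_months": 6},
    # Replacement cost approach: informal care is valued at the same hourly
    # rate as professional home care (rate below is a placeholder).
    "informal_care_hour": {"unit_cost": 30.0, "recall_months": 3},
}


def inflate_to_base_year(cost, cpi_source_year, cpi_base_year):
    """Convert a unit cost to base-year prices using a consumer price index."""
    return cost * cpi_base_year / cpi_source_year


def six_month_costs(resource_use, unit_costs=UNIT_COSTS, horizon_months=6):
    """Value reported quantities and scale each item to the 6-month horizon."""
    total = 0.0
    for item, quantity in resource_use.items():
        spec = unit_costs[item]
        scale = horizon_months / spec["recall_months"]  # 2 for 3-month items
        total += quantity * spec["unit_cost"] * scale
    return total


# Hypothetical patient: 4 GP visits and 10 hours of informal care in 3 months,
# 2 hospital days in 6 months.
use = {"gp_visit": 4, "hospital_day": 2, "informal_care_hour": 10}
print(six_month_costs(use))  # 1760.0
```

Keeping the recall period next to each unit cost makes the doubling of 3-month items explicit instead of hiding it in the data preparation.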

For statistical analysis, we categorized cost data as follows: 1) Costs of inpatient care comprising inpatient treatment in general hospitals, specialized psychiatric and neurological hospitals or rehabilitation clinics; 2) costs of outpatient care comprising outpatient physician treatment, other outpatient treatment, and medical supplies and dental prostheses; 3) costs of medication comprising pharmaceuticals; 4) costs of nursing care comprising nursing home care, professional home care and informal care.

Missing values

Missing values in items of the resource use questionnaire (below 1% for all items) were imputed using the means of the observed data for the respective items (conditional means). Dosage of medication was an exception, however, since medications and their dosages varied too much between individuals for mean imputation to be possible. Therefore costs for medication with missing values for dosage were calculated using a conservative rule, whereby the pharmacy retail price of one package of the drug per 3 months was applied. Missing values in items of the standard MultiCare assessment battery were imputed using the hot deck method, in which missing values are replaced using observed values from a responding unit that is as similar as possible to the non-responding unit [23, 24]. The proportion of missing values in those items which were used in our analysis was below 0.3%, except for income, for which 12.7% of values were missing.
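The hot deck idea can be sketched as follows. The similarity measure used here (number of matching covariates) and the variable names are assumptions made for illustration; the matching procedure actually applied in the study is described in [23, 24] and may differ.

```python
def hot_deck_impute(records, target, match_keys):
    """Fill missing `target` values with the value of the most similar donor.

    Similarity is the number of `match_keys` on which donor and recipient
    agree; ties are resolved by taking the first donor in the list.
    """
    donors = [r for r in records if r[target] is not None]
    if not donors:
        raise ValueError("no donor with an observed value")
    completed = []
    for rec in records:
        if rec[target] is None:
            best = max(donors, key=lambda d: sum(d[k] == rec[k] for k in match_keys))
            rec = {**rec, target: best[target]}  # copy, leave the input untouched
        completed.append(rec)
    return completed


# Hypothetical records; None marks a missing income value
patients = [
    {"sex": "f", "education": "low", "income": 1200},
    {"sex": "m", "education": "high", "income": 2500},
    {"sex": "f", "education": "low", "income": None},
]
completed = hot_deck_impute(patients, "income", ["sex", "education"])
print(completed[2]["income"])  # 1200
```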

Statistical analyses

We used a conditional model based on a supervised learning tree algorithm in order to visualize a hierarchical data partition and to detect the underlying structure and most influential variables on total costs and on costs of the four different cost sectors (inpatient care, outpatient care, medication, nursing care) separately. As covariates we used binary variables for those 41 of the 46 diseases which had a prevalence rate of ≥1% in the sample. In addition, we included a binary variable for obesity (defined as body mass index ≥ 30 kg/m²). A detailed list of the diseases taken into account is provided in the Results section. Additionally, female sex (reference category: male) and private health insurance (reference category: statutory health insurance) were added as dichotomous variables. Education was included as a categorical variable, taking low education as reference. Besides this, age and log-transformed income were included as continuous covariates. Log-transformation was used to achieve a linear relationship of predictors and outcomes. No additional distributional assumptions concerning the error terms or outcomes were made.

Traditionally, classification and regression trees (CART) attempt to partition data into homogeneous subsets. Thus, a node $A$ is split into two disjoint subsets $A \cap \{x_i \le c\}$ and $A \cap \{x_i > c\}$ based on a single variable $X_i = x_i$ (see [25]). As splitting criteria, node impurity or entropy can be used. One main disadvantage of CARTs is their tendency to grow very large trees by selecting splitting variables which lead to a maximum tree size. As an attempt to reduce the tree size, the optimally pruned subtree can be determined using minimal cost-complexity pruning based on cross-validation (see [26]).
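To make the CART splitting rule concrete, the sketch below searches for the cut point c that most reduces node impurity. As an assumption, impurity is measured by the sum of squared deviations from the node mean, a common choice for regression trees; the data are made up for illustration.

```python
def sse(values):
    """Sum of squared deviations from the mean (regression impurity)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, y):
    """Return (cut point c, impurity reduction) for the split x <= c vs. x > c."""
    parent = sse(y)
    best_c, best_gain = None, 0.0
    # The largest observed value is skipped: it would leave the right side empty.
    for c in sorted(set(x))[:-1]:
        left = [yi for xi, yi in zip(x, y) if xi <= c]
        right = [yi for xi, yi in zip(x, y) if xi > c]
        gain = parent - sse(left) - sse(right)
        if gain > best_gain:
            best_c, best_gain = c, gain
    return best_c, best_gain


# Hypothetical data: a binary disease indicator and 6-month costs in EUR
disease = [0, 0, 0, 1, 1, 1]
costs = [100, 120, 110, 400, 420, 410]
print(best_split(disease, costs))  # (0, 135000.0)
```

Because the search always returns the split with the largest impurity reduction, a covariate with many distinct values has more chances to produce a large reduction by chance, which is the selection bias mentioned in the text.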

To overcome these disadvantages and as a superior approach to CART, we used in a first step a non-parametric conditional inference tree (CTREE) algorithm predicated on recursive binary partitioning embedded in the framework of permutation tests introduced by Strasser and Weber [27]. Thus the distribution of the response $Y$ is defined as conditional on a function $g$ of a set of $k$ arbitrarily scaled covariates $X$ as $f(Y \mid g(X_1, \dots, X_k))$.

A learning sample $\mathcal{L}_n$ based on a random selection of $n$ i.i.d. observations was used to fit the tree-structured regression model. A vector of dichotomous case weights $w$ representing each node was used to create the disjoint subsets $w_{\text{left},i} = w_i I(X_i \in A)$ and $w_{\text{right},i} = w_i I(X_i \notin A)$, with $I(\cdot)$ as the indicator function. A discrepancy measure of the form

$$T_j(\mathcal{L}_n, w) = \operatorname{vec}\left( \sum_{i=1}^{n} w_i \, I(X_{ij} \in A) \, h\big(Y_i, (Y_1, \dots, Y_n)\big)^{T} \right)$$

was used to establish a two-sample statistic for all possible subsets $A$, with $h(\cdot)$ as the influence function, $\operatorname{vec}$ as the vec operator and $(\cdot)^{T}$ as the transpose.

At each node a global null hypothesis $H_0: f(Y \mid X_j) = f(Y)$ was tested at a pre-specified significance level of α = 0.05. To incorporate differently scaled covariates, a maximization of the test statistic based on the conditional mean and conditional variance over all possible subsets was established. If the null hypothesis could not be rejected, the tree algorithm stopped and no further data split was performed. Otherwise the covariate $X_j$ with the strongest influence on $Y$ was selected as node, and the null hypothesis was tested in each subset of the tree. This approach is designed to grow a tree of appropriate size without subsequent pruning [12, 28]. To visualize the inherent structure, trees were plotted for total costs and each cost sector.
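The variable selection and stopping rule can be sketched in simplified form. As assumptions for illustration, the example uses the absolute difference in group means as test statistic, a Monte Carlo permutation distribution, and a Bonferroni adjustment across covariates; the CTREE implementation works with standardized linear statistics rather than the raw mean difference, so this is not a reproduction of the algorithm used in the analysis.

```python
import random


def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Monte Carlo permutation p-value for H0: y is independent of binary x."""
    rng = random.Random(seed)

    def mean_difference(labels):
        ones = [yi for li, yi in zip(labels, y) if li == 1]
        zeros = [yi for li, yi in zip(labels, y) if li == 0]
        return abs(sum(ones) / len(ones) - sum(zeros) / len(zeros))

    observed = mean_difference(x)
    labels = list(x)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # permuting labels keeps both group sizes fixed
        if mean_difference(labels) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def select_split_variable(covariates, y, alpha=0.05):
    """Pick the covariate with the smallest Bonferroni-adjusted p-value.

    Returns None when no adjusted p-value is below alpha (stop splitting).
    """
    m = len(covariates)
    adjusted = {name: min(1.0, m * permutation_p_value(x, y))
                for name, x in covariates.items()}
    name = min(adjusted, key=adjusted.get)
    return (name, adjusted[name]) if adjusted[name] < alpha else None


# Hypothetical data: costs depend strongly on "pd" and not on "noise"
costs = [100 + i for i in range(15)] + [1000 + i for i in range(15)]
covariates = {"pd": [0] * 15 + [1] * 15, "noise": [0, 1] * 15}
print(select_split_variable(covariates, costs))
```

Separating the question of whether to split at all (the test) from the question of where to split is what distinguishes this scheme from the exhaustive impurity search used by CART.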

The cost means $\hat{\mu}_i$ were predicted with regard to the number of case weights $w_i = 1$. To evaluate the prediction quality, both the squared error loss and the mean absolute error were calculated.
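The article does not spell out the two error measures. Assuming the usual definitions, with $Y_i$ the observed costs and $\hat{\mu}_i$ the predicted mean of the terminal node containing observation $i$, they would read:

```latex
\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( Y_i - \hat{\mu}_i \right)^2,
\qquad
\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| Y_i - \hat{\mu}_i \right|
```

With strongly right-skewed cost data, the squared error is dominated by a few very expensive cases, whereas the absolute error weights all deviations linearly.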

In addition, in order to evaluate model performance of CTREE, it was compared to traditional CART, which is an alternative tree-based algorithm.

In a second step we used an ensemble method, namely conditional random forests. A total of $n_{\text{TREE}} = 500$ random trees were grown to increase the performance of our predictions and to verify our results [25]. A particular benefit of random forests is that they can deal with large covariate lists and/or complex interaction structures. In contrast to the random forests introduced by Breiman [29], which are based on CART, we implemented an unbiased random forest based on CTREE [30]. Based on the unbiased random forest, variable importance scores were calculated, indicating the importance of individual variables for determining the response. Basically, variable importance measures the difference in prediction accuracy before and after randomly permuting a single covariate.
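The permutation idea behind variable importance can be sketched as follows. This is a simplified illustration with a single fixed prediction rule and made-up data; in a random forest the measure is typically computed per tree on observations not used for fitting and then averaged, which is not shown here.

```python
import random


def mse(y_true, y_pred):
    """Mean squared error between observed and predicted values."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, rows, y, feature, n_repeats=20, seed=0):
    """Mean increase in MSE after randomly permuting one covariate.

    `predict` maps a row (dict of covariates) to a predicted value.
    """
    rng = random.Random(seed)
    baseline = mse(y, [predict(r) for r in rows])
    increases = []
    for _ in range(n_repeats):
        shuffled = [r[feature] for r in rows]
        rng.shuffle(shuffled)
        permuted = [{**r, feature: v} for r, v in zip(rows, shuffled)]
        increases.append(mse(y, [predict(r) for r in permuted]) - baseline)
    return sum(increases) / n_repeats


# Hypothetical data and prediction rule: costs depend on "pd" only
rows = [{"pd": i % 2, "noise": (i // 2) % 2} for i in range(10)]
y = [11000 if r["pd"] else 3000 for r in rows]


def predict(row):
    return 11000 if row["pd"] else 3000


print(permutation_importance(predict, rows, y, "pd"))
print(permutation_importance(predict, rows, y, "noise"))  # 0.0
```

A covariate that the model never uses has an importance of exactly zero, because permuting it leaves every prediction unchanged.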

The analysis was performed using the packages party and rpart in R 2.14.1 [31].


Sociodemographic and morbidity data

The mean age of the sample at baseline was 74.4 years, and 58.6% were female (Table 2). More than half of the participants were married (56.9%), while approximately one quarter were widowed (27.5%). 58.3% were living with their spouse or partner, and 35.2% were living alone in their own home. The proportions of subjects in assisted living (2.0%) or retirement homes (0.3%) were low. The majority of the sample had a low degree of education (61.8%), and mean household-size adjusted monthly income was € 1,440. Only 4.3% of the participants were privately insured.

Table 2 Characteristics of the sample (N = 1,050)

On average, the participants had 7.0 chronic conditions, with no significant differences between men and women. The ten most prevalent conditions in the overall sample, in descending order, were hypertension (79.4%), lipid metabolism disorders (59.4%), chronic low back pain (51.0%), joint arthrosis (43.4%), Diabetes mellitus (38.2%), chronic ischemic heart disease (32.7%), obesity (31.3%), thyroid dysfunction (31.0%), cardiac arrhythmias (28.5%) and osteoporosis (26.4%) (Table 3). However, there were some differences in rank order by gender. For instance, the prevalence of chronic ischemic heart disease was twice as high for men (43.3%) as for women (25.2%), for whom this condition only ranked eleventh. Osteoporosis, by contrast, was much more common among women, for whom this condition ranked eighth, than among men (30.7% vs. 5.8%). Among men thyroid dysfunction and lower limb varicosis were much less common than among women (20.1% vs. 38.8% and 14.3% vs. 29.4%), while the ten most prevalent conditions for men also included prostatic hyperplasia (28.3%, ranking seventh) and purine/pyrimidine metabolism disorders and gout (27.2%, ranking ninth). More details on the prevalence of chronic conditions in the MultiCare cohort have been reported elsewhere (see [32]).

Table 3 Prevalence of chronic conditions and rank order in the sample, overall and by gender


In this section we present the results of the conditional inference trees. At first we report on the analysis of total costs, followed by costs of inpatient care, outpatient care, medication and nursing care. All cost data refer to a 6-month period.

Total costs

Mean total costs in the whole sample amounted to € 3,671 (SD: € 6,996), ranging from € 23 to € 101,600. The identified tree model consisted of 5 nodes defining three homogeneous subsets based on two dichotomous disease indicators (Figure 1). The first split was caused by the covariate Parkinson's disease (PD) at a significance level of p < 0.001. Given that PD occurs within the individual multimorbidity pattern, the model predicts mean total costs of € 11,042 (n = 24, SD: € 14,216) with no further split. If PD is not present, a further split is caused by another disease covariate indicating the presence of cardiac insufficiency (p < 0.001). Conditional on the absence of PD, predicted mean costs are € 6,081 (n = 129, SD: € 11,498) if cardiac insufficiency is present, and € 3,127 (n = 897, SD: € 5,535) if not. Thus, regardless of any other variables taken into account or co-existing combinations of diseases, total costs are influenced by the presence or absence of PD and cardiac insufficiency.

Figure 1

Conditional independence tree for total costs. PD = Parkinson’s disease; CCI = cardiac insufficiency; mean costs = predicted mean total costs in € in 6-month period.
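The tree for total costs can be read as a set of nested rules. The sketch below encodes the reported node means and checks that the terminal nodes add up to the full sample and reproduce the overall mean; the function and variable names are illustrative.

```python
def predict_total_costs(pd, cardiac_insufficiency):
    """Predicted mean total costs (EUR per 6 months) from the reported tree."""
    if pd:
        return 11042
    return 6081 if cardiac_insufficiency else 3127


# Terminal nodes as (n, predicted mean) pairs reported in the text
leaves = [(24, 11042), (129, 6081), (897, 3127)]
n_total = sum(n for n, _ in leaves)
weighted_mean = sum(n * m for n, m in leaves) / n_total
print(n_total, round(weighted_mean))  # 1050 3671
```

Note that a patient with both PD and cardiac insufficiency falls into the PD node, since the tree does not split further once PD is present.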

Costs of inpatient care

Mean costs of inpatient care in the whole sample amounted to € 1,096 (SD: € 4,029), ranging from € 0 to € 92,850. For inpatient care we identified a tree-based model consisting of only 3 nodes (Figure 2): the only split was caused by cerebral ischemia and/or chronic stroke (CI/CS) at a significance level of p = 0.018. Predicted mean hospital costs are € 2,337 if CI/CS is present (n = 118, SD: € 9,373), and € 939 otherwise (n = 932, SD: € 2,653).

Figure 2

Conditional independence tree for inpatient costs. CICS = cerebral ischemia and/or chronic stroke, mean costs = predicted mean inpatient costs in € in 6-month period.

Costs of outpatient care

Mean costs of outpatient care in the whole sample amounted to € 418 (SD: € 846), ranging from € 0 to € 25,120. For outpatient costs no split was detected at the α = 0.05 level of significance. When increasing the significance level α to 0.2, a single split was achieved by cardiac insufficiency (p = 0.072).

Costs of medication

Mean costs of medication in the whole sample amounted to € 590 (SD: € 752), ranging from € 0 to € 15,440. With respect to medication costs, nine nodes were identified with four chronic conditions influencing costs (Figure 3). The first split was caused by chronic obstructive pulmonary disease (COPD), which has a highly significant (p < 0.001) impact on the medication costs. Given that COPD is present, a further split is caused by insomnia (p < 0.032). If COPD is present, mean predicted medication costs amount to € 1,623 if insomnia is also present (n = 19, SD: € 3,400) and € 727 if not (n = 228, SD: € 587). On the other hand, if there is no diagnosis of COPD, PD causes a further split (p < 0.001) with predicted mean medication costs of € 1,409 if PD is present (n = 19, SD: € 1,510). If PD is not present, a further split is caused by Diabetes mellitus (p < 0.001), with predicted mean costs of € 614 (n = 297, SD: € 591) if present and € 438 (n = 487, SD: € 487) if not.

Figure 3

Conditional independence tree for medication costs. COPD = chronic obstructive pulmonary disease; PD = Parkinson’s disease; INS = insomnia; DM = Diabetes mellitus, mean costs = predicted mean medication costs in € in 6-month period.

Costs of nursing care

Mean costs of formal and informal nursing care in the whole sample amounted to € 1,290 (SD: € 4,815), ranging from € 0 to € 50,040. For costs of nursing care, nine nodes were identified (Figure 4). The first split was caused by PD, which has a highly significant impact on costs of nursing care (p < 0.001). If PD is present, our model predicts mean nursing care costs of € 7,014 (n = 24, SD: € 13,475). Given that PD is not present, a further split is caused by logarithmic income, with a cut point at 6.98 ($e^{6.98} \approx$ € 1,075; p = 0.015). Given that logarithmic income is ≤ 6.98, the predicted mean nursing care costs are € 1,909 (n = 359, SD: € 5,973). If the income is higher than that, a further split is caused by age, with three age groups being detected: in a first step a split is detected at >83 years (p < 0.001), leading to predicted mean nursing care costs of € 3,910 (n = 35, SD: € 8,278). In case of an age ≤83 years, a further split is caused by an age >76 years (p = 0.006), leading to predicted mean care costs of € 1,082 (n = 204, SD: € 3,418). Alternatively, given an age of ≤76 years, mean nursing care costs of € 335 (n = 428, SD: € 1,617) are predicted.

Figure 4

Conditional independence tree for costs of nursing care. PD = Parkinson’s disease; rd age = rounded age; logincome = natural logarithm of income, mean costs = predicted mean costs of nursing care in € in 6-month period.
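Read as decision rules, the tree for nursing care costs can be sketched as below, using the node means and cut points reported in the text. The function name and its arguments are illustrative; `income` is assumed to be the household-size adjusted net monthly income in €.

```python
import math


def predict_nursing_costs(pd, income, age):
    """Predicted mean nursing care costs (EUR per 6 months) from the reported tree."""
    if pd:
        return 7014
    if math.log(income) <= 6.98:
        return 1909
    if age > 83:
        return 3910
    return 1082 if age > 76 else 335


# The cut point on the log scale corresponds to about EUR 1,075
print(round(math.exp(6.98)))  # 1075

# Terminal nodes as (n, predicted mean) pairs reported in the text
leaves = [(24, 7014), (359, 1909), (35, 3910), (204, 1082), (428, 335)]
n_total = sum(n for n, _ in leaves)
print(n_total, round(sum(n * m for n, m in leaves) / n_total))  # 1050 1290
```

The order of the rules matters: age only enters the prediction for patients without PD whose income lies above the cut point.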

Conditional random forests and comparison with CART

Based on a conditional random forest consisting of 500 conditional random trees, variable importance scores were calculated to verify our findings. As a main result, the numerical order of these scores coincides with the structure described above throughout all sectors. Thus, we were able to confirm the accuracy of our analysis using ensemble methods. Independently of our specific sample, we detected the factors most influential on costs irrespective of the number and combination of co-occurring comorbidities.

For comparison with CTREE, classification and regression trees were computed for all cost sectors. As expected, CART produced larger trees. In particular, for nursing care 19 nodes emerged, resulting in 10 subsets, with PD, logarithmic income, Diabetes mellitus, cerebral ischemia/chronic stroke, age, liver diseases and cancers as splitting variables. At the same time, as with the CTREE algorithm, the first split was caused by PD. With respect to total costs, CART yielded 11 nodes, with PD, cardiac insufficiency, cerebral ischemia/chronic stroke, osteoporosis and renal insufficiency as splitting variables. Thus, CART included three additional diseases compared to CTREE. For medication costs, CART identified 7 nodes, with PD, asthma/chronic obstructive pulmonary disease and logarithmic income as splitting variables. Furthermore, in contrast to CTREE, CART resulted in 5 nodes for outpatient care, with logarithmic income and osteoporosis as splitting variables. Finally, 9 nodes emerged for costs of inpatient care, with cerebral ischemia/chronic stroke, neuropathies, urinary incontinence and cardiac arrhythmia as splitting variables. Of these, only cerebral ischemia/chronic stroke was used as a splitting variable by both tree algorithms.

In order to control for overfitting and to find the optimal CART-based tree size, minimal cost-complexity pruning was applied. This method was not able to find any suitable tree in any cost sector.

Mean absolute errors of CTREEs and the unpruned CARTs are reported in Table 4. For costs of outpatient care and costs of nursing care, CTREE achieved a lower mean absolute error than CART. However, with respect to total costs, costs of medication and costs of inpatient care, CART led to smaller errors. All splitting variables of CTREE were used as splitting variables of the trees grown by CART.

Table 4 Comparison of absolute mean error of CTREE and CART algorithms for different cost sectors


The partitioning conditional tree algorithm makes it possible to detect the underlying structure of how certain diseases within arbitrary multimorbidity patterns influence the costs of health care. Using this statistical approach, we found various variables which are associated with total costs, inpatient costs, medication costs and nursing care costs in multimorbid elderly patients. These results were verified using ensemble methods.

With respect to total costs and independent of the other co-existing comorbidities, PD and cardiac insufficiency were identified as the most influential variables, with PD being the more important one. Compared to patients not suffering from either of the two conditions, PD increases predicted mean total costs 3.5-fold to approximately € 11,000 per 6 months, and cardiac insufficiency 2-fold to approximately € 6,100.

The high total costs of PD are largely due to costs of nursing care, for which the respective partitioning tree model predicted more than € 7,000 on average in this patient group. When excluding nursing care from total costs, PD disappeared in the tree for total costs, while the split from cardiac insufficiency remained significant (p = 0.004), predicting mean total costs of € 3,790 (n = 132) if cardiac insufficiency is present and € 2,177 (n = 918) otherwise (tree not shown). The same reduced tree structure resulted when only excluding costs of informal nursing care from total cost, predicting mean total costs of € 4,052 if cardiac insufficiency is present and € 2,260 otherwise (p = 0.001; tree not shown), and reflecting that high nursing costs of PD are largely due to informal care.

If PD is not present, mean nursing care costs are influenced by income and age, with low income being associated with higher costs and, in those with higher income, age being associated with higher costs. Taking comparatively more affluent patients aged ≤76 years not suffering from PD as the reference group, patients with similar income in the age groups 77–83 and >83 incur more than 3-fold (€ 1,082 vs. € 335) and more than 11-fold (€ 3,910 vs. € 335) mean nursing costs, respectively, if PD is not present. If PD is present, mean nursing costs are elevated almost 21-fold (€ 7,014 vs. € 335) compared to the same reference group, irrespective of age and income. In patients with comparatively low income without PD, mean nursing costs are increased almost 6-fold (€ 1,909 vs. € 335) compared to the reference group irrespective of age.

PD was also found to increase medication costs. Yet concerning medication costs, the coexistence of COPD and insomnia was identified as being associated with the highest mean medication costs. Besides these conditions, Diabetes mellitus significantly increases medication costs if COPD and PD are not present. Compared to patients in whom neither Diabetes nor PD or COPD are present, Diabetes (without PD) increases mean medication costs by 40%, PD by 222%, and COPD by 66% or even 271% if insomnia is also present.
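The percentage increases quoted for medication costs follow directly from the predicted node means reported in the Results section. A short check, with illustrative labels for the terminal nodes:

```python
reference = 438  # predicted mean if neither COPD, PD nor Diabetes is present
nodes = {
    "Diabetes (no COPD, no PD)": 614,
    "PD (no COPD)": 1409,
    "COPD without insomnia": 727,
    "COPD with insomnia": 1623,
}
increase_pct = {name: round(100 * (mean / reference - 1))
                for name, mean in nodes.items()}
for name, pct in increase_pct.items():
    print(f"{name}: +{pct}%")
```

Expressing every node relative to the same reference node makes the effects comparable, even though the nodes sit at different depths of the tree.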

While the partitioning tree algorithm identified no variable significantly associated with outpatient costs, cerebral ischemia and/or chronic stroke (CI/CS) was found to increase inpatient costs 2.5-fold (€ 2,337 vs. € 939), with no other variables being significant in the model.

Except for costs of nursing care, socio-demographic variables did not significantly influence costs of care.

Strengths and limitations

In general, one main advantage of partitioning tree algorithms over traditional analytical methods is the simplified representation of high-dimensional data and its direct interpretability. Because of the chosen 0/1 recursive partitioning framework, the lack of smoothness, a common disadvantage of tree-based modelling, could be neglected.

Compared to classical parametric regression techniques, tree-based decision models avoid any distributional assumption. Therefore, the estimation of the coefficients is not affected by misspecification. At the same time, trees aim to separate disjoint homogeneous subsets by minimizing within-subset variance and maximizing between-subset variance.

As a main disadvantage, CART and related decision tree algorithms like C4.5 suffer from high variance caused by the inherent binary partitioning method, which propagates any error made at the first split down the tree. In addition, because they focus on maximizing an information criterion by numerical optimization, two problems arise: overfitting, and a selection bias towards covariates that offer many possible splits.
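The selection bias mentioned above can be made concrete with a small simulation. The sketch below is not the analysis performed in this study; it uses synthetic data and a simplified CART-like criterion (largest reduction in the sum of squared errors over all possible cut points). The response is pure noise, so neither covariate carries information, yet the exhaustive search tends to pick the continuous covariate simply because it offers many more candidate splits than the binary one. All function names and parameter values are hypothetical.

```python
import random


def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def best_sse_reduction(x, y):
    """Largest SSE reduction over all binary splits of the form x <= c."""
    total = sse(y)
    best = 0.0
    for c in sorted(set(x))[:-1]:  # every distinct cut point except the maximum
        left = [yi for xi, yi in zip(x, y) if xi <= c]
        right = [yi for xi, yi in zip(x, y) if xi > c]
        best = max(best, total - sse(left) - sse(right))
    return best


def share_continuous_chosen(n_sim=200, n=60, seed=1):
    """Share of simulations in which the continuous noise covariate wins."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_sim):
        y = [rng.gauss(0.0, 1.0) for _ in range(n)]    # pure-noise response
        binary = [i % 2 for i in range(n)]             # one possible split
        rng.shuffle(binary)
        continuous = [rng.random() for _ in range(n)]  # up to n - 1 possible splits
        if best_sse_reduction(continuous, y) > best_sse_reduction(binary, y):
            wins += 1
    return wins / n_sim


share = share_continuous_chosen()
print(f"continuous covariate selected in {share:.0%} of simulations")
```

If the selection were unbiased, each uninformative covariate would be chosen about half of the time; a share well above one half illustrates the bias that inference-based variable selection is designed to avoid.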

Instead of using traditional classification and regression trees (CART) or related tree algorithms like ID3 or C4.5, we applied a different approach embedded in statistical inference theory (see [29, 33]). We used a conditional inference tree (CTREE) based on multiple permutation tests, which combines tree-based regression with the statistical theory of conditional inference. In contrast to CART or C4.5, CTREE controls for selection bias by basing splits on statistical inference and significance values. Permutation tests provide a statistically founded stopping criterion. Thus, our model overcomes typical problems of classical tree algorithms.
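The variable selection step of such an inference-based tree can be sketched as follows. This is a simplified illustration on synthetic data, not the implementation used in this study: it assumes binary 0/1 covariates, uses the absolute difference in mean costs as the test statistic, approximates the permutation distribution by Monte Carlo resampling, and applies a Bonferroni adjustment across covariates. All of these choices, as well as the function names, covariate names, and parameter values, are assumptions made for the example.

```python
import random


def mean_diff(x, y):
    """Absolute difference in mean cost between x == 1 and x == 0."""
    y1 = [yi for xi, yi in zip(x, y) if xi == 1]
    y0 = [yi for xi, yi in zip(x, y) if xi == 0]
    return abs(sum(y1) / len(y1) - sum(y0) / len(y0))


def permutation_p_value(x, y, n_perm, rng):
    """Monte Carlo permutation p-value for independence of x and y."""
    observed = mean_diff(x, y)
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)  # break any association between x and y
        if mean_diff(x, y_perm) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def select_split_variable(covariates, y, alpha=0.05, n_perm=999, seed=0):
    """Return the covariate with the smallest adjusted p-value, or None to stop."""
    rng = random.Random(seed)
    m = len(covariates)
    adjusted = {
        name: min(1.0, m * permutation_p_value(x, y, n_perm, rng))  # Bonferroni
        for name, x in covariates.items()
    }
    best = min(adjusted, key=adjusted.get)
    chosen = best if adjusted[best] <= alpha else None  # stopping criterion
    return chosen, adjusted


# Synthetic example: only the hypothetical "disease_a" raises costs.
data_rng = random.Random(42)
n = 200
covariates = {
    "disease_a": [1 if i < 40 else 0 for i in range(n)],
    "disease_b": [i % 2 for i in range(n)],
    "disease_c": [1 if i % 5 == 0 else 0 for i in range(n)],
}
costs = [10.0 + 5.0 * a + data_rng.gauss(0.0, 1.0) for a in covariates["disease_a"]]

chosen, adjusted = select_split_variable(covariates, costs)
print(chosen, adjusted)
```

Because the covariate is chosen by a test of association rather than by the best achievable split, a covariate with many possible cut points gains no advantage, and the recursion stops as soon as no adjusted p-value falls below the chosen significance level.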

To verify our results and detected splitting variables, variable importance scores were calculated based on conditional random forests for each cost sector.

When CTREE results were compared with CART, CART led to more splits in all cost sectors, while pruning led to no suitable trees. At the same time, CART confirmed the findings of CTREE by showing identical nodes and/or grown tree structures. Furthermore, CTREE led to theoretically reasonable splitting variables. Based on these findings, as well as on the calculated error terms and the results of the conditional random forests, our study emphasizes the superiority of the CTREE algorithm.

Our statistical analysis was based on a pre-imputed master data set provided by the data management of the MultiCare study group, which had used the hot deck method and conditional means for imputation of missing values. Although we are aware of the benefits of multiple imputation algorithms, we agreed on using the master data set for the sake of consistency and because the proportion of missing values in the variables used for our analysis was very small. Nevertheless, tree-based algorithms can handle complete as well as missing data, usually under the assumption that values are Missing Completely At Random (MCAR).

Patients with response-limitations due to medical reasons (blindness, deafness, dementia, etc.) as well as nursing home residents were excluded from the study sample. Therefore the impact of respective chronic conditions on health care costs could not be analyzed. Yet conditions associated with response-difficulties may strongly influence health care costs. For example, dementia is a very important and prevalent condition in the elderly associated with high health care costs. Dementia is often present in late stages of different diseases, such as PD, CI, chronic stroke and others. Future studies analyzing the impact of multimorbidity on health care costs should therefore consider surrogate responders for data collection in such response-limiting conditions.


In elderly patients suffering from multiple chronic conditions, PD and cardiac insufficiency appear to be the chronic diseases most influential on total health care costs irrespective of the number and combination of other co-existing chronic conditions. Irrespective of any combination and number of co-occurring diseases, costs are significantly influenced by only a limited number of factors.

Author’s information

* Principal investigators

Hendrik van den Bussche and Martin Scherer



Abbreviations

AID: Automatic interaction detection
CART: Classification and regression trees
CI: Cardiac insufficiency
CHAID: Chi-square automatic interaction detectors
COPD: Chronic obstructive pulmonary disease
CTREE: Conditional inference trees
GP: General practitioner
PAOD: Peripheral artery occlusive disease
PD: Parkinson's disease.


  1. Valderas JM, Starfield B, Sibbald B, Salisbury C, Roland M: Defining comorbidity: implications for understanding health and health services. Ann Fam Med. 2009, 7: 357-363. 10.1370/afm.983.

  2. van den Akker M, Buntinx F, Metsemakers JFM, Roos S, Knottnerus JA: Multimorbidity in general practice: Prevalence, incidence, and determinants of co-occurring chronic and recurrent diseases. J Clin Epidemiol. 1998, 51: 367-375. 10.1016/S0895-4356(97)00306-5.

  3. Fortin M, Bravo G, Hudon C, Vanasse A, Lapointe L: Prevalence of multimorbidity among adults seen in family practice. Ann Fam Med. 2005, 3: 223-228. 10.1370/afm.272.

  4. Marengoni A, Winblad B, Karp A, Fratiglioni L: Prevalence of chronic diseases and multimorbidity among the elderly population in Sweden. Am J Public Health. 2008, 98: 1198-1200. 10.2105/AJPH.2007.121137.

  5. Anderson G: Chronic care: Making the case for ongoing care.

  6. van den Bussche H, Koller D, Kolonko T, Hansen H, Wegscheider K, Glaeske G, von Leitner E-C, Schaefer I, Schoen G: Which chronic diseases and disease combinations are specific to multimorbidity in the elderly? Results of a claims data based cross-sectional study in Germany. BMC Public Health. 2011, 11: 101-10.1186/1471-2458-11-101.

  7. Lehnert T, Heider D, Leicht H, Heinrich S, Corrieri S, Luppa M, Riedel-Heller S, König HH: Review: health care utilization and costs of elderly persons with multiple chronic conditions. Med Care Res Rev. 2011, 68: 387-420. 10.1177/1077558711399580.

  8. Morgan JN, Sonquist JA: Problems in the analysis of survey data and a proposal. J Am Stat Assoc. 1963, 58: 415-434. 10.1080/01621459.1963.10500855.

  9. Rauner MS, Harper P, Shahani A, Schwarz B: Economic impact of occupational accidents: resource allocation for AUVA'S prevention programs. Safety Science Monitor. 2005, 3: 9.

  10. So ES, Chin YR, Lee IS: Relationship between health-related behavioral and psychological factors and cardiovascular and cerebrovascular diseases comorbidity among Korean adults with diabetes. Asian Nurs Res. 2011, 5: 204-209. 10.1016/j.anr.2011.11.002.

  11. Spivack SD, Shinozaki T, Albertini JJ, Deane R: Preoperative prediction of postoperative respiratory outcome, Coronary artery bypass grafting. Chest. 1996, 109: 1222-1230. 10.1378/chest.109.5.1222.

  12. Hothorn T, Hornik K, Zeileis A: Unbiased recursive partitioning: A conditional inference framework. J Comput Graph Stat. 2006, 15: 651-674. 10.1198/106186006X133933.

  13. Kass GV: An exploratory technique for investigating large quantities of categorical data. Appl Stat. 1980, 29: 119-127. 10.2307/2986296.

  14. Schafer I, Hansen H, Schon G, Maier W, Hofels S, Altiner A, Fuchs A, Gerlach FM, Petersen JJ, Gensichen J, et al: The German MultiCare-study: Patterns of multimorbidity in primary health care - protocol of a prospective cohort study. BMC Health Serv Res. 2009, 9: 9-10.1186/1472-6963-9-9.

  15. Jöckel KHB B, Bellach BM, Bloomfield K, Hoffmeyer-Zlotnik J, Winkler J, Wolf C: Empfehlungen der Arbeitsgruppe "Epidemiologische Methoden" in der Deutschen Arbeitsgemeinschaft Epidemiologie der GMDS und der DGSMP zur Messung und Quantifizierung soziodemographischer Merkmale in epidemiologischen Studien. Messung soziodemographischer Merkmale in der Epidemiologie. Edited by: Ahrens WBBM, Jöckel KH. 1998, München: Urban & Vogel

  16. Brauns HSS: Educational Reform in France, West-Germany and the United Kingdom: Updating the CASMIN Educational Classification. ZUMA-Nachrichten. 1999, 44: 7-44.

  17. König HH, Born A, Heider D, Matschinger H, Heinrich S, Riedel-Heller SG, Surall D, Angermeyer MC, Roick C: Cost-effectiveness of a primary care model for anxiety disorders. Br J Psychiatry. 2009, 195: 308-317. 10.1192/bjp.bp.108.058032.

  18. Heinrich S, Luppa M, Matschinger H, Angermeyer MC, Riedel-Heller SG, Konig HH: Service utilization and health-care costs in the advanced elderly. Value Health. 2008, 11: 611-620. 10.1111/j.1524-4733.2007.00285.x.

  19. Leicht H, Heinrich S, Heider D, Bachmann C, Bickel H, van den Bussche H, Fuchs A, Luppa M, Maier W, Moesch E, et al: Net costs of dementia by disease stage. Acta Psychiatr Scand. 2011, 124: 384-395. 10.1111/j.1600-0447.2011.01741.x.

  20. Neubauer S, Holle R, Menn P, Grassel E: A valid instrument for measuring informal care time for people with dementia. Int J Geriatr Psychiatry. 2009, 24: 275-282. 10.1002/gps.2103.

  21. van den Berg B, Brouwer WBF, Koopmanschap MA: Economic valuation of informal care. An overview of methods and applications. Eur J Health Econ. 2004, 5: 36-45. 10.1007/s10198-003-0189-y.

  22. Bundesamt S: Verbraucherpreisindizes für Deutschland. 2010, Wiesbaden: Statistisches Bundesamt

  23. Andridge RR, Little RJA: A Review of Hot Deck Imputation for Survey Non-response. Int Stat Rev. 2010, 78: 40-64. 10.1111/j.1751-5823.2010.00103.x.

  24. Schäfer I, Hansen H, Schön G, Höfels S, Altiner A, Dahlhaus A, Gensichen J, Riedel-Heller S, Weyerer S, Blank WA, et al: The influence of age, gender and socio-economic status on multimorbidity patterns in primary care, First results from the multicare cohort study. BMC Health Serv Res. 2012, 12: 89-10.1186/1472-6963-12-89.

  25. Tutz G: Regression for Categorical Data. 2012, Cambridge: Cambridge

  26. Bradford JP, Kunz C, Kohavi R, Brunk C, Brodley CE: Pruning Decision Trees with Misclassification Costs. ECE Technical Reports. 1998, 51:  .

  27. Strasser H, Weber C: On the Asymptotic Theory of Permutation Statistics. Mathematical Methods of Statistics. 1998, 8: 220-250.

  28. Hothorn T, Hornik K, van de Wiel MA, Zeileis A: A Lego System for Conditional Inference. Am Stat. 2006, 60: 257-263. 10.1198/000313006X118430.

  29. Breiman L: Random Forests. Mach Learn. 2001, 45: 5-32. 10.1023/A:1010933404324.

  30. Strobl C, Boulesteix AL, Kneib T, Augustin T, Zeileis A: Conditional variable importance for random forests. BMC Bioinformatics. 2008, 9: 307-10.1186/1471-2105-9-307.

  31. R Development Core Team: R: A language and environment for statistical computing. 2009, Vienna, Austria: R Foundation for Statistical Computing

  32. van den Bussche H, Schäfer I, Wiese B, Dahlhaus A, Fuchs A, Gensichen J, Höfels S, Hansen H, Leicht H, Koller D, et al: A comparative study demonstrated that prevalence figures on multimorbidity require cautious interpretation when drawn from a single database. J Clin Epidemiol. 2013, 66: 209-217. 10.1016/j.jclinepi.2012.07.019.

  33. Quinlan R: Induction of decision trees. Mach Learn. 1986, 1: 81-106.

  34. Krankenhausgesellschaft D: Bestandsaufnahme zur Krankenhausplanung und Investitionsfinanzierung in den Bundesländern. 2009, Berlin: Deutsche Krankenhausgesellschaft

  35. Federal Statistical Office: Grunddaten der Krankenhäuser 2008. 2009, Wiesbaden: Statistisches Bundesamt

  36. Federal Statistical Office: Kostennachweis der Krankenhäuser 2008. 2009, Wiesbaden: Statistisches Bundesamt

  37. Krauth C, Hessel F, Hansmeier T, Wasem J, Seitz R, Schweikert B: Empirical standard costs for health economic evaluation in Germany - a proposal by the working group methods in health economic evaluation. Gesundheitswesen. 2005, 67: 736-746. 10.1055/s-2005-858698.

  38. Verband der Ersatzkassen: Vergütungslisten für logopädische/sprachtherapeutische Leistungen. 2001, Berlin: Verband der Ersatzkassen (vdek)

  39. Verband der Ersatzkassen: Vergütungsliste für ergotherapeutische Leistungen. 2002, Berlin: Verband der Ersatzkassen (vdek)

  40. Verband der Ersatzkassen: Vergütungsliste für podologische Leistungen. 2007, Berlin: Verband der Ersatzkassen (vdek)

  41. GKV-Spitzenverband: Festbeträge.

  42. Bundesvereinigung K: Abrechnungshilfe für Festzuschüsse. 2009, Köln: Kassenzahnärztliche Bundesvereinigung

  43. Rote Liste Service GmbH: Rote Liste 2008. Arzneimittelverzeichnis für Deutschland. 2008, Frankfurt/Main: Rote Liste Service GmbH

  44. Federal Statistical Office: Pflegestatistik 2007. 2008, Wiesbaden: Statistisches Bundesamt

  45. Federal Statistical Office: Verdienste und Arbeitskosten. 2010, Wiesbaden: Statistisches Bundesamt

  46. Federal Statistical Office: Verdienste in Deutschland und Arbeitskosten im EU-Vergleich. Press release no. 179. 2008,



The study was funded by the German Federal Ministry of Education and Research (grant numbers 01ET0725-31 and 01ET1006A-K).

This article is on behalf of the MultiCare Cohort Study Group, which consists of Attila Altiner, Horst Bickel, Wolfgang Blank, Monika Bullinger, Hendrik van den Bussche, Anne Dahlhaus, Lena Ehreke, Michael Freitag, Angela Fuchs, Jochen Gensichen, Ferdinand Gerlach, Heike Hansen, Sven Heinrich, Susanne Höfels, Olaf von dem Knesebeck, Hans-Helmut König, Norbert Krause, Hanna Leicht, Melanie Luppa, Wolfgang Maier, Manfred Mayer, Christine Mellert, Anna Nützel, Thomas Paschke, Juliana Petersen, Jana Prokein, Steffi Riedel-Heller, Heinz-Peter Romberg, Ingmar Schäfer, Martin Scherer, Gerhard Schön, Susanne Steinmann, Sven Schulz, Karl Wegscheider, Klaus Weckbecker, Jochen Werle, Siegfried Weyerer, Birgitt Wiese, and Margrit Zieger.

We are grateful to the general practitioners in Bonn, Dusseldorf, Frankfurt/Main, Hamburg, Jena, Leipzig, Mannheim and Munich who supplied the clinical information on their patients, namely Theodor Alfen, Martina Amm, Katrin Ascher, Philipp Ascher, Heinz-Michael Assmann, Hubertus Axthelm, Leonhard Badmann, Horst Bauer, Veit-Harold Bauer, Sylvia Baumbach, Brigitte Behrend-Berdin, Rainer Bents, Werner Besier, Liv Betge, Arno Bewig, Hannes Blankenfeld, Harald Bohnau, Claudia Böhnke, Ulrike Börgerding, Gundula Bormann, Martin Braun, Inge Bürfent, Klaus Busch, Jürgen Claus, Peter Dick, Heide Dickenbrok, Wolfgang Dörr, Nadejda Dörrler-Naidenoff, Ralf Dumjahn, Norbert Eckhardt, Richard Ellersdorfer, Doris Fischer-Radizi, Martin Fleckenstein, Anna Frangoulis, Daniela Freise, Denise Fricke, Nicola Fritz, Sabine Füllgraf-Horst, Angelika Gabriel-Müller, Rainer Gareis, Benno Gelshorn, Maria Göbel-Schlatholt, Manuela Godorr, Jutta Goertz, Cornelia Gold, Stefanie Grabs, Hartmut Grella, Peter Gülle, Elisabeth Gummersbach, Heinz Gürster, Eva Hager, Wolfgang-Christoph Hager, Henning Harder, Matthias Harms, Dagmar Harnisch, Marie-Luise von der Heide, Katharina Hein, Ludger Helm, Silvia Helm, Udo Hilsmann, Claus W. 
Hinrichs, Bernhard Hoff, Karl-Friedrich Holtz, Wolf-Dietrich Honig, Christian Hottas, Helmut Ilstadt, Detmar Jobst, Gunter Kässner, Volker Kielstein, Gabriele Kirsch, Thomas Kochems, Martina Koch-Preißer, Andreas Koeppel, Almut Körner, Gabriele Krause, Jens Krautheim, Nicolas Kreff, Daniela Kreuzer, Franz Kreuzer, Judith Künstler, Christiane Kunz, Doris Kurzeja-Hüsch, Felizitas Leitner, Holger Liebermann, Ina Lipp, Thomas Lipp, Bernd Löbbert, Guido Marx, Stefan Maydl, Manfred Mayer, Stefan-Wolfgang Meier, Jürgen Meissner, Anne Meister, Ruth Möhrke, Christian Mörchen, Andrea Moritz, Ute Mühlmann, Gabi Müller, Sabine Müller, Karl-Christian Münter, Helga Nowak, Erwin Ottahal, Christina Panzer, Thomas Paschke, Helmut Perleberg, Eberhard Prechtel, Hubertus Protz, Sandra Quantz, Eva-Maria Rappen-Cremer, Thomas Reckers, Elke Reichert, Birgitt Richter-Polynice, Franz Roegele, Heinz-Peter Romberg, Anette Rommel, Michael Rothe, Uwe Rumbach, Michael Schilp, Franz Schlensog, Ina Schmalbruch, Angela Schmid, Holger Schmidt, Lothar Schmittdiel, Matthias Schneider, Ulrich Schott, Gerhard Schulze, Heribert Schützendorf, Harald Siegmund, Gerd Specht, Karsten Sperling, Meingard Staude, Hans-Günter Stieglitz, Martin Strickfaden, Hans-Christian Taut, Johann Thaller, Uwe Thürmer, Ljudmila Titova, Michael Traub, Martin Tschoke, Maya Tügel, Christian Uhle, Kristina Vogel, Florian Vorderwülbecke, Hella Voß, Christoph Weber, Klaus Weckbecker, Sebastian Weichert, Sabine Weidnitzer, Brigitte Weingärtner, Karl-Michael Werner, Hartmut Wetzel, Edgar Widmann, Alexander Winkler, Otto-Peter Witt, Martin Wolfrum, Rudolf Wolter, Armin Wunder, and Steffi Wünsch.

We also thank Corinna Contenius, Cornelia Eichhorn, Sarah Floehr, Vera Kleppel, Heidi Kubieziel, Rebekka Maier, Natascha Malukow, Karola Mergenthal, Christine Müller, Sandra Müller, Michaela Schwarzbach, Wibke Selbig, Astrid Steen, Miriam Steigerwald, and Meike Thiele for data collection as well as Ulrike Barth, Elena Hoffmann, Friederike Isensee, Leyla Kalaz, Heidi Kubieziel, Helga Mayer, Karine Mnatsakanyan, Michael Paulitsch, Merima Ramic, Sandra Rauck, Nico Schneider, Jakob Schroeber, Susann Schumann, and Daniel Steigerwald for data entry.

Author information


Corresponding author

Correspondence to Hans-Helmut König.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

HHK, ME and HL conceived and designed the analysis. BW and GS prepared the data for analysis. ME analyzed the data. HHK drafted the manuscript. HB, AF, JG, HHK, HL, WM, KM, SRH, IS, SW, HvdB and MS participated in study design and implementation. All authors read and approved the final manuscript.

Rights and permissions

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

König, HH., Leicht, H., Bickel, H. et al. Effects of multiple chronic conditions on health care costs: an analysis based on an advanced tree-based regression model. BMC Health Serv Res 13, 219 (2013).
