
Decomposition of outpatient health care spending by disease - a novel approach using insurance claims data



Decomposing health care spending by disease, type of care, age, and sex can lead to a better understanding of the drivers of health care spending. But the lack of diagnostic coding in outpatient care often precludes a decomposition by disease. Yet, health insurance claims data hold a variety of diagnostic clues that may be used to identify diseases.


In this study, we decompose total outpatient care spending in Switzerland by age, sex, service type, and 42 exhaustive and mutually exclusive diseases according to the Global Burden of Disease classification. Using data from a large health insurance provider, we identify diseases based on diagnostic clues. These clues include type of medication, inpatient treatment, physician specialization, and disease-specific outpatient treatments and examinations. We determine disease-specific spending by direct (clues-based) and indirect (regression-based) spending assignment.


Our results suggest a high precision of disease identification for many diseases. Overall, 81% of outpatient spending can be assigned to diseases, mostly based on indirect assignment using regression. Outpatient spending is highest for musculoskeletal disorders (19.2%), followed by mental and substance use disorders (12.0%), sense organ diseases (8.7%) and cardiovascular diseases (8.6%). Neoplasms account for 7.3% of outpatient spending.


Our study shows the potential of health insurance claims data in identifying diseases when no diagnostic coding is available. These disease-specific spending estimates may inform Swiss health policies in cost containment and priority setting.



Health care spending in rich countries like Switzerland is high and rising. Despite fierce debates on how to control health care spending, little is known about which diseases drive spending. Recent research has shown the potential of tracking disease-specific spending to explain changes in health care spending over time [1,2,3,4,5,6,7,8,9,10]. For Switzerland, the evidence is limited to a decomposition of total health care spending by 21 major diseases by Wieser et al. [11]. Their study also highlights the difficulties in identifying diseases in outpatient care, which is responsible for more than 50% of total health care spending [12].

The main objective of this paper is to decompose the spending for 12 outpatient services and drugs by 42 diseases or disease groups in 2017 in Switzerland. The second objective is to investigate the differences in disease-specific spending by age, sex, and type of health care service. The contributions are twofold: First, we use a multitude of diagnostic clues in insurance claims data to simultaneously identify a broad set of diseases. Previous research used Swiss administrative data and electronic health records to identify single diseases (e.g., multiple sclerosis [13], asthma/chronic obstructive pulmonary disease (COPD) [14] or diabetes mellitus [15]) or multiple diseases [16]. However, these studies covered only a limited number of diseases and used the type of drug as the only diagnostic clue. We also provide disease-specific spending estimates for Switzerland on a more granular level than in Wieser et al. [11]. Second, we apply a novel two-step decomposition method to assign individual spending to diseases. In the first step, we directly assign individual cost items to specific diseases if they are employed exclusively in the treatment of those diseases (e.g., anti-diabetic drugs for diabetes). In the second step, we use regression-based methods to distribute the remaining part of individual spending. Our comprehensive decomposition approach ensures that we allocate spending to one disease only. This is an advantage over single cost-of-illness studies that tend to over-estimate spending for the investigated disease [17].

By estimating spending by disease, age, sex, and service type, we provide the basis for a systematic and detailed health care spending monitoring.

The Swiss health care system

The Swiss health care system offers timely access to a broad range of services. This comes at a high cost compared with other high-income countries [18]. In 2018, total per capita health care spending according to National Health Accounts (NHA) was 9420 Swiss Francs (CHF; see Footnote 1) [12], corresponding to 11.2% of GDP, the second highest share in the world after the United States [19]. Switzerland has universal coverage via mandatory health insurance (MHI). MHI covers a generous basket of health services but includes yearly deductibles and co-payments. Premiums are subsidized for low-income households.

Outpatient care is highly fragmented and provided by a multitude of practitioners and hospitals on a fee-for-service basis. In contrast to inpatient care, there is no comprehensive diagnostic coding of outpatient care, e.g., by the international classification of diseases (ICD). Insurers are not allowed to collect information on the type of disease affecting their clients. However, they collect all bills for the services and drugs consumed by these clients. These claims data represent the most comprehensive source of information on outpatient service use in Switzerland. Nonetheless, claims data do not include all outpatient care spending, for two reasons. First, MHI does not cover all outpatient services (e.g., most of dental care, except for unavoidable diseases of the chewing system or if treatment is associated with severe illness) and is not the only payer. The share of outpatient services covered by MHI was estimated at 67% of total outpatient spending [20]. Second, bills for services covered by MHI might not be forwarded to health insurers, as individuals with high deductibles have to cover them anyway. Previous research has shown that unsubmitted claims amount to 2–3% of all claims [21].

Diagnostic clues in tariff catalogues

The diagnostic clues that can be used to identify diseases consist of health services and drugs identified by specific tariff positions included in claims data. These positions are listed in five national tariff catalogues:

  • The TarMed (tarif médical) for physician services contains about 4600 codes for either technical (e.g., thorax MRI) or time-dependent services (e.g., 5 min of consultation). Tariff points are assigned to each code and determine reimbursement together with the locally contracted price per tariff point.

  • The AL (Analysenliste) for laboratory tests contains about 1800 codes and applies the same reimbursement mechanism as TarMed.

  • The MiGel (Mittel- und Gegenständeliste) for therapeutic devices such as hearing aids contains about 700 codes and maximum prices for each device.

  • The SL (Spezialitätenliste) for drugs contains about 9700 codes and the respective prices for all drug packages covered by MHI and classifies them according to the hierarchical anatomical therapeutic chemical (ATC) classification of the World Health Organization.

  • The SwissDRG catalogue for inpatient acute care contains about 1000 codes based on the patient’s diagnoses and treatments.


We use data from SWICA, a major Swiss health insurer with an MHI market share of 8.1% in 2017 [22]. Our sample of 709,788 insured individuals is a random draw of 90% of the total SWICA insured population. We used a random sample rather than the full population to avoid showing results specific to the enrolled population of SWICA in 2017. The sample is fairly representative of the general Swiss population with respect to the age/sex structure and the per capita MHI spending. The share of the elderly population is slightly lower than in the general population, and spending per capita is also slightly lower. Table 1 in Additional file 2 provides the descriptive statistics of this comparison.

Table 1 Number of clues used for disease identification by type of clue and disease level

In order to adjust for the slight differences in the age and sex structure, we computed weights for each 5-year age group and sex (21 × 2 = 42 groups) based on each group’s share in the sample and in the general population [23]. These weights were used in the estimation of the overall prevalence rates as well as the computation of the spending by disease, age group, sex, and service type.
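The weighting step can be illustrated with a minimal sketch. It assumes that each weight is the ratio of a group's population share to its sample share, which is one common reading of the description above; the exact formula is not stated in the text. The function name and the toy counts are hypothetical.

```python
from collections import Counter

def poststratification_weights(sample_groups, population_counts):
    """Weight per (age_group, sex) cell = population share / sample share.

    Assumption: this ratio form is illustrative; the text only says the
    weights are based on both shares.
    """
    sample_counts = Counter(sample_groups)
    n_sample = sum(sample_counts.values())
    n_pop = sum(population_counts.values())
    weights = {}
    for cell, n in sample_counts.items():
        sample_share = n / n_sample
        pop_share = population_counts[cell] / n_pop
        weights[cell] = pop_share / sample_share
    return weights

# Toy data (made up): two cells instead of the 42 used in the study.
sample = [("65-69", "f")] * 2 + [("20-24", "m")] * 8
population = {("65-69", "f"): 30, ("20-24", "m"): 70}
weights = poststratification_weights(sample, population)
```

With weights of this form, the weighted sample size equals the unweighted one, so weighted prevalence rates and spending totals stay on the same scale as the raw sample.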

The sample consists mainly of individuals insured with SWICA for the entire year 2017, as individuals may switch to another MHI provider only at the beginning of each year. We observe total spending in MHI at the individual level by service provider (e.g., general practitioner) and service (e.g., physician services). The spending includes the part borne by the insurer as well as deductibles and the co-payments covered by insurees, provided that the bill was sent to the health insurer. The analysis was performed from a health insurance perspective, including co-payments and deductibles paid by insurees.

In addition to the spending by service, the data contains the number and the billed amount of selected tariff positions from the tariff catalogues. These tariff positions were used in the disease identification.


We proceeded in three steps. First, we defined a decomposition framework consisting of a comprehensive health service and disease classification. Second, we developed a disease identification algorithm and used it to label individuals with specific diseases. This allowed us to estimate the treated prevalence of each disease. Third, we assigned the spending at the individual level to diseases, either directly or regression-based. This allowed us to estimate the total outpatient treatment costs of each disease. The following section describes these steps in more detail.

Decomposition framework

We defined a comprehensive and mutually exclusive set of 12 outpatient services using the classification of the Swiss NHA from 2017 [12]. The 12 outpatient services include general practitioners (GPs), specialist physicians, outpatient hospital, drugs, home care, physiotherapy, occupational therapy, outpatient psychiatry, laboratory tests performed by external laboratories outside the doctor’s office, radiology, dental care (for the few indications covered by MHI), and other spending (e.g., devices). Spending by disease was estimated separately for each service.

We classified diseases according to the Global Burden of Disease (GBD) study [24]. This has four major advantages: First, the classification is mutually exclusive, thus avoiding double counting. Second, the GBD provides prevalence rates, which can be used to validate our results. Third, the GBD estimates mortality and disability-adjusted life years (DALY) by disease, which may be used to complement spending by disease with the disease burden. Fourth, it enables comparisons with other studies using the same classification [11, 25].

The GBD classification comprises four hierarchical disease levels with 359 diseases and injuries at the most granular level. Level 1 distinguishes between injuries, non-communicable diseases (NCDs), and ‘other diseases’ including communicable diseases. Level 2 distinguishes between major diseases groups (such as ‘cardiovascular diseases’) and levels 3 and 4 distinguish by more specific diseases (e.g., ‘stroke’ and ‘ischemic stroke’).

We used a simplified GBD classification, as many diseases could not be identified due to a lack of specific diagnostic clues in claims data.

Whenever possible, we defined diseases at GBD level 3. For communicable diseases, we only distinguished between GBD level 2 diseases, except for two GBD level 3 diseases (HIV/AIDS and hepatitis). For NCDs, we selected between two and four GBD level 3 diseases for each major disease group at GBD level 2. The selection of diseases was based on two principles: First, the prevalence level of the disease according to the GBD study for Switzerland in 2017 [26]. For neoplasms, the four localizations were chosen based on incidence rates as reported by the national cancer registry NICER [27]. Second, the availability of clues in the claims data to identify diseases. We judged this availability based on knowledge from previous studies. After selecting diseases within each category, we summed up the others in the residual category (e.g., ‘other cardiovascular diseases’).

Ultimately, our exhaustive and mutually exclusive classification consisted of 42 diseases at GBD level 3 and 15 major disease categories at GBD level 2. As in previous research [11], we added a ‘well care’ category for health care spending on healthy pregnancies, preventive check-ups and other non-diseases. We excluded injuries, as treatment of injuries is financed by a separate mandatory accident insurance.

Disease identification in claims data

We developed a disease identification algorithm to identify diseases based on the diagnostic clues contained in claims data. The algorithm was then used to label individuals with specific diseases. This procedure was based on clinical literature, information on the medical indication of procedures and drugs, and advice from clinical experts. The algorithms consisted of single clues (e.g., specific billing positions), or a combination of clues (e.g., a specific billing position and physician specialization). When clues allowed a disease identification only at GBD level 2 but not at level 3, individuals were assigned to the residual ‘other’ disease category at level 3.
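A rule-based labelling of this kind might look like the following minimal sketch. The two rules, the PSA threshold of two tests, and the field names are hypothetical illustrations of a single clue and a combination of clues; they are not the study's actual algorithms, which are given in Additional file 1.

```python
def has_atc(person, prefix):
    """True if any of the person's drug claims has an ATC code with this prefix."""
    return any(code.startswith(prefix) for code in person["atc_codes"])

# Hypothetical rules for illustration only.
RULES = {
    # Single clue: any drug from ATC group A10 (drugs used in diabetes).
    "diabetes": lambda p: has_atc(p, "A10"),
    # Combination of clues: repeated PSA testing plus treatment by an oncologist.
    "prostate cancer": lambda p: p["psa_tests"] >= 2 and "oncology" in p["specialists"],
}

def identify_diseases(person, rules):
    """Return the set of disease labels whose rule fires for this person."""
    return {disease for disease, rule in rules.items() if rule(person)}

person = {
    "atc_codes": ["A10BA02", "C07AB02"],  # metformin, metoprolol
    "psa_tests": 1,
    "specialists": ["cardiology"],
}
labels = identify_diseases(person, RULES)
```

Because each rule is evaluated independently, one individual can carry several disease labels, which is why the later spending assignment has to handle multimorbidity.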

The DRG codes from inpatient care served as reliable disease clues when they directly linked to specific ICD-10 codes. We explored the degree of correspondence between the DRG codes and ICD-10 codes in the Swiss inpatient registry [28], which covers all inpatient care episodes. We required a correspondence of at least 95% between a DRG code and the ICD-10 codes of a disease to include that DRG code in the disease algorithm. For example, the DRG code B69D (Transient ischemic attack and extracranial vascular occlusions) corresponded to a main diagnosis of a stroke (ICD-10: I60-I69) in 99.8% of cases and was therefore used for identification.
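The 95% correspondence check can be sketched as follows, assuming the registry is available as pairs of DRG code and main ICD-10 diagnosis. The function name, the DRG code X00Z, and all counts are made up for illustration.

```python
from collections import defaultdict

def drgs_meeting_threshold(episodes, target_icd_codes, threshold=0.95):
    """Return {drg: share} for DRG codes whose main diagnosis falls in
    target_icd_codes (three-character ICD-10 categories) in at least
    `threshold` of episodes."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for drg, icd in episodes:
        totals[drg] += 1
        if icd[:3] in target_icd_codes:
            hits[drg] += 1
    selected = {}
    for drg, n in totals.items():
        share = hits[drg] / n
        if share >= threshold:
            selected[drg] = share
    return selected

STROKE_ICD = {f"I{n}" for n in range(60, 70)}  # ICD-10 I60-I69

# Toy episodes (made up); X00Z is a hypothetical DRG code.
episodes = (
    [("B69D", "I63.4")] * 99 + [("B69D", "G45.9")]
    + [("X00Z", "I63.4")] * 5 + [("X00Z", "J18.9")] * 5
)
selected = drgs_meeting_threshold(episodes, STROKE_ICD)
```

In this toy input, B69D passes with a share of 0.99, while X00Z is dropped because only half of its episodes carry a stroke diagnosis.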

As in previous studies [16, 29], ATC drug codes allowed for the identification of numerous diseases. This was very effective when the treatment called for a disease-specific drug (e.g., drugs for HIV).

Physician specialization (e.g., oncology) was mostly used to identify GBD level 2 diseases but was particularly useful when linked to exclusive disease groups, such as oncologists for cancer.

Single tariff codes from the AL and TarMed included the repeated use of prostate-specific antigen (PSA) testing to identify prostate carcinoma and TarMed chapter 23.02 (tumour surgery of the mamma) to identify mamma carcinoma (both in combination with more clues referring to cancer).

Table 1 summarizes the number of clues used for disease identification by type of clue and disease level. The algorithms used to identify diseases are provided in the Additional file 1.

Spending assignment

The last step after labelling individuals with specific diseases was to assign spending at the individual level to diseases. This is somewhat challenging, as individuals may suffer from multiple diseases and spending due to each disease must be estimated appropriately [30]. Several encounter-, episode-, and person- or regression-based methods have been proposed for this spending allocation [17].

We implemented a novel two-step spending assignment procedure. The main innovation is the direct assignment of detailed spending items to diseases, followed by an indirect (regression-based) assignment of the residual individual spending.

Spending for each individual i by service s, \( y_{i,s}^{total} \), was decomposed and assigned to diseases in two steps: First, we directly assigned spending to disease d using disease clues that uniquely identified diseases at GBD level 3 (e.g., ATC codes), yielding \( y_{i,s,d}^{direct} \). In other words, whenever a single claim (e.g., a specific drug) indicated a specific disease (e.g., prostate cancer), its spending was directly allocated to that disease. Spending associated with specific clues used at disease level 2 (e.g., physician specialization) was distributed equally to the level 3 disease(s) within the corresponding level 2 group. Second, we used regression-based attributable fractions (AF) to decompose the residual spending \( y_{i,s}^{residual} = y_{i,s}^{total} - \sum_{j=1}^{42} y_{i,s,j}^{direct} \) [31], thereby following previous studies [32,33,34]. This allowed for the assignment of non-disease-specific claims (e.g., outpatient spending in hospitals). We ran regressions of spending on all 42 disease indicators, separately for 56 groups that were defined based on outpatient service category (7 services; see Footnote 2), sex (male/female), and age (4 groups: < 20 y./20–44 y./45–64 y./65+ y.).
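The first step and the resulting residual can be illustrated for one individual and one service with a minimal sketch. The clue-to-disease mapping and the amounts are hypothetical, and the equal distribution of level 2 clues across level 3 diseases is left out for brevity.

```python
def split_direct_and_residual(claims, clue_to_disease):
    """claims: list of (tariff_or_atc_code, amount) for one person and service.

    Returns (direct, residual): spending per disease from disease-specific
    clues, and the remaining spending passed on to the regression step.
    """
    direct = {}
    total = 0.0
    for code, amount in claims:
        total += amount
        disease = clue_to_disease.get(code)
        if disease is not None:
            direct[disease] = direct.get(disease, 0.0) + amount
    residual = total - sum(direct.values())
    return direct, residual

clue_to_disease = {"A10BA02": "diabetes"}  # hypothetical mapping
claims = [("A10BA02", 40.0), ("N02BE01", 10.0), ("A10BA02", 40.0)]
direct, residual = split_direct_and_residual(claims, clue_to_disease)
```

Here the two anti-diabetic drug claims are assigned directly to diabetes, and the unspecific analgesic claim remains in the residual.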

We estimated a Poisson pseudo-maximum likelihood (PPML) model, which has been shown to perform especially well when there are many zeros in the dependent variable [35] (see Footnote 3):

$$ \Pr \left({y}_{i,s}^{residual}|{I}_{i,1},\dots, {I}_{i,42}\right)=\frac{e^{-{\lambda}_{i,s}}\left({\lambda}_{i,s}^{y_{i,s}^{residual}}\right)}{y_{i,s}^{residual}!} $$


where

$$ \lambda_{i,s} = e^{\alpha_s + \sum_{j=1}^{42} \beta_{s,j} I_{i,j} + \varepsilon_{i,s}} $$

and \( I_{i,j} \) is an indicator equal to 1 if disease j is present in insured i and 0 if not.

The AF is the part explained by the disease indicators. The exponentiated constant \( \alpha_s \) in our models represents the estimated mean spending with all disease indicators equal to 0. The spending share \( s_{i,s,d} \) for each disease d for insured i was calculated using the regression coefficients \( \beta_{s,j} \). Multiplying the AF with \( s_{i,s,d} \) and the residual spending resulted in the spending for disease d.

$$ AF_{i,s} = \frac{\hat{y}_{i,s} - e^{\alpha_s}}{\hat{y}_{i,s}} $$
$$ s_{i,s,d} = \frac{\left(e^{\beta_{s,d}} - 1\right) I_{i,d}}{\sum_{j=1}^{42} \left[\left(e^{\beta_{s,j}} - 1\right) I_{i,j}\right]} $$
$$ \mathit{spending}_{i,s,d} = AF_{i,s} \cdot s_{i,s,d} \cdot y_{i,s}^{residual} $$

We only assigned spending to a disease if its coefficient was significant at the 5% level. Furthermore, we did not allow for any negative effects on individual spending, i.e., we set negative \( \beta \) to zero (see Footnote 4).
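The attributable-fraction formulas and the two coefficient rules can be combined in a small worked sketch for one individual and one service. It assumes that non-significant and negative coefficients are set to zero both in the predicted spending and in the shares; the text does not spell out this detail. The function name, coefficients, and p-values are made up.

```python
import math

def assign_residual(alpha, betas, p_values, present, residual, sig_level=0.05):
    """Distribute residual spending across diseases via attributable fractions.

    Assumption: a coefficient counts only if the disease is present,
    the coefficient is significant at sig_level, and it is positive.
    """
    eff = {
        d: (b if d in present and p_values[d] < sig_level and b > 0 else 0.0)
        for d, b in betas.items()
    }
    y_hat = math.exp(alpha + sum(eff.values()))       # predicted spending
    af = (y_hat - math.exp(alpha)) / y_hat            # attributable fraction
    denom = sum(math.exp(b) - 1 for b in eff.values())
    if denom == 0:
        return {}                                     # nothing attributable
    return {
        d: af * (math.exp(b) - 1) / denom * residual
        for d, b in eff.items() if b > 0
    }

# Toy coefficients (made up): baseline 100 CHF, A doubles and B triples spending.
betas = {"A": math.log(2), "B": math.log(3), "C": -0.5, "D": 0.4}
p_values = {"A": 0.01, "B": 0.01, "C": 0.01, "D": 0.20}
out = assign_residual(math.log(100), betas, p_values,
                      present={"A", "B", "C", "D"}, residual=600.0)
```

In this example the predicted spending is 600 and the baseline is 100, so five sixths of the residual (500 CHF) is attributed to diseases A and B in a 1:2 ratio, and the remaining 100 CHF stays unassigned. This mirrors how a share of residual spending remains unallocated after the regression step.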

The share of observations with zero spending was very high for home care. We therefore did not apply the regression-based approach for home care, but split spending at the individual level equally across all the identified diseases. We applied a similar logic to the spending for occupational therapy and split it across all identified neurological and musculoskeletal disorders.
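The equal split used for home care and occupational therapy is simple enough to state as a short sketch. The function name, the disease labels, and the amounts are hypothetical.

```python
def split_equally(amount, diseases, eligible=None):
    """Split `amount` equally across the person's identified diseases.

    If `eligible` is given, only diseases in that set receive a share
    (e.g., neurological and musculoskeletal disorders for occupational therapy).
    """
    targets = [d for d in diseases if eligible is None or d in eligible]
    if not targets:
        return {}
    share = amount / len(targets)
    return {d: share for d in targets}

person_diseases = ["diabetes", "multiple sclerosis", "low back pain"]
home_care = split_equally(900.0, person_diseases)
occupational = split_equally(
    200.0, person_diseases, eligible={"multiple sclerosis", "low back pain"}
)
```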

Validation and robustness checks

The validation of our disease identification approach is challenging, due to the lack of diagnostic coding in outpatient care. Nonetheless, we compared the disease prevalence rates in our sample with the existing literature. Furthermore, we performed an internal validation by running the disease identification and spending assignment procedure on ten random subsamples each consisting of 70% of the full sample.
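The internal validation can be sketched as repeated subsampling, assuming the full procedure is wrapped in a function that returns one spending share per run. The helper names and the toy population are hypothetical, and the fixed seed is only there to make the sketch reproducible.

```python
import random

def subsample_range(population, estimator, n_draws=10, fraction=0.7, seed=0):
    """Re-run `estimator` on random subsamples drawn without replacement
    and return the (min, max) of the resulting estimates."""
    rng = random.Random(seed)
    k = int(len(population) * fraction)
    estimates = [estimator(rng.sample(population, k)) for _ in range(n_draws)]
    return min(estimates), max(estimates)

def msk_share(people):
    """Share of total spending assigned to musculoskeletal disorders (toy estimator)."""
    total = sum(amount for _, amount in people)
    msk = sum(amount for disease, amount in people if disease == "msk")
    return msk / total

# Toy population (made up): 100 people, half of them with a musculoskeletal label.
population = [("msk", 10.0), ("other", 10.0)] * 50
low, high = subsample_range(population, msk_share)
```

A narrow interval between the lowest and highest estimate indicates that the spending shares do not depend strongly on which individuals happen to be in the sample.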

The spending by disease depends substantially on the number of patients with the disease. We thus checked the robustness of the spending decomposition results by running a set of scenarios with different thresholds for disease identification based on two clues: drug consumption (i.e., number of packages per year) and treatment by a physician with a certain specialization (i.e., the total spending for a physician with a certain specialization). We defined nine scenarios by combining three thresholds for each clue: at least 1, 2, or 3 drug packages with ATC codes used for the disease, and at least 1, 300, or 1000 CHF spending for a service provider (e.g., oncologist for the identification of neoplasms) over the course of the year. The scenario with the two lowest thresholds (min. 1 package, min. 1 CHF with service provider) was the one used for most diseases in the disease identification and spending assignment procedure described above. We defined higher thresholds only for some diseases (e.g., chronic respiratory diseases); for additional information on the thresholds used, see Additional file 1.
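The scenario grid is the Cartesian product of the two threshold sets, as the following minimal sketch shows. The `identified` helper is hypothetical and assumes that either clue alone is sufficient to flag a disease, which need not hold for every disease in the actual algorithms.

```python
from itertools import product

DRUG_THRESHOLDS = (1, 2, 3)            # min. drug packages per year
SPENDING_THRESHOLDS = (1, 300, 1000)   # min. CHF per year with the specialist

# Nine scenarios: every combination of the two thresholds.
scenarios = list(product(DRUG_THRESHOLDS, SPENDING_THRESHOLDS))

def identified(n_packages, specialist_chf, scenario):
    """Assumption: a disease is flagged if either clue reaches its threshold."""
    min_packages, min_chf = scenario
    return n_packages >= min_packages or specialist_chf >= min_chf

base_scenario = scenarios[0]  # (1, 1): the thresholds used for most diseases
```

A patient with one drug package and 250 CHF of specialist spending is flagged in the base scenario but not under the thresholds (2, 300), which illustrates why stricter scenarios lower the number of identified patients.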


Treated prevalence of identified diseases

We identified 14 major disease groups on GBD level 2 and 42 more specific diseases at GBD level 3, with the addition of well care at both levels of the classification (see 1st and 2nd column in Table 2). The treated prevalence rates were calculated using age group and sex specific weights of the Swiss population (see 3rd column in Table 2).

Table 2 Estimated treated prevalence rates of identified diseases and comparison with other estimates

Disease clues overlapped substantially in disease identification, as individuals identified as prevalent cases of a disease based on one clue were often also identified by other clues. Diseases at level 2 were identified based on one (80.1% of prevalent patients), two (17.5%), three (2.3%), or all four (0.1%) types of clues. For level 3, the corresponding shares were 87.9, 10.7, 1.3, and 0.006%. Diseases were sometimes identified by several clues of the same type (e.g., different drugs). The degree of the overlap between types of clues is illustrated by the Venn diagrams in Fig. 1 for selected diseases. The figure shows that both type and number of disease clues vary across diseases. Table 2 in the Additional file 2 in the Appendix shows the share of patients by the number of clues for each disease.

Fig. 1

Venn diagrams of clues used in the identification of diseases. Numbers refer to the number of patients identified with the clues (in parentheses: share within each disease)

Spending by disease

We were able to assign 80.7% of outpatient spending to diseases. Of this assigned spending, 53.5% was directly assigned (first step) and 46.5% indirectly assigned (regression-based second step). The regression coefficients used in the computation of the AF are shown in Additional File 3. The remaining 19.3% of outpatient spending could not be allocated to any disease, for two reasons: first, spending of people who were not flagged with any disease (3.7%) and, second, the share of residual spending not assigned in the regression procedure (15.6%). Figure 2 shows the shares of assigned outpatient spending by level 2 diseases. The largest share was devoted to musculoskeletal disorders (19.2%), followed by mental disorders (12.0%), sense organ diseases (8.7%) and cardiovascular diseases (8.6%). In the figure, the brighter the area, the more was assigned indirectly in the second step.

Fig. 2

Outpatient spending by disease at GBD level 2 (% of total spending assigned)

Figure 3 shows the results by level 3 diseases. Depression had the largest share within mental diseases, with 4.3% of total outpatient spending. The most expensive type of cancer was trachea, bronchus, and lung cancer (1.6%). Other costly single diseases were osteoporosis (2.7%), diabetes (2.4%), and rheumatoid arthritis (2.2%).

Fig. 3

Outpatient spending by disease at GBD level 3 (% of total spending assigned). Note: Labelling of selected diseases with low spending shares was omitted for reasons of readability

Differences in spending by age, sex and service

Spending by disease and sex

Spending shares by disease were similar for both men and women, with few exceptions (Fig. 4). Well care (0.7% for men/9.3% for women) was more relevant in women, mainly due to spending for healthy pregnancy. Men showed higher spending for cardiovascular (10.8%/7.0%), chronic respiratory (2.4%/1.6%), and communicable diseases (10.8%/6.8%) as well as diabetes and kidney diseases (5.7%/2.8%). On the other hand, women showed higher relative spending for musculoskeletal (17.3%/20.5%) and nutritional diseases (0.9%/2.1%). The results for level 3 diseases are shown in Fig. 1 in the Additional file 2.

Fig. 4

Outpatient spending by disease groups for men and women separately (% of total assigned spending). Note: Spending shares refer to the spending that could be assigned. NCDs: non-communicable diseases

Spending by disease and age

Figure 5 shows spending on major disease groups by five-year age groups. Younger age groups had a substantially higher spending share for mental diseases. Cardiovascular diseases had a higher spending share in older age groups. The same was true for neoplasms, which show, however, a decreasing spending share in patients above age 80. The share of spending that was not attributed to any disease was much higher in young individuals. Absolute spending by disease and age groups is shown in Fig. 2 in the Additional file 2.

Fig. 5

Outpatient spending by major disease groups (in % of total by age group)

The spending share attributable to age groups varied by disease (Fig. 6). The population above age 65 consumed more than half of the spending assigned to sense organ and cardiovascular diseases. In other disease groups like musculoskeletal, mental, and neurological disorders, the predominant group was individuals aged 45–64 years.

Fig. 6

Outpatient spending of broad age groups by disease group (% of assigned spending)

Spending by disease and health care service

The spending by disease differed substantially across services. As illustrated in Fig. 7, the spending share of neoplasms was higher in hospital outpatient care (13.4%) and drugs (14.4%). Musculoskeletal disorders showed the highest spending shares in many services, such as GPs (23.7%), drugs (15.3%), home care (16.4%), and radiology (34.9%). The numbers in the coloured area of the figure refer to the assigned spending and add up to 100%. The part that could not be allocated in the two assignment steps is shown below the coloured area, above the health service labels. Non-assigned spending was highest for GPs (34.9%) and equal to zero for services that we assigned completely to diseases (e.g., psychiatry). The share was also low for specialists (12.8%) and drugs (11.4%).

Fig. 7

Outpatient spending by disease groups at GBD level 2 and health service (% of spending assigned to health services)

The relative importance of services in the estimated spending varied by disease (Fig. 8). Drugs were the most important spending component for many diseases (e.g., neoplasms and chronic respiratory diseases). The high shares of other outpatient care in mental disorders and in diabetes and kidney disease were mainly due to psychotherapy (mental) and dialysis (kidney disease). The results for spending by services and diseases at level 3 are shown in Figs. 3 and 4 in the Additional file 2.

Fig. 8

Spending shares of outpatient services by disease group at GBD level 2 (% of spending assigned to diseases). GP: General practitioner. Note: the category “other outpatient” comprises all the outpatient services not explicitly shown in the graph (physiotherapy, occupational therapy, psychiatry, radiology, laboratory tests, dental care, other outpatient)

Average annual disease-specific spending per patient

Average disease-specific spending per patient was calculated by dividing total spending by disease by the number of patients with that disease. This disease-specific spending is an average over all prevalent patients treated for that disease in 2017. It thus represents an average over incident and prevalent cases of different disease stages. Table 3 shows this spending per patient separately for women and men (columns 3 and 4). Average annual spending per patient was highest for hepatitis, followed by trachea, bronchus and lung cancer, and chronic kidney disease. Columns 5 and 6 in the table show the overall share of women and men in disease-specific spending. The last column shows the share of total outpatient spending for that disease.

Table 3 Outpatient spending by disease at GBD level 3 (share by sex, per patient by sex, overall share)

Validation and robustness of results

In Table 2, we provide values from the GBD study for Switzerland (4th column) [26] and from other previous epidemiological literature for Switzerland (5th column). The best available data source to compare our prevalence estimates with is registry data, which is available only for neoplasms in the national cancer registry NICER [36].

The internal validation of the spending estimates by disease showed that the spending shares were very similar across subsamples. Running the disease identification and the subsequent spending assignment on ten randomly drawn subsets showed that, for diseases at level 2, the spending shares deviated from the initial estimates by at most ±0.3 percentage points (±0.1 percentage points on average).

The nine scenarios that we ran to check for the robustness of our results led to different spending shares by disease group (see Fig. 9). For most diseases, the shares were similar across the scenarios. The largest variation was observed for communicable diseases (min: 5.3, max: 9.6), while the shares for mental disorders varied only slightly (min: 12.0, max: 13.7). The three scenarios with the highest specialist spending threshold (in pink) decreased the share of musculoskeletal disorders quite substantially. The minimum, maximum, and mean values for all level 2 and level 3 diseases are shown in Tables 4 and 5 in Additional file 2.

Fig. 9

Spending shares in scenario analysis for GBD level 2 diseases. Note: The scenarios include different combinations of minimum requirements of drug utilization (number of packages) and spending at a physician with a certain specialization used in the identification of diseases. Spending shares refer to the spending that could be assigned. The numbers shown for each disease refer to spending shares in the base estimation. NCDs: non-communicable diseases


This study estimated disease-specific outpatient spending by age, sex, and health care services and drugs in Switzerland in 2017. Diseases were classified based on a simplified exhaustive and mutually exclusive GBD classification, with two hierarchical levels including 15 major disease groups at GBD level 2, and 42 more specific diseases at GBD level 3. Health care services and drugs were classified based on a simplified National Health Accounts classification. Diseases were identified based on a combination of diagnostic clues in claims data from a major MHI provider. Spending was assigned to diseases both directly and indirectly (regression-based).

Interpretation of results

Our results show the number of individuals treated in outpatient settings for specific diseases in 2017, as well as the disease-specific outpatient spending (overall and per patient) in that year. Our estimated treated prevalence rates are mostly lower than the overall prevalence rates estimated in the GBD study for Switzerland and in other studies (see Table 2). This difference may be explained by two factors: First, not all individuals affected by a disease in a year (prevalent cases) are treated for the disease in that year. Possible reasons include lack of diagnosis, lack of access to care, lack of treatment options, and lack of current need of treatment, as well as the patient’s choice not to be treated. Second, even if the individual was treated for a disease, we could not identify the disease if we had no suitable diagnostic clue in the claims data. The first factor does not affect our spending by disease estimations, as health care spending can only occur when patients are treated. The second factor will, however, lead to an underestimation of disease-specific spending.

In general, our disease identification algorithms performed reasonably well in diseases with specific treatments and compelling need of treatment, such as diabetes, ischemic heart disease or lung cancer. The identification was more challenging for diseases with mainly unspecific treatments, such as low-back pain and osteoarthritis. However, these diseases could in part be identified at the GBD level 2 of the major disease categories, such as musculoskeletal disorders and mental and substance use disorders.

Reasons for differences in prevalence rates

The differences between our estimates and the GBD prevalence rates may be explained by three factors. First, we identified treated prevalence, while GBD reports overall prevalence. Second, the GBD prevalence estimates are not always derived from epidemiological data from Switzerland, but sometimes from other countries. Third, some diseases were difficult to identify with our data because their treatments are usually not covered by MHI (e.g., oral disorders).

Spending by disease

We were able to assign 80.7% of outpatient spending to diseases. We found that almost half of outpatient spending was on just four disease groups: musculoskeletal disorders (19.2% of total assigned spending), mental and substance use disorders (12.0%), sense organ diseases (8.7%), and cardiovascular diseases (8.6%). These often chronic diseases are highly prevalent, in many cases well treatable, and usually do not lead to death (with the exception of cardiovascular diseases). Neoplasms accounted for 7.3% of outpatient spending, although they had the largest health burden in terms of years of life lost in 2017 (46.6% of total in women and 28.5% in men) [47]. This may be explained by disease episodes that are often short and acute rather than chronic, leading to lower overall spending even if spending per patient may be very high. Previous literature has found similar results for neoplasms [11, 25].
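The "almost half" statement and the non-assigned remainder can be checked with simple arithmetic. The short Python sketch below only re-adds the percentages quoted in the text; the variable names are illustrative and not taken from the study.

```python
# Spending shares of the four largest disease groups, as quoted in the text (percent).
shares = {
    "musculoskeletal disorders": 19.2,
    "mental and substance use disorders": 12.0,
    "sense organ diseases": 8.7,
    "cardiovascular diseases": 8.6,
}

# Combined share of the four groups ("almost half").
top_four = round(sum(shares.values()), 1)

# Share of outpatient spending assigned to any disease, and the remainder.
assigned = 80.7
unassigned = round(100.0 - assigned, 1)

print(f"Top four disease groups: {top_four}%")
print(f"Not assigned to any disease: {unassigned}%")
```

The four groups together account for 48.5%, and the remainder of 19.3% matches the non-assigned share discussed in the limitations.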

The share of direct spending assignment differed substantially across diseases. It was high when we had either specific drugs with a large share in disease-specific spending (e.g., for communicable diseases, multiple sclerosis, or ADHD) or when we were able to directly identify all reimbursed services (e.g., healthy pregnancy or prevention services for well care). The share of non-assigned spending was quite low for drugs (11.4%), as spending for many expensive drugs could be assigned directly to specific diseases.

Spending by sex, age, and type of health care service

We observed substantial differences in disease-specific spending by service type. The relative importance of single diseases for each type of health service is strongly influenced by the available outpatient treatment options. For example, neoplasms and musculoskeletal disorders accounted for the highest shares of spending on drugs, while communicable diseases accounted for the highest share of spending on laboratory tests.

There were substantial differences in spending by disease across age groups and sex. Spending on mental and substance use disorders was considerably higher in the younger age groups, while spending on cardiovascular and sense organ diseases was considerably higher in the older age groups. A higher share of spending could not be assigned to any disease in the younger age groups. This may indicate that younger individuals are relatively more affected by less frequent diseases. Women had a substantially higher share in spending on many highly prevalent diseases (e.g., dementia, depression, sense organ diseases, and musculoskeletal disorders), while men had a higher spending share in some communicable diseases (HIV/AIDS, hepatitis), cardiovascular diseases (e.g., ischemic heart disease), mental and substance use disorders (e.g., schizophrenia, ADHD, alcohol and drug use disorders), and diabetes and chronic kidney disease.

Methodological challenges

The biggest challenge in spending decompositions is the availability of data containing both diagnostic and spending information. Most comparable studies used encounter-level data and assigned spending to the listed diagnoses [25, 48, 49]. We did not have access to diagnostic and spending data at the level of single encounters. However, our two-step approach involving direct and regression-based assignment has two important advantages. First, we implicitly allowed for comorbidities for all spending types, while attributing everything to the primary diagnosis may overestimate spending for frequent main diagnoses. Second, we estimated coefficients for each service separately and thus allowed for different effects of diseases on the individual spending for each service. Due to very scarce diagnostic data in the outpatient sector in Switzerland, the focus was on decomposing outpatient health care spending. Our two-step spending assignment would be easily applicable to other services where no diagnostic information is collected, such as inpatient long-term care.
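How a regression-based step can handle comorbidities may be easier to see in a small sketch. The Python code below is not the estimation code of this study. It assumes a log-link model for one service type with invented coefficients, and it uses a simple proportional rule to split the predicted spending above the baseline across a person's diseases. The function name, the coefficient values, and the decision to ignore non-positive coefficients are all assumptions made for illustration.

```python
import math


def allocate_spending(intercept, coefs, flags):
    """Split one person's predicted spending for one service type.

    intercept, coefs: hypothetical estimates of a log-link model,
        E[y | d] = exp(intercept + sum_k coefs[k] * d_k).
    flags: set of disease labels identified for this person.
    Returns (amount per disease, unassigned baseline amount).
    """
    # Assumption: only diseases with a positive coefficient attract spending.
    active = {k: coefs[k] for k in flags if coefs.get(k, 0.0) > 0.0}

    # Spending predicted for a person without any disease flag (the constant).
    baseline = math.exp(intercept)

    # Spending predicted for this person given all identified diseases.
    predicted = math.exp(intercept + sum(active.values()))

    # Split the excess over the baseline in proportion to the coefficients.
    excess = predicted - baseline
    total_coef = sum(active.values())
    amounts = {k: excess * b / total_coef for k, b in active.items()}
    return amounts, baseline


# Hypothetical coefficients for one service type.
coefs = {"diabetes": 0.9, "hypertension": 0.4, "asthma": 0.6}
amounts, baseline = allocate_spending(math.log(200.0), coefs, {"diabetes", "hypertension"})
for disease, amount in sorted(amounts.items()):
    print(f"{disease}: {amount:.2f}")
print(f"unassigned baseline: {baseline:.2f}")
```

In this sketch a person with two diseases contributes to both disease totals, and the baseline amount stays unassigned. Fitting a separate set of coefficients per service type would correspond to calling the function once per service with service-specific inputs.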

Comparison with previous studies

Despite methodological differences and the different health care systems, we found spending shares similar to those in the US study, which applied a similar disease classification based on GBD [25]. The following comparison refers to shares of total health care spending by US public insurance (mainly Medicare and Medicaid; see footnote 5). Our estimates were similar for mental and substance use disorders (12.0% vs. 12.9% for the US), musculoskeletal disorders (19.2% vs. 17.4%), neurological diseases (6.8% vs. 5.5%), and cardiovascular diseases (8.6% vs. 7.7%). The US estimate for diabetes (16.8%) was not directly comparable to our estimate of 2.4% as it also included urogenital, blood and other endocrine diseases. However, we would expect a higher number for the US due to the much higher prevalence of typical risk factors for diabetes, such as overweight [50]. We found higher spending shares for communicable, maternal, neonatal, and nutritional diseases (10.8% vs. 5.4%) and neoplasms (7.3% vs. 4.9%). The US study assigned 6.8% of non-injury outpatient care spending to the treatment of risk factors, which comprises conditions that we included directly in the disease groups (e.g., hypertension in cardiovascular diseases).

The only comparable study for Switzerland decomposed spending in 2011 by 21 diseases mostly consistent with the GBD level 2 conditions [11]. However, that study relied on less information to identify diseases. Our estimates were similar for musculoskeletal disorders (19.2% vs. 16.3% in [11]), mental and substance use disorders (12.0% vs. 9.5%), diabetes (2.4% vs. 2.6%), and skin and subcutaneous diseases (3.7% vs. 3.1%). We found higher shares for sense organ diseases (8.7% vs. 2.9%) and neurological disorders including dementia (6.8% vs. 2.6%) and a lower share for cardiovascular diseases (8.6% vs. 18.9%).


This study has some limitations. First, we were not able to validate our disease identification algorithms at the individual level using an external data source. Nonetheless, we compared the computed prevalence rates with the existing literature and the GBD study and performed some internal validation by running the algorithm on random subsamples.

Second, we did not find any diagnostic clues for some important diseases. The reasons were a lack of specificity of diagnostic clues (e.g., for osteoarthritis, as anti-inflammatory medication is usually unspecific [51]) and a lack of sensitivity (e.g., for low back pain, as claims data hold no specific diagnostic clues for the treatment of this disease). In these cases, we underestimated the true treated disease prevalence, and consequently also the spending assigned to these diseases. This limitation must be considered when comparing our results with the existing literature as in Table 2. However, this limitation applies only to some GBD level 3 diseases and less to the broader GBD level 2 major disease groups.

Third, a substantial part of spending (19.3%) could not be assigned to any disease. This was due to insured individuals who could not be labelled with any condition and to residual spending that could not be explained by the regression model (i.e., the constant). This effect is well known in person-based spending allocations [34]. Our algorithms were especially sensitive to acute and severe conditions. Spending for unspecific routine visits and drug prescriptions remained unexplained, especially in younger individuals.

Fourth, spending shares were highest for the residual ‘other’ categories, which included insured individuals who did not show any clues at the more disease-specific level 3.

Finally, we weighted our results for both the prevalence and the spending estimation by age- and sex-specific weights according to the Swiss population. The results are, however, still not necessarily generalizable, as the client structure of a single insurer may not be representative of the national population with respect to morbidity and geographical distribution. Nevertheless, another source of information suggests that the SWICA population is quite representative of the overall Swiss population: the Swiss risk equalization scheme aims to compensate for per capita spending differences due to the risk profiles of the populations enrolled with different MHI providers. Payments in the risk equalization scheme in 2017 suggest that the SWICA population was, on average, very close to the general population [22]. Moreover, the SWICA sample showed an age/sex structure similar to that of the Swiss population and only slightly lower per capita MHI spending than the Swiss average.
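Weighting by age and sex can be illustrated with a minimal post-stratification sketch in Python. The strata, counts, and spending values below are invented, and the exact weighting scheme of the study may differ; the sketch only shows the general idea that each stratum is weighted by its population share relative to its sample share.

```python
def poststratification_weights(sample_counts, population_counts):
    """Return one weight per stratum: population share divided by sample share."""
    n_sample = sum(sample_counts.values())
    n_population = sum(population_counts.values())
    return {
        stratum: (population_counts[stratum] / n_population)
        / (sample_counts[stratum] / n_sample)
        for stratum in sample_counts
    }


def weighted_mean(values, strata, weights):
    """Weighted mean of individual values, using the weight of each person's stratum."""
    numerator = sum(weights[s] * v for v, s in zip(values, strata))
    denominator = sum(weights[s] for s in strata)
    return numerator / denominator


# Hypothetical strata defined by sex and age band.
young = ("F", "18-64")
old = ("F", "65+")

# The toy sample over-represents the younger stratum relative to the population.
sample_counts = {young: 3, old: 1}
population_counts = {young: 50, old: 50}
weights = poststratification_weights(sample_counts, population_counts)

# Hypothetical annual outpatient spending per person.
spending = [100.0, 100.0, 100.0, 500.0]
strata = [young, young, young, old]

unweighted = sum(spending) / len(spending)
weighted = weighted_mean(spending, strata, weights)
print(f"unweighted mean: {unweighted:.1f}, weighted mean: {weighted:.1f}")
```

Such weights correct only for the age and sex structure. Differences in morbidity or region within a stratum remain, which is why representativeness is still discussed as a limitation.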

Future research

Future research should refine the disease identification algorithm and, most importantly, validate it. One promising possibility to overcome the lack of diagnostic information is to link insurance claims with ICD-10 diagnoses from the Swiss inpatient registry and to check the correspondence of the diagnoses in the two sources. Furthermore, our results for the outpatient sector should be complemented with those for other health care services.
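The correspondence check between the two sources could, for example, be summarized with two simple agreement measures per disease. The Python sketch below assumes that the linkage has already produced two sets of person identifiers for one disease; the identifiers and the function name are invented for illustration. Because an inpatient registry covers only hospitalized individuals, it would be a partial reference standard at best.

```python
def agreement(claims_flagged, registry_diagnosed):
    """Compare clue-based flags with registry diagnoses for one disease.

    claims_flagged: set of person IDs flagged by the claims-based algorithm.
    registry_diagnosed: set of person IDs with a matching registry diagnosis.
    Returns (sensitivity, positive predictive value).
    """
    both = claims_flagged & registry_diagnosed

    # Share of registry-diagnosed persons that the algorithm also flagged.
    sensitivity = (
        len(both) / len(registry_diagnosed) if registry_diagnosed else float("nan")
    )

    # Share of flagged persons that are confirmed by a registry diagnosis.
    ppv = len(both) / len(claims_flagged) if claims_flagged else float("nan")
    return sensitivity, ppv


# Hypothetical linked person IDs for a single disease.
flagged_in_claims = {1, 2, 3, 4}
diagnosed_in_registry = {2, 3, 4, 5, 6}

sensitivity, ppv = agreement(flagged_in_claims, diagnosed_in_registry)
print(f"sensitivity: {sensitivity:.2f}, PPV: {ppv:.2f}")
```

A low sensitivity would point to missing diagnostic clues, while a low positive predictive value would point to clues that are not specific enough.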


At present, little is known about how much single diseases contribute to outpatient spending in Switzerland. One reason is the lack of data holding information on both spending and diseases at the individual level. Our study shows the high potential of health insurance claims data in identifying diseases when no diagnostic coding is available. Our approach may thus also be promising for epidemiological research on treated prevalence. It may also be applied in other countries with social health insurance provided by private health insurers.

Decomposing spending by age, sex and diseases over time can inform on the drivers of health care spending. This information contributes to a better understanding of the effects of epidemiological and demographic trends on health care spending. It may be particularly important from a health policy perspective, as it can guide the definition of global spending budgets currently discussed in Switzerland and elsewhere, as well as health care provision planning.

Availability of data and materials

Original data are confidential and not available.


  1. USD 11,103 in 2018 with 1.18 USD/CHF exchange rate in 2018 (OECD).
  2. Three service types (physiotherapy, psychiatry, dental care) were fully attributed to a disease group and no second step assignment was needed. For two services (home care and occupational therapy), we did not use the regression-based approach to assign spending in the second step.
  3. We tested three other models: Ordinary least-squares (OLS) on log(y + 1), a generalized linear model (GLM) with gamma distribution and log link, and a zero-inflated negative binomial (ZINB). The PPML outperformed the other estimators in standard goodness-of-fit measures: mean absolute error (MAE), root mean squared error (RMSE) and adjusted R squared (see table 3 in the Additional file 2).
  4. There were a total of 85 negative and statistically significant coefficients in the 56 models with 42 variables each (2352 coefficient estimates in total), which corresponds to a share of less than 4%.
  5. We obtained these numbers from the online results tool:



ADHD: Attention Deficit Hyperactivity Disorder
ATC: Anatomical-therapeutic chemical
COPD: Chronic obstructive pulmonary disease
DALY: Disability-adjusted life years
DRG: Diagnosis-related group
GBD: Global Burden of Disease
GP: General Practitioner
ICD: International Classification of Diseases
MAE: Mean absolute error
MHI: Mandatory health insurance
MiGeL: Mittel- und Gegenständeliste
MRI: Magnetic resonance imaging
NCD: Non-communicable diseases
NHA: National Health Accounts
PPML: Poisson pseudo-maximum likelihood
RMSE: Root mean squared error
TARMED: Tarif médical


  1. Dieleman JL, Squires E, Bui AL, Campbell M, Chapin A, Hamavid H, et al. Factors associated with increases in US health care spending, 1996-2013. JAMA. 2017;318(17):1668–78.
  2. Dunn A, Liebman E, Shapiro AH. Decomposing medical care expenditure growth. In: Aizcorbe A, Baker C, Berndt E, Cutler D, editors. Measuring and Modeling Health Care Costs. University of Chicago Press; 2016. p. 81–111.
  3. Dunn A, Rittmueller L, Whitmire B. Health care spending slowdown from 2000 to 2010 was driven by lower growth in cost per case, according to a new data source. Health Aff. 2016;35(1):132–40.
  4. Dunn A, Whitmire B, Batch A, Fernando L, Rittmueller L. High spending growth rates for key diseases in 2000–14 were driven by technology and demographic factors. Health Aff. 2018;37(6):915–24.
  5. Roehrig C. Mental disorders top the list of the most costly conditions in the United States: $201 billion. Health Aff. 2016;35(6):1130–5.
  6. Roehrig CS, Rousseau DM. The growth in cost per case explains far more of US health spending increases than rising disease prevalence. Health Aff. 2011;30(9):1657–63.
  7. Starr M, Dominiak L, Aizcorbe A. Decomposing growth in spending finds annual cost of treatment contributed most to spending growth, 1980–2006. Health Aff. 2014;33(5):823–31.
  8. Thorpe KE. Treated disease prevalence and spending per treated case drove most of the growth in health care spending in 1987–2009. Health Aff. 2013;32(5):851–8.
  9. Zhai T, Goss J, Li J. Main drivers of health expenditure growth in China: a decomposition analysis. BMC Health Serv Res. 2017;17(1):1–9.
  10. Stucki M. Factors related to the change in Swiss inpatient costs by disease: a 6-factor decomposition. Eur J Health Econ. 2021;22(2):195–221.
  11. Wieser S, Riguzzi M, Pletscher M, Huber CA, Telser H, Schwenkglenks M. How much does the treatment of each major disease cost? A decomposition of Swiss National Health Accounts. Eur J Health Econ. 2018;19(8):1149–61.
  12. Federal Statistical Office. National Health Accounts 2019 (Kosten und Finanzierung des Gesundheitswesens). 2021.
  13. Blozik E, Rapold R, Eichler K, Reich O. Epidemiology and costs of multiple sclerosis in Switzerland: an analysis of health-care claims data, 2011–2015. Neuropsychiatr Dis Treat. 2017;13:2737–45.
  14. Schmidt M, Brunner B, Wieser S, Rapold R, Blozik E. Late Breaking Abstract-Assessing epidemiology and costs of asthma and COPD in Switzerland with health insurance data. European Respiratory Journal. 2018;52(suppl 62, PA3150).
  15. Huber CA, Schwenkglenks M, Rapold R, Reich O. Epidemiology and costs of diabetes mellitus in Switzerland: an analysis of health care claims data, 2006 and 2011. BMC Endocr Disord. 2014;14(1):1–9.
  16. Huber CA, Szucs TD, Rapold R, Reich O. Identifying patients with chronic conditions using pharmacy data in Switzerland: an updated mapping approach to the classification of medications. BMC Public Health. 2013;13(1):1–10.
  17. Rosen AB, Cutler DM. Challenges in building disease-based national health accounts. Med Care. 2009;47(7 Suppl 1):S7–S13.
  18. De Pietro C, Camenzind P, Sturny I, Crivelli L, Edwards-Garavoglia S, Spranger A, et al. Switzerland: health system review. Health Syst Transit. 2015;17(4):1–288.
  19. Health Statistics OECD. Organization for Economic Cooperation and Development. 2020.
  20. Brunner B, Wieser S, Maurer M, Stucki M, Nemitz J, Schmidt M, et al. Schlussbericht "Effizienzpotenzial bei den KVG-pflichtigen Leistungen": eine Studie im Auftrag des Bundesamtes für Gesundheit. 2019.
  21. Reich O, Signorell A, Busato A. Place of death and health care utilization for people in the last 6 months of life in Switzerland: a retrospective analysis using administrative data. BMC Health Serv Res. 2013;13(1):1–10.
  22. Federal Office of Public Health. Statistics on the Compulsory Health Insurance 2017 (Statistik der obligatorischen Krankenversicherung 2017). 2019.
  23. Federal Statistical Office. Population and Households Statistics (STATPOP). 2021.
  24. GBD 2017 Causes of Death Collaborators. Global, regional, and national age-sex-specific mortality for 282 causes of death in 195 countries and territories, 1980–2017: a systematic analysis for the Global Burden of Disease Study 2017. Lancet. 2018;392(10159):1736–88.
  25. Dieleman JL, Cao J, Chapin A, Chen C, Li Z, Liu A, et al. US health care spending by payer and health condition, 1996-2016. JAMA. 2020;323(9):863–84.
  26. Global Burden of Disease Collaborative Network. Global Burden of Disease Study 2019 (GBD 2019) Results. Seattle, United States: Institute for Health Metrics and Evaluation (IHME); 2020.
  27. Cancer incidence 1988-2017. NICER (Nationales Institut für Krebsepidemiologie und -registrierung) 2020.
  28. Federal Statistical Office. Inpatient Registry 2017 (Medizinische Statistik der Krankenhäuser 2017). 2020.
  29. Dong YH, Chang CH, Shau WY, Kuo RN, Lai MS, Chan KA. Development and validation of a pharmacy-based comorbidity measure in a population-based automated health care database. Pharmacotherapy: The Journal of Human Pharmacology and Drug Therapy. 2013;33(2):126–36.
  30. Rizzo JA, Chen J, Gunnarsson CL, Naim A, Lofland JH. Adjusting for comorbidities in cost of illness studies. J Med Econ. 2015;18(1):12–28.
  31. Trogdon JG, Finkelstein EA, Hoerger TJ. Use of econometric models to estimate expenditure shares. Health Serv Res. 2008;43(4):1442–52.
  32. Hall AE, Highfill T. Calculating disease-based medical care expenditure indexes for Medicare beneficiaries: A Comparison of method and data choices. In: Aizcorbe A, Baker C, Berndt E, Cutler D, editors. Measuring and Modeling Health Care Costs. University of Chicago Press; 2016. p. 113–141.
  33. Renfro S, Lindner S, McConnell KJ. Decomposing Medicaid spending during Health system reform and ACA expansion. Med Care. 2018;56(7):589–95.
  34. Ghosh K, Bondarenko I, Messer KL, Stewart ST, Raghunathan T, Rosen AB, et al. Attributing medical spending to conditions: a comparison of methods. PLoS One. 2020;15(8):e0237082.
  35. Silva JS, Tenreyro S. The log of gravity. Rev Econ Stat. 2006;88(4):641–58.
  36. Cancer prevalence in Switzerland 2010-2025. NICER (Nationales Institut für Krebsepidemiologie und -registrierung) 2020.
  37. Schmutz M, Beer-Borst S, Meiltz A, Urban P, Gaspoz J-M, Costanza MC, et al. Low prevalence of atrial fibrillation in asymptomatic adults in Geneva, Switzerland. Europace. 2010;12(4):475–81.
  38. Andlin-Sobocki P, Jönsson B, Wittchen H, Olesen J. Cost of disorders of the brain in Europe. Eur J Neurol. 2005;12(s1):1–27.
  39. Wittchen H-U, Jacobi F. Size and burden of mental disorders in Europe—a critical review and appraisal of 27 studies. Eur Neuropsychopharmacol. 2005;15(4):357–76.
  40. Pletscher M, Mattli R, von Wyl A, Reich O, Wieser S. The societal costs of schizophrenia in Switzerland. J Ment Health Policy Econ. 2015;18(2):93–103.
  41. Barger SD, Messerli-Bürgy N, Barth J. Social relationship correlates of major depressive disorder and depressive symptoms in Switzerland: nationally representative cross sectional study. BMC Public Health. 2014;14(1):1–10.
  42. Kuendig H. Estimation du nombre de personnes alcoolo-dépendantes dans la population helvétique (Rapport de recherche No 56). Lausanne: Addiction Info Suisse; 2010.
  43. Kaiser A, Vollenweider P, Waeber G, Marques-Vidal P. Prevalence, awareness and treatment of type 2 diabetes mellitus in Switzerland: the CoLaus study. Diabet Med. 2012;29(2):190–7.
  44. Forni Ogna V, Ogna A, Ponte B, Gabutti L, Binet I, Conen D, et al. Prevalence and determinants of chronic kidney disease in the Swiss population. Swiss Med Wkly. 2016;146:w14313.
  45. Wieser S, Horisberger B, Schmidhauser S, Eisenring C, Brügger U, Ruckstuhl A, et al. Cost of low back pain in Switzerland in 2005. Eur J Health Econ. 2011;12(5):455–67.
  46. Schwenkglenks M, Szucs TD. Epidémiologie de l’ostéoporose et des fractures chez les personnes âgées. Ostéoporose et chutes des personnes âgées: une approche de santé publique. Bern: Federal Office of Public Health; 2004. p. 27–34.
  47. Federal Statistical Office. Cause of Deaths Statistics (Todesursachenstatistik). 2019.
  48. Dieleman JL, Baral R, Birger M, Bui AL, Bulchis A, Chapin A, et al. US spending on personal health care and public health, 1996-2013. JAMA. 2016;316(24):2627–46.
  49. Kinge JM, Sælensminde K, Dieleman J, Vollset SE, Norheim OF. Economic losses and burden of disease by medical conditions in Norway. Health Policy. 2017;121(6):691–8.
  50. Thorpe KE, Howard DH, Galactionova K. Differences in disease prevalence as a source of the US-European Health care spending gap: Americans are diagnosed with and treated for several chronic illnesses more often than their European counterparts are. Health Aff. 2007;26(Suppl2):w678–w86.
  51. Lix L, Yogendran M, Burchill C, Metge C, McKeen N, Moore D, et al. Defining and validating chronic diseases: an administrative data approach. Winnipeg: Manitoba Centre for Health Policy; 2006.



We would like to thank the SWICA health insurance company for providing the data for this study. Furthermore, we thank three clinical experts for advising us on the identification of diseases in the insured population.


MS has a PhD scholarship sponsored by swissuniversities (Swiss Learning Health System). The study received funding from the ZHAW Zurich University of Applied Sciences. The funding bodies did not influence the design, realization or outcome of the study.

Author information




MS was primarily responsible for the study, performed the data analysis, interpreted the results, and wrote the manuscript. SW and MS conceptualized and designed the study. MT and MS prepared the data. JN and MS developed and implemented the statistical methodology. All authors commented on previous versions of the manuscript and approved the final manuscript.

Corresponding author

Correspondence to Michael Stucki.

Ethics declarations

Ethics approval and consent to participate

Ethics committee approval was not required in accordance with the Swiss law on human research because all data sources were retrospective, routinely collected, and anonymized.

Consent for publication

Not applicable.

Competing interests

The authors declare no conflicts of interest.

MT is a part-time employee of SWICA Health Insurance. SWICA Health Insurance provided anonymized claims data for this study, but did not influence its design, realization or outcome. Neither SWICA Health Insurance nor MT could have any financial interest in altering the results of this study.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1.

Algorithms to identify diseases in the claims data.

Additional file 2.

Additional results not shown in the article.

Additional file 3.

Regression results.

Rights and permissions

Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article


Cite this article

Stucki, M., Nemitz, J., Trottmann, M. et al. Decomposition of outpatient health care spending by disease - a novel approach using insurance claims data. BMC Health Serv Res 21, 1264 (2021).



  • Health care costs
  • Cost-of-illness
  • Outpatient care
  • Switzerland
  • Spending decomposition

JEL classification

  • I10