Comparison of alternative risk adjustment measures for predictive modeling: high risk patient case finding using Taiwan's National Health Insurance claims

Background Predictive modeling presents an opportunity to contain the expansion of medical expenditures by focusing on very few people. Evaluation of how risk adjustment models perform in predictive modeling in Taiwan or Asia has been rare. The aims of this study were to evaluate the performance of different risk adjustment models (the ACG risk adjustment system and prior expenditures) in predictive modeling, using Taiwan's National Health Insurance (NHI) claims data, and to compare characteristics of potentially high-expenditure subjects identified through different models.
Methods A random sample of NHI enrollees continuously enrolled in 2002 and 2003 (n = 164,562) was selected. Health status measures and total expenditures derived from 2002 NHI claims data were used to predict the probability of becoming a top user in 2003. Statistics-based indicators (C-statistics, sensitivity, and positive predictive value) and characteristics of the top groups identified by different models (expenditures and prevalence of manageable diseases) were presented.
Results Both diagnosis-based and prior expenditures models performed much better than the demographic model. Diagnosis-based models were better at identifying top users with manageable diseases; prior expenditures models were better on statistics-based indicators and at identifying people with higher average expenditures. Prior expenditures status could correctly identify more actual top users than diagnosis-based or demographic models. The proportions of actual top users that could be identified by diagnosis-based models alone were much lower than those identified by prior expenditures status.
Conclusions Predicted top users identified by different models have different characteristics, and there is little agreement between models regarding which groups would potentially be top users; therefore, which model to use should depend on the purpose of predictive modeling. Prior expenditures are a more powerful tool than diagnosis-based risk adjusters in terms of correctly identifying actual high-expenditure users. There is still much room left for improvement of diagnosis-based models in predictive modeling.


Background
Research analyzing the distribution of medical expenditures has consistently shown that a large proportion of medical resources are consumed by a small percentage of the total population [1]. The top 20% of the population with the highest expenditures accounted for about 80% of all healthcare expenditures in the United States [2][3][4]. This phenomenon of the concentration of healthcare expenditures has been observed continuously since 1970 [5]. Consequently, this group of extraordinarily high users of medical resources has inevitably become a target of several types of cost-containment strategies, such as disease management, care management, and utilization review [1,6].
Predictive modeling in health care is generally defined as 'a process of applying existing patient data to prospectively identify persons with high medical needs who are at risk for higher future medical utilization' [7]. Predictive modeling is important because early intervention can be delivered to persons identified as possibly having high medical needs. By helping these individuals manage their diseases effectively and providing coordinated medical care, their medical utilization can be reduced and the quality of care they receive can be maintained or improved [8]. In the long run, the expansion of medical expenditures may be controlled within a reasonable range [4,6]. Diagnosis-based health indicators and prior expenditures are the two most common types of risk adjusters for this purpose. Previous studies found that both types of models were very comparable in overall discrimination (by C-statistics), sensitivity, and specificity; however, high-expenditure individuals identified by diagnosis-based models had a higher disease burden and somewhat higher healthcare utilization [3,8-10]. Since high-expenditure users identified by diagnosis-based models have more 'manageable' diseases that are targets of disease management programs, diagnosis-based models are often the preferred choice.
Taiwan launched a government-run, single-payer National Health Insurance (NHI) programme in May 1995. All Taiwanese nationals are obligated by law to join this programme to ensure adequate risk pooling. Under the jurisdiction of the national government's Department of Health, the NHI is administered by the Bureau of National Health Insurance (BNHI) and six regional branches are in charge of administrating the NHI in each area. The NHI's benefit packages are comprehensive, including inpatient and outpatient services, pharmacy services, Chinese medicine and dental services. Beneficiaries have complete freedom of choice of providers and therapies, and they do not need to go through 'gatekeepers' in order to obtain medical services from specialists. The primary source of funding for the NHI is the payment of premiums shared by the insured, the employers and the government. In terms of reimbursement, the global budget payment system was adopted in order to contain the growth of medical expenditure. Within budget limits, the NHI reimburses contracted providers mostly on a fee-for-service basis, using uniform national fee schedules.
Given rising concerns about containing the growth of medical expenditures, predictive modeling presents an opportunity to achieve this goal by focusing on very few people. However, previous studies of predictive modeling have used regional datasets or focused only on sub-populations, and were conducted in Western countries. Taiwan is one of very few healthcare systems in the world that has universal coverage and a single national computerized database that includes medical diagnosis information on almost 100% of the population. For this reason, the results of this paper have potential policy and methodology implications for most other high- or middle-income nations.
Few studies related to predictive modeling have been conducted in Taiwan. It has been shown that a government-sponsored disease-management program significantly reduced medical utilization for patients with asthma [11]. In addition, patients in the program had more accurate knowledge of and better self-care skills concerning asthma, and were more likely to adhere to physicians' suggestions [11]. These achievements imply that medical expenditures incurred by this group of patients could potentially be reduced by providing disease and care management, while quality of care could be improved. Utilization review has also been implemented by the Bureau of National Health Insurance (BNHI) in Taiwan, but it was done retrospectively; under such circumstances, high-expenditure users could only be identified after a large amount of expenditures had already been incurred, and only a certain proportion of this population would remain high-expenditure users in the following years.
The goal of this study is to evaluate the performance of the Adjusted Clinical Group (ACG) risk adjustment system in predictive modeling using Taiwan's National Health Insurance claims data, and to compare characteristics of potentially high-expenditure subjects identified through different models.

Data sources
The source of the data was a longitudinal dataset prepared by Taiwan's Bureau of National Health Insurance, which is available to researchers interested in observing longitudinal changes in medical utilization. This dataset contained enrollment and claims files of a randomly chosen 1% of Taiwan's population (~200,000 individuals). The enrollment files contained individual subscription information and demographic factors, including sex, date of birth, type of beneficiary, and location. The claims files contained comprehensive records of inpatient care, ambulatory care, pharmacy store, dental care, and Chinese medicine services. The files also included date of service, ICD-9-CM (International Classification of Diseases) diagnosis codes, claimed medical expenses, and amount of co-payment for each encounter. Continuous enrollment for the full 24 months of 2002 and 2003 was required for this analysis, resulting in a final sample size of 164,562 subjects. Individuals' identifiers in this dataset have been encrypted to protect privacy and confidentiality, and this study has been approved by the Johns Hopkins School of Public Health Institutional Review Board.
Annual health expenditures for every NHI enrollee were aggregated from all inpatient, outpatient, and pharmacy store claimed expenses, including claimed reimbursement, medication expenses, and co-payments. Expenses for dental care and Chinese medicine were excluded from this aggregation. Both 2002 and 2003 expenditures were calculated. The unit of money in Taiwan is New Taiwan Dollar (NTD); the exchange rate is about 31 NTD: 1 US dollar as of May 2010. Demographic factors included sex, categorical age (0-17, 18-34, 35-49, 50-64, ≧65), type of beneficiary (insured or dependent), insurance category (based on insured's type of job), residence (three levels with different degrees of population density), and locality (six geographic regions by BNHI's administrative branches: Taipei, Northern, Central, Southern, Kao-Ping, Eastern). Diagnosis-based risk adjustment factors, including ACG, ADG (Aggregated Diagnosis Group) and EDC (Expanded Diagnosis Cluster), were derived from the ACG case-mix system (Version 7.1) using individuals' overall ICD-9-CM codes from both inpatient and outpatient records (diagnosis codes from dental and Chinese medicine services were excluded) in the year 2002.
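The aggregation and age grouping described above can be sketched as follows. This is a minimal illustration, not the study's actual processing code: the record layout (`person_id`, `year`, `service`, `claimed`, `copay`) and the service labels are hypothetical, and medication expenses are assumed to be already contained in the claimed amount.

```python
from collections import defaultdict

# Service types counted toward annual total expenditures; dental care and
# Chinese medicine are excluded, as described in the text.
# The label strings themselves are hypothetical.
INCLUDED_SERVICES = {"inpatient", "outpatient", "pharmacy"}


def age_band(age):
    """Map an age in years to the five age categories used in the study."""
    if age < 18:
        return "0-17"
    if age < 35:
        return "18-34"
    if age < 50:
        return "35-49"
    if age < 65:
        return "50-64"
    return "65+"


def annual_expenditures(claims, year):
    """Sum claimed expenses plus co-payments per person for one year (NTD)."""
    totals = defaultdict(float)
    for c in claims:
        if c["year"] == year and c["service"] in INCLUDED_SERVICES:
            totals[c["person_id"]] += c["claimed"] + c["copay"]
    return dict(totals)


# Toy records (made up for illustration only).
claims = [
    {"person_id": "A", "year": 2002, "service": "outpatient", "claimed": 900.0, "copay": 100.0},
    {"person_id": "A", "year": 2002, "service": "dental", "claimed": 500.0, "copay": 50.0},
    {"person_id": "A", "year": 2003, "service": "inpatient", "claimed": 20000.0, "copay": 2000.0},
    {"person_id": "B", "year": 2002, "service": "pharmacy", "claimed": 300.0, "copay": 0.0},
]
totals_2002 = annual_expenditures(claims, 2002)
```

In this toy input the dental claim for person A is ignored, so only the outpatient claim contributes to A's 2002 total.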

The ACG risk adjustment system
ACGs are mutually exclusive health status categories defined by morbidity pattern, age, and sex. The ACG system assigns all ICD-9-CM codes to one of 32 diagnostic clusters (ADGs) based on five clinical dimensions: duration, severity, diagnostic certainty, etiology, and specialty care involvement [12,13]. Each ADG is a grouping of diagnosis codes similar in terms of severity and likelihood of persistence of the health condition treated over a relevant period. ADGs are not mutually exclusive and individuals can have multiple ADGs (up to 32). Individuals are then placed into one of 93 discrete ACG categories according to their assigned ADGs, age, and sex; the result is that individuals within a given ACG have experienced a similar pattern of morbidity and resource consumption. Expanded Diagnosis Clusters (EDCs) are binary indicators to show whether an individual has specific diseases/symptoms. The EDC methodology assigns each ICD code to a single EDC; there are 264 EDCs in total. ICD codes within an EDC share similar clinical characteristics and are expected to induce similar types of diagnostic and therapeutic responses.

Risk adjustment models
Five risk adjustment models were evaluated in this study, ordered by the comprehensiveness of their risk adjusters. Selected EDCs were derived from the results of stepwise analyses in explaining prospective total expenditures, using a full set of EDCs and a multivariate linear regression model; 19 EDCs were thus chosen (Additional file 1).

Outcomes and measures of model performance
Being a high-expenditure user was a binary variable defined using the following three thresholds: top 0.5%, 1%, and 5% of users in 2003. We applied a logistic regression model, given that it is the standard approach to analyzing dichotomous outcomes. We conducted all statistical analyses using SAS™ software version 9.1. The performance of the five risk adjustment models was evaluated from three aspects: statistical indicators, proportions of true cases identified by models, and characteristics of predicted cases. Statistical indicators included C-statistics, sensitivity, and positive predictive value (PPV) [1], and the thresholds for calculating statistical indicators were set at the corresponding levels of the outcomes. The c-statistic represents the area under the Receiver Operating Characteristic (ROC) curve, and hence provides an overall measure of model performance; in addition, it does not depend on any single classification cutoff. Actual 2003 top users were assigned to one of four mutually exclusive categories: in the 2002 top user group alone, in the predicted top user group identified by risk adjustment models alone, in both groups, or in neither group (Figure 1). The real contribution of risk adjustment models comes from those identified by models alone, because these subjects may not be known without applying risk adjustment models (area a in Figure 1).
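As a rough illustration of the c-statistic described above, the sketch below computes the area under the ROC curve as the share of (case, non-case) pairs in which the case receives the higher predicted score. It is not the SAS procedure used in the study; the function names `flag_top` and `c_statistic` and the toy numbers are assumptions made for illustration, and the predicted scores stand in for the output of a fitted logistic regression.

```python
def flag_top(values, fraction):
    """Return 1 for subjects in the top `fraction` by value, else 0."""
    k = max(1, round(len(values) * fraction))
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    top = set(order[:k])
    return [1 if i in top else 0 for i in range(len(values))]


def c_statistic(scores, outcomes):
    """Area under the ROC curve by pairwise comparison.

    Over all (case, non-case) pairs, counts how often the case has the
    higher predicted score; tied scores count as one half.
    """
    cases = [s for s, y in zip(scores, outcomes) if y == 1]
    controls = [s for s, y in zip(scores, outcomes) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    for sc in cases:
        for sn in controls:
            if sc > sn:
                concordant += 1.0
            elif sc == sn:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))


# Toy data: year-2 expenditures define the outcome (top 20% here, only so
# that ten subjects yield two cases); scores are hypothetical predictions.
expenditures_2003 = [100, 5000, 200, 90000, 300, 50, 700, 1200, 60, 40000]
predicted = [0.02, 0.30, 0.05, 0.90, 0.04, 0.01, 0.10, 0.40, 0.03, 0.25]
outcomes = flag_top(expenditures_2003, 0.2)
auc = c_statistic(predicted, outcomes)
```

The pairwise loop is quadratic in sample size; for a sample as large as the one in this study, a rank-based formula would be the more practical choice.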
After a group of high-expenditure users was identified by each model, it was also common to examine the characteristics of this population as an alternative method of assessing model performance [9]. A better risk adjustment model will identify a top group with higher total and drug expenditures, a higher top/bottom expenditures ratio, and a higher year-2/year-1 expenditures ratio (subjects with expenditures increasing over time are better targets for intervention). In addition, the proportion of identified high-expenditure subjects with manageable diseases (asthma, COPD, hypertension, depression, or diabetes) is another important and extensively used indicator [3,10]. Split analysis (a randomly selected 70% of study subjects were used for model development; the rest were set aside for model validation) was performed, and measures of model performance were obtained from the validation set to avoid overfitting.
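The split analysis mentioned above can be sketched as a simple random partition of subject identifiers. This is only an illustration under assumed details: the fixed seed and the function name `split_sample` are not from the study.

```python
import random


def split_sample(ids, dev_fraction=0.7, seed=0):
    """Randomly partition subject ids into development and validation sets."""
    rng = random.Random(seed)  # fixed seed (an assumption) for reproducibility
    shuffled = list(ids)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * dev_fraction)
    return shuffled[:cut], shuffled[cut:]


# Models would be fitted on `development` and evaluated on `validation`.
development, validation = split_sample(range(1000))
```

Evaluating only on the held-out 30% means the reported indicators are not inflated by the model having already seen those subjects.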

Results
Characteristics of the population (Table 1)
About half of the study subjects were male and 40% were insured. The mean age in 2002 was 35 years and 10% were elderly. About one-third lived in areas within the Taipei Branch; only 2% were from the Eastern Branch. About 64% were living in rural county areas. Only 10% had not made any outpatient visit; 7% had at least one inpatient stay. About 90% had non-zero total expenditures and a similar percentage had non-zero drug expenditures. Average annual total expenditures were about 14,700 NTDs, among which medical expenditures (10,200 NTDs) were much higher than pharmacy expenditures (4,100 NTDs).
C-statistics, sensitivity, and positive predictive value (Table 2)
A similar trend was observed across the three outcome thresholds (top 0.5%, 1%, and 5% of actual users): more comprehensive models performed better than simpler models in terms of C-statistics. The largest increase was from model 1 to model 2 (~0.06 point), followed by that from model 2 to model 3 (~0.04 point); the three most comprehensive models were separated by only 0.01 points. C-statistics in models 1 and 2 increased while those in the two prior expenditures models decreased as outcome thresholds were relaxed; those in model 3 remained similar. When the outcome was defined as the top 0.5%, 1%, and 5% of actual 2003 users, c-statistics in model 5 reached 0.913, 0.907, and 0.897, respectively; those in model 1 were 0.773, 0.797, and 0.815, correspondingly.
To calculate sensitivity and PPV, a threshold to define top users was necessary. We used two thresholds in this study for each outcome: the actual proportion of the defined outcomes and the top 5% of identified cases. Results showed that the stricter the threshold, the lower the sensitivity but the higher the PPV. Similarly, sensitivity and PPV went up as the comprehensiveness of the model increased, regardless of outcomes or thresholds. The biggest increase in sensitivity and PPV was from model 2 to model 3, while those in model 4 and 5 were close. When threshold cutoff points were fixed at the top 5%, it showed that as outcome standards relaxed, PPV increased across all models, but sensitivity increased in the demographics-only model and reduced in more comprehensive models.
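To make the threshold dependence concrete, the following sketch flags the top fraction of subjects by predicted score and computes sensitivity and PPV against the actual outcome. It is an illustration with made-up numbers; the function name `sensitivity_ppv` is hypothetical, and tied scores at the cutoff are broken by input order, which is an arbitrary choice.

```python
def sensitivity_ppv(scores, outcomes, fraction):
    """Sensitivity and PPV when the top `fraction` of scores is flagged."""
    n = len(scores)
    k = max(1, round(n * fraction))
    # Stable sort: subjects with equal scores keep their input order.
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    flagged = set(order[:k])
    true_pos = sum(1 for i in flagged if outcomes[i] == 1)
    actual_pos = sum(outcomes)
    sensitivity = true_pos / actual_pos if actual_pos else float("nan")
    ppv = true_pos / len(flagged)
    return sensitivity, ppv


# Ten hypothetical subjects, two of whom are actual top users.
scores = [0.95, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10, 0.05]
outcomes = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]

strict = sensitivity_ppv(scores, outcomes, 0.2)   # flag the top 2
relaxed = sensitivity_ppv(scores, outcomes, 0.5)  # flag the top 5
```

In this toy case the stricter cutoff gives lower sensitivity (0.5 versus 1.0) and higher PPV (0.5 versus 0.4), matching the direction of the trade-off reported above.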
Proportion of true cases identified by risk adjustment models and prior top user status (Table 3)
Less than half of top users in the current year were also top users in the previous year (47.9%, 44.3%, and 47.6% when the threshold for top users was set at 0.5%, 1%, and 5%, respectively). The proportion of true cases identified by risk adjustment models (Figure 1: area a plus area b) increased as the comprehensiveness of the risk adjustment model increased, regardless of outcome standards. We also found that the proportion of true cases that could be identified by prior expenditures status was always larger than the proportion identified by risk adjustment models (Figure 1: area c > area a), especially for the simpler models. In addition, the proportion of true cases identified solely by a risk adjustment model (Figure 1: area a) was low, and the difference among the three models seemed to decrease as the outcome standards were relaxed. For example, among the 2003 top 1% of actual users, only 5.08% could be identified by the demographics model, 13.82% by model 2, and 31.30% by model 3, and the proportion identified by risk adjustment models alone was much lower: 2.64%, 5.08%, and 6.10%, respectively. Among the 2003 top 5% of actual users, 25.78% could be identified by the demographic model, 36.52% by model 2, and 41.67% by model 3; the proportion that could be identified by a risk adjustment model alone was similar across the three models, at about 10.5%.
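The four-way breakdown of actual top users can be expressed with set operations. The sketch below follows the area labels used in the text (a: model alone, b: both, c: prior top-user status alone); the function name and the toy identifiers are hypothetical.

```python
def classify_true_cases(actual_top, prior_top, model_top):
    """Split actual top users into four mutually exclusive groups.

    Returns each group's share of actual top users:
      a = predicted by the risk adjustment model alone
      b = both predicted by the model and a prior-year top user
      c = prior-year top user alone
      neither = missed by both
    """
    actual = set(actual_top)
    prior = set(prior_top)
    model = set(model_top)
    if not actual:
        raise ValueError("no actual top users supplied")
    n = len(actual)
    a = actual & (model - prior)
    b = actual & model & prior
    c = actual & (prior - model)
    neither = actual - model - prior
    return {
        "a": len(a) / n,
        "b": len(b) / n,
        "c": len(c) / n,
        "neither": len(neither) / n,
    }


# Toy example with ten actual top users (ids 1 to 10).
shares = classify_true_cases(
    actual_top=range(1, 11),
    prior_top=[1, 2, 3, 4, 5, 11, 12],
    model_top=[4, 5, 6, 13],
)
```

Here the model identifies 30% of actual top users in total (a plus b), but only 10% are found by the model alone, which is the quantity the text treats as the model's real contribution.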
Characteristics of predicted cases by risk adjustment models (Tables 4 & 5)
Across three outcome standards, total expenditures, drug expenditures, and top/bottom expenditures ratio generally increased as the comprehensiveness of the risk adjustment model increased. There was a large increase in total expenditures, drug expenditures, and top/bottom expenditures ratios from model 1 to model 2, and from model 2 to model 3; two models with prior expenditures stayed relatively the same. As the outcome standard was relaxed from the top 0.5% to the top 5%, there was a general trend for average total and drug expenditures in identified top groups by different models to decrease. Such a decreasing trend became more obvious as the comprehensiveness of risk adjusters increased. For example, the average total expenditures in the demographically-identified top group decreased from 58,000 to 51,590 NTDs, but in the top groups identified by model 5 it decreased from 391,583 to 111,235 NTDs. In addition, the top/bottom expenditures ratio was much higher in total expenditures than in drug expenditures when the outcome was set at the top 0.5% in all but the demographics model (30.7 for total expenditures and 19.3 for drug expenditures in model 5); when the outcome was set at the top 5%, however, the top/bottom drug expenditures ratio was comparable to or higher than the total expenditures ratio (11.6 for total expenditures and 12.6 for drug expenditures in model 5).
In addition to expenditures, we examined the prevalence of five commonly manageable diseases (asthma, hypertension, depression, COPD, and diabetes) among predicted top groups (Table 5). Across three outcome levels, predicted top groups identified by diagnosis-based and prior-expenditures models overall had more manageable diseases than the actual top groups. Those identified by diagnosis-based models had the highest number of manageable conditions compared to models including prior expenditures. On average, the predicted top 0.5% and 1% groups by model 2 had 1.33 and 1.35 conditions, respectively; the predicted top 5% group by model 3 had 1.46 conditions; the predicted top groups by the demographic model had the lowest number of conditions across three outcome levels (all less than 1). When looking at specific conditions, the predicted top groups by diagnosis-based only models generally had a higher prevalence of asthma, hypertension, and COPD; those by models including prior expenditures usually had a higher prevalence of depression. For example, among the predicted top 1% groups, the prevalence of asthma, hypertension, and COPD was highest in the ACG-identified groups, reaching 14%, 55.5%, and 30%, respectively; the prevalence of depression was highest in the group identified by model 5, reaching 8.8%. In addition, other than models 1 and 2, the predicted top groups by the remaining three models had a higher prevalence of the five conditions as the outcome threshold was relaxed. For example, among the top groups identified ... Among the predicted top groups across all outcome thresholds and diagnosis-based/prior-expenditures risk adjustment models (demographic model not included), hypertension was the most prevalent condition, with more than 50% having hypertension; diabetes was the second, with about 30%; the third was COPD, ranging from 20% to 30%; ~10% had asthma, and somewhat fewer than 10% had depression.

Discussion
The results showed that both diagnosis-based and prior expenditures models performed much better than demographic models in predictive modeling, based on virtually all measures evaluated in the study (statistics-based indicators, expenditures indicators, and prevalence of manageable diseases in top groups identified by models). Diagnosis-based models performed better in identifying high-expenditure users with manageable diseases; prior expenditures models were better in statistics-based indicators and identifying people with higher average expenditures. Prior expenditures status could correctly identify more actual high-expenditure users than diagnosis-based or demographics models. The proportions of true high-expenditure users that could be identified by diagnosis-based models alone were much lower than those identified by prior expenditures status.
In Taiwan, the degree of concentration of medical expenditures in a small group is comparable to what has been observed in the United States: the top 0.5% consumed somewhat more than 20%, the top 1% consumed about 30%, and the top 5% consumed more than 50% of total medical expenditures. However, the next-year medical expenditures incurred by current high-expenditure users were much higher in Taiwan than in the United States: in Taiwan, the year 1 top 0.5% group consumed 21.09% in the current year and 14.53% in year 2; in the United States, the comparable group consumed about 20% and 7%, respectively. In addition, prior expenditures were also strongly related to current expenditures (Pearson's correlation coefficient between 2002 and 2003 total expenditures: 0.64), and about 50% of high-expenditure users in 2002 remained so in 2003. Therefore, the performance of models including prior expenditures should be better in Taiwan than in the United States. This strong correlation also led to the situation where the top groups identified by models including prior expenditures had higher prior expenditures than those identified by models without prior expenditures. In this study, their year 2 expenditures were lower than their year 1 expenditures, showing a trend of decreasing expenditures over time. Users with expenditures decreasing over time may not be good candidates for interventions because their expenditures are already going down without interventions; they may not even need interventions to bring down their medical expenditures.
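The concentration and persistence measures discussed above can be sketched as follows. The numbers are made up and the function names are hypothetical; the sketch only shows how the share of spending consumed by a top group, the share of next-year spending consumed by the same group, and Pearson's correlation between the two years could be computed.

```python
import math


def top_share(expenditures, fraction):
    """Share of total spending consumed by the top `fraction` of spenders."""
    ranked = sorted(expenditures, reverse=True)
    k = max(1, round(len(ranked) * fraction))
    total = sum(ranked)
    if total == 0:
        raise ValueError("total expenditures are zero")
    return sum(ranked[:k]) / total


def next_year_share(year1, year2, fraction):
    """Share of year-2 spending consumed by the year-1 top `fraction`."""
    k = max(1, round(len(year1) * fraction))
    order = sorted(range(len(year1)), key=lambda i: year1[i], reverse=True)
    total = sum(year2)
    if total == 0:
        raise ValueError("total year-2 expenditures are zero")
    return sum(year2[i] for i in order[:k]) / total


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Toy paired expenditures for ten subjects (same order in both years).
exp_2002 = [1, 1, 1, 1, 1, 1, 1, 1, 2, 90]
exp_2003 = [1, 1, 1, 1, 1, 1, 1, 1, 2, 40]

current = top_share(exp_2002, 0.1)                    # top 10% in year 1
following = next_year_share(exp_2002, exp_2003, 0.1)  # same group, year 2
r = pearson(exp_2002, exp_2003)
```

In this toy data the year-1 top group's share falls from 0.9 to 0.8 in the following year, the same kind of decline the text reports for Taiwan (21.09% to 14.53%) and the United States.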
In contrast, diagnosis-based models are better at catching people with 'manageable' conditions and those with increasing expenditure trends. It is critical that subjects predicted to be high-expenditure cases by models are 'intervenable' so that their health status and medical utilization can possibly be improved and controlled through managed care or disease management programs. For example, an individual with a serious car accident will have very high medical expenditures in year 1, and will be predicted to be a high-expenditure user in year 2 if prior expenditures are included in the model. It is of little use to identify such a person because nothing much can prevent a car accident from happening (assuming car accidents occur by chance), and his/her medical expenditures will go down naturally in the following year without any intervention.
Part of the reason that the diagnosis-based model is better in catching more people with selected conditions is endogeneity. All five 'manageable' conditions were chronic, and individuals with any of the five conditions were more than likely to have condition-related diagnosis codes on their medical records over a yearly period. Individuals with claims data containing diagnosis codes related to these five conditions were then used as input for diagnosis-based risk adjustment models to identify high-expenditure users. Since patients with any of the five conditions were more likely to consume more resources, they were more likely to be included in identified top groups. And then, the same diagnosis codes were again used to distinguish whether identified subjects had these five conditions.
Without risk adjustment models, the best that health plans can do to identify potentially high-expenditure users is to rely on prior expenditures status. Therefore, the value of risk adjustment models partially lies in the ability to discover what otherwise would not have been found if risk adjustment models were not used. Overall, diagnosis-based models correctly identify a much higher number of high-expenditure users than the demographic model. However, the proportion of actual top users that can be identified solely by diagnosis-based risk adjustment models is only slightly more than what the demographic model can achieve. This is mainly due to the fact that there is a much higher overlap of top predicted groups between diagnosis-based models and prior expenditures status compared to that between the demographic model and prior status (area b in Figure 1). Even though diagnosis-based models do not outperform demographic models in identifying more top users with the existence of prior expenditures information, diagnosis-based models are still better choices given the ability to identify subjects with more 'manageable' conditions and higher total expenditures.
It has been shown in the United States that there is little agreement between different models regarding who is identified as top users [3,8-10,14,15]. For example, only 0.19% of total subjects are identified as top users in both prior expenditures and diagnosis-based (DCGs) models when the outcome is set as the top 0.5% (the perfect match would lead to 0.5%) [3]. In Taiwan, the same phenomenon also exists (Table 6). Taking the top 0.5% as an example, the overlap of identified top users between two diagnosis-based models was only 0.07% while that between two models including prior expenditures was also not perfect (only 0.36%). The overlap between the prior expenditures and the comprehensive diagnosis model was 0.19%. Given that there is little overlap and that top groups identified by different models have different characteristics, it is important for policymakers to clarify what the purpose of predictive modeling is before they make decisions on which model to use. If higher total expenditures is preferred, it is crucial to include prior expenditures in the model; if more manageable diseases are preferred, a diagnosis-based model will be more effective.
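The overlap figures above are expressed as a share of all subjects, so a perfect match at the top 0.5% would give 0.5%. A minimal sketch of that calculation, with hypothetical function names and toy scores, and assuming both models score the same set of subjects:

```python
def top_ids(scores_by_id, fraction):
    """Ids of the subjects in the top `fraction` by predicted score."""
    k = max(1, round(len(scores_by_id) * fraction))
    ranked = sorted(scores_by_id, key=scores_by_id.get, reverse=True)
    return set(ranked[:k])


def overlap_share(scores_a, scores_b, fraction):
    """Share of all subjects flagged as top users by both models."""
    common = top_ids(scores_a, fraction) & top_ids(scores_b, fraction)
    return len(common) / len(scores_a)


# Ten hypothetical subjects scored by two hypothetical models.
model_a = {i: 10 - i for i in range(10)}  # top two: ids 0 and 1
model_b = {i: 0.0 for i in range(10)}
model_b.update({0: 0.1, 1: 0.9, 2: 0.8})  # top two: ids 1 and 2

shared = overlap_share(model_a, model_b, 0.2)
```

With a top-20% cutoff, a perfect match would give 0.2; the two toy models agree on only one subject, so the overlap is 0.1.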

Limitations
Predictive modeling presents an opportunity to reduce medical expenditures by identifying a very small number of potential top users regarding whom health plans can take actions. Therefore, identifying potentially top users is only the first step; what will be done after that also plays an important role in determining how much reduction in medical expenditures can possibly be achieved. No matter how perfect the risk adjustment models can be in identifying high-expenditure users, if no effective programs are implemented, the ultimate goal of containing the expansion of medical expenditures will not be realized.
The ACG risk adjustment system was developed using U.S. data; it was not calibrated for how the healthcare system works in Taiwan, so the system may not fit Taiwan's claims data well. However, in prior research it has been shown that the performance of the ACG system in explaining medical utilization in Taiwan was similar to that in other countries [16][17][18]. In addition, it was also assumed that the claims data obtained from BNHI would be comprehensive enough to capture all important diagnosis codes that may affect the patients' morbidity status. Given comprehensive benefits, easy access to medical care and the low cost of seeking care under NHI, this may not be a big concern in the study.

Future Research Directions
Many risk adjusters can be used for predictive modeling; however, in this study we included only diagnosis information and prior expenditures. Other than diagnosis information, drug information was also readily available in Taiwan. Therefore, it will be interesting to evaluate how much the performance of risk adjustment models will improve once pharmacy information is also included. It is also better if the model can identify a group of top users with consistently high expenditures over time because it takes time for interventions to show their effectiveness. So, to analyze medical expenditures incurred by predicted top groups from different models-not only in the next year but also several years after-is an important next step to evaluate how risk adjustment models really work.

Conclusions
Predicted top groups identified by different models have different characteristics and there is little agreement between models regarding who would potentially be top users. Diagnosis-based models tend to identify people with more 'manageable' diseases; models with prior expenditures are more likely to identify people with higher expenditures. Therefore, which model to use should depend on the purpose of the application. The prior expenditures approach is a more powerful tool than diagnosis-based risk adjusters in terms of correctly identifying actual high-expenditure users. The proportions of actual top users that can be identified by diagnosis-based risk models alone are much lower than those identified by prior expenditures status alone. There is still much room left for improvement of diagnosis-based models in predictive modeling.

International Classification of Diseases; NTD: New Taiwan Dollar; PM: predictive modeling; PPV: positive predictive value