Applying diagnosis and pharmacy-based risk models to predict pharmacy use in Aragon, Spain: The impact of a local calibration

Background In the financing of a national health system, where pharmaceutical spending is one of the main cost containment targets, predicting pharmacy costs for individuals and populations is essential for budget planning and care management. Although most efforts have focused on risk adjustment applying diagnostic data, the reliability of this information source has been questioned in the primary care setting. We sought to assess the usefulness of incorporating pharmacy data into claims-based predictive models (PMs). Because these models were developed primarily for the U.S. health care setting, a secondary objective was to evaluate the benefit of a local calibration in order to adapt the PMs to the Spanish health care system. Methods The population was drawn from patients within the primary care setting of Aragon, Spain (n = 84,152). Diagnostic, medication and prior cost data were used to develop PMs based on the Johns Hopkins ACG methodology. Model performance was assessed through r-squared statistics and predictive ratios. The capacity to identify future high-cost patients was examined through the c-statistic, sensitivity and specificity parameters. Results The PMs based on pharmacy data had a higher capacity to predict future pharmacy expenses and to identify potential high-cost patients than the models based on diagnostic data alone and a capacity almost as high as that of the combined diagnosis-pharmacy-based PM. PMs provided considerably better predictions when calibrated to Spanish data. Conclusion Understandably, pharmacy spending is more predictable using pharmacy-based risk markers than using diagnosis-based risk markers. Pharmacy-based PMs can assist plan administrators and medical directors in planning the health budget and identifying high-cost-risk patients amenable to care management programs.


Background
Pharmaceutical public spending is recognised as one of the main cost containment targets in the financing of national health systems such as that in Spain. However, studies aimed at understanding population patterns of drug consumption, costs and morbidity are uncommon. Recent pharmaceutical reimbursement reforms in Spain have focused on measures oriented towards the industry side, giving less importance to the role of patient need and demand [1]. Patient characteristics are not taken into consideration for pharmacy budget allocation to health centres either, ascribing fundamental weight to prior year costs [2].
By taking into account the risk factors for a covered population, prospective risk adjustment methodologies, commonly referred to as "predictive models" (PMs), can be helpful for health budget planning and case profiling [3]. These tools can also be used as population-based screens meant to identify enrolees who pose a relatively higher risk of generating large future pharmacy expenditures [4,5]. Such patients would benefit from case management programs that could enhance primary care in order to improve the quality and coordination of services [6-8]. As far as physicians are concerned, PMs provide a means of considering health status when assessing drug expenditure, giving health care providers incentives to manage pharmaceutical benefits efficiently.
In this manuscript, we use Adjusted Clinical Group PMs developed at Johns Hopkins University because of their acknowledged validity and extensive use as a risk adjustment technology [9,10]. To date, health services research in the Spanish context has focused on using diagnoses-based risk assessment variables [3,11]. However, risk assessment models such as the ACG-PMs, which integrate routinely collected pharmacy data, may be more appropriate. Chronic conditions are associated with specific pharmacotherapy protocols and, furthermore, diagnostic data often lack complete documentation [12]. This is not only because chronic diseases are frequently not explicitly named in physicians' medical records [13], but also because individual data are frequently fragmented across different health care information systems (primary care, secondary care, emergency care, etc.). In contrast, studies carried out in Canada [14,15] showed that prescription claims represent one of the most accurate means of determining what drugs are dispensed to individuals, because pharmacists almost always dispense the medication that is prescribed and this information is reliably transmitted to the drug claims database.
Last, the application of these tools in a different health system from the one where they were originally developed requires the use of local weights for the analysed risk factors. Thus, the role of a calibration may also be appraised in order to maximise the predictive accuracy of the PMs.
The objective of this paper was first to assess the usefulness of incorporating pharmacy data into our efforts to better understand health care resource utilisation and patients' use of pharmacy resources in particular and, second, to evaluate the benefit of a local calibration of PMs in order to adapt them to the Spanish health care system.

Data Source and Study Sample
Data were obtained from the Electronic Medical Records of patients from six primary care health centres belonging to Aragon's Public Health Care System for the years 2006 (Year-1) and 2007 (Year-2). In order to increase the reliability of the data, health centres were selected according to their experience with the use of Electronic Medical Records, which, in all cases, was longer than three years. The sample was restricted to enrolees seen at least once by a public general practitioner (family doctor or paediatrician) during both Year-1 and Year-2, which resulted in a final sample of 84,152 patients. Among these patients, 9.4% had no pharmacy expenditure in 2006 and 9.3% had no pharmacy expenditure in 2007. Data were obtained from administrative registries of the Aragon Health Care System after official request and authorization. Personal information was anonymised according to the Spanish Organic Law of Personal Data Protection 15/1999. This work is part of a project funded by the Carlos III Health Institute which has been approved by the Ethics Committee of Aragon (CEICA).
The Year-1 inputs for the PMs were patients' gender, their age, all assigned diagnoses and the codes for the drugs they consumed according to the ATC Classification System [16]. Diagnoses were originally coded according to the International Classification of Primary Care (ICPC-2) [17]; the codes were subsequently transformed to correspond with the International Classification of Disease (ICD-9-CM) [18]. We used patients' annual pharmacy expenditures from Year-1 to calculate prior pharmacy cost predictors. Year-2 expenditures were only used as a validation tool, in order to assess the performance of the PMs.
To obtain information on pharmacy charges, data from Electronic Medical Records were crossed with the information in the National Pharmacy Database. Registries were made anonymous with the aim of ensuring the confidentiality of the information. These costs correspond to the retail price of drugs consumed and pharmacy accessories used by patients. Even if drug claims collect prescriptions carried out by both specialised and general practitioners, the majority of these prescriptions are filled by the latter (92.6% in 2006 and 92.7% in 2007, according to the Pharmacy Database of Aragon).

Risk Factors Within the Predictive Models
Based on the abovementioned input variables, the ACG case-mix system, version 8.1®, assigns risk measurement variables related to specific demographic, diagnoses, and pharmacy use patterns.
Diagnosis-based risk factors include Adjusted Clinical Groups implying the highest levels of medical need (ACGs are a series of mutually exclusive health status categories defined by morbidity, age, and sex [9,10]), specific Expanded Diagnostic Clusters representing uncommon diseases that signify high disease severity (EDCs are groupings of ICD-9-CM codes according to clinical similarity [19]), and the hosdom and patient frailty markers (the hosdom marker is a count of different morbidity types associated with a >50% probability of future hospital admission among patients and the frailty marker clusters diagnostic codes that indicate the presence of at least 1 to 11 frail conditions [19]).
As for medication-derived risk factors, several pharmacy-based morbidity groups (Rx-MG) are included. Rx-MGs are created to account for the anatomical-physiological system drugs act on, as well as the morbidity differentiation, the expected duration and the severity of the diseases to be treated using the medication. These four clinical dimensions not only characterise medications by morbidity type but also have major consequences for predictive modelling. Higher levels of differentiation and chronicity, as well as greater severity, would all be expected to increase resource use [19].
Whereas DxRx-PMs include the whole set of risk factors, Dx-PMs and Rx-PMs will only use diagnostic or pharmacy-based variables, respectively.

Data Analysis
To understand the differences between the U.S. and Spanish health care systems, we first described, for each of the settings, each patient's demographic and clinical characteristics from Year-1 as well as the distribution of the pharmacy expenditures from Year-2. U.S. data, which were originally used for the empirical development of the ACG-based PMs, were obtained from the PharMetrics Patient-Centric Database. This database represents the medical and pharmacy claims and enrolment records across the continuum of medical care for approximately 85 geographically diverse health plans within the U.S.
Pharmacy data were used as a means of assessing the validity of diagnoses recorded in physicians' medical records. Comparisons were made between patients identified as having specific chronic conditions using diagnosis codes (ICD-9-CM), pharmacy codes (ATC Classification System), and both diagnosis and pharmacy codes.
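The comparison described above can be illustrated with a minimal sketch. It assumes that, for a single chronic condition, one set of patient identifiers has been flagged from diagnosis codes and another from pharmacy codes; the identifiers and the function names `cross_classify` and `shares` are hypothetical and are not part of the ACG software.

```python
def cross_classify(dx_patients, rx_patients):
    """Split patients flagged for one condition into three exclusive groups."""
    dx, rx = set(dx_patients), set(rx_patients)
    return {
        "diagnosis_only": dx - rx,  # diagnosis recorded, no matching drug claim
        "pharmacy_only": rx - dx,   # drug claim present, no recorded diagnosis
        "both": dx & rx,            # identified by both data sources
    }


def shares(groups):
    """Proportion of all identified patients that falls in each group."""
    total = sum(len(members) for members in groups.values())
    if total == 0:
        return {}
    return {name: len(members) / total for name, members in groups.items()}


# Hypothetical patient identifiers for one condition (illustration only).
dx_flagged = {"p1", "p2", "p3", "p4"}
rx_flagged = {"p3", "p4", "p5"}

groups = cross_classify(dx_flagged, rx_flagged)
proportions = shares(groups)
```

A large "pharmacy_only" share would point to under-coded diagnoses, whereas a large "diagnosis_only" share would point to diagnosed patients without the corresponding drug claims.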
Next, we looked at how useful Rx-PMs were for predicting future pharmacy expenditures. Multivariate linear regression was employed to obtain r-squared statistics by regressing Year-2 pharmacy charges on the PMs' risk factors.
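As a rough illustration of this step, the sketch below fits an ordinary least-squares model with the normal equations and reports r-squared. It is not the ACG software's estimation routine: the two binary risk factors and the cost figures are invented, and the helper names (`solve`, `fit_ols`, `predict`, `r_squared`) are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.

    Assumes `a` is square and non-singular (a singular matrix raises
    ZeroDivisionError)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(X, y):
    """Return least-squares weights, intercept first."""
    rows = [[1.0] + [float(v) for v in r] for r in X]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def predict(weights, X):
    """Apply intercept-first weights to each row of risk factors."""
    return [weights[0] + sum(w * v for w, v in zip(weights[1:], r)) for r in X]


def r_squared(observed, predicted):
    """Share of the variance in observed costs explained by the predictions."""
    mean = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical data: two binary Year-1 risk markers and Year-2 charges.
risk_factors = [(0, 0), (1, 0), (0, 1), (1, 1)]
year2_costs = [100.0, 500.0, 1000.0, 1500.0]

weights = fit_ols(risk_factors, year2_costs)
r2 = r_squared(year2_costs, predict(weights, risk_factors))
```

With real data, each row would hold the full set of risk factors for one patient, and a numerical library would normally replace the hand-written solver.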
We also calculated predictive ratios (PRs) to assess model accuracy. We used Year-2 as the validation sample and applied models estimated from Year-1 to generate predictions for each patient and PRs for 3 types of cohorts defined by a) the presence of a relevant diagnosis, b) the presence of a relevant drug claim and c) pharmacy costs in Year-1 arrayed by percentiles.
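A predictive ratio is conventionally computed as the total predicted cost of a cohort divided by its total observed cost, so that values below 1 indicate underprediction and values above 1 indicate overprediction. The following sketch assumes that definition; the cost figures, the cohort flag and the function name `predictive_ratio` are hypothetical.

```python
def predictive_ratio(predicted, observed, in_cohort):
    """Total predicted cost divided by total observed cost within a cohort."""
    pred_total = sum(p for p, keep in zip(predicted, in_cohort) if keep)
    obs_total = sum(o for o, keep in zip(observed, in_cohort) if keep)
    if obs_total == 0:
        raise ValueError("cohort has no observed cost; ratio is undefined")
    return pred_total / obs_total


# Hypothetical Year-2 predictions and observed charges for four patients.
predicted = [200.0, 400.0, 1500.0, 3000.0]
observed = [100.0, 500.0, 2000.0, 3400.0]

# Hypothetical cohort flag, e.g. presence of a relevant diagnosis in Year-1.
has_condition = [False, False, True, True]

pr_cohort = predictive_ratio(predicted, observed, has_condition)
pr_everyone = predictive_ratio(predicted, observed, [True] * len(observed))
```

The same function can be applied to each of the three cohort types by swapping the flag list (diagnosis present, drug claim present, or membership of a Year-1 cost percentile band).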
Classification accuracy was examined using logistic regression in which the dependent variable was defined by assignment (or not) to the top 5% risk group in terms of Year-2 charges (high consumer yes/no). Model fit was examined through c-statistic, sensitivity and specificity parameters.
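The three measures can be sketched as follows. This is a simplification that skips the logistic regression step and ranks patients directly by a predicted score; the c-statistic is computed as the proportion of (high-cost, non-high-cost) patient pairs in which the high-cost patient has the higher score, with ties counted as one half. All data and function names are hypothetical.

```python
def top_share_flags(values, share=0.05):
    """Flag the patients in the highest `share` of values (at least one)."""
    n_top = max(1, round(len(values) * share))
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    top = set(order[:n_top])
    return [i in top for i in range(len(values))]


def sensitivity_specificity(actual, flagged):
    """Return (sensitivity, specificity) for two lists of booleans."""
    tp = sum(a and f for a, f in zip(actual, flagged))
    fn = sum(a and not f for a, f in zip(actual, flagged))
    tn = sum(not a and not f for a, f in zip(actual, flagged))
    fp = sum(not a and f for a, f in zip(actual, flagged))
    return tp / (tp + fn), tn / (tn + fp)


def c_statistic(actual, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for a, s in zip(actual, scores) if a]
    neg = [s for a, s in zip(actual, scores) if not a]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical data: 40 patients with observed Year-2 charges 0, 10, ..., 390.
observed = [10.0 * i for i in range(40)]
# Predicted scores equal the observed charges except for one swapped pair.
predicted = observed[:]
predicted[38], predicted[30] = observed[30], observed[38]

high_cost = top_share_flags(observed)    # true top 5% in Year-2
flagged = top_share_flags(predicted)     # top 5% according to the model
sens, spec = sensitivity_specificity(high_cost, flagged)
auc = c_statistic(high_cost, predicted)
```

The pairwise c-statistic is quadratic in the number of patients, so for a sample of the size used here a rank-based formula or a statistics library would be preferable.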
For each of the mentioned measures, results from a local calibration of the model were compared with results based on the U.S. health care system.
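To make the comparison concrete, the sketch below contrasts imported weights with locally calibrated ones. It rests on a strong simplifying assumption that does not hold for the ACG-PMs, whose risk markers overlap: each patient belongs to exactly one risk category, so the least-squares weight of a category is simply its mean local cost. The categories, costs and imported weights are invented for illustration.

```python
from collections import defaultdict


def calibrate(categories, costs):
    """Local weight per mutually exclusive category = mean observed cost."""
    totals, counts = defaultdict(float), defaultdict(int)
    for category, cost in zip(categories, costs):
        totals[category] += cost
        counts[category] += 1
    return {category: totals[category] / counts[category] for category in totals}


def r_squared(observed, predicted):
    """Share of the variance in observed costs explained by the predictions."""
    mean = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical local data: one risk category and one annual cost per patient.
categories = ["low", "low", "mid", "mid", "high", "high"]
local_costs = [50.0, 150.0, 300.0, 500.0, 1200.0, 1800.0]

# Hypothetical weights estimated in another health system.
imported_weights = {"low": 150.0, "mid": 600.0, "high": 1900.0}

local_weights = calibrate(categories, local_costs)
r2_imported = r_squared(local_costs, [imported_weights[c] for c in categories])
r2_local = r_squared(local_costs, [local_weights[c] for c in categories])
```

In this toy example the local weights fit at least as well as any imported set by construction, because they minimise squared error on the same data; a fair comparison on real data would use a separate validation period, as the Year-1/Year-2 design does.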
Finally, we performed a sensitivity analysis to measure how results would vary when using U.S. weights derived from a Medicare managed care population for individuals over the age of 65. These weights, which are available as part of the ACG software for predominantly elderly populations, are an alternative to the commercial reference weights obtained from the PharMetrics Patient-Centric Database.

Results
The ACG case-mix system proved to have high performance in recognising and assigning 100% of the diagnoses and 92.5% of the pharmacy codes.
The Spanish and U.S. datasets had similar distributions in terms of age, sex and chronic conditions (Table 1). With regard to the prevalence of cited diseases, numbers were comparable except for depression: compared with what was found in the U.S. data, the proportion of the study sample having this condition was nearly two times higher for the young Spanish population and around four times higher for the elderly Spanish population. In both Spanish and U.S. datasets, the distribution of the pharmacy expenditure showed a pronounced skewness, especially among the under-65 population, where nearly half of annual expenditures on prescription drugs were generated by only 5% of the study population. Regarding the elderly population, U.S. data showed a more skewed distribution, with half of the study population being responsible for almost 90% of pharmacy expenses. These numbers are consistent with the well-known distribution properties of pharmacy costs in a population.
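The concentration figures quoted above (the share of total expenditure generated by the costliest fraction of patients) can be computed as in the following sketch. The cost list and the function name `top_share_of_spending` are hypothetical and serve only to show the arithmetic.

```python
def top_share_of_spending(costs, share=0.05):
    """Fraction of total spending generated by the costliest `share` of patients."""
    ordered = sorted(costs, reverse=True)
    n_top = max(1, round(len(ordered) * share))
    total = sum(ordered)
    if total == 0:
        return 0.0
    return sum(ordered[:n_top]) / total


# Hypothetical population: 19 patients at 100 each and one patient at 1,900.
costs = [100.0] * 19 + [1900.0]

top5_share = top_share_of_spending(costs, share=0.05)
top50_share = top_share_of_spending(costs, share=0.50)
```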
With regard to comparisons between "who" was identified as having a particular medical condition using diagnostic data from electronic medical records and "who" was identified as having a medical condition using pharmacy claims, we found that, for several chronic conditions such as depression or asthma, some patients taking drugs have no diagnoses associated with these drugs. On the other hand, a high percentage of individuals to whom the diagnosis of hyperlipidaemia has been assigned are not taking lipid-lowering drugs according to pharmacy claims (Figure 1).
Across all PMs, the DxRx model explained the most variance (Table 2). However, the capacity of the RxPM to predict future pharmacy expenditures was almost as high as that of the DxRx combined model (40.6% vs. 42.6%). In all cases, models provided considerably better predictions when applying weights resulting from local calibrations.
Table 3 shows means and PRs for Year-2 pharmacy expenditures based on information from the prior year. The most expensive of these groups (those who had congestive heart failure, were taking drugs for congestive heart failure or were in the top 1% of spending during Year-1) incurred costs between 4 and 15 times higher than the average of €390. The diagnosis-based model (Dx-PM) predicted pharmacy expenditures more accurately for the diagnosis-identified groups, but somewhat underpredicted the groups defined by their use of drugs. Analogously, the drug-based model (Rx-PM) was more accurate with the groups defined by their use of drugs in comparison with the groups defined by medical conditions. The benefit of a local calibration was persistent in all cases.
Table 2 also shows the area under ROC curve and sensitivity and specificity values of each of the models evaluated for the top 5% Year-2 cost group.
Whereas the 3 models exhibited similar performance on the specificity test, performance on sensitivity diverged: the DxRx-PM and the Rx-PM using Spanish weights showed the best performance (53.2% and 52.3% respectively). This is confirmed by the corresponding ROC curve values (0.949 and 0.941 respectively). Once again, there was little improvement in sensitivity or ROC curve values for the Rx-PM when adding the diagnostic information with regard to the prediction of pharmacy expenditure. The results were notably enhanced when Spanish weights were applied.

Discussion
No previously published research assesses the usefulness of incorporating pharmacy data into prospective risk adjustment techniques in any national health system. To date, research in the European and Spanish contexts has focused on using diagnoses-based risk assessment variables [3,11]. However, our study, as well as others carried out within the Spanish National Health System [20], determined that the accuracy of the diagnostic codes allocated by primary health care physicians in their computerised medical records could be improved.
The results of this study confirm that much can be learned by looking at pharmacy data, especially when forecasting drug expenditures. Studies carried out in the U.S. [21,22] and Europe [23] have outlined the potential of pharmacy data to improve the system of risk adjustment for both care management program capitation payments and pharmacy budget planning. This is of particular interest in a situation in which the information related to drug consumption is routinely recorded and automated, as it is in Spain. Moreover, the fact that the applied drug classification system (ATC classification) is uniform and used all over the country makes the use of pharmacy data very feasible, even more so if we consider the regular updating of the national ATC code databases issued by the Ministry of Health, according to the Guidelines of the World Health Organisation [24].
Nevertheless, several challenges are posed by medication-only PMs. First, using pharmacy data as a risk adjuster for resource allocation could create perverse incentives: inappropriate prescribing would be rewarded with higher budgets, promoting the inappropriate use of these drugs in the future. In consequence, if pharmacy data are to be used for budget allocation purposes, intensive monitoring activity will be required to prevent the inappropriate prescribing of drugs. As in other European countries [23], social security funds in Spain are not themselves care providers, unlike Health Maintenance Organizations (HMOs) in the U.S.; this may reduce the danger of inappropriate prescription behaviour. Second, pharmacy claims data may not always portray an accurate clinical picture, because some prescribed medications have multiple indications from which a patient's disease status must be inferred and because one disease may have many medication options in terms of its management [25]. One of the most important innovations presented by the Rx-PM from the ACG system is its clinically oriented approach, which captures the unique clinical information embedded in medication-use data instead of attempting to identify diseases/conditions based on medications [19].
The benefits of local calibrations have become evident. As was the case with other risk adjustment tools, such as the Diagnosis-Related Groups used in acute care hospitals, adaptation processes have been developed by other countries on the grounds that the philosophy of health care, resource consumption patterns and funding approaches differ from those in the country where the tool was originally created [26]. Indeed, importing clinical predictors related to weights resulting from empirical evaluations carried out with U.S. cost data could lead to the incorporation of U.S. funding incentives and disincentives into the Spanish health care system. The results of our work show that the statistical performance of PMs was optimised using Spanish weights. This is due to the fact that local weights were calculated by regressing pharmacy cost data from our own health system on the explanatory variables for ACG-PMs.
Moreover, our sensitivity analysis demonstrated that these results were fairly robust even when using U.S. weights derived from a Medicare managed care population for individuals over the age of 65.

Limitations
The main limitation of this study is related to the infeasibility of applying a split half method, which involves dividing the total sample in two and correlating the results, as a way of assessing the reliability of a test. In order to apply this technique to our particular study, Nevertheless, the validity of the sample is backed up by the fact that the proportion of women, the age distribution, the prevalence of chronic conditions and the behaviour of patients with regard to pharmacy expenditure are consistent with those of previous studies carried out in the Spanish primary care setting [3,11].
Another potential limiting factor could be related to the relatively recent incorporation of electronic medical records into the primary care setting. Even if a series of inclusion criteria were applied during the health centre selection process to guarantee the quality and reliability of the clinical data, a three-year period of experience in the use of electronic records is still short enough that we might question the maturity of the information systems. This could lead to an overestimation of the clinical under-coding effect highlighted in this study. Thus, using even slightly more recent data could substantially boost model performance measures.
An additional reason for overestimating the undercoding phenomenon arises from the lack of connection between databases for primary and specialised care. Although primary care general practitioners are considered the gatekeepers of the health system and would therefore need to have a complete record of all of a patient's diagnostic episodes, certain codes could be missing when these diseases are followed by specialised physicians. In the study region, specialised physicians have poorer information systems than primary care physicians and, consequently, data are less available. Still, drug claims collect prescriptions carried out by both specialised and general practitioners. This situation could explain cases in which patients consume drugs for diseases that are not registered in general practitioners' office records, as reported in this paper.
Even if these two previous statements have been considered as potential limitations of the study due to their association with a poor quality of the data, they support the need to incorporate pharmacy data when carrying out risk adjustment.

Implications for the Spanish National Health System
When the target of a health care organisation is the management of an individual's medication use, predictive models based on pharmacy data are particularly useful. Adding diagnostic markers to medication data does not appear to improve predictions for pharmacy costs [21,22], which tend to show a pronounced degree of persistence from year to year, particularly among the heaviest users [27]. This has long been the rationale for using prior costs in themselves for budget planning by hospitals and the primary care setting in Spain [2]. In terms of statistical performance, prior cost is a fairly good predictor of future cost, even better than diagnostic or pharmacy-related variables [3], but it has some limitations. First, prior cost has no inherent clinical meaning, and is therefore of low relevance to clinicians who wish to intervene. It is not tied to morbidity and, thereby, cannot be translated into clinical action. Second, prior cost is subject to the phenomenon of regression to the mean (i.e., the natural tendency of groups of individuals who are high cost one year to move towards mean costs in the following years). Third, prior-use measures are not entirely appropriate as risk factors for risk-adjusted rate setting or profiling, as they could provide incentives for excessive and inappropriate pharmacy use.
Screening tools based on diagnostic or medication data can identify reliable "early warning signs" of future expenses that can then promote secondary prevention through patient care management [4]. Although the beneficial effects of care management have not been consistently demonstrated [28], preliminary evidence from an intensive nurse-based intervention for high-risk elderly individuals appears to show that it holds great promise in terms of cost reduction [29] and better quality of care [30]. A randomised clinical trial carried out recently in the Spanish primary care setting has confirmed the effectiveness of intervention strategies in decreasing the number of consultations of frequent attenders [31], paving the way for the implementation of further cost efficiency-focused strategies.
Moreover, the optimal predictive capacity of the ACG-PM proves its usefulness for future budget planning. This has been demonstrated to entail the largest impact on pharmaceutical revenues among existing pharmacy regulatory measures [32].
Last, PMs provide a means of determining physician prescription profiles while adjusting for patient case-mix, so decisions about incentives, efficiency improvement efforts or even sanctions can be targeted towards the "right" physicians [33].

Conclusion
Pharmacy-based PMs can assist plan administrators and medical directors in planning the health budget and identifying high-cost-risk patients amenable to care management programs.
An application focussing on the total expenditure of the primary care setting is the next research approach required. While pharmacy spending is particularly predictable from drug data, diagnoses may be more useful than drugs for predicting other medical costs and total costs [4,22,34]. There could also be factors, such as behavioural risks or functional health status, that risk models do not yet capture well, and whose influence on pharmacy consumption ought to be considered in future studies.