Disease identification based on ambulatory drugs dispensation and in-hospital ICD-10 diagnoses: a comparison

Background Pharmacy-based case mix measures are an alternative source of information to the relatively scarce outpatient diagnoses data. But most published tools use national drug nomenclatures and offer no head-to-head comparisons between drugs-related and diagnoses-based categories. The objective of the study was to test the accuracy of drugs-based morbidity groups derived from the World Health Organization Anatomical Therapeutic Chemical Classification of drugs by checking them against diagnoses-based groups. Methods We compared drugs-based categories with their diagnoses-based analogues using anonymous data on 108,915 individuals insured with one of four companies. They were followed throughout 2005 and 2006 and hospitalized at least once during this period. The agreement between the two approaches was measured by weighted kappa coefficients. The reproducibility of the drugs-based morbidity measure over the 2 years was assessed for all enrollees. Results Eighty percent used a drug associated with at least one of the 60 morbidity categories derived from drugs dispensation. After accounting for inpatient under-coding, fifteen conditions agreed sufficiently with their diagnoses-based counterparts to be considered alternative strategies to diagnoses. In addition, they exhibited good reproducibility and allowed prevalence estimates in accordance with national estimates. For 22 conditions, drugs-based information identified accurately a subset of the population defined by diagnoses. Conclusions Most categories provide insurers with health status information that could be exploited for healthcare expenditure prediction or ambulatory cost control, especially when ambulatory diagnoses are not available. However, due to insufficient concordance with their diagnoses-based analogues, their use for morbidity indicators is limited.


Background
Building health indicators, managing health care and prevention, and adjusting for insurers' risks require the assessment of morbidity burdens [1]. Demographic variables do not account sufficiently for the discrepancy in health service use and costs, overestimating cost variations between care providers and misidentifying outliers [2,3].
Most developed countries have minimal data sets on inpatient morbidity and causes of death. Outpatient morbidity information is scarcer except for cancer registers and contagious infections, which are subject to mandatory declaration. National health surveys have been conducted to estimate the prevalence of chronic illnesses but such expensive and time-consuming studies are generally not feasible on an ongoing basis [4,5]. Although the increased use of electronic medical records (EMR) by primary physicians has the potential to collect clinical information in large populations, the identification of a particular disease within an EMR often remains far from straightforward [6,7].
Current patient classification systems are mainly based on diagnoses information. In the USA, Medicare and Medicaid databases and some private health insurance or maintenance organizations routinely record ambulatory diagnoses. In Switzerland, as in many other countries, such records are missing mainly because data collection is time-consuming, costly and not always reliable [8,9].
Hence the growing interest in measures based on drug prescription data, which insurers often collect routinely; such data may also provide information on well-controlled diseases, which are frequently under-declared by physicians [10,11].
Most medication-based classification systems are derived from the chronic disease score (CDS) developed by Von Korff et al., which predicts hospitalization, mortality, the number of ambulatory visits and costs fairly well [12][13][14]. Improvements now include a wider range of drugs, new scores, and extended application to various populations (pediatric, Medicare and Medicaid, veterans, European countries) [15][16][17][18]. For example, the "Rxrisk" model developed by Fishman included 55 therapeutic categories. It was designed to predict future health costs and was thus restricted to chronic diseases [19].
Only a few studies on selected populations have checked criterion validity by comparing drugs categories head-to-head with their diagnoses-based analogues [18]. In one comparison, 40% of the "Rxrisk" categories agreed poorly with their ICD-9-CM-based counterparts (Kappa coefficient < 0.4). Drug rates provided a valid estimation of diagnosed and treated prevalence for only a few medical conditions [20,21].
Many drugs-related classification systems were built on national drug nomenclatures [14,17]. However, since indications for certain agents differ depending on how they are administered, names alone do not adequately express a condition. Pharmacy-based models should be regularly updated and validated to verify that they are not sensitive to practice variations.
The overall aim of our work was to develop a clinically relevant drugs-based case mix measure, derived from the WHO Anatomical Therapeutic Chemical (ATC) classification of drugs [22]. Diagnoses information not being available for ambulatory care, we limited the accuracy assessment of disease detection to the hospitalized population. Testing the performance of drugs-based patient classification systems to predict ambulatory resources or health outcomes was beyond the scope of our work.

Setting
Our study is an observational study based on routine data from four Swiss health insurers covering approximately 2.0 million enrollees, of whom 1.7 million were followed in 2005 and 2006. Among the latter, all insured persons hospitalized at least once in a Swiss hospital were retained. Data were collected with the support of the Swiss Federal Office of Public Health [23]. Dispensed medication was identified by a national product code (pharmacode). Because pharmacists in Switzerland systematically send drug codes and dispensation dates to insurers for billing purposes, we expect only minimal inaccuracies in those data. All Swiss citizens are covered by compulsory insurance and thus have unrestricted access to drugs. A few drugs dispensed in outpatient hospital settings (e.g. antineoplastic agents) were not in the database. Pharmacy data were linked to the corresponding hospital diagnosis codes (ICD-10) [24] via the anonymous linkage code procedure of the Swiss Federal Statistical Office (SFSO); only a sequential number was delivered [25]. Hospital data supplied by the SFSO (inpatient diagnoses) are publicly available. Insurers' data (dispensed drugs) are not publicly available and were supplied only for the research project supported by the Federal Office of Public Health, on condition that the SFSO anonymous linkage code procedure be used. All data were anonymous and did not include any element that might allow a single person to be identified (for instance, no birth dates or ZIP codes) [26]. More than 99% of the insured had a corresponding anonymous linkage code in the SFSO hospital database. As in most other developed countries, only diagnoses that had an impact on the treatment of the patient were collected [27]. The space for recording diagnoses was limited to ten codes. These medical records were mainly used for epidemiological studies and hospital resource allocation (Diagnosis Related Groups).
Several cantons allow physicians to dispense drugs to patients directly (self-dispensation). In such cases, information on dispensed drugs was limited to their costs. To avoid information bias, patients for whom self-dispensation represented more than 5% of drug costs were excluded. For chronic diseases, all dispensed drugs were considered regardless of the dispensing and hospitalization dates, while for acute diseases only drugs dispensed two months before a hospital admission or after discharge were kept. When the same patient had several hospital stays, all diagnoses and drugs related to each hospitalization were kept but the patient was counted as a single observation.

Assigning diagnoses and drugs to morbidity groups
Diagnoses consisted of over 16,000 ICD-10 codes, too many to be manageable. Therefore, in a first step, they were grouped under the 130 categories of the International Shortlist for Hospital Morbidity Tabulation (ISHMT) recommended by the World Health Organization [24]. Diagnoses groups which usually do not require a specific drug treatment (trauma, surgical or obstetrical conditions, congenital malformations, and unspecified morbidities or symptoms) were excluded (Additional file 1). Morbidity groups were deduced from main and secondary diagnoses coded in hospital medical records.
All dispensed drugs were attributed to the ATC classification system, after the exclusion of other pharmaceutical products (dressings, homeopathy, herbal medicine, etc.). We mainly used the therapeutic subgroups (2nd level of the classification, e.g. A10 = anti-diabetic drugs), but a more detailed level of the classification was sometimes required to identify a specific disease (e.g. therapy against HIV disease). Subgroups that could not be matched with any morbidity category (blood products, anesthesia, etc.) were not taken into account (Additional file 1).
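The attribution of dispensed drugs to morbidity groups can be pictured as a prefix lookup on ATC codes. The following is only a minimal sketch, not the study's algorithm: the function name and the two-entry mapping are hypothetical, and the mapping keeps only two pairs mentioned in this article (A10 for diabetes, H03 for thyroid disorders). Trying the longest prefix first is an assumption that lets entries at a more detailed ATC level override 2nd-level entries.

```python
# Hypothetical, deliberately tiny mapping; the study's full list is in its Table 3.
ATC_TO_GROUP = {
    "A10": "diabetes",           # 2nd-level ATC subgroup: drugs used in diabetes
    "H03": "thyroid disorders",  # 2nd-level ATC subgroup: thyroid therapy
}


def morbidity_groups(atc_codes, mapping=ATC_TO_GROUP):
    """Return the set of morbidity groups suggested by a patient's ATC codes.

    Each code is matched against the mapping by longest prefix, so that a
    more detailed entry (e.g. a 4th-level code) wins over a 2nd-level one.
    Codes without any matching prefix are ignored.
    """
    groups = set()
    for raw in atc_codes:
        code = raw.strip().upper()
        for length in range(len(code), 0, -1):
            group = mapping.get(code[:length])
            if group is not None:
                groups.add(group)
                break
    return groups


# A10BA02 (metformin) and H03AA01 (levothyroxine) are mapped;
# N02BE01 (paracetamol) has no entry in this toy mapping and is skipped.
patient_groups = morbidity_groups(["A10BA02", "H03AA01", "N02BE01"])
```

A real screen would also need the combination rules described later in the article (for instance, requiring or excluding co-dispensed drugs), which a plain lookup cannot express.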
We identified all diagnostic categories assignable to one therapeutic subgroup. Most other diagnostic categories were subdivided to match therapeutic subgroups. In other cases, we grouped diagnostic categories requiring similar treatment. For certain conditions, we combined scattered ICD-10 codes corresponding to similar treatments (e.g. bacterial infection or thrombo-embolic diseases). Face validity of each morbidity category inferred from drug information was ensured by thorough reviews conducted by a skilled physician (PH) and a skilled pharmacist (AD) as regards the clinical homogeneity of the condition and the labeled use of the drug.

Accuracy of morbidity groups identification
An algorithm identifying subjects with different conditions or diseases may be seen as a diagnostic test, and is generally assessed by the four estimates of diagnostic accuracy: sensitivity, specificity, and positive and negative predictive values. These measures presuppose that one of the classification procedures (here: morbidity categories based on inpatient ICD-10 codes) is a true gold standard. Yet we know that co-morbidities are often not recorded in hospital minimal data sets, especially if their impact on resource use is weak, or for patients with a more serious illness, where the severity of the condition and complications take precedence over chronic conditions in coding [28,29]. We thus focused mainly on the degree of agreement between the two morbidity classifications.
To ensure the identification of chronic diseases over both 2005 and 2006, a test-retest procedure was applied to all subjects classified in at least one drug-based category in 2005.
We computed the prevalence in 2005 of morbidities estimated from ambulatory drugs dispensation for all insured. Underestimation of cases due to the removal of subjects receiving medication directly from their physician was corrected via the assumption that morbidity distribution in this population was similar to the rest.
For comparisons against published estimates (external data), crude rates were standardized using gender and a distribution by five-year age categories of the Swiss population in 2005. When reference rates were available only for a specific population restricted to categories defined by age or sex, direct standardization of rates used only groups of this particular population.
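Direct standardization as described above amounts to a weighted average of stratum-specific rates, with the reference population's stratum sizes as weights. The sketch below illustrates the arithmetic only; the function name, the stratum keys and all numbers are hypothetical and are not taken from the study or from Swiss population data. Restricting the sum to strata present in both inputs mirrors the case where the reference covers only a specific age or sex group.

```python
def directly_standardized_rate(stratum_rates, reference_population):
    """Directly standardized rate.

    stratum_rates: {stratum: crude rate observed in the study population}
    reference_population: {stratum: number of persons in the reference population}
    Only strata present in both mappings are used.
    """
    strata = [s for s in stratum_rates if s in reference_population]
    total = sum(reference_population[s] for s in strata)
    if total == 0:
        raise ValueError("no common stratum with a non-zero reference population")
    weighted = sum(stratum_rates[s] * reference_population[s] for s in strata)
    return weighted / total


# Hypothetical example with strata defined by sex and five-year age band.
rates = {("F", "40-44"): 0.02, ("M", "40-44"): 0.04, ("M", "45-49"): 0.10}
reference = {("F", "40-44"): 300, ("M", "40-44"): 100}
standardized = directly_standardized_rate(rates, reference)
```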

Statistical analysis
The results according to drug- and diagnoses-based information may be arranged in a four-cell table of proportions, as shown in Table 1.
The Kappa coefficient (K c), described by Cohen, is commonly used to assess agreement between two ratings [30]. It is obtained from the proportion of observed agreement, o = a + d, and the expected agreement, e = (a + c)(a + b) + (c + d)(b + d), as follows:
$$K_c = \frac{o - e}{1 - e}$$
Zero indicates only chance agreement; 1 indicates complete agreement beyond chance.
One limitation of K c is that the measure ignores the relative utility of false positives (b) versus false negatives (c). To deal with this problem we used the weighted Kappa coefficient K w, where the weight indicates the relative importance of false negatives versus false positives [31,32]. K 0 (weight = 0) is recommended if expected false negatives (PQ') have zero utility; K 1 (weight = 1) if expected false positives (P'Q) have zero utility (see Additional file 1 for details).
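As an illustration, the three coefficients can be computed from the four cells of Table 1. The sketch below is not the authors' code. It assumes that the weighted Kappa takes the form K w = (ad - bc) / (w PQ' + (1 - w) P'Q), with P = a + c (positive by diagnoses), Q = a + b (positive by drugs), P' = 1 - P and Q' = 1 - Q, so that K 0 and K 1 are the two extremes and K c corresponds to w = 1/2. The function name is hypothetical, and the definition actually used in the study is the one given in Additional file 1.

```python
def kappa_coefficients(a, b, c, d):
    """Return (K_c, K_0, K_1) for a 2x2 agreement table.

    a: positive by drugs and by diagnoses
    b: positive by drugs only (false positives)
    c: positive by diagnoses only (false negatives)
    d: negative by both
    Counts or proportions are accepted; they are rescaled to sum to 1.
    A ZeroDivisionError is raised for degenerate tables (a margin equal to 0 or 1).
    """
    total = a + b + c + d
    a, b, c, d = (x / total for x in (a, b, c, d))

    p = a + c                       # P: prevalence according to diagnoses
    q = a + b                       # Q: prevalence according to drugs
    observed = a + d                # o
    expected = p * q + (1 - p) * (1 - q)  # e
    k_c = (observed - expected) / (1 - expected)

    excess = a * d - b * c          # agreement beyond chance, equal to a - P*Q
    k_0 = excess / ((1 - p) * q)    # weight 0: scaled by expected false positives P'Q
    k_1 = excess / (p * (1 - q))    # weight 1: scaled by expected false negatives PQ'
    return k_c, k_0, k_1


# Hypothetical counts, for illustration only.
k_c, k_0, k_1 = kappa_coefficients(200, 100, 50, 650)
```

Under this assumed form, K c is the harmonic mean of K 0 and K 1, so a category with K c above a threshold always has at least one weighted Kappa above the same threshold.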
Use of drug-based morbidities for measuring health care indicators was grounded on the values of the three Kappas, K c, K 0 and K 1. Agreement between illnesses screened by delivered drugs and inpatient-coded diagnoses must be high to compare similar clinical situations. As suggested by Landis and Koch, we considered that a K c over 0.4 describes a minimal level of agreement [33]. We relaxed this criterion if one of the weighted Kappas (K 0 or K 1) was greater than 0.4 in the following two situations: existing alternatives to drug treatment explained a substantial proportion of false negatives (K 0 > 0.4 required), or the high risk of under-reporting a non-severe morbidity explained a substantial proportion of false positives (K 1 > 0.4 required). We analyzed false positives and negatives in order to locate the potential errors in the two screening methods and, if possible, to improve the screening algorithm based on drug information.
To facilitate interpretation of the results, we chose to present the findings according to three potential fields of application of a routine health status measure: morbidity indicators, ambulatory care cost control, and health insurers' risk adjustment (Table 2, col. use). Morbidity indicators, i.e. incidence or prevalence measures (M in Table 2), require accurate disease detection, with K c greater than 0.4, or K 1 > 0.4 if false positives are explained by under-coded diagnoses. However, criteria can be relaxed to K 0 or K 1 > 0.2 for ambulatory cost control or risk adjustment (C and R in Table 2), since it is better to detect some morbidity rather than none. Note that only chronic diseases are relevant to insurers' risk adjustment, given that the aim of that procedure is to forecast costs. Finally, despite satisfactory concordance, caution should be exercised when considering conditions for which the indication of drugs might be uncertain or is prone to practice variations (Table 2, footnote c).
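The numeric part of these criteria can be summarized as a small decision rule. This is a simplified reading of the thresholds stated above, not the procedure actually applied in the study: the function and parameter names are hypothetical, and the clinical review of false positives, false negatives and drug indications, which also determined the final classification, is reduced here to a single boolean flag.

```python
def candidate_uses(k_c, k_0, k_1, chronic, undercoding_explains_fp=False):
    """Return the set of candidate uses for one morbidity category.

    "M": morbidity indicators, "C": ambulatory cost control,
    "R": insurers' risk adjustment (chronic conditions only).
    undercoding_explains_fp: reviewer's judgment that false positives are
    explained by under-coded hospital diagnoses (assumed input).
    """
    uses = set()
    if k_c > 0.4 or (undercoding_explains_fp and k_1 > 0.4):
        uses.add("M")
    if k_0 > 0.2 or k_1 > 0.2:
        uses.add("C")
        if chronic:
            uses.add("R")
    return uses


# Hypothetical coefficients, for illustration only.
example = candidate_uses(k_c=0.30, k_0=0.20, k_1=0.55, chronic=True,
                         undercoding_explains_fp=True)
```

The sketch does not encode the final caveat of the paragraph above: a category may still be set aside when the indication of its tracer drugs is uncertain or prone to practice variations.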
Year-to-year category stability was measured by the K 1 coefficient. We expected chronic diseases detected in 2005 to also be detected in 2006, as reflected by a high K 1, i.e. > 0.6 [33].
Cohen Kappa coefficients were given with a 95% confidence interval [34].
Table 3 lists morbidity groups with corresponding ICD-10 and ATC codes. Eighteen diagnostic categories of the ICD shortlist were left unmodified, and 31 were subdivided to fit drug categories. Five morbidity groups were built by grouping diagnostic categories, and six by grouping subcategories. Thus we obtained 60 morbidity groups derivable from drugs dispensation. Morbidity groups were attributed independently to all insured inpatients from coded diagnoses (Table 3, ICD-10 column) and dispensed drugs (Table 3, ATC codes).

Results
The studied population included 108,915 insured enrollees followed throughout 2005 and 2006; they were hospitalized at least once, and did not obtain over 5% of their drugs via self-dispensation by a physician. Sixty-four percent of hospitalized enrollees (N = 70,083) were classified in at least one of the morbidity groups listed in Table 3 (average number of categories: 2.75). Eighty percent (N = 86,915) took a drug associated with at least one of these morbidity categories (average number of categories: 6.08). The mean age of the studied population was 53.1 years (SD 22.5), and 44% were men. Results for the 60 morbidity groups identified by drugs or hospital diagnoses are given in Table 2. All K c confidence intervals were narrow, i.e. not more than 1% above or below the estimates.
Five morbidity categories (transplant, diabetes, HIV disease, hypertension and thyroid disorders) had a K c exceeding 0.4, justifying the use of drug-based information for morbidity indicators, cost control and risk adjustment purposes. As most antihypertensive agents are also recommended for heart failure, we combined those two categories and obtained an enhanced agreement between drugs and diagnoses.
Eight conditions can be treated without drugs (false negatives), justifying a K 0 greater than 0.4: no indication of long-term treatment for progressive multiple sclerosis, non-nutritional anemia, or chronic hepatitis, surgery for ischemic heart disease, conduction disorders, or malignant disease, and the psychosocial treatment of alcohol or opioid abuse. Considering chronic renal failure and chemotherapy as proxies of non-nutritional anemia (usual complications), we observed a significantly increased K 0 (0.63), suggesting a possible under-coding of the condition. All these conditions identified from drug dispensations correspond to actual morbidities, justifying their use for cost control and risk adjustment.
Five conditions were related to obviously under-coded diagnoses, with a K 1 over 0.4: glaucoma, hyperlipidemia, functional digestive disorders, prostate hyperplasia, and osteoporosis. These conditions are often not coded because they seldom merit hospital treatment. We also found evidence of under-coding for four other morbidities with K 1 > 0.4, as shown by the improvement of K 0 to fair values when the screen was restricted to those subjects who received the most treatment and were thus more likely to feature in the hospital data. Indeed, K 0 increased from 0.18 to 0.36 for mood disorders when the screen was restricted to patients taking three classes of psychotropic drugs; from 0.25 to 0.46 for reactive airway disease (RAD) when screening criteria required inhaled and systemic corticosteroids; from 0.31 to 0.43 for Alzheimer's disease when criteria included memantine use (indicated for more severe impairments); and from 0.25 to 0.33 for inflammatory bowel diseases (IBD) when criteria included systemic corticosteroids or immunosuppressants in addition to the tracer drugs.
Three conditions were difficult to detect because some tracer drugs can be prescribed for other diseases, which explains the false positives and the use of the K 1 value as the criterion. The following refinements of the algorithms significantly improved K c values without excessively lowering K 1 values (see Table 2): removing the association of an anticholinergic medication and a neuroleptic from the screen of Parkinson's disease (treatment of neuroleptic-induced extrapyramidal symptoms); removing the association of neuroleptic and anticholinesterase drugs from the screen of psychotic disorders (treatment of behavior disorders of dementia); removing the association of opioids with gabapentin or pregabalin from the screen of epilepsy (treatment of chronic pain). There was also some evidence of under-coding of epilepsy, because K 0 increased to 0.35 when we restricted the screen to patients with multiple antiepileptic drugs. Two morbidity groups, for which tracer treatment is mainly preventive (thromboembolism risk or disease, and diseases of esophagus/peptic ulcer), had high K 1 but very low K 0 (<= 0.10). Extending thrombo-embolic disease to include ischemic heart disease and thrombogenic cardiac arrhythmias increased all Kappa values, suggesting that secondary prevention is often the treatment aim. Removing patients taking a nonsteroidal anti-inflammatory drug (primary prevention hypothesis) improved the accuracy of peptic disease screening only negligibly.
Thirty-two morbidity categories exhibited poor fit between diagnoses and drugs. Most of them were acute conditions requiring time-limited drug treatment, or minor conditions that seldom occur or are seldom recorded in hospitals. We considered that if such conditions (migraine, mycosis, acne for instance) had K 1 > 0.2, they corresponded to an actual morbidity that had not been recorded in the hospital data. However, the uncertainty surrounding the appropriateness of treatment precluded the adoption of several groups in spite of fair K 1 (see Table 2, last column). The same limitation prompted cautious use of most categories that reached an acceptable K 1 but were detected by tracer drugs with multiple indications: hypnotics, painkillers, nutritional supplements.
At the end of the analysis, 15 morbidity groups were retained for morbidity indicators (Table 2, letter M), to which 16 were added for insurers' risk adjustment (letter R) and six further groups for ambulatory cost adjustment (letter C). Twenty-three were not retained because of their poor accuracy or because their treatment was prone to practice variations.
Table 4 shows the reproducibility of drug-based morbidity categories from one year to the next. Chronic conditions, for which drug-based information performed best, were also more reproducible. All acute conditions had poor reproducibility.
Morbidity prevalences (crude and adjusted) inferred from 2005 drug information for the whole insured population (N = 2,028,620; mean age 55.5 years, SD 22.7; 39.3% men) are shown in Table 5 and compared with available national values or estimates from other external sources. All of our estimates were fairly close to the reference estimates, with a few exceptions: Parkinson's disease and transplants were overestimated (by factors of 2 and 3, respectively), whereas HIV, Alzheimer's disease, prostate hyperplasia and osteoporosis were underestimated by a factor between 2 and 4.

Discussion
Our drugs-based morbidity groups include the majority of chronic categories of the most recent CDS-derived tools [12][13][14][15][16][17][18][19]. The few CDS categories our classification ignored were those deduced from devices or non-pharmaceutical prescriptions (urinary incontinence, ostomy, neurologic bladder, malnutrition), and three with poor screening (pancreatic insufficiency, hyperkalemia and liver failure, the latter removed by other authors due to different indications of ammonemia detoxicants in many countries) [17]. A few chronic illnesses and seventeen acute conditions were added. Although we retained most ATC codes of the revised CDS of Kuo et al., many drugs were added and some removed [14].
Fifteen chronic conditions (Table 2, M in last column) exhibited sufficient agreement with their diagnoses-based counterparts to be considered alternative strategies to diagnosis for capturing similar populations. Furthermore, all chronic conditions but four (IBD, mood disorders, prostate hyperplasia and RAD) exhibited substantial or almost perfect reliability on the test-retest procedure. Finally, prevalence estimates largely agreed with estimates from national epidemiological studies whenever prevalence information was available for Switzerland. Similar age and gender distributions were found for diabetes [39,40], treated hyperlipidemia [35,51], treated hypertension [35,52], IBD [47] and reactive airway disease [38]. Our HIV disease prevalence agrees with the national estimate of 0.3%, given that 75% of subjects registered in the Swiss HIV cohort, and thus subject to close follow-up and compulsory treatment, receive an antiretroviral therapy [53]. The moderate reproducibility of IBD, mood disorders and RAD may reflect the fact that therapy varies with the severity of such conditions, which are characterized by periods of exacerbation and remission [54]. For prostate hyperplasia, the moderate reproducibility may reflect failed treatment and a switch to surgery [55].
For nine conditions (see Table 2, K 0 > 0.4 and indicated in the last column by C, R), drug-based information accurately identified only a subset of the population defined by diagnoses. These drugs are often used in a specialist care setting (e.g. chronic hepatitis, malignant neoplasm, multiple sclerosis, neutropenia) and thus might detect individuals with special or more costly care needs. Most conditions had only fair or moderate test-retest reliability, reflecting time-limited treatment. Although their ability to describe the distribution of illnesses is poor, these categories fit medical conditions sufficiently well to be used to analyze medical practices and for risk adjustment, providing better identification of the severity of a disease than inpatient diagnoses alone.
Restricting tuberculosis screening to active cases (i.e. those treated by more than one therapeutic agent) enhanced accuracy; low sensitivity may be due to time-limited therapy. However, our estimated prevalence agrees with the national estimates [50].
For all other conditions, agreement between hospital diagnoses and drug information was poor, which does not mean that information inferred from drugs is useless. The relationship between drug-based and diagnoses-based screening is not straightforward and would benefit from closer examination. Epilepsy and psychotic disorders are two examples where diagnoses-based conditions are much less frequent than drug-based conditions, but the interpretation is likely to differ. For epilepsy, where treatment is often continued several years after the last seizure, many subjects might have treatment renewed without a coded diagnosis. In such a case, drug-based information could offer a more complete overview of the condition. On the other hand, it seems unlikely that the diagnosis will be overlooked for psychotic disorders. Because they are used to treat other severe psychiatric disorders, antipsychotic drugs cannot be consistently associated with overlooked psychotic disorders.
Categories in which drug information detects a greater number of conditions than diagnoses pose a particular problem, since therapy might reflect medical practices more than true conditions. For instance, primary prevention of peptic ulcer in patients taking nonsteroidal anti-inflammatory drugs does not provide evidence of an active disease, and therapy for functional dyspepsia may be disputable. Although the acute nature of bacterial infection may explain the poor fit between diagnoses and drug-based information, the fact that antibiotic therapy is often prescribed inappropriately is an important aspect when considering pharmacy-based screening. The high prevalence of other broadly defined conditions (e.g. pain, certain mental and behavioral disorders) may also indicate that they are overestimated.
While some drugs-based morbidity groups were very specific (e.g. tuberculosis, vertigo, psoriasis, neutropenia), others were broad (viral diseases, malignant neoplasms). For the latter, more accurate information could be obtained from other data sources, including cancer registries. Missing information due to hospital drug dispensation might also explain the poor sensitivity for malignant neoplasm.
The most widespread application of a disease status measure computable from routinely available data is to correct for confounding when comparing health care service indicators. The rate of error that is tolerable when estimating a population's health from drugs dispensation depends on the purpose of the indicator [1]. Morbidity indicators require satisfactory agreement between drugs-based and diagnoses-based categories (see Table 2). This interpretation rests on the Kappa thresholds defined by the authors. Only 15 categories fit this purpose. For other indicators, such a stringent criterion is not required. Inpatient diagnoses have limited pertinence, as only a minority of enrollees is hospitalized, particularly under age 65. Consequently, some data are better than none, but only if the measure creates no perverse incentives. Several categories that reflect medical practices rather than true morbidity must be viewed with caution, even when they have good predictive performance (overuse of medication).
The main limitation of our study is that we restricted the validation of the drug-based classification to inpatients, thus underestimating specificity because chronic co-morbidities are under-reported. On the other hand, sensitivity is measured on a population suffering from more severe conditions and might be overestimated. Hospital coding practices might vary across countries due to differences in coding rules, coding purposes such as hospital payment, or the thoroughness of secondary diagnosis coding, which is sometimes limited by the data fields available. Therefore, caution should be exercised when generalizing our results to other countries. Some conditions, mainly those treated in ambulatory settings, are poorly represented. Further studies conducted in ambulatory settings might rate drugs-based acute or milder categories differently.
Another problem is that we may be identifying incidental users of a drug rather than a real condition; for example, having been treated by a proton-pump inhibitor or a painkiller does not mean having a disease.

Conclusion
We defined sixty morbidity categories that may in theory be related to a particular drug signature that might be applied internationally. Drug information was a good proxy of diagnoses to identify 15 chronic conditions, providing useful information for epidemiological studies. Although the accuracy of detection was only fair, twenty-two other morbidities could also be exploited for health insurers' risk adjustment or ambulatory cost control. Several categories were excluded because they are prone to variations in prescribing (pain, bacterial infection, peptic ulcer). Some acute diseases poorly represented in hospitalized populations should be studied further on outpatient samples. Further research should also focus on more detailed validation, e.g. using medical records or other more precise data.

Additional file
Additional file 1: Appendix A. Morbidity groups that cannot be inferred from drug dispensations. Appendix B. Drugs that did not screen for specific morbidities (ATC codes). Appendix C. Weighted Kappa.

Competing interests
The authors declare that they have no competing interests.
Authors' contributions
ES prepared data and participated in the statistical analysis. AD reviewed the correspondence between drugs and diagnoses based groups. PH and YE designed and conducted the study, and drafted the manuscript. All authors have read and approved the final manuscript.