  • Research article
  • Open access

The validity of using ICD-9 codes and pharmacy records to identify patients with chronic obstructive pulmonary disease



Administrative data is often used to identify patients with chronic obstructive pulmonary disease (COPD), yet the validity of this approach is unclear. We sought to develop a predictive model utilizing administrative data to accurately identify patients with COPD.


Sequential logistic regression models were constructed using 9573 patients with postbronchodilator spirometry at two Veterans Affairs medical centers (2003-2007). COPD was defined in two ways: 1) FEV1/FVC < 0.70, and 2) FEV1/FVC < lower limit of normal. Model inputs included age, outpatient or inpatient COPD-related ICD-9 codes, and the number of metered dose inhalers (MDI) prescribed over the one year prior to and one year post spirometry. Model performance was assessed using standard criteria.


4564 of 9573 patients (47.7%) had an FEV1/FVC < 0.70. The presence of ≥1 outpatient COPD visit had a sensitivity of 76% and specificity of 67%; the AUC was 0.75 (95% CI 0.74-0.76). Adding the use of albuterol MDI increased the AUC of this model to 0.76 (95% CI 0.75-0.77) while the addition of ipratropium bromide MDI increased the AUC to 0.77 (95% CI 0.76-0.78). The best performing model included: ≥6 albuterol MDI, ≥3 ipratropium MDI, ≥1 outpatient ICD-9 code, ≥1 inpatient ICD-9 code, and age, achieving an AUC of 0.79 (95% CI 0.78-0.80).


Commonly used definitions of COPD in observational studies misclassify the majority of patients as having COPD. Using multiple diagnostic codes in combination with pharmacy data improves the ability to accurately identify patients with COPD.



Chronic obstructive pulmonary disease (COPD) is a significant cause of morbidity and mortality in the United States and throughout the world [1]. COPD consumes substantial healthcare resources and is among the most expensive medical conditions in the United States [2, 3]. Due to the magnitude of the public health and economic burden of COPD, investigators are actively researching all aspects of the genetics, biology, pathophysiology, epidemiology, pharmacotherapy, and healthcare delivery of COPD [4].

The Global Initiative for Chronic Obstructive Lung Disease (GOLD), a partnership between the World Health Organization (WHO) and the National Heart, Lung and Blood Institute, defines COPD as the presence of postbronchodilator airflow limitation documented as a fixed ratio FEV1/FVC < 0.7 on spirometry [5]. However, there continues to be disagreement among professional societies as to the optimal physiologic criteria to define COPD [6]. While investigators have performed spirometry in population samples to quantify the prevalence of COPD throughout the world, such studies are expensive and time consuming to conduct [7-9]. The challenges of obtaining spirometry limit the ability of investigators to identify patients with COPD and investigate differences in COPD care practice across broad geographic regions within the United States.

Many investigators combat these issues by utilizing administrative databases to provide information about the epidemiology of and the care delivered to patients with COPD [10]. The literature is replete with examples of the use of COPD International Classification of Diseases, 9th Revision, Clinical Modification (ICD-9) diagnosis codes to identify COPD cases [10-15] and COPD exacerbations [16-18]. Despite the common use of ICD-9 codes to define COPD in the literature, there is limited data on the validity of such codes [16, 19-21]. Prior studies examining the validity of ICD-9 codes utilize medical record review [20, 21] or physician diagnosis [16] as the gold standard for COPD. There is limited data characterizing the performance of ICD-9 codes when spirometry is used as the gold standard [22].

We sought to develop a predictive model that would best identify COPD patients using administrative data when spirometry was the gold standard. We focused on determining the performance of outpatient and inpatient ICD-9 codes, but evaluated the ability of additional information, such as age, pharmacy records, and smoking status, to improve the performance of ICD-9 codes in identifying patients with COPD.


Study design

We conducted a secondary analysis of data collected as part of an observational study of medication adherence among patients with COPD.

Setting and participants

We utilized the Department of Veterans Affairs (VA) inpatient and outpatient databases to screen all patients receiving any inpatient or outpatient care at two VA medical centers in the Pacific Northwest between January 2003 and December 2007. We defined the index date when patients entered the sample as the date on which the first pulmonary function test (PFT) including spirometry was performed. We excluded all patients who did not receive postbronchodilator spirometry from our analysis. We also excluded patients with a past or current diagnosis of lung cancer and patients with a BMI < 15 or ≥55 as these patients may have evidence of airflow obstruction for reasons other than COPD.

Data Collection and definitions

We collected demographic data, pharmacy records and the primary ICD-9 code for all outpatient and inpatient visits from one calendar year before through one calendar year after the index date, utilizing the VA computerized medical record system.

Patients with any of the following primary ICD-9 codes were considered to have a COPD-related visit: 491.xx - chronic bronchitis, 492.xx - emphysema, 493.2 - chronic obstructive asthma, 496.xx - chronic airway obstruction, not elsewhere classified. We did not include 490 - Bronchitis, not specified as acute or chronic in our administrative definition of COPD because the definition itself lacks specificity, which increases the concern about misclassification [23, 24]. Outpatient primary and secondary ICD-9 codes were those recorded during a patient encounter in any outpatient clinic, while inpatient primary ICD-9 codes were those recorded during an admission to the hospital. ICD-9 codes generated during visits to the pulmonary function laboratory were not considered in this analysis. Although secondary ICD-9 codes were considered for defining a COPD-related visit, these were uncommonly (<7% of visits) coded by providers. Comorbid conditions relevant to patients with COPD were determined using ICD-9 codes for all previous outpatient visits in the one year period prior to the index date. These included a diagnosis of lung cancer (162.x, 163.x), acute coronary syndrome (410.xx, 411.xx), congestive heart failure (398.91, 415.xx, 416.xx, 425.x, 428.x), diabetes (250.x), hypertension (401.xx-405.xx), atrial fibrillation (427.xx), depression (311, 300.4, 296.2x, 296.3x), and schizophrenia (295.xx).
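The code list above can be written as a small matching rule. The following is a minimal sketch only: the function names `is_copd_related` and `has_copd_visit`, and the assumption that codes arrive as dotted strings such as "491.21", are illustrative and are not taken from the study's data pipeline.

```python
# Illustrative sketch: flag a primary ICD-9 code as COPD-related using the
# categories listed in the text (491.xx, 492.xx, 493.2, 496.xx; 490 excluded).
# Function names and the dotted-string input format are assumptions.

COPD_CATEGORIES = ("491", "492", "496")   # any subcode in these categories counts
CHRONIC_OBSTRUCTIVE_ASTHMA = "493.2"      # only the 493.2 branch of asthma counts


def is_copd_related(icd9_code: str) -> bool:
    """Return True if the code falls in the COPD code list described above."""
    code = icd9_code.strip()
    category = code.split(".")[0]
    if category in COPD_CATEGORIES:
        return True
    # 493.2 and its fifth-digit subcodes (for example 493.20, 493.22)
    return code.startswith(CHRONIC_OBSTRUCTIVE_ASTHMA)


def has_copd_visit(primary_codes, minimum=1):
    """True if at least `minimum` visits carry a COPD-related primary code."""
    return sum(is_copd_related(c) for c in primary_codes) >= minimum
```

Raising `minimum` to 3 corresponds to the stricter "≥3 codes" indicator described under Model development.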

Smoking was assessed at the time of spirometry; patients were classified as never/former or current based upon self report. We determined the total number of metered dose inhaler (MDI) canisters prescribed over the two year period to each patient for both albuterol and ipratropium bromide (categorized as: albuterol - 0, 1-5, 6+ MDI; ipratropium - 0, 1-2, 3+) using the Veterans Integrated Service Network (VISN) data warehouse. The VISN data warehouse contains the complete pharmacy records for patients who filled prescriptions within the VISN region. These data include the drug name, class, prescription identification number, prescription fill dates (primary and refills), number of allowable refills, date of next allowable refill, amount dispensed, day supply, unit price of the medication and directions for use. Nebulized medications were not included in the calculation of these totals. Tiotropium was not included in our analysis as it was adopted slowly in the VA because of formulary restriction.
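The canister categories described above amount to simple binning of a two-year count. A minimal sketch, assuming the counts are already available as non-negative integers (the function names are hypothetical):

```python
# Illustrative sketch of the MDI canister-count categories from the text.
# Cut points come from the text; function names are assumptions.

def albuterol_category(canisters: int) -> str:
    """Albuterol MDI canisters over the two-year window: 0, 1-5, 6+."""
    if canisters < 0:
        raise ValueError("canister count cannot be negative")
    if canisters == 0:
        return "0"
    return "1-5" if canisters <= 5 else "6+"


def ipratropium_category(canisters: int) -> str:
    """Ipratropium MDI canisters over the two-year window: 0, 1-2, 3+."""
    if canisters < 0:
        raise ValueError("canister count cannot be negative")
    if canisters == 0:
        return "0"
    return "1-2" if canisters <= 2 else "3+"
```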

COPD criterion standard

First, we categorized patients using the GOLD criterion: postbronchodilator FEV1/FVC < 0.70 to define COPD. Although experts are actively debating which COPD criterion standard is optimal, GOLD is criticized for identifying false positives among older patients [25, 26]. The second definition used a postbronchodilator FEV1/FVC < lower limit of normal (LLN), where the LLN is defined by the referent equations of Hankinson, et al [27]. At both centers, spirometry was performed in accordance with the ATS guidelines for reproducibility [28].
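The two criterion standards can disagree for the same patient, which a short sketch makes concrete. The function name is hypothetical, and the LLN is passed in as a number because the Hankinson reference equations are not reproduced here; the example LLN of 0.64 is an invented illustrative value, not one from the study.

```python
# Illustrative sketch: apply both criterion standards to one spirometry result.
# `lln_ratio` must be supplied by the caller (for example from published
# reference equations); it is NOT computed here.

def classify_obstruction(fev1_l: float, fvc_l: float, lln_ratio: float) -> dict:
    """Return the postbronchodilator FEV1/FVC ratio and both classifications."""
    ratio = fev1_l / fvc_l
    return {
        "ratio": ratio,
        "gold": ratio < 0.70,      # fixed-ratio (GOLD) criterion
        "lln": ratio < lln_ratio,  # lower-limit-of-normal criterion
    }


# Hypothetical older patient: ratio of about 0.67 with an assumed LLN of 0.64.
result = classify_obstruction(fev1_l=2.0, fvc_l=3.0, lln_ratio=0.64)
```

In this hypothetical case the fixed ratio labels the patient as obstructed while the LLN criterion does not, which is the kind of discordance among older patients that motivates reporting both standards.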

Statistical analysis

Bivariate comparisons utilized the t-test or χ² test as appropriate for the distribution of the variable. We considered a p-value < 0.05 statistically significant.

Model development

We pre-specified an approach involving the sequential addition of variables to a multiple logistic regression model for each standard. We considered alternative approaches to model development including classification and regression trees (CART) and neural networks, but neither of these approaches has been consistently shown to outperform logistic regression for most classification problems in medicine [29-31]. We first assessed a model containing a single variable representing the presence/absence of ≥1 outpatient COPD-related ICD-9 code. We then increased the number of codes represented by the indicator variable to ≥3 codes. We then evaluated the performance of ≥1 inpatient COPD-related ICD-9 code. Next, we added pharmacy variables and age to the prior models to characterize changes in the performance of the model after such additions. Because smoking status was assessed by interviewing the patient at the time of spirometry, it is not considered an administrative variable. Nevertheless, this variable may be available to investigators when identifying cohorts of patients with COPD so we added it at the last stage of model development.

We stratified all models by age ≥ 65 years by interacting age (≥65 years) with all variables in each model. This approach allowed separate coefficient estimates for patients ≥ 65 and < 65 years of age, so that the model can be applied to Medicare and non-Medicare patients, while providing a single value for each measure of model performance (e.g., AUC, Hosmer-Lemeshow statistic).
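One way to picture the age interaction is as an expansion of each patient's predictors into main effects plus age-interacted copies. The sketch below is an assumption about the parameterization (main effect plus interaction term), and the function and column names are hypothetical; it is not the authors' Stata specification.

```python
# Illustrative sketch: expand predictors so a single logistic model can carry
# separate effects for patients aged >= 65 and < 65.
# Parameterization and names are assumptions.

def interact_with_age(predictors: dict, age: float) -> dict:
    """Return a design row with an age >= 65 indicator and interaction terms."""
    older = 1 if age >= 65 else 0
    row = {"age_ge65": older}
    for name, value in predictors.items():
        row[name] = value                          # effect among patients < 65
        row[f"{name}_x_age_ge65"] = value * older  # additional shift for >= 65
    return row
```

For a patient under 65 every interaction column is zero, so only the main-effect coefficients apply; for a patient 65 or older the effective coefficient is the sum of the main effect and its interaction.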

Model evaluation

Model performance was assessed by evaluating the sensitivity and specificity of each model (cut point for the predicted probability of COPD = 0.5), and the discrimination and calibration of each model. Discrimination was determined by calculating the area under the receiver operating characteristic curve (AUC or C-statistic). Calibration was assessed using the Hosmer-Lemeshow goodness-of-fit statistic. We calculated the Brier score for each model as an alternative measure of accuracy that incorporates features of discrimination and calibration as a single measure. The Brier score is calculated as the mean squared error of the model and describes the magnitude by which the predicted probability of COPD generated by the model deviates from the true COPD status of the patient. Because model performance rather than parsimony was our primary concern, we did not employ measures such as the Akaike information criterion during the model building process [32].
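The measures named above can be computed directly from observed COPD status (0 or 1) and model-predicted probabilities. The following is a minimal pure-Python sketch with hypothetical function names; the Hosmer-Lemeshow statistic is omitted, and the AUC is computed by the pairwise (concordance) definition rather than by any particular statistical package's routine.

```python
# Illustrative sketch of three performance measures described in the text.
# y_true holds 0/1 COPD status; p_pred holds predicted probabilities.

def sensitivity_specificity(y_true, p_pred, cut=0.5):
    """Classify as COPD when the predicted probability exceeds the cut point."""
    tp = fn = tn = fp = 0
    for y, p in zip(y_true, p_pred):
        predicted_positive = p > cut
        if y == 1:
            tp += predicted_positive
            fn += not predicted_positive
        else:
            fp += predicted_positive
            tn += not predicted_positive
    return tp / (tp + fn), tn / (tn + fp)


def brier_score(y_true, p_pred):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)


def auc(y_true, p_pred):
    """Probability that a random case outranks a random non-case (ties = 0.5)."""
    cases = [p for y, p in zip(y_true, p_pred) if y == 1]
    controls = [p for y, p in zip(y_true, p_pred) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in cases for b in controls)
    return wins / (len(cases) * len(controls))
```

A lower Brier score indicates better accuracy, whereas a higher AUC indicates better discrimination.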

For the best performing model containing only administrative variables we determined the sensitivity, specificity, positive and negative predictive values for three cut points (0.25, 0.5 and 0.75) in the model-based predicted probability of COPD. These cut points illustrate the tradeoff in sensitivity and specificity of the model over the range of predicted values. Patients with a model-based probability of COPD greater than the cut point were classified as having COPD. Because the prevalence of COPD in our cohort was high, we also estimated the positive and negative predictive values for the best performing model using prevalence estimates closer to that experienced in the general population (10-20%) [7, 9, 25].
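Re-estimating predictive values at a different prevalence follows from Bayes' rule applied to a fixed sensitivity and specificity. The sketch below is illustrative (the function name is hypothetical); it assumes that sensitivity and specificity carry over unchanged to the new population, which the text does not guarantee.

```python
# Illustrative sketch: positive and negative predictive values at a chosen
# prevalence, assuming sensitivity and specificity stay fixed.

def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for the given test characteristics and prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Sensitivity 0.91 and specificity 0.41 are the values reported in the Results
# for the 0.25 cut point under the GOLD standard; 0.477 is the cohort prevalence.
ppv_cohort, npv_cohort = predictive_values(0.91, 0.41, 0.477)
ppv_10pct, npv_10pct = predictive_values(0.91, 0.41, 0.10)
```

With the cohort prevalence, these inputs give values that round to the 58% and 83% reported in the Results; lowering the prevalence to 10% reduces the PPV and raises the NPV.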

Validation and sensitivity analysis

We utilized the bootstrap (2000 iterations) to internally validate the best performing model. We selected this previously described approach [33] instead of split sample internal validation because it provides a more accurate, unbiased estimate of performance in external cohorts. No external validation was performed.
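One common form of bootstrap internal validation estimates the "optimism" of an apparent performance measure and subtracts it. The sketch below shows that general procedure under stated assumptions: `fit` and `score` are caller-supplied placeholders (for example, a logistic regression fit and an AUC calculation), and nothing here is taken from the authors' Stata code.

```python
import random

# Illustrative sketch of optimism-corrected bootstrap validation.
# `fit(data)` returns a model; `score(model, data)` returns a performance
# measure where larger is better. Both are assumptions supplied by the caller.

def optimism_corrected(data, fit, score, iterations=2000, seed=0):
    """Return apparent performance minus the average bootstrap optimism."""
    rng = random.Random(seed)
    apparent = score(fit(data), data)
    total_optimism = 0.0
    for _ in range(iterations):
        # Resample patients with replacement, same size as the original cohort.
        sample = [rng.choice(data) for _ in data]
        model = fit(sample)
        # Optimism: performance in the sample it was fit on, minus performance
        # of that same model in the original cohort.
        total_optimism += score(model, sample) - score(model, data)
    return apparent - total_optimism / iterations
```

A corrected value close to the apparent value, as the authors report for the AUC, suggests little overfitting within the development cohort, although it does not replace external validation.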

Finally, we performed two sensitivity analyses to assess the impact of our cohort definition on the results. Because BMI is not captured in most administrative data sources, we re-fit all models after including all patients regardless of BMI. Since the specificity of ICD-9 codes for COPD in younger patients may be low, we also re-fit each model after excluding patients who were < 40 years-old (n = 552).

All analyses were performed using Stata 10.0 (Statacorp, College Station, TX). The institutional review boards for the University of Washington and the participating Veterans Affairs centers approved of the study.


Bivariate analysis

We identified 12,205 patients referred for spirometry during the study period. We excluded patients who had a past or current history of lung cancer (n = 330), a BMI < 15 or ≥55 (n = 68), or no assessment of bronchodilator response (n = 2234). After these exclusions, the cohort contained 9573 (78.4%) patients with at least one postbronchodilator assessment (Additional file 1). Among patients assessed, 4564 (47.7%) had fixed airflow obstruction (FEV1/FVC <0.70). Patient demographics, comorbidities, and disease severity are shown in Table 1. Compared to patients with airflow obstruction, patients without airflow obstruction were younger, more often female, and had greater prevalence of diabetes and depression. Patients with obstruction were more likely to be current smokers at the time of PFTs. Patients with fixed airflow obstruction had a lower FEV1 compared to patients without obstruction (1.74 vs. 2.74 L, p < 0.001). The most common degree of obstruction among patients with obstruction was moderate (48%). Patients with fixed airflow obstruction were also more likely to be prescribed a greater number of MDIs for both albuterol and ipratropium bromide than patients without airflow obstruction.
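The cohort counts above can be checked with simple arithmetic. This short sketch uses only the numbers stated in the paragraph; the variable names are illustrative, and it assumes the three exclusion groups do not overlap, which is consistent with the totals.

```python
# Arithmetic check of the cohort flow using the counts stated in the text.
referred = 12205
excluded = {
    "lung cancer": 330,
    "BMI < 15 or >= 55": 68,
    "no bronchodilator response assessment": 2234,
}
cohort = referred - sum(excluded.values())
obstructed = 4564

pct_retained = round(100 * cohort / referred, 1)
pct_obstructed = round(100 * obstructed / cohort, 1)
print(cohort, pct_retained, pct_obstructed)  # prints: 9573 78.4 47.7
```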

Table 1 Characteristics of cohort by presence of fixed airflow obstruction (GOLD/ATS/ERS standard)

Multivariable analysis

The performance characteristics of the series of models we developed utilizing ICD-9 codes, pharmacy data, age, and smoking status are presented for each reference standard in Table 2. In general, ICD-9 codes by themselves exhibited a modest ability to classify a patient as having airflow obstruction, regardless of the standard (models 1-3), with outpatient codes providing better discriminative ability than inpatient codes (model 1 vs. model 4). Increasing the minimum necessary number of outpatient visits with a primary ICD-9 code for COPD to define airflow obstruction resulted in minimal impact on the AUC beyond that provided by the presence of one or more outpatient diagnostic codes (models 1-3). However, specificity of the model improved when more outpatient ICD-9 codes were required to define obstruction. When added to a model with ≥1 outpatient ICD-9 code, MDI canister counts improved the discriminative ability of the model (models 5, 6). Ipratropium bromide (model 5) appeared to improve the AUC to a slightly greater extent than albuterol (model 6) MDI canisters (AUC 0.77 vs 0.76, respectively). The best performing model utilizing only administrative data (model 8) included the following variables: ≥6 albuterol MDI, ≥3 ipratropium MDI, ≥1 outpatient ICD-9 code, ≥1 inpatient ICD-9 code, and age (AUC = 0.79, 95% CI 0.78-0.80, Table 2). The overall AUC was qualitatively larger for the GOLD standard than for the LLN standard, although changes in the AUC were of similar magnitude when variables were entered into the model. The addition of self-reported smoking collected at the time of PFT assessment minimally changed the AUCs and Brier scores for both standards.

Table 2 Sensitivity (sens), specificity (spec), discriminative performance (AUC) and calibration (Brier score, Hosmer-Lemeshow [H-L] goodness of fit p-value) for models based on two years of administrative data

The best performing model incorporating only administrative data (model 8) for both standards was well calibrated (Hosmer-Lemeshow statistic [GOLD p = 0.86; LLN p = 0.50]). The Brier score was lowest for these models as well (GOLD 0.187; LLN 0.187).

Coefficients for the best performing model utilizing administrative data are shown in Table 3. These coefficients are presented for both airflow obstruction standards (GOLD, LLN) and are stratified by age ≥65 years. Table 4 presents the sensitivity, specificity, positive and negative predictive values for various cut points in the model-based predicted probability of COPD generated by model 8 for each diagnostic standard. Utilizing the GOLD standard, setting the cut point in the model-based predicted probability of COPD at 0.25 resulted in a sensitivity (95% CI) of 91% (90-92%), specificity of 41% (39-42%), and positive and negative predictive values of 58% (57-59%) and 83% (81-85%) respectively. Setting the cut point higher in the model-based predicted probability of COPD resulted in greater positive predictive values for both airflow obstruction standards. Estimated PPV and NPV when the prevalence of COPD is closer to population-based estimates (10 or 20%) are presented in the Additional File 2.

Table 3 Logistic regression coefficients (beta) for GOLD and LLN diagnostic standard (model 8, table 2)
Table 4 Sensitivity, specificity, PPV, and NPV for various predicted probabilities of COPD from logistic model 8 by diagnostic standard

Bootstrap internal validations of all models resulted in insignificant changes in the AUC and are therefore not reported. Inclusion of all patients regardless of BMI resulted in no substantive changes in all models (data not shown). There were no substantive changes in the models' performance when we limited the cohort to patients over 40 years-old (Additional File 3).


Utilizing over 9500 VA patients with postbronchodilator spirometry, we determined that ICD-9 codes have a moderate to good ability to discriminate patients who have fixed airflow obstruction from those who do not, with outpatient codes offering better performance than inpatient codes. The addition of a patient's age and pharmacy data, including the number of MDIs of albuterol and ipratropium bromide, to outpatient and inpatient ICD-9 codes improves the sensitivity, specificity, and overall discriminative performance of a model used to identify patients with airflow obstruction. These variables showed similar performance when utilizing GOLD criteria for airflow obstruction compared to the LLN standard for airflow obstruction.

The use of ICD-9 codes to identify cohorts of patients with COPD using administrative data is common [11-18]. Investigators and payers have utilized these codes to describe the epidemiology of COPD [12, 14, 15, 17, 24, 34-36], to evaluate the effectiveness and safety of treatments in COPD [11, 13, 18, 37, 38], and more recently, as a means to assess the quality of care provided to patients with COPD [16]. In fact, the National Committee for Quality Assurance (NCQA) and the Agency for Healthcare Research and Quality (AHRQ) both advocate for use of quality measures relying on ICD-9 code-based COPD case-identification [39, 40]. It is therefore surprising that the validity of both outpatient and inpatient ICD-9 codes for identifying patients with COPD has not been rigorously studied in large populations.

Most prior efforts to establish the validity of ICD-9 codes for COPD utilize chart review or physician consensus as the gold standard. One of the most widely referenced studies, conducted by Rawson and colleagues, utilized the 1987 Saskatchewan health care data files to assess the validity of inpatient COPD ICD-9 codes compared to both the patient's inpatient medical chart and provider service data [20]. Two hundred patient charts were randomly selected from the 4613 hospitalized patients with a primary ICD-9 code for COPD (n = 496). The charted discharge diagnosis from the patient's medical record showed exact agreement for 94.2% of these patients. However, overall concordance between physician documentation of COPD-related care and hospital discharge COPD-related ICD-9 codes (490-493, 496) was 68%. An analysis by Ginde and colleagues utilized a similar approach to determine the positive predictive value for principal ICD-9 codes to identify acute exacerbations of COPD in the emergency department [16]. A random sample of 200 patients was taken from all 644 patients with a code for COPD (491.2x, 492.8, 496) at two academic medical centers between 2005 and 2006. Chart review for these patients was used to establish the gold standard for COPD exacerbation, which was defined as: 1) the presence of a respiratory infection, 2) change in cough or 3) change in sputum with known physician diagnosed COPD. The overall positive predictive value for the presence of any of the specified codes was 97%. The positive predictive value for a code of 496 alone was 60% (95% CI 32-84%).

Finally, a more recent study using claims in Ontario, Canada examined the combination of ICD-9 outpatient codes and ICD-10 inpatient codes to identify patients with COPD cared for by community providers [19]. The combination of one or more outpatient ICD-9 codes (491.xx, 492.xx, 496.xx) or one or more inpatient ICD-10 codes (J41, J43, J44) had a sensitivity of 85% and specificity of 78.4% among 113 patients with COPD and 329 patients without COPD. An expert panel reviewed each patient's medical record to determine the gold standard for COPD. Spirometry was available in only 180 patients and details about its collection were not reported in the study. The study was further limited by employing ICD-10 codes which have yet to be universally adopted by many countries around the world.

While the studies outlined above suggest that ICD-9 codes can be used to accurately identify physician defined COPD, none universally employed spirometry to define the criterion standard for COPD. Physician diagnosed COPD may not be the optimal gold standard to define COPD. A number of previous studies highlight the difficulty physicians have in correctly identifying COPD in the absence of spirometry. In North America only 20-30% of patients billed for a COPD-related visit have had spirometry to confirm or refute the diagnosis of COPD [12, 41-43]. Up to 20% of physicians confronted with a standardized patient in a COPD exacerbation fail to correctly identify COPD as the cause of respiratory complaints [44]. These data raise concerns about the validity of the COPD gold standard used in prior studies examining the use of ICD-9 codes to identify patients.

The only study utilizing primarily spirometry to define COPD examined discrimination between patients with asthma and patients with COPD. The calculated ratio of total COPD ICD-9 codes to total respiratory ICD-9 codes demonstrated excellent performance (AUC 0.98) in differentiating patients with asthma from patients with COPD; however, because the comparator was patients with asthma, this comparison cannot be used to develop models that identify patients with COPD. Finally, unlike our study, which included over 9500 patients, this study was limited by its inclusion of only 151 patients with COPD [22].

Our study has several strengths. Our gold standard for COPD used the most rigorous definition possible, fixed airflow obstruction on spirometry, and our cohort captures a large number of patients who had a clinical indication for spirometry. This is in contrast to many of the previous studies highlighted above.

Our results also have important implications for clinical investigators and health services and health policy analysts. We present the coefficients for a model incorporating administrative variables that can be used to accurately identify patients with COPD. This equation can be used by investigators to calculate the predicted probability of airflow obstruction within novel cohorts. The sensitivity, specificity, positive and negative predictive values for cut points in the model-based predicted probability of airflow obstruction will allow an investigator to maximize sensitivity or specificity depending on the needs of the study or practice. For example, one might select a lower cut point (0.25) in the model-based predicted probability of airflow obstruction if utilizing this model to screen a clinical database to identify candidates for a COPD clinical trial. In this situation, maximizing sensitivity would capture the majority of patients with true COPD but at the cost of a large number of false positives. Study staff could access the medical records of these patients to eliminate people without airflow obstruction on spirometry.
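Applying a logistic model's coefficients to a new patient amounts to summing coefficient-weighted predictors and passing the result through the inverse logit. The sketch below illustrates that calculation only: the function names and variable names are hypothetical, and the coefficient values are PLACEHOLDERS invented for illustration; they are not the Table 3 estimates and must not be used for case identification.

```python
import math

# Illustrative sketch: predicted probability of airflow obstruction from a
# logistic model, followed by classification at a chosen cut point.

def predicted_probability(intercept, coefficients, values):
    """Inverse logit of intercept + sum(coefficient * predictor value)."""
    logit = intercept + sum(
        beta * values.get(name, 0) for name, beta in coefficients.items()
    )
    return 1.0 / (1.0 + math.exp(-logit))


def classify(probability, cut_point):
    """Label as COPD when the predicted probability exceeds the cut point."""
    return probability > cut_point


# PLACEHOLDER coefficients (hypothetical, NOT the published estimates).
placeholder_intercept = -1.0
placeholder_coefficients = {
    "outpt_code_ge1": 1.0,
    "inpt_code_ge1": 0.5,
    "albuterol_6plus": 0.5,
    "ipratropium_3plus": 0.5,
}
patient = {"outpt_code_ge1": 1, "ipratropium_3plus": 1}
p = predicted_probability(placeholder_intercept, placeholder_coefficients, patient)
screen_positive = classify(p, cut_point=0.25)
```

A screening use would favor a low cut point such as 0.25, while a study that needs few false positives would favor a higher one such as 0.75.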

We recognize several limitations to our analysis. First, we did not externally validate our model in alternative cohorts of patients. Model performance will likely drop when our model is applied to different patients as a result of geographic and temporal changes, differences in data definitions and case-mix. We assessed the optimism in the estimated AUC for our model utilizing the bootstrap which resulted in no appreciable change in the AUC, but recognize that external validation is a necessary step prior to widespread use [45]. Second, our model was derived on US veterans that were mostly older white men. This may limit the generalizability of our models if applied outside of the VA. In addition, the primary reason for collection of ICD-9 codes in VA patients is not for billing purposes. Differences in coding practice between the VA and other organizations capturing ICD-9 codes primarily for billing purposes may alter the performance of our models if applied outside the VA. Third, some degree of ascertainment bias is likely present, as we were unable to assess clinic visits and hospital admissions to non-VA facilities. Fourth, we collected ICD-9 codes from the one year pre- and one year post the date of spirometry, a time interval that may have reduced the sensitivity and specificity of the codes for COPD. For example, a provider may provide a COPD code on initial evaluation only to learn that spirometry rules out the diagnosis of COPD. Nevertheless, we believe the time interval we used is appropriate because it approximates how ICD-9 codes are screened in observational research and provides a conservative estimate of their performance.

Finally, we limited our cohort to patients referred for spirometry who received a bronchodilator during their test. This was done to ensure that we had a rigorous gold standard by which we defined COPD, but may limit the applicability of our model to only patients who are clinically referred for spirometry. Given the high prevalence of COPD in this population, and the VA more generally [46], the positive predictive value of our model will decrease if applied to a broader population. Several studies suggest that the prevalence of physiologically determined COPD is closer to 10-20% [7, 9, 25], which is considerably lower than the 48% prevalence observed in our sample. By limiting our analysis to only patients referred to spirometry we provide a conservative estimate of the model's performance if applied to a general population. Discriminating patients with COPD from those without COPD among patients who are ill enough to be referred to spirometry is likely a more difficult task than discriminating COPD patients from those without COPD among all patients in a general population. Nevertheless, the estimates of the positive and negative predictive values will change when applying our model to cohorts with different COPD prevalence. Additional testing of our model in broader populations should be done prior to widespread use.


Administrative data are ubiquitous, are employed in all aspects of healthcare, and are frequently being used to understand the health and healthcare of patients with COPD. Healthcare payers, policy makers, and investigators using administrative data to study COPD rely upon valid assessment of disease status when conducting analyses. Currently used definitions of COPD in observational studies misclassify the majority of patients as having COPD. We determined that ICD-9 codes in combination with pharmacy data can accurately identify patients with COPD. Further validation of our model is required prior to its widespread application.



AHRQ: Agency for Healthcare Research and Quality

ATS: American Thoracic Society

AUC: Area under the receiver operating characteristic curve

BMI: Body mass index

CART: Classification and regression trees

CI: Confidence interval

COPD: Chronic obstructive pulmonary disease

ERS: European Respiratory Society

FEV1: Forced expiratory volume in one second

FVC: Forced vital capacity

GOLD: Global Initiative for Chronic Obstructive Lung Disease

ICD-9: International Classification of Diseases, 9th Revision

IQR: Interquartile range

LLN: Lower limit of normal

MDI: Metered dose inhaler

NCQA: National Committee for Quality Assurance

PFT: Pulmonary function test

VA: Veterans Affairs


  1. Lopez AD, Shibuya K, Rao C, Mathers CD, Hansell AL, Held LS, Schmid V, Buist S: Chronic obstructive pulmonary disease: current burden and future projections. Eur Respir J. 2006, 27 (2): 397-412. 10.1183/09031936.06.00025805.

  2. Druss BG, Marcus SC, Olfson M, Pincus HA: The most expensive medical conditions in America. Health Aff (Millwood). 2002, 21 (4): 105-111. 10.1377/hlthaff.21.4.105.

  3. Sullivan SD, Ramsey SD, Lee TA: The economic burden of COPD. Chest. 2000, 117 (2 Suppl): 5S-9S. 10.1378/chest.117.2_suppl.5S.

  4. Maclay JD, Rabinovich RA, MacNee W: Update in chronic obstructive pulmonary disease 2008. Am J Respir Crit Care Med. 2009, 179 (7): 533-541. 10.1164/rccm.200901-0134UP.

  5. Rabe KF, Hurd S, Anzueto A, Barnes PJ, Buist SA, Calverley P, Fukuchi Y, Jenkins C, Rodriguez-Roisin R, van Weel C, et al: Global strategy for the diagnosis, management, and prevention of chronic obstructive pulmonary disease: GOLD executive summary. Am J Respir Crit Care Med. 2007, 176 (6): 532-555. 10.1164/rccm.200703-456SO.

  6. Celli BR, MacNee W: Standards for the diagnosis and treatment of patients with COPD: a summary of the ATS/ERS position paper. Eur Respir J. 2004, 23 (6): 932-946. 10.1183/09031936.04.00014304.

  7. Halbert RJ, Isonaka S, George D, Iqbal A: Interpreting COPD prevalence estimates: what is the true burden of disease?. Chest. 2003, 123 (5): 1684-1692. 10.1378/chest.123.5.1684.

  8. Halbert RJ, Natoli JL, Gano A, Badamgarav E, Buist AS, Mannino DM: Global burden of COPD: systematic review and meta-analysis. Eur Respir J. 2006, 28 (3): 523-532. 10.1183/09031936.06.00124605.

  9. Buist AS, McBurnie MA, Vollmer WM, Gillespie S, Burney P, Mannino DM, Menezes AMB, Sullivan SD, Lee TA, Weiss KB, et al: International variation in the prevalence of COPD (The BOLD Study): a population-based prevalence study. The Lancet. 2007, 370 (9589): 741-750. 10.1016/S0140-6736(07)61377-4.

  10. Gershon AS, Wang C, Wilton AS, Raut R, To T: Trends in chronic obstructive pulmonary disease prevalence, incidence, and mortality in Ontario, Canada, 1996 to 2007: a population-based study. Arch Intern Med. 2010, 170 (6): 560-565. 10.1001/archinternmed.2010.17.

  11. Macie C, Wooldrage K, Manfreda J, Anthonisen NR: Inhaled corticosteroids and mortality in COPD. Chest. 2006, 130 (3): 640-646. 10.1378/chest.130.3.640.

  12. Han MK, Kim MG, Mardon R, Renner P, Sullivan S, Diette GB, Martinez FJ: Spirometry utilization for COPD: how do we measure up?. Chest. 2007, 132 (2): 403-409. 10.1378/chest.06-2846.

  13. Lee TA, Pickard AS, Au DH, Bartle B, Weiss KB: Risk for death associated with medications for recently diagnosed chronic obstructive pulmonary disease. Ann Intern Med. 2008, 149 (6): 380-390.

  14. Shaya FT, Dongyi D, Akazawa MO, Blanchette CM, Wang J, Mapel DW, Dalal A, Scharf SM: Burden of concomitant asthma and COPD in a Medicaid population. Chest. 2008, 134 (1): 14-19. 10.1378/chest.07-2317.

  15. Shaya FT, Maneval MS, Gbarayor CM, Sohn K, Dalal AA, Du D, Scharf SM: Burden of COPD, asthma, and concomitant COPD and asthma among adults: racial disparities in a medicaid population. Chest. 2009, 136 (2): 405-411. 10.1378/chest.08-2304.

  16. Ginde AA, Tsai CL, Blanc PG, Camargo CA: Positive predictive value of ICD-9-CM codes to detect acute exacerbation of COPD in the emergency department. Jt Comm J Qual Patient Saf. 2008, 34 (11): 678-680.

  17. Tsai CL, Sobrino JA, Camargo CA: National study of emergency department visits for acute exacerbation of chronic obstructive pulmonary disease, 1993-2005. Acad Emerg Med. 2008, 15 (12): 1275-1283. 10.1111/j.1553-2712.2008.00284.x.

  18. Washko GR, Fan VS, Ramsey SD, Mohsenifar Z, Martinez F, Make BJ, Sciurba FC, Criner GJ, Minai O, Decamp MM, et al: The effect of lung volume reduction surgery on chronic obstructive pulmonary disease exacerbations. Am J Respir Crit Care Med. 2008, 177 (2): 164-169. 10.1164/rccm.200708-1194OC.

  19. Gershon AS, Wang C, Guan J, Vasilevska-Ristovska J, Cicutto L, To T: Identifying individuals with physician diagnosed COPD in health administrative databases. COPD. 2009, 6 (5): 388-394. 10.1080/15412550903140865.

  20. Rawson NS, Malcolm E: Validity of the recording of ischaemic heart disease and chronic obstructive pulmonary disease in the Saskatchewan health care datafiles. Stat Med. 1995, 14 (24): 2627-2643. 10.1002/sim.4780142404.

  21. Mapel DW, Frost FJ, Hurley JS, Petersen H, Roberts M, Marton JP, Shah H: An algorithm for the identification of undiagnosed COPD cases using administrative claims data. J Manag Care Pharm. 2006, 12 (6): 457-465.

  22. McKnight J, Scott A, Menzies D, Bourbeau J, Blais L, Lemiere C: A cohort study showed that health insurance databases were accurate to distinguish chronic obstructive pulmonary disease from asthma and classify disease severity. J Clin Epidemiol. 2005, 58 (2): 206-208. 10.1016/j.jclinepi.2004.08.006.

  23. Lacasse Y, Montori VM, Lanthier C, Maltis F: The validity of diagnosing chronic obstructive pulmonary disease from a large administrative database. Can Respir J. 2005, 12 (5): 251-256.

  24. Camp PG, Chaudhry M, Platt H, Roch M, Road J, Sin D, Levy RD: The sex factor: epidemiology and management of chronic obstructive pulmonary disease in British Columbia. Can Respir J. 2008, 15 (8): 417-422.

  25. Celli BR, Halbert RJ, Isonaka S, Schau B: Population impact of different definitions of airway obstruction. Eur Respir J. 2003, 22 (2): 268-273. 10.1183/09031936.03.00075102.

  26. Swanney MP, Ruppel G, Enright PL, Pedersen OF, Crapo RO, Miller MR, Jensen RL, Falaschetti E, Schouten JP, Hankinson JL, et al: Using the lower limit of normal for the FEV1/FVC ratio reduces the misclassification of airway obstruction. Thorax. 2008, 63 (12): 1046-1051. 10.1136/thx.2008.098483.

  27. Hankinson JL, Odencrantz JR, Fedan KB: Spirometric reference values from a sample of the general U.S. population. Am J Respir Crit Care Med. 1999, 159 (1): 179-187.

  28. Miller MR, Hankinson J, Brusasco V, Burgos F, Casaburi R, Coates A, Crapo R, Enright P, van der Grinten CP, Gustafsson P, et al: Standardisation of spirometry. Eur Respir J. 2005, 26 (2): 319-338. 10.1183/09031936.05.00034805.

  29. Tu JV: Advantages and disadvantages of using artificial neural networks versus logistic regression for predicting medical outcomes. J Clin Epidemiol. 1996, 49 (11): 1225-1231. 10.1016/S0895-4356(96)00002-9.

  30. Clermont G, Angus DC, DiRusso SM, Griffin M, Linde-Zwirble WT: Predicting hospital mortality for patients in the intensive care unit: a comparison of artificial neural networks with logistic regression models. Crit Care Med. 2001, 29 (2): 291-296. 10.1097/00003246-200102000-00012.

  31. Lix LM, Yogendran MS, Leslie WD, Shaw SY, Baumgartner R, Bowman C, Metge C, Gumel A, Hux J, James RC: Using multiple data features improved the validity of osteoporosis case ascertainment from administrative databases. J Clin Epidemiol. 2008, 61 (12): 1250-1260. 10.1016/j.jclinepi.2008.02.002.

  32. Akaike H: Information theory and an extension of the maximum likelihood principle. Proceedings of the 2nd International Symposium on Information Theory. 1973

  33. Steyerberg EW, Harrell FE, Borsboom GJ, Eijkemans MJ, Vergouwe Y, Habbema JD: Internal validation of predictive models: efficiency of some procedures for logistic regression analysis. J Clin Epidemiol. 2001, 54 (8): 774-781. 10.1016/S0895-4356(01)00341-9.

  34. Mapel DW, Hurley JS, Roblin D, Roberts M, Davis KJ, Schreiner R, Frost FJ: Survival of COPD patients using inhaled corticosteroids and long-acting beta agonists. Respir Med. 2006, 100 (4): 595-609. 10.1016/j.rmed.2005.08.006.

  35. Fan VS, Ramsey SD, Giardino ND, Make BJ, Emery CF, Diaz PT, Benditt JO, Mosenifar Z, McKenna R, Curtis JL, et al: Sex, depression, and risk of hospitalization and mortality in chronic obstructive pulmonary disease. Arch Intern Med. 2007, 167 (21): 2345-2353. 10.1001/archinte.167.21.2345.

  36. Jung E, Pickard AS, Salmon JW, Bartle B, Lee TA: Medication adherence and persistence in the last year of life in COPD patients. Respir Med. 2009, 103 (4): 525-534. 10.1016/j.rmed.2008.11.004.

  37. Sin DD, Tu JV: Inhaled corticosteroids and the risk of mortality and readmission in elderly patients with chronic obstructive pulmonary disease. Am J Respir Crit Care Med. 2001, 164 (4): 580-584.

  38. Parimon T, Chien JW, Bryson CL, McDonell MB, Udris EM, Au DH: Inhaled corticosteroids and risk of lung cancer among patients with chronic obstructive pulmonary disease. Am J Respir Crit Care Med. 2007, 175 (7): 712-719. 10.1164/rccm.200608-1125OC.

  39. AHRQ quality indicators: Guide to prevention quality indicators: hospital admission for ambulatory care sensitive conditions [version 3.1]. 2007, Rockville (MD): Agency for Healthcare Research and Quality (AHRQ), 12. 59-(AHRQ Pub; no. 02-R0203)

  40. National Committee for Quality Assurance (NCQA): HEDIS 2008: Healthcare Effectiveness Data & Information Set. Technical Specifications. 2007, Washington (DC): National Committee for Quality Assurance (NCQA), 2: various p

  41. Kesten S, Chapman KR: Physician perceptions and management of COPD. Chest. 1993, 104 (1): 254-258. 10.1378/chest.104.1.254.

  42. Chapman KR, Tashkin DP, Pye DJ: Gender bias in the diagnosis of COPD. Chest. 2001, 119 (6): 1691-1695. 10.1378/chest.119.6.1691.

  43. Lee TA, Bartle B, Weiss KB: Spirometry use in clinical practice following diagnosis of COPD. Chest. 2006, 129 (6): 1509-1515. 10.1378/chest.129.6.1509.

  44. Peabody JW, Luck J, Jain S, Bertenthal D, Glassman P: Assessing the accuracy of administrative data in health information systems. Med Care. 2004, 42 (11): 1066-1072. 10.1097/00005650-200411000-00005.

  45. Bleeker SE, Moll HA, Steyerberg EW, Donders AR, Derksen-Lubsen G, Grobbee DE, Moons KG: External validation is necessary in prediction research: a clinical example. J Clin Epidemiol. 2003, 56 (9): 826-832. 10.1016/S0895-4356(03)00207-5.

  46. Fan VS, Bridevaux PO, McDonell MB, Fihn SD, Besser LM, Au DH: Regional Variation in Health Status among Chronic Obstructive Pulmonary Disease Patients. Respiration. 2011, 81 (1): 9-17. 10.1159/000320115.


This study was supported by the Department of Veterans Affairs, Health Services Research and Development (DHA), American Lung Association (CI-51755-N) awarded to DHA, the American Thoracic Society Fellow Career Development Award (CRC), and the Robert Wood Johnson Foundation Clinical Scholar's Program (CRC). The funding bodies had no role in study design, analysis, interpretation, or writing of the manuscript, or in the decision to submit the manuscript for publication. The authors would also like to acknowledge the referees for their thoughtful contributions to the manuscript.

This article was made available as Open Access with the support of the University of Michigan COPE Fund.

Disclaimer: The views expressed in this article are those of the authors and do not necessarily reflect the position or policy of the Department of Veterans Affairs or the United States government.

Author information


Corresponding author

Correspondence to Colin R Cooke.

Additional information

Competing interests

Dr. Au is a consultant for Nexura, LLC. All other authors have no competing interests to disclose.

Authors' contributions

CRC designed the study, participated in the analysis and interpretation of the data, and drafted the manuscript. MJJ, SMA, and TAL participated in the interpretation of the data and revised the manuscript critically for important intellectual content. EMU and EJ conducted the data analysis and participated in the interpretation of the data and revision of the manuscript. DHA received funding for the study, designed the study, participated in the analysis and interpretation of the data, and revised the manuscript. All authors read and approved the final manuscript.

Electronic supplementary material


Additional file 1: Study cohort flow diagram. This figure presents a flow diagram describing the exclusion criteria for the primary analysis. (DOC 29 KB)


Additional file 2: Simulated positive predictive values (PPV) and negative predictive values (NPV) for Model 8, by prevalence of COPD. This table describes how PPV and NPV for Model 8 vary depending on the prevalence of COPD in the population studied. Because the prevalence of COPD in our study population was higher than that in the general population, we simulated what the PPV and NPV would be if the COPD prevalence were 10% to 20%. (DOC 33 KB)


Additional file 3: Discriminative performance (AUC) for original models compared with the same models when the cohort was limited to patients older than 40 years. This table illustrates the changes in each model's performance when all patients <40 years old were excluded from the cohort. (DOC 40 KB)

Rights and permissions

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Cooke, C.R., Joo, M.J., Anderson, S.M. et al. The validity of using ICD-9 codes and pharmacy records to identify patients with chronic obstructive pulmonary disease. BMC Health Serv Res 11, 37 (2011).