Coding of obesity in administrative hospital discharge abstract data: accuracy and impact for future research studies

Background Obesity is a pervasive problem and a popular subject of academic assessment. The ability to take advantage of existing data, such as administrative databases, to study obesity is appealing. The objective of our study was to assess the validity of obesity coding in an administrative database and to compare the association between obesity and outcomes in an administrative database versus a registry. Methods This study was conducted using a coronary catheterization registry and an administrative database (the Discharge Abstract Database (DAD)). A Body Mass Index (BMI) ≥30 kg/m2 within the registry defined obesity. In the DAD, obesity was defined by diagnosis codes E65-E68 (ICD-10). The sensitivity, specificity, negative predictive value (NPV) and positive predictive value (PPV) of an obesity diagnosis in the DAD were determined using obesity diagnosis in the registry as the referent. The association between obesity and outcomes was assessed. Results The study population of 17,380 subjects was largely male (68.8%) with a mean BMI of 27.0 kg/m2. Obesity prevalence was lower in the DAD than in the registry (2.4% vs. 20.3%). A diagnosis of obesity in the DAD had a sensitivity of 7.75%, specificity of 98.98%, NPV of 80.84% and PPV of 65.94%. Obesity was associated with decreased risk of death or re-hospitalization, though non-significantly within the DAD. Obesity was significantly associated with an increased risk of cardiac procedure in both databases. Conclusions Overall, obesity was poorly coded in the DAD. However, when coded, it was coded accurately. Administrative databases are not an optimal data source for obesity prevalence and incidence surveillance but could be used to define obese cohorts for follow-up.


Background
Obesity is a highly prevalent health concern. While it is well established that many North Americans are obese [1], similar trends are now being seen worldwide, even in countries such as India, where malnutrition was long the most common nutritional disorder. Obesity is now in line to overtake smoking as the leading preventable cause of morbidity and mortality, causing in excess of 300,000 deaths per year in the United States alone [2,3]. The burden of disease attributable to obesity is in large part due to its impact on the cardiovascular system [4][5][6].
There are several published ways of measuring obesity, ranging from the simple, such as body mass index (BMI, kg/m2) or waist circumference, to the complex, including body densitometry and more advanced volumetric techniques such as computed tomography (CT) imaging and magnetic resonance imaging (MRI) [7,8]. While the latter methodologies offer more accurate measurements of body composition, the former are more widely employed due to their relatively low cost, ease of use and familiarity. They are, however, prone to bias: measures of weight and height are frequently based on self-report, which is rather unreliable, as women tend to underreport weight and men to overreport height.
Gathering information on the adiposity of a population is difficult and time consuming: national surveys such as NHANES are expensive, and international studies such as the International Day for the Evaluation of Abdominal Obesity (IDEA) are logistically challenging [9]. It is even more challenging for follow-up studies to collect longitudinal information on obesity and outcomes from a large population. Being able to take advantage of existing administrative data, such as physician claims and hospital discharge databases, could save time and cost because obesity is captured as a diagnosis by the International Classification of Disease (ICD), codes 278 (ICD-9-CM) and E65-E68 (ICD-10). In many developed countries (such as Canada), there are massive national administrative databases that are easily linked with other databases for research purposes. However, even though BMI is easily derived from standard clinical information, administrative data frequently do not capture height and weight.
There has been only limited evaluation of how frequently, or how accurately, obesity is captured in administrative databases. A study by Quan et al from 2003 assessed obesity coding as one of their outcomes [10]. Chart review demonstrated an 8.3% frequency of obesity, while ICD-9-CM data reflected a 2.7% rate of obesity (sensitivity 24.6%, positive predictive value (PPV) 75.9%) and ICD-10 coding a 1.9% rate (sensitivity 18.6%, PPV 83.8%). The association between coded obesity and adverse outcomes has not been well studied in administrative databases, such as those used in Canadian health care systems.
The objective of our study was to assess the validity of obesity coding in an administrative database. To conduct this study, we linked clinically captured physical measurement data, including height and weight, with administrative data to assess how frequently and accurately obesity is captured in an administrative database. To understand the performance of obesity research using administrative data, we then determined the association between obesity and outcomes in the administrative database and replicated this analysis in the registry. We evaluated whether the results generated from these two databases were comparable. This study will enrich the available information on obesity coding, and will allow the assessment of the utility of administrative data for population surveillance of obesity.

Defining obesity in the physical measurement dataset
Our study was conducted using two data sources: The Alberta Provincial Project for Outcomes Assessment in Coronary Heart Disease (APPROACH) database and the Inpatient Discharge Abstract Database for the Calgary health region.
APPROACH is a clinical registry which has captured detailed clinical information on all patients undergoing cardiac catheterization in Alberta, Canada since 1995 [11]. At the time of catheterization, data are collected on clinical risk factors including age, sex, weight, height, body mass index (BMI, kg/m2), hypertension, hyperlipidemia, diabetes, chronic lung disease, cerebrovascular disease, congestive heart failure, peripheral vascular disease, renal disease, liver or gastrointestinal disease, and malignancy. Also recorded are the results of coronary catheterization including coronary anatomy and left ventricular ejection fraction, procedures done at the time of initial catheterization and events thereafter (percutaneous coronary intervention (PCI), coronary artery bypass grafting (CABG) and death).
Obesity was defined within the APPROACH database using the Quetelet formula for BMI: weight (kilograms) divided by height (metres) squared. A subject was determined to be obese in the APPROACH database if they had a BMI ≥ 30 kg/m2. Obesity classes were determined using the standard World Health Organization definitions: subjects with a BMI 30-34.99 kg/m2 were considered Obesity Class I, subjects with a BMI 35-39.99 kg/m2 were considered Obesity Class II, and subjects with a BMI ≥ 40 kg/m2 were considered Obesity Class III [12].
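The definition above can be sketched in a few lines of Python. This is a minimal illustration rather than the code used in the study; the function names `bmi` and `obesity_class` are hypothetical, and the cut-points are the WHO thresholds quoted in the text.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Quetelet index: weight in kilograms divided by height in metres squared."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


def obesity_class(bmi_value: float) -> str:
    """Map a BMI (kg/m2) to the WHO obesity classes quoted in the text."""
    if bmi_value >= 40:
        return "Obesity Class III"
    if bmi_value >= 35:
        return "Obesity Class II"
    if bmi_value >= 30:
        return "Obesity Class I"
    return "Not obese"


# Hypothetical subject: 95 kg and 1.75 m gives a BMI of about 31.0 kg/m2.
print(obesity_class(bmi(95.0, 1.75)))
```

Checking the thresholds from the highest class downward avoids having to treat values such as 34.995 as a special case between the published ranges.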

Defining obesity in the hospital discharge abstract database
The Inpatient Discharge Abstract Database (DAD) collects administrative information on date and time of admission, length of stay and up to 25 diagnoses. Using the DAD for the years 2002-2008, obesity was defined by searching for the diagnosis codes E65-E68 (ICD-10) in the 25 diagnosis coding fields.
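As an illustration of this search, the sketch below flags a discharge record as obese when any diagnosis field begins with one of the E65-E68 categories. It is an assumption-laden example: the record layout (a list of up to 25 code strings), the helper names and the handling of dotted or lower-case codes are all hypothetical and not taken from the DAD specification.

```python
# ICD-10 categories named in the text as defining obesity in the DAD.
OBESITY_PREFIXES = ("E65", "E66", "E67", "E68")


def normalize(code: str) -> str:
    """Upper-case a code and drop whitespace and the decimal point (assumed formats)."""
    return code.strip().upper().replace(".", "")


def has_obesity_code(diagnosis_fields) -> bool:
    """Return True if any non-empty diagnosis field starts with E65-E68."""
    return any(
        normalize(code).startswith(OBESITY_PREFIXES)
        for code in diagnosis_fields
        if code  # skip empty or missing fields
    )


# Hypothetical abstract with two filled diagnosis fields.
print(has_obesity_code(["I21.4", "E66.9"]))
```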
The APPROACH database and DAD were linked using Personal Health Numbers (PHNs), which are unique to each individual. Patients were excluded if they were under 18 years of age, did not have a valid Alberta PHN, or were from outside the Calgary Health Region. As the clinical covariates used for the study were obtained from the APPROACH database, cohort entry date was defined as the date of coronary catheterization. Only subjects who had a hospitalization in the first 6 months following catheterization were considered in this study. The diagnosis of obesity in the DAD was ascertained at the time of first hospitalization following catheterization.
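The cohort rule described above (first hospitalization within 6 months of catheterization) might be expressed as follows. This is only a sketch: approximating 6 months as 183 days, counting an admission on the catheterization date itself, and the function name are assumptions made for illustration.

```python
from datetime import date, timedelta

# Assumption: "6 months" is approximated as 183 days.
WINDOW = timedelta(days=183)


def first_admission_in_window(cath_date, admission_dates, window=WINDOW):
    """Return the earliest admission on or after cath_date and within the window, else None."""
    eligible = [d for d in admission_dates if cath_date <= d <= cath_date + window]
    return min(eligible) if eligible else None


# Hypothetical subject with three admissions after catheterization on 1 March 2005.
admissions = [date(2005, 4, 10), date(2005, 3, 15), date(2006, 1, 1)]
print(first_admission_in_window(date(2005, 3, 1), admissions))
```

A subject for whom the function returns None would not enter the study cohort under this reading of the inclusion rule.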

Outcome variables
The outcomes of interest were all-cause mortality, as captured by vital statistics, first hospitalization (any cause) and first cardiac procedure (percutaneous coronary intervention (PCI) or coronary artery bypass grafting (CABG)) in the first year after the date of coronary catheterization. For those patients with multiple admissions in the year following coronary catheterization, only the first admission was counted. The outcomes of PCI and CABG were obtained from the APPROACH database, and hospitalizations from the DAD.

Statistical analysis
Descriptive statistics were used to describe study population characteristics. Subjects were considered to be "correctly" coded as obese if they had a diagnosis code of obesity in the administrative database and a BMI ≥ 30 kg/m2 in the measured data contained in APPROACH. The sensitivity, specificity, negative predictive value (NPV) and positive predictive value (PPV) of a diagnosis of obesity as coded in the administrative database were determined using the physical measurement of obesity as the referent value. Each of these validity indices was calculated over the entire study time period as well as by year (2002-2008). The accuracy of coding was then considered across various demographic categories and clinical conditions.
As a second step, we assessed the relative impact of being coded as obese on outcomes, namely hospitalization, PCI or CABG, and mortality. We considered the association between obesity and outcomes in those correctly diagnosed as obese within the administrative data, those diagnosed as obese within APPROACH only, and those diagnosed as obese within APPROACH or the administrative database. The association between each outcome and obesity was evaluated in multivariate logistic regression models. The odds ratios (OR) between the outcomes and obesity and their 95% confidence intervals (95% CI) were estimated in the logistic regression models while controlling for other factors. Statistical analysis was conducted using SAS Version 9.0.
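To make the odds ratio and its confidence interval concrete, the sketch below computes an unadjusted OR with a Woolf (log-based) 95% CI from a 2x2 table. This is deliberately simpler than the multivariate logistic regression used in the study: it controls for no covariates, and the counts in the example are hypothetical, not study data.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio and Woolf confidence interval from a 2x2 table.

    a: obese with outcome      b: obese without outcome
    c: non-obese with outcome  d: non-obese without outcome
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for the Woolf interval")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, for illustration only.
or_value, lo, hi = odds_ratio_ci(20, 80, 30, 70)
print(f"OR {or_value:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

An interval that includes 1 corresponds to a non-significant association at the 5% level; an adjusted OR from a regression model is interpreted the same way.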
The study protocol was approved by the ethics review board of the University of Calgary.

Results
A total of 17,380 subjects were included in the initial analysis. Baseline characteristics of these patients are outlined in Table 1. The study population was largely male (68.8%) and 48.6% were aged ≥65 years. Most subjects underwent coronary catheterization for urgent reasons, including myocardial infarction and unstable angina. In APPROACH, there were 3523 (20.3%) subjects with a BMI ≥ 30 kg/m2 (mean BMI of 32.84 kg/m2). Of these, 83.4% were obesity Class I, 13.5% were obesity Class II, and 3.1% were obesity Class III. In the DAD, 414 patients were coded as being obese.
Rates of obesity remained stable year to year (see Table 2). The sensitivity of a diagnosis of obesity in the DAD was low at 7.75%. However, it was highly specific at 99.0%, with a negative predictive value (NPV) of 80.8% and a PPV of 65.9% (Table 2). There were minor variations in the sensitivity of an obesity diagnosis, which remained under 10% throughout the study time period, with no clear trends or improvements over time. Specificity and NPV were stable throughout the study period, at over 98% and 80% respectively.
Of those 414 subjects coded as obese in the DAD, about a third (141) were not actually obese when compared to measured data. These incorrectly coded subjects had a mean BMI of 26.9 kg/m2 (SD 3.6), were slightly older (mean age 63.7 ± 10.8 years vs 62.7 ± 11.0, p = 0.3), less likely to be female (33.3% vs 55.3%, p < 0.0001), and less likely to be diabetic (35.5% vs 44.3%, p = 0.0827) than those who were correctly coded as obese in both administrative and clinical data (n = 273); only the difference in sex was statistically significant.
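The validity indices reported above can be reproduced from the counts given in the text (17,380 subjects, 3523 obese by BMI, 414 coded as obese in the DAD, of whom 273 were correctly coded). The following sketch shows the arithmetic; the function name is hypothetical and the remaining cells of the 2x2 table are derived by subtraction.

```python
def validity_indices(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV and NPV from the four cells of a 2x2 table."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Counts stated in the text; measured BMI is the referent.
total = 17380
obese_by_bmi = 3523
coded_obese = 414
tp = 273                    # coded obese and BMI >= 30
fp = coded_obese - tp       # coded obese but BMI < 30 (141)
fn = obese_by_bmi - tp      # BMI >= 30 but not coded (3250)
tn = total - tp - fp - fn   # neither coded nor obese (13716)

for name, value in validity_indices(tp, fp, fn, tn).items():
    print(f"{name}: {value:.2%}")
# sensitivity: 7.75%
# specificity: 98.98%
# ppv: 65.94%
# npv: 80.84%
```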
We further analyzed our data to assess whether demographic or clinical factors influenced the PPV of a DAD diagnosis of obesity (Table 2). The prevalence of obesity was higher in female subjects (28.3%) than in male subjects (16.6%), and the PPV was commensurately higher. The sensitivity of an administrative database obesity diagnosis was also higher in women. There were no trends across age groups, with the exception of a lower prevalence of obesity and a lower PPV in the elderly (age >75 years) age group. The prevalence of obesity and the PPV were both higher amongst those subjects with conditions associated with excess body mass: namely, diabetes and hypertension. This association was strongest for those with diabetes: the prevalence of obesity in patients with diabetes was 29.0%, versus 17.9% in those without; similarly, the PPV of an obesity diagnosis in the administrative database was higher amongst those with diabetes (70.8%) than those without (62.6%). The PPV did not seem to be impacted by a diagnosis of hyperlipidemia, congestive heart failure, or a history of previous myocardial infarction.
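The link between prevalence and PPV noted above follows from Bayes' rule, and a short sketch makes the dependence explicit. As a simplifying assumption, the overall sensitivity (7.75%) and specificity (98.98%) are held fixed across subgroups; the text reports that sensitivity actually differed by sex, so the subgroup values printed here are illustrative and are not the study's subgroup PPVs.

```python
def ppv_from_prevalence(sensitivity, specificity, prevalence):
    """Positive predictive value implied by sensitivity, specificity and prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# Overall indices from the text, assumed constant across subgroups for illustration.
SENS, SPEC = 0.0775, 0.9898

for label, prevalence in (("women", 0.283), ("men", 0.166), ("overall", 0.203)):
    print(label, round(ppv_from_prevalence(SENS, SPEC, prevalence), 3))
```

With sensitivity and specificity unchanged, the higher-prevalence group yields the higher PPV, which is the direction of the difference described in the text.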
We further assessed whether BMI influenced the likelihood that a subject was coded as obese in the administrative database. Of those coded as obese within the administrative database, the large majority (72.9%) were Class I obese; of those not coded as obese, 84.3% were Class I obese, and 2.9% were Class III obese (Table 3). Thus, the higher the measured BMI, the more likely subjects were to be coded as obese in the administrative data.
As a final step, we wished to determine whether obesity as coded in the DAD was differentially associated with outcomes in comparison to obesity as determined by physical measurement. Over the course of the study, there were 7547 hospital admissions, 10,772 CABG or PCI procedures, and 703 deaths. In subjects who were obese by DAD, there was no significant association between obesity and rehospitalization or mortality (Table 4). The same was demonstrated for those "correctly" coded as obese. Considering those subjects who were obese by physical measurement only, obesity was associated with a decreased risk of mortality or re-hospitalization, but an increased risk of repeat procedure. The same was seen for subjects who were obese by physical measurement or administrative data.

Discussion
We have confirmed the findings of previous investigators that administrative data under-code obesity as a diagnosis. However, once obesity is coded in the data, it is coded relatively accurately, as for other chronic conditions [10]. Administratively captured obesity was more likely in patients with higher classes of obesity or obesity-related complications. Interestingly, despite our suspicion that obesity coding would improve over time with increasing general awareness of the relationship between obesity and disease, we found no evidence for this. These findings suggest that administrative databases should not be used for obesity surveillance due to under-reporting, but could potentially be used to identify obese subjects to form a cohort for follow-up studies.
Despite the generally poor capture of obesity in administrative databases, we did find a number of conditions under which obesity is better captured. The PPV of an obesity diagnosis is higher in women than in men, and it is also higher in a number of conditions that are known to be associated with obesity, namely diabetes and hypertension. Additionally, in those cases in which obesity is actually captured, it is captured relatively accurately, as demonstrated by the high PPV seen in this study. However, the obese subjects captured by the administrative database are more likely to be Class III obese than those who are not captured, i.e. there is a bias towards coding those with a higher BMI as obese, missing those who are Class I obese. Thus, the administrative databases are capturing subjects who perhaps already have complications from their obesity, as evidenced by the fact that the PPV of an obesity diagnosis in the administrative database is higher in those with obesity-related complications such as diabetes. Cohorts defined using administrative data may therefore show a falsely high correlation between obesity and the development of complications or poor outcomes, as the obese subjects correctly identified in administrative databases are potentially sicker than an average obese subject.
Some work has been done in previous studies to assess the validity of obesity coding in administrative data. In one chart-based study by Quan et al, only weight loss, coagulopathy and blood loss anemia were less validly coded than obesity in administrative data. In Switzerland, obesity was under-coded (prevalence 2.2% in 1999, 3.2% in 2001 and 4.1% in 2003) compared with a chart-based prevalence ranging from 6.6% to 7.3%, but coding improved over the years (sensitivity 29.4% in 1999, 39.5% in 2001 and 51.5% in 2003; PPV 92.%, 81.1% and 91.7% in these years, respectively) [13].
Reasons put forth for the poor capture of obesity in administrative data include the fact that obesity is not explicitly mentioned in either physician or nursing notes, and also that coders may intentionally not code diagnoses such as obesity owing to time constraints when doing data abstraction. In the limited time for coding each chart, coders are likely to ignore risk factors, focusing on overt clinical conditions. Coding guidelines pay more attention to conditions contributing to resource use, and the use of extra resources by obese subjects has only recently been understood [14]. In addition, physicians may not explicitly mention obesity in the chart summary page, on which coders mainly rely, as obesity is poorly recognized as a disease. BMI was also not well documented, although height and weight are available on most clinical charts. The diagnosis of obesity is often made based on the clinician's subjective observation, likely capturing higher class obesity. If administrative data abstractors were instead to code the height and weight recorded in the chart, rates of obesity would likely be accurate.
Another difficulty in defining obesity is the use of patient self-reported data. On patient admission to hospital, height and weight are frequently determined by patient report, and this information is then recorded in the patient record. It has been shown that patients overestimate their height and underestimate their weight, which leads to underestimates of the prevalence of obesity. This misrepresentation of BMI is more common in the obese [15].
A recent study by Woo et al [16] considered both hospital administrative data and a clinical database that captured height and weight for all children admitted to hospital. The administrative database failed to capture obesity for the majority of obese children who were admitted to hospital. A diagnosis of obesity in the administrative database had a sensitivity of only 8% relative to measured BMI. More importantly, when outcomes were compared between non-obese children and obese children based on (a) obesity as captured in the administrative database versus (b) obesity as captured in the clinical database, the impact of obesity was found to be different. A diagnosis of obesity recorded in the administrative data identified "sporadic, potentially non-representative, hospital discharges with shorter lengths of stay." However, our study demonstrated that the association between obesity and each of the outcomes (hospitalization, PCI/CABG or death) was similar regardless of how obesity was coded. Differences arose in terms of the significance (for mortality, likely due to the small number of deaths, and for re-hospitalization), and in terms of the magnitude for PCI/CABG. Where information on obesity is missing in administrative data, merging with clinical databases, as was done in this study, is an important way to enhance the quality of data found in administrative databases. Additionally, physician claims databases as well as prescription databases are potential sources of obesity information. In a review of the available literature, nearly all studies of obesity using larger databases were not based on administrative databases alone. This includes papers from NHANES assessing obesity prevalence [17,18], studies assessing the association between adiposity and cardiovascular outcomes [19][20][21][22][23][24][25], and studies assessing care in obese subjects [26,27].
For instance, in a paper by Chang et al., while Medicare claims and enrollment were used to assess service utilization, data on BMI were obtained from a merge with the Medicare Current Beneficiary Survey (MCBS). Similar studies done strictly using administrative or claims data, without data enrichment to determine BMI, would only identify a high-risk group of obese subjects [28].

Limitations
There are a number of limitations in this study that need to be noted. Firstly, we have only considered cardiac patients. As cardiovascular disease is a complication related to obesity, rates of obesity coding in administrative data may be higher among this population than in the general inpatient population. However, in a study by Quan et al assessing a random sample of charts, obesity had a prevalence of 2.7% in the ICD-9-CM DAD and 1.9% in the ICD-10 DAD [10], similar to the rate seen in this population. We could also only consider the impact of coding on outcomes in cardiac populations; more distinct patient populations need to be assessed.

Conclusions
A call to arms has been put forth by organizations such as the American Heart Association, recognizing that health care providers have not done a good job assessing for obesity and suggesting "the measurement and documentation of BMI in all adults" [29,30]. In this study we have demonstrated that even when obesity is present, care givers and coders do a poor job documenting its presence, though subjects at highest risk of complications are accurately identified. This large study demonstrates three key points: first, obesity is underreported in administrative data, with low sensitivity, and hence these data cannot be used for incidence and prevalence surveillance; second, obesity coding in administrative databases could be used to define a cohort for follow-up or outcomes studies, as supported by the high PPV and the similar conclusions on outcomes between the two databases; finally, we strongly recommend adding height and weight to routine administrative data coding, as is done for age and sex. This would make these data an invaluable resource for studies of obesity and population health.