Comparing administrative and survey data for ascertaining cases of irritable bowel syndrome: a population-based investigation

Background Administrative and survey data are two key data sources for population-based research about chronic disease. The objectives of this methodological paper are to: (1) estimate agreement between the two data sources for irritable bowel syndrome (IBS) and compare the results to those for inflammatory bowel disease (IBD); (2) compare the frequency of IBS-related diagnoses in administrative data for survey respondents with and without self-reported IBS; and (3) estimate IBS prevalence from both sources. Methods This retrospective cohort study used linked administrative and health survey data for 5,134 adults from the province of Manitoba, Canada. Diagnoses in hospital and physician administrative data were investigated for respondents with self-reported IBS, IBD, and no bowel disorder. Agreement between survey and administrative data was estimated using the κ statistic. The χ2 statistic tested the association between the frequency of IBS-related diagnoses and self-reported IBS. Crude, sex-specific, and age-specific IBS prevalence estimates were calculated from both sources. Results Overall, 3.0% of the cohort had self-reported IBS, 0.8% had self-reported IBD, and 95.3% reported no bowel disorder. Agreement was poor to fair for IBS and substantially higher for IBD. Among respondents with self-reported IBS, the most frequent IBS-related diagnoses in the three years before the survey interview were anxiety disorders (34.4%), symptoms involving the abdomen and pelvis (26.9%), and diverticulitis of the intestine (10.6%). Crude IBS prevalence estimates from both sources were lower than those reported previously. Conclusions Poor agreement between administrative and survey data for IBS may account for differences in the results of health services and outcomes research using these sources. Further research is needed to identify the optimal method(s) for ascertaining IBS cases in both data sources.


Background
Both administrative and survey data are used to conduct population-based health services and outcomes research about chronic conditions such as diabetes, hypertension, arthritis, and osteoporosis [1-5]. A primary concern when using administrative data for studies of chronic conditions is the accuracy and completeness of diagnostic information [6,7], while recall bias is a key concern for self-report data. These limitations of both data sources may result in discrepant research findings, yet only a few studies have compared administrative and survey data for ascertaining disease cases.
Okura et al. [8] observed good agreement between survey and administrative data for ascertaining cases of diabetes, hypertension, myocardial infarction, and stroke. Estimates of agreement were substantially lower for heart failure. Rector et al. [9] found that the sensitivity of administrative data, when compared to survey data, was very good for identifying cases of hypertension and diabetes, but lower for arthritis and heart failure. Other studies have found moderate to good agreement between administrative and survey data [4,10-12]. While agreement tends to be highest for well-defined chronic conditions, such as diabetes, it also depends on the case-ascertainment methodology applied to administrative data [11,12]. For example, Rector et al. [9] found that the number of years of data used to identify chronic disease cases could have a substantial impact on the sensitivity of administrative data.
Recently, administrative data have been investigated as a potential data source for health services and outcomes research about irritable bowel syndrome (IBS) [13].
Population-based investigations about IBS have primarily been conducted using survey data [14-16]. IBS is a common non-inflammatory gastrointestinal condition; the prevalence in North America is estimated to be between 7% and 15% [15-17]. Individuals with IBS have higher health care use and costs than their non-IBS counterparts [18,19]. IBS is a difficult condition to investigate in population-based research because there is no single diagnostic test to confirm disease presence and the symptoms of IBS may also be associated with other conditions, including infections. However, when compared with clinical data or medical charts, diagnoses in administrative data have shown good specificity for ascertaining IBS cases [20,21].
This study compares administrative and survey data for ascertaining cases of IBS. The objectives are to: (1) estimate agreement between the two data sources for IBS and compare the results to those for inflammatory bowel disease (IBD), a gastrointestinal condition with well-defined diagnostic criteria in comparison with IBS; (2) compare the frequency of IBS-related diagnoses in administrative data for survey respondents with and without self-reported IBS, and (3) estimate IBS prevalence from both sources.

Methods
A retrospective cohort study was undertaken using administrative and survey data from the province of Manitoba, Canada, which has a population of approximately 1.2 million. The Research Data Repository housed at the Manitoba Centre for Health Policy (MCHP) contains administrative data provided by the provincial health ministry. The Repository also houses population-based survey data from a national health survey, the Canadian Community Health Survey (CCHS; http://www.statcan.gc.ca/concepts/health-sante/cycle3_1/indexeng.htm), and the two sources can be directly linked via a unique, anonymized personal health identification number (PHIN). The University of Manitoba Health Research Ethics Board approved the conduct of this research and the Manitoba Health Information Privacy Committee approved the data linkage.
The survey data were collected between January and December 2005. The CCHS was developed to provide cross-sectional estimates of health status, health determinants, and health system use for a target population of individuals 12 years of age and older living in private dwellings. The survey does not include individuals living on Indian Reserves and other government-owned land, institutional residents, and full-time members of the Canadian Forces; these groups represent approximately 2% of the Manitoba population. The Manitoba response rate was 83.3%; the national response rate was 78.9%.
Sources of administrative data were records of inpatient hospitalizations and outpatient physician billing claims. Manitoba, like other Canadian provinces, has a universal health system; almost all Manitoba residents are covered under the Manitoba Health Services Insurance Plan (MHSIP). A hospitalization record is completed upon patient discharge. Each record includes up to 16 diagnosis codes from the International Classification of Diseases, 9th Revision, Clinical Modification (i.e., ICD-9-CM) prior to April 1, 2004 and up to 25 diagnosis codes from the 10th Revision from this date onward (i.e., ICD-10-CA). Procedure codes are also captured in hospitalization records. Physicians paid on a fee-for-service basis submit billing claims to the provincial health ministry; these claims capture virtually all outpatient services, including those in hospital emergency departments and outpatient departments. While some physicians are salaried (e.g., approximately 7% of family physicians) [22], it has been estimated that approximately 90% of these salaried physicians submit parallel billing claims for administrative purposes. Each visit results in one billing claim that contains a single ICD-9-CM code. ICD-9 codes in physician data are recorded to the third digit, while codes in hospital data are recorded using up to five digits.
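To make the consequence of three-digit recording concrete: ICD-9-CM groups several functional digestive disorders under category 564, so a physician claim recorded to the third digit cannot distinguish IBS (564.1) from, for example, constipation (564.0) or functional diarrhea (564.5). The sketch below shows that collapse using a hypothetical helper, `to_three_digit`; it is an illustration only and makes no claim about how Manitoba's billing system actually stores codes.

```python
# Illustration only: a few ICD-9-CM codes in category 564
# (functional digestive disorders, not elsewhere classified).
ICD9_EXAMPLES = {
    "564.0": "constipation",
    "564.1": "irritable bowel syndrome",
    "564.5": "functional diarrhea",
}


def to_three_digit(code: str) -> str:
    """Hypothetical helper: keep only the three-digit ICD-9 category."""
    return code.split(".")[0]


# Group the example codes by the category a three-digit claim would record.
collapsed: dict[str, list[str]] = {}
for code, label in ICD9_EXAMPLES.items():
    collapsed.setdefault(to_three_digit(code), []).append(label)

print(collapsed)
# {'564': ['constipation', 'irritable bowel syndrome', 'functional diarrhea']}
```

All three conditions end up under the single category 564, which is the loss of specificity discussed later for physician data.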

Study Cohort
There were 7,004 Manitoba respondents to the CCHS in 2005. The anonymized linkage of survey and administrative data was conducted for those respondents who provided their consent for the linkage (N = 6,349; 90.6%). After removing invalid or missing PHINs, linkage was successfully achieved for 6,232 respondents (89.0%). From this sample, an adult cohort (19+ years) with at least three years of continuous coverage under the MHSIP prior to the date of their CCHS interview and at least one year of coverage after this date was created (N = 5,134; 73.3%). Coverage information was determined from the population registration file.
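Note that every percentage in the cohort description uses the full set of 7,004 respondents as its base, not the preceding step. A short check of that arithmetic, using only the counts reported above:

```python
# Cohort flow reported above; each percentage is relative to all 7,004 respondents.
TOTAL_RESPONDENTS = 7004

steps = [
    ("Manitoba CCHS respondents, 2005", 7004),
    ("Consented to linkage", 6349),
    ("Linked with a valid PHIN", 6232),
    ("Adult cohort with required MHSIP coverage", 5134),
]


def percent_of_total(n: int, total: int = TOTAL_RESPONDENTS) -> float:
    """Percentage of all respondents, rounded to one decimal place."""
    return round(100 * n / total, 1)


for label, n in steps:
    print(f"{label}: {n:,} ({percent_of_total(n)}%)")
```

This reproduces the reported 90.6%, 89.0%, and 73.3%.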
The survey interview schedule included the following directions: "Now I'd like to ask about certain chronic health conditions which you may have. We are interested in 'long-term conditions' that have lasted or are expected to last six months or more and that have been diagnosed by a health professional". Respondents were asked whether they had ever been diagnosed with a number of chronic conditions, including a bowel disorder, specifically: "Do you suffer from a bowel disorder such as Crohn's disease, ulcerative colitis, irritable bowel syndrome, or bowel incontinence?" If the response was affirmative, respondents were asked to identify the type of bowel disorder with which they had been diagnosed: Crohn's disease, ulcerative colitis, irritable bowel syndrome, bowel incontinence, or other bowel disorder (type not specified). A single category was recorded for each respondent. Table 1 lists the diagnosis and procedure codes in administrative data that were selected for this investigation [13,20,21]. Previous studies that used administrative data to ascertain IBS cases relied only on ICD-9 codes, so in this study it was also necessary to identify the relevant ICD-10-CA codes. This was accomplished using crosswalk files that map ICD-9 codes to ICD-10-CA codes, developed by a national health information agency, the Canadian Institute for Health Information; the mappings were then confirmed by research team members.

Selection of Diagnosis Codes in Administrative Data
The ICD-9-CM diagnosis code for IBS is 564.1; the corresponding code in ICD-10-CA is K58. Other diagnosis and procedure codes that are more common among individuals with IBS than among individuals without this condition were identified from previous research [13,20,21]. These codes covered a variety of gastrointestinal and genitourinary conditions and symptoms, procedures used to assess the presence of gastrointestinal inflammation, and some comorbid conditions. Diagnosis codes for IBD, specifically for Crohn's disease (CD) and ulcerative colitis (UC), were included because some individuals with IBD may have previously been diagnosed with IBS [23]. Diagnosis and procedure codes were investigated in administrative data for up to three years before the date of the CCHS interview and one year following this date. These time frames were selected based on previous research that used between one and three years of administrative data to ascertain chronic diseases [12,24].
Crude IBS prevalence estimates, as well as sex- and age-specific estimates, were generated from administrative and survey data. In the administrative data, two case-ascertainment algorithms were investigated. For the first, Manitoba health insurance registrants were identified as IBS cases if they had at least one IBS diagnosis in hospital or physician data in a one-year period. For the second, registrants were identified as IBS cases if they had at least one IBS diagnosis in a three-year period. One-year estimates were based on data for the 2004/05 fiscal year and three-year estimates were based on data from fiscal years 2002/03 to 2004/05. The population registration file was used to obtain the denominator for these estimates.
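The two case-ascertainment algorithms differ only in the length of the window. A minimal sketch follows, under several stated assumptions: the `Diagnosis` record layout is hypothetical; fiscal years are taken to run April 1 to March 31; and the IBS code rules (ICD-9-CM 564.1 or ICD-10-CA K58 in hospital records, the three-digit code 564 in physician claims) are an assumed simplification, since the codes actually used are those listed in Table 1.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Diagnosis:
    """Hypothetical record layout for one diagnosis in hospital or physician data."""
    phin: str          # anonymized personal health identification number
    service_date: date
    source: str        # "hospital" or "physician"
    code: str          # diagnosis code without the decimal point, e.g. "5641", "K580"


def is_ibs_code(d: Diagnosis) -> bool:
    # Assumed rules: ICD-9-CM 564.1 or ICD-10-CA K58* in hospital records,
    # three-digit ICD-9 code 564 in physician claims.
    if d.source == "hospital":
        return d.code == "5641" or d.code.startswith("K58")
    return d.code == "564"


def fiscal_window(last_fy_start: int, n_years: int) -> tuple[date, date]:
    """n fiscal years (April 1 to March 31), the last one starting in last_fy_start."""
    return date(last_fy_start - n_years + 1, 4, 1), date(last_fy_start + 1, 3, 31)


def ibs_cases(diagnoses: list[Diagnosis], window: tuple[date, date]) -> set[str]:
    """PHINs with at least one IBS diagnosis inside the window (inclusive)."""
    start, end = window
    return {d.phin for d in diagnoses
            if start <= d.service_date <= end and is_ibs_code(d)}


def prevalence_per_100(n_cases: int, n_registrants: int) -> float:
    return 100.0 * n_cases / n_registrants


# Toy records; all values are invented for illustration.
records = [
    Diagnosis("A", date(2004, 6, 1), "physician", "564"),
    Diagnosis("B", date(2003, 1, 15), "physician", "564"),
    Diagnosis("C", date(2004, 10, 10), "hospital", "K580"),
    Diagnosis("D", date(2004, 5, 1), "physician", "787"),   # not an IBS code
    Diagnosis("E", date(2001, 12, 1), "hospital", "5641"),  # outside both windows
]
one_year = ibs_cases(records, fiscal_window(2004, 1))    # fiscal year 2004/05
three_year = ibs_cases(records, fiscal_window(2004, 3))  # fiscal years 2002/03 to 2004/05
print(sorted(one_year), sorted(three_year))
```

Because a single diagnosis anywhere in the window is enough, lengthening the window can only add cases, which is consistent with the three-year estimate exceeding the one-year estimate.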

Statistical Analysis
Descriptive statistics, including frequencies and percentages, were used to characterize the cohort on demographic and socioeconomic variables. The κ statistic, a chance-corrected measure of agreement, was estimated from the administrative and survey data for both IBS and IBD, and 95% confidence intervals (CIs) were computed. The interpretation of κ used is [25]: poor agreement (κ < 0.20), fair agreement (κ = 0.20 to 0.39), moderate agreement (κ = 0.40 to 0.59), good agreement (κ = 0.60 to 0.79), and very good agreement (κ = 0.80 to 1.00). The χ2 test was used to test the association between the frequency of IBS-related diagnoses and procedures in administrative data and self-reported IBS. Ninety-five percent CIs were calculated for the prevalence estimates assuming a binomial distribution. Survey sampling weights were used in all inferential analyses; these weights ensure that the results are generalizable to the Manitoba population. Analyses were performed using SAS software [26].
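As a sketch of how κ and the interpretive categories above fit together, the following computes Cohen's κ from a 2×2 table of survey status by administrative-data status. The point estimate accepts either raw counts or sums of survey weights. The confidence interval, however, uses Cohen's simple large-sample standard error and is only meaningful for unweighted counts; it ignores the survey design, and the paper does not state which variance estimator SAS applied. The counts in the example are invented.

```python
import math


def cohens_kappa(both: float, survey_only: float, admin_only: float, neither: float):
    """Cohen's kappa for a 2x2 table of survey (rows) by administrative (columns) status.

    Returns (kappa, (lower, upper)) with an approximate 95% CI based on
    Cohen's simple large-sample standard error (unweighted counts only).
    """
    n = both + survey_only + admin_only + neither
    p_observed = (both + neither) / n
    p_survey_yes = (both + survey_only) / n
    p_admin_yes = (both + admin_only) / n
    # Agreement expected by chance, from the marginal proportions.
    p_expected = p_survey_yes * p_admin_yes + (1 - p_survey_yes) * (1 - p_admin_yes)
    kappa = (p_observed - p_expected) / (1 - p_expected)
    se = math.sqrt(p_observed * (1 - p_observed) / (n * (1 - p_expected) ** 2))
    return kappa, (kappa - 1.96 * se, kappa + 1.96 * se)


def interpret_kappa(kappa: float) -> str:
    """Agreement categories used in the paper [25]."""
    if kappa < 0.20:
        return "poor"
    if kappa < 0.40:
        return "fair"
    if kappa < 0.60:
        return "moderate"
    if kappa < 0.80:
        return "good"
    return "very good"


# Invented counts for illustration only.
kappa, ci = cohens_kappa(both=20, survey_only=10, admin_only=5, neither=65)
print(f"kappa = {kappa:.3f}, 95% CI {ci[0]:.3f} to {ci[1]:.3f} ({interpret_kappa(kappa)})")
```

Here observed agreement is 0.85 and chance agreement is 0.60, so κ = (0.85 − 0.60)/(1 − 0.60) = 0.625, which falls in the "good" band.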

Results
Overall, 152 (3.0%) members of the study cohort reported having IBS, another 0.8% reported having CD or UC, and 95.3% reported having no bowel disorder. Another 49 respondents, who indicated they had bowel incontinence or another, unspecified bowel disorder, were excluded from the analysis.
The socio-demographic characteristics of the three groups are described in Table 2. Males comprised less than 15.0% of respondents with self-reported IBS, more than one third of respondents with self-reported CD or UC, and almost half of respondents with no bowel disorder. Almost half of respondents with self-reported IBS were less than 45 years of age compared to slightly more than one-third of those who reported no bowel disorder. The majority of respondents were urban residents. Respondents were less likely to be in the highest and lowest income quintiles and more likely to be in one of the three middle income categories.
Estimates of agreement between survey and administrative data are reported in Table 3. Applying the interpretative criteria to these estimates [25], agreement was poor for IBS when one year of post-interview data was used for case ascertainment but was fair when three years of pre-interview data were used. For IBD, agreement was moderate when a single year of post-interview data was used, but was good when up to three years of administrative data prior to the interview date were used. Table 4 reports, for survey respondents with self-reported IBS and for those with no self-reported bowel disorder, the percentage with each selected diagnosis and procedure in administrative data before and after the survey interview date. None of the survey respondents had an IBS or IBS-related diagnosis code recorded in hospitalization records. An IBD diagnosis in hospitalization records was also rare among self-reported IBS cases. There was a significant association between the presence of a sigmoidoscopy, colonoscopy, or endoscopy procedure in both the three-year and one-year periods before the survey interview and the presence of self-reported IBS (p < 0.001). Overall, 11.2% of survey respondents who reported a diagnosis of IBS had one of the selected procedure codes in the three years prior to the interview date. In physician billing claims, 9.4% of the self-reported IBS cases had an IBS diagnosis prior to the interview date and a similar percentage (10.7%) had a diagnosis in the one-year period following this date. An IBS diagnosis code was present in the physician claims of less than 2.0% of respondents who reported that they did not have a bowel disorder.
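The association tests above compare a 2×2 table (self-reported IBS yes/no by procedure or diagnosis present/absent). A minimal, unweighted sketch of a Pearson χ2 test on one degree of freedom is below. It is an assumption that no continuity correction was applied, and because the paper used survey weights, a design-adjusted test (for example, a Rao-Scott correction) would be needed to reproduce its p-values. The counts are invented.

```python
import math


def chi_square_2x2(a: float, b: float, c: float, d: float):
    """Pearson chi-square test (no continuity correction) for the table [[a, b], [c, d]].

    Rows: self-reported IBS yes / no; columns: code present / absent in administrative data.
    Returns the statistic and its p-value on 1 degree of freedom.
    """
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    # With 1 df the statistic is Z**2, so P(X > stat) = P(|Z| > sqrt(stat)) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Invented counts for illustration only.
stat, p = chi_square_2x2(a=20, b=10, c=5, d=65)
print(f"chi-square = {stat:.2f}, p = {p:.2g}")
```

The loop over expected counts is equivalent to the familiar shortcut n(ad − bc)² / [(a + b)(c + d)(a + c)(b + d)] for a 2×2 table.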
Almost one-quarter of IBS cases had a diagnosis of IBD in physician claims data up to three years before the survey interview date, but in the year immediately before or after the interview date the number of cases with this diagnosis was too small to analyze. Among IBS cases, the most frequent IBS-related diagnoses recorded in physician billing claims in the three-year period before the survey interview date were anxiety disorders (34.4%), symptoms involving the abdomen and pelvis (26.9%), and diverticulitis of the intestine (10.6%). In the one-year period following the interview date the most common IBS-related diagnoses were anxiety disorders (20.7%), symptoms involving the abdomen and pelvis (10.2%), and symptoms involving the urinary system (4.2%). These were also the most common IBS-related diagnoses among non-cases, but the percentages were significantly different (p < 0.001 for all tests) for the two groups.
The crude IBS prevalence estimate from survey data was 3.00 per 100 (95% CI: 2.37, 3.76). Using administrative data, the crude prevalence estimate was 1.71 per 100 (95% CI: 1.69, 1.74) for a one-year case-ascertainment algorithm and 4.23 per 100 (95% CI: 4.19, 4.28) for a three-year algorithm. The number of IBS cases identified from hospitalization data was small; for the former algorithm, 1.74% of cases were identified solely from hospital data and for the latter algorithm the corresponding figure was 0.43% of cases.
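The very narrow administrative-data intervals reflect the large registrant denominator, while the wider survey interval reflects a sample of about 5,000 respondents and the survey weights. The paper states only that a binomial distribution was assumed, so the sketch below shows two common binomial intervals, the Wald (normal-approximation) and Wilson score intervals, as options rather than as the method used. It is unweighted and will not reproduce the published survey interval; the example counts are hypothetical.

```python
import math


def wald_ci(successes: int, n: int, z: float = 1.96):
    """Normal-approximation (Wald) interval for a binomial proportion."""
    p = successes / n
    half = z * math.sqrt(p * (1 - p) / n)
    return p - half, p + half


def wilson_ci(successes: int, n: int, z: float = 1.96):
    """Wilson score interval; asymmetric around p when p is far from 0.5."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


def per_100(interval):
    """Express a proportion interval per 100, rounded to two decimals."""
    return tuple(round(100 * x, 2) for x in interval)


# Hypothetical example: 30 cases among 1,000 respondents, unweighted.
print("Wald:", per_100(wald_ci(30, 1000)))
print("Wilson:", per_100(wilson_ci(30, 1000)))
```

For a rare outcome such as IBS, the Wilson interval is shifted upward relative to the Wald interval, which is one reason published intervals around low prevalences are often asymmetric.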
Both administrative and survey data resulted in higher prevalence estimates for females than for males (Figure 1); the female-to-male ratio was 4.27 for survey data, 2.14 for the one-year algorithm, and 2.08 for the three-year algorithm. The analysis by age group (Figure 2) revealed that for the two youngest age groups, prevalence estimates from administrative and survey data were not significantly different (p ≤ .10 for both administrative data algorithms).
For the oldest age group, estimates were significantly higher using administrative data than survey data.

Discussion
This study compared population-based administrative and survey data for IBS diagnoses. The results show that agreement between administrative and survey data for IBS was low, but it was much higher for IBD. Agreement was investigated for two periods of time both before and after the survey interview date. Agreement between administrative and survey data remained low regardless of the size or direction (i.e., pre-interview versus post-interview) of the case-ascertainment window.
Compared with survey respondents who reported having no bowel disorder, respondents with a self-reported diagnosis of IBS were more likely to have diagnoses in administrative data for symptoms of the digestive system and abdomen, other gastrointestinal conditions including IBD, selected procedures, and an anxiety disorder. These findings are consistent with previous research, which has shown that IBS patients have an increased likelihood of diagnosis for a variety of gastrointestinal, genitourinary, and psychological conditions when compared with the general population [21,27]. Given that IBD and IBS may have similar symptoms, an increased frequency of IBD diagnoses may indicate that physicians are attempting to "rule out" this condition in IBS patients [23]. A recent Canadian study [28] found that physicians are accurate in their use of ICD-9 diagnosis codes for Crohn's disease and ulcerative colitis in billing claims. Thus, the increased frequency of IBD diagnoses among IBS patients does not appear to be due to inaccurate coding.
Note: IBS = irritable bowel syndrome; CD = Crohn's disease; UC = ulcerative colitis. Results based on cell sizes less than 7 have been suppressed. a Percentages for this variable may not sum to 100 because of missing data.
Table 3 Estimates of κ for administrative and survey data before and after the survey interview date
IBS prevalence estimates have been reported to be two to three times higher among females than among males [17], which is consistent with the findings observed in this study for both data sources. However, crude estimates in this study were still lower than those reported in previous North American research [14,16,17]. This may be due to a number of factors, including the lack of specificity of the survey questions about bowel disorders or Manitoba physicians' unfamiliarity with the diagnostic criteria for IBS. At the same time, there is substantial international variation in IBS prevalence estimates, with some countries outside North America reporting values as low as 3 and 5 per 100 [29]. The estimates obtained from administrative data were higher for the oldest age group than for the two younger age groups, which is consistent neither with the estimates obtained from survey data nor with those reported in previous research, which show declining IBS prevalence with age [15]. This may be a result of some loss of specificity for ascertaining IBS cases in physician data, because diagnoses in Manitoba's physician data are recorded only to the third digit; very few IBS cases were identified from hospital data, in which diagnoses are recorded with greater specificity. This finding may also be a result of increased reporting of a variety of symptoms among older adults and/or an increased likelihood of inaccurate assignment of an IBS diagnosis in older adults [30].
Well-defined chronic conditions are more likely to result in good agreement between administrative and survey data [8,31]. The Rome criteria were developed to provide physicians with a systematic methodology for classifying individuals with functional gastrointestinal disorders, including IBS [32]. However, based on the results of medical chart review, Goff et al. [20] reported that physicians infrequently use the criteria to establish a diagnosis of IBS. Wilson et al. [16] suggest that the use of the Rome criteria by physicians may result in underestimation of IBS prevalence in population-based research that uses medical records.
Table 4 Percentages of respondents with selected diagnoses and procedures before and after the survey interview date. Note: IBS = irritable bowel syndrome; CD = Crohn's disease; UC = ulcerative colitis. Results based on cell sizes less than 7 have been suppressed. * indicates a χ2 test for IBS respondents and respondents with no bowel disorder that is significant at α = .05; † indicates a χ2 test that is non-significant. Unmarked numeric values for IBS respondents could not be tested due to zero or small cell sizes.
One limitation of this study is that it was not possible to compare the administrative or survey data with clinical data. As well, the survey data do not contain information about date of diagnosis, which would have been useful for decisions about the number of years of administrative data needed to ascertain disease cases. Some diagnoses occurred infrequently in administrative data, and therefore it was not possible to test for differences between survey respondents with and without IBS. Finally, as noted previously, only three-digit ICD-9 codes are available in Manitoba physician claims data, resulting in a loss of specificity for ascertaining IBS cases. This situation is not unique within Canada; physician data in Ontario, the largest province in Canada, are also limited to three-digit diagnosis codes [33]. However, there is a lack of consistency across jurisdictions in the way diagnoses are recorded in physician data. For example, ICD-9 codes in Medicare physician data from the United States may be recorded using up to five digits [9].
Given the low agreement between survey and administrative data and evidence that IBS appears to be under-reported in both data sources, it is important to undertake further research to improve IBS case ascertainment in population-based data. Thompson et al. [32] suggest that imprecision in the wording of survey questions contributes to inaccuracies in case ascertainment. Wilson et al. [16] recommended that survey questions based on the Rome criteria be supplemented with additional questions to identify patients previously diagnosed with IBS whose symptoms are currently under control. Thus, future research could compare agreement between administrative and survey data under different question wordings. For administrative data, techniques that do not rely exclusively on a single diagnosis code could be used to improve ascertainment of IBS. Machine-learning and statistical classification models, including latent class analysis and neural networks, have been applied to chronic conditions for which a single diagnosis code may have low sensitivity for ascertaining disease cases [12,34,35], and might be applied to this problem.

Conclusions
Population-based chronic disease research can provide important information about the effectiveness of disease treatment and management initiatives and of health promotion and disease prevention strategies. The quality of this research depends, in part, on the accuracy and validity of chronic disease case-ascertainment methods. In both data sources, the use of a single methodology for identifying disease cases may result in missed cases. Finally, researchers who rely on either administrative data or survey data to conduct population-based studies about IBS should recognize that the lack of comparability between the two data sources may be an important source of discrepancy in their results.