An approximately 20% random sample of all residents of Alberta, Canada, diagnosed with invasive colon cancer (International Classification of Diseases for Oncology (ICD-O) code C18, excluding appendix) or rectal cancer (ICD-O codes C19 and C20) in the years 2000 to 2005, stratified by stage and year of diagnosis, was identified from the Alberta Cancer Registry and included in the study. Patients were excluded for the following reasons: stage 0 cancer; histology that is not staged according to the Collaborative Staging Guidelines; or a missing unique lifetime identifier (ULI). The ULI is a unique number assigned to all members of the Alberta Health Care Insurance Program (AHCIP), the publicly funded provincial healthcare insurance plan in Alberta. The ULI is, therefore, used as the anonymized patient identifier in all provincial administrative databases in Alberta and was used to link data across data sources for the study.
Chart review data
A review of the cancer clinic medical charts was conducted to identify dates of endoscopy prior to and including the date of diagnosis. The Alberta Cancer Registry initially creates a cancer medical chart for every patient diagnosed with cancer, for use in coding cases; thus a cancer chart exists for every patient diagnosed with cancer in the province. The charts include procedure reports such as those for pathology, surgery, or endoscopy, plus referral letters and dictation notes if the patient is seen by an oncologist. The following data were abstracted from the charts: date and type of endoscopy; result (cancer, suspicious, not cancer); and source of information (letter, dictation notes, report).
Administrative health databases
Endoscopy data were obtained from three provincial administrative databases, the first two of which conform to national reporting standards: 1) the Discharge Abstract Database (hospital inpatient data), which records information on all admissions to hospitals in Alberta; 2) the Ambulatory Care Classification System Database (hospital outpatient data), which contains information on all outpatient visits that occurred in hospitals, such as visits to hospital-based physicians’ offices, hospital endoscopy units, and emergency departments; and 3) the Physician Billing database, which contains all billing claims submitted by physicians remunerated on a fee-for-service basis and “shadow” billing submitted by physicians employed through the Alternate Relationship Plan (ARP). The latter group comprises a small number of physicians in one city during the time period of this study. From each data source, dates and codes were identified for endoscopy procedures that occurred within one year prior to colorectal cancer diagnosis for each patient included in the study. The one-year timeframe was chosen based on a sensitivity analysis we conducted comparing endoscopies found 12, 18, or 24 months prior to colorectal cancer diagnosis; because roughly the same number were found regardless of the timeframe, we used one year as the cutoff.
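As an illustration only, the lookback window and its sensitivity analysis could be expressed as in the following Python sketch. The function names, the calendar-month arithmetic, and the inclusive treatment of both window boundaries are assumptions made for this example; they are not taken from the SAS or Stata code used in the study.

```python
import calendar
from datetime import date


def shift_months_back(d, months):
    """Return the date `months` calendar months before `d`.

    The day is clamped to the last day of the target month
    (e.g. 31 March minus one month gives 28 or 29 February).
    """
    total = d.year * 12 + (d.month - 1) - months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def endoscopies_in_window(endoscopy_dates, diagnosis_date, months=12):
    """Keep endoscopy dates within `months` before diagnosis.

    Assumption: both the window start and the diagnosis date are inclusive.
    """
    start = shift_months_back(diagnosis_date, months)
    return sorted(d for d in endoscopy_dates if start <= d <= diagnosis_date)


# Hypothetical patient: one diagnosis date and several endoscopy dates.
diagnosis = date(2003, 6, 15)
dates = [
    date(2001, 12, 20),
    date(2002, 6, 14),
    date(2002, 6, 15),
    date(2003, 6, 15),
    date(2003, 7, 1),   # after diagnosis, never counted
]

# Sensitivity analysis: count endoscopies under each candidate window.
counts = {m: len(endoscopies_in_window(dates, diagnosis, m)) for m in (12, 18, 24)}
```

Comparing the entries of `counts` across the 12-, 18-, and 24-month windows mirrors the comparison described above, here for a single made-up patient rather than the study cohort.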
Each data source uses a different coding system, and the coding system for the hospital datasets changed from ICD-9 to ICD-10 in April 2002. To identify endoscopy codes from each data source appropriately, a literature review was conducted and input from local physicians was obtained. Since our purpose was to identify all lower gastrointestinal endoscopies regardless of purpose, all codes that indicated use of an endoscope were included. The endoscopy procedure codes included in the study from each data source are listed in Additional file 1.
Combined administrative dataset
The three administrative datasets were combined under the assumption that an endoscopy identified in any source had occurred. This is because: 1) we expect that most patients will have had an endoscopy prior to colorectal cancer diagnosis and 2) it is unlikely that an endoscopy would be identified in any of the data sources if it was not actually performed; that is, the probability of a false positive is low. The data were combined in such a way as to minimize error in identifying unique endoscopies and also to assess accuracy with respect to the date of the endoscopy in the various data sources. In practice, it would be reasonable for an endoscopy code for the same event to appear in both a hospital inpatient record and a physician billing record, or in both a hospital outpatient record and a physician billing record. Coding rules and practices should prevent the same event from being coded in both hospital inpatient and outpatient data unless an error is made. This is because a procedure performed on an outpatient should not be entered as an inpatient procedure (and vice versa), even if the patient is admitted the same day. Similarly, it is unlikely that a patient would undergo more than one endoscopy on the same day. Furthermore, dates for events in the hospital databases are expected to be accurate because the data are entered and coded by trained health records technicians. Physician billing, however, is more prone to error with respect to both the accuracy of the code and the date.
To minimize the chance of counting a given endoscopy more than once, and of counting two or more events as one, when combining the datasets, the following rules were applied: 1) if an endoscopy appeared in both the inpatient and outpatient datasets for the same individual and date, it was considered to be the same endoscopy; 2) if an endoscopy in the physician billing data was within three days of an endoscopy in either hospital dataset, it was counted as the same endoscopy. These rules were tested against alternative rules using three- and seven-day windows, respectively, with the result that there was minimal difference in the number of unique endoscopies identified. If a patient did not appear in a dataset, the patient was assigned to the “No Endoscopy” category for that particular dataset.
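The two combination rules can be sketched in Python for a single patient as follows. This is a minimal illustration and not the study's implementation: the function name, the source labels, the inclusive reading of "within three days", attaching a billing record to the closest hospital date, and collapsing unmatched billing records that share a date are all assumptions of the example.

```python
from datetime import date


def combine_endoscopies(inpatient, outpatient, billing, window_days=3):
    """Combine one patient's endoscopy dates from three sources.

    Returns a date-sorted list of (date, set_of_sources) tuples,
    one tuple per unique endoscopy.
    """
    events = {}

    # Rule 1: inpatient and outpatient records on the same date
    # are treated as the same endoscopy.
    for source, dates in (("inpatient", inpatient), ("outpatient", outpatient)):
        for d in dates:
            events.setdefault(d, set()).add(source)
    hospital_dates = sorted(events)

    # Rule 2: a billing record within `window_days` of a hospital record
    # is counted as that hospital endoscopy (the hospital date is kept).
    for b in sorted(billing):
        near = [h for h in hospital_dates if abs((h - b).days) <= window_days]
        if near:
            closest = min(near, key=lambda h: abs((h - b).days))
            events[closest].add("billing")
        else:
            # Assumption: unmatched billing records on one date are one event.
            events.setdefault(b, set()).add("billing")

    return sorted(events.items())


# Hypothetical records for one patient.
inpatient = [date(2003, 1, 10)]
outpatient = [date(2003, 1, 10), date(2003, 3, 1)]
billing = [date(2003, 1, 12), date(2003, 3, 5), date(2003, 5, 20)]

combined = combine_endoscopies(inpatient, outpatient, billing)
combined_7_day = combine_endoscopies(inpatient, outpatient, billing, window_days=7)
```

Re-running the function with a different `window_days` value, as in the last line, is one way to check how sensitive the count of unique endoscopies is to the chosen window.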
The gold standard dataset was created by combining all administrative datasets and the chart review data. If a procedure was identified in any dataset, it was considered to have occurred in the gold standard. The cancer clinic medical chart was not adopted as the gold standard because, even though these charts contain the information collected by the cancer registry to code and stage patients, it is possible that an endoscopy that did not result in removal of tissue would be missed. Furthermore, although pathology reports are obtained when possible, some information may be obtained from referral letters or dictation notes, which are subject to error. For this reason, a gold standard was created to maximize the probability of identifying all unique endoscopies conducted in the year prior to colorectal cancer diagnosis. The same rules and assumptions that were followed to create the combined administrative dataset were applied in creating the gold standard: 1) if an endoscopy appeared in either data source (the combined administrative dataset or the chart review), it was assumed to have occurred (the probability of a false positive is low) and 2) endoscopies in the chart review dataset that were within three days of the date of an endoscopy in the combined administrative dataset were counted as the same endoscopy.
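A minimal Python sketch of the gold standard construction for one patient is given below. It assumes the combined administrative dates have already been de-duplicated; the function name, the labels "admin" and "chart", keeping the administrative date when a chart date matches, and the inclusive three-day window are assumptions for illustration only.

```python
from datetime import date


def build_gold_standard(admin_dates, chart_dates, window_days=3):
    """Merge combined administrative and chart review endoscopy dates.

    Returns a date-sorted list of (date, set_of_sources) tuples.
    """
    # Every administrative endoscopy is assumed to have occurred.
    gold = {d: {"admin"} for d in admin_dates}
    admin_sorted = sorted(gold)

    for c in sorted(chart_dates):
        # A chart endoscopy within the window of an administrative one
        # is counted as the same endoscopy.
        near = [a for a in admin_sorted if abs((a - c).days) <= window_days]
        if near:
            closest = min(near, key=lambda a: abs((a - c).days))
            gold[closest].add("chart")
        else:
            # Otherwise it is a distinct endoscopy found only in the chart.
            gold.setdefault(c, set()).add("chart")

    return sorted(gold.items())


# Hypothetical dates for one patient.
admin = [date(2004, 2, 1), date(2004, 6, 10)]
chart = [date(2004, 2, 3), date(2004, 9, 1)]
gold_standard = build_gold_standard(admin, chart)
```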
The measures used to evaluate the completeness of the data were calculated at two levels: 1) comparing the total number of patients who underwent endoscopy and 2) comparing the total number of endoscopy procedures identified in each data source. The following descriptive statistics were calculated for patients who received an endoscopy and for endoscopies identified from each dataset, using the respective totals identified in the gold standard as the denominators for percentages: 1) the total number and percent; 2) the number and percent identified from one and only one data source, by data source; and 3) the number and percent identified from one and only one of the administrative data sources, by administrative data source (note that these may also have been identified from the chart review). The purpose of this last set of statistics is to indicate the extent to which each administrative data source contributes uniquely in the absence of a chart review. The percentage of endoscopy procedures that had exact date matches was used to determine the accuracy of the data.
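The three completeness statistics can be illustrated with the following Python sketch, in which each gold standard endoscopy (or patient) is represented by the set of sources that identified it. The source labels, the function name, and the example data are hypothetical and chosen only to show how the gold standard total serves as the denominator.

```python
ADMIN_SOURCES = ("inpatient", "outpatient", "billing")
ALL_SOURCES = ADMIN_SOURCES + ("chart",)


def completeness(gold_events):
    """Summarise completeness per source.

    `gold_events` is a list of sets; each set names the sources that
    identified one gold standard endoscopy (or one patient).
    """
    total = len(gold_events)
    if total == 0:
        raise ValueError("gold standard is empty")

    admin = set(ADMIN_SOURCES)
    summary = {}
    for source in ALL_SOURCES:
        # 1) identified by this source at all
        found = sum(1 for ev in gold_events if source in ev)
        # 2) identified by this source and no other source
        only = sum(1 for ev in gold_events if ev == {source})
        row = {
            "n": found,
            "pct": 100 * found / total,
            "only_n": only,
            "only_pct": 100 * only / total,
        }
        if source in admin:
            # 3) the only administrative source to identify the event
            #    (the chart review may also have identified it)
            admin_only = sum(1 for ev in gold_events if (ev & admin) == {source})
            row["admin_only_n"] = admin_only
            row["admin_only_pct"] = 100 * admin_only / total
        summary[source] = row
    return summary


# Hypothetical gold standard with five endoscopies.
gold_events = [
    {"inpatient", "billing", "chart"},
    {"outpatient", "billing"},
    {"billing"},
    {"chart"},
    {"outpatient", "chart"},
]
summary = completeness(gold_events)
```

In this made-up example, the fifth endoscopy counts toward the outpatient `admin_only_n` but not toward its `only_n`, because the chart review also identified it.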
To assess the likelihood that endoscopies were missed, clinical characteristics and health care service utilization were compared between patients who had an endoscopy and those who did not. Specifically, patient age at diagnosis, disease stage, type of first colorectal cancer-related healthcare visit (pre-diagnostic or not), and time from diagnosis to death were explored. These were selected because they were considered to be potentially relevant reasons why individuals may not receive an endoscopy prior to colorectal cancer diagnosis. Statistical significance was defined at the α=0.05 level. All analyses were performed using statistical software SAS 9.1.3 (SAS Institute, Cary, NC, USA) or STATA/SE 10.0 (StataCorp LP, TX, USA).