Using administrative health data to describe colorectal and lung cancer care in New South Wales, Australia: a validation study

Background Monitoring treatment patterns is crucial to improving cancer patient care. Our aim was to determine the accuracy of linked routinely collected administrative health data for monitoring colorectal and lung cancer care in New South Wales (NSW), Australia. Methods Colorectal and lung cancer cases diagnosed in NSW between 2000 and 2002 were identified from the NSW Central Cancer Registry (CCR) and linked to their hospital discharge records in the NSW Admitted Patient Data Collection (APDC). These records were then linked to data from two relevant population-based patterns of care surveys. The main outcome measures were the sensitivity and specificity of data from the CCR and APDC for disease staging, investigative procedures, curative surgery, chemotherapy, radiotherapy, and selected comorbidities. Results Data for 2917 colorectal and 1580 lung cancer cases were analysed. Unknown disease stage was more common for lung cancer in the administrative data (18%) than in the survey (2%). Colonoscopies were captured reasonably accurately in the administrative data compared with the surveys (82% and 79% respectively; 91% sensitivity, 53% specificity), but all other colorectal or lung cancer diagnostic procedures were under-enumerated. Ninety-one percent of colorectal cancer cases had potentially curative surgery recorded in the administrative data compared with 95% in the survey (96% sensitivity, 92% specificity), with similar accuracy for lung cancer (16% and 17%; 92% sensitivity, 99% specificity). Chemotherapy (~40% sensitivity) and radiotherapy (sensitivity ≤30%) were vastly under-enumerated in the administrative data. The only comorbidity that was recorded reasonably accurately in the administrative data was diabetes. Conclusions Linked routinely collected administrative health data provided reasonably accurate information on potentially curative surgical treatment, colonoscopies and comorbidities such as diabetes. 
Other diagnostic procedures, comorbidities, chemotherapy and radiotherapy were not well enumerated in the administrative data. Other sources of data will be required to comprehensively monitor the primary management of cancer patients.


Background
Colorectal and lung cancers are the second and fifth most common cancers in New South Wales (NSW), Australia's most populous state. In 2008, the two cancers together accounted for 22% of all new cancers and 33% of cancer deaths [1]. Monitoring treatment patterns and evaluating associated outcomes is necessary for improving care for these patients. While population-based patterns of care surveys are valuable for this purpose, they are resource-intensive and provide only a snapshot of care. Linked routinely collected administrative health data, if sufficiently reliable, would be more efficient and potentially more cost-effective, and would allow cancer care to be monitored over time.
The NSW Central Cancer Registry (CCR) and NSW Admitted Patient Data Collection (APDC) are two routinely collected administrative data sources that together could provide information on cancer treatment in NSW. A recent validation study found that these data sources accurately recorded radical prostatectomy and brachytherapy for prostate cancer patients, but not external beam radiotherapy [2]. An earlier study described reasonable enumeration of surgery for breast cancer [3]. However, there is little other published material investigating the validity of these data sources for describing patterns of cancer care, despite their increasing use for this purpose (e.g. [4-6]).
Cancer stage information is vital for assessing the appropriateness of care. The previous prostate cancer study found that a high proportion of tumours had unknown stage in the CCR [2]. Another study reported 70% agreement between colorectal cancer stage in the CCR and stage collected in a survey of treating clinicians [7]. Similar studies in another Australian state and in New Zealand reported around 80% agreement or accuracy for the pathology-based colorectal cancer stage recorded in cancer registries, suggesting that registries are a valid source of high-level stage information [8,9].
Here we report on the validity of the administrative data for recording diagnostic procedures and treatment received by colorectal and lung cancer patients, along with cancer stage and selected comorbidities for lung cancer patients. This study adds to the limited literature on using these data to monitor patterns of cancer care over time. Because these population-based data sources require only a fraction of the resources needed by other methods, establishing their validity is important.

Methods

Patterns of care study data
Two population-based studies carried out by Cancer Council NSW collected detailed treatment data for colorectal and lung cancer patients diagnosed in NSW. The NSW Colorectal Cancer Care Survey (the "colorectal cancer survey") collected data on the patterns of care for colorectal cancer patients notified to the CCR between February 2000 and January 2001 [10]. The NSW Lung Cancer Patterns of Care study (the "lung cancer survey") collected treatment data for lung cancer cases notified to the CCR and diagnosed between November 2001 and December 2002 [11].
In both studies, treating clinicians were identified from CCR notifications and sent questionnaires seeking information on the patient's initial presentation, investigations, surgery, chemotherapy and radiotherapy in the primary treatment phase. A field officer collected this information from clinicians' records where necessary and feasible. Patients normally resident outside NSW were excluded. In the colorectal cancer survey, treating institutions were identified and categorised by type and location. In the lung cancer survey, the comorbidities recorded were conditions assessed at initial presentation that were likely to affect the patient's disease or treatment; the patient's performance status and weight loss before initial presentation were also recorded.
The lung cancer survey classified morphology as small cell lung cancer (SCLC), non-small cell lung cancer (NSCLC) or not pathologically confirmed (NPC). SCLC disease stage was classified according to the Veterans Administration staging system [12] as limited, extensive or unknown. For comparison with other data sets, limited stage was considered "localised" and extensive stage "non-localised" disease. For NSCLC and NPC cases, disease stage was recorded in terms of tumour stage, nodal involvement and distant metastases (TNM), and defined as localised (T0-T2 with no known nodal involvement or metastases), non-localised (T3-T4, nodal involvement or distant metastases) or unknown. The colorectal cancer survey classified disease stage as localised (involvement of the submucosa or muscularis propria with no known nodal involvement or metastases), non-localised (subserosal or serosal involvement, adjacent organ invasion, nodal involvement or distant metastases) or unknown.
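The stage-grouping rules above can be written as two small functions. This is a minimal sketch, not the surveys' coding scheme: the field names (`va_stage`, `t`, `n_positive`, `m_positive`, `depth`) and depth labels are hypothetical, and a missing nodal or metastasis field is treated as "no known involvement", as the definitions imply.

```python
from typing import Optional

# Hypothetical depth-of-invasion labels for the colorectal survey rule.
LOCALISED_DEPTHS = {"submucosa", "muscularis propria"}
NON_LOCALISED_DEPTHS = {"subserosa", "serosa", "adjacent organ"}


def lung_stage(morphology: str, va_stage: Optional[str] = None,
               t: Optional[int] = None, n_positive: Optional[bool] = None,
               m_positive: Optional[bool] = None) -> str:
    """Group lung cancer survey stage as localised, non-localised or unknown."""
    if morphology == "SCLC":
        # Veterans Administration staging: limited -> localised, extensive -> non-localised.
        return {"limited": "localised", "extensive": "non-localised"}.get(va_stage, "unknown")
    # NSCLC or NPC: TNM-based grouping; None for N or M means "no known involvement".
    if (t is not None and t >= 3) or n_positive or m_positive:
        return "non-localised"
    if t is not None and t <= 2:
        return "localised"  # T0-T2 with no known nodal involvement or metastases
    return "unknown"


def colorectal_stage(depth: Optional[str], n_positive: Optional[bool] = None,
                     m_positive: Optional[bool] = None) -> str:
    """Group colorectal survey stage by depth of invasion, nodes and metastases."""
    if n_positive or m_positive or depth in NON_LOCALISED_DEPTHS:
        return "non-localised"
    if depth in LOCALISED_DEPTHS:
        return "localised"
    return "unknown"


print(lung_stage("SCLC", va_stage="limited"))     # localised
print(lung_stage("NSCLC", t=1, n_positive=True))  # non-localised
print(colorectal_stage("muscularis propria"))     # localised
```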

Routinely collected health data
The administrative data sources have been described previously [2]. Briefly, the CCR is notified of all cancer diagnoses in NSW and collects information including month and year of diagnosis, cancer site and spread of disease at diagnosis. Spread of disease is categorised as localised, non-localised (adjacent organs or regional lymph nodes involved, or distant metastases) or unknown. The CCR does not record treatment information. CCR records for people diagnosed with colorectal or lung cancer in NSW from January 1999 to December 2002 were included in the linkage.
The APDC collates procedure and diagnosis information for all admitted patient episodes in NSW public and private hospitals. Procedures were coded using the Medicare Benefits Schedule-Extended classification of the International Classification of Diseases 10th revision, Australian Modification (ICD-10-AM). Diagnoses were recorded as a principal diagnosis and additional diagnoses (conditions affecting treatment or length of stay), coded to ICD-10-AM. Up to 31 procedure codes and 40 diagnosis codes could be recorded for each admission. APDC records from July 1998 to June 2003 were included in the linkage to ensure full coverage of admissions relevant to the primary treatment of each cancer.

Treatment and comorbidities
For APDC records, ICD-10-AM codes corresponding to the procedures recorded in the surveys were identified by cancer specialists. Chemotherapy and radiotherapy were identified using procedure codes, supplemented by diagnoses indicating that the treatment had been received (e.g. "Radiotherapy session") or that the admission was related to convalescence from, or sequelae of, the treatment (e.g. "Convalescence following chemotherapy"). Radiotherapy is not routinely indicated for colon cancer, so the evaluation of radiotherapy recording was restricted to rectal and lung cancer cases.
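A patient-level treatment flag of this kind reduces to a set-membership test over the procedure and diagnosis codes on each of the patient's admissions. In the minimal sketch below, the diagnosis codes are standard ICD-10 Z-codes matching the examples in the text (Z51.0 radiotherapy session, Z51.1 chemotherapy session, Z54.1 and Z54.2 convalescence following radiotherapy or chemotherapy); they are assumed, not confirmed, to be among the study's codes, and the procedure code `CHEMO-PROC` is a hypothetical placeholder. The second return value records whether a procedure code was found, which is the distinction reported in the Results.

```python
from dataclasses import dataclass, field

# Standard ICD-10 Z-codes matching the examples in the text; assumed, not
# confirmed, to be among the ICD-10-AM codes the study used.
CHEMO_DIAGNOSES = {"Z51.1", "Z54.2"}  # chemotherapy session; convalescence after chemotherapy
RADIO_DIAGNOSES = {"Z51.0", "Z54.1"}  # radiotherapy session; convalescence after radiotherapy


@dataclass
class Episode:
    """One admitted-patient episode with its coded procedures and diagnoses."""
    procedures: set = field(default_factory=set)
    diagnoses: set = field(default_factory=set)


def flag_treatment(episodes, procedure_codes, diagnosis_codes):
    """Return (treated, found_by_procedure) for one patient's linked episodes.

    A patient counts as treated if any episode carries a matching procedure
    code or a matching diagnosis code.
    """
    by_procedure = any(ep.procedures & procedure_codes for ep in episodes)
    by_diagnosis = any(ep.diagnoses & diagnosis_codes for ep in episodes)
    return by_procedure or by_diagnosis, by_procedure


# "CHEMO-PROC" is a hypothetical placeholder, not a real ICD-10-AM procedure code.
chemo_procedures = {"CHEMO-PROC"}
patient = [Episode(diagnoses={"C18.7"}), Episode(diagnoses={"Z54.2"})]
print(flag_treatment(patient, chemo_procedures, CHEMO_DIAGNOSES))  # (True, False)
```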
Comorbidities were included in the comparisons because of their important role in determining patterns of care. Relevant comorbidities were identified among the APDC principal and additional diagnoses using the codes described by Quan et al. [13] for the Charlson Comorbidity Index [14]. We considered two algorithms, using records representing: (1) hospital episodes in the 12 months up to and including the month of diagnosis, plus the first cancer-related admission if it occurred after the month of diagnosis; and (2) all available hospital episodes for 1998-2003. Comparable comorbidities were ascertained only in the lung cancer survey. Because the conditions it recorded did not correspond exactly to those in the Charlson Index, ischaemic heart disease and other atherosclerotic disease recorded in the lung cancer survey were combined and compared with the combination of myocardial infarction, congestive heart failure and additional ischaemic heart disease diagnoses recorded in the APDC (referred to as "heart disease").
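The two look-back algorithms differ only in which admissions are searched for comorbidity codes. The sketch below is an illustration under stated assumptions: "12 months up to and including the month of diagnosis" is read as the diagnosis month plus the 11 preceding months, the `Admission` record and `cancer_related` flag are hypothetical, and diabetes is simplified to the E10-E14 prefixes rather than the exact code list of Quan et al.

```python
from dataclasses import dataclass
from datetime import date

# Simplified diabetes prefixes (E10-E14); Quan et al. give the exact code list.
DIABETES = ("E10", "E11", "E12", "E13", "E14")


@dataclass
class Admission:
    admitted: date
    cancer_related: bool  # hypothetical flag for a cancer-related admission
    diagnoses: tuple      # principal and additional ICD-10-AM diagnosis codes


def month_index(d: date) -> int:
    return d.year * 12 + d.month - 1


def algorithm_one(admissions, diagnosis_month: date):
    """Episodes in the 12 months ending with the diagnosis month, plus the first
    cancer-related admission if it falls after the diagnosis month."""
    dx = month_index(diagnosis_month)
    selected = [a for a in admissions if 0 <= dx - month_index(a.admitted) <= 11]
    cancer = sorted((a for a in admissions if a.cancer_related), key=lambda a: a.admitted)
    if cancer and month_index(cancer[0].admitted) > dx:
        selected.append(cancer[0])
    return selected


def algorithm_two(admissions):
    """All available episodes (1998-2003 in the study)."""
    return list(admissions)


def has_condition(admissions, prefixes) -> bool:
    return any(code.startswith(prefixes) for a in admissions for code in a.diagnoses)


history = [
    Admission(date(1999, 2, 10), False, ("E11.9",)),  # outside the 12-month window
    Admission(date(2001, 3, 5), False, ("I10",)),
    Admission(date(2001, 8, 1), True, ("C34.1",)),
]
dx_month = date(2001, 6, 1)
print(has_condition(algorithm_one(history, dx_month), DIABETES))  # False
print(has_condition(algorithm_two(history), DIABETES))            # True
```

The toy history shows why the longer window can only raise sensitivity: every admission searched by the first algorithm is also searched by the second.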

Record linkage
As described previously [2], the NSW Department of Health used probabilistic matching to link CCR and APDC records. The Centre for Health Record Linkage (CHeReL) then matched records from the CCR and APDC to those in the colorectal and lung cancer surveys using probabilistic matching [15]. Uncertain matches and a sample of "certain" matches were reviewed clerically. The CHeReL estimated that there were approximately 0.1% false positive and less than 0.1% false negative linkages.
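The linkage is described here only as probabilistic matching with clerical review of uncertain pairs. A common formulation of probabilistic matching is the Fellegi-Sunter model, in which each identifier contributes a log-likelihood weight and pairs scoring between two thresholds are sent for review. The sketch below assumes that model; it is not confirmed as the CHeReL's method, and the m- and u-probabilities, field list and thresholds are invented for illustration.

```python
import math

# Hypothetical m-probabilities (agreement given a true match) and u-probabilities
# (chance agreement given a non-match); not the CHeReL's parameters.
FIELDS = {
    "surname": (0.95, 0.01),
    "given_name": (0.90, 0.02),
    "birth_date": (0.98, 0.003),
    "sex": (0.99, 0.5),
}


def match_weight(agreement: dict) -> float:
    """Sum of log2 likelihood ratios over the compared identifiers."""
    total = 0.0
    for name, (m, u) in FIELDS.items():
        if agreement[name]:
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total


def classify(weight: float, lower: float = 0.0, upper: float = 15.0) -> str:
    """Pairs between the two (hypothetical) thresholds go to clerical review."""
    if weight >= upper:
        return "link"
    if weight <= lower:
        return "non-link"
    return "clerical review"


all_agree = dict.fromkeys(FIELDS, True)
surname_differs = {**all_agree, "surname": False}
print(classify(match_weight(all_agree)))        # link
print(classify(match_weight(surname_differs)))  # clerical review
```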
The patterns of care studies and linkage processes were approved by the ethics committees of the NSW Department of Health, Cancer Institute NSW and Cancer Council NSW.

Statistical analysis
Individual patient data provided by doctors or collected from doctors' records in the colorectal and lung cancer surveys were compared with APDC records for diagnostic investigations and treatment (curative surgery, chemotherapy, or radiotherapy). For lung cancer patients only, they were also compared with disease stage data in the CCR (we have previously reported this comparison for colorectal cancer [7]), and with selected comorbid conditions in the APDC. All comparisons of survey observations with APDC records include only those patients who linked to at least one APDC record. For the purpose of this analysis, the survey data were considered to be the "gold standard". Sensitivity was defined as the probability of an event being recorded in the administrative data if it was in the survey data and specificity was the probability of an event not being recorded in the administrative data if it was not in the survey data.
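The sensitivity and specificity definitions above amount to a 2x2 cross-tabulation of survey (gold standard) against administrative records. The minimal sketch below computes both from paired patient-level flags. It also uses the identity that the proportion recorded in the administrative data equals sensitivity times survey prevalence plus (1 - specificity) times (1 - prevalence); applied to the colonoscopy figures in the abstract, this approximately reproduces the reported 82%.

```python
def sensitivity_specificity(pairs):
    """pairs: (in_survey, in_admin) booleans per patient; the survey is the gold standard."""
    tp = fn = fp = tn = 0
    for in_survey, in_admin in pairs:
        if in_survey and in_admin:
            tp += 1
        elif in_survey:
            fn += 1
        elif in_admin:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


def expected_admin_proportion(prevalence, sensitivity, specificity):
    """Proportion recorded in the administrative data: true plus false positives."""
    return sensitivity * prevalence + (1 - specificity) * (1 - prevalence)


# Colonoscopy figures from the abstract: 79% in the survey, 91% sensitivity,
# 53% specificity. The implied administrative proportion is about 82%.
print(round(expected_admin_proportion(0.79, 0.91, 0.53), 2))  # 0.82
```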
The local government area of the patient's residence at the time of cancer diagnosis was used to determine their geographic accessibility to services, as defined by the Accessibility/Remoteness Index of Australia [16].
Using chi-square tests, the proportions of patients for whom the surveys and administrative data agreed were compared across groups defined by age, sex, remoteness of residence, year of diagnosis and disease stage for all cases, and by tumour morphology (recorded by the survey), performance status, weight loss and comorbidities for lung cancer cases. Analyses were carried out in SAS version 9.1 (SAS Institute Inc., Cary, NC, US).
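The analyses were run in SAS; as a language-neutral illustration, the sketch below computes a Pearson chi-square statistic (without continuity correction) for a table of agree/disagree counts by patient group, with an upper-tail p-value from the closed-form chi-square survival function for integer degrees of freedom. The counts are hypothetical. In practice, `scipy.stats.chi2_contingency(table, correction=False)` gives the same uncorrected statistic.

```python
import math


def pearson_chi_square(table):
    """Pearson chi-square (no continuity correction) for rows of [agree, disagree] counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return stat, dof


def chi2_sf(x: float, dof: int) -> float:
    """Upper-tail chi-square probability, using closed forms for integer dof."""
    if dof % 2 == 0:
        term = total = 1.0
        for i in range(1, dof // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    total = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for i in range(1, (dof + 1) // 2):
        total += term
        term *= x / (2 * i + 1)
    return total


# Hypothetical counts of survey/APDC agreement by remoteness group.
agreement_by_group = [
    [30, 10],  # e.g. non-rural: 30 agree, 10 disagree
    [20, 20],  # e.g. rural: 20 agree, 20 disagree
]
stat, dof = pearson_chi_square(agreement_by_group)
print(round(stat, 3), dof, round(chi2_sf(stat, dof), 4))
```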

Results
There were 3091 colorectal cancer cases and 1810 lung cancer cases with treatment data from the surveys. Of these, 3038 (98%) colorectal cancer cases and 1707 (94%) lung cancer cases were successfully linked to the CCR (Figure 1). Of those linked to the CCR, 2917 (96%) colorectal cancer cases and 1580 (93%) lung cancer cases linked to at least one APDC record and were included in the analyses (Figure 1, Table 1).
Colorectal cancer survey cases linked to the CCR and APDC were more likely than those not linked to be female (43% and 34% respectively, 95% confidence interval [CI] for difference: 1-16%) and to have had a colonoscopy (79% and 72%, 95% CI for difference: 0-14%). Lung cancer survey cases linked to the CCR and APDC were more likely than those not linked to have had surgery (17% and 8%, 95% CI for difference: 4-12%) and to have had a bronchoscopy (51% and 44%, 95% CI for difference: 0-14%). Failure to link to the CCR or APDC may have occurred because identifying details did not match closely enough to establish a link with certainty. Not linking to the APDC could also reflect having no APDC inpatient episodes (because of no hospital admission, non-recording of hospital episodes, or treatment outside NSW). Not linking to the CCR could reflect the cancer not being registered in the CCR within the study period; early notification records were used to identify patients for the surveys.

Disease stage
Non-localised lung cancer was the most common disease stage, accounting for 72% of cases in the survey and 57% of cases in the CCR. Eighteen percent of lung cancer cases had unknown disease stage in the CCR, compared to only 2% in the lung cancer survey. This contributed to the poor sensitivity for both non-localised (sensitivity 68%) and localised (sensitivity 52%) disease. There was agreement between the survey data and the CCR for 63% of cases. After excluding lung cancer cases with unknown disease stage in either source of information, there was agreement on stage for 77% of the 1283 cases and the specificity with which the CCR recorded non-localised disease was 65%.
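The effect of unknown registry stage on agreement and sensitivity can be seen in a small worked example. The sketch below uses invented toy data, not study records, to compute overall agreement, agreement after excluding cases with unknown stage in either source, and the sensitivity for one stage category.

```python
def stage_agreement(pairs, exclude_unknown=False):
    """pairs: (survey_stage, registry_stage) per case. Returns (agreement, n)."""
    if exclude_unknown:
        pairs = [(s, r) for s, r in pairs if "unknown" not in (s, r)]
    agree = sum(s == r for s, r in pairs)
    return agree / len(pairs), len(pairs)


def stage_sensitivity(pairs, stage):
    """Share of cases with this survey stage that the registry records as the same stage."""
    recorded = [r for s, r in pairs if s == stage]
    return sum(r == stage for r in recorded) / len(recorded)


# Invented toy data: two survey non-localised cases have unknown registry stage.
cases = ([("non-localised", "non-localised")] * 6
         + [("non-localised", "unknown")] * 2
         + [("localised", "localised")]
         + [("localised", "non-localised")])
print(stage_agreement(cases))                        # (0.7, 10)
print(stage_agreement(cases, exclude_unknown=True))  # (0.875, 8)
print(stage_sensitivity(cases, "non-localised"))     # 0.75
```

Every case with unknown registry stage counts against the sensitivity of whichever stage the survey recorded, which is why excluding unknowns raises agreement.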

Diagnostic procedures
Only colonoscopies were recorded reasonably accurately in the APDC (Tables 2,3). The sensitivity with which the APDC recorded colonoscopies for colorectal cancer cases was highest for those treated in private hospitals (97%) or with private health insurance (97%) and lowest for those without health insurance (85%). Specificity was highest for cases aged 80 years or more (63%) and those treated in public hospitals (62%). There were 284 patients with a colonoscopy recorded in the APDC but not in the survey. Of these, 118 (42%) occurred after the month of diagnosis; these were more likely to reflect post-treatment surveillance than pre-operative investigation, and so might not have been captured in the survey.
The APDC recorded around two-thirds of the bronchoscopies and biopsies, but one-third or fewer of the radiography procedures for the diagnosis of lung cancer.

Potentially curative surgical treatment
The sensitivity with which the APDC recorded potentially curative surgical treatment was over 90%, but the recording of the actual surgical procedure was less accurate (Tables 2,3). One-fifth of lobectomies for lung cancer were recorded as pneumonectomies or other definitive resections in the APDC. Four percent of rectal cancer cases who had a rectal resection recorded in the survey were recorded as having a colon resection in the APDC; the converse error occurred in 1% of colon cancer cases who had a colectomy. The sensitivity with which any curative colorectal cancer surgical treatment was recorded in the APDC was lowest for cases with unknown disease stage (88%), but it was at least 94% for all other patient groups. Of the 123 cases in the colorectal cancer survey who had curative surgical treatment but no corresponding record in the APDC, 45 (37%) had a matching admission date in the APDC, with around a third of these having an intestinal resection or other (minor) rectal resection recorded.
The sensitivity with which any lung cancer surgical treatment was recorded in the APDC was lowest for cases from rural areas (78%, compared with 95% in non-rural areas); there was no appreciable variation for any other patient groups. Excluding seven cases who were likely to have been treated interstate, all of whom were from rural areas, increased the sensitivity with which the APDC captured surgical treatment for cases from rural areas to 94%. Of the thirteen other cases who had undergone surgery according to the lung cancer survey but had no record of surgery in the APDC, eight had a non-surgical admission recorded in the APDC on the same day that the surgery recorded in the survey was performed.
For cases with surgery recorded in both sources, the date of surgery differed slightly between the survey and administrative data for 20% of colorectal cancer cases and 16% of lung cancer cases, with the APDC date up to a week earlier for most of these.

Chemotherapy
The receipt of chemotherapy was under-enumerated in the APDC for both colorectal and lung cancer cases, with records in the APDC for fewer than half of the cases treated with chemotherapy according to the surveys (Tables 2,3). Of the cases identified in the APDC as having had chemotherapy, over 90% were identified from procedure codes and the remainder from relevant diagnosis codes only.

Radiotherapy
Enumeration of radiotherapy in the APDC was even lower than that of chemotherapy: fewer than one-sixth of rectal and one-third of lung cancer cases treated with radiotherapy were identified (Tables 2,3). Radiotherapy recorded in the APDC was identified from diagnosis codes only for 80% of rectal cancer cases and one-third of lung cancer cases. Most of the diagnosis codes in the APDC that identified radiotherapy indicated after-effects of treatment rather than radiotherapy administered during the hospital stay. According to the lung cancer survey, five lung cancer cases had radiotherapy after the end of the period covered by the APDC. These were the only survey treatment records outside the period covered by the APDC, and they account for only 1% of the 440 lung cancer cases whose radiotherapy was not captured in the APDC.

Comorbidities
For key comorbidities, the level of agreement between the survey data and APDC for lung cancer cases was reasonable for diabetes but poor for COPD and heart disease (Table 3). When we considered comorbidities recorded in the APDC over the entire study period (our secondary analysis), the sensitivity with which each condition was recorded increased by 14-16% with only small reductions in specificity (e.g. 88% sensitivity and 96% specificity for the recording of diabetes).

Discussion
Linked routinely collected administrative health data provided reasonably accurate information about curative surgery for colorectal and lung cancer cases, colonoscopies for colorectal cancer patients and comorbidities such as diabetes. The recording of disease stage was less accurate, and the administrative data did not capture the majority of diagnostic investigations other than colonoscopies, comorbidities other than diabetes, or treatment with chemotherapy or radiotherapy. While surgical treatment was well enumerated overall, there were some discrepancies in the recording of specific surgical procedures. Other studies have reported lower agreement for less definitive and less commonly performed procedures, which may relate to the interpretation of surgeons' notes [17,18]. We previously found that for prostate cancer, radical prostatectomy was recorded in the administrative data with 91% sensitivity and 100% specificity [2]. Another NSW study reported some mis-coding of mastectomies and breast conserving surgery [3]. Surgical treatment was less well enumerated for cases living in rural areas, mainly because data were not available for treatment in hospitals in neighbouring states. Chemotherapy, radiotherapy and diagnostic investigations other than colonoscopy are often carried out on an outpatient basis, so analyses using inpatient episodes only are expected to under-enumerate them. Radiotherapy appeared more likely to be identified for either cancer type when a long hospital admission coincided with the patient having radiotherapy. In contrast, previous research found that brachytherapy for prostate cancer was enumerated accurately because it requires a specific hospital admission [2].
Our results concur with previous studies using NSW linked administrative health data that reported a small under-enumeration of cancer-specific surgery [2,3] and a larger shortfall for radiotherapy [2]. Other Australian and international validation studies have also reported high accuracy for major surgical procedures [18-21] in administrative data collections, reasonable recording of disease stage [7-9] and under-enumeration of diagnostic investigations, chemotherapy and radiotherapy [20-23]. We previously found that the inclusion of Australian Medicare claims data substantially improved the enumeration of radiotherapy and also captured many of the cases receiving surgery who were missed by the APDC [2].
While the presence of diabetes was reasonably well captured for lung cancer cases, the other comorbidities investigated were vastly under-enumerated. Others have also reported that routinely collected diagnosis information under-enumerates comorbidities, with the possible exception of diabetes [24-26]. This may be because comorbidity information is collected in the administrative data and the surveys for different purposes. It may also depend on the period over which comorbidity is enumerated: we found that the sensitivity of APDC recording of comorbidity increased when we enumerated it over a longer period. Although hospital records appear to under-report comorbidities, the available comorbidity data are still important when assessing patient outcomes [25,27].
The poor agreement between the administrative and survey data with regard to cancer stage suggests that we cannot judge the appropriateness of treatment based solely on administrative data [7,28]. The colorectal and lung cancer surveys recorded detailed information on tumour stage, lymph node involvement, site(s) of distant spread, patient performance status, weight loss before presentation, patient preferences (with respect to choice of treatment) and quality of life, thus providing a more comprehensive picture of cancer management.
We excluded cancer cases who did not link to the APDC so our estimates of sensitivity for procedures are likely to be somewhat optimistic. When all cases who did not link to the APDC were considered not to have had any of the procedures according to the administrative data, the sensitivity was reduced by 3-5% for each of the major procedure types and comorbidities.
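The conservative recalculation described above treats every survey-positive case with no APDC record as missed, which enlarges the denominator without adding true positives. The sketch below shows the arithmetic with hypothetical counts, not figures from this study.

```python
def sensitivity_bounds(true_pos: int, false_neg: int, unlinked_pos: int):
    """Sensitivity among linked cases, and a conservative value that treats every
    survey-positive case without an APDC record as missed by the administrative data."""
    linked = true_pos / (true_pos + false_neg)
    conservative = true_pos / (true_pos + false_neg + unlinked_pos)
    return linked, conservative


# Hypothetical counts, not figures from this study.
linked, conservative = sensitivity_bounds(true_pos=960, false_neg=40, unlinked_pos=40)
print(f"{linked:.3f} {conservative:.3f}")  # 0.960 0.923
```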
Our study has other limitations. The comorbid conditions recorded in hospital data are those that caused the admission or had some effect on the hospital stay, so they might not include all relevant comorbid conditions. Also, the data we used might now be considered relatively old. However, we believe there have been no major changes in data quality or treatment that would substantially alter these findings for more recent data, so our results remain relevant.
The administrative data have some key strengths. First, they are population-based, which removes some of the potential biases introduced by single-centre data collections or other area-based samples. Second, perhaps most important in a research environment with finite funding and resources, the data are relatively inexpensive and timely to acquire and are already being collected by experts in the field, making it possible to undertake regular large-scale analyses.
How can the administrative data be used to provide more comprehensive information on cancer treatment patterns? Marginal gains are possible with improved quality and availability of patient identifiers (name, date of birth, etc.) for record linkage. However, the under-enumeration of diagnostic procedures, chemotherapy and radiotherapy deserves more attention. Adding other routinely collected data sources would help address this issue, in particular Medicare claims data, which have been shown to improve the accuracy of treatment and comorbidity information [2,24,29-31]. Using treatment data recorded by clinical cancer registries in NSW would also be a step forward, although these registries do not currently cover all cancers diagnosed in NSW [32]. There is also a need for information that is not currently routinely recorded, such as performance status on admission and clinicians' recommendations or patients' preferences for treatment. These data may only be obtainable through patient or clinician surveys, although well designed and well functioning electronic medical record systems could facilitate their collection.

Conclusions
Overall, the linked routinely collected administrative health data we used accurately described the use of potentially curative surgery for colorectal and lung cancer patients in NSW. This, combined with our previous findings for the treatment of prostate cancer, suggests that population cancer registries together with hospital admissions data are sufficiently accurate to monitor patterns of surgical care for different cancer types. Diagnostic procedures, chemotherapy, radiotherapy, comorbidities and cancer stage at diagnosis, however, were less well recorded in the administrative data, although information on colonoscopies might be sufficiently reliable. Information from other sources, such as Medicare claims data, is also required before routinely collected administrative data can be used to monitor cancer care at the population level.