(Dis)concordance of comorbidity data across ICD codes, medical charts, and self-reports

Background Benchmarking outcomes across settings commonly requires risk-adjustment for comorbidities, which must be derived from existing data sources designed for other purposes. A question arises as to the extent to which the available sources of health data are concordant when inferring the type and severity of comorbidities, and how close each comes to the "truth". We studied the level of concordance for same-patient comorbidity data extracted from administrative data (coded using the International Classification of Diseases, Australian modification, 10th edition [ICD-10AM]), from medical chart audit, and from data self-reported by men with prostate cancer who had undergone a radical prostatectomy.

Methods We included six hospitals (5 public and 1 private) contributing to the Prostate Cancer Outcomes Registry-Victoria (PCOR-Vic) in the study. We listed eligible patients from PCOR-Vic who underwent a radical prostatectomy between January 2017 and April 2018 for the Health Information Manager in each hospital, who provided each patient's associated ICD-10AM comorbidity codes. Medical charts were reviewed to extract the comorbidities used to generate the Charlson Comorbidity Index. The self-reported comorbidity questionnaire (SCQ) was distributed through PCOR-Vic to eligible men.

Results The percentage agreement between the administrative data, medical charts and self-reports ranged from 92% to 99% in the 122 patients (of 217 eligible participants, 56%) who responded to the questionnaire. The prevalence-adjusted bias-adjusted kappa (PABAK) coefficient ranged from 0.83 to 0.98 for all conditions aside from cancer, reflecting a strong level of agreement on the absence of comorbidities. Conversely, the presence of comorbidities showed a poor level of agreement between data sources.
There was concordance on 213/277 (77%) comorbidities when comparing medical charts and administrative data; 102/338 (30%) comorbidities when comparing medical charts and self-reports; and 34/150 (23%) comorbidities when comparing administrative data and self-reports.


Background
Prostate cancer (CaP) is the most common non-skin cancer in men worldwide.(1) In Australia it represents the second leading cause of cancer-related mortality in males.(2) Optimal disease management prevents progression of the cancer and preserves quality of life through avoidance of unnecessary treatment. The need to monitor the health and wellbeing of men with CaP has been recognised by physicians and healthcare workers, leading to the establishment of clinical quality registries. A clinical registry "collect(s) uniform data to evaluate specific outcomes for a population defined by a specific disease, condition or exposure that serves one or more predefined scientific, clinical or policy purpose".(3) The Prostate Cancer Outcomes Registry-Victoria (PCOR-Vic) was developed in 2009 as a clinical quality registry to measure and report on quality of care, using benchmarking of performance at a clinician and hospital level. Benchmarking is one of the most effective strategies for quality improvement, as it provides useful information to medical professionals on where to improve clinical practice.(4) PCOR-Vic collects data on aspects relating to the diagnosis, treatment, quality of life and outcomes of men diagnosed with CaP.(5) Quality of care is examined using specific quality-of-care indicators (see supplementary Table 1), selected after a literature review and a prioritisation process (modified Delphi process (6)) which considered the importance and feasibility of each proposed quality indicator.
Indicators were categorised according to whether they assessed structures, processes or outcomes of care.(7) Both clinical outcomes (e.g. positive surgical margins and 90-day prostate cancer-specific mortality (PCSM) following treatment) and patient-reported outcomes (urinary, bowel and sexual quality of life) are reported.
Positive surgical margins and 90-day mortality are currently risk-adjusted using the National Comprehensive Cancer Network (NCCN) risk model, which considers the clinical stage prior to treatment, the prostate-specific antigen (PSA) level and the Gleason score at biopsy, but not patient comorbidities.(8)
Patient-reported outcomes are stratified according to treatment modality (surgery, radiotherapy, androgen deprivation therapy (ADT), active surveillance/watchful waiting). If comorbidities are found to be independently associated with outcomes such as positive surgical margins (PSM) and PCSM, it may be appropriate to include them in the risk models used to generate benchmark reports for hospitals and clinicians. Additionally, the inclusion of comorbidities in modelling may be used for pre-operative risk stratification, for the provision of information during the informed consent process with patients and their families, and in decision-making regarding the suitability of patients to undergo surgery.
However, prior to adjusting for comorbidities in the risk models, it is essential to understand the extent to which they are accurately captured in the data source from which they will be extracted. Comorbidity data may be derived from International Classification of Diseases (ICD) codes, manually abstracted from medical charts, or collected directly from patients. It is unclear whether there would be concordance between each of these data sources, or which should be considered the 'gold standard'. Clinical notes might be incomplete, the patient may not be informed of all comorbidities, and the coding system may not capture all comorbidities.
The aim of this project was to examine the completeness and agreement of comorbidity data obtained from three sources — ICD-coded administrative data, medical charts and self-reports — in men contributing to PCOR-Vic who had undergone a radical prostatectomy following a diagnosis of prostate cancer.

Study design
A retrospective cohort study design was employed for this project. Men contributing to PCOR-Vic who had undergone a radical prostatectomy between January 2017 and April 2018 at one of six conveniently sampled hospitals were eligible to participate in this study. The five public hospitals included four metropolitan and one regional hospital, while the private hospital was located in metropolitan Melbourne. These six major health care facilities were chosen to maximise the sample size at each site.
A patient information and consent form, along with the questionnaire, were sent to eligible men.

Comorbidity data sources
Three data sources were used to compare comorbidities. The Charlson Comorbidity Index (CCI) was calculated from the ICD-10AM codes, using the mapping developed by Sundararajan et al.(9,10) The CCI is a widely employed index used to quantify comorbidity in ill patients, with weightings assigned to each comorbidity depending on severity (see supplementary Table 2).(11) It captures 19 comorbidities. For the purposes of inter-data comparison, a modified weighted CCI (mCCI) was developed which contained only the eight variables present in both the CCI and the self-reported comorbidity questionnaire (SCQ). As the CCI weights three comorbidities (liver disease, diabetes, cancer) according to their severity (moderate or severe), and the severity of these three conditions was not captured in the self-reported comorbidity survey, two mCCI scores were computed for each patient to account for the possibility that a patient had a milder or a more severe form of the particular comorbidity.
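The dual-scoring approach described above can be sketched as follows. This is an illustrative sketch only, using the standard Charlson weights for the three severity-graded conditions (diabetes 1/2, liver disease 1/3, cancer 2/6); the exact set of eight mCCI variables and their weights is given in supplementary Table 2, and the condition names used here are hypothetical labels, not the questionnaire's wording.

```python
# Severity-graded Charlson weights (standard Charlson values, assumed here
# for illustration; see supplementary Table 2 for the study's actual weights).
MILD_WEIGHTS = {"diabetes": 1, "liver_disease": 1, "cancer": 2}
SEVERE_WEIGHTS = {"diabetes": 2, "liver_disease": 3, "cancer": 6}
FIXED_WEIGHT = 1  # placeholder weight for conditions not graded by severity


def mcci_scores(comorbidities):
    """Return (mild, severe) mCCI scores for a set of self-reported conditions.

    Because the self-report questionnaire does not capture severity, each
    severity-graded condition is scored twice: once under the mildest
    assumption and once under the most severe assumption.
    """
    mild = severe = 0
    for condition in comorbidities:
        mild += MILD_WEIGHTS.get(condition, FIXED_WEIGHT)
        severe += SEVERE_WEIGHTS.get(condition, FIXED_WEIGHT)
    return mild, severe


# A patient self-reporting cancer and diabetes scores 3 under the mild
# assumption (2 + 1) and 8 under the severe assumption (6 + 2).
mild, severe = mcci_scores({"cancer", "diabetes"})
```

This also shows why assuming all respondents had cancer shifts the self-report medians to 2 (mild) and 6 (severe): cancer alone contributes exactly those weights.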
The SCQ made available to men could be completed on paper, online or via telephone (see supplementary Table 3).(12) The questionnaire included eight comorbidities captured by the CCI and five additional comorbidities: hypertension, depression, osteoarthritis, back pain and dementia.
A medical chart audit was undertaken to capture the comorbidities reported in both the CCI and the SCQ, identified during the hospital admission related to the radical prostatectomy. For patients treated in the private hospital, the audit was undertaken using medical charts held in consulting rooms.

Statistical analysis
Characteristics of the respondents are reported as medians and interquartile ranges due to the skewed distribution or ordinal nature of variables such as age, PSA level and Gleason score. The Index of Relative Socio-economic Advantage and Disadvantage (IRSAD) was used to assess socio-economic status; an IRSAD score of 0 represents the greatest societal disadvantage and a score of 10 the greatest societal advantage.(13) Statistical significance was defined as a p-value < 0.05.
Concordance was calculated using (1) the kappa statistic; (2) the PABAK statistic (kappa adjusted for prevalence bias); and (3) Gwet's AC1 statistic, which calculates the "conditional probability that two randomly selected raters will agree, given that no agreement will occur by chance".(14) Gwet's AC1 statistic is founded on the assumption that chance agreement should not exceed 0.5.(18) Previous studies have highlighted that this statistic is resistant to changes in prevalence and remains close to the percentage agreement.(15) The level of agreement for the three inter-rater statistics was defined according to supplementary Table 4:(16) "poor" agreement was defined as a score less than zero, "slight" agreement as 0 to 0.20, "fair" agreement as 0.21 to 0.40, "moderate" agreement as 0.41 to 0.60, "substantial" agreement as 0.61 to 0.80, and "almost perfect" agreement as 0.81 to 1.00.
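For a binary (present/absent) comparison between two data sources, the three statistics can be sketched from the 2×2 agreement table as below. This is an illustrative sketch only (the study's analysis was performed in Stata); the cell counts in the example are hypothetical, chosen to show how a rare comorbidity yields a low kappa but a high PABAK and AC1.

```python
def agreement_stats(a, b, c, d):
    """Agreement statistics for a 2x2 table comparing two binary raters.

    a = both sources record the comorbidity (yes/yes)
    b = source 1 yes, source 2 no
    c = source 1 no, source 2 yes
    d = both sources record absence (no/no)
    """
    n = a + b + c + d
    po = (a + d) / n  # observed (percentage) agreement

    # Cohen's kappa: chance agreement from the product of the marginals
    p1, p2 = (a + b) / n, (a + c) / n
    pe_kappa = p1 * p2 + (1 - p1) * (1 - p2)
    kappa = (po - pe_kappa) / (1 - pe_kappa)

    # PABAK: fixes chance agreement at 0.5 for two categories
    pabak = 2 * po - 1

    # Gwet's AC1: chance agreement from the mean "yes" prevalence,
    # bounded above by 0.5
    pi = (p1 + p2) / 2
    pe_ac1 = 2 * pi * (1 - pi)
    ac1 = (po - pe_ac1) / (1 - pe_ac1)

    return kappa, pabak, ac1


# Hypothetical rare comorbidity: both sources agree it is absent in most
# patients. Kappa lands in the "fair" band while PABAK and AC1 are
# "almost perfect" -- the divergence discussed in the Results.
kappa, pabak, ac1 = agreement_stats(a=3, b=5, c=4, d=188)
```

Because almost all agreement here comes from joint absences, kappa's marginal-based chance correction penalises the score heavily, whereas PABAK and AC1 stay near the raw percentage agreement — the pattern observed across the low-prevalence comorbidities in this study.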
Stata 15 (StataCorp, College Station, Texas) was used for data analysis.
The ethics application for this study was approved by the Monash University Human Research Ethics Committee (HREC/18/MonH/62). Governance approvals were obtained from each hospital.

Results
The recruitment frame is described in Figure 1. Table 1 describes the characteristics of the cohort contributing to this analysis. The median age of the 217 male participants was 66 years (IQR, 9 years). The median PSA level and Gleason score at diagnosis were 6.91 ng/mL (IQR, 4.7 ng/mL) and 7 (IQR, 1), respectively. The majority of patients had an IRSAD score ≥8, indicating that they resided in a more advantaged socio-economic area. Most patients in this study were classified in the NCCN intermediate-risk group.
The median CCI and mCCI scores based on the administrative data and the medical chart audit were each 2 (IQR, 0). The median mCCI scores based on the patient self-reports under both the mild and the severe assumption were 0 (IQR, 0), owing to the large proportion of patients not self-reporting any comorbidity. Under the assumption that all self-reporting patients had cancer, the median mCCI scores under the mild and severe assumptions were 2 (IQR, 0) and 6 (IQR, 0), respectively. All further analyses report only on comparisons across the three data sources using the mCCI.
The median comorbidity counts for the administrative data, self-reports and medical chart review were 1 (IQR, 0), 1 (IQR, 1) and 2 (IQR, 2), respectively (Table 1). Figure 2A-D depicts the distribution of comorbidities for the 112 patients for whom data were available in all three sources, while Figure 2E and Figure 2F demonstrate the distribution of comorbidities for the 201 patients who had data recorded in both the administrative datasets and the medical charts. All distributions follow a discrete number scale because the CCI is a non-continuous comorbidity index.
Based on the administrative and medical chart distributions (Figure 2A-B), no patients had a CCI score of 0 or 1, as cancer was recorded for each patient, and this carries a weighting of 2 in the CCI. In 22.3% of patients, the administrative dataset yielded an mCCI score ≥3, in contrast to only 5.7% of patients in the medical charts. The patient self-report CCI distributions indicate that 56.3% of patients had a CCI of zero (no self-reported comorbidities) (Figure 2C-D). In 0.9% of patients, the CCI score was ≥6 under the mild comorbidity assumption (Figure 2C), with this figure increasing to 25% under the severe comorbidity assumption (Figure 2D).
The patient self-report distributions were notably more spread compared to the administrative dataset and medical chart-based distributions.
Medical charts vs administrative datasets

Table 2 reports on the level of statistical concordance between the medical chart and administrative datasets. In total, 201 cases were compared; 272 comorbidities were identified through audit of the medical charts and 218 through the administrative data. There was a match on 213/277 (77%) unique comorbidities reported across both datasets. Four of the thirteen analysed conditions could not have a reporting statistic computed because the prevalence of the condition was either 0% or 100% in both of the compared datasets. Of the remaining comorbidities, the kappa statistic reported "poor" concordance between the medical chart and administrative data other than for diabetes, which had "substantial" agreement. According to the PABAK and Gwet's AC1 statistics, nine of the thirteen conditions had "almost perfect" agreement between the two data sources, reflecting the very low prevalence of these comorbidities in the cohort.
Medical charts vs self-reports

Table 3 compares comorbidities captured by the medical chart and SCQ data. Of the 121 cases present in both cohorts, medical charts detected 237 comorbidities, while patients reported 203. However, only 102 of the 338 unique comorbidities (30%) were matched between the two datasets. Based on the kappa score, concordance ranged from "poor" to "substantial", with the best agreement attributed to diabetes (kappa of 0.71). Agreement was considered "substantial" for eight of thirteen conditions based on both the PABAK and Gwet's AC1 statistics. Cancer was classified as having extremely "poor" concordance, with negative concordance scores recorded. Back pain and osteoarthritis had "moderate" agreement based on the PABAK score, but "substantial" agreement based on Gwet's AC1 statistic.
Administrative data vs self-reports

Table 4 summarises the level of concordance between administrative data and patient self-report data using the SCQ. Only eight conditions were comparable. Of the 121 charts compared between these two datasets, administrative data identified 120 comorbidities while 64 were self-reported by patients. There was a match on 34 of the 150 comorbidities (23%) across the two groups. Owing to the relatively small number of cases reported for each comorbidity in both datasets, the kappa score reported "poor" agreement between the administrative data and self-reports, with the exception of diabetes, which had "almost perfect" agreement. With the exception of cancer, the PABAK and Gwet's AC1 statistics reported "almost perfect" agreement between the two datasets for the remaining conditions.
Comparison across the three data sources

Table 5 provides a summary of the concordance of comorbidity data across the three data sources. Apart from diabetes, the kappa statistic suggested poor to slight agreement for seven of the eight comorbidities. In contrast, the PABAK and Gwet's AC1 statistics suggested almost perfect agreement across the datasets, aside from cancer. Based on all three concordance statistics, cancer had the lowest level of agreement across the three datasets.

Discussion
This study examined the concordance of comorbidity data extracted from administrative data, medical charts and self-reports. Our findings showed that agreement between comorbidity data collected by the Victorian Admitted Episodes Dataset, medical charts and self-reports by men with CaP who have undergone a radical prostatectomy varied across the analysed comorbidities.
In the comparison of all three data sources, "slight" to less than "moderate" agreement was observed for the majority of comorbidities when calculated using the kappa statistic. However, the PABAK and Gwet's AC1 statistics suggested "almost perfect" agreement for all conditions aside from cancer. Interpretation of these findings is difficult when contradictory results are presented depending on the statistical test used. The higher agreement for the latter two statistics was due to a substantial number of patients not having certain diseases in one or more data sources.
In light of this, the raw data for this study were also examined when investigating inter-rater concordance. No studies have previously compared these three data sources, so it is not possible to discuss our findings in context with others. However, studies have assessed concordance between two of these three sources, and these are discussed below.

Concordance between self-reports and other data sources
Cancer was one of the most interesting conditions, as it was not self-reported by the majority of patients: only 26% of respondents reported the condition. Negative kappa, PABAK and Gwet's AC1 statistics were recorded when this condition was compared between the self-reports and the other datasets, highlighting "poor" agreement. Additionally, when comparing medical charts and self-reports, 74% of the 121 compared patients had cancer recorded in the medical charts even though they did not self-report the condition. A similar observation was noted with the administrative datasets, with 76% of patients failing to self-report that they had cancer even though this was recorded in the administrative dataset.
The finding that men did not report having cancer may be due to the framing of the SCQ question, which asked: "Do you have cancer?". Given that the selected cohort of patients had undergone surgery, they may have believed that having the prostate gland removed meant they no longer had cancer. While for many men this is the case, 27-53% of men will develop prostate-specific antigen (PSA) recurrence up to 10 years after radical prostatectomy, and 16-35% of patients receive second-line treatment within five years of surgery.(17) Other studies have noted that the framing of a question relating to conditions such as cancer can affect reporting. Katz et al analysed the concordance of Charlson comorbidities between medical charts and self-reports and found only "moderate" concordance (kappa of 0.45 (95% CI: 0.28 to 0.62)) regarding the presence of cancer.(18) The researchers had intentionally omitted asking patients whether their tumour (if present) had been treated within the last 5 years. Notably, the Spearman correlation (a statistical measure used to quantify the dependence between the rankings of two variables) between the comorbidity questionnaire and the medical chart-based Charlson index increased from 0.63 (P = 0.0001) to 0.70 (P = 0.0001) after the tumour condition was excluded. This highlights how the explicit wording of questions relating to cancer can result in varying levels of recall bias.
There was a three-fold difference in the reporting of depression (6 vs 17 reports), a nine-fold difference in the reporting of back pain (4 vs 38 reports) and a nearly two-fold difference in the reporting of osteoarthritis between medical charts and self-reports. Studies have highlighted that these conditions, in particular depression, are under-diagnosed in hospitals, with physicians not actively investigating whether patients have the condition due to its non-acute nature.(19) This has been especially noted with back pain, a chronic but non-life-threatening comorbidity.(20) Patients with osteoarthritis have been noted to view their condition as part of the ageing process,(21) with 50% of patients with severe knee pain not reporting it to their physician,(22) reducing their inclination to actively seek medical assistance for the condition. In a study of 2380 community-dwelling patients aged 55-85 years, a comparison of medical charts and self-report data showed a kappa of 0.31 (95% CI: 0.27-0.35) for osteoarthritis, with 21.8% of patients stating that they were affected by this condition even though their medical chart did not support the claim.(23) Those with a mobility limitation were more likely to self-report the condition than those without (OR: 2.68, 95% CI: 2.10-3.44).(23) Ultimately, this highlights that patient- and physician-specific views towards certain comorbidities can influence the likelihood of their being recorded in medical charts or self-reports.

Concordance between administrative data versus other datasets
In this study, we found strong concordance between the administrative comorbidity data and the other studied data sources based on the reporting statistics used. However, on closer investigation, agreement with regard to the reporting of comorbidities was modest at best.
For conditions such as chronic pulmonary disease and rheumatic disease, cases were often recorded in the medical chart but not in the administrative datasets.
Although the PABAK and Gwet's AC1 statistics suggested "almost perfect" agreement for the majority of conditions between the administrative datasets and the other data sources, seven of the thirteen comorbidities compared between the medical charts and administrative datasets had cases recorded in the former but not in the latter. This was visible in the CCI distributions of both datasets, with 26% of patients having a CCI ≥3 in the medical charts in contrast to only 7% of patients in the administrative datasets. Of these seven conditions, chronic pulmonary disease was recorded 25 times in the medical charts but was not coded in the administrative dataset. A similar observation was seen with rheumatic disease and cerebrovascular disease, recorded in the medical charts of 8 patients but not coded in the administrative dataset. There may be several reasons for this discrepancy. Several studies have identified that hospital coders prioritise the coding of symptomatic comorbidities over asymptomatic ones, due to the higher level of hospital funding associated with the former (24) or due to Australian guidelines dictating that "additional diagnoses can only be assigned if they affect patient care during admission".(25,26) The first point was somewhat borne out in the case of chronic pulmonary disease in this study, as it is a condition that may only manifest under certain environmental and physiological stimuli.(27) The Australian guidelines surrounding the coding of comorbidities are interesting, given that ICD-10AM provides coders with ample space (fifty slots) to record any secondary diagnoses for a particular patient.(26) Other studies have highlighted how the level of experience of the hospital coder can affect the accuracy of ICD-10AM codes,(28) and differences in coding practice between inexperienced and experienced coders have been shown to exist.(24,29,30,31)

The "substantial" to "almost perfect" agreement reported by the kappa, PABAK and Gwet's AC1 statistics for diabetes may be attributed to the highly scrutinised nature of this condition in clinical settings.(32,33) Clinical coders are required to document ICD codes for conditions which consume health service resources.(24) The codes are used to assign a Diagnosis Related Group (DRG), which in turn translates into funding for the health service. For patients with type 1 diabetes, blood sugar levels are usually measured at least twice daily, and insulin must be administered, usually by nursing staff. Type 2 diabetes is likewise monitored strictly within hospitals.(34)

Strengths and limitations
This study has several strengths. It is the first time that comorbidities have been compared across medical charts, administrative data and patient self-reports. Given the increasing focus on the use of patient-reported data, this study enhances our knowledge of the reliability and accuracy of such data. This is particularly important as self-report is a cost-effective way of collecting comorbidity data compared with extraction from medical charts or administrative datasets.(35) While the SCQ has been validated, with good test-retest reliability reported,(36,37) this is the first time that it has been examined in a prostate cancer population, despite it being recommended by ICHOM as the preferred method for collecting comorbidities in men with localised prostate cancer.(38) Our finding that less than 30% of self-reported comorbidities appeared in the medical charts or coded administrative data suggests that more research is required before the SCQ can be used to risk-adjust health outcomes. While pre-operative administration of the SCQ would likely improve the likelihood that patients self-report cancer, there remain other comorbidities, such as heart disease, that were reported by patients but not documented in the other data sources.
However, this study has a number of limitations that affect the interpretation of the findings. One relates to potential discrepancies in the interpretation of the definitions of each comorbidity. A list of ICD-10AM codes pertaining to certain conditions, such as heart disease, lung disease and kidney disease, was used to identify whether a patient was regarded as having the condition. These same criteria were not applied uniformly across the other data sources. Indeed, the SCQ purposely simplified the comorbidity labels to allow the conditions to be understood by patients "without any prior medical knowledge".(12) This likely affected concordance. For example, in this analysis, ischaemic heart disease (IHD) was not classified as "heart disease", as the ICD-10AM codes pertaining to the two Charlson heart-disease comorbidities (myocardial infarction and congestive heart failure) did not include IHD; however, patients with the condition may have self-reported it as heart disease.
Non-response bias may have influenced our results, given the response rate of 55.3%. It is not possible to know whether the non-responders differed systematically from responders in terms of their comorbidities, or whether their comorbidities were recorded in the medical charts and administrative data. Recall bias was also likely introduced, as men may have had difficulty recalling comorbidities.
The sample size for this study was relatively small (N = 217), preventing sub-group analyses, such as whether there were differences by type of hospital (public/private), where documentation practices may differ. More nuanced findings also could not be revealed, as shown by the difference in CCI distributions for the administrative datasets between n = 112 (patients with data across all three sources) and n = 201 (patients with data in both the administrative dataset and the medical charts).

Conclusions
This study has shown that there are limitations in the recording of comorbidities in each dataset, which affect the ability to generate comorbidity indices. Using the SCQ as a risk-adjustment tool will likely over-represent comorbidities compared with the use of medical charts or administrative data. The Charlson score, initially developed to be calculated from data extracted from medical charts, is likely to under-represent the presence of comorbidities when derived from administrative codes: in terms of raw counts, there were 272 comorbidities in the medical charts and 218 in the administrative codes.
Given the relative discordance of comorbidity data, computing comorbidity indices from a single chosen data source may not accurately capture the true health profile of the patient. This is a significant barrier that requires further investigation to determine the prognostic capacity of comorbidities for PSM and PCSM, outcomes that are currently reported by PCOR-Vic.
Further work is required to ensure that comorbidity-specific information is accurately collected for eventual risk-modelling purposes in hospital and clinician benchmarking reports. The SCQ could be tailored to replace medical jargon with terms familiar to patients, and possibly to include more contemporary questions. Future studies should also ensure that robust statistical measures are employed when reporting on the level of agreement.