Agreement between referral information and discharge diagnoses according to Norwegian elective treatment guidelines – a cross-sectional study

Background Norway introduced 32 priority guidelines for elective treatment in the specialist health service in 2008–9. The guidelines were intended to reduce large differences in waiting times among hospitals, streamline referrals and ensure that patients received the necessary healthcare to which they were entitled for certain conditions. Priorities were set on the basis of the referral information. However, the Norwegian Patient Register (NPR), the database that will be used in the main evaluation of the guidelines, records only the discharge diagnosis and not the referral information. This study therefore validates the referral information in hospital patient records against the discharge diagnoses.
Methods From the conditions specified in 10 priority guidelines, 20 were selected for review for the period 2008–9 at 4 hospitals in Norway. The ICD-10 diagnoses for each disease or condition were assigned retrospectively by clinicians who had participated in the expert groups that developed the priority guidelines. Deviations between referral information and discharge diagnoses were coded into four categories according to how precise the referral was compared with the discharge diagnosis.
Results In all, 1854 medical records were available for review. The diagnostic precision of the referrals differed significantly between hospitals in 2008, in 2009 and in both years combined. The overall sensitivity was 0.93 (95% confidence interval 0.92–0.94). For the individual conditions, sensitivity ranged from 0.60 to 1.00. The ICD-10 diagnoses used to identify patients had to be selected with great care. The medical records of psychiatry patients were unavailable in some cases. For certain conditions, applying our record extraction algorithm yielded fewer records than intended.
Conclusion The sensitivity of the referral information on diagnosis or condition was high compared with the discharge diagnosis for the 20 selected conditions from the 10 priority guidelines. The review covered only a limited number of the conditions. Even so, we consider the results sufficiently representative to allow NPR data to be used for analyses of the introduction and follow-up of the 32 priority guidelines.


Background
Hospital and administrative databases in the health service are a valuable source of information for both health service quality assessment and research. Before these data are used, it may be necessary to analyse their validity to assess the variability and degree of error in the data [1][2][3][4][5][6][7][8][9][10][11][12][13].
The Norwegian Directorate for Health directed the development of the priority guidelines for elective treatment in 32 distinct medical fields within the specialist health service. The aims of the guidelines were to reduce the large differences in waiting times among Norwegian hospitals, to streamline referrals and to ensure patients' legal rights to treatment.
Separate expert groups for each medical field developed the guidelines, which specified the treatment priorities. The groups included clinicians, specialized practitioners, general practitioners (GPs) and representatives of relevant patient organizations [14]. The introduction of the 32 priority guidelines will be evaluated using administrative data from the NPR. The NPR receives discharge diagnoses (ICD-10 codes) and procedures for all admissions to public hospitals, but not referral information [15,16]. However, priority for elective treatment was assigned on the basis of the conditions or diseases specified in the referral from the GP or medical specialist, either as a diagnostic code or as a description of symptoms. Referrals are made at the discretion of the referring physician or specialist, or internally within the hospital, by letter or by internal note or electronic message. The hospital must respond within 2 weeks. If the referral concerns a condition that should receive priority treatment, the patient should receive a fixed date for treatment or examination within the timeframe given in the relevant priority guideline.
Hospital and administrative databases are attractive sources for epidemiological research and quality projects in the public health service. Several studies, a few of which are referred to here, have compared administrative databases with clinical records [1][2][3][4][5][6][7][8][9][10][11][12][13]. Among Norwegian studies, one validated a Norwegian stroke register, covering both hospitalized and non-hospitalized cases, against discharge diagnoses. It found a sensitivity of 0.86 for ICD-9 codes 430-438 (cerebrovascular disease) [4]. Only 4.6% of the discharge diagnoses were classified as non-stroke. The authors also concluded that stroke subtypes should not be distinguished unless coding practices improved. Thomsen et al. found the validity of the diagnosis of pre-eclampsia in the Medical Birth Registry of Norway over the period 1967-2005 satisfactory [12]. A single-hospital evaluation of hip replacements covered 1999-2002 for the NPR and 1987-2003 for the Norwegian Arthroplasty Register (NAR). It found that the NPR missed 3.4% and the NAR 0.4% of the hip replacements, confirming the latter database to be valid and reliable [13]. This contrasts with the 2005 report by Lofthus et al., which questioned the validity of electronic databases for the registration of hip fractures [11]. The total number of hip fractures was confirmed by review of medical records and operating theatre logbooks. Against this total, the NPR had over-estimated the number of fractures by 19%, whereas local electronic databases had both over- and under-estimated it [11]. After these studies, in 2007, the NPR established a database of person-identified records. For fatal events, post-mortem examination provides the ultimate reference standard. Gulsvik et al. used it to assess the validity of the mortality statistics for fatal cerebral stroke and coronary deaths in Bergen, Norway [3]. For fatal stroke they found a sensitivity of 0.75 (95% confidence interval [CI] 0.66-0.83) and a positive predictive value of 0.86 (CI 0.77-0.92). For coronary deaths, the values were 0.87 (CI 0.84-0.91) and 0.85 (CI 0.81-0.89), respectively.
Sørensen et al. have developed a framework for evaluating secondary data sources for epidemiological research [17]. They have outlined seven criteria to be fulfilled before administrative data can be used in research:
1. Completeness of registration of individuals
2. Accuracy and degree of completeness of the registered data
3. Size of the data source
4. Registration period
5. Data accessibility, availability and cost
6. Data format
7. Possibilities of linkage with other data sources (record linkage).
The current study validates the data that will be drawn from the NPR database. In doing so, it addresses the second criterion, the accuracy and degree of completeness of the registered data. Here, sensitivity serves as an estimate of the agreement between the referral information and the hospital data [18].
The aim of the current study was to compare the accuracy and degree of completeness of the referral information with the registered discharge diagnoses by reviewing patient records of 20 selected diagnoses and conditions from 10 priority guidelines in the period 2008-9, when the priority guidelines were being implemented.

Methods
Although 32 priority guidelines had been implemented, only 10 were found to be feasible for a study estimating the validity of discharge diagnoses recorded in the NPR (Table 1). The choice reflected the selection criteria, the sample size calculation and the resources available for this study. The NPR was established in 1997 and is a register of all people awaiting or receiving treatment in the Norwegian specialist health service. The purpose of the register is fourfold:
1. Provide data for administration, policy and quality of the specialist health service, including activity-based financing.
2. Contribute to medical research, including research on the health services, effect of treatment, diagnoses and disease causes, prevalence, progress and preventive measures.
3. Be the foundation for the establishment and quality assessment of hospital and quality registers.
4. Contribute to knowledge about safety and accident prevention.
Since 2008, the registry has provided patient-identifiable data.
The Norwegian Knowledge Centre for the Health Service (Knowledge Centre) used a two-stage process to select conditions in the relevant medical fields. In the first stage, a team of three researchers independently suggested five guidelines, chosen to give a broad representation of the medical fields covered. This was challenging because the 32 guidelines contained a total of 399 conditions. The suggestions were reduced to 11 guidelines. The criteria for selection were medical field, age (adults versus children), gender-specific disorders, volume of hospital discharges per relevant diagnosis and disease severity. In the second stage, 2 conditions or diseases were chosen from each of the 11 guidelines according to the following criteria: the grade of priority for elective treatment, urgency of treatment, chronic disease versus non-elective treatment and volume/prevalence (Table 1).
The required sample size was estimated at 92 records per selected condition, giving 95% confidence intervals with an expected half-width of 10 percentage points. This allowed 22 conditions in 11 guidelines to be reviewed within a frame of 2000 records. A pilot study reduced the number of guidelines to 10 because adult psychiatry data proved difficult to access. Therefore, 100 records were sought for each selected prioritized condition in the 10 guidelines. The Knowledge Centre was given permission by the Norwegian Data Directorate and the Regional Committee for Ethics in Research to examine medical records at the hospitals.
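The sample-size target can be illustrated with the usual normal-approximation formula for estimating a proportion to a given confidence-interval half-width, n = z²p(1 − p)/d². The text does not state the expected sensitivity that was assumed. The Python sketch below therefore shows the conservative choice p = 0.5, which gives 97 records for d = 0.10, so the reported figure of 92 implies somewhat different, unstated assumptions. The helper name `records_needed` is hypothetical.

```python
import math
from statistics import NormalDist


def records_needed(expected_p: float, half_width: float, confidence: float = 0.95) -> int:
    """Hypothetical helper: records needed so that a Wald confidence interval
    for a proportion has (at most) the requested half-width.

    Uses n = z^2 * p * (1 - p) / d^2, rounded up to a whole record.
    """
    if not 0.0 < expected_p < 1.0:
        raise ValueError("expected_p must lie strictly between 0 and 1")
    if half_width <= 0:
        raise ValueError("half_width must be positive")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    return math.ceil(z ** 2 * expected_p * (1 - expected_p) / half_width ** 2)


# Worst case (p = 0.5) for a 95% CI with a half-width of 10 percentage points.
print(records_needed(0.5, 0.10))  # 97
# A higher expected sensitivity needs fewer records for the same precision.
print(records_needed(0.9, 0.10))  # 35
```

The worst-case choice p = 0.5 maximizes p(1 − p), so it guarantees the target precision whatever the true sensitivity turns out to be.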
The expert groups for the different guidelines selected and defined the conditions according to clinical signs and symptoms of disease, without assigning ICD-10 or procedure codes. To ensure the quality of the ICD-10 and procedure codes used, we contacted all the expert groups for help in compiling this information. The expert groups provided it in all but two cases, in which physicians at the Knowledge Centre assisted. Despite this effort, not all the 399 conditions could be identified by these codes.
Hospital records provided both referral information and discharge diagnoses, avoiding the need to search across multiple primary care systems. We could not select records on the basis of the referral information in the hospitals' data systems because this information was too incomplete for our purposes.
The following hospitals were visited: Haukeland University Hospital in Bergen (Haukeland), University Hospital of North Norway in Tromsø (UNN), St. Olav Hospital, Trondheim University Hospital in Trondheim (St Olav) and Akershus University Hospital in Lørenskog (AHUS). These hospitals were considered to have enough patients to represent each of the four health regions of Norway. They were chosen to give us the greatest chance of identifying sufficient relevant cases at each hospital. A visit to all Norwegian hospitals would have been costly, with little benefit to the project. All hospital data are stored in electronic systems. The letters of referral were scanned and added to the relevant medical record, and receipt of the referral was noted in the system. The medical record system in use was DOCULIVE at St Olav and DIPS at Haukeland, UNN and AHUS. An initial feasibility analysis was performed at UNN in November 2010. Two reviewers assessed the concordance of their interpretations of the extracted record information and found it satisfactory. The record review was subsequently delayed for administrative reasons, restarted in May 2012 and completed in June 2013. The hospitals were asked to extract records according to our specifications:
1. Select patients with the listed main discharge diagnoses at the first consultation or treatment session after the referral had been received.
2. In the period 2008-9, choose one patient each month who has been referred and treated, and the twenty-fifth patient from any month. The referral date should preferably not be before 1 January 2007.
3. If there is no relevant patient in a particular month, select two the next month.
4. Select only patients referred for elective treatment, not emergency treatment.
The list must contain different individuals.
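The monthly sampling rule in these specifications can be read as a small algorithm. The Python sketch below is a simplified, hypothetical rendering, not the hospitals' actual extraction procedure. It assumes each record carries a patient identifier, a discharge date, the main ICD-10 discharge code and an elective flag. It picks the earliest eligible patient in each month and carries any shortfall forward, which generalizes "select two the next month" to consecutive empty months. The rule about the twenty-fifth patient, the first-contact requirement and the referral-date preference are not modelled. The names `Episode` and `sample_episodes` are ours.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Episode:
    # Hypothetical record layout; real DIPS/DOCULIVE extracts will differ.
    patient_id: str
    discharge_date: date
    main_diagnosis: str  # ICD-10 code of the main discharge diagnosis
    elective: bool


def sample_episodes(episodes, target_codes, years=(2008, 2009)):
    """Pick one eligible episode per month, carrying any shortfall forward.

    Assumption: a month with too few eligible patients raises the next
    month's quota by the number missing. Each patient is selected at most once.
    """
    chosen, seen_patients = [], set()
    carry = 0
    for year in years:
        for month in range(1, 13):
            quota = 1 + carry
            # Elective episodes with a target main diagnosis in this month, earliest first.
            candidates = sorted(
                (e for e in episodes
                 if e.discharge_date.year == year
                 and e.discharge_date.month == month
                 and e.elective
                 and e.main_diagnosis in target_codes),
                key=lambda e: e.discharge_date,
            )
            picked = 0
            for e in candidates:
                if picked == quota:
                    break
                if e.patient_id in seen_patients:
                    continue  # the list must contain different individuals
                chosen.append(e)
                seen_patients.add(e.patient_id)
                picked += 1
            carry = quota - picked
    return chosen
```

Carrying the shortfall forward keeps the total close to one record per month when some months have no relevant patient. That appears to be the intent of specification 3, but it is an interpretation.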
For certain diagnoses, fewer than 100 patient records were available, so our target of 2000 records was not reached. The data were recorded on a separate offline PC, and no data were transferred over the internet.
The statistical analyses involved estimating sensitivity overall, by hospital and by condition. In this study, true cases are those whose referral information matches the discharge diagnosis. The sensitivity is the proportion of cases with the given discharge diagnosis that are identified by the referral information. Because samples had to be selected on the basis of discharge diagnosis codes, the specificity of the referral information could not be estimated. Non-identification had different causes, such as a poorly formulated referral, faulty assessment or missing information. We also regarded a discharge ICD-10 code that did not match the written diagnosis in the record as non-identification. To assess how precisely the referral information identified true cases, agreement was further divided into three categories:
1. Clear agreement between referral and discharge
2. Poorly formulated referral, some modification by the hospital
3. Not sufficiently specific but adequate.
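The text does not say how the confidence intervals for sensitivity were computed. As a hedged illustration, the Python sketch below computes sensitivity as true cases divided by all reviewed records with two common intervals for a binomial proportion. One is the Wald (normal-approximation) interval. The other is the Wilson score interval, which is worth considering for conditions with sensitivity near 1.00, where the Wald interval collapses to zero width. The function name and the counts are hypothetical.

```python
import math
from statistics import NormalDist


def sensitivity_ci(true_cases: int, total: int, confidence: float = 0.95):
    """Sensitivity with Wald and Wilson score confidence intervals.

    true_cases: records whose referral information matched the discharge diagnosis
    total: all reviewed records sampled on that discharge diagnosis
    """
    if total <= 0 or not 0 <= true_cases <= total:
        raise ValueError("need total > 0 and 0 <= true_cases <= total")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p = true_cases / total

    # Wald interval: p +/- z * sqrt(p(1-p)/n), clipped to [0, 1].
    half = z * math.sqrt(p * (1 - p) / total)
    wald = (max(0.0, p - half), min(1.0, p + half))

    # Wilson score interval: remains informative when p is 0 or 1.
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    spread = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    wilson = (centre - spread, centre + spread)
    return p, wald, wilson


# Hypothetical counts for one condition: 90 of 100 referrals matched.
p, wald, wilson = sensitivity_ci(90, 100)
print(round(p, 2), [round(x, 3) for x in wald], [round(x, 3) for x in wilson])
```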
Precision was compared between hospitals overall with the χ² test, for 2008 and 2009 separately and combined. Age is presented as median and interquartile range. Analyses were performed using SPSS software, version 15.0.
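The analyses were run in SPSS. As a hedged sketch of what the χ² comparison computes, the Python code below builds the Pearson statistic for a hospitals × precision-categories table and an upper-tail p-value from the chi-square distribution. It uses only the standard library, through the series for the regularized incomplete gamma function. The table layout (rows for hospitals; columns for categories 1–3 plus non-identified) and all counts are hypothetical, and no continuity correction is applied. `scipy.stats.chi2_contingency` performs the same Pearson test and applies Yates' correction only when there is one degree of freedom.

```python
import math


def pearson_chi2(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table.

    Assumes every row and column total is positive (no empty categories).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


def chi2_sf(x, dof):
    """Upper-tail probability P(X > x) for a chi-square variable with dof degrees.

    Series for the regularized lower incomplete gamma function P(a, h),
    with a = dof / 2 and h = x / 2; adequate for moderate statistics.
    """
    if x <= 0:
        return 1.0
    a, h = dof / 2, x / 2
    term = total = 1.0 / a
    n = 1
    while term > total * 1e-15 and n < 10_000:
        term *= h / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-h + a * math.log(h) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


# Hypothetical counts: rows = hospitals, columns = categories 1, 2, 3, not identified.
counts = [
    [310, 40, 60, 50],
    [380, 30, 55, 20],
    [400, 25, 30, 25],
    [390, 20, 15, 40],
]
stat, dof = pearson_chi2(counts)
print(round(stat, 1), dof, chi2_sf(stat, dof))
```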

Results
In all, 1854 medical records concerning 20 conditions selected from 10 priority guidelines were reviewed, giving a broad insight into hospital referrals (Table 1). The sex distribution was fairly equal (53.9% women and 46.1% men), with two conditions that concerned only women (Table 2). The age ranged from <1 year to 108 years. The study also tested the codes from the priority guidelines. The codes identified the cases, with one exception. For haematuria at Haukeland University Hospital, the Z-codes Z03.1 and Z03.8 were found to be insufficiently specific for the condition. These Z-codes were therefore removed from the selection criteria in the subsequent reviews at the other three hospitals (Table 3). Medical records for the mental health of children and young people proved difficult to access. For some diagnoses, hospitals did not have the required number of relevant records for review.
The referral information was coded to show how well it matched the discharge diagnosis (Table 4). The frequencies of these categories varied between the hospitals in 2008, in 2009 and in both years combined (χ² test; p < 0.05). Sensitivity increased when less specific but sufficient referral information was counted together with definite referral diagnoses and information. For Haukeland it increased from 74.4% to 88.2% (including haematuria), for UNN from 83.9% to 96.3%, for St Olav from 89.5% to 95.4% and for AHUS from 88.5% to 91.7%. The reasons for incorrect referrals were poor formulation, faulty physician assessment or lack of information (n = 10, 0.6% of total). In addition, the record review identified, by chance, discharge diagnoses without a corresponding correct referral for the treatment subsequently given (n = 121, 6.5% of total).
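The rise from the first to the second sensitivity figure for each hospital comes from counting one more precision category as a true case. The Python sketch below assumes that the lower figure counts categories 1 and 2 and the higher one adds category 3. The text does not state this grouping explicitly. The category labels, the function name and the counts are hypothetical.

```python
from collections import Counter

# Hypothetical labels for the precision categories described in Methods:
# 1 = clear agreement, 2 = poorly formulated referral modified by the hospital,
# 3 = not sufficiently specific but adequate, 0 = not identified.


def sensitivity_by_inclusion(category_codes, counted_as_true):
    """Share of reviewed records whose category is counted as a true case."""
    counts = Counter(category_codes)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no records")
    hits = sum(counts[c] for c in counted_as_true)
    return hits / total


records = [1] * 70 + [2] * 10 + [3] * 12 + [0] * 8  # one hypothetical hospital
strict = sensitivity_by_inclusion(records, {1, 2})
lenient = sensitivity_by_inclusion(records, {1, 2, 3})
print(strict, lenient)  # 0.8 0.92
```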

Discussion
The overall agreement between the discharge diagnoses for elective treatment and the referrals was high. The sensitivity was 0.88 (0.87-0.90) at Haukeland, 0.96 (0.94-0.98) at UNN, 0.95 (0.91-0.99) at St Olav and 0.92 (0.90-0.94) at AHUS, and 0.93 (0.92-0.94) overall. The referral information was categorized according to its level of precision, and there was no consistent trend suggesting systematic bias. The level of precision differed significantly between the hospitals in 2008, in 2009 and in both years combined. Although the precision of the referrals differed, the high sensitivity indicated that the referrals, both external and internal, described the conditions and other relevant information accurately. Correct coding of the prioritized diseases or conditions was critical for identifying the patients, as the haematuria example showed. The main ICD-10 categories are more precise, and subsidiary diagnoses should be used only in combination with them. The age ranged from <1 year to 108 years, and both sexes were reasonably represented. This gives confidence that the samples are representative. They were taken from four major hospitals, one in each of the four health regions of Norway.
The Norwegian studies mentioned earlier [3,4,11-13] and international studies [1,2,5-10] confirm a degree of variability in sensitivity, specificity and positive predictive values. Lacasse et al. underlined the importance of routinely assessing the validity of diagnoses before making use of administrative databases in research [9]. Validity studies from other countries include a systematic overview of 32 studies that had validated the Finnish Hospital Discharge Register, many of them using medical records [1]. The authors assessed the completeness and accuracy of the register as ranging from satisfactory to very good. Two validity studies of the Danish National Registry of Patients (DNRP) were based on review of medical records [6,7]. One found a sensitivity of 0.91 for pulmonary empyema overall, although sensitivity decreased with patient age. The other, on atrial fibrillation and atrial flutter, also found satisfactory validity, but showed that precise ICD-10 coding was important for correctly identifying cases. Our results are in line with these studies. Herrett et al. compared different sources of data from primary care, hospital care, a disease registry and national mortality records in England [2]. Investigating the incidence of acute myocardial infarction, they found that using a single source rather than all sources combined under-estimated the incidence by 25-50% on average. In that study the disease registry was taken as the gold standard. The sensitivity of primary care data was 0.93 (95% CI 0.92-0.93) and for hospital admissions 0.925 (0.91-0.92).
The source of data matters. Lee et al. found that hospital discharge data were not valid enough to estimate the incidence of several cancer diagnoses [5]. In the UK, a validation study of cancer diagnoses in the General Practice Research Database (GPRD) and the Cancer Registry (CR) database was carried out [10]. In all, 91% of cancer events were recorded in both databases. The researchers attributed false-negative primary care records to a delay, averaging 11 days, between registration in the GPRD and registration in the CR database. When comparing data sources, attention must therefore be paid to disease codes and to the dates on which diagnoses are recorded. Examining the validity of heart failure diagnoses, Lee et al. compared ICD-9 codes with the Framingham criteria. They also studied the impact of including Charlson's index of co-morbidity [9]. The ICD-9 codes were highly predictive, and co-morbidity information could enhance future studies on heart failure mortality. This aspect is relevant if the objective is to gain further understanding of single diagnoses.
The current study has contributed to increased knowledge in Norway about the validity of referral information versus discharge diagnosis. Ideally, the sample size would have been larger and all the guidelines would have been examined in this way before our main study, but this was not feasible. Had this validation shown poor agreement between referral information and discharge diagnosis, the scope of the main study would have been more limited. The errors that occurred were wrongly coded discharge diagnoses. In one example, a neurological lesion in a foot was coded as Bell's palsy, a code drawn from the guideline on oral surgery and oral medicine. Real changes in diagnosis from referral to discharge were few. The differences between the two electronic record systems did not pose a major problem for extracting the data. However, information in the records was sometimes missing and, in some cases, inaccessible to the reviewer, and incomplete recording was common. This study could not address the specificity of the referral information. Another limitation is that a single observer (LLH) abstracted the records and judged agreement, although the pilot study did show agreement between two observers.
The current study does not validate NPR data as such against hospital discharge registers. During the study period, data transfer shifted from CDs to electronic transmission (about 50:50 in 2008; by 2013, only a few minor units still reported by sending CDs). We assumed that the electronic transfer of data from the hospital discharge registers to the NPR functions well. The codes are quality assured before being transmitted to the NPR. The NPR in turn performs quality controls and reports back to the hospitals, which return corrected data to the NPR. The study does, however, estimate the accuracy and degree of completeness of the registered data in the hospital records, and thus in the NPR. It therefore gives an overall estimate of the error in the NPR data when these are used to identify patients with the different conditions in the 32 guidelines.

Conclusion
In view of these results, we believe that the referral information agrees with the discharge diagnosis with a sufficiently high sensitivity among patients with the discharge diagnoses tested. The challenge is to correctly specify the codes of the conditions in the priority guidelines. A limitation of the main study will be the inability to follow those clinical conditions that did not have identified ICD-10 or procedure codes. Although this limits the number of conditions studied, we believe that the NPR databases can be used with reasonable confidence for the purpose, in our main study, of analysing the introduction of 32 priority guidelines, knowing the estimated overall level of error.