
Validity of an algorithm to identify cardiovascular deaths from administrative health records: a multi-database population-based cohort study

Abstract

Background

Cardiovascular death is a common outcome in population-based studies about new healthcare interventions or treatments, such as new prescription medications. Vital statistics registration systems are often the preferred source of information about cause-specific mortality because they capture verified information about the deceased, but they may not always be accessible for linkage with other sources of population-based data. We assessed the validity of an algorithm applied to administrative health records for identifying cardiovascular deaths in population-based data.

Methods

Administrative health records were from an existing multi-database cohort study about sodium-glucose cotransporter-2 (SGLT2) inhibitors, a new class of antidiabetic medications. Data were from 2013 to 2018 for five Canadian provinces (Alberta, British Columbia, Manitoba, Ontario, Quebec) and the United Kingdom (UK) Clinical Practice Research Datalink (CPRD). The cardiovascular mortality algorithm was based on in-hospital cardiovascular deaths identified from diagnosis codes and select out-of-hospital deaths. Sensitivity, specificity, and positive and negative predictive values (PPV, NPV) were calculated for the cardiovascular mortality algorithm using vital statistics registrations as the reference standard. Overall estimates and 95% confidence intervals (CIs) were computed, along with estimates stratified by site, location of death, sex, and age.

Results

The cohort included 20,607 individuals (58.3% male; 77.2% ≥70 years). When compared to vital statistics registrations, the cardiovascular mortality algorithm had overall sensitivity of 64.8% (95% CI 63.6, 66.0); site-specific estimates ranged from 54.8 to 87.3%. Overall specificity was 74.9% (95% CI 74.1, 75.6) and overall PPV was 54.5% (95% CI 53.7, 55.3), while site-specific PPV ranged from 33.9 to 72.8%. The cardiovascular mortality algorithm had sensitivity of 57.1% (95% CI 55.4, 58.8) for in-hospital deaths and 72.3% (95% CI 70.8, 73.9) for out-of-hospital deaths; specificity was 88.8% (95% CI 88.1, 89.5) for in-hospital deaths and 58.5% (95% CI 57.3, 59.7) for out-of-hospital deaths.

Conclusions

A cardiovascular mortality algorithm applied to administrative health records had moderate validity when compared to vital statistics data. Substantial variation existed across study sites representing different geographic locations and two healthcare systems. These variations may reflect different diagnostic coding practices and healthcare utilization patterns.

Background

Cardiovascular death is a cause-specific outcome of interest in many studies about the comparative effectiveness of new healthcare interventions. For example, studies about the safety and effectiveness of new prescription medications compared with existing medications frequently use both all-cause and cause-specific death as endpoints [1, 2]. When studies that include cause-specific mortality as an outcome are conducted using population-based data, vital statistics registration systems are often the preferred source of information about cause-specific mortality because they capture verified information about the deceased, the circumstances of death, and the direct antecedent and underlying cause(s) of death [3]. However, there can be challenges associated with using vital statistics registrations for population-based comparative effectiveness studies. The data may not be sufficiently timely for investigations of new interventions, such as new medications that have recently come to market, because the process required for verification of cause of death may be lengthy [4]. In addition, routine linkage of vital statistics registrations to other population-based administrative data may not be possible in all jurisdictions [5, 6], in part due to legislation governing data access [7].

Routinely-collected, population-based administrative health data, including hospital records and physician visit records, represent an alternative source to identify specific causes of death [8]. Administrative health data are potentially advantageous because in many jurisdictions, they are relatively straightforward to access, and processes have been established to link multiple sources of administrative data while ensuring that health privacy legislation requirements are met [9]. However, given that administrative data are captured for purposes of health system management and healthcare provider remuneration and not for identifying specific causes of death, their validity for the latter purpose has been questioned [10]. There are few studies that have examined the accuracy of administrative health data for investigating specific causes of death [10], particularly across multiple jurisdictions. A recent systematic review about sources of bias in drug safety and effectiveness studies conducted using population-based routinely-collected data emphasized the importance of validation studies to identify potential sources of bias and strategies to address these sources when measuring study exposures and outcomes [11].

The aim of our study was to assess the validity of an algorithm applied to administrative health records in multiple jurisdictions for identifying cardiovascular deaths. We used vital statistics registrations as the reference standard to validate the cardiovascular mortality algorithm.

Methods

Data sources

Data were from an existing multi-database retrospective cohort study conducted by the Canadian Network for Observational Drug Effect Studies (CNODES) [12], a pan-Canadian network that examines questions of drug safety and effectiveness at the request of government stakeholders. This cohort study investigated the safety and effectiveness of sodium-glucose cotransporter-2 (SGLT2) inhibitors, a new class of antidiabetic medications, compared to dipeptidyl peptidase-4 (DPP-4) inhibitors [13,14,15,16]. Databases from five Canadian provinces (Alberta, British Columbia, Manitoba, Ontario, and Quebec) and the United Kingdom (UK) Clinical Practice Research Datalink (CPRD) were used. The study period was from 2013 to 2018.

In each Canadian province, study data included vital statistics registrations, health insurance registrations, physician billing claims, hospitalization records, emergency department (ED) visit records (not available in Manitoba), and prescription drug dispensation records. These data sources can be linked at the individual level using anonymized personal health numbers. Vital statistics registrations capture official records of births, stillbirths, deaths, and marriages. In death records, the underlying cause of death is recorded using the World Health Organization’s International Statistical Classification of Diseases and Related Health Problems (ICD), 10th revision (i.e., ICD-10) [3]. The registration of deaths is a legal requirement in all Canadian provinces and as such, reporting is virtually complete; under-reporting may occur as a result of late or incomplete registration, but non-registration or over-reporting is unlikely [3]. Health insurance registration files capture start and end dates of health insurance coverage, including the date of loss of coverage due to death or migration; demographic and residence location information is also maintained in these files. Physician billing claims contain information about ambulatory services provided by specialists and general practitioners, including the type of service, date of service, and at least one diagnosis code associated with the reason for the service (in Quebec, some claims are missing a diagnosis code, although the overall completion rate is in excess of 88%); diagnosis codes are recorded using the 8th (Ontario only) and 9th revisions of ICD (i.e., ICD-8 and ICD-9) [17]. Hospitalization records contain information for each patient during the period of the hospital stay, including up to 25 diagnosis codes recorded using ICD-10-CA (i.e., enhanced Canadian revision). Prescription drug claims capture medications dispensed by community pharmacies; in-hospital medication dispensations are not included. ED visit records contain information about visits to hospital-based EDs, including the date of the visit, chief complaint (i.e., reason for the visit), and diagnosis codes (where available).

Study data were also obtained from the CPRD, a large UK primary care database containing medical information documented by primary care physicians for approximately 15 million patients enrolled in over 700 general practices [18]. The data are regularly reviewed and considered to be valid and of high quality [19,20,21]; they capture patient demographics, medical history, prescribed medications, and clinical measures, but do not capture ED visits. CPRD data were linked to the Hospital Episode Statistics (HES) database; this linkage is available for general practices in England that have consented to the linkage. The HES contain hospitalization information, including diagnoses recorded using ICD-10 codes. CPRD data were also linked to national death registrations from the Office for National Statistics (ONS); this linkage is available for general practitioners in England who have consented to the linkage. The underlying cause of death is recorded in registrations using ICD-10 codes.

Study cohort

The cohort has been described in detail elsewhere [13,14,15,16]. Briefly, the cohort for the initial multi-database study included patients who received a prescription for an SGLT2 inhibitor or a DPP-4 inhibitor. The dispensation date (prescription date for CPRD) for either medication had to occur on or after the date of the first dispensation or prescription of an SGLT2 inhibitor for each site and on or before June 30, 2018. Cohort entry was the date of the first SGLT2 or DPP-4 inhibitor dispensation or prescription in this study period. Cohort exit was the date of censoring due to discontinuation of the study drug, death, end of healthcare coverage, or end of the study period. The initial study cohort excluded individuals less than 66 years of age in Ontario, 19 years in Alberta, and 18 years in British Columbia and Manitoba and in the CPRD. In Quebec, the initial cohort was restricted to individuals who were greater than 65 years, or who were receiving social assistance, or who did not have access to a private insurance plan. These exclusions were based on drug data availability in the sites. Additional exclusions from the initial study cohort were due to missing sex, date inconsistencies, no follow-up (i.e., cohort exit date less than or equal to cohort entry date), SGLT2 and DPP-4 inhibitor prescriptions on the same day after the cohort entry date, or less than 365 days of health insurance coverage prior to the cohort entry date.

We constructed our validation cohort from this initial study cohort for those sites where linkage of administrative health records and vital statistics registrations (death registrations from ONS in CPRD) was possible and for those years for which these registration data were available (see Table 1 for available data at each site). The validation cohort excluded individuals who were alive, based on health insurance coverage information in the Canadian provinces and no recorded date of death in the CPRD data, as of June 30, 2018. We subsequently excluded individuals who were missing a date of death, as well as individuals for whom the difference in dates of death recorded in administrative health records and vital statistics registrations was greater than 60 days; the latter was an indicator of potential data quality issues.
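The date-agreement exclusion described above can be sketched in a few lines of Python. This is only an illustration: the function name and the use of `None` for a missing date are assumptions, and the only value taken from the text is the 60-day threshold.

```python
from datetime import date

MAX_DATE_GAP_DAYS = 60  # threshold stated in the text


def keep_in_validation_cohort(admin_death_date, vital_death_date):
    """Return True when both dates of death exist and differ by at most 60 days.

    Hypothetical helper: None stands for a missing date of death.
    """
    if admin_death_date is None or vital_death_date is None:
        return False  # missing date of death in at least one source
    gap = abs((admin_death_date - vital_death_date).days)
    return gap <= MAX_DATE_GAP_DAYS


# A gap of exactly 60 days is kept; 61 days is excluded.
print(keep_in_validation_cohort(date(2017, 1, 1), date(2017, 3, 2)))
print(keep_in_validation_cohort(date(2017, 1, 1), date(2017, 3, 3)))
```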

Table 1 Start and end dates of study period at each site for validation cohort creation

Outcome measure

The outcome of cardiovascular death in administrative health records used the following algorithm: (a) in-hospital death with a cardiovascular disease diagnosis in the primary/most responsible diagnosis position, or (b) out-of-hospital death (including death in an ED) without documentation of cancer in the 365 days prior to and including the date of death and without documentation of trauma in the 30 days prior to and including the date of death. A significant proportion of all cardiovascular-related deaths are known to occur outside of hospital [22,23,24]. We searched hospitalization records, ED visit records, and physician billing claims in provincial data, and all CPRD and HES records for documentation of cancer or trauma diagnoses for out-of-hospital deaths.
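The two-branch rule above can be written as a small decision function. The following Python sketch assumes the look-back flags have already been derived from the records; the `DeathRecord` class and its field names are hypothetical and are not taken from the study's code.

```python
from dataclasses import dataclass


@dataclass
class DeathRecord:
    """Hypothetical, pre-derived flags for one deceased cohort member."""
    in_hospital: bool                # death occurred during a hospital stay
    primary_dx_cardiovascular: bool  # most responsible diagnosis is cardiovascular
    cancer_within_365_days: bool     # cancer documented in the 365 days up to death
    trauma_within_30_days: bool      # trauma documented in the 30 days up to death


def is_cardiovascular_death(rec):
    """Apply rule (a) to in-hospital deaths and rule (b) to all other deaths."""
    if rec.in_hospital:
        # (a) cardiovascular diagnosis in the primary/most responsible position
        return rec.primary_dx_cardiovascular
    # (b) out-of-hospital death (including death in an ED) with no recent
    # cancer and no recent trauma
    return not rec.cancer_within_365_days and not rec.trauma_within_30_days
```

Note that branch (b) classifies by exclusion: any out-of-hospital death without recent cancer or trauma is counted as cardiovascular, whatever its actual cause.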

The list of relevant diagnosis codes to identify in-hospital cardiovascular deaths is provided in Table 2 [25]. For out-of-hospital cardiovascular deaths, the cancer diagnosis codes included ICD-9-CM 140 to 172 and 174 to 209 and ICD-10-CA C00 to C43 and C45 to C97, and the trauma-related diagnosis codes included ICD-9-CM 800 to 999 and E000 to E999 and ICD-10-CA S00 to T98 and V01 to Y98.
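As an illustration of how the cancer and trauma code ranges listed above might be matched, the Python sketch below compares the three-character category of each code against those ranges. The function names and the `system` argument are assumptions made for this example, as is the premise that ICD-10 categories consist of a letter followed by two digits.

```python
def icd10_key(code):
    """Return (letter, two-digit number) for an ICD-10 code such as 'C50.9'."""
    code = code.strip().upper().replace(".", "")
    return code[0], int(code[1:3])


def icd9_category(code):
    """Return ('E', n) for E-codes, ('', n) for numeric codes, else None."""
    code = code.strip().upper().replace(".", "")
    if not code:
        return None
    if code[0] == "E":
        return "E", int(code[1:4])
    if code[0].isdigit():
        return "", int(code[:3])
    return None  # e.g. ICD-9-CM V codes, which appear in neither list


def is_cancer(code, system):
    """system is 'icd10' or 'icd9' (a hypothetical argument for this sketch)."""
    if system == "icd10":
        key = icd10_key(code)
        return ("C", 0) <= key <= ("C", 43) or ("C", 45) <= key <= ("C", 97)
    cat = icd9_category(code)
    if cat is None or cat[0] != "":
        return False
    return 140 <= cat[1] <= 172 or 174 <= cat[1] <= 209


def is_trauma(code, system):
    """system is 'icd10' or 'icd9' (a hypothetical argument for this sketch)."""
    if system == "icd10":
        key = icd10_key(code)
        return ("S", 0) <= key <= ("T", 98) or ("V", 1) <= key <= ("Y", 98)
    cat = icd9_category(code)
    if cat is None:
        return False
    if cat[0] == "E":
        return True  # E000 to E999
    return 800 <= cat[1] <= 999
```

The coding system is passed explicitly because codes beginning with E mean different things in ICD-9-CM (external causes) and ICD-10 (endocrine disorders).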

Table 2 ICD-10 diagnosis codes for cardiovascular disease

In vital statistics registrations, which were used to validate the algorithm, cardiovascular deaths were those that had an underlying cause of death with a cardiovascular disease diagnosis. The relevant ICD-10 codes are provided in Table 2.

Statistical analysis

The validation cohort was described using frequencies and percentages. Validity of the cardiovascular mortality algorithm was assessed using sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV). All estimates are reported as percentages.

Sensitivity was calculated as the number of correctly-identified cardiovascular deaths in administrative health records divided by the total number of cardiovascular deaths from vital statistics registrations. Specificity was calculated as the number of correctly-identified non-cardiovascular deaths from administrative health records divided by the total number of non-cardiovascular deaths from vital statistics registrations. PPV was calculated as the number of correctly-identified cardiovascular deaths in administrative health records divided by the total number of cardiovascular deaths identified from administrative health records. NPV was calculated as the number of correctly-identified non-cardiovascular deaths in administrative health records divided by the total number of non-cardiovascular deaths identified from administrative health records. The 95% confidence intervals (CIs) were calculated for all estimates; they were based on the binomial distribution.
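The four definitions above map directly onto the cells of a 2×2 table. The Python sketch below computes them from hypothetical counts, where `tp`, `fp`, `fn` and `tn` denote algorithm-versus-registration agreement cells. The text states only that the CIs were based on the binomial distribution; the Wilson score interval used here is an assumption chosen for illustration, not necessarily the method the authors used.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (assumed CI method)."""
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return centre - half, centre + half


def validity_estimates(tp, fp, fn, tn):
    """Return {measure: (estimate, lower, upper)}, all as percentages.

    tp: cardiovascular in both sources; tn: non-cardiovascular in both;
    fp: cardiovascular by algorithm only; fn: cardiovascular by registration only.
    """
    cells = {
        "sensitivity": (tp, tp + fn),
        "specificity": (tn, tn + fp),
        "ppv": (tp, tp + fp),
        "npv": (tn, tn + fn),
    }
    results = {}
    for name, (k, n) in cells.items():
        lower, upper = wilson_interval(k, n)
        results[name] = (100 * k / n, 100 * lower, 100 * upper)
    return results


# Hypothetical counts, not study data.
for name, (est, lower, upper) in validity_estimates(40, 10, 20, 30).items():
    print(f"{name}: {est:.1f}% (95% CI {lower:.1f}, {upper:.1f})")
```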

Estimates were produced overall (i.e., by combining frequencies for the six sites and then calculating the validity estimates), for the five Canadian provinces, and individually for each of the six sites. Overall and site-specific estimates were also stratified by location of death (in-hospital; out-of-hospital), sex and age group (<70 years; ≥70 years).
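Pooling by combining frequencies means the site-level 2×2 counts are summed before any ratio is taken, so larger sites carry more weight than they would in a simple average of site estimates. A minimal Python sketch with invented counts follows; the site labels and numbers are hypothetical.

```python
from collections import Counter


def pool_counts(site_tables):
    """Sum the 2x2 cell counts (tp, fp, fn, tn) over all sites."""
    total = Counter()
    for table in site_tables.values():
        total.update(table)  # Counter.update adds counts cell by cell
    return dict(total)


def sensitivity(table):
    return table["tp"] / (table["tp"] + table["fn"])


# Hypothetical counts for two sites of unequal size.
sites = {
    "site_a": {"tp": 90, "fp": 20, "fn": 10, "tn": 80},
    "site_b": {"tp": 10, "fp": 5, "fn": 10, "tn": 15},
}
pooled = pool_counts(sites)
mean_of_sites = sum(sensitivity(t) for t in sites.values()) / len(sites)
print(f"pooled sensitivity: {sensitivity(pooled):.3f}")
print(f"mean of site sensitivities: {mean_of_sites:.3f}")
```

With these invented counts the pooled sensitivity (100/120) is pulled toward the larger site, whereas the unweighted mean of the two site estimates is lower.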

Results

As Fig. 1 shows, the initial study cohort comprised 683,325 individuals, of whom 96.9% were alive on June 30, 2018. There were few additional exclusions to arrive at the final validation cohort of 20,607 individuals. Specifically, less than 0.1% of individuals were missing a date of death in at least one data source or had dates of death greater than 60 days apart in administrative health records and vital statistics registrations.

Fig. 1 Study flow chart for development of the validation cohort. Legend: Initial study cohort was from an existing multi-database retrospective cohort study about the safety and effectiveness of sodium-glucose cotransporter-2 (SGLT2) inhibitors compared to dipeptidyl peptidase-4 (DPP-4) inhibitors

More than two-thirds of the validation cohort (Table 3) were from the Canadian provinces of Ontario and Quebec. More than half (58.3%) of the validation cohort members were male and more than three-quarters were at least 70 years of age. The majority of validation cohort members had dates of death in 2016 or 2017 (data not shown). Overall, 31.7% of the deaths captured in vital statistics registrations were cardiovascular deaths.

Table 3 Characteristics of the validation cohort

Slightly more than half (10,807; 52.4%) of the deaths included in the validation study were identified as in-hospital deaths (Table 3). Amongst all in-hospital deaths, 17.8% were identified as cardiovascular deaths in both data sources and 62.3% were identified as non-cardiovascular deaths in both data sources. Amongst all out-of-hospital deaths, 24.5% were identified as cardiovascular deaths in both data sources and 38.7% were identified as non-cardiovascular deaths in both data sources.

Table 4 contains the validity estimates for the cardiovascular mortality algorithm for individual sites, as well as overall and for the Canadian provinces. Overall estimates were 64.8% (95% CI 63.6, 66.0) for sensitivity, 74.9% (95% CI 74.1, 75.6) for specificity, 54.5% (95% CI 53.7, 55.3) for PPV, and 82.1% (95% CI 81.6, 82.6) for NPV.

Table 4 Validity estimates (95% confidence intervals) for the cardiovascular mortality algorithm, overall and by site

The CPRD produced the highest site-specific estimates of overall sensitivity (87.3%; 95% CI 83.6, 90.3) and NPV (89.9%; 95% CI 86.9, 92.3). The Canadian province of Manitoba produced the lowest estimates of sensitivity (54.8%; 95% CI 49.9, 59.6) and NPV (58.3%; 95% CI 53.7, 63.0), but the highest estimate of PPV (72.8%; 95% CI 67.5, 77.5). The province of Quebec had the highest estimate of specificity (82.8%; 95% CI 81.5, 83.7). The lowest estimate of PPV was for the province of British Columbia (33.9%; 95% CI 31.4, 36.5).

Figure 2 provides validity estimates for the cardiovascular mortality algorithm stratified by location of death (i.e., in-hospital versus out-of-hospital). Overall sensitivity was 57.1% (95% CI 55.4, 58.8) for in-hospital deaths and 72.3% (95% CI 70.8, 73.9) for out-of-hospital deaths. Overall specificity was 88.8% (95% CI 88.1, 89.5) for in-hospital deaths and 58.5% (95% CI 57.3, 59.7) for out-of-hospital deaths. Overall PPV was 68.4% (95% CI 66.9, 69.9) for in-hospital deaths and 47.1% (95% CI 46.2, 48.0) for out-of-hospital deaths. Overall NPV was similar in both locations (83.0% for in-hospital, 95% CI 82.4, 83.5; 80.5% for out-of-hospital, 95% CI 79.6, 81.5). Sensitivity was higher for out-of-hospital deaths than for in-hospital deaths in all sites with the exception of Quebec and the CPRD. Specificity and PPV were higher for all sites for in-hospital deaths, with the exception of the CPRD.

Fig. 2 Validity estimates (%) for the cardiovascular mortality algorithm, by location of death. Legend: Error bars = 95% confidence intervals, All = all sites, Can = all Canadian sites, PPV = positive predictive value, NPV = negative predictive value, AB = Alberta, BC = British Columbia, MB = Manitoba, ON = Ontario, QC = Quebec, CPRD = UK Clinical Practice Research Datalink

Validity estimates were also stratified by sex and by age group (see Additional File 1). Sensitivity estimates were similar at all sites for males and females and for younger and older age groups. Specificity estimates were similar at all sites, except for Ontario where the estimates were lower for males than females and for older than younger cohort members. The same was true for PPV, which was lower for older than younger age groups in Ontario. The PPV estimate for the Canadian province of Alberta was lower for females than males. Estimates of NPV were similar across all sites.

Discussion

In this study, we applied an algorithm to administrative health records to identify cardiovascular deaths. We assessed the validity of this algorithm using vital statistics registrations, which contain information about the underlying cause of death. The study was conducted using data from five Canadian provinces and the UK. Overall validity estimates were modest, suggesting that the algorithm had moderate to low validity for identifying cardiovascular deaths. However, there was substantial variability across study sites.

The cardiovascular mortality algorithm resulted in slightly less than one-half of the cardiovascular deaths being identified as out-of-hospital deaths. A US study found about one-third of cardiovascular deaths occurred out of hospital [24], although these results were based only on ischemic heart disease deaths and were for an earlier time period (1979 to 1989) than our study observation period. A Swedish study reported an increasing rate of out-of-hospital cardiovascular deaths between 1991 and 2006 [22]. Not unexpectedly, the algorithm had greater specificity and PPV but lower sensitivity for in-hospital deaths than for out-of-hospital deaths at most sites, owing to the challenges of identifying the specific cause of out-of-hospital deaths.

Variation in the validation results is consistent with the results of a previous multi-database study conducted by CNODES that showed substantial variation across Canadian provinces in the association of medication exposure with health outcomes [26]; these variations were attributed to differences in the data, including diagnostic coding practices. While Canada has a universal healthcare system, the responsibility for delivery of services rests with the individual provinces and territories. A consequence is that administrative health records are not captured in a standardized way in all jurisdictions with the exception of hospitalization data, which are standardized in all provinces except Quebec. As well, the training and skills of coders across jurisdictions are unlikely to be the same, because there are no national standards for this training. Examination of our site-specific validation results revealed that the CPRD data from the UK had the highest overall sensitivity and NPV. This finding might be attributed to differences in the data (i.e., primary care electronic medical records versus physician billing claims), coding practices, and/or differences in healthcare use (e.g., likelihood of hospitalization) between the UK and Canada.

In addition to conducting this validation study, we compared the risk estimates obtained using the cardiovascular mortality algorithm and the risk estimates obtained using cardiovascular deaths from vital statistics registrations in a real-world study about SGLT2 inhibitors compared to DPP-4 inhibitors [14]. A composite endpoint of major adverse cardiovascular events (MACE) was constructed, which included myocardial infarction, ischaemic stroke, and cardiovascular death. When the composite endpoint used the cardiovascular mortality algorithm to identify cardiovascular deaths, a hazard ratio (HR) of 0.76 (95% CI 0.69, 0.84) was produced for SGLT2 inhibitors compared to DPP-4 inhibitors (number of events: 2146 for SGLT2 inhibitors; 3001 for DPP-4 inhibitors). When the composite endpoint used vital statistics registrations to identify cardiovascular deaths, the HR was similar (0.78; 95% CI 0.63, 0.97) for SGLT2 inhibitors compared to DPP-4 inhibitors (number of events: 920 for SGLT2 inhibitors; 1257 for DPP-4 inhibitors).

A major strength of this study is the assessment of validity of the cardiovascular mortality algorithm across multiple sites, including both Canadian and UK sites with different healthcare systems and healthcare use. Within Canada, the vast majority of validation studies for administrative health data algorithms have only been conducted in a single site [27], which limits their potential generalizability. Another strength is that we examined validity of an algorithm for a commonly-used endpoint in drug safety studies. Finally, we produced site-specific estimates of sensitivity, specificity, PPV, and NPV so that the magnitude of potential misclassification bias can be assessed at the site level.

This study is not without limitations. First, we acknowledge that vital statistics registrations may not be error free. Statistics Canada notes that the last comprehensive investigation of errors in vital statistics registrations occurred in the 1980s, although some province-specific data quality assessments have since been conducted [3]. Errors in the cause of death recorded in the vital statistics registrations, which could result in bias and loss of precision in the validity estimates, may arise because of differences of interpretation amongst coders about the information contained on a death certificate [28]. One US study found that for coronary heart disease deaths, death certificates had sensitivity of 84%, PPV of 67%, specificity of 84%, and NPV of 93% when a physician panel assessment of cause of death was adopted as the reference standard [29]. A multi-site US study of coronary heart disease deaths in death certificates reported PPV of 67% and sensitivity of 81% when physician review of cause of death was used as the reference standard; there was substantial variation across sites in these estimates, as well as for in-hospital versus out-of-hospital deaths [30]. The authors of this study also noted the challenges associated with classifying a death as a coronary heart disease death versus a non-coronary heart disease death using diagnosis codes. Second, we acknowledge that the results of this study may not generalize to the population of each jurisdiction because the original study cohort was limited to individuals receiving selected antidiabetic medications and the majority (i.e., greater than 75%) were at least 70 years of age. A recent review paper reported that the cardiovascular death rate amongst individuals with diabetes was approximately 4.5 times greater than amongst individuals without diabetes of the same age, without considering other cardiovascular risk factors [31]. Our estimates of PPV and NPV may not generalize because they are influenced by prevalence of cardiovascular disease in the population; as prevalence increases, PPV will also increase but NPV will decrease [32]. Older populations under treatment for diabetes have more underlying comorbid conditions and therefore are a more challenging group in which to identify the underlying cause of death than the general population [28], which could result in misclassification of cause of death.
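The dependence of PPV and NPV on prevalence noted above follows from Bayes' theorem and can be illustrated with a short Python sketch. The function name is hypothetical; the inputs in the first call are the overall sensitivity, specificity, and proportion of cardiovascular deaths reported in the Results, and the 50% prevalence in the second call is an invented comparison value.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) implied by sensitivity, specificity and prevalence."""
    tp = sensitivity * prevalence              # true positive fraction
    fp = (1 - specificity) * (1 - prevalence)  # false positive fraction
    tn = specificity * (1 - prevalence)        # true negative fraction
    fn = (1 - sensitivity) * prevalence        # false negative fraction
    return tp / (tp + fp), tn / (tn + fn)


# Overall values reported in the Results: sensitivity 64.8%,
# specificity 74.9%, 31.7% of deaths cardiovascular.
ppv, npv = predictive_values(0.648, 0.749, 0.317)
print(f"PPV {100 * ppv:.1f}%, NPV {100 * npv:.1f}%")

# Same sensitivity and specificity at a hypothetical prevalence of 50%.
ppv_hi, npv_hi = predictive_values(0.648, 0.749, 0.50)
print(f"PPV {100 * ppv_hi:.1f}%, NPV {100 * npv_hi:.1f}%")
```

With the reported overall inputs, the formula returns approximately 54.5% for PPV and 82.1% for NPV, matching the overall estimates in the Results. Raising the prevalence while holding sensitivity and specificity fixed increases PPV and decreases NPV.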

Future research could validate the proposed cardiovascular mortality algorithm in a general population as opposed to a treatment-specific population. As well, a model-based approach could be explored as an alternative approach to develop an algorithm for cardiovascular mortality. Machine-learning models that take account of multiple characteristics of the individual, including their history of comorbid conditions (e.g., hypertension, prior coronary artery disease) and relevant medications may result in increased accuracy. This finding of increased accuracy has been observed for cardiovascular disease risk predictions from machine-learning algorithms when compared to risk predictions based on conventional statistical models [33].

Conclusions

Cardiovascular diseases are a major cause of death worldwide. A cardiovascular mortality algorithm based on routinely-collected administrative health records is therefore potentially valuable for many population-based studies, including those about comparative effectiveness of new healthcare interventions or treatments, such as new prescription medications. This study found only modest overall validity of the cardiovascular mortality algorithm when compared with vital statistics registrations, but substantial variation in validity estimates across sites. This variation suggests there are opportunities for methodological studies to address the bias associated with using a cardiovascular mortality algorithm derived from administrative health records.

Availability of data and materials

The data that support the findings of this study are not publicly available, in accordance with site-specific privacy restrictions. They are available, with submission of appropriate ethics and data access approvals, from Alberta Health, the British Columbia Ministry of Health, the Manitoba Centre for Health Policy, the Institute for Clinical Evaluative Sciences (ICES), the Institut national d’excellence en santé et en services sociaux (INESSS), and the Independent Scientific Advisory Committee (ISAC) of the CPRD.

Abbreviations

CI: Confidence interval
CNODES: Canadian Network for Observational Drug Effect Studies
CPRD: Clinical Practice Research Datalink
DPP-4: Dipeptidyl peptidase-4
ED: Emergency department
HES: Hospital Episode Statistics
HR: Hazard ratio
ICD: International Statistical Classification of Diseases and Related Health Problems
MACE: Major adverse cardiovascular events
NPV: Negative predictive value
ONS: Office for National Statistics
PPV: Positive predictive value
SGLT2: Sodium-glucose cotransporter-2
UK: United Kingdom

References

  1. 1.

    Blessberger H, Lewis SR, Pritchard MW, Fawcett LJ, Domanovits H, Schlager O, et al. Perioperative beta-blockers for preventing surgery-related mortality and morbidity in adults undergoing cardiac surgery. Cochrane Database Syst Rev. 2019;9(9):Cd013435.

    PubMed  Google Scholar 

  2. 2.

    Fei Y, Tsoi MF, Cheung BMY. Cardiovascular outcomes in trials of new antidiabetic drug classes: a network meta-analysis. Cardiovasc Diabetol. 2019;18(1):112. https://doi.org/10.1186/s12933-019-0916-z.

    Article  PubMed  PubMed Central  Google Scholar 

  3. 3.

    Canada S. Canadian vital statistics: death database (CVSD). Ottawa: Statistics Canada; 2020. [Available from: https://www23.statcan.gc.ca/imdb/p2SV.pl?Function=getSurvey&amp;SDDS=3233]

    Google Scholar 

  4. 4.

    Phillips DE, Lozano R, Naghavi M, Atkinson C, Gonzalez-Medina D, Mikkelsen L, et al. A composite metric for assessing data on mortality and causes of death: the vital statistics performance index. Popul Health Metrics. 2014;12(1):14. https://doi.org/10.1186/1478-7954-12-14.

    Article  Google Scholar 

  5. 5.

    Chiu M, Lebenbaum M, Lam K, Chong N, Azimaee M, Iron K, et al. Describing the linkages of the immigration, refugees and citizenship Canada permanent resident data and vital statistics death registry to Ontario's administrative health database. BMC Med Inform Decis Mak. 2016;16(1):135. https://doi.org/10.1186/s12911-016-0375-3.

    Article  PubMed  PubMed Central  Google Scholar 

  6. 6.

    Moorin RE, Holman CD. The cost of in-patient care in Western Australia in the last years of life: a population-based data linkage study. Health Policy. 2008;85(3):380–90. https://doi.org/10.1016/j.healthpol.2007.08.003.

    Article  PubMed  Google Scholar 

  7. 7.

    Mähönen M, Salomaa V, Keskimäki I, Moltchanov V. The feasibility of routine mortality and morbidity register data linkage to study the occurrence of acute coronary heart disease events in Finland. The Finnish cardiovascular diseases registers (CVDR) project. Eur J Epidemiol. 2000;16(8):701–11. https://doi.org/10.1023/A:1026599805969.

    Article  PubMed  Google Scholar 

  8. 8.

    Mähönen M, Jula A, Harald K, Antikainen R, Tuomilehto J, Zeller T, et al. The validity of heart failure diagnoses obtained from administrative registers. Eur J Prev Cardiol. 2013;20(2):254–9. https://doi.org/10.1177/2047487312438979.

    Article  PubMed  Google Scholar 

  9. 9.

    Paprica PA, de Melo MN, Schull MJ. Social licence and the general public's attitudes toward research based on linked administrative health data: a qualitative study. CMAJ Open. 2019;7(1):E40–e6. https://doi.org/10.9778/cmajo.20180099.

    Article  PubMed  PubMed Central  Google Scholar 

  10. 10.

    Rampatige R, Mikkelsen L, Hernandez B, Riley I, Lopez AD. Systematic review of statistics on causes of deaths in hospitals: strengthening the evidence for policy-makers. Bull World Health Organ. 2014;92(11):807–16. https://doi.org/10.2471/BLT.14.137935.

  11.

    Prada-Ramallal G, Takkouche B, Figueiras A. Bias in pharmacoepidemiologic studies using secondary health care databases: a scoping review. BMC Med Res Methodol. 2019;19(1):53. https://doi.org/10.1186/s12874-019-0695-y.

  12.

    Suissa S, Henry D, Caetano P, Dormuth CR, Ernst P, Hemmelgarn B, et al. CNODES: the Canadian network for observational drug effect studies. Open Med. 2012;6(4):e134–40.

  13.

    Douros A, Lix LM, Fralick M, Dell'Aniello S, Shah BR, Ronksley PE, et al. Sodium-glucose cotransporter-2 inhibitors and the risk for diabetic ketoacidosis: a multicenter cohort study. Ann Intern Med. 2020;173(6):417–25. https://doi.org/10.7326/M20-0289.

  14.

    Filion KB, Lix LM, Yu OH, Dell'Aniello S, Douros A, Shah BR, et al. Sodium glucose cotransporter 2 inhibitors and risk of major adverse cardiovascular events: multi-database retrospective cohort study. BMJ. 2020;370:m3342.

  15.

    Yu OHY, Dell'Aniello S, Shah BR, Brunetti VC, Daigle JM, Fralick M, et al. Sodium-glucose cotransporter 2 inhibitors and the risk of below-knee amputation: a multicenter observational study. Diabetes Care. 2020;43(10):2444–52. https://doi.org/10.2337/dc20-0267.

  16.

    Fisher A, Fralick M, Filion KB, Dell'Aniello S, Douros A, Tremblay É, et al. Sodium-glucose co-transporter-2 inhibitors and the risk of urosepsis: a multi-site, prevalent new-user cohort study. Diabetes Obes Metab. 2020;22(9):1648–58. https://doi.org/10.1111/dom.14082.

  17.

    Lix LM, Walker R, Quan H, Nesdole R, Yang J, Chen G. Features of physician services databases in Canada. Chronic Dis Inj Can. 2012;32(4):186–93. https://doi.org/10.24095/hpcdp.32.4.02.

  18.

    Herrett E, Gallagher AM, Bhaskaran K, Forbes H, Mathur R, van Staa T, et al. Data resource profile: clinical practice research datalink (CPRD). Int J Epidemiol. 2015;44(3):827–36. https://doi.org/10.1093/ije/dyv098.

  19.

    Herrett E, Thomas SL, Schoonen WM, Smeeth L, Hall AJ. Validation and validity of diagnoses in the general practice research database: a systematic review. Br J Clin Pharmacol. 2010;69(1):4–14. https://doi.org/10.1111/j.1365-2125.2009.03537.x.

  20.

    Jick SS, Kaye JA, Vasilakis-Scaramozza C, Garcia Rodríguez LA, Ruigómez A, Meier CR, et al. Validity of the general practice research database. Pharmacotherapy. 2003;23(5):686–9. https://doi.org/10.1592/phco.23.5.686.32205.

  21.

    Khan NF, Harrison SE, Rose PW. Validity of diagnostic coding within the general practice research database: a systematic review. Br J Gen Pract. 2010;60(572):e128–36. https://doi.org/10.3399/bjgp10X483562.

  22.

    Dudas K, Lappas G, Stewart S, Rosengren A. Trends in out-of-hospital deaths due to coronary heart disease in Sweden (1991 to 2006). Circulation. 2011;123(1):46–52. https://doi.org/10.1161/CIRCULATIONAHA.110.964999.

  23.

    Levitan EB, Tanner RM, Zhao H, Muntner P, Thacker EL, Howard G, et al. Secular changes in rates of coronary heart disease, fatal coronary heart disease, and out-of-hospital fatal coronary heart disease. Int J Cardiol. 2014;174(2):436–9. https://doi.org/10.1016/j.ijcard.2014.04.027.

  24.

    Sorlie PD, Coady S, Lin C, Arias E. Factors associated with out-of-hospital coronary heart disease death: the national longitudinal mortality study. Ann Epidemiol. 2004;14(7):447–52. https://doi.org/10.1016/j.annepidem.2003.10.002.

  25.

    McCormick N, Lacaille D, Bhole V, Avina-Zubieta JA. Validity of myocardial infarction diagnoses in administrative databases: a systematic review. PLoS One. 2014;9(3):e92286. https://doi.org/10.1371/journal.pone.0092286.

  26.

    Doyle CM, Lix LM, Hemmelgarn BR, Paterson JM, Renoux C. Data variability across Canadian administrative health databases: differences in content, coding, and completeness. Pharmacoepidemiol Drug Saf. 2020;29(Suppl 1):68–77. https://doi.org/10.1002/pds.4889.

  27.

    Hinds A, Lix LM, Smith M, Quan H, Sanmartin C. Quality of administrative health databases in Canada: a scoping review. Can J Public Health. 2016;107(1):e56–61. https://doi.org/10.17269/cjph.107.5244.

  28.

    Lu TH, Lee MC, Chou MC. Accuracy of cause-of-death coding in Taiwan: types of miscoding and effects on mortality statistics. Int J Epidemiol. 2000;29(2):336–43. https://doi.org/10.1093/ije/29.2.336.

  29.

    Lloyd-Jones DM, Martin DO, Larson MG, Levy D. Accuracy of death certificates for coding coronary heart disease as the cause of death. Ann Intern Med. 1998;129(12):1020–6. https://doi.org/10.7326/0003-4819-129-12-199812150-00005.

  30.

    Coady SA, Sorlie PD, Cooper LS, Folsom AR, Rosamond WD, Conwill DE. Validation of death certificate diagnosis for coronary heart disease: the atherosclerosis risk in communities (ARIC) study. J Clin Epidemiol. 2001;54(1):40–50. https://doi.org/10.1016/S0895-4356(00)00272-9.

  31.

    Glovaci D, Fan W, Wong ND. Epidemiology of diabetes mellitus and cardiovascular disease. Curr Cardiol Rep. 2019;21(4):21. https://doi.org/10.1007/s11886-019-1107-y.

  32.

    Irwig L, Bossuyt P, Glasziou P, Gatsonis C, Lijmer J. Designing studies to ensure that estimates of test accuracy are transferable. BMJ. 2002;324(7338):669–71. https://doi.org/10.1136/bmj.324.7338.669.

  33.

    Krittanawong C, Virk HUH, Bangalore S, Wang Z, Johnson KW, Pinotti R, et al. Machine learning prediction in cardiovascular diseases: a meta-analysis. Sci Rep. 2020;10(1):16057. https://doi.org/10.1038/s41598-020-72685-1.

Acknowledgements

This study was made possible through data sharing agreements between the CNODES member research centres and the respective provincial governments of Alberta, British Columbia, Manitoba (HIPC # 2018/2019-58), Ontario, and Quebec. This study was approved by the Independent Scientific Advisory Committee (ISAC; protocol # 19_007A2) of the CPRD; the approved protocol was made available to journal reviewers. The BC Ministry of Health approved access to and use of BC data for this study. Data sources were as follows (https://www2.gov.bc.ca/gov/content/health/conducting-health-research-evaluation/data-access-health-data-central): British Columbia Ministry of Health [creator] (2018): Medical Services Plan (MSP) Payment Information File. BC Ministry of Health [publisher]. MOH (2018); British Columbia Ministry of Health [creator] (2018): Consolidation File (MSP Registration & Premium Billing). BC Ministry of Health [publisher]. MOH (2018); British Columbia Ministry of Health [creator] (2018): PharmaNet. BC Ministry of Health [publisher]. Data Stewardship Committee (2018); Canadian Institute for Health Information [creator] (2018): Discharge Abstract Database (Hospital Separations). BC Ministry of Health [publisher]. MOH (2018); and BC Vital Statistics Agency [creator] (2018): Vital Statistics Deaths. V2. BC Ministry of Health [publisher]. Parts of this material are based on data and information compiled and provided by the Ontario Ministry of Health and Long-Term Care (MOHLTC). This study was supported by ICES, which is funded by an annual grant from the MOHLTC. Parts of this material are based on data and/or information compiled and provided by the Canadian Institute for Health Information (CIHI). The opinions, results, and conclusions reported in this paper are those of the authors. No endorsement by the provinces, data stewards, ICES, CIHI, or the Institut national d’excellence en santé et en services sociaux is intended or should be inferred.

We thank Ms. Corine Mizrahi at the CNODES Coordinating Center for her important contributions to this work. We also acknowledge the programming and analytical support of the analysts at each site: Greg Carney PhD and Jason Kim MPH (British Columbia), Zhihai Ma MSc and Jianguo Zhang MSc (Alberta), Matthew Dahl BSc (Manitoba), C. Fangyun Wu MSc MA (Ontario), and Hui Yin MSc and Christopher Filliter MSc (CPRD). We also thank Colin Dormuth ScD, Eric Tremblay MSc, Hala Tamim PhD, and Vanessa Brunetti MSc for their contributions to this study.

Dr. Lix is supported by a Tier 1 Canada Research Chair. Dr. Filion is supported by a Senior salary support award from the Fonds de recherche du Québec – santé (FRQS; Quebec Foundation for Research – Health) and a William Dawson Scholar award from McGill University. Dr. Yu receives salary support from the FRQS. Dr. Douros is supported by a Junior 1 salary support award from the FRQS.

We acknowledge the support of CNODES Investigators: Samy Suissa (Principal Investigator); Colin R. Dormuth (British Columbia); Brenda R. Hemmelgarn (Alberta); Jacqueline Quail (Saskatchewan); Dan Chateau (Manitoba); J. Michael Paterson (Ontario); Jacques LeLorier (Québec); Adrian R. Levy (Atlantic: Nova Scotia, Newfoundland and Labrador, New Brunswick, Prince Edward Island); Pierre Ernst and Kristian B. Filion (UK Clinical Practice Research Datalink (CPRD)); Lisa M. Lix (Database Development Team); Robert W. Platt (Methods Team); and Ingrid S. Sketris (Knowledge Translation Team).

Funding

The Canadian Network for Observational Drug Effect Studies (CNODES), a collaborating centre of the Drug Safety and Effectiveness Network (DSEN), is funded by the Canadian Institutes of Health Research (CIHR; Grant # DSE-146021). The funders had no role in the design of the study, the analysis and interpretation of data, or the writing of the manuscript.

Author information

Contributions

LML, SS, ASJ, JMD, AF, OHYY, SD, NH, SCB, BRS, PER, SAS, AD, PE, and KBF were involved in data extraction, study design, and interpretation of results, and critically reviewed the manuscript for important intellectual content. LML and SS drafted the manuscript. All authors approved the final version of the manuscript.

Corresponding author

Correspondence to Lisa M. Lix.

Ethics declarations

Ethics approval and consent to participate

Approvals were provided by the following ethics boards: the Conjoint Health Research Ethics Board at the University of Calgary, the Clinical Research Ethics Board at the University of British Columbia, the Health Research Ethics Board at the University of Manitoba, and the Medical/Biomedical Research Ethics Committee of the CIUSSS West-Central Montreal (the latter was for Quebec and CPRD data). Study cohort members were not required to provide informed consent for participation in this study because this study involved a retrospective review of electronic healthcare records; it was impracticable to obtain consent, as approved by the research ethics boards and/or data access committees at each participating site. Specifically, informed consent to participate was waived by the following ethics boards: the Conjoint Health Research Ethics Board at the University of Calgary, the Clinical Research Ethics Board at the University of British Columbia, the Health Research Ethics Board at the University of Manitoba, and the Medical/Biomedical Research Ethics Committee of the CIUSSS West-Central Montreal (the latter was for Quebec and CPRD data). All study protocols were carried out in accordance with relevant guidelines and regulations at each participating site.

Consent for publication

Not applicable.

Competing interests

Dr. Alessi-Severini received research grants from Pfizer and Merck for studies not involving SGLT2 inhibitors or DPP-4 inhibitors. The remaining authors have no relevant conflicts of interest to disclose.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1.

Validity estimates for a cardiovascular mortality algorithm applied to administrative health records stratified by sex and age group. This file contains estimates of sensitivity, specificity, positive predictive value, and negative predictive value stratified by sex and age group (<70 years; ≥70 years).

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Lix, L.M., Sobhan, S., St-Jean, A. et al. Validity of an algorithm to identify cardiovascular deaths from administrative health records: a multi-database population-based cohort study. BMC Health Serv Res 21, 758 (2021). https://doi.org/10.1186/s12913-021-06762-0

Keywords

  • Accuracy
  • Cause-specific mortality
  • Death certificates
  • Hospital records
  • Physician claims
  • Validation