Skip to main content

Evaluating the coding accuracy of type 2 diabetes mellitus among patients with non-alcoholic fatty liver disease



Non-alcoholic fatty liver disease (NAFLD) describes a spectrum of chronic fattening of liver that can lead to fibrosis and cirrhosis. Diabetes has been identified as a major comorbidity that contributes to NAFLD progression. Health systems around the world make use of administrative data to conduct population-based prevalence studies. To that end, we sought to assess the accuracy of diabetes International Classification of Diseases (ICD) coding in administrative databases among a cohort of confirmed NAFLD patients in Calgary, Alberta, Canada.


The Calgary NAFLD Pathway Database was linked to the following databases: Physician Claims, Discharge Abstract Database, National Ambulatory Care Reporting System, Pharmaceutical Information Network database, Laboratory, and Electronic Medical Records. Hemoglobin A1c and diabetes medication details were used to classify diabetes groups into absent, prediabetes, meeting glycemic targets, and not meeting glycemic targets. The performance of ICD codes among these groups was compared to this standard. Within each group, the total numbers of true positives, false positives, false negatives, and true negatives were calculated. Descriptive statistics and bivariate analysis were conducted on identified covariates, including demographics and types of interacted physicians.


A total of 12,012 NAFLD patients were registered through the Calgary NAFLD Pathway Database and 100% were successfully linked to the administrative databases. Overall, diabetes coding showed a sensitivity of 0.81 and a positive predictive value of 0.87. False negative rates in the absent and not meeting glycemic control groups were 4.5% and 6.4%, respectively, whereas the meeting glycemic control group had a 42.2% coding error. Visits to primary and outpatient services were associated with most encounters.


Diabetes ICD coding in administrative databases can accurately detect true diabetic cases. However, patients with diabetes who meets glycemic control targets are less likely to be coded in administrative databases. A detailed understanding of the clinical context will require additional data linkage from primary care settings.

Peer Review reports


Health systems routinely use digital databases to store and code health information. The International Classification of Diseases (ICD) was developed by the World Health Organisation and is used to translate extensive details from electronic medical records (EMR) into standardised codes. ICD codes have been utilized for decades, with ICD code-driven algorithms being routinely employed for identifying chronic conditions, such as the Charlson comorbidity index [1] and the Elixhauser index [2].

Much like healthcare systems worldwide, Canada has multiple administrative health databases that are widely employed in health research. These databases, underpinned by ICD coding, encompass the Discharge Abstract Database (DAD) which contains inpatient data, National Ambulatory Care Reporting System (NACRS) which collects outpatient and emergency department visit details, and Physician Claims which collects details in inpatient and outpatient (e.g., primary care) settings.

ICD codes serve as a common tool for chronic disease and comorbidity surveillance in the populations of both Canada and various other countries [3, 4]. In Canada, national agencies like the Canadian Institute for Health Information have issued directives for coding specifics conditions inclusive of diabetes, leading to the establishment of the National Diabetes Surveillance System [5]. Notably, Type 2 diabetes is strongly associated with non-alcoholic fatty liver disease (NAFLD) [6, 7] and is considered requiring close monitoring for NAFLD [7].

NAFLD, the most common liver disease worldwide [6], is a progressive disease that advances from a non-alcoholic fatty liver to non-alcoholic steatohepatitis (NASH) to NASH with fibrosis [8, 9]. This progression can eventually lead to end stage liver disease or hepatocellular carcinoma [8, 9]. Accurate identification of comorbidity information, such as diabetes, in electronic databases is crucial in this patient population to ensure timely intervention. In Calgary, Canada, a prospective cohort of NAFLD patients from primary care settings has been evaluated for liver fibrosis. Primary Care Providers (PCP) in Calgary are equipped to promptly assess NAFLD patients without a referral to tertiary care [10]. They are also well-informed about the association between diabetes mellitus and NAFLD, and that it is a criterion for NAFLD evaluation (or assessment) [11]. Several studies to date [12, 13] have assessed the accuracy of ICD codes for diabetes diagnoses, but these were related to the general population. Designing a surveillance program by integrating laboratory data and administrative data could inform PCPs on comorbidities such as diabetes for NAFLD, but validation of the diabetes-related ICD codes in a NAFLD population is required.

To that end, we designed this study with the focus on detecting and reporting diabetes in patients with NAFLD. Our primary objective was to assess the accuracy of diabetes ICD coding in administrative databases among a cohort of confirmed NAFLD patients. Our secondary objectives were to assess inpatient EMR data, visit data, geographical data, and BMI, and to assess how they could be used to peripherally confirm the accuracy of diabetes codes.


Cohort selection: Calgary NAFLD population and data linkage

The Calgary NAFLD Pathway Database (CNPD) was established in 2016 to identify primary care patients with incident NAFLD in the Calgary metropolitan area [10]. NAFLD suspected patients with initial abnormal alanine aminotransferase levels, diabetes mellitus or metabolic syndrome undergo stepped clinical protocols (Additional File 9). Patients’ medications and lifestyle are reviewed by physicians while laboratory tests are initiated to rule out other causes of liver diseases. Only patients formally diagnosed with NAFLD are recorded in the CNPD database. CNPD collects and records demographics and administrative details at the time of shear wave elastography (SWE) testing, and SWE diagnosis information [10]. SWE is a non-invasive imaging technique employed by clinicians to diagnose liver tissue stiffness and identify NAFLD stages [14]. Patients enter CNPD at differing stages of NAFLD based on initial clinician assessment. There were approximately 12,012 patients enrolled in this database at the end of the CNPD study (April 2022). SWE results contained in CNPD were validated and confirmed NAFLD status and stage.

We deterministically linked the CNPD cohort to the following administrative health databases and inpatient EMR using a previously established process [15]: physician claims, NACRS, DAD, pharmaceutical information network (PIN), laboratory database, and Sunrise Clinical Manager (SCM) EMR. Alberta has a unique lifetime identifier known as the Personal Health Number (PHN) which can be used to trace the healthcare utilization of individuals. Inpatient administrative databases have assigned codes which points to EMR encounter records. PHN, dates of visits, and these access codes were used to access and pull required sub-tables. Data from the five-year period prior to SWE and NAFLD diagnosis were linked and extracted from these databases. These databases are under the jurisdiction of Alberta Health and Alberta Health Services. Brief descriptions of these databases are provided below.

  1. 1.

    Physician claims: collects all physician-submitted ICD-9 billing codes from outpatient and inpatient care.

  2. 2.

    DAD: collects all ICD-10-CA codes from inpatient care.

  3. 3.

    NACRS: collects all emergency and outpatient ICD-10-CA codes.

  4. 4.

    PIN: collects all pharmacy dispensed medications details in community settings.

  5. 5.

    Lab: collects all laboratory test results from outpatient and inpatient care.

  6. 6.

    SCM EMR data: inpatient EMR records. Specifically, information tables on intake, discharge, and laboratory data were extracted.

Other data such as visits, geographical data, and BMI were extracted and presented in our work for future comparisons of our cohort to other cohorts in future studies.

Defining outcome and predictor/feature variables

Our outcome of interest was diabetes coding within the NAFLD population. We defined diabetes categories following the Diabetes Canada Clinical Practice Guidelines [16] by using laboratory hemoglobin A1c [17,18,19,20] and supplemented this phenotyping algorithm with diabetes medication data (Additional file 1). It should be noted that different jurisdictions may have different laboratory thresholds for defining diabetes. Specifically, absence of diabetes was defined as the highest HbA1c laboratory result below 6.1% [18, 19] with no evidence of prescribed and fulfilled medications. Prediabetes was defined as HbA1c between 6.1 and 6.4% or an oral glucose tolerance test or random plasma glucose test or fasting plasma glucose test exceeding the thresholds listed in the Diabetes Canada Guidelines [16]. Diabetes category of meeting glycemic target was defined as (a) HbA1c between 6.4 and 7.0%, if no evidence of medication, and (b) HbA1c values < 7.0%, with evidence of prescribed and fulfilled medications. Diabetes category of not meeting glycemic target was defined as the highest HbA1c laboratory result above 7.0% [20]. The presence of fast plasma glucose ≥ 7.0 mmol/L or 2-hour plasma glucose in a 75 g oral glucose tolerance test ≥ 11.1 mmol/L or Random plasma glucose ≥ 11.1 mmol/L gave indication of diabetes in addition to HbA1c values. Intensified therapies such as (a) GLP1RA if obese or having cerebrovascular disease or stroke, and (b) SGLT2 if chronic kidney disease or albuminuria or cerebrovascular disease, were included as a part of the algorithm. This was achieved by applying Quan’s [21] ICD algorithm on cerebrovascular disease, stroke, chronic kidney disease, and cerebrovascular disease to define the sub-cohorts, and then checked for the presence of those medications.

Anatomical therapeutic classification (ATC) and drug identification numbers (DIN) were extracted from PIN to identify diabetes medications. These medication groups included insulin, oral hypoglycemic drugs, biguanides, Glucagon like peptide-1 receptor agonists, dipeptidyl peptidase-4 inhibitors, sulfonylureas, and thiazolidinediones, and sodium-glucose transporter-2 inhibitors (Additional File 1). Dates were checked to precede NAFLD diagnosis date. The list of diabetes medications was developed and assessed by physician authors of this study. Specific categorical variables were created for patients meeting HbA1c values but not receiving medications for later analytical steps. The list was validated against Canada’s drug product database [22] based on the active ingredients and their activity status was confirmed.

The presence of diabetes ICD codes, as defined by Quan et al. [21] utilizing the standard of 2 outpatient physician claims or 1 hospital discharge diagnosis [23], determined whether a patient had diabetes, based on ICD-10 codes E10.0 to 14.7 [21, 23]. We also introduced a basic algorithm requiring either one physician claim or one hospital discharge diagnosis to enhance the verification of our findings. We abstracted from the physician claims database the number of visits to inpatient and outpatient care providers by each patient within five years prior to NAFLD assessment. Five years was considered clinically sound taking into the account the conditions onset [24] which typically takes 3–7 years to fully manifest. This timeframe also allowed for the identification of diabetes using a well-established validation algorithm (2 physician claims or 1 hospitalization within a 2-year period). Geographical data from DAD and physician claims were used to define rural/urban status of patients. Continuous body mass index (BMI) was calculated from weight in kg/height in m2 data available from the CNPD database. Hospitalization record details (intake, progress of care, and discharge status) were extracted from SCM EMR which validated records in administrative databases. Sex was coded as male or female. Postal code from physician claims and DAD/NACRS were converted to identify geographic location (urban/rural status). We determined continuous age at the time of registry entry by subtracting the date of birth from the recorded NAFLD confirmation date. Physician claims data contained the type of physician responsible for billing and their practice settings (community, emergency, inpatient, and diagnostic settings). Laboratory data from inpatient EMR was also evaluated and compared against the laboratory database for data completeness. PIN data contained ATC and DIN codes for all fulfilled community dispensed medications. We used Additional file 1 to identify patients who received these drugs and created a variable representing the treatment status.


Descriptive statistics were calculated for the four diabetes cohorts. Demographic and other basic patient characteristics such as age, sex, and Charlson comorbidities were reported. The total numbers of visits with distinct types of physicians were calculated. The DIN codes of the medications listed in Additional file 1 were compared against the PIN database to assess whether patients were being treated for diabetes.

The presence of ICD codes for diabetes was compared against the reference diabetes severity established above. Performance measures, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) were calculated. We assessed the accuracy of diabetes codes within various categories, including medication (treated), medication (untreated), and two sub-cohorts: oHbA1c between 6.4 and 7.0%, and HbA1c above 7.0%.

The total numbers of true positives, true negatives, false positives, and false negatives were identified for these four diabetes groups. These categories were compared using appropriate statistical techniques, such as the t-test and chi-square test for respective data types. A p-value cut-off point of 0.05 was used for bivariate analysis. Non-parametric tests such as the Mann Whitney U test were used in cases where data was not normally distributed. Additionally, we used a time-series analysis to assess for diabetes remission status within each cohort as defined by the Diabetes Canada Clinical Practice Guidelines [25]. These individuals may have had a HbA1C > 6.5% on one occasion but then their HbA1c dropped below this threshold and was maintained there without any antihyperglycemic agents. The latest interpretation closest to the NAFLD diagnosis date per each patient was reported.

The Conjoint Health Research Ethics Board at the University of Calgary approved this study (REB-20-1127). All methods were performed in accordance with the Declaration of Helsinki. Python version 3.1.1 (Python Software Foundation, and R [26] was used for data extraction, cleaning, and parts of the analyses. Appropriate R packages (e.g., rpy2) were imported into Python for statistical analyses.


Data linkage

The CNPD database recorded a total of 12,012 patients diagnosed with NAFLD. All patients were linked successfully to the administrative databases. Data linkage to SCM EMR data linked a total of 3,545 patients (29.5%) accounting for 8,425 admissions. These inpatient visits (n = 8,425) accounted for an exceedingly small proportion of the total 1.63 million healthcare visits. Table 1 provides the demographics and comorbidities of the patients with and without coded diabetes. Laboratory data retrieved from SCM was matched with inpatient laboratory records from the lab database, achieving a 100% match rate. Extracted information accounted for: 1.6 million records from PIN, 16.6 million records from Physician claims, 9 million records from laboratory data, and a total of 7,268 hospitalization records. This informed us empirically that NAFLD was a dominantly outpatient managed disease. The performance of the standard diabetes algorithm (2 outpatient claims or 1 hospital discharge code) and the minimal code (1 outpatient claim or 1 hospital discharge) prevalence did not differ.

The patients with coded diabetes were older than those with the absence of diabetes codes (mean 57.4 vs. 51.2). Both groups predominately resided in urban areas (92.5 and 93.3%. respectively) which reflects the cohort selection process of the CNPD database. Additionally, individuals with coded diabetes exhibited a higher prevalence of Charlson comorbidities in comparison to those without diabetes codes.

Table 1 Demographics and comorbidities in CNPD cohort with coded and non-coded diabetes within five years prior to NAFLD diagnosis

The performance of diabetes ICD codes, as defined by Quan et al. [21], was assessed and are shown in Table 2. Diabetes coding performance showed a sensitivity of 0.81 and a PPV of 0.87. Among patients who met glycemic control, a sensitivity of 0.58 and a PPV of 1.0 was found. The diabetes cohort not meeting glycemic control showed a sensitivity of 0.98 and a PPV of 1.0.

Table 2 Performance of diabetes codes against diabetes groups within five years to NAFLD diagnoses

Error rates within severity sub-cohorts

Among those with the absence of diabetes, 6,789 were true negative cases and 323 were false positive cases, representing a diabetes coding error rate of 4.5% over the 5-year period. Patients with HbA1c values above 7.0% had a total of 31 false negative cases and 1426 true positive cases, representing an error rate of 2.2% in the same period. The diabetic meeting glycemic control group (HbA1c between 6.4 and 7.0%) had a total of 736 false negatives and 1,008 true positives, resulting in a 42.2% coding error rate. Upon further investigations it was discovered that a total of 536 among 736 false negatives had received diabetic medications and met glycemic targets.

Tables 3 and 4 presents a comparison of comorbidities and healthcare utilization among patients who achieved glycemic targets (HbA1c group of between 6.4 and 7.0%). Specific comparisons for this HbA1c subgroup are shown in supplementary materials (Additional files 2 to 7). Notably, the number of emergency GP visits were statistically significant and ambulatory specialist visits approached the p-value threshold of 0.05 (p = 0.07). Slightly different visitation patterns were observed in the HbA1c greater than 7.0% groups. Among those with HbA1c greater than 7.0%, the false negative groups had fewer visits to community GPs (mean 57.5 vs. 70.8), and fewer to community specialists (mean 31.5 vs. 59.6) then true positives cases. Additional File 8 provides a detailed description of the diabetic remissions status, as outlined in the methods section. (Additional File 8).

Table 3 Demographic and comorbidities among patients with true positive and false negative among patients meeting glycemic targets (HbA1c group between 6.4 and 7.0%)
Table 4 Comparisons of number of healthcare providers seen among true positives, and false negatives among patients meeting glycemic targets (HbA1c group between 6.4 and 7.0%)

The remission status on diabetes closest to the NAFLD diagnosis date is reported in Additional File 8 and indicates that most individuals remained in their diabetic categories at the time of NAFLD diagnosis.


In this study, we examined the accuracy of diabetes severity coding in the NAFLD population by linking the NAFLD registry to multiple administrative and EMR databases. The primary aim was to identify predictive factors associated with error within the diabetes cohorts. In this study cohort it was observed that diabetes coding accuracy was not dependent on whether a patient received treatment with community-dispensed medications. The coding error among patients with clear indications of diabetes (HbA1c greater than 7.0%) was 6.4% (31/1,426), whereas among those without diabetes (HbA1c less than 6.1%) the error rate was 4.5% (323/7112) over a five-year period. In contrast, not meeting glycemic control group exhibited a considerably higher coding error rate of 42.2%.

In Canadian health systems, primary care physicians may submit up to three ICD codes as part of physician billing, which are compiled into claims databases [27]. Furthermore, physicians are only required to submit one code representing the commonly completed diagnoses during the patient encounter. Nearly all physician visits related to diabetes during the 5-year period took place in primary care settings, accounting for 99.9% of visits (1.629 million out of 1.630 million visits). However, it is noteworthy that 45.9% of the study cohort experienced at least one inpatient admission. Consequently, approximately half of the cohort had other primary medical conditions that were being managed and diabetes might have been considered as a comorbidity. It is hypothesized that the underreporting of diabetes codes in the glycemic control group may be due to this factor, contributing to the observed coding inaccuracies. The identification of 536 out of 736 false negatives within the cohort meeting glycemic control criteria, who also had documented prescriptions for diabetes medications and maintained a HbA1c control, further supports this observation. Despite linking to impatient EMRs and other administrative data in the 5-year period leading up to the NAFLD diagnosis, no specific details for the rationale for coding were obtained.

The list of ICD codes originally developed for defining diabetes was for identifying comorbidities for calculating the risk of mortality as part of the Charlson algorithm [1] undergone multiple revisions [21, 28]. These refined ICD code-based algorithms are used for syndromic public health surveillance of chronic conditions [3, 4, 29]. Primarily, diabetes codes are employed in prevalence studies to determine the presence of the disease within the population [30,31,32,33]. These prevalence studies play a pivotal role in informing health systems and guiding the planning for control strategies. Therefore, understanding coding errors is essential for evaluating and refining existing health programs and keeping health databases up to date.

It is worth noting that, from our current understanding, diabetes cohorts have not been adequately considered in existing literature when assessing ICD code accuracy. This study indicated that the cohorts at the extremes (i.e., the highest and lowest A1c groups) demonstrated relatively precise ICD coding accuracy. However, the diabetes cohort in the glycemic control group encountered challenges, likely stemming from the structure of the ICD code submission system and, possibly a lack of coding, as diabetes is often presumed to be a well-managed comorbidity. To address these issues, we propose a few solutions to mitigate this for the boundary group. Currently, the DAD allows the submission of up to twenty-five diagnostic codes for acute facility admission encounters, regardless of payment status [34]. Expanding the scope of physician claims beyond three codes may provide a more comprehensive understanding of patient profiles. However, this may not be easy to achieve given the complexities and barriers involved in processes for creating administrative data [35, 36]. Linkage to inpatient EMR confirmed the quality of extracted laboratory data and provided limited clinical context associated with lack of diabetes coding justification in this patient cohort. Connecting existing data systems with primary care EMRs and directly phenotyping diabetes from clinical notes may offer additional clinical context and contribute to enhancing the accuracy of ICD codes collected within administrative databases.

This study has several limitations. First, the claims database uses ICD-9 codes while DAD and NACRS use ICD-10 codes, and coding standards between the two could be different. Second, obtaining a comprehensive clinical context behind coding rationale can be challenging, as detailed data on patients, providers and context may not always be available. Third, our reference standard may not be perfect and there is a possibility that some diabetes cases were not phenotyped properly. Nevertheless, we followed clinical care practice guideline, and our observations align with clinical expectations. Lastly, the clinical and administrative data utilized in this study are specific to one city in a single Canadian province, and thus may not be generalizable to other settings. Additional external validation in diverse contexts is warranted.

Despite these limitations, this study offers a detailed assessment on coding accuracy for diabetes severity groups. Similar analyses could be conducted on other chronic conditions, contributing to the improvement of chronic disease surveillance programs. Furthermore, there is potential to enhance surveillance through ongoing research activities, including the incorporation of patient-reported outcomes and the artificial intelligence. The integration of self-reported diabetes data from patients [37, 38] into existing health system infrastructure, coupled with the development and deployment of self-reported tools via recommender systems [39], can complement the quality of administrative databases and address these limitations.


In summary, ICD codes demonstrate strong performance in identifying individuals without diabetes and those who do not meet glycemic control within the NAFLD population. However, the codes did not perform well for accurately identifying diabetes cases meeting glycemic control. Patients with false negative diabetes-related ICD-codes often exhibit evidence of glycemic control and receiving medications, highlighting the need for a more comprehensive clinical context, which may require additional data linkage from primary care settings. Our study provides insights on accuracy of diabetes coding among NAFLD population, and similar methodologies can be employed on to assess other chronic conditions.

Data availability

The data that support the findings of this study are available from Alberta Health Services and Alberta Health, but restrictions apply to the availability of these data, which were used under license for the current study, and so are not publicly available. Data are however available from the authors upon reasonable request and with permission of Alberta Health Services and Alberta Health.



Anatomical therapeutic classification


Calgary NAFLD pathway database


Discharge abstract database


Drug identification number


Electronic Medical Records


General practitioner


Hemoglobin A1c


International Classification of Diseases


National ambulatory care reporting system


Non-alcoholic fatty liver disease


Non-alcoholic steatohepatitis


Negative predictive value


Primary care providers


Pharmaceutical information network


Positive predictive value


Allscripts sunrise clinical manager


Shear wave elastography


  1. Charlson ME, Pompei P, Ales KL, MacKenzie CR. A new method of classifying prognostic comorbidity in longitudinal studies: development and validation. J Chronic Dis. 1987;40(5):373–83.

    Article  CAS  PubMed  Google Scholar 

  2. Elixhauser A, Steiner C, Harris DR, Coffey RM. Comorbidity measures for use with administrative data. Med Care. 1998;36(1):8–27.

    Article  CAS  PubMed  Google Scholar 

  3. Birtwhistle R, Keshavjee K, Lambert-Lanning A, Godwin M, Greiver M, Manca D, et al. Building a Pan-Canadian primary care sentinel surveillance network: initial development and moving forward. J Am Board Family Med. 2009;22(4):412–22.

    Article  Google Scholar 

  4. Buda S, Tolksdorf K, Schuler E, Kuhlen R, Haas W. Establishing an ICD-10 code based SARI-surveillance in Germany - description of the system and first results from five recent influenza seasons. BMC Public Health. 2017;17(1):612.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  5. LeBlanc AG, Jun Gao Y, McRae L, Pelletier C. At-a-glance - twenty years of diabetes surveillance using the Canadian chronic disease surveillance system. Health promotion and chronic disease prevention in Canada: research. Policy and Practice. 2019;39(11):306–9.

    Google Scholar 

  6. Tomah S, Alkhouri N, Hamdy O. Nonalcoholic fatty liver disease and type 2 diabetes: where do diabetologists stand? Clin Diabetes Endocrinol. 2020;6(1):9.

    Article  PubMed  PubMed Central  Google Scholar 

  7. Targher G, Corey KE, Byrne CD, Roden M. The complex link between NAFLD and type 2 diabetes mellitus — mechanisms and treatments. Nat Reviews Gastroenterol Hepatol. 2021;18(9):599–612.

    Article  Google Scholar 

  8. Fernando DH, Forbes JM, Angus PW, Herath CB. Development and progression of non-alcoholic fatty liver disease: the role of advanced glycation end products. Int J Mol Sci. 2019;20(20):5037.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  9. Calzadilla Bertot L, Adams LA. The natural course of non-alcoholic fatty liver disease. Int J Mol Sci. 2016;17(5).

  10. Shaheen AA, Riazi K, Medellin A, Bhayana D, Kaplan GG, Jiang J, et al. Risk stratification of patients with nonalcoholic fatty liver disease using a case identification pathway in primary care: a cross-sectional study. CMAJ Open. 2020;8(2):E370–E6.

    Article  PubMed  PubMed Central  Google Scholar 

  11. Services AH. Non-Alcoholic Fatty Liver Disease (NAFLD) Primary Care Pathway 2022: Alberta Health Services; 2022 [updated May 22, 2022. Available from:

  12. Lipscombe LL, Hwee J, Webster L, Shah BR, Booth GL, Tu K. Identifying diabetes cases from administrative data: a population-based validation study. BMC Health Serv Res. 2018;18:1–8.

    Article  Google Scholar 

  13. Jiang J, Southern D, Beck CA, James M, Lu M, Quan H. Validity of Canadian discharge abstract data for hypertension and diabetes from 2002 to 2013. Can Med Association Open Access J. 2016;4(4):E646–E53.

    Google Scholar 

  14. Honda Y, Yoneda M, Imajo K, Nakajima A. Elastography techniques for the assessment of liver fibrosis in non-alcoholic fatty liver disease. Int J Mol Sci. 2020;21(11):4039.

    Article  PubMed  PubMed Central  Google Scholar 

  15. Lee S, Li B, Martin EA, D’Souza AG, Jiang J, Doktorchik C, et al. CREATE: a new data resource to support cardiac precision health. CJC open. 2021;3(5):639–45.

    Article  PubMed  Google Scholar 

  16. Ivers NM, Jiang M, Alloo J, Singer A, Ngui D, Casey CG, et al. Diabetes Canada 2018 clinical practice guidelines: key messages for family physicians caring for patients living with type 2 diabetes. Can Fam Physician. 2019;65(1):14–24.

    PubMed  PubMed Central  Google Scholar 

  17. McBrien KA, Naugler C, Ivers N, Weaver RG, Campbell D, Desveaux L, et al. Barriers to care in patients with diabetes and poor glycemic control-a cross-sectional survey. PLoS ONE. 2017;12(5):e0176135.

    Article  PubMed  PubMed Central  Google Scholar 

  18. Sherwani SI, Khan HA, Ekhzaimy A, Masood A, Sakharkar MK. Significance of HbA1c test in diagnosis and prognosis of diabetic patients. Biomark Insights. 2016;11:95–104.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  19. Morris DH, Khunti K, Achana F, Srinivasan B, Gray LJ, Davies MJ, et al. Progression rates from HbA1c 6.0-6.4% and other prediabetes definitions to type 2 diabetes: a meta-analysis. Diabetologia. 2013;56(7):1489–93.

    Article  CAS  PubMed  Google Scholar 

  20. Nunes JPL, DeMarco JP. A 7.0-7.7% value for glycated haemoglobin is better than a < 7% value as an appropriate target for patient-centered drug treatment of type 2 diabetes mellitus. Annals of Translational Medicine. 2019;7(Suppl 3):122.

    Article  Google Scholar 

  21. Quan H, Li B, Couris CM, Fushimi K, Graham P, Hider P, et al. Updating and validating the Charlson comorbidity index and score for risk adjustment in hospital discharge abstracts using data from 6 countries. Am J Epidemiol. 2011;173(6):676–82.

    Article  PubMed  Google Scholar 

  22. Canada Go. Drug Product Database 2023 [Available from:

  23. Hux JE, Ivis F, Flintoft V, Bica A. Diabetes in Ontario: determination of prevalence and incidence using a validated administrative data algorithm. Diabetes Care. 2002;25(3):512–6.

    Article  PubMed  Google Scholar 

  24. Allen AM, Therneau TM, Larson JJ, Coward A, Somers VK, Kamath PS. Nonalcoholic fatty liver disease incidence and impact on metabolic burden and death: a 20 year-community study. Hepatology. 2018;67(5):1726–36.

    Article  CAS  PubMed  Google Scholar 

  25. Kaberi Dasgupta M, Michel Gagner M, James Kim M, Barbara MacDonald R, Natalia McInnes M, Sonja Reichert M et al. Remission of type 2 diabetes Diabetes Canada Clinical Practice Guidelines Expert Working Group.

  26. R Core Team R. R: A language and environment for statistical computing. 2013.

  27. Cunningham CT, Cai P, Topps D, Svenson LW, Jetté N, Quan H. Mining rich health data from Canadian physician claims: features and face validity. BMC Res Notes. 2014;7:682.

    Article  PubMed  PubMed Central  Google Scholar 

  28. Deyo RA, Cherkin DC, Ciol MA. Adapting a clinical comorbidity index for use with ICD-9-CM administrative databases. J Clin Epidemiol. 1992;45(6):613–9.

    Article  CAS  PubMed  Google Scholar 

  29. Jayatilleke A, Kriseman J, Bastin LH, Ajani U, Hicks P. Syndromic surveillance in an ICD-10 world. AMIA Annual Symposium proceedings AMIA Symposium. 2014;2014:1806-14.

  30. Khokhar B, Jette N, Metcalfe A, Cunningham CT, Quan H, Kaplan GG, et al. Systematic review of validated case definitions for diabetes in ICD-9-coded and ICD-10-coded data in adult populations. BMJ Open. 2016;6(8):e009952.

    Article  PubMed  PubMed Central  Google Scholar 

  31. Lin X, Xu Y, Pan X, Xu J, Ding Y, Sun X, et al. Global, regional, and national burden and trend of diabetes in 195 countries and territories: an analysis from 1990 to 2025. Sci Rep. 2020;10(1):14790.

    Article  CAS  PubMed  PubMed Central  ADS  Google Scholar 

  32. Chang CH, Shau WY, Jiang YD, Li HY, Chang TJ, Sheu WH, et al. Type 2 diabetes prevalence and incidence among adults in Taiwan during 1999–2004: a national health insurance data set study. Diabet Medicine: J Br Diabet Association. 2010;27(6):636–43.

    Article  CAS  Google Scholar 

  33. Miller DR, Safford MM, Pogach LM. Who has diabetes? Best estimates of diabetes prevalence in the department of veterans affairs based on computerized patient data. Diabetes Care. 2004;27(Suppl 2):B10–21.

    Article  PubMed  Google Scholar 

  34. Nasr A, Sullivan KJ, Chan EW, Wong CA, Benchimol EI. Validation of algorithms to determine incidence of Hirschsprung disease in Ontario, Canada: a population-based study using health administrative data. Clin Epidemiol. 2017;9:579–90.

    Article  PubMed  PubMed Central  Google Scholar 

  35. Tang KL, Lucyk K, Quan H. Coder perspectives on physician-related barriers to producing high-quality administrative data: a qualitative study. CMAJ Open. 2017;5(3):E617–e22.

    Article  PubMed  PubMed Central  Google Scholar 

  36. Lucyk K, Tang K, Quan H. Barriers to data quality resulting from the process of coding health information to administrative data: a qualitative study. BMC Health Serv Res. 2017;17(1):766.

    Article  PubMed  PubMed Central  Google Scholar 

  37. Mehraeen E, Mehrtak M, Janfaza N, Karimi A, Heydari M, Mirzapour P, et al. Design and development of a mobile-based self-care application for patients with type 2 diabetes. J Diabetes Sci Technol. 2022;16(4):1008–15.

    Article  PubMed  Google Scholar 

  38. Mehraeen E, Noori T, Nazeri Z, Heydari M, Mehranfar A, Moghaddam HR, et al. Identifying features of a mobile-based application for self-care of people living with T2DM. Diabetes Res Clin Pract. 2021;171:108544.

    Article  PubMed  Google Scholar 

  39. Khademzadeh S, Toosi MN, Mehraeen E, Roshanpoor A, Ghazisaeidi M. Common data elements and features of a recommender system for people living with fatty liver disease. J Iran Med Council. 2022.

Download references


Not applicable.


None declared.

Author information

Authors and Affiliations



S. Lee, A. Shaheen, C. Naugler, H. Quan, and J. Lee were involved in conceptualization of the study. A. Shaheen, and D. Campbell reviewed medications. A.Shaheen, J.Lee, D. Campbell, H. Quan, and C. Naugler provided mentorship to S.Lee. J.Jiang and S. Lee conducted data extractions. R. Walker provided expertise surrounding primary care settings. S. Lee conducted analysis. All authors reviewed the manuscript and approved submission to the journal.

Corresponding author

Correspondence to Seungwon Lee.

Ethics declarations

Ethics approval and consent to participate

This study was approved by University of Calgary’s Conjoint Health Ethics Research Board (REB-20-1127). Informed consent was waived by the University of Calgary’s Conjoint Health Research Ethics Board due to the large size of the cohort and as this study was a secondary data analysis. All methods were performed in accordance with the Declaration of Helsinki.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Material 1

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit The Creative Commons Public Domain Dedication waiver ( applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Lee, S., Shaheen, A.A., Campbell, D.J.T. et al. Evaluating the coding accuracy of type 2 diabetes mellitus among patients with non-alcoholic fatty liver disease. BMC Health Serv Res 24, 218 (2024).

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: