Lack of comparability in international health data can lead to unnecessary resource use, increased reporting errors and omissions, and continued reliance on crosswalks and mappings between systems [3]. We conducted an international online questionnaire to better understand the differences in ICD coding practices and hospital data collection systems across countries. Results from 47 participants from 26 countries revealed variation in all aspects of their hospital morbidity data collection systems, most notably disparities in ICD meta-features: the maximum number of coding fields allowed for diagnoses, the definition of main condition, and whether diagnosis timing is a mandatory data field captured in the hospital morbidity database. Ultimately, the results of the current survey might encourage countries to enhance the quality of their hospital morbidity databases and administrative health data. In particular, these findings offer insights into the potential to achieve greater comparability with adoption of the new ICD-11 for Mortality and Morbidity Statistics (ICD-11 MMS).
To our knowledge, this is the first survey to inquire about the number of coding fields for both diagnoses and hospital interventions across countries. Consistent with our results, a similar prior survey explored ICD use, although it reported on only 9 countries and focused exclusively on diagnosis coding fields [15]. The effects of limiting the allowable number of diagnosis fields in hospital administrative data have been investigated previously in the literature [15,16,17,18,19,20]. These studies reached the same conclusion: prevalence estimates and coding accuracy decrease (undercoding) when the number of coding fields is substantially reduced (e.g., truncating original abstracts from 25 to 5 diagnosis fields). When countries allow different numbers of diagnosis coding fields, the clinical complexity of a hospitalized patient is not adequately reflected in administrative health data, which may lead to undercoding of health conditions. The current study confirmed that these differences persist, affecting the quality and comparability of the data.
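The truncation effect described above can be illustrated with a minimal sketch. The records and the condition code tracked here are hypothetical examples, not data from the study or the cited literature:

```python
# Hypothetical discharge abstracts: each record is an ordered list of ICD-10
# diagnosis codes. "E11" (type 2 diabetes) is the condition being counted.
records = [
    ["I21", "E11", "N18", "I10", "J44", "E78", "Z95"],  # E11 in field 2
    ["J18", "I10", "E78", "F03", "N39", "E11"],         # E11 in field 6
    ["S72", "I10", "Z87"],                              # E11 absent
]

def prevalence(code, abstracts, max_fields):
    """Count records containing `code` within the first `max_fields` fields."""
    return sum(code in rec[:max_fields] for rec in abstracts)

full = prevalence("E11", records, max_fields=25)      # all fields retained
truncated = prevalence("E11", records, max_fields=5)  # abstract cut to 5 fields

print(full, truncated)  # 2 1: truncation drops the case coded in field 6
```

When the abstract is truncated to 5 fields, the case whose diabetes code sits in field 6 disappears from the count, which is exactly the mechanism by which a lower field limit deflates prevalence estimates.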
Furthermore, it was previously identified that the definition used for the “main condition” or principal diagnosis differs internationally [21], which was also reflected in the results of the current study. However, over the years, the main condition definition used by each country has changed. For example, a country that reported using “reason for admission” in 2014 may have switched to “main resource use” in our survey, the definition recommended by the WHO for ICD-10. The lack of standardization in data collection systems affects the validity and usability of ICD-coded data within and across countries. Data with low validity can affect case selection and the inferences drawn from coded data, resulting in selection bias. More specifically, low validity could lead to issues such as: i) underestimating disease burden, ii) inaccurate adjustment for severity of illness when assessing quality and safety, and iii) incorrect calculation of estimated costs generated from disease-grouping methodologies for hospital payment [21]. Therefore, harmonizing and establishing consistent use of ICD meta-features for morbidity datasets should be a common goal to reduce variation in analyses.
To improve morbidity information in ICD-coded hospital data, the WHO Topic Advisory Group on Quality and Safety (QS-TAG) recommends expanding the number of coding fields to at least 15–20 [15] and adopting the diagnosis timing flag [22]. The diagnosis timing flag indicates whether a condition developed in hospital or was present prior to admission, and has been incorporated into ICD-11 MMS. Combining these QS-TAG recommendations with enhanced education of healthcare providers and coders has also been suggested for optimal capture of clinical information [19, 22, 23]. However, the number of data fields and conditions captured must be balanced against the feasibility of data collection, as coders and coding managers have reported weighing data quality and quantity against the timelines and quotas set by their jurisdictions [24, 25].
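A small sketch can show why a diagnosis timing flag matters for analysis. The field names, flag values, and abstract below are hypothetical illustrations, not a real national data standard:

```python
# Hypothetical discharge abstract in which each diagnosis carries a timing
# flag distinguishing pre-existing conditions from those arising in hospital.
abstract = [
    {"code": "I21", "timing": "present_on_admission"},  # reason for admission
    {"code": "N17", "timing": "arose_during_stay"},     # hospital-acquired
    {"code": "E11", "timing": "present_on_admission"},  # comorbidity
]

# Without the flag, all three codes are indistinguishable; with it, analysts
# can isolate hospital-acquired conditions for quality and safety measurement.
hospital_acquired = [d["code"] for d in abstract if d["timing"] == "arose_during_stay"]

print(hospital_acquired)  # ['N17']
```

Without such a flag, a condition like the acute kidney injury above would be indistinguishable from a pre-existing comorbidity, confounding measures of in-hospital adverse events.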
Beyond the variations in ICD coding and data collection across countries, other factors prevent adequate comparative analysis of international hospital data [15]. These include documentation quality within medical charts, hospital payment mechanisms (financial incentives), coding guidelines, and ICD versions and modifications [3, 19, 26, 27]. For example, the quality of the output (ICD codes) depends on the quality of the input (information documented in the medical chart): missing data in the medical chart decreases the observed prevalence of certain diseases and results in undercoding [26]. In parallel, reimbursement in some hospitals is based on diagnosis-related groups, and the resulting financial incentives might lead physicians to choose one diagnosis over another, producing over- or undercoding of that condition [27]. By standardizing the features used for coding, countries can reduce within- and across-country differences in data collection processes and ultimately increase the accuracy of disease burden estimates.
Further adding to the complexity of international coded data comparability are the modifications that various countries have adopted to meet their specific healthcare needs and contexts. For ICD-10, these countries include Australia (ICD-10-AM), Canada (ICD-10-CA), Germany (ICD-10-GM), and Korea (ICD-10-KM). Modified versions include an increased number of codes and code-specific changes at more granular levels (4th-, 5th-, and 6th-digit levels). Certain categories of codes exist in one country but not in another, limiting the ability to compare certain clinical contexts. For example, ICD-10-GM has a chapter specific to behavioral and mental disorders (Chapter 5), whereas ICD-10-CA does not, limiting the comparability of these disorders between the two countries [28, 29]. Issues with comparability will endure when countries do not adopt ICD systems simultaneously, or adopt only portions of them [3]. For example, in the current study we found that some countries adopted ICD-10 for morbidity coding but continued to use ICD-9 for severity coding and billing purposes. Finally, adopting ICD is infeasible for some developing countries, as it requires complex coding processes (e.g., advanced electronic infrastructure). To accommodate hardware or software limitations, developing countries have created simplified versions of ICD, and with multiple versions in use [5], international comparability is compromised.
The current study has limitations. The response rate was suboptimal (26 of the 117 countries currently using ICD), an inherent disadvantage of online surveys. To mitigate this, snowball sampling was used to reach as broad an audience as possible, followed by reminder emails. There is also the potential for selection bias, particularly non-response bias, whereby countries that did not respond are under-represented. As such, it is possible that the full extent of ICD and hospital morbidity data features was not captured for many countries. However, we observed that responses reached a level of saturation, and our conclusion that variability in data collection features exists would remain unchanged. Despite the small number of responses, survey participants represented every continent, providing a broader understanding of ICD and hospital morbidity data collection features worldwide. Furthermore, the survey results could incentivize countries to participate in future research so that they can understand how their data compare with those of other countries on a global scale. Another difficulty emerged during data analysis when grouping responses geographically: sometimes there was more than one participant from the same country and their replies were discordant. When this happened, all answers were included and disclosed in the results for transparency. Conflicting information within countries may be explained by the lack of standards at both national and international levels. Further, the current questionnaire did not ask how the hospital morbidity database was used in the participants’ country (e.g., for research, health system administration), so we do not know the extent to which usability was affected by data quality and the differences in data collection features.
Lastly, the questions posed in the survey were intentionally general rather than granular, given the known variation across systems. General questions may have sacrificed some specificity in the characterization of data features in order to offer a universally interpretable survey. We also offered open-ended questions so that countries could provide details about their unique data collection characteristics.