Identification of data elements for blood gas analysis dataset: a base for developing registries and artificial intelligence-based systems

Background One of the challenging decision-making tasks in healthcare centers is the interpretation of blood gas tests. Artificial intelligence (AI)-based decision support systems can be among the most effective approaches for assisting the interpretation of blood gas analysis (BGA). A primary step in developing intelligent systems is to determine information requirements and enable automated data input for secondary analyses. Datasets can facilitate automated data input from dispersed information systems. Therefore, the current study aimed to identify the data elements required for supporting BGA as a dataset. Materials and methods This cross-sectional descriptive study was conducted in Nemazee Hospital, Shiraz, Iran. A combination of literature review, experts’ consensus, and the Delphi technique was used to develop the dataset. A literature review was performed on electronic databases to identify existing datasets and data elements for BGA. An expert panel was formed to discuss, add, or remove the data elements extracted from the literature. The Delphi technique was used to reach consensus and validate the draft dataset. Results The data elements of the BGA dataset were grouped into ten categories, namely personal information, admission details, present illnesses, past medical history, social status, physical examination, paraclinical investigation, blood gas parameter, sequential organ failure assessment (SOFA) score, and sampling technique errors. Overall, 313 data elements, including 172 mandatory and 141 optional data elements, were confirmed by the experts for inclusion in the dataset. Conclusions We proposed a dataset as a base for registries and AI-based systems to assist BGA. It supports the storage of accurate and comprehensive data, as well as their integration with other information systems. As a result, it can contribute to high-quality care and improved clinical decision-making.


*Correspondence: zandf@sums.ac.ir

Background
Artificial intelligence (AI) has revolutionized the health care industry. AI technologies allow data analysts to transform raw data generated in healthcare facilities into meaningful insights for effective decision-making [1]. However, the large amount of data generated daily in health facilities makes decision-making difficult. Clinical decision support systems are a subset of AI designed to facilitate decision-making in healthcare facilities using large amounts of data, medical knowledge, and analysis engines. These systems make patient-specific assessments or recommendations for healthcare providers [2].
One of the challenging decision-making tasks in healthcare centers is the interpretation of blood gas tests. Arterial and venous blood gas tests are among the high-cost and frequently ordered tests in intensive care units (ICUs). These tests reveal the respiratory and metabolic status of patients, as well as their acid-base balance [3,4]. Acid-base imbalance can cause negative outcomes, such as damage to the kidneys, cardiovascular system, and nervous system; if severe, it can be a risk factor for death [5]. Consequently, rapid diagnosis of blood gas disorders and acid-base imbalance can prevent severe complications. For these tests to be effective diagnostic tools, physicians need to be proficient in interpreting blood gas analysis (BGA). However, unlike tests whose results are read simply as above or below a normal range, BGA comprises more than six interrelated parameters, which makes it complicated and difficult to interpret [6].
To simplify the interpretation of BGA, AI-based decision support systems can be highly useful [7]. These systems assist healthcare providers by transforming raw health data, documents, and expert practice into sophisticated algorithms or techniques, such as machine learning or knowledge graphs. As a result, healthcare decision-makers can find appropriate solutions to the underlying medical problems [8]. AI-based decision support systems can support BGA according to their knowledge base and predefined algorithms.
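To make the idea of a predefined algorithm concrete, the sketch below shows a deliberately minimal rule-based classifier for a simple BGA. The reference ranges (pH 7.35-7.45, PaCO2 35-45 mmHg, HCO3 22-26 mEq/L) are common textbook values assumed for illustration. The function name and the order of the rules are hypothetical; this is neither the system described in this study nor clinical guidance.

```python
# Assumed textbook reference ranges (illustrative only, not clinical guidance).
PH_LOW, PH_HIGH = 7.35, 7.45
PACO2_LOW, PACO2_HIGH = 35.0, 45.0  # mmHg
HCO3_LOW, HCO3_HIGH = 22.0, 26.0    # mEq/L


def primary_disorder(ph: float, paco2: float, hco3: float) -> str:
    """Return a coarse label for the primary acid-base disorder.

    The respiratory explanation is checked before the metabolic one, so
    mixed disorders are not distinguished and compensation is not assessed.
    """
    if ph < PH_LOW:  # acidemia
        if paco2 > PACO2_HIGH:
            return "respiratory acidosis"
        if hco3 < HCO3_LOW:
            return "metabolic acidosis"
        return "acidemia, pattern unclear"
    if ph > PH_HIGH:  # alkalemia
        if paco2 < PACO2_LOW:
            return "respiratory alkalosis"
        if hco3 > HCO3_HIGH:
            return "metabolic alkalosis"
        return "alkalemia, pattern unclear"
    return "pH within reference range"


if __name__ == "__main__":
    print(primary_disorder(7.25, 60.0, 26.0))  # respiratory acidosis
```

A real system would also need to assess compensation, the anion gap, and the clinical context. This is why a richer set of data elements is required beyond the three values used here.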
An initial step in developing intelligent systems is to determine information requirements and enable automated data input for secondary analyses [9]. Jamieson et al. [10] found that electronic documentation improves the quality of documentation. Automatic data input requires interoperability of data among information systems, and datasets can help automate data input from dispersed information systems [11,12]. A dataset is a comprehensive list of data elements on a specific clinical condition [13], procedure [14], specialty [15], healthcare process [16], or an entire domain with broad scope [17].
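The role of a dataset in automated input from dispersed systems can be sketched as follows. This minimal illustration assumes that each source system (here a hypothetical "ehr" and "lab") exports records keyed by medical record number. The dataset acts as a whitelist of element names, and the first system to supply a value wins, while conflicting values are reported rather than silently overwritten. The function and element names are hypothetical.

```python
from typing import Any

Record = dict[str, Any]


def integrate_sources(
    sources: dict[str, dict[str, Record]],
    dataset_elements: list[str],
) -> tuple[dict[str, Record], list[str]]:
    """Merge per-system exports into one record per patient.

    `sources` maps a system name to {medical_record_number: {element: value}}.
    Elements not listed in the dataset are dropped. Systems are processed in
    dict order; the first value seen for an element is kept and any later,
    different value is reported as a conflict.
    """
    wanted = set(dataset_elements)
    merged: dict[str, Record] = {}
    conflicts: list[str] = []
    for system, records in sources.items():
        for mrn, record in records.items():
            target = merged.setdefault(mrn, {})
            for element, value in record.items():
                if element not in wanted:
                    continue  # not part of the dataset
                if element in target and target[element] != value:
                    conflicts.append(
                        f"{mrn}.{element}: {target[element]!r} vs {value!r} ({system})"
                    )
                    continue  # keep the earlier value
                target[element] = value
    return merged, conflicts


if __name__ == "__main__":
    sources = {
        "ehr": {"A1": {"age": 54, "gender": "F", "ward": "ICU-2"}},
        "lab": {"A1": {"ph": 7.31, "paco2": 52.0, "age": 55}},
    }
    merged, conflicts = integrate_sources(sources, ["age", "gender", "ph", "paco2"])
    print(merged)
    print(conflicts)
```

The "first source wins" policy is only one possible choice. A production integration layer would more likely rank sources by reliability or timestamp.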
Datasets may include historical data that can help interpret an impression, a diagnosis, or a treatment when planning future follow-up strategies [9]. To develop a robust AI-based system, one should ensure seamless and comprehensive access to the related information, ideally through an integrated data view comprising electronic health records, computerized physician order entry, laboratory systems, and other related applications. Such an arrangement provides a comprehensive, centralized data repository that can support various clinical decision support systems, as well as machine learning, data mining, and deep learning. Moreover, data quality strongly affects the standards and outcomes of the resulting decision support system [18]. Data quality can be enhanced by proper structuring following a data standardization approach [19]. Datasets have been used in previous research on AI-based technologies, including machine learning, deep learning, and data mining. For instance, Muhammad et al. applied machine learning models to predict coronavirus disease 2019 (COVID-19) using an epidemiology dataset [20]. Hussain et al. also applied data mining algorithms to an accident dataset to determine the causes of accidents and accident-prone locations [21].
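One way to express structuring that follows a data standardization approach is to give each data element a type, a unit, a mandatory/optional flag, and a plausibility range, and then to check incoming records against that definition. The sketch below assumes this representation. The element names, units, and ranges are hypothetical illustrations, not values taken from the proposed dataset.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataElement:
    """One entry of a dataset: a named, typed, optionally bounded field."""
    name: str
    category: str
    dtype: type
    mandatory: bool
    unit: Optional[str] = None
    valid_range: Optional[tuple[float, float]] = None


# A tiny, hypothetical excerpt of a BGA dataset (ranges are illustrative).
DATASET = [
    DataElement("medical_record_number", "Personal information", str, True),
    DataElement("age", "Personal information", int, True, "years", (18, 120)),
    DataElement("ph", "Blood gas parameter", float, True, None, (6.5, 8.0)),
    DataElement("paco2", "Blood gas parameter", float, True, "mmHg", (5, 200)),
    DataElement("lactate", "Paraclinical investigation", float, False, "mmol/L", (0, 30)),
]


def validate_record(record: dict, dataset: list[DataElement]) -> list[str]:
    """Return human-readable problems found in one patient record."""
    problems = []
    for element in dataset:
        value = record.get(element.name)
        if value is None:
            if element.mandatory:  # optional elements may be absent
                problems.append(f"missing mandatory element: {element.name}")
            continue
        if not isinstance(value, element.dtype):
            problems.append(f"wrong type for {element.name}")
            continue
        if element.valid_range is not None:
            low, high = element.valid_range
            if not low <= value <= high:
                problems.append(f"{element.name} out of range: {value}")
    return problems


if __name__ == "__main__":
    print(validate_record({"age": 54, "ph": 9.1, "paco2": 52.0}, DATASET))
```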
Langarizadeh and Gholinezhad [22] emphasized the role of defining datasets for laboratory reports, covering demographic, administrative, clinical, insurance, anesthesia, laboratory, observation, and interpretation data, for exchange with information systems. Blood gas testing likewise needs a dataset as a base for developing AI-based systems. To our knowledge, no dataset has been developed for BGA. Therefore, the present study aimed to identify the data elements required for supporting BGA as a dataset.

Study design and setting
This cross-sectional descriptive study was conducted in 2020-2021. Experts from two hospitals affiliated with Shiraz University of Medical Sciences, namely Nemazee and Rajaee hospitals, as well as experts from Kashan University of Medical Sciences, participated in this study. The study was conducted at Nemazee Hospital, which, with 925 active beds, is the largest educational and treatment center in Shiraz and the only referral hospital in southern Iran. This hospital is also a pioneer in developing information systems, especially for ICUs [23,24].

Data elements identification
A combination of literature review, experts' consensus, and the Delphi technique was used to identify the data elements.

Stage one: literature review
To determine the data elements for the BGA dataset, a literature review was first performed on the Cochrane Library, PubMed, and Scopus electronic databases. Terms related to datasets or registries ("dataset" OR "common data" OR "element" OR "MDS" OR "algorithms" OR "Guideline" OR "Clinical Protocols" OR "registries" OR "information system" OR "electronic health record" OR "database") were combined using AND with terms related to blood gas ("Blood Gas Analysis" OR "arterial blood gas" OR "venous blood gas" OR "ABG" OR "VBG") and searched in titles and abstracts. In addition, a manual search was performed of related textbooks, patients' records, and the following websites: "American Thoracic Society", "American Association for Critical Care Nurses", "Respirology", "European Respiratory Society", "British Association for Critical Care Nurses", "Emergency Medicine Journal", and "Thorax".
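The search strategy follows a common pattern: terms within a concept group are joined with OR, and the groups are joined with AND. The hypothetical helper below assembles such a string from the two term lists in the search description. Database-specific field tags (for example, PubMed's [tiab] for title/abstract) are deliberately left out, because each database uses its own syntax.

```python
DATASET_TERMS = [
    "dataset", "common data", "element", "MDS", "algorithms", "Guideline",
    "Clinical Protocols", "registries", "information system",
    "electronic health record", "database",
]
BLOOD_GAS_TERMS = [
    "Blood Gas Analysis", "arterial blood gas", "venous blood gas", "ABG", "VBG",
]


def or_group(terms: list[str]) -> str:
    """Quote each term and join the terms with OR inside one pair of parentheses."""
    return "(" + " OR ".join(f'"{term}"' for term in terms) + ")"


def build_query(*groups: list[str]) -> str:
    """Join several OR-groups with AND."""
    return " AND ".join(or_group(group) for group in groups)


if __name__ == "__main__":
    print(build_query(DATASET_TERMS, BLOOD_GAS_TERMS))
```

Generating the string from lists keeps the parentheses balanced and makes it easy to rerun the same strategy across databases after adding each database's field tags.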

Inclusion and exclusion criteria
Papers were included if they reported indications or considerations for ordering BGA or factors influencing BGA, or if they presented a protocol, algorithm, rules, or explanation on how to analyze blood gas results. Moreover, existing datasets or registries capturing data related to blood gas disorders were investigated [25][26][27]. Any reports, guidelines, and forms available on the searched websites were also included. Studies were included without a time limit if they were published in English and contained the determined keywords in their title or abstract. Single case reports and studies on neonates, children, or animals were excluded.
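The mechanical part of these eligibility criteria can be written as a screening filter. This sketch rests on stated assumptions: the `Candidate` record, its field names, and the labels used for study type and population are hypothetical. The substantive relevance judgement (whether a paper actually reports indications, influential factors, or interpretation rules) remains a human decision and is not modeled.

```python
from dataclasses import dataclass

# Hypothetical labels for the excluded populations and study type.
EXCLUDED_POPULATIONS = {"neonates", "children", "animals"}
EXCLUDED_STUDY_TYPES = {"single case report"}


@dataclass
class Candidate:
    title: str
    abstract: str
    language: str
    study_type: str
    population: str


def passes_screening(paper: Candidate, keywords: list[str]) -> bool:
    """Apply the mechanical inclusion/exclusion criteria to one record."""
    if paper.language.lower() != "english":
        return False
    if paper.study_type.lower() in EXCLUDED_STUDY_TYPES:
        return False
    if paper.population.lower() in EXCLUDED_POPULATIONS:
        return False
    # Keywords must appear in the title or abstract; no time limit is applied.
    text = f"{paper.title} {paper.abstract}".lower()
    return any(keyword.lower() in text for keyword in keywords)
```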

Stage two: experts' consensus
A team of four experts, including a critical care specialist, a general practitioner with sufficient knowledge of blood gases, and two health information management specialists, was formed as an expert panel. The list of data elements extracted through the literature search was presented to the expert panel. Several sessions were held to tailor the initial draft of the dataset to the specific needs and practices of the ICUs by incorporating the opinions of medical specialists. Experts were invited to discuss, add, or remove the data elements presented in the draft dataset. Criteria that could, on rational grounds, influence blood gas values and that physicians are likely to consider when interpreting the test results or deciding on actions received higher scores, whereas criteria that do not affect blood gas received lower scores.
Eleven expert panel sessions were held to finalize the dataset, starting on 10 November 2020 and ending on 2 May 2021. Some sessions were held in the office of the central ICU in Nemazee Hospital and others in the Anesthesiology and Critical Care Research Center affiliated with Shiraz University of Medical Sciences. Diseases in the draft dataset were categorized based on the eleventh revision of the International Statistical Classification of Diseases and Related Health Problems (ICD-11). After the initial draft was finalized in the expert panel sessions, the dataset was presented as a checklist whose content validity was confirmed by four experts: two other critical care specialists, one internal medicine specialist, and one health information management specialist. They assessed the criteria in terms of clarity, contribution to BGA, and interpretability.

Stage three: Delphi technique
The Delphi technique was used to reach consensus and validate the draft dataset. Researchers use the Delphi technique when the available knowledge, information, datasets, or studies are incomplete or subject to uncertainty; a group opinion or decision is then reached through interaction between the researchers and a group of identified experts [28]. Another group of experts, including two anesthesiologists, two critical care specialists, two nephrologists, and two neurosurgeons, was invited to review the draft dataset. The researcher presented the questionnaire to the experts and gave a brief face-to-face explanation of the study and the dataset design. The experts were asked to answer each item with "Yes" (specifying mandatory or optional) or "No". The choice between mandatory and optional was based on the impact of the data element on BGA or its potential to complicate the results, as well as its prevalence or frequency of use (for diseases, medications, or toxins). "Mandatory" data elements are those required when the user expects the AI-based decision support system to present a simple BGA, whereas "optional" data elements are those needed when the user expects an advanced, comprehensive BGA. Previous studies mostly focused on simple BGA [6,29,30]; in the current study, we created the "mandatory" and "optional" divisions to determine the data elements required for simple and advanced BGA, respectively. A blank row was provided at the end of each section for experts to leave comments or add necessary data elements. If 75% or more of the experts selected "Yes" (either mandatory or optional), the data element was included in the dataset. If at least 50% of the experts selected "No", the data element was removed. If the level of agreement was between 50% and 75%, the data element needed revision.
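The Delphi decision rule can be expressed as a small function. This sketch assumes that each expert casts exactly one vote per data element ("mandatory", "optional", or "no"), that the "No" threshold means at least 50%, and that the thresholds are checked in the order stated in the text. The text does not specify how mandatory versus optional was settled among the "Yes" votes, so that step is not modeled. The function name is hypothetical.

```python
VALID_VOTES = {"mandatory", "optional", "no"}


def delphi_decision(votes: list[str]) -> str:
    """Classify one data element from its experts' votes.

    Returns "include" when >= 75% voted Yes (mandatory or optional),
    "remove" when >= 50% voted No, and "revise" otherwise.
    """
    if not votes:
        raise ValueError("at least one vote is required")
    unknown = set(votes) - VALID_VOTES
    if unknown:
        raise ValueError(f"unexpected vote values: {sorted(unknown)}")
    yes_share = sum(vote != "no" for vote in votes) / len(votes)
    no_share = votes.count("no") / len(votes)
    if yes_share >= 0.75:
        return "include"
    if no_share >= 0.5:
        return "remove"
    return "revise"


if __name__ == "__main__":
    print(delphi_decision(["optional"] * 5 + ["no"] * 3))  # revise
```

Under these assumptions, with eight Delphi experts, six or more Yes votes include an element, four or more No votes remove it, and exactly five Yes votes send it to revision.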
Six anesthesiologists and critical care attending physicians participated in another expert panel to discuss and decide on the inclusion or exclusion of the data elements with 50%-75% agreement. The reliability of the dataset was evaluated using the split-half method (Guttman split-half coefficient = 0.83).
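The Guttman split-half coefficient compares the variances of two test halves with the variance of the total score: coefficient = 2 x [1 - (var(A) + var(B)) / var(X)], where A and B are the half-scores and X = A + B. The sketch below assumes an odd/even split of items and a respondents-by-items score matrix. The study does not state how its items were split, so the split and the function name are assumptions, and the example data are synthetic.

```python
from statistics import pvariance


def guttman_split_half(scores: list[list[float]]) -> float:
    """Guttman split-half coefficient for an odd/even item split.

    scores[r][i] is the score respondent r gave to item i.
    """
    half_a = [sum(row[0::2]) for row in scores]  # items 1, 3, 5, ...
    half_b = [sum(row[1::2]) for row in scores]  # items 2, 4, 6, ...
    total = [a + b for a, b in zip(half_a, half_b)]
    var_total = pvariance(total)
    if var_total == 0:
        raise ValueError("total scores have zero variance")
    return 2 * (1 - (pvariance(half_a) + pvariance(half_b)) / var_total)


if __name__ == "__main__":
    synthetic = [[1, 1, 2, 2], [2, 2, 3, 3], [3, 3, 3, 2], [1, 2, 1, 1]]
    print(round(guttman_split_half(synthetic), 3))
```

Perfectly parallel halves give a coefficient of 1, and uncorrelated halves give 0. Values such as the reported 0.83 indicate that the two halves of the instrument largely agree.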

Results
As shown in Fig. 1, 385 data elements were extracted in the literature review step. After the expert panel sessions, 43 data elements were deemed unnecessary and excluded. The Delphi technique resulted in the exclusion of a further 18 data elements, and 21 data elements obtained a consensus rate of 50%-75% and needed revision. An expert panel was held to discuss these 21 data elements, of which 11 were excluded, resulting in 313 data elements. Table 1 shows the level of expert agreement at each stage of the Delphi method. The BGA dataset was categorized into ten categories: 1) Personal information, 2) Admission details, 3) Present illnesses, 4) Past medical history, 5) Social status, 6) Physical examination, 7) Paraclinical investigation, 8) Blood gas parameter, 9) Sequential organ failure assessment (SOFA) score, and 10) Sampling technique errors (ABG Error). Overall, 313 data elements, including 172 mandatory and 141 optional data elements, were confirmed by the experts for inclusion in the dataset (Table 2).
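The reported counts can be checked arithmetically. The 21 elements sent for revision are part of the 324 remaining after the panel and Delphi exclusions, so only the 11 excluded at revision need subtracting. The snippet simply re-derives the final total from the reported numbers; the function name is illustrative.

```python
def final_element_count(extracted: int, removed_by_panel: int,
                        removed_by_delphi: int, removed_after_revision: int) -> int:
    """Elements left after each exclusion step (revised items stay in the pool)."""
    return extracted - removed_by_panel - removed_by_delphi - removed_after_revision


after_delphi = 385 - 43 - 18           # 324, still including the 21 under revision
kept_after_revision = 21 - 11          # 10 of the revised elements were retained
total = final_element_count(385, 43, 18, 11)
mandatory, optional = 172, 141
print(total, mandatory + optional)  # 313 313
```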
Essential data elements of "personal information" included medical record number, national code, first and last name, father's name, age, gender, birth date, estimated height, and estimated weight. "Admission details" included date/time of admission to the hospital/ICU, admission type, surgical admission, insurance coverage, primary diagnosis, ICU diagnosis, and ICU intervention.
"Present illnesses" were defined as diseases that influence BGA and affected the patient during the week before hospital admission. "Present illnesses" and "Past medical history" both included the subcategories of respiratory disease, renal disease, gastrointestinal/liver disease, endocrine disease, cardiovascular disease, hematologic disease, and neurologic disease; however, these subcategories did not contain the same data elements in the two categories. In addition, "Present illnesses" included infectious disease, trauma, drugs, and toxins as further subcategories that can affect BGA. The other subcategories of "Past medical history" were genetic/congenital disorders, rheumatologic/musculoskeletal diseases, and malignancy.
"Social status" data elements that affect BGA included opioid dependency, chronic alcohol consumption, sedative dependency, and tobacco chewing. The subcategories of "Physical examination" included vital signs, Glasgow Coma Scale (GCS), respiratory status, sedation status (RAS score), numeric pain scale, behavioral pain score, diaphoresis, shivering, cyanosis (if SpO2 is unavailable or suspicious), urine output, nasogastric drainage, edematous states, and poor tissue perfusion (regional hypoperfusion). The "Paraclinical investigation" category comprised all investigations that can help analyze blood gas, including but not limited to hemoglobin, potassium, blood urea nitrogen, creatinine, chloride, glucose, lactate, and the anion and osmolar gaps, as well as related measurements. The complete proposed dataset for BGA is presented in Table 3.

Discussion
In the present study, 313 data elements were approved by the experts to be contained in the dataset, including 172 mandatory and 141 optional data elements. These data elements were categorized into ten main categories, namely "Personal information", "Admission details", "Present illnesses", "Past medical history", "Social status", "Physical examination", "Paraclinical investigation", "Blood gas parameters", "SOFA score", and "Sampling technique errors (ABG Error)".
Despite the wide adoption of AI-based applications, such as machine learning, in ICUs, to our knowledge this is the first dataset of the data elements required for comprehensive BGA. According to the systematic reviews by Syed et al. and Shillan et al. [31,32], machine learning has been widely applied to predicting ICU mortality, readmission, acute kidney injury, and sepsis. Although AI-based techniques have turned from "a future possibility" into an "everyday reality" for managing patients in ICUs, challenges remain in the use of these systems [33].
Due to the lack of interoperability of electronic systems which results in a lack of data integration, the potential of hospital data for solving healthcare problems is yet to be fully realized. Developing AI-based systems requires large datasets for modeling complex and non-linear effects or developing evidence-based algorithms [34,35].
To address this issue in intensive care, Johnson et al. [25] released the Medical Information Mart for Intensive Care (MIMIC-III) dataset, which allows researchers to tackle complex healthcare problems by developing electronic systems [31]. For instance, by extracting relevant features from the MIMIC-III dataset, Yang et al. [36] proposed an algorithm based on noninvasive physiological parameters to calculate the ratio of the partial pressure of oxygen to the fraction of inspired oxygen (PaO2/FiO2) for identifying patients with acute respiratory distress syndrome. However, unlike our proposed dataset, MIMIC-III does not contain all the specific data required for BGA. Our proposed dataset has the potential to serve as a base for developing such databases.
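For comparison with the noninvasive approach, the conventional PaO2/FiO2 ratio can be computed directly from an arterial blood gas. The sketch also maps the ratio onto the oxygenation bands of the Berlin definition of ARDS (mild: >200 to 300; moderate: >100 to 200; severe: 100 or less, in mmHg). The Berlin definition additionally requires PEEP or CPAP of at least 5 cmH2O, as well as timing, imaging, and origin criteria that are not modeled here. This is not the algorithm of Yang et al., and the function names are hypothetical.

```python
def pf_ratio(pao2_mmhg: float, fio2: float) -> float:
    """PaO2/FiO2 ratio in mmHg; FiO2 may be a fraction (0.21-1.0) or a percentage."""
    if fio2 > 1.0:
        fio2 = fio2 / 100.0  # treat values above 1 as percentages, e.g. 40 -> 0.40
    if not 0.21 <= fio2 <= 1.0:
        raise ValueError(f"implausible FiO2: {fio2}")
    return pao2_mmhg / fio2


def berlin_oxygenation_band(ratio: float) -> str:
    """Oxygenation band only; PEEP, timing, imaging and origin are not checked."""
    if ratio <= 100:
        return "severe"
    if ratio <= 200:
        return "moderate"
    if ratio <= 300:
        return "mild"
    return "above ARDS oxygenation threshold"


if __name__ == "__main__":
    ratio = pf_ratio(90, 0.5)
    print(ratio, berlin_oxygenation_band(ratio))  # 180.0 moderate
```

Accepting FiO2 as either a fraction or a percentage is a convenience assumption. A dataset would normally fix one unit for the FiO2 data element to avoid this ambiguity.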
Some of the data elements obtained in our study are similar to those of previous investigations. The Australian and New Zealand Intensive Care Society (ANZICS) has built one of the largest single datasets for adult ICU patients [26]. It contains a "blood gases" section that collects the date and time of the blood gas test, FiO2, PaO2, the partial pressure of carbon dioxide (PaCO2), pH, and whether the patient was intubated. However, it lacks many of the data elements required for automatic BGA. In addition to these essential data elements, our dataset contains diseases, drugs, toxins, and other paraclinical investigations that might affect blood gas interpretation. As further validation, we recommend that future studies evaluate AI methods, such as machine learning, using the proposed dataset.
Table 2. The categories and subcategories of the proposed dataset
Table 3. The complete proposed dataset for blood gas analysis

One concern about the proposed dataset is the high number of data elements required for automatic BGA. Many of these data elements can be uploaded from existing electronic systems. For instance, a dataset has been developed for collecting progress note data in Nemazee Hospital [37]; it supported the electronic documentation of progress notes in the ICU and can therefore be used to feed AI-based decision support systems designed for BGA. Another solution is a parent-child format for the dataset. For example, the main category "Past medical history" is a parent with eleven children. The AI-based decision support system asks the user to answer each parent with "YES" or "NO". If "NO" is selected, none of the children are shown, and the system moves on to the next parent, for example, "Social status". This approach avoids a cluttered user interface with complex menus and extensive scrolling to fill out the required data elements, which would not suit the fast pace of ICUs. Reviewing the trend of "monitoring" and "data acquisition" systems in ICUs, Georgia et al. [38] found that acquiring, synchronizing, integrating, and analyzing patient data is difficult because of insufficient computational power, a lack of specialized software, incompatibility between monitoring equipment, and limited data storage. The development and application of datasets in practice can help overcome these technical challenges. Moreover, dividing the data elements into "mandatory" and "optional" reduces the number of elements to be completed and saves time: if the user selects a simple analysis, only the 172 mandatory data elements, rather than all 313, need to be filled in.
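The parent-child gating and the mandatory/optional split can be sketched together. This minimal illustration assumes a two-level hierarchy: a child element is displayed only if its parent was answered YES, and in "simple" mode only mandatory children are displayed. The category and element names come from the Results, but their mandatory flags and the function and class names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Parent:
    """A YES/NO parent question with (element, is_mandatory) children."""
    name: str
    children: list[tuple[str, bool]] = field(default_factory=list)


def elements_to_show(tree: list[Parent], answers: dict[str, bool], mode: str) -> list[str]:
    """List the child elements the interface should display.

    Children appear only under parents answered YES; "simple" mode keeps
    mandatory children only, while "advanced" mode keeps all of them.
    """
    if mode not in ("simple", "advanced"):
        raise ValueError("mode must be 'simple' or 'advanced'")
    shown: list[str] = []
    for parent in tree:
        if not answers.get(parent.name, False):
            continue  # NO or unanswered: skip every child of this parent
        for element, mandatory in parent.children:
            if mandatory or mode == "advanced":
                shown.append(f"{parent.name} / {element}")
    return shown


# Hypothetical excerpt; the mandatory flags are illustrative only.
TREE = [
    Parent("Past medical history", [("Renal disease", True), ("Malignancy", False)]),
    Parent("Social status", [("Opioid dependency", True), ("Tobacco chewing", False)]),
]

if __name__ == "__main__":
    print(elements_to_show(TREE, {"Past medical history": True, "Social status": False}, "simple"))
```

Treating an unanswered parent like NO is a conservative default. A real interface might instead require every parent to be answered before analysis begins.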

Conclusion
We proposed a dataset as a base for developing AI-based systems to assist BGA. It supports the storage of accurate and comprehensive data, as well as the integration of these data with other information systems. Moreover, by enabling AI methods that help manage patients, it can contribute to high-quality care and better clinical decision-making. This dataset has the potential to foster the building of ICU databases, which would help researchers, students, and policymakers improve patient care in ICUs.