Strategies for the quality assessment of the health care service providers in the treatment of Gastric Cancer in Colombia

Background While, at its inception in 1993, the health care system in Colombia was publicized as a paradigm to be copied across the developing world, numerous problems in its implementation have led to, what is now, an inefficient and crisis-ridden health system. Furthermore, as a result of inappropriate tools to measure the quality of the health service providers, several corruption scandals have arisen in the country. This study attempts to tackle this situation by proposing a strategy for the quality assessment of the health service providers (Entidades Promotoras de Salud, EPS) in the Colombian health system. In particular, as a case study, the quality of the treatment of stomach cancer is analyzed. Methods The study uses two complementary techniques to address the problem. These techniques are applied based on data of the treatment of gastric cancer collected on a nation-wide scale by the Colombian Ministry of Health and Welfare. First, Data Envelopment Analysis (DEA) and the Malmquist Index (MI) are used to establish the most efficient EPS’s within the system, according to indicators such as opportunity indicators. Second, sequential clustering algorithm, related to process mining a field of data mining, is used to determine the medical history of all patients and to construct typical care pathways of the patients belonging to efficient and inefficient EPS’s. Lastly, efforts are made to identify traits and differences between efficient and inefficient EPS’s. Results Efficient and inefficient EPS were identified for the years 2010 and 2011. Additionally, a Malmquist Index was used to calculate the relative changes in the efficiency of the health providers. Using these efficiency rates, the typical treatment path of patients with gastric cancer was found for two EPSs: one efficient and another inefficient. Finally, the typical traits of the care pathways were established. Conclusions Combining DEA and process mining proved to be a powerful approach understanding the problem and gaining valuable insight into the inner workings of the Colombian Health System, especially in terms of the treatment process performed by health care providers in critical illnesses such as cancer. However, no sufficiently compelling results were found to establish the contribution of such a combination to evaluate the quality in the delivery of health services.


Background
The Colombian Health System (CHS) is regulated by Law 100 of 1993. The CHS has evidence several problems in its implementation [1] and there is consensus across the country about the of an structural reform. Under these difficult circumstances, it is important to develop strategies and establish indicators that allow to estimate the quality of the health services in order to improve the system, which is of paramount importance for its population's well-being and quality of life. The heath care system is comprised of three independent actors. First, the institutions providing health services (IPS -Instituciones Prestadoras de Servicios de Salud) as hospitals, laboratories, health centers, among others; they are directly responsible for the treatment of patients and for providing all the necessary resources for the restoration of health and disease prevention. Second, the insurance companies known as Health Promotion Entities (EPS -Entidades Promotoras de Salud), they serve as intermediaries and administrators of the state resources. And third, the government, through the Colombian Ministry of Health and Welfare (MHW) and the Committee on Health Regulation (CHR), responsible for the regulation, control and management of the whole system.
As part of the compensation scheme the CHS, each IPS send to the EPS the minimum patient-data that the IPS are required to report in order to address, regulate and control the CHS. This data knows as Health Services Delivery Records RIPS (Registros Individuales de la Prestación de Servicios de Salud), includes, according to the law: user identification, type of consultation, type of procedures, emergency service data and medication and hospitalization data. The RIPS are issued at national-level and are consolidated and maintained by the MHW in a data warehouse.
The existence of RIPS enables the opportunity to measure the quality of the CHS, giving the possibility to identify strengths, weakness in the health system, and opportunities to improve the population's well-being. These issues, in conjunction with the necessity of measure the system quality and the need to develop unbiased measures of efficiency in the IPS to ensure an efficient system [2], motivate this work. This paper looks for evaluate the quality of the treatment for stomach cancer in Colombia, given that this type of cancer was one of the top thirteen causes of cancerrelated death in the country during the year 2015 [3], according to the Colombian epidemiological monitoring group on cancer. The proposal presents a strategy to evaluate the quality of gastric cancer treatment procedures in Colombia using DEA and process mining techniques. Firstly, quality indicators for each EPS are estimated. Secondly, a DEA model and the Malmquist Index are used to calculate the efficiency and the change in efficiency between 2010 and 2011, for a given institution. Finally, using sequential clustering, a typical algorithm in process mining, care paths for all patients are identified in order to assess the quality of a particular treatment. Furthermore, the proposed strategy can be used to identify the care paths for efficient and inefficient institutions, in order to compare them.
The paper is structured as follows: "Methods" section gives an overview of the methods used in the study and explores quality assessment in the context of gastric cancer. "Results and discussion" section presents the applicability of process mining and DEA to the treatment of gastric cancer in Colombia and presents our results. Finally, future work directions and conclusions are presented in "Conclusions" section.

Methods
The Colombian mandatory system of quality assurance (SOGCS, Sistema Obligatorio de Garantía de Calidad en Salud) includes a series of methodological elements to assess the quality of the health services. The current legislation states that the evaluation of the quality of services must be made by a comparison of observed quality and expected quality [4]. Such quality is validated based on pre-established rules, which are known to all participants in the health care system. These rules can come in different formats: manuals, practice guides, technical standards, and indicators, among others.
In this study, we chose to focus on the concept of clinical guidelines, which is defined as "a set of systematically developed recommendations to help practitioners and patients to make health care decisions and to select the most appropriate diagnostic and treatment options when addressing a health problem or specific condition" [5].
In order to evaluate the quality of gastric cancer treatments, we propose a three step strategy. First step prepares the information from RIPS to obtain those that are related to gastric cancer and with reliable information. Second step consists on applying DEA and the Malmquist Index techniques over the available information from different EPS. These techniques are applied by using a set of previously defined quality indicators related to the diseases treatment as base.
Finally, the third step selects a set of EPS with distinct efficiency characteristics and analyzed their treatment processes using the sequential clustering algorithm to construct typical care pathways. These efforts were aimed at identifying tendencies and patterns regarding the treatments applied to patients and relate them to the quantitative efficiency measures previously gathered for efficient and inefficient EPSs.

Data preparation
Data preparation consists of three basic steps: 1) Selection of patients diagnosed with gastric cancer; 2) Analysis of the quality of the available data; and 3) Grouping of the procedures performed on the patients starting from the time of the diagnosis.
Patient Identification. Table 1 shows stomach cancerrelated diseases codes used by ICD-10 standard (International Statistical Classification of Diseases and Related Health Problems 10th Revision). This standard is the one used on the RIPS an allows to identify all the patients suffering from stomach cancer in Colombia.
Analysis of the quality of the data. The patient data and their related procedures are analyzed in order to validate data quality. The two most common quality-related problems for the patients are data replication and inconsistencies between the procedures performed and the diagnosis; 9.9% of patients diagnosed with cancer are reported more than once between 2010 and 2011. Additionally, 1.2% of these patients did not have surgical procedures between 2010 and 2011, and only 39% of procedures had a patient associated to them in the analyzed data. All records with one of the aforementioned problems were excluded from the analysis.
Grouping of procedures. In order to reduce the number of different procedures to be analyzed and as suggested by an oncologist, a filter was applied based on the CUPS classification (Clasificación Única de Procedimientos en Salud). The CUPS, according to Resolution 1896 of 2001 of Colombia, are a logical, hierarchical and detailed classification of health procedures and interventions performed in Colombia, identified by a code and described by a unique nomenclature. The analyzed procedures in this work are related to the first three characters as follows: As in the precedent step, data quality-related problems were detected: (1) difficulty to validate the relationship between the procedure and the analyzed disease, (2) Data replication, and (3) reliability in the content of the attributes.
13.31% of records presented problems insofar as identifying a relationship between the procedure used and the disease analyzed. Approximately, 65% of procedures can be used for different health problems, affecting the precision in the analysis. Additionally, there are procedures related to the same patient on the same date, or a near date that are not possible. For example, complete stomach surgery followed by partial stomach surgery. Finally, the most important cases regarding reliability in the content of the attributes are related to time. In fact, the majority of records have 00:00:00.000 as their time value. All records with the aforementioned problems were identified and excluded.

Data envelopment analysis
Data Envelopment Analysis (DEA) is a non-parametric efficiency approach. It was developed by [6] and later elaborated by [7] (BCC model). DEA computes the relative efficiency of decision-making units (DMUs) with many inputs and outputs [8]. In the same way, DEA allows the quality assessment to include not only a set of performance indicators (outputs) but also the amount of consumed resources (inputs). As a result, DEA it is now considered mainstream to appraise the efficiency of health institutions [2]. Furthermore, Emrouznejad et. al in [9] concluded that DEA applications will continue to be a primary arena of research in the future. In operations management, DEA is used for benchmarking where a set of measures (indicators) is selected to estimate the performance of production and/or service operations, comparing multiple DMUs with a structure of multiple inputs and outputs. As a result, a set of DMUs that belong to a "best-practice frontier" [10], are identified. This frontier, allows us to calculate an efficient solution for every level of input or output. Any DMU not on the frontier is considered inefficient. A numerical coefficient is given to each firm, defining its relative efficiency. Where there is no actual corresponding firm, virtual efficient producers are identified to make comparisons [11].
Classical DEA models rely on the assumption that inputs have to be minimized while outputs have to be maximized [12]. However in health care, one or more outputs -called undesirable outputs-have to be minimized [13]. Even more, according to [14] considering such variables, in efficiency analysis, have paved the way for more thorough assessments. Nevertheless, modeling undesirable outputs has been object of considerable discussion in the efficiency literature, because of the lack of consensus about the most appropriated approach [3]. Even when variable transformations can be addressed to avoid this problem, [15] conclude that such transformations could generate loss of linearity. Authors also compare methods to deal with this situation.
Liu et al. [16] presented a survey of DEA applications from 1978 and 2010. According to the authors, health care is the second largest application area. Authors also state that most of the reviewed papers studied hospital performance. More recent papers studied the integration of DEA and complementary techniques to measure health care efficiency. As an example, Al-Refaie et al. [17] applied simulation and DEA to improve the emergency department of a Jordanian hospital. In this research DEA was used to identify the best possible scenario regarding nurses' workload. They concluded that using DEA to develop quality frontiers in health services is a new promising direction.
In recent years, several studies to evaluate health policies and health services has been developed. In Uganda, DEA was used to evaluate the efficiency of referral hospitals [18], as the same in Angola [19], Zambia [20], South Africa [21], among others. In Asia, recently [22] propose to evaluate the performance of maternal and child services in China hospitals using DEA, comparing poverty and non-poverty country hospitals. In Latin-America there are few studies that use DEA to evaluates performance in health institutions [23] propose DEA as new strategy to evaluate efficiencies in Chilean hospitals. In the best of our Knowledge there are not studies that allow us to evaluate the quality of a procedure in Colombia.
To evaluate the quality of gastric cancer treatments in different Colombian EPSs, a DEA model with non desirable output variables keeping the linearity between them are considered. After analyzing different models, we use the model proposed by [15], an output-oriented model with variable return to scale. The mathematical model is the following: Let x, y and b be sets that represent input desirable variables, output desirable variables and output non desirable variables. Equation 1 is the objective function that looks to maximize efficiency considering the constrains over each set of variables, represented by Eqs. 2 to 4. This last set of equations, are used to assign weights to output and input variables and to estimate the distance of each EPS from the efficient frontier.
[24] Productivity Index was developed to measure changes in technological productivity over a period of time. This index was initially proposed by [24] and has been used in several studies with multiple variants such as [25,26]. In order to measure the changes of EPSs' efficiency over the observed period, we used the index used by [27], defined as follows: ) be the efficiency of one DMU(x 0 , y 0 ) ( t 1 ) measured with respect to the technological frontier t 2 , and obtained from the results of the DEA model.
From the RIPS and based on health professionals' opinions, the following efficiency indicators were calculated for each EPS and used for the DEA analysis: • Once the input and output variables were defined, the most (and least) efficient EPSs in the Colombian Health System were identified. Subsequently, the efficient frontier was computed using the calculated efficiencies of each institution. Finally, the Malmquist indicator for each institution was calculated to evaluate the change in efficiency in the study period.

Process mining
Process mining is a field of data mining that allows the discovery of business processes in a given domain using different types of algorithms. Examples of process mining algorithms include genetic algorithms, heuristics mining and sequential clustering, among others [28]. This work uses a ProM plugin that provide the Sequence Clustering Algorithm, a combination between sequence analysis (first-order Markov chain) and clustering. In particular, we decided to use sequential clustering, given its popularity in the process mining community, and information provided to interpret the results. In fact, sequential clustering algorithms generate a series of discrete states that are very similar internally [29]. These series, known as clusters are represented by a Markov chain consisting in states and transition probabilities between them. These probabilities depend only on the current state.
The use of process mining in this work allowed us to identify typical treatments used in patients by distinct EPSs to analyze various treatment patterns and, in turn, improve the quality of the health services. To do this, we had to carefully select the EPSs to be analyzed and the model to be used to do this, as well as prepare the data to execute the model, and evaluate the results.
This project uses two different EPSs: the most and least efficient according to DEA results. This selection allows a comparison to be made between these EPS to understand patterns affecting treatment quality. On the other hand, the design of the model consists in creating the sequential clustering model, and preparing the data used in the analysis related to procedures applied in treatments. It is important to note that a detailed granularity level of information used can increase the complexity of the results because of the number of procedures and relations between them. Finally, the evaluation, presented in the following "Evaluation" section includes the criteria of an o ncologist to validate the quality of the identified treatments.

Data models
The data model used in the study includes two entities that consolidate information from patients with gastric cancer and procedures used in the treatment of these patients. According to patients, the entity has a unique patient identifier, information about institutions related to the health service (EPS and IPS unique codes), date of the first diagnosis, main diagnosis of the consultation, and other three possible diagnoses related to the consultation. On the other hand, the entity where the procedure takes place holds information about the date of procedure, institutions related to the health service (EPS and IPS unique codes), patient identifier, procedure code (CUPS identifier), and the main diagnosis of the procedure.
The entity where the procedure takes place does not hold information on time, given the previously discussed data quality problem in this field. As a consequence, the identified paths using sequential clustering represent patients immediately operated on after the diagnosis, although, in reality, that is not the case.

Data aggregation
An aggregation strategy based on the CUPS was used in order to reduce the complexity of the models arising from the vast number of different treatments that a cancer patient can be subjected to and to facilitate analysis by the experts, according to procedure characteristics such as: type of procedure (diagnostic, radiotherapy, surgery) and affected organ (intestine, stomach). Table 2 shows the aggregation scheme proposed to characterize non-surgical and surgical procedures, presented in the CUPS section column. This schema has two levels to define the procedures, Level 1 and Level 2 columns in the Table. According to non-surgical procedures, Level 1 represents diagnosis, radiotherapy and chemotherapy procedures. Level 1 for surgical procedures represents diagnosis, pre and post operative and surgery procedures. Level 2 has more specific information about a Level 1 procedure. For example, non-surgical diagnosis procedures are classified in terms of radiography, tomography, clinical examination, derived examination and scintigraphy.
Finally, the percentage in brackets in the Table represents the percentage of procedures in each category used in this work. In this way, the information used contains 92.2% of non-surgical procedures.

Evaluation
The evaluation phase is carried out by a gastric cancer oncologist, according to whom, the quality of a given medical treatment depends on some of the features included in the patient's care pathway. Thus, the treatments administered to a patient with gastric cancer should include the following features: • Will follow the treatment established in the clinical guide for the disease. • Will establish diagnostic procedures before and after a surgical procedure. • Will include procedures such as chemotherapy or radiation treatments. Furthermore, these procedures should be applied sequentially.

Results and discussion
The methodology described above, is tested using information from the reported RIPS in 2010 and 2011. The efficiency indicators are estimated for each EPS, for each year. Figures 1, 2   From the same figures, we can conclude that the variability of indicators decreased from 2010 to 2011, except in the case of "number of previous studies" (Fig. 2). Figure 5 shows how the treatment opportunity decreased from 2010 to 2011, reflecting the effort made for each EPS to diminish the time elapsed between the diagnoses and the beginning of treatment.
Using a mean differences hypothesis tests, with unknown variances, it is possible to say that there is no statistical evidence to affirm that there are differences in histological studies and in the number of readmissions to EPS. On the other hand, it is possible to reject the null hypothesis of average difference for treatment opportunity and previous studies indicators, which present improvements in the observation period.  Tables 3 and 4 respectively. For 2010, 27% of the EPS are considered efficient and 24% extremely inefficient (i.e., efficiencies below 60%). To become efficient, each EPS belonging to quantile 4 must improve its indicators by up to 80%. For example,  Considering the target values obtained for 2011, an overall improvement is observed: 20% of the EPS were efficient, and 37% of the EPS showed efficiencies below 60%. While in 2010, one "efficient EPS" had a treatment opportunity of 59 days and a 2.65 readmissions rate, for the same EPS, these values changed to 35 days and 1.53 readmissions. Finally, four of the efficient EPS in 2010, were also efficient during 2011. On the other hand, the Malquimst index allows for identifying the effect of EPS policies over efficiency. For the EPS 21, for example, efficiency increased from 0.67 in 2010 to 1 in 2011 (see Table 5). Target values show that, EPS21 must increase previous studies from 7 (2010) to 10.3 (2011), and decrease treatment opportunity from 445 to 233 days. However, in 2011, the EPS21 registers 55 studies and 11 days treatment opportunity. Since, the improvement in efficiency of EPS 21 is not due to changes in the frontier (given by the target values) but to its own policies, the Malquimst index is equal to 3.96. It is important to note that for one EPS with only one diagnosed patient, making changes of this magnitude in the indicators requires less investment than the one required for one EPS with 164 patients.

Process mining
This work used ProM 5.2 [30] as Process Mining tool, an open-source framework licensed under Common Public License (CPL), which allows the use of a variety of algorithms and friendly interface, in particular, sequential clustering. The sequential clustering needs to configure a number of parameters such as: minimum and maximum percentage of events that each cluster can have (assigned values were respectively 0, 100), minimum and maximum number of different events that a sequence can have (we use respectively 1 and 303), minimum and maximum number of sequences that a cluster can have (we use 1 to 102 values), and number clusters that the model should contain, we decide to use 1. It should be noted that the purpose of this work was to try to identify a general pattern of treatment for all the patients in a specific EPS. To do this, it was decided to use a single cluster for EPS to represent the entire treatment process.
According to "Methods" section, two types of EPS were chosen to study and to illustrate the methodology, using the efficiency scores obtained from the DEA process.

EPS 1:
This EPS is close to the efficient frontier and has a major positive change in the relative efficiency between the periods of study. 2. EPS 2: This EPS is well below to the efficient frontier and has a relatively low efficiency in the periods of study. Table 6 shows the number of patients, procedures and efficiency scores of the two chosen EPSs.

EPS 1 (Close to the efficient frontier)
Figures 6 and 7 demonstrate the typical treatment process for EPS 1 for each of the years studied. It is not possible to identify a general or distinctive path representing the treatment followed by patients in this EPS due to the complexity of the resulting model. This may be due to several reasons. First, a clinical guide for the treatment of gastric cancer may not be available or, this EPS may not follow it. In addition, it is also possible that the conditions during patient diagnosis are greatly varied, leading to a high number of possible treatments.
However, it is possible to discern paths in the treatment used by the EPS. For example, in the year 2010, the treatment usually started with either a non-surgical diagnostic procedure (75.9% of the time) or a surgical diagnosis procedure (18.7%). If a non-surgical diagnosis procedure was performed, usually an identical procedure was executed (82% of the time); then, the patient might have received chemotherapy treatment (6.5%) or a second operation (1.2%). Similarly, if a surgical diagnostic procedure was applied, typically a non-surgical diagnosis (74.6%) followed. Afterwards, the patient may have undergone chemotherapy treatment (2.5%) or a second operation (2.2%). Finally, when a patient underwent either surgery, chemotherapy or radiotherapy a diagnostic procedure normally followed.
Similarly, in the year 2011, the treatment usually started with either a non-surgical diagnostic procedure (71.8%) or a surgical diagnosis procedure (26.5%). If a non-surgical diagnosis procedure was performed, usually an identical procedure was executed (82%); then, the patient might have undergone chemotherapy treatment (6.3%) or entered into remission (3.4%). Likewise, if a surgical diagnostic procedure was applied, typically, a non-surgical diagnosis (66.3%) followed. Alternatively, the patient may have been subjected to chemotherapy treatment (3.3%) or a second operation (2.2%). Figures 8 and 9 exhibit the chemotherapy and/or radiotherapy cycle for each of the years studied. It should be noted that cycles of chemotherapy and radiotherapy were observed in the treatment of the patients, since 79.1% of those that received chemotherapy at least once, received chemotherapy again. The next most common procedure for the patients was radiotherapy, in 4.3% of the cases. Additionally, if a patient received radiotherapy, in 17.1% of the cases, she or he received another radiotherapy treatment and in 48.8% of the cases, the patient underwent chemotherapy.
EPS 1 diagnosed its patients in a timely, and effective manner, since its treatment traits are those identified  above as those of a high-quality procedure for gastric cancer. As a consequence, the position of the EPS on the efficient frontier is not surprising, given that the treatment of its users follows one of the most appropriate for such patients. Figures 10 and 11 show the typical treatment process for EPS 2 for each of the years studied. The resulting model is very complex as is the case for EPS 1. Hence, it is not possible to identify a distinct path representing the treatment of the patients, even though the number of patients was less than half of the patients for EPS 1.

EPS 2 (Far from the efficient frontier)
However, it is possible to discern particularities in the treatment that the EPS performed. For example, in the year 2010, 75.9% of the time the treatment started with a non-surgical diagnosis, 14.7% with a surgical diagnosis and 12% with chemotherapy or radiotherapy. If a nonsurgical diagnosis procedure was performed, usually an identical procedure was executed (87% of the times), and the patient probably ended her or his treatment (5.7%), 2% might have undergone chemotherapy or a second surgery (1.9%). Similarly, if a surgical diagnostic procedure was applied, typically a non-surgical diagnosis (64.2%) followed, after which the patient may have ended her or his treatment (15.1%), had a second surgical operation (11.3%) or received chemotherapy (5.7%). Finally, when a patient underwent either surgery, chemotherapy or radiotherapy a diagnostic procedure normally followed.
Similarly, during 2011, the treatment usually started with either a non-surgical diagnostic procedure (78.6%) or a surgical diagnosis procedure (14.3%). If a non-surgical diagnosis procedure was performed, usually an identical  procedure was executed (85.8%), then, the patient might have received chemotherapy treatment (2.2%) or a surgical diagnosis. Similarly, if a surgical diagnostic procedure was applied, typically a non-surgical diagnosis (64.2%) followed. Alternatively, the patient may have received chemotherapy treatment (5.7%) or a second surgery (11.3%).
Consequently, as we can see from the analysis of EPS 1, it is clear that EPS 2 performs more surgeries than its counterpart. Similarly, the likelihood of a patient undergoing a second surgery soars in comparison with EPS 1. However, it should be pointed out that in most cases, surgical and non-surgical diagnostics were performed before procedures such as surgeries, chemotherapy or  radiotherapy, showing that the treatment provided had at least a few desirable traits. Interestingly, unlike EPS 1, it was detected that 12% of the treatments began with a diagnosis that did not belong to the suggested clinical guidelines for the treatment of gastric cancer. Figures 12 and 13 present the chemotherapy and/or radiotherapy cycle for each of the years studied. It should be noted that cycles of chemotherapy and radiotherapy were observed in the treatment of the patients, since 49.2% of the patients that received chemotherapy at least once received chemotherapy again. The most common procedure for the patients was radiotherapy in 4.9% of the cases. Additionally, if a patient received radiotherapy, in 46.2% of the cases, the patient also received chemotherapy. It is noteworthy that none of the patients received radiotherapy in a consecutive, periodical manner.
In conclusion, the patients in EPS 2 have a higher probability of undergoing a second operation. This characteristic may be related to problems during the diagnostic step. Additionally, it does not apply radiotherapy consecutively and, in general, complies loosely with the available clinical guide. These elements can explain the position of EPS 2 in the efficient frontier.

Discussion
This paper proposes an innovative approach applying Data Envelopment Analysis (DEA), and Process Mining to evaluate efficiency and quality in the treatment of gastric cancer. The aggregate strategy proposed was very useful in reducing the complexity in the models. Data Envelopment Analysis has been used by other authors to analysis hospital efficiency in Zambia [20] and Iran [31]. Our results show variation between different indicators and treatments used by insurance companies and are similar to other studies that are more focused on clinical guidelines analysis [32]. Other studies combined DEA with Malmquist indices to measure hospital productivity [33]. There is limitation in the database used in this study and other authors recommended a level 3 data base (the most comprehensive) with information from health program enrollment files, including when eligibility begins and ends [34]. The analysis should consider factors such as complexity and current regulations and guidelines in the treatment of different types of cancer. The use of specific information about the disease analyzed such as the stage of the cancer, studies focusing on IPS, or studies related to less complex diseases are suggested for future work. Efforts should be aimed at designing objective metrics that differentiate between efficient and inefficient entities in terms of procedures. Analyses oriented to identify hospitals or doctor's behavior will be tackle using the same strategy. The analysis should consider factors such as complexity and current regulations and guidelines in the treatment of different types of cancer.

Conclusions
This paper proposes a combined strategy, applying Data Envelopment Analysis (DEA), and Process Mining to evaluate efficiency and quality in the treatment of gastric cancer undertaken by health service providers in Colombia. Results showed that the method used is a useful approach to understand and tackle the problem. The analysis of the treatment of gastric cancer allowed us to find relevant information on the factors affecting a treatment process performed efficient or inefficiently. However, the complexity of the disease studied and the diversity of the health system, make it difficult to confirm the contribution obtained using the combined strategy proposed. The use of specific information about the disease analyzed such as the stage of the cancer, studies focusing on IPS, or studies related to less complex diseases are suggested for future work. On the other hand, it is important to note that the data preparation phase is very important to understand the data and to correct data quality problems in order to improve the quality of the results. In the same way, the aggregate strategy proposed was very useful in reducing the complexity in the models. Finally, the generalized analysis of the treatment for gastric cancer (the procedures were grouped by their role in the treatment rather than being studied as individual procedures), allowed us to improve the knowledge of the clinical treatment of a typical gastric cancer patient. As future work, efforts should be aimed at designing objective metrics that measure the differences between efficient and inefficient entities in terms of procedures. Additionally, the proposed strategy must be extended so that a preliminary analysis is carried to select the disease. The analysis should take into account factors such as complexity and current regulations and guidelines in the treatment of different types of cancer.