Development of a benchmark tool for cancer centers; results from a pilot exercise

Background Differences in cancer survival exist between countries in Europe. Benchmarking of good practices can assist cancer centers to improve their services, aiming for reduced inequalities. The aim of the BENCH-CAN project was to develop a cancer care benchmark tool, identify performance differences and yield good practice examples, contributing to improving the quality of interdisciplinary care. This paper describes the development of this benchmark tool and its validation in cancer centers throughout Europe.
Methods A benchmark tool was developed and executed according to a 13-step benchmarking process. Indicator selection was based on literature, existing accreditation systems, and expert opinions. A final format was tested in eight cancer centers. Center visits by a team of at least three persons, including a patient representative, were performed to verify information, grasp context and check on additional questions (through semi-structured interviews). Based on the visits, the benchmark methodology identified opportunities for improvement.
Results The final tool consisted of 61 qualitative and 141 quantitative indicators, which were structured in an evaluative framework. Data from all eight participating centers showed inter-organization variability on many indicators, such as bed utilization and provision of survivorship care. Subsequently, improvement suggestions for centers were made, 85% of which were agreed upon.
Conclusion A benchmarking tool for cancer centers was successfully developed and tested and is available in an open format. The tool allows comparison of inter-organizational performance. Improvement opportunities were successfully identified for every center involved and the tool was positively evaluated.
Electronic supplementary material The online version of this article (10.1186/s12913-018-3574-z) contains supplementary material, which is available to authorized users.


Background
The number of cancer patients is steadily increasing and, despite rapid improvements in therapeutic options, inequalities in access to quality cancer care, and thus in survival, exist between countries [1]. These inequalities indicate room for improvement in the quality of cancer care. Identifying good practices can assist cancer centers (CCs) in improving their services and can ultimately reduce inequalities. Benchmarking is an effective method for measuring and analyzing performance and its underlying organizational practices [2]. Developed in industry in the 1930s, benchmarking made its first appearance in healthcare in 1990 [2]. Benchmarking involves a comparison of performance in order to identify, introduce, and sustain good practices. This is achieved by collecting, measuring and evaluating data to establish a target performance level, a benchmark [3]. This performance standard can then be used to evaluate current performance by comparing it to that of other organizations, including good-practice facilities [3]. Due to globalization, the absence of national comparators, and the search for competitive alternatives, there is an increasing interest in international benchmarking [4]. However, a study by Longbottom [5] of 560 healthcare benchmarking projects showed that only 4% of the projects involved institutions from different countries. Relatively few papers have been published on healthcare benchmarking methods [6]. Moreover, to the best of our knowledge, there is no confirmed indicator set for benchmarking comprehensive cancer care. In 2013, the Organisation of European Cancer Institutes (OECI) [7] therefore launched the BENCH-CAN project [8], aiming at reducing health inequalities in cancer care in Europe and improving interdisciplinary comprehensive cancer care by yielding good practice examples. In view of this aim, a comprehensive international benchmarking tool was developed covering all relevant care-related and organizational fields.
In this study, comprehensive means thorough and broad, including all relevant aspects, which is also a way to describe interdisciplinary, state-of-the-art, holistic cancer care. In line with the aim of the BENCH-CAN project, the objectives of this study were (i) to develop and pilot a benchmark tool for cancer care with both qualitative and quantitative indicators, (ii) to identify performance differences between cancer centers, and (iii) to identify improvement opportunities.

Study design and sample
This multi-center benchmarking study involved eight cancer centers (CCs) in Europe, six of which were designated as comprehensive cancer centers (encompassing care, research and education) by the OECI [9]. A mix of geographic selection and convenience sampling was used to select the pilot sites. Centers were chosen first on national location, in order to have a good distribution between geographical regions in Europe, and second on willingness to participate. All centers had to be sufficiently organized and dedicated to oncology, and had to treat significant numbers of cancer patients. Centers were located in three geographical clusters: North/Western Europe (n = 2), Southern Europe (n = 3) and Central/Eastern Europe (n = 3). The benchmark tool was developed and executed according to the 13-step method by van Lent et al. [6] (see Table 1). In short, the first five steps involve identifying the problem, forming the benchmarking team, choosing benchmark partners and defining their main characteristics, and identifying the relevant stakeholders. Steps 6 to 12 are explained in more detail in the following paragraphs. Ethical consideration was not applicable in this study.

Framework and indicators
As described in step 6, we developed a framework to structure the indicators. The European Foundation for Quality Management (EFQM) [10] Excellence Model (comparable to the Baldrige model [11]) was used for performance assessment and identification of key strengths and improvement areas [12]. Apart from the enabler fields, we adapted the Institute of Medicine domains of quality [13] for outcomes or results: effective, efficient, safe, patient-centered, integration and timely (Fig. 1). Indicators (step 7) were derived from literature [14] and expert opinion. Existing assessments were used as the basis for the benchmark tool [15]. Stakeholders of the BENCH-CAN project, such as representatives from the European Cancer Patient Coalition (ECPC) and clinicians and experts (such as quality managers) from cancer centers (OECI member centers, n = 71), provided feedback to reach consensus on the final set of indicators to be used in the benchmark (step 8). As one person per center was asked to collect feedback within that specific center, it cannot be determined whether the feedback was shared equally by the different stakeholder groups. The combination of data provision, a site visit by a combined team, and feedback provided sufficient possibilities for cross-checking. For the financial and quantitative indicators this included standardizing data collection to allow comparison between pilot centers and determining the level of detail for cost accounting.

Reliability and validity
A priori stakeholder involvement was used to ensure reliability and validity [6]. After collecting the indicators in step 9, the validity of the indicators was checked using feedback from the pilot centers based on three criteria [16,17]: 1) definition clarity, 2) data availability and reliability, 3) discriminatory features and usability for comparisons.
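The three-criterion check could be expressed as a simple screening routine. The sketch below is an illustration only: the `IndicatorFeedback` structure, the rule that an indicator must meet all three criteria to be kept, and the example entries are assumptions, not the procedure actually used in the study.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class IndicatorFeedback:
    """Pilot-center feedback on one indicator (hypothetical structure)."""
    name: str
    definition_clear: bool  # criterion 1: definition clarity
    data_available: bool    # criterion 2: data availability and reliability
    discriminative: bool    # criterion 3: discriminatory features and usability


def screen_indicators(feedback: Iterable[IndicatorFeedback]) -> Tuple[List[str], List[str]]:
    """Split indicators into those meeting all three criteria and those to revisit.

    Requiring all three criteria is an assumption made for this sketch.
    """
    keep: List[str] = []
    revisit: List[str] = []
    for item in feedback:
        passed = item.definition_clear and item.data_available and item.discriminative
        (keep if passed else revisit).append(item.name)
    return keep, revisit


# Illustrative feedback; the ratings are invented for the example.
example = [
    IndicatorFeedback("bed utilization", True, True, True),
    IndicatorFeedback("sick leave", True, True, False),
]
kept, to_revisit = screen_indicators(example)
```

Indicators that land in the second list would be candidates for redefinition or removal rather than automatic exclusion.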

Indicator refinement and measurement
The indicators were pre-piloted in three centers to see whether the definitions were clear and the indicators would yield relevant, discriminative information. These three centers were selected based on willingness to participate and readiness to provide the data in a short period. Based on this pilot, we decided to add and remove indicators, and refine definitions of some indicators. After refinement, the resulting set of 63 qualitative indicators and 193 quantitative indicators was measured in the five remaining centers. The pre-pilot centers submitted additional information on the added indicators in order to make all centers comparable.
We collected data from the year 2012 and each pilot center appointed a contact person who was responsible for the data collection within the institute and the delivery of the data to the research team. After a quick data scan, a one-day visit to each pilot center was performed to verify the data, grasp the context and clarify questions arising from the provided data. The visits were performed by the lead researcher, a representative from the ECPC and representatives of (other) members of the consortium. The visits were also used to collect additional information through semi-structured interviews and to acquire feedback on the benchmark tool. In the semi-structured interview, the lead researcher provided some structure based on the questions that arose from the quick scan (see Additional file 1: Appendix 1 for a selection of five topics and corresponding questions in the semi-structured interviews) but worked flexibly and allowed room for the respondent's more spontaneous descriptions and narratives and questions from the other site visit members [18].

Analysis
Two methods were used to compare the qualitative and quantitative data. A deductive form of the Qualitative Content Analysis was used to analyze the qualitative data [18]. This method contains eight steps which are described in Table 2.
Quantitative data was first checked for consistency and correctness, and all cost data was converted into euros and adjusted for Purchasing Power Parity [19]. In addition, data was normalized when necessary to be able to compare different types and sizes of centers. The normalizations used were: 1) opening hours of departments, 2) number of inpatient beds, 3) number of inpatient visits, and 4) number of full-time equivalents (FTE). All data was summarized and possible outliers were identified. Outliers were discussed with the relevant centers to elaborate on the possible reasons for the scores.
Fig. 1 The BENCH-CAN framework. Note: The enabler domains from the EFQM model describe factors that enable good quality care. The results domains adapted from the IOM domains of quality describe how good quality care can be measured
To ensure validity, a report with all data (qualitative and quantitative) was sent to the pilot centers for verification. Not all centers were able to provide all data, as some were not able to retrieve and produce the data and others were concerned about the time needed to gather all the requested information. Hence, for some indicators centers are missing, as we did not use imputation. Data is structured according to the adapted domains of quality from the IOM: effective, efficient, safe, patient-centered, and timely.
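The quantitative preparation described in this section (conversion to euros, Purchasing Power Parity adjustment, normalization by center size, and outlier flagging without imputation) could be sketched as follows. This is a minimal illustration, not the study's actual procedure: the function names, the form of the PPP factor, and the median-based outlier rule are all assumptions made for the example.

```python
from statistics import median
from typing import Dict, List, Optional


def ppp_adjusted_euros(cost_local: float, eur_per_local: float, ppp_factor: float) -> float:
    """Convert a local-currency cost to euros and adjust for PPP.

    ppp_factor is a hypothetical relative price level
    (1.0 = reference price level, 0.5 = prices half as high).
    """
    return cost_local * eur_per_local / ppp_factor


def normalize(value: Optional[float], denominator: Optional[float]) -> Optional[float]:
    """Scale an indicator by a size measure such as inpatient beds or FTE.

    Missing inputs stay missing: no imputation is applied.
    """
    if value is None or denominator is None or denominator == 0:
        return None
    return value / denominator


def flag_outliers(scores: Dict[str, Optional[float]], factor: float = 2.0) -> List[str]:
    """Return centers scoring above median * factor or below median / factor.

    The rule is an assumption for this sketch; centers with missing
    data are skipped rather than imputed.
    """
    present = {c: s for c, s in scores.items() if s is not None}
    if not present:
        return []
    mid = median(present.values())
    return sorted(c for c, s in present.items() if s > mid * factor or s < mid / factor)
```

Flagged centers would then be contacted to discuss possible reasons for their scores, as described above, rather than being excluded automatically.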

Improvement suggestions
After comparison of all quantitative and qualitative data, three researchers independently identified improvement opportunities for each center. Improvement suggestions or opportunities (at least three per center) were only mentioned for those areas where the researchers felt the center could actually make the improvement without being restricted by, for example, regulations. Where pilot centers agreed with these suggestions, they developed improvement plans based on them.

Reliability and validity
Ten indicators deemed irrelevant (such as sick leave) were removed after the pre-pilot. Nineteen indicators were added based on evaluation criteria and feedback. Several indicator definitions were clarified. The final pilot list contained 63 qualitative indicators and 193 quantitative indicators. After the pilot data collection, a secondary evaluation of the definition clarity, data availability, data reliability and discriminative value was performed. This re-evaluation resulted in a final set of 61 qualitative indicators and 141 quantitative indicators that were deemed suitable for wider use in benchmarking cancer centers (Additional file 2: Appendix 2).

Performance differences between centers
The performances of the participating centers varied on many indicators, of which a selection is shown in Table 3 and described below. Organizations are anonymized. The results are structured according to the adapted domains of quality [13].

Effective
The majority of centers register crude mortality rates of their patient groups (n = 6), as shown in Table 3. Only Institute A publishes this rate. Another type of mortality, 30-day surgical mortality, was not registered in centers B, C and G. Centers also reported difficulties with providing novel technologies and therapies, limiting their ability to provide optimal care for patients.

Efficient
Medical efficiency The medical efficiency, defined as the use of medical production factors to gain the desired health outcome with a minimum waste of time, effort, or skills, varies greatly between the participating centers, as shown in Fig. 2. Center G scores high (ratio of 7), whereas center C has a low number of daycare treatments in relation to its inpatient visits (ratio of 0.3) compared to the other centers.
The utilization of beds differs between centers, as shown in Fig. 3. Centers C, G and H in particular have a relatively low inpatient bed utilization. Similarly, a large variation in utilization of the daycare beds is observed. Center E has a high daycare bed utilization, but scores average on the ratio of daycare treatments to inpatient visits. In contrast, center G also had a relatively high number of daycare treatments but a lower utilization.
Input efficiency The number of scans per radiology device varies between centers, as shown in Fig. 4. Center D scores high on the efficiency of MRI (4462 scans per MRI), X-ray (7703 scans per X-ray machine), and CT (13,836 scans). Center H scores high on the efficiency of MRI and CT. Center E has outsourced its MRI, and no data on X-rays were available from center G.
Table 2 Steps of the Qualitative Content Analysis [26]
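The two efficiency measures discussed above can be written as simple ratios. The sketch below is illustrative only: the text does not give the exact formulas, so the bed-utilization definition (occupied bed-days divided by available bed-days), the function names, and the default of 365 opening days are assumptions.

```python
def daycare_to_inpatient_ratio(daycare_treatments: int, inpatient_visits: int) -> float:
    """Ratio of daycare treatments to inpatient visits for one center."""
    if inpatient_visits <= 0:
        raise ValueError("inpatient_visits must be positive")
    return daycare_treatments / inpatient_visits


def bed_utilization(occupied_bed_days: float, beds: int, days_open: int = 365) -> float:
    """Share of available bed-days that were occupied (assumed definition)."""
    available_bed_days = beds * days_open
    if available_bed_days <= 0:
        raise ValueError("beds and days_open must be positive")
    return occupied_bed_days / available_bed_days
```

Under these definitions, a center with many daycare treatments but a low daycare bed utilization, as reported for center G, would show a high first ratio and a low second one, which is the pattern the comparison is meant to surface.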

Safe
Center A has a safety management system which is audited annually by an independent external agency. Prospective risk assessments are performed in center A before implementing new treatments, new care pathways or critical changes in key processes. Center B divided risk management into general risk management (e.g. risks of fire) and clinical risk management (e.g. transfusion risks and medication errors). Institute H adopted the "International Patient Safety Goals" (IPSG) issued by the Joint Commission International [20]. Most centers (n = 7) have an institution-wide reporting system that registers different types of adverse events: near miss, incident, adverse event, and sentinel event. In institute E, only doctors can make official notifications of a medical error; nurses cannot report an incident directly. Center G uses a system that generates reports for patient satisfaction, patient safety and patient complaints. According to the procedures of institute H, near misses should be reported, but in practice only actual events are reported. For more information on the domain of safety see Table 3.

Patient-centered
Although all centers have some type of contact person for patients, none had an official case manager for all patient pathways. In institutes A and D, patients are formally included in strategy development. Other centers reported collaborating with external patient organizations to represent patients. All centers provide some care for cancer survivors; however, only center A has an extensive in-house survivorship program with a dedicated budget. Center G also reports having a budget for survivorship care (e.g. psychosocial support). For more information on patient-centeredness see Table 3.

Timely
For seven centers the waiting times are set by the government (see Table 3). Institute A indicated that it encountered difficulties in meeting the maximum waiting time for some types of surgery. The maximum waiting times are input for negotiations with healthcare insurers, and have potential influence on the funding for center A. Center H reports waiting times to the regional government, which uses this data to adjust the amount of services offered by the regional healthcare system. Possible reasons mentioned for long waiting times are high patient demand for diagnostic tests and insufficient staff. The largest variation between institutes occurred in overall waiting time before first visit, which varied between 1.5 and 21.8 days.
Table 4 describes examples of improvement suggestions per pilot center and the resulting improvement plans. Improvement suggestions varied from broader processes, such as the involvement of patients in the care process, to specific recommendations (e.g. measure staff satisfaction). Adoption of case managers was a frequently mentioned improvement suggestion. Regarding the suggestion to improve patient participation in the organization, center C only partially agreed, stating that "not all patients want to be involved". Center A felt a complication registry was mainly useful per discipline and therefore partly agreed with the suggestion to implement an institution-wide complications registry. Of all improvement suggestions, pilot centers agreed with 85% and partially agreed with 15%. Center G received improvement suggestions but did not return an improvement plan.

Discussion
In this study, we developed a benchmark tool to assess the quality and effectiveness of comprehensive cancer care consisting of 61 qualitative indicators and 141 quantitative indicators. The tool was successfully tested in eight cancer centers to assess its suitability for yielding improvement suggestions and identifying good practices.
The benchmark data showed performance differences between cancer centers, which led to improvement suggestions/opportunities for all participating centers. In general, the indicators revealed well-organized centers. However, there were indicators on which centers performed less well. For example, not all centers register mortality rates and it is unclear whether these rates, when registered, are made public. Nevertheless, there is broad consensus that public reporting of provider performance can be an important tool to drive improvements in patient care [21]. An indicator on which only two centers performed well was the offering of in-house survivorship care with a dedicated budget. An advantage of follow-up taking place in cancer centers is that it is comfortable for patients and provides continuity of care [22]. However, it is debatable whether offering this kind of care should be the responsibility of cancer centers, as multiple pilot centers already indicated that they have tight budgets.
Large variation existed between centers in the domain of efficiency. This variation was only partly related to differences in healthcare systems, leading to multiple improvement suggestions. For example, centers C, G and H had a relatively low inpatient bed utilization, which is likely to be less cost-efficient. Center G had a high number of daycare treatments but a lower bed utilization, possibly indicating a utilization loss. A higher ratio indicates efficient use of beds and chairs and, hence, most likely also of staff. Centers C and D might have a surplus of daycare beds and chairs. Wind et al. [23] showed that having fewer beds has no association with low financial performance and could indeed improve efficiency.
Another important improvement area was patient-centeredness, specifically case management, which all centers agreed was necessary to implement or expand. Case management is an organizational approach used to optimize the quality of treatment and care for individuals within complex patient groups [24]. However, centers indicated that implementing or extending these case managers will take a long time and therefore categorized this as a mid-term (2-5 years) or long-term (6-10 years) goal.
Table 4 Examples of improvement suggestions per pilot center and resulting improvement plans
Suggestion: Increase patient participation in the care process. Center B: agree. Response: "We are already working in this area." Plan: An area on the website is under development where patients can access future appointments, exam results and requisitions, among other clinical and administrative information. The portal that is under development will have one tab containing the patient's targeted information.
Suggestion: Improve patient participation in the organization/strategy development. Center C: partially agree. Response: "Patients have to be involved. However not all patients want to be involved." Plan: All patients have to pass the MDT and, after discussion, take a decision on whether to participate. This participation has to be organized.
Suggestion: Develop a structured, institute-wide adverse events analysis system. Center C: agree. Response: "It is absolutely necessary to check and register these events. Important for the quality of care." Plan: Depends on the staff. Sometimes they hide the information.
Suggestion: Measure staff satisfaction. Center C: agree. Response: "Staff has to be honest and not just provide the socially accepted answers." Plan: Regular discussions with staff, improve existing questionnaires.
Suggestion: Central complication registry may be useful. Center A: partially agree. Response: "Complication registration is mainly useful for healthcare professionals, current registration system allows health professionals to see the data important for them, per discipline. Central registration could be useful to annually analyze the results and look at the trends compared to trends in for example new patients. The national institute for Clinical Auditing registers complications as well on a national level." Plan: Create a system that can extract data from the existing system or develop a new registration system.
Suggestion: Implement Computerized Physician Order Entry. Center E: agree. Plan: Electronic prescriptions are currently being implemented: in the short term there will be 2 pilot actions for 2 departments. It is currently planned to include treatment details (chemotherapy data), transfusions and clinical trial participation. Center F: partially agree. Response: "This is an important and urgent objective, but unfortunately due to regional restrictions the institute cannot proactively proceed." Plan: Improvement of the electronic chart (e-chart): at regional level, the first attempt has been made within the region.
Suggestion: Assess and improve inpatient bed utilization. Center H: partially agree. Response: "Inpatient bed utilization is planned and regulated at regional level."

Limitations
Several assumptions underpinned this study. First, although we thoroughly searched the literature and existing quality assessments to identify indicators for the initial list, some suitable indicators may have been missed. Identifying suitable outcome indicators was more challenging than identifying, for example, process indicators, due to differences in case-mix, healthcare systems and financing. We tried to minimize this influence by including a large group of experts from various fields who had affinity with the development and management of cancer centers and with quality assessment in cancer care. We continuously modified the set of indicators in response to feedback from the pilot centers on their relevance, measurability and comparability. An advantage of this approach is that the indicators benchmark what the cancer centers want to know, which can increase adoption of the benchmark format as a tool for future quality improvement. Second, the tool was only tested once in eight European cancer centers. This makes it impossible to say whether the benchmark actually led to quality improvements. Consequently, future research should evaluate the implementation of improvement plans to investigate whether the benchmark actually leads to quality improvement. In addition, future inclusion of more centers will allow assessment of the actual discriminative capabilities of the indicator set. The benchmark tool was successfully applied in eight European countries with different wealth status. Although differences in healthcare systems and social legislation unavoidably led to differences in the nature and availability of data, comparison still revealed relevant and valuable recommendations for all centers. We mainly achieved this by correcting for size, case-mix and type of healthcare reimbursement.
Finally, due to the extensive scope of indicators, it was difficult to go into detail for each topic. A benchmark focused on a single domain would yield more profound information and more specific improvement suggestions and good practices. Future research is therefore advised to focus on specific domains of the BENCH-CAN framework, such as strategy and effectiveness, to gain a more profound understanding of the processes behind the performance differences, enabling a better comparison and more applied improvement recommendations.

Lessons learned
Multiple lessons were learned from benchmarking cancer care in specialized centers throughout Europe. First, representatives of the pilot centers indicated that international projects such as these can increase awareness that performance can be improved and promote the notion that countries and centers can learn from each other. Identifying successful or good-practice approaches can assist hospitals in improving their services and reduce inequalities in care provision, raising the level of oncologic services across countries. Pilot centers did, however, indicate that they could not implement all suggestions or good practices due to socio-economic circumstances. Second, learning through peers enabled cancer centers to improve their performance and efficiency without investing in developing these processes separately. A frequently mentioned comment was the casual, non-competitive atmosphere, which led to an open collaboration. Involvement of key stakeholders from the centers at the start of the benchmark is highly recommended to develop interest, strengthen commitment, and ensure sufficient resources, which not only accommodates a successful benchmark but also ensures implementation of the lessons learned.
From our earlier review on benchmarking [25], we learned that research on benchmarking as a tool to improve hospital processes and quality is limited. The majority of the articles found in that review [25] lacked a structured design, were mostly focused on indicator development and did not report on benchmark outcomes. In this study we used a structured design, reported the benchmark outcomes and contributed to the knowledge base of benchmarking in practice. Although improvement suggestions were made, within the scope of the study we could not report on the effect of the improvement suggestions. This reinforces the need for further research and evidence generation, especially on the effectiveness of benchmarking as a tool for quality improvement, particularly in terms of patient outcomes and learning from good practices.

Conclusion
In conclusion, we successfully developed and piloted a benchmark tool for cancer centers. This study generated more insight into the process of international benchmarking, providing cancer centers with common definitions, indicators and a tool to focus, compare and elaborate on organizational performance. Results of the benchmark exercise highlight the importance of an accurate description of underlying processes and of understanding the rationale behind these processes. The tool allowed comparison of inter-organizational performance in a wide range of domains, and improvement opportunities were identified. The tool and the improvement opportunities derived from it were positively evaluated by the participating cancer centers. Our tool enables cancer centers to improve quality and efficiency by learning from the good practices of their peers instead of reinventing the wheel.