The calculation of quality indicators for long term care facilities in 8 countries (SHELTER project)

Background: Performance indicators in the long term care sector are important to evaluate the efficiency and quality of care delivery. We are, however, still far from being able to refer to a common set of indicators at the European level. We therefore demonstrate the calculation of Long Term Care Facility Quality Indicators (LTCFQIs) from data of the European Services and Health for Elderly in Long TERm Care (SHELTER) project. We explain how risk factors are taken into account and show how LTC facilities at facility and country level can be compared on quality of care using thresholds and a Quality Indicator sum measure.
Methods: The indicators of Long Term Care Facility quality of care are calculated based on methods that have been developed in the US. The values of these Quality Indicators (QIs) are risk adjusted on the basis of covariates resulting from logistic regression analysis on each of the QIs. To enhance the comparison of QIs between facilities and countries we have used the method of percentile thresholds and developed a QI sum measure based on percentile outcomes.
Results: In SHELTER, data have been collected with the interRAI Long Term Care Facility instrument (interRAI-LTCF). The data came from LTC facilities in 7 European countries and Israel. The unadjusted values of the LTCF Quality Indicators differ considerably between facilities in the 8 countries. After risk adjustment the differences are less, but still considerable. Our QI sum measure facilitates the overall comparison of quality of care between facilities and countries.
Conclusions: With quality indicators based on assessments with the interRAI LTCF instrument, quality of care between LTC facilities in and across nations can be adequately compared.


Background
Quality of care is a complex, multi-dimensional concept. The US Institute of Medicine defines quality of care as "the degree to which health services for individuals and populations increase the likelihood of desired health outcomes which are consistent with current professional knowledge" (www.iom.edu). There is interest in the creation of performance indicators that can measure quality by examining the structure, process, and outcomes of care. One method of identifying potentially good and poor professional quality of care is the use of quality indicators, which can be defined as "markers that indicate either the presence or absence of potentially poor care practices or outcomes". The aim of using quality indicators is therefore to identify the clinical areas that can benefit from improvement of the care process and to define the performance of individual care providers [1].
Quality Indicators (QIs) for monitoring quality of care in nursing homes have been developed using assessment data from the widely implemented Resident Assessment Instrument (RAI) for Long Term Care [2][3][4]. Routine monitoring of these QIs led to QI reports being used for best practice comparison between nursing homes. A study commissioned by the US Centers for Medicare and Medicaid Services (CMS) demonstrated that the items from routine use of the RAI for Long Term Care in US nursing homes are reliable and that they can be used to stimulate improvement of care and for reporting to the general public [5,6]. For most of the QIs some risk adjustment is necessary to allow useful comparison between facilities [7,8]. Although the relationship between outcomes and good and bad care practices was not equally strong for all available QIs, 10 QIs had a sufficiently strong relationship with identifiable pro-active and responsive care practices. These QIs have been selected by CMS for periodic public reporting at facility level.
A four-step approach was used in the CMS commissioned development and validation of QIs for nursing homes.
1. Selecting indicators of professional quality of care. Using large datasets gathered from routine practice, focus groups discussed which assessment items or combination of items might indicate dimensions of quality of care (face validity). QIs then were defined together with the method for calculating numerator and denominator values (construct validity). To be useful the indicators must, in addition, show enough variance between facilities, have high enough prevalence, and show sensitivity to change when care practices change.
2. Correlating indicators with quality of care. Experts must agree that high scores (or low scores) on the indicators in a facility or agency correspond to bad (or good) quality of care. This was formalized by research that identified care practices that correlated well with indicator scores pro-actively (i.e. prevent problems) or responsively (i.e. remedy problems).
3. Identifying person level risk factors. Factors that legitimately increased or reduced the likelihood of an individual scoring on the indicators were identified by regression analysis of client characteristics as recorded in the assessment items.
4. Identifying service level bias. Service level bias (ascertainment bias) manifests itself in two related forms: service/facility admission practice, and staff competence in observation and recording. Nursing homes that admit a relatively large number of clients with some specific indicator problems often continued to score high on these indicators at follow-up, despite risk adjustment. When experts examined the practice of these services/facilities, the quality of care in these indicator areas was not necessarily poor. A Facility Admission Profile (FAP) covariate was defined to resolve this matter [5].
In two papers that came out of the European AgeD in HOme Care (ADHOC) study [9], the calculation of QIs for Home Care was explained and discussed [10,11].
In this paper, similarly, we aim to explain and discuss Quality Indicators for Long Term Care Facilities (LTCFQIs) based on interRAI LTCF assessments [12] and their calculation. To do that we specifically show results of the calculation at country level in the 8 countries participating in the European Services and Health for Elderly in Long TERm care (SHELTER) study [13].

The list of long term care facility quality indicators
The LTCFQIs shown in this paper were initially commissioned on contract by CMS [5]. The indicators were developed for use with the mandated MDS 2.0 assessments for Nursing Homes. We have 'translated' them to be used with the interRAI LTCF instrument [12]. Most could directly be calculated by substituting the MDS 2.0 items with corresponding LTCF items. On occasion some codes in the LTCF items needed to be collapsed to give the same code set as in MDS 2.0. For some of the indicators the conversion of items could not be done or was too complex. The indicators 'depressed/anxious mood worsening' and 'walking performance maintenance/improvement' were for that reason deleted from the list.
The results of the calculation of the LTCFQIs have been reported quarterly over the last 6 years to facilities in the Netherlands that used the interRAI LTCF, which appears to have had a positive effect on the quality of care in these facilities [14]. At the request of these facilities three QIs were added to the list: 'Anti-depressant use prevalence', 'Influenza vaccination prevalence' and 'Depression prevalence'. These QIs were borrowed from the Home Care Quality Indicators set [15] and are included in this study. Table 1 gives the list of the LTCFQIs, their name, numerator, denominator exclusions, and risk adjusters.
QI scores are derived from the individual item scores of the interRAI LTCF assessment as indicated in Table 1. They are calculated for the individual person (yes/no/not applicable) and summed per facility or any higher level of aggregation as a numerator/denominator ratio or percentage.
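The aggregation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout (a facility identifier paired with a flag) and the function name `facility_rates` are hypothetical and do not come from the SHELTER analysis files; "not applicable" residents are represented by `None` and are left out of the denominator.

```python
from collections import defaultdict

def facility_rates(records):
    """Aggregate person-level QI flags into facility-level ratios.

    records: iterable of (facility_id, flag) pairs, where flag is
    True (QI present), False (QI absent) or None (not applicable,
    i.e. the resident falls under a denominator exclusion).
    Returns {facility_id: (numerator, denominator, percentage)}.
    """
    numerator = defaultdict(int)
    denominator = defaultdict(int)
    for facility, flag in records:
        if flag is None:
            continue  # excluded residents count in neither total
        denominator[facility] += 1
        numerator[facility] += int(flag)
    return {
        f: (numerator[f], denominator[f], 100.0 * numerator[f] / denominator[f])
        for f in denominator
    }

# Hypothetical example data: two facilities, one excluded resident
example = [("A", True), ("A", False), ("A", None), ("B", True)]
rates = facility_rates(example)
```

The same function can be applied at any higher level of aggregation by passing a country code instead of a facility identifier.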
The CMS commissioned study identified risk factors, not necessarily under the control of the facility, that affect the prevalence of some quality indicators [5]. These risk factors are at the level of the individual resident and include differences in various characteristics of the resident, occasionally expressed as a scale value: activities of daily living (ADL) ability, measured by the ADL long form scale (ADL-lf) [16]; cognitive function, measured by the Cognitive Performance Scale (CPS) [17]; and depressed mood, measured by the Depression Rating Scale (DRS) [18].
The QIs are case-mix corrected by logistic regression analysis, with presence or absence of the QI

Design
The SHELTER study has a longitudinal design. Data of residents were collected at baseline, at 6 and at 12 months follow-up.

Population
The SHELTER data sample consists of Long Term Care Facility residents from 7 European countries, plus Israel [9]. In total 59 private and public LTCFs participated in the study. The number of participating facilities varied widely across the participating countries: 10 in the Czech Republic, 4 in Finland, 6 in France, 9 in Germany, 10 in Italy, 7 in Israel, 4 in the Netherlands, and 9 in England. The aim was to recruit on average 500 residents per country. At baseline data were collected from 4156 residents, at 6 months follow-up data  Table 1), were to be calculated from the last available assessment of a resident who had been in the facility at that time for 30+ days. The 15 incidence indicators (see Table 1) were to be calculated from the difference between an assessment and the previous assessment of that resident if available.

Data collection
Assessments were conducted by trained nurses, most of whom worked at the facilities included in the SHELTER study. All were trained in the use of the interRAI-LTCF by experienced trainers in a standardised two-day training programme. All trainees received an interRAI-LTCF manual and had access to a Clinical Assessment Protocol manual and additional training material. On the first day of training the trainees were given an explanation of the interRAI-LTCF and completed a case example from their case load. In the days that followed they completed an assessment on one or two actual care residents in their facility. On the second day of training those assessments were extensively discussed.

Analysis
We calculated the LTCFQIs (yes/no/not applicable) for all individuals in the SHELTER dataset [13]. We then entered the risk factors derived from the CMS study [5] as independent variables in a stepwise logistic regression analysis for each of the QIs (the dependent variable) and calculated the odds ratios for the risk factors in the SHELTER sample. Since assessment data from the SHELTER study were not necessarily from an initial assessment at admission of the individual to a long term care facility, we had no explicit intake data for most of the SHELTER residents. For most residents, therefore, Facility Admission Profile values were not available.
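To make the regression step concrete, the following minimal sketch fits a logistic regression of a QI flag on a single binary risk factor and converts the coefficient into an odds ratio. It is an assumption-laden illustration, not the SPSS procedure used in the study: it handles one covariate only, performs no stepwise selection, and the function name `fit_logistic` and the example counts are invented for the purpose.

```python
import math

def fit_logistic(xs, ys, iterations=25):
    """Fit logit P(y=1) = b0 + b1*x by Newton-Raphson; return (b0, b1)."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p          # gradient of the log-likelihood
            g1 += (y - p) * x
            h00 += w             # entries of the information matrix
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system H * delta = g
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical data: 50 residents with the risk factor (30 trigger the QI)
# and 50 without it (10 trigger the QI).
xs = [1] * 50 + [0] * 50
ys = [1] * 30 + [0] * 20 + [1] * 10 + [0] * 40
b0, b1 = fit_logistic(xs, ys)
odds_ratio = math.exp(b1)  # odds of the QI with vs. without the risk factor
```

For a single binary covariate the fitted odds ratio coincides with the cross-product ratio of the 2 x 2 table, here (30 x 40) / (20 x 10) = 6, which gives a quick check on the fit.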
We compared the unadjusted and risk adjusted individual LTCFQI values by country. Zimmerman showed that, for most QIs, aggregated facility scores below the 10th percentile (indicating 'better' care) and above the 75th and 90th percentiles (indicating 'worse' care) are useful for flagging facilities with potentially excellent and sub-standard quality of care, respectively [3].
We then constructed an aggregate QI measure for each country by assigning 1 point to every QI score at or above the 75th percentile and an extra point to every QI score at or above the 90th percentile. We used this aggregate score to compare the overall level of deficiency in quality of care of the 59 facilities and the 8 countries in SHELTER. A country or facility is only compared with other countries or facilities on an LTCFQI if the actual number of cases with a positive score on the QI or the predicted number of cases with a positive score is 5 or more. A ranking of the countries or facilities is possible by dividing the aggregate QI measure by the number of LTCFQIs for which a score was calculated.
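The scoring rule above can be sketched as follows. Several details are assumptions made for illustration only: the percentile definition (Python's `statistics.quantiles` with the inclusive method), the function and variable names, and the convention that a QI that does not meet the minimum of 5 cases is passed in as `None`. The paper does not state which percentile definition was used in SPSS or Excel.

```python
import statistics

def comparable(observed_cases, predicted_cases, minimum=5):
    """A unit is compared on a QI if observed or predicted cases reach the minimum."""
    return observed_cases >= minimum or predicted_cases >= minimum

def thresholds(scores):
    """Return the 75th and 90th percentile of the scores of all units (needs >= 2 scores)."""
    cuts = statistics.quantiles(scores, n=100, method="inclusive")
    return cuts[74], cuts[89]

def aggregate_measure(unit_scores, all_scores):
    """unit_scores: {qi: score or None}; all_scores: {qi: scores of all units}.

    Returns (points, points per calculated QI). One point is given for a
    score at or above the 75th percentile, one more at or above the 90th.
    """
    points = 0
    calculated = 0
    for qi, score in unit_scores.items():
        if score is None:
            continue  # QI not calculated for this unit (e.g. fewer than 5 cases)
        p75, p90 = thresholds(all_scores[qi])
        calculated += 1
        points += int(score >= p75) + int(score >= p90)
    ratio = points / calculated if calculated else None
    return points, ratio

# Hypothetical scores for ten units on two QIs
scores = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
all_scores = {"qi_a": scores, "qi_b": scores}
unit = {"qi_a": 100, "qi_b": 80, "qi_c": None}
points, ratio = aggregate_measure(unit, all_scores)
```

Dividing by the number of calculated QIs, as in the last step, keeps units with few calculable indicators from appearing better merely because fewer points were available to them.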
Analyses were conducted using SPSS and Microsoft Excel, see Appendix.
Research ethics approval for the SHELTER study was received for all participating countries and specifically from the following ethics committees: Residents were invited to take part in the study and were free to decline participation. Consent was obtained with assurance of data confidentiality.

Results
The adjusted LTCFQIs by country are shown in Table 2. For each LTCFQI, the best care (lowest LTCFQI score) country is shown light (green) and the worst care (highest LTCFQI score) country is shown dark (red). There is wide variation in the results of some of the LTCFQIs. For example, the LTCFQI "High Risk Behaviour problem prevalence" shows a range from 22% (France) to 61% (England) and the LTCFQI "Physical restraint use prevalence" a range from 1% (England) to 32% (Israel).
Additional file 1: Table S1 shows the aggregate LTCFQI measure for each country. The Czech Republic and Israel have by far the lowest summary scores of 5 and 6 ('best' quality of care), England the highest score of 33 ('worst' quality of care). The aggregate scores of the other countries are in between.
The LTCFQI scores of the 59 facilities can be calculated and presented in the same way. The outcomes for facilities, even in one country, show large differences. For example, the four facilities in Finland that participated in the SHELTER study produced the following results:

Discussion
In this paper we have explained and discussed the methods for calculating Long Term Care Facility Quality Indicators (LTCFQIs). We used the method to calculate the values of 39 such indicators on assessments from 59 LTC facilities in 7 European countries plus Israel that participated in the Services and Health for Elderly in Long TERm Care (SHELTER) project [13]. Even if the calculations are appropriate (see the discussion below), the results of the measurements cannot be considered representative of the eight countries. For cost reasons, recruitment in the SHELTER project was not aimed at representative samples for each country; only a limited number of facilities in each of the countries therefore participated. The results of individual facilities in a country can be quite different, as shown, as an example, for the four facilities from Finland. Our intention with this paper is to show that professional quality of care can be calculated and compared between locations. To obtain valid comparisons between countries, much larger samples of facilities per country need to be included. It is possible, however, that some of the outcomes per country will persist when samples are larger, as is predicted by the LTCF experts who co-authored this paper.
For the moment, the differences between individual facilities, within and across countries, are probably much more real. If so, what does that imply for an individual facility? This study enables a benchmark on quality of care. Feedback of the results can be used effectively to improve the quality of care in a facility, as has been shown by Boorsma et al. [14] in an RCT. Achieving this demands considerable effort: continued collection of good data, training, routine use of the assessment outcomes in care planning, and continued interest by the management of the facilities. The work environment of a Long Term Care Facility is complex, with its 24 × 7 hours of care, work shifts, various disciplines with their own manners, interaction with family and volunteers, complex and emotional care demands, and budget restraints. Besides, the real effort to improve quality of care needs to be made at the level of the wards, each with its own more or less independent management, work relations, manners and specific groups of residents. Most LTCFQIs, however, cannot be reliably calculated for wards, because of limited numerator and denominator numbers. Additional analysis by the management of the facility is therefore required to identify problems with quality of care on particular wards.
The validity of the measurements of the LTCFQIs depends first of all on the accuracy of the assessment. In our study the assessors were adequately trained at the start, so that we may assume that the baseline assessment was accurate and comparable between facilities in all of the countries [13]. At the time of the second assessment, approximately 6 months later, and the third at twelve months, knowledge may have decreased or increased. Facilities where assessments were completed by nurses who worked in these facilities may have employed new nurses who possibly have not received the same level of assessment training. In facilities where the assessments have been used in routine care planning, as intended, where computer output was distributed in a timely manner and where the results have been thoroughly discussed, nurses and their care teams have become more experienced and better at recognizing issues that are assessed with the interRAI LTCF instrument. In other facilities, where nothing or less than was intended has been done with the results of the assessment, the opposite can be true. It is even possible that in the latter facilities the second and third assessments have largely been copied from printouts of the baseline assessment. We suspect that some of the incidence LTCFQI outcomes may have suffered from this. Continued good quality of the assessments needs firm external and internal incentives, as has been shown in the US and Canada, where interRAI instruments have now been used on a large scale for some decades [19].
The calculation of QIs presumes that the facilities and their resident population are to a large extent comparable, i.e. in level of care, kind of care, kind of residents, focus of care. This has not necessarily been so in the SHELTER study. The person's living arrangement at the time of referral varied. In France, 45 percent of the residents came from a rehabilitation hospital. In England and the Czech Republic, most of the residents came from an acute hospital. In Italy, Israel and Germany the majority came from home. The number of residents in various Resource Utilization Groups, and length of stay [not presented in this paper], show that in some facilities (e.g., in Israel) most residents received extensive rehab, or stayed comparatively short (e.g., Czech Republic). Some of this is remedied by the risk adjustment, but likely not all.
In the calculation of the aggregate LTCFQI measure each QI has the same weight and the 75th and 90th percentile thresholds are equally important for each QI. These assumptions are obviously flawed. For some quality of care issues a number of QIs are available and for others maybe none. It is known to the authors that efforts are being made by interRAI to develop a more balanced aggregate measure on a subset of the QIs, including adding new QIs to the existing set that look at decline and improvement on particular issues simultaneously. Furthermore, the 75th and 90th percentiles are often useful, although not equally meaningful for each QI [3], but this has not been verified for all of the QIs presented in this paper.
A further remark on the calculation of the QIs concerns the frequency of the assessment and the Facility Admission Profile risk adjustment. In the original research for the development of the QIs [5,6], residents were assessed at least every 3 months. In SHELTER, follow-up assessments were scheduled at 6 and 12 months. This affects the results of the incidence QIs, but probably not to a large extent. The FAP values, however, which are used for risk adjustment, could in this study only be calculated when a resident was assessed at entry to the facility at baseline. This was rarely so, since facilities with long-stay residents were selected, which by definition have few new entries in a short period of time. And even so, risk adjustment through FAP values is not perfect.
Quality of life as experienced by the resident is as important as the calculation of professional quality of care and should, when possible, be measured and analyzed additionally to evaluate services delivered and prepare for enhanced services.

Conclusion
To conclude, the LTCFQIs are useful measures of professional quality of care in Long Term Care facilities. The indicators are based on the interRAI LTCF instrument, which is used worldwide. When the conditions for measurement are met, the indicators appropriately measure quality of care. We have shown how the facilities in the SHELTER study, and to a lesser extent the countries, can be compared with each other using the LTCFQIs.