In this study, four different strategies for constructing overall scores were assessed, and their characteristics compared to a global rating of quality of care.
With regard to our first research question, correlations between the individual quality indicators and each of the overall scores proved to be considerable, in contrast to the rather weak associations between the indicators and the global rating. This means that specific patient experiences are better reflected by the overall scores than by a global rating, which makes overall scores a more valid way of summarizing the survey data. It should be noted, however, that overall scores consist only of the scores actually reported by patients in the survey, whereas a global rating can be based on anything, including aspects of healthcare not mentioned in the survey. The correlations between the overall scores and the global rating proved to be about 0.7. This is considerable, but it is nevertheless safe to state that a single question about overall quality (i.e. the global rating) does not necessarily produce the same result as an overall score calculated from validated quality indicators.
The overall scores showed considerable discriminatory power, even more so than the global rating. As a result, the overall scores enable more rigorous differentiation of providers, which is an important finding for future quality assessment of healthcare providers. In line with earlier research, the discriminatory power of the overall scores also decreases the number of responses required to obtain reliable scores, compared to individual indicators [36–38]. The same applies, although to a lesser extent, to the global rating.
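To illustrate why greater discriminatory power lowers the number of responses required, the sketch below uses the Spearman-Brown relation between the ICC and the reliability of a provider mean. This relation is an assumption made for illustration and is not necessarily the calculation used in this study or in the cited work; the function names, the ICC values and the reliability target of 0.8 are hypothetical.

```python
from math import ceil


def reliability(n, icc):
    """Reliability of a provider mean based on n respondents (Spearman-Brown)."""
    return n * icc / (1 + (n - 1) * icc)


def required_n(icc, target=0.8):
    """Smallest number of respondents for which the reliability reaches the target."""
    exact = target * (1 - icc) / (icc * (1 - target))
    # Round before taking the ceiling to avoid floating-point overshoot.
    return ceil(round(exact, 9))


# Hypothetical ICC values: a higher ICC means providers differ more clearly.
for icc in (0.05, 0.10):
    n = required_n(icc)
    print(f"ICC={icc:.2f}: n={n}, reliability={reliability(n, icc):.3f}")
```

Under these assumed values, the score with the higher ICC reaches the same reliability with fewer respondents per provider.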
We found profound differences between rankings based on the overall scores we constructed and the ranking based on the global rating. A large part of these seemingly substantial differences in ranking was due to clustering of scores, in which case a negligible difference in score may yield a huge difference in ranking. However, we also illustrated that for some of the providers, the global rating yielded a substantially different result compared to the overall scores, suggesting that for these providers it does matter whether they are classified based on a global rating or an overall score.
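The effect of clustering on rankings can be made concrete with a small sketch. The provider names and scores below are invented for illustration and are not taken from the study data; `rank_positions` is a hypothetical helper.

```python
def rank_positions(scores):
    """Map each provider to its rank (1 = highest score)."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {name: pos for pos, name in enumerate(ordered, start=1)}


# Hypothetical scores: providers A to E are tightly clustered, F is clearly lower.
overall_score = {"A": 3.52, "B": 3.51, "C": 3.50, "D": 3.49, "E": 3.48, "F": 2.90}
global_rating = {"A": 8.01, "B": 8.05, "C": 8.03, "D": 8.04, "E": 8.02, "F": 6.50}

rank_overall = rank_positions(overall_score)
rank_global = rank_positions(global_rating)

for name in overall_score:
    shift = rank_global[name] - rank_overall[name]
    print(name, rank_overall[name], rank_global[name], shift)
```

In this invented example, provider A moves from first to fifth place although its global rating differs from the highest one by only 0.04, whereas provider F, whose scores are genuinely lower, keeps the same position in both rankings.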
The effort required to construct meaningful overall scores as an alternative to global ratings does not seem to be in vain; their advantages over using a global rating are clear. But which strategy for constructing an overall score should be preferred? In the past, many stakeholders have suggested the use of the Average Rating Overall Score as a way of summarizing the performance of healthcare providers, because the star ratings per quality indicator are already in place for reporting on CQ-index data. Even though it shows promising results (Table 3), the nature of this overall score construct severely limits the statistical analyses needed to compare it with other overall scores. The other overall scores are constructed by calculating an average over all indicators for each individual, which is then aggregated to a provider mean; i.e. these overall scores are an average of the scores of individual respondents. In contrast, the average star rating is essentially an average of the provider scores for each indicator. We believe the latter strategy to be unfavourable, because conventional statistical parameters such as ICCs cannot be calculated. In addition, the interpretation of standard errors and confidence intervals will be different, as these no longer depend on the number of individuals per provider but on the number of indicators being measured.
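The difference between the two aggregation routes can be sketched for a single provider as follows. The data are invented, the function names are hypothetical, and the conversion of indicator scores into star ratings is left out: plain indicator means stand in for the star ratings, so this is a simplified illustration rather than the procedure used in the study.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical data for one provider: rows are respondents, columns are indicators.
scores = [
    [3.0, 4.0, 2.0],
    [4.0, 4.0, 3.0],
    [2.0, 3.0, 1.0],
    [3.0, 5.0, 2.0],
]


def respondent_based(rows):
    """Average per respondent first, then aggregate to a provider mean."""
    per_respondent = [mean(row) for row in rows]
    n = len(per_respondent)
    # The standard error depends on the number of respondents.
    return mean(per_respondent), stdev(per_respondent) / sqrt(n), n


def indicator_based(rows):
    """Average per indicator first, then average the indicator scores."""
    per_indicator = [mean(col) for col in zip(*rows)]
    k = len(per_indicator)
    # The standard error now depends on the number of indicators.
    return mean(per_indicator), stdev(per_indicator) / sqrt(k), k


print(respondent_based(scores))
print(indicator_based(scores))
```

With complete data both routes give the same point estimate, but the spread is taken over four respondents in the first case and over three indicators in the second, which is why standard errors and confidence intervals have a different interpretation.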
When the three remaining strategies are compared, they seem to yield statistically similar results. The differences between providers are comparable (according to the calculated ICCs) and there are similar and substantial correlations with the individual indicators. The choice of the ‘best’ strategy among these three overall scores therefore does not seem to depend on either validity or discriminatory power, and so may be guided by practical considerations.
In this context, it is also valuable if the overall scores are easy to understand and to use for all stakeholders involved. The Average Overall Score strategy is the most straightforward to understand: it consists merely of averaging all the quality indicator scores. The other overall scores, however, require a considerable level of statistical literacy and need explanation. From this point of view, they are not to be preferred over the Average Overall Score. Therefore, the sound statistical basis and, above all, the practical arguments make the simple Average Overall Score the best choice.
Strengths and limitations
It is important to note that there is no ‘gold standard’ available for the measurement of patient experiences. Apart from the method used in our research, there are other possible ways of measuring a global rating, for instance using different wordings or a different scale. We cannot rule out the possibility that other methods of obtaining a global rating of care may lead to different outcomes. However, the way the global rating was measured in this research is the most commonly used strategy in patient surveys in the USA (CAHPS) and the Netherlands (CQ-index) [9, 19].
Many stakeholders favour a global rating as a way of summarizing the patients’ opinions on health care, for its simplicity. However, patient experience surveys are meant to cover all aspects of health care relevant to patients, health care providers and other stakeholders, and it has been shown that not all of these aspects are represented by a global rating. Since the present paper demonstrates that an overall score constructed from patient experiences represents the underlying health care aspects better than a global rating, an overall score seems at least as valid in summarizing patient experiences as the global rating, if not more so.
We thoroughly investigated the properties of four possible overall score constructs using a large dataset containing patient experiences of a quarter of all Dutch nursing homes. As a result, our findings should be fairly representative of the Dutch nursing home setting. Also, our data contained a large number of quality indicators, allowing us to assess the validity of the overall scores on many different aspects of healthcare.
The construction of quality indicators does involve a risk regarding nonresponse, however. Structural nonresponse on items with a notably high or low average score may influence quality indicator scores. If nonresponse differs between institutions, it may lead to unjustified differences on that particular quality indicator. The same goes, to a lesser extent, for the construction of the overall scores; we allowed for a maximum of three missing quality indicator scores at patient level. However, our stringent approach with regard to missing values on quality indicators made selective missing values on the overall scores at provider level highly unlikely. Should selective missingness nevertheless occur, fewer missing values per quality indicator or overall score should be allowed.
It is possible that the analysis of different survey data would yield different results. In other words, the specific properties of these overall score constructs have yet to be established for other patient surveys. Although differences between most of the constructs proved to be limited in our research, this may not be the case for other datasets, as is also shown in other studies [16, 37, 39, 40]. Also, there are a number of strategies for calculating overall scores that we have not included in this research. Well-known examples are ‘all-or-none’ (providers score a ‘1’ if they meet a certain quality criterion and a ‘0’ if they do not, after which all quality scores are summed) and the ‘percentage of success’ (percentage of quality criteria met) [39, 41]. But as the indicator scores from the current data can be considered as continuous variables, these and many other strategies were not applicable. However, we concede that there are other applicable construction methods that could have been considered for this study.
Based on our results, we would recommend the use of an overall score as a more valid and reliable alternative to the global rating in summarizing patient survey results. However, a few practical issues should be considered in using overall scores.
Firstly, it is important to bear in mind that constructing overall scores will inevitably lead to a certain amount of data reduction, thus obscuring details and maybe even differences between organizations from the original data. Overall scores oversimplify results and are only useful for rough comparisons [37, 42]. In our opinion, overall scores should not be presented as a substitute for individual indicator scores, but rather as a useful addition to survey results to provide a quick overview. For a more detailed picture, stakeholders may subsequently inspect the individual indicator scores; these show where specific differences between providers occur and which processes actually need improvement. This is also important in the case of individual indicators that do not seem to be reflected by an overall score.
Secondly, careless and uninformed use of (overall) scores may have serious consequences for healthcare organizations or individual healthcare providers if the scores are used for quality ranking. Finally, stakeholders may prefer one method of constructing overall scores over the others, based on their aims. It is even possible to combine different constructs. Although this is theoretically interesting, such complex constructs will make it more difficult for stakeholders to understand and interpret the overall scores.
In the end, constructing overall scores remains a great challenge, which needs to be handled with care [16, 36]. If the matters above are addressed, though, a well-defined overall score may present all stakeholders with a valid and reliable overall view of quality of care from the patients’ perspective.