Harvesting the wisdom of the crowd: using online ratings to explore care experiences in regions

Background Regional population health management (PHM) initiatives need an understanding of regional patient experiences to improve their services. Websites that gather patient ratings have become common and could be a helpful tool in this effort. Therefore, this study explores whether unsolicited online ratings can provide insight into (differences in) patient’s experiences at a (regional) population level. Methods Unsolicited online ratings from the Dutch website Zorgkaart Nederland (year = 2008–2017) were used. Patients rated their care providers on six dimensions from 1 to 10 and these ratings were geographically aggregated based on nine PHM regions. Distributions were explored between regions. Multilevel analyses per provider category, which produced Intraclass Correlation Coefficients (ICC), were performed to determine clustering of ratings of providers located within regions. If ratings were clustered, then this would indicate that differences found between regions could be attributed to regional characteristics (e.g. demographics or regional policy). Results In the nine regions, 70,889 ratings covering 4100 care providers were available. Overall, average regional scores (range = 8.3–8.6) showed significant albeit small differences. Multilevel analyses indicated little clustering between unsolicited provider ratings within regions, as the regional level ICCs were low (ICC pioneer site < 0.01). At the provider level, all ICCs were above 0.11, which showed that ratings were clustered. Conclusions Unsolicited online provider-based ratings are able to discern (small) differences between regions, similar to solicited data. However, these differences could not be attributed to the regional level, making unsolicited ratings not useful for overall regional policy evaluations. At the provider level, ratings can be used by regions to identify under-performing providers within their regions.


Background
Regional Population health Management (PHM) initiatives are challenged to evaluate regional patient experiences to improve their health and social services. These initiatives have been increasingly widening their focus from individuals to populations [1,2] to deal with the changing care demand. Their intent is often to achieve the Triple Aim; i.e. simultaneously improve population health and the experienced quality of care, while reducing costs [3]. This, combined with a more general focus in care on how patients' experience care [4,5], makes it essential for PHM initiatives to evaluate regional patient experiences.
Many reforms struggle with evaluating population level experienced quality of care to assess their regional policies, resulting in a variety of methods used [6,7]. Currently, solicited surveys are dominant, with well-known examples such as the Hospital Consumer Assessment of Healthcare Providers and systems (HCAHPS) and the NHS Inpatient Survey [8,9]. Notwithstanding the value of solicited surveys, they often have substantial downsides such as the significant time lag between measurement and publication, low response rates and high costs to deploy [10,11]. A potential other source might be found in websites that gather unsolicited online ratings. Analogue to other sectors, where consumers are getting used to voicing their experiences online using for example Yelp.com [12] and other social networks such as Twitter and Facebook [10], patients are increasingly sharing their experiences on the internet [13,14]. Patients can often rate their care experiences on general websites like Yelp or Facebook, as well as specialized websites such as RateMDs and HealthGrades.
Unsolicited online patient ratings have shown promise for creating insight into experienced quality of care at the provider level. At this level, they seem to be especially useful as an additional perspective or source of information complementing solicited surveys [11,15,16] or as a more real-time alternative [17]. In the Netherlands, the most widely used patient rating website is ZorgkaartNederland.nl (Dutch Care Map, ZKN) [18], which is run by the Dutch Patient Federation (DPF). Since 2007, over 500,000 experiences with different individual care providers, hospitals and other care institution were shared by patients on the ZKN website. Their experiences might prove valuable for policy makers as it could compile close to real-time information regarding progress on one of the pillars of the Triple Aim and might be of value for comparisons between regions. However, the extent to which these provider level online ratings can be used to create overall insight for (regional) population level policies is unclear. If combined, they could be used to measure overall regional quality of care and aid regional policy evaluations in a relatively simple and cost-effective manner. Therefore, this study aimed to explore whether unsolicited online provider ratings can be used to create insight into differences in patient experiences between as well as within regions. To asses this, the structure and regional coherence of online unsolicited ratings will be studied. Additionally, a comparison will be made between the results of unsolicited ratings and the dominant method, solicited surveys, in nine regions to explore their differences.

Study population
In 2013 the Dutch minister of Health appointed nine regions as pioneer sites because of their goal to implement regional policies according to the Triple Aim [19]. These pioneer sites are demarcated geographical areas in which different organizations work together to achieve this goal [20]. Each site has their own approach with different organizations involved, such as hospitals, municipalities or insurance companies. They are monitored by the National Institute for Public Health and the Environment in the so-called National Monitor Population management (NMP) and are spread out across the Netherlands. Overall, about 2 million people live in these regions, but the size as well as the characteristics of the population in each region varies ( Table 3 in Appendix 1).

Data sources
Two data sources were used in this study; the primary focus were the unsolicited online patient ratings provided by the DPF, while the solicited survey data provided by the NMP was used predominantly for comparative reasons.
The unsolicited online patient ratings were derived from the www.ZorgkaartNederland.nl (ZKN) website, which was made available by the DPF. On this website, patients can both give and see reviews. To add a review, patients first select a care provider, which can be a care professional like a specific GP or specialist, or an organization such as a hospital (department) or nursing home. Six ratings have to be given, ranging from 1 to 10, covering six quality of care dimensions. The dimensions differ depending on the category of provider that is selected (i.e. for hospital care they are appointments, accommodation, employees, listening, information and treatment). Additionally, there is a textbox where patients can explain their ratings and add other relevant comments as well as the condition they were treated for.
No further personal information about the respondent is requested, but a timestamp and email address is registered. The ZKN staff checks each submission for repeated entries, integrality and anomalies, and gives each one an identifier.
The solicited survey data used was provided by the NMP (Ethical Review Board number: EC-2014.39) [21]. In nine pioneer sites, a random sample of 600 insured adults per pioneer site (total = 5400) were invited to fill out the survey between December of 2014 and January of 2015 [21]. This yielded 2491 filled-out surveys (response rate 46.1%), around 300 per pioneer site. The average age was 55.7 years old and more than a quarter was highly educated ( Table 3 in Appendix 1). The solicited survey population has previously been described in more detail [22]. For this study only the following question was used: "On a scale from 0 to 10, where 0 is the worst possible care and 10 is the best possible care, which grade would you give the total care you received in the past 12 months?" Thus, in this dataset only ratings given for overall care experiences were available.

Ratings
In addition to ratings given through the website, the DPF actively gathers ratings by visiting care providers. These visits predominantly focus on residents of nursing homes and each is logged using a unique identifier. To distinguish between ratings given unsolicited and those gathered by the DPF, the number of occurrences of the unique identifier was checked. Identifiers that showed up more than 10 times were labeled as solicited, while others were labeled as unsolicited. A mean rating was calculated for each entry by averaging the six ratings provided. This combination was shown to provide a good summary of an entry [23]. Ratings and providers were clustered at the regional level using the nine pioneer sites' zip codes [24]. Additionally, a second set of regions was created for the below described sensitivity analyses. These regions were identified using zip codes based on nine regional initiatives that were not included in the NMP [25]. Furthermore, providers in the Zorgkaart data were grouped into the following categories: hospital care, nursing homes, general physicians (GP), insurer, birth care, pharmacy, physiotherapy, youth care, dental care and 'others'. In the survey data, the only alteration made was the combination of the 0 and 1 ratings to create scales that both have 10-points, running from 1 to 10. These ratings were already grouped by pioneer site.

Analyses
First, descriptive statistics were extracted for both the unsolicited online ratings and solicited survey data to explore the structures of both data sets. Rating frequencies and means were determined per pioneer site overall and for the online data these were also stratified by the largest provider categories; hospital, GPs, dental care and nursing home. To compare the two datasets, means, using independent t-test, and distributions were studied. Additionally, with each dataset an Analysis of Variance (ANOVA) was performed to test for differences between pioneer sites and Spearman's rho was determined to look at the correlation of mean scores based on both the unsolicited online ratings and solicited survey data.
Second, multilevel analyses were performed using the unsolicited online ratings based on three levels; 1) rating, 2) provider and 3) pioneer site, in order to gain insight in the regional clustering of ratings. If ratings were clustered, then this would indicate that found mean differences between regions could be attributed to characteristics of the regions. The year a rating was given was added as a fixed variable to adjust for changes over time, such as the introduction of population health policies. Using the three levels and the ratings as dependent variables, the model was run to determine intraclass correlation coefficients (ICC, range = 0-1). The ICC is a measure of similarity between values from the same group and provides insight into the clustering of, in this case, unsolicited ratings in regions and in providers. At which level this clustering is meaningful is assessed based on ratio of the between person (i.e. provider) variance and within person variance [26]. To have interpretable results, levels within a multilevel analyses have to be interpretable and similar, therefore models were tested per provider category.
Finally, a sensitivity analysis was added to assure that found results in the multilevel analyses were not due to region selection and would be comparable with other than the nine selected regions. Multilevel analyses were repeated with alternative regions for this purpose. SPSS 22 (SPSS Inc., Chicago, Illinois) and R Studio Version 0.99.441 for Windows (RStudio, Boston, Massachusetts) were used to perform the analyses described below. A p-value below 0.05 was considered significant in all analyses.

Ratings
The Zorgkaart database provided 449,263 unsolicited ratings, given by 208,047 unique identifiers. Of these unsolicited ratings, 70,800 were given by 31,260 identifiers to providers in the pioneer sites ( Table 2 in Appendix 1). Of the 25,616 care providers that received at least a single rating in Zorgkaart, 4100 were located in one of the nine pioneer sites ( Table 2 in Appendix 1). The total number of ratings varied strongly between pioneer sites, from n = 1451 (region Vitaal Vechtdal) to n = 17,953 (region SmZ). However, the population size of the pioneer sites also varied substantially, from 106,270 in GoedLeven to 646,910 in Friesland Voorop. If expressed in percentages against population size the number ratings varied from 1.3% (region Vitaal Vechtdal) to 3.7% (region PELGRIM). When further classified by category of care provider, it is shown that dental care, GP care, hospital care, nursing homes and physiotherapy had a substantial number of ratings available per pioneer site. Tables showing the distribution of ratings can be seen in Appendix 1.
Overall mean scores illustrated that when combining all ratings in a region, differences between regions were significant but small (ANOVA p < 0.001). As the limited range of Fig. 1 illustrates, mean unsolicited online ratings of pioneer sites were around 8.5 for each dimension as well as the mean overall scores. When ratings are broken down by care provider category, different patterns emerged. Different pioneer sites stand out, either positively or negatively, in different provider categories (Appendix 2).
When compared to solicited survey ratings, unsolicited online ratings were generally higher ( Table 4 in Appendix 1). When individual regions were compared on unsolicited online and solicited survey ratings, all differed significantly in their means (p < 0.05). The dispersion of both unsolicited online and solicited survey ratings were both skewed towards the positive, but online ratings slightly more so. For unsolicited online ratings, 8 and 10 were the most dominant ratings, while solicited survey ratings peaked at 7 and 8 out of 10. Comparing mean scores of pioneer sites based on each dataset showed an insignificant correlation (Spearman's rho = 0.42, p = 0.26). Performing relatively well in one dataset, did not mean a region would perform well in the other or vice versa.

Clustering
Multilevel analyses were only performed for dental care, GPs and nursing homes. These categories had both substantial numbers of ratings as well as rated care providers ( Table 2 in Appendix 1). Hospital care had sufficient ratings and individual providers as well, but was excluded as these were mostly given to one or two locations (hospitals) in a region.
The ICCs region from all categories were close to zero indicating there was little clustering between provider ratings within the same site ( Table 1). The ICC providers was substantially larger, which indicated that some variance was explained by actual differences between providers. When replacing the third level, pioneer sites, in the sensitivity analysis with the alternate PHM regions, these proportions did not change.

Discussion
This study explored whether unsolicited online provider based ratings can provide insight into patient experiences at a (regional) population level. The overall mean scores as well mean scores stratified by provider category (e.g. hospital (department) and GP practices) differed significantly between pioneer sites. However, most differences were small as they often only varied a few tenths on a 10-point scale. This in itself is not an issue, as care experiences might be comparable between each region, as similar small differences were found  in solicited survey. Unsolicited ratings did overall score higher. Multilevel analyses conducted using the unsolicited online ratings among GPs, dental care and nursing homes at the pioneer site level indicated there was little clustering of ratings between providers in the same region. This makes it difficult to attribute any found variation to regional (i.e. population) level differences. The provider level did show a meaningful clustering of ratings, suggesting differences could be explained by provider variation. Unsolicited online ratings cannot be used to gain insight in differences between regions for now. There appeared to be little clustering of experienced quality of care between providers in the same region. This lack of regional grouping of experienced quality can also be seen using other measures [27]. Currently, this limits the use of unsolicited online ratings as an evaluation tool for the regional level. Even though the goal was not to evaluate any specific policy, PHM initiatives in the Netherlands have only started implementing regional collaborations five years ago and many are still in the start-up stage. Regional policies have shown the potential to impact quality of care [28], but in the Netherlands, initiatives might require more time to have an impact.
When looking at ratings separated by the individual dimensions, notable patterns emerged. For example, Vitaal Vechtdal was rated the highest overall, but had a substantial dip in the accommodation dimension and was rated lowest in the dental care category. Similar interesting patterns could be seen in other pioneer sites and this, keeping the low ICCs in mind, could be insightful for both policy makers and providers. Furthermore, the dispersion in ratings between providers was substantial. This illustrates the variation of experienced quality of care of providers within a region; there are providers that are performing better than other providers. This is in line with previous studies, which showed that care providers in the same region differ in the care experience they deliver [27]. To be able to identify variations in providers is useful for regional policymakers, as it illustrates there is room for poorer scoring providers to be identified and improved. Ideally, by stimulating integration and cooperation within healthcare, overall experienced quality of care should improve and ratings should be more geographically coherent.
This study is the first to explore the use of unsolicited online care provider ratings at the regional level. Results are not yet consistent [16],but several studies show the potential correlation between ratings and hospital readmissions as well as other objective quality of care measures [15,29]. Furthermore, they can be used by care inspection agencies as an additional input source [18] and have shown to impact real world behavior of consumers [30]. However, online ratings and the used Zorgkaart data in particular have limitations that have to be considered. Patients can give more than one rating, making them not completely independent. However, correcting for this using a cross-classified multilevel model, which is not preferred as it is very skewed as most patients give one rating, did not show any different results. Additionally, Zorgkaarts' unsolicited ratings are given to providers and adjustments have to be made to be able to evaluate at population level. A more direct measure of general population level experienced quality of care would be preferable. Furthermore, the present number of ratings were insufficient for in-depth analyses for many provider categories in this study (e.g. insurance companies and disabled care). Additionally, several providers had only a few ratings available for the multilevel analysis. The Zorgkaart data showed that the frequency at which ratings are submitted has been increasing rapidly over the years and it is therefore expected that the low numbers issue will be solved over time. Zorgkaart, as do most rating sites, also has limited participant information for privacy reasons. This means it is impossible to correct for selection bias, while a younger, more tech-savvy population tends to provide ratings [31]. Finally, there was limited opportunity to connect the unsolicited dataset to solicited dataset. The solicited dataset did not target specific quality dimensions like Zorgkaart, which prevented a comparison or conclusions at this level. For regions, dimension specific information and comparisons could be useful.
For regional population evaluations, online ratings could be improved. First, it is worth performing a follow-up study in a few years to determine if the same conclusions can be drawn. By this time, more ratings are available and the initiatives will have had more time to form their interventions and have an impact. Second, an algorithm could be created that highlights poor performing or declining providers within a certain area for policy makers to faster identify and aid underperforming providers. Third, text comments accompanying ratings can provide an additional source of information [32] and could provide more detail for policy makers as well as providers [33]. Finally, to truly evaluate regional policies that go beyond healthcare, broader measures are required that cover preventive, well-being and social services. The current ratings are generally only focused on the quality of healthcare services. Expansion of current or the creation of new instruments would be needed to drastically improve their use for current regional health policies that go beyond clinical care.

Conclusions
The aim of this study was to assess the ability of unsolicited online ratings to provide insight into regional experienced quality of care. Currently, they have limited use for regional evaluations, because even the small differences found could not be attributed to regional characteristics. Providers did show meaningful clustering of ratings, highlighting the ability to identify under-performing providers and opportunities for regional policy.

Availability of data and materials
The Zorgkaart.nl data that supports the findings of this study are available from the Dutch Patient Federation but restrictions apply to the availability of these data, which were used under license for the current study, and so are not publicly available. Data are however available from the authors upon reasonable request and with permission of the Dutch Patient Federation.
The National Population Health Monitor dataset analysed during the current study are not publicly available to protect the privacy of the participants but are available from the authors upon reasonable request and with permission of the National Institute for Public Health.

Authors' contributions
Concept and design: RH, HD, MS, JS, DR and CB; acquisition, analysis, or interpretation of data: RH, HD, MS, DR and CB; drafting of the manuscript: RH; critical revision of the manuscript for important intellectual content: all authors; study supervision: HD, MS, DR and CB. All authors read and approved the final manuscript.
Ethics approval and consent to participate The Medical Research Involving Human Subjects Act (WMO) does not apply to this study, and official approval was not required [34]. Participants agreed to the terms of service of Zorgkaart Nederland, which states that their submission can be used anonymously for research purposes [35].

Consent for publication
Not applicable.