A Rasch analysis of the Person-Centred Climate Questionnaire – staff version

Background Person-centred care is the bedrock of modern dementia services, yet the evidence-base to support its implementation is not firmly established. Research is hindered by a need for more robust measurement instruments. The 14-item Person-Centred Climate Questionnaire - Staff version (PCQ-S) is one of the most established scales and has promising measurement properties. However, its construction under classical test theory methods leaves question marks over its rigour and the need for evaluation under more modern testing procedures. Methods The PCQ-S was self-completed by nurses and other care staff working across nursing homes in 35 Swedish municipalities in 2013/14. A Rasch analysis was undertaken in RUMM2030 using a partial credit model suited to the Likert-type items. Three subscales of the PCQ-S were evaluated against common thresholds for overall fit to the Rasch model; ordering of category thresholds; unidimensionality; local dependency; targeting; and Differential Item Functioning. Three subscales were evaluated separately as unidimensional models and then combined as subtests into a single measure. Due to large number of respondents (n = 4381), two random sub-samples were drawn, with a satisfactory model established in the first (‘evaluation’) and confirmed in the second (‘validation’). Final item locations and a table converting raw scores to Rasch-transformed values were created using the full sample. Results All three subscales had disordered thresholds for some items, which were resolved by collapsing categories. The three subscales fit the assumptions of the Rasch model after the removal of two items, except for subscale 3, where there was evidence of local dependence between two items. By forming subtests, the 3 subscales were combined into a single Rasch model which had satisfactory fit statistics. The Rasch form of the instrument (PCQ-S-R) had an adequate but modest Person Separation Index (< 0.80) and some evidence of mistargeting due to a low number of ‘difficult-to-endorse’ items. Conclusions The PCQ-S-R has 12 items and can be used as a unidimensional scale with interval level properties, using the nomogram presented within this paper. The scale is reliable but has some inefficiencies due to too few high-end thresholds inhibiting discrimination amongst populations who already perceive that person-centred care is very good in their environment.


Background
Person-centredness is internationally regarded as an essential design principle underpinning modern dementia care services. Although tracing its roots to Rogerian psychotherapy [1], the seminal works of Tom Kitwood [2] are seen as defining the starting-point of person-centred dementia care, which seeks to address how perceptions of dementia can detrimentally affect a person's standing in relation to those around them [3]. Gerontological nursing has since produced a healthy stock of related conceptual advances, particularly in developing enabling care relationships [4][5][6]. Each share social constructionist perspectives of ageing [7,8] and are concerned with how identity, personhood and agency can be reinforced or compromised depending on the nature of the care environment [9]. Although there is no universally agreed definition, most articulations describe care based on a holistic understanding of a person's lived experiences; providing a care environment congruent with their preferences and that encourages expression of self; promoting their place as valued members of social relationships and networks; and tailoring support to the individual [3,5,10].
The rise of person-centredness in dementia care finds support in an encouraging, if not definitive, evidencebase. Experimental designs have associated personcentred approaches with reduced behavioural symptoms (particularly agitation) and use of neuroleptics with care home residents [11][12][13]; with observational and qualitative studies also having linked person-centredness with improved wellbeing and physical health outcomes [14], and benefits for care workers [15]. However, evidence is inconsistent, and there is a dearth of research exploring the importance of how person-centredness is best implemented. A lack of high quality measurement instruments has been widely highlighted as one impediment to rigorous research [16,17]. The most established measures, particularly Dementia Care Mapping [18], demands intensive recording of care interactions by specially-trained observers which is beyond the resources of many services and research groups. The need for robust questionnairebased instruments has been highlighted [17].

The Person-Centred Climate Questionnaire
The Person-Centred Climate Questionnaire is one of the most well-documented and widely tested scales available for evaluating the person-centred quality of the care environment within institutional settings [17,19]. It is based on an empirically-developed theory of how supportive care environments can protect personhood in the context of cognitive decline and beyond [19], and comprises 14 statements for respondents to assess their agreement with on a 6-point Likert scale (1 = No, I disagree completely, to 6 = Yes, I agree completely). The staff version (PCQ-S) is a proxy report version for use in care settings where an expected high prevalence of cognitive impairment/ dementia would inhibit self-report responses. Its 14 items are organized into three subscales -see Table 1 spanning safety, homeliness and community. The original PCQ-S was developed in Swedish, but empirically-tested English, Norwegian, Slovenian, Chinese and Korean versions have been published [20][21][22][23]. A patient-completed version (PCQ-P) is also available, although that is not the subject of the present article [24].
The strengths of the PCQ-S include its encouraging measurement properties, established across an array of empirical studies. Content validity has been supported by expert agreement methods [25] whilst repeated factor analytic studies in independent populations have supported a reasonably stable three factor structure [26]. Cronbach alpha for the subscales, and for the global score, are consistently estimated above 0.80even in other languages it has been translated into. Specific studies of reliability and cut-scores have been undertaken, providing greater utility for application in practice [27]. The PCQ-S is firmly established as a regular research tool in psychosocial studies of dementia care and its implications for patient welfare and staff satisfaction [28][29][30].
However, the PCQ-S was developed under classical test theory (CTT) which has been subject to increasing criticism as the appropriate framework for measurement instruments [31]. Four common limitations of CTT are highlighted here. First, the assumption that ordinal Likert-type items can be summed to form a measure with interval-level properties remains to be proven [32]. A significant leap of faith is necessary for the score of one point on an ordinally-constructed instrument to be assumed truly equal across the entire length of the scale. If this assumption is breached, parametric tests are not supported and even simple mathematical operations (e.g. mean scores) are then inappropriate [32,33]. Second, reliability estimates are commonly thought to be artificially inflated in the presence of locally dependent items; whereby the pursuit of high Cronbach alpha statistics causes near-equivalent items to be combined as though they were statistically independent [34]. Third, error is assumed to be constant across the distribution of measurement whereas it is likely to vary (and so be lesssuited, or demand larger samples, for some studies targeted at either end of the continuum). Finally, the procedures and justification for combining subscales into a single global score are not widely understood and so much research use multifactorial instruments as though they were unidimensional, despite having evidence to the contrary [35]. Rasch analysis has been proposed as a robust measurement paradigm [36]. Developed in education sciences as a means for measuring aptitude, Rasch analysis proposes that the likelihood of a person agreeing with a questionnaire statement is related to its 'difficulty'. That is, some items are easier to agree with than others, and do not equally convey the same information about quality. Thus, a positive response to a statement that very few other people agree with would likely suggest that the respondent occupies a higher position on the latent continuum (whereas a CTT scale pays no regard to item difficulty). Where a consistent hierarchical structure exists within the questionnaire (a probabilistic form of the Guttman pattern) Rasch analysis forms estimates of each respondent's location on the latent scale [34]. In the event that an instrument can demonstrate it satisfies a series of assumptions (see below), the resulting measure is assured interval-level properties suited for parametric hypothesis testing in research [34]. Furthermore, Rasch analysis permits detailed investigation of local dependence problems and the distribution of error across the continuum.
This paper aims to establish whether the PCQ-S conforms to Rasch assumptions and to provide a mechanism for researchers to convert raw scores to intervallevel Rasch scores.

Methods
This study uses cross-sectional data collected as part of the Umeå ageing and health research program in Sweden (U-Age) [37]. The PCQ-S was administered through a self-administered questionnaire distributed to nursing home staff between November 2013 and September 2014. Further detail is as follows.

Participants
The U-Age data collection consisted of nationwide randomly selected nursing homes and their residents. A total of 60 (of 290) Swedish municipalities were randomly selected for the project and of those 35 agreed to participate. Within these municipalities, nursing home managers were contacted by telephone and 188 of 202 invited nursing homes agreed to participate. No further attempts were made to approach non-participating municipalities or units to find their reasons for not participating. This study was based on data from 172 nursing homes, since 16 did not return data although they had agreed to participate. The sample comprised staff working within 548 units. Further sampling details are available elsewhere [37].

Analysis
Rasch analysis was implemented through a partial credit model [38] suitable for polytomous items, within RUMM2030 software. The objective of a Rasch analysis is to construct a scale from individual items and test its suitability for interval level measurement. Scale construction is possible by using a logistic function to relate a person's probability of answering an item using a given response category to their underlying position on the latent continuum. Scores are thus measured in logits. RUMM2030 enables inspection of key assumptions to be satisfied [34], specifically: 1. Overall fit: A χ 2 statistic assesses the overall fit to the Rasch model (against the null hypothesis of perfect fit). In addition, the distribution of standardised residuals for both persons and items should have a standard deviation no larger than 1.4. 2. Item and person fit: Individual person and item residuals should be as close to zero as possible. Residuals in excess of ±2.5 are considered potentially problematic. At the item level, large negative residuals indicate redundancy (akin to very high item-total correlations in CTT). These are evaluated within RUMM2030 using an F-test. 3. Threshold ordering: Each response category should be in the anticipated order and each should have the greatest likelihood of being chosen for some part of the latent continuum. 4. Local dependence: Items should not be correlated beyond that associated with the construct under measurement. Within RUMM2030, this is evaluated through the item residual correlation matrix, with those correlations larger than the mean + 0.20 regarded as problematic. 5. Unidimensionality: Rasch models require that only one construct is being measured. RUMM2030 conducts a principal components analysis of residuals, with negatively / positively loading items then separately used to estimate each respondent's location on the logit scale. Paired t-tests then estimate the significance of these differences. The proportion of t-tests reaching significance should not exceed 5%. 6. Reliability: The internal consistency is estimated using the Person Separation Index and Cronbach alpha. A PSI in excess of 0.70 is generally viewed as a minimum for research purposes. 7. Differential Item Functioning: In this study, we use a set of general personal and job-related characteristics (gender, age, qualification, and type of care setting) to evaluate whether some groups of respondents have a different likelihood of answering an item / category despite being located at the same point on the latent continuum. DIF may be uniform (a constant differential across the scale) or nonuniform (where differential varies across the scale) and is evaluated through an ANOVA.
Against a null hypothesis of perfect fit, the Rasch tests are known to be over-powered for large n (e.g. n > 400); that is, even acceptable levels of deviation from the Rasch assumptions are found to be statistically significant. Because a large sample was achieved in this study (described below), two 10% random subsamples were drawn without replacement from the full dataset, following the approach of Gibbons et al. [39]. Statistical tests confirmed that there were no significant differences in the characteristics of the two samples. Rasch analysis was first conducted on subsample 1 (the 'evaluation sample'), which was then reapplied in subsample 2 (the 'validation sample'); with the stability of fit indicators assessed in both applications. For the purpose of describing the final item locations, standard errors, and a nomogram for converting raw to logit scores, the model was re-estimated in the full sample.
The PCQ-S has been found in previous testing to be best represented by a three-factor structure (see above). Three separate scales were therefore constructed and separately tested. To form a summary scale from the three factors, 'subtests' were formed within RUMM2030 (see [38] for a similar example of this process). Under subtests, the items within each subscale are combined and entered as 'meta-items' within the Rasch model. Since subtests parcel-out dependency between items, this necessarily reduces internal consistency estimates. In the event that a single summary score formed of these subtests satisfies the Rasch fit assumptions, then this 'higher order' Rasch scale is able to resolve the problems caused by dependency between subsets of items.

Results
Of 6902 questionnaires distributed to staff, 4831 were returned representing a 70% response rate. The broad sample characteristics are presented in Table 2. Over 80% of the sample were registered nurses with others representing different grades of non-registered practitioners. Approximately a third were based in group living environments with the bulk of the sample drawn from nursing homes. The size of participating units ranged from 7 to 128 beds and included both general as well as special care units for dementia. As noted above, Rasch analysis was performed in 'evaluation' and 'validation' subsamples randomly drawn from this dataset.

Subscale 1: a climate of safety
Initial Rasch analysis of items 1-5 indicated a poor fit (p < 0.0001, see Table 3). There was evidence of disordered thresholds in four items, which were resolved by rescoring each by combining responses falling in the second and third categories. There appeared to be two additional causes of misfit. First, there were large residuals for items 1 and 5 and, second, evidence of local dependency between item 1 and 2 . Item 1 had a large negative residual indicating over-discrimination and redundancy and was the only item with a statistically significant Fstatistic. Upon its removal, the Rasch assumptions were met (χ 2 (20) = 26.112, p = 0.162) with items fitting appropriately, albeit with a slightly larger than expected residual standard deviation. These four items formed a unidimensional subscale, with under 3% of paired t-tests reaching significance. The thresholds for each item category are shown in accompanying category characteristics curves in Additional file 1. Robustness of fit was tested by re-examining these four items in the validation sample, with similar results being achieved. No evidence of DIF was identified for gender, age group, qualification status, type of care setting or its size (full results from RUMM2030 for DIF analyses are provided as Additional file 2).

Subscale 2: a climate of everydayness
Analysis of items 6-10 indicated some misfit to the Rasch model as evidenced by a borderline significant χ 2 value and a large item residual standard deviation. Items 7,8 and 10 all required rescoring in the same form as for items in subscale 1 to resolve disordered thresholds. A potential source of misfit was due to local dependence between items 8 and 9 indicating shared variation beyond the latent trait under measurement. Item 8 was on the threshold of significant misfit, and so was removed. The reduced scale had a non-significant χ 2 value, was unidimensional and free from local dependency. Applying the same model to the validation sample gave satisfactory results. As for subscale 1, no evidence of (uniform or non-uniform) DIF was identified for any variables tested.

Subscale 3: a climate of community
Analysis of items 11-14 revealed good fit to the Rasch model including a non-significant χ 2 value, with satisfactory distribution of residuals and was unidimensional. All thresholds were ordered without need for rescoring. Analysis of residual correlations indicated some evidence of local dependence between items 13 and 14 (a place where it is easy for patients to talk to staff/a place where patients have someone to talk to). The residual correlation was 0.29 greater than the mean correlation in the matrix. However, removing or combining the items caused other fit difficulties and so the two separate items were kept. No evidence of DIF was identified.

Summary scale
The 12 items forming the three subscales (referred to hereafter as the PCQ-S-R, denoting the Rasch version) were then combined in a single model, showing good fit to Rasch assumptions except for its (anticipated) multidimensionality. The three subscales were then formed as 'testlets' within RUMM2030 and re-estimated, showing generally good fit, as evidenced by non-significant χ 2 value and non-significant test of unidimensionality once subscales were accounted for. The person separation index for the summary scale was 0.776, which was above the required minimum levels for research purposes (= 0.70). Re-estimation within the validation sample supported these findings (see Table 3). Table 4 presents the item level results for the final model. The PCQ-S-R was then estimated for the whole sample. Figure 1 presents the distribution of respondents and item thresholds and indicates targeting problems. With item thresholds anchored with a mean of zero logits (lower panel) the mean of the person distribution was 1.10 logits (standard deviation of 0.86), with 7.6% of respondents at extreme values. A more efficient scale would have questions that were less often affirmed in the sample. Finally, Table 5 presents a nomogram enabling researchers to convert ordinal raw scores to metric logit scores (and rescaled logit scores matching the range of the raw score).

Discussion
Amid clarion calls to improve measurement in personcentred care [40], this paper sought to bolster the quality and rigour of one such instrument through application of Rasch analysis to the PCQ-S. The PCQ-S is one of the most widely used questionnaire-based measures of person-centredness in dementia research, being translated into multiple languages and growing evidence of its measurement properties [17]. However, all current work has used classical test theory, leaving the PCQ-S open to concerns over how Likert-type items are simply summed to form the measure. This new research found that a 12-item version, labelled the PCQ-S-R, broadly satisfied the assumptions of the Rasch model by forming subscales from the three separate factors. A notable strength of the analysis is the large sample size on which it is drawn. By using the nomogram, researchers using the PCQ-S-R can be satisfied that any subsequent analysis would be of interval-level scores. A further advance of the PCQ-S-R is that a single score can be used to accurately represent respondents' perceptions of person-centredness, rather than relying on three distinct but correlated subscales. Unless a particular dimension is the subject of attention, combining the three scales into a global measure of personcentredness would be a researcher's preference. Although it is commonplace to sum subscale scores under CTT into a global score, few applications have formally assessed (e.g. through bifactor modelling) how appropriate this would be for the construct of interest. Not all subscales comprise sufficient common variance to justify aggregation [41]. However, the PCQ-S-R, formed of three subscales, passes the Rasch assumptions.
An important feature of the PCQ-S lies in its spread of content across themes that resonate strongly with person-centred literature. However, the response patterns for two items did not accord with Rasch expectations, and so were removed in the PCQ-S-R. Item 1 ('a place where I feel welcome') was removed, which might be of some concern given this has been identified as an important aspect of service user and carer experience, at least in hospital settings [42]. Arguably, the notion of 'feeling welcome' is pertinent in joining new environments that is not one's own, and may be less suited to long-term residential settings where some will have been resident for many months and years. It is therefore plausible that other items more accurately capture the essence of what was intended, as is indicated by the Rasch analysis. Similarly, the Rasch analysis found evidence of local dependency between item 8 ('a place that is quiet and peaceful') and item 9 ('a place to get unpleasant thoughts out of your head'). Presumably respondents considered that these were tautological, and therefore the removal of item 8 would not be a considerable loss to the content validity of the scale. Additional research interviews with respondents to the scale would be useful to explore these redundancy issues further.
Rasch analysis has the added advantage of investigating the efficiency of a scale. The results suggest that the PCQ-S suffers from mistargeting. Many of the Likert thresholds at the lower end of the spectrum contribute little information, since so few individuals report care quality that is so poor. By contrast, at the higher quality end of the spectrum, too few thresholds mean that it is challenging to discriminate between respondents' perceptions. The scale is therefore less-suited for monitoring change in services where person-centredness quality is already strong. Ideally, the PCQ would contain more items that, to be endorsed, would require even higher standards of person-centredness. Targeted qualitative work with those already perceiving that services are of a high quality could help to identify more 'difficult' standards for inclusion in the PCQ. In the future, further items could potentially form an item bank for use within computer adaptive testing (whereby questions are tailored during the test depending on earlier responses, to identify more accurate estimates of the phenomenon under measurement.) The analysis is not without its limitations. First, the analysis relates to the Swedish language version of the PCQ-S, and it cannot be assumed that the same conclusions would apply to other versions. The challenges in creating semantically and culturally equivalent scales are well known [43], and formal analysis would require parallel application using other language versions. Second, the sample is restricted to residential settings and there can be no guarantees that the same findings would have been achieved from hospital-based respondents. That said, there is some reassurance from the absence of any Differential Item Functioning between the nursing homes and other settings within the sample. Finally, it is worth recalling that the PCQ-S relies on staff reports and these may differ from independent ratings of person-centredness in any given facility. Arguably, on average, staff will report more positive views of person-centredness within their facility than outside observers. Ideally these questions require improvement to clarify their distinction and this should be the focus of future research.

Conclusions
The PCQ-S-R is a 12-item scale that can be used to appraise the person-centredness in dementia care settings. The research represents a significant advance since the questionnaire can now be said to have been examined against the rigorous assumptions of the Rasch model, and can be more confidently analysed using parametric statistical procedures. Furthermore, the research offers a means for correctly calculating a single global score rather than three separate subscales. Yet some improvements are still required. Specifically, the scale is mistargeted, meaning that it may not be sensitive to change at higher levels of person-centred quality, and further research could explore and refine two items that may still be locally dependent.