Psychometric properties of the patient assessment of chronic illness care measure: acceptability, reliability and validity in United Kingdom patients with long-term conditions

Background: The Patient Assessment of Chronic Illness Care (PACIC) is a US measure of the quality of chronic illness care, based on the influential Chronic Care Model (CCM). It measures several aspects of care: patient activation; delivery system design and decision support; goal setting and tailoring; problem-solving and contextual counselling; and follow-up and coordination. Although evidence of the scale's utility is developing, there is little evidence about its performance in the United Kingdom (UK). We present preliminary data on the psychometric performance of the PACIC in a large sample of UK patients with long-term conditions.

Methods: We collected PACIC, demographic, clinical and quality of care data from patients with long-term conditions across 38 general practices, as part of a wider longitudinal study. We assess rates of missing data, present descriptive and distributional data, assess internal consistency, and test validity through confirmatory factor analysis and through associations between PACIC scores, patient characteristics and related measures.

Results: Rates of missing data were high on the PACIC (9.6%–15.9%), and higher than on other scales used in the same survey. Most PACIC subscales showed reasonable levels of internal consistency (alpha = 0.68–0.94), responses did not demonstrate high levels of skewness, and floor effects were more frequent (up to 30.4% on the follow-up and co-ordination subscale) than ceiling effects (generally <5%). The PACIC demonstrated preliminary evidence of validity against measures of long-term condition care. Confirmatory factor analysis suggested that the five-factor PACIC structure proposed by the scale developers did not fit the data: reporting separate factor scores may not always be appropriate.

Conclusions: The importance of improving care for long-term conditions means that the development and validation of measures is a priority. The PACIC scale has demonstrated potential utility in this regard, but further assessment is required to examine the low levels of completion of the scale, and to explore its performance in predicting outcomes and assessing the effects of interventions.

Background Improving the quality of care for long-term conditions is an international priority [1,2], which has led to significant focus on the delivery and evaluation of quality improvement activities such as provider and patient education [1], service redesign [3], use of technology [4], and financial incentives [5]. However, assessing the effects of these initiatives depends on acceptable, reliable and valid measures of quality. Although quality can be assessed from a number of perspectives, there is increasing agreement concerning the importance of the views of the patient [6].
Assessing the patient perspective generally requires self-report measures. To ensure their utility, measures must be subject to a significant programme of research to assess their acceptability to patients and their formal psychometric properties, including their use in contexts and populations different to those in which they were developed. In particular, where innovations in the management of long-term conditions cross national boundaries, measures of quality are needed that perform consistently in different health care settings to both support effective local policy implementation and to allow interpretable comparison of the performance of different health care systems worldwide.
The Patient Assessment of Chronic Illness Care (PACIC) is a United States (US) measure of quality of care for patients with a chronic illness [7]. The original PACIC includes 20 items and measures specific actions or qualities of care, based on the influential Chronic Care Model (CCM). The PACIC is designed around five subscales: (a) patient activation (b) delivery system design and decision support (c) goal setting and tailoring (d) problem-solving and contextual counselling (e) follow-up and coordination.
Although the scale was only published in 2005, the influence of the Chronic Care Model means that there is already a reasonable evidence base on the performance of the scale (see Table 1). The scale seems to be largely acceptable to patients with long-term conditions, with low levels of missing data [7-10], although some studies demonstrate skew and related floor and ceiling effects [11,12]. Most assessments suggest acceptable levels of internal consistency [7,9,11,13-15] and test-retest reliability [7,13,15]. Although the scale is based on a five-factor conceptual model, there is less consensus over the degree to which responses reflect this structure [9,11], with studies suggesting that two-factor or unidimensional structures may be a better fit to the data [8,9,14].
Validating scales such as the PACIC is a complex process. Although convergent validity with related scales (such as other patient self-report measures of quality) is useful [7,8], construct validity is difficult to establish because it is not clear exactly how factors such as age, sex, socioeconomic status and multimorbidity should relate to PACIC scores. Studies have demonstrated predicted relationships with measures of self-management behaviour [10,12] and self-rated health [11,14]. Studies relating the PACIC to 'harder' outcomes such as clinical parameters have generally had less success [9,13]. There are few prospective studies of the ability of the PACIC to predict outcomes over time.
The published literature on the PACIC includes studies from the US [7,9,10,12,13,15,18], Canada [23], Denmark [11], Germany [8,19,21], Holland [22], Australia [14] and New Zealand [16]. Many findings are consistent across health systems and populations. However, at the time of writing there is little evidence about the performance of PACIC in the United Kingdom (UK), despite the major initiatives (such as the Quality and Outcomes Framework) which have been implemented in this setting to improve care for long-term conditions.
We present preliminary data on the psychometric performance of the PACIC in a large sample of UK patients with long-term conditions, in terms of acceptability, reliability and validity. For acceptability, we explored rates of missing data and compared them with rates found in the international literature. In terms of reliability, we assessed internal consistency at the scale and subscale level. In terms of validity, we explored floor and ceiling effects, factor structure, and associations between the PACIC and other care quality outcomes measures to test predicted relationships.

Methods
Data were collected as part of a wider longitudinal cohort designed to assess the impact of 'care planning' and written 'care plans' on patient outcomes [24]. We identified patients on clinical registers with long-term conditions in practices with high levels of 'care plans' as reported in the General Practice Patient Survey [25], and recruited comparable patients in similar practices reporting lower levels of written care plans. The study was not designed to provide population estimates, but to create patient groups differing in rates of 'care plans' but similar in all other characteristics. However, the sample should be adequate for assessing psychometric characteristics and associations between variables. The current analysis uses baseline data from the cohort. The following measures were used.

PACIC
As noted previously, the original version of the PACIC used in the study includes 20 items based around five subscales: patient activation; delivery system design; goal setting; problem-solving and contextual counselling; and follow-up and co-ordination. Each item is rated on a five point scale (from 'almost never' to 'almost always') and subscale and total scores are based on average scores across items [7], with higher scores indicating higher quality of care. Item content is shown in Table 2. The scale was used without any major adaptation for a UK population, although 'chronic condition' was changed to 'long-term condition' as this is the more usual term used in the UK.

Demographic and clinical characteristics
We measured socio-demographic variables (age, gender, work, and education). We asked patients to self report long-term conditions from a list (including high blood pressure, chest complaints, diabetes, heart problems, chronic kidney disease, stroke, cancer, anxiety and depression, arthritis, stomach or bowel problems, skin conditions, vision or hearing problems, neurological problems, chronic fatigue, thyroid or other problems). Patients also reported the professional they consulted with most frequently for their long-term conditions (GP, practice nurse or other, including community nurse, hospital doctor or hospital nurse) and the number of primary care consultations in the last six months.
Measures of quality of care
(a) Shared decision making
We measured shared decision making using the Health Care Climate Questionnaire (HCCQ) [26,27]. The scale assesses patients' perceptions of the degree to which their health professional is 'autonomy supportive' as opposed to 'controlling' when providing health care. Each item is scored on a 7-point scale ranging from 'strongly disagree' to 'strongly agree'. We used the 6-item short form, which had an alpha of 0.8. Scale scores were recoded to 0–100 for descriptive analysis, although there was evidence of significant skew.

(b) Quality of care for long-term conditions
We used a six-item scale used in quality improvement activities in the UK (the QIPP scale), which assesses quality of care for long-term conditions with items relating to communication, patient involvement, information, support, co-ordination of care, and self-efficacy. Each item is scored on a 4-point scale, with a range of scale anchors, and the item scores are averaged to create an overall score.
(c) Satisfaction with primary care
We assessed satisfaction with primary care using a single-item 5-point scale (rated from 'very dissatisfied' to 'very satisfied') [25]. Satisfaction data were very highly skewed.

Analysis
(a) Acceptability
The PACIC scale was not translated or formally adapted for UK populations. One indicator of a measure's acceptability is how completely its items are answered, so we assessed acceptability for the UK population through completion rates and the extent of missing data. We computed missing data rates for items, subscales and the overall score. There are no published guidelines for dealing with missing values on the PACIC, so we adopted the arbitrary criterion that respondents must have completed at least 60% of the items on a subscale, or on the total scale, to be included in analyses.
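The 60% completion rule can be sketched as follows. This is a minimal illustration in Python with NumPy; the function name and the example items are ours, not part of any published PACIC scoring manual.

```python
import numpy as np

# Items are rated 1-5 ('almost never' to 'almost always'); np.nan marks a
# missing response. A subscale score is the mean of the completed items,
# but only if at least 60% of the subscale's items were answered.

def subscale_score(items, min_completion=0.6):
    """Mean of completed items, or None if too few items were answered."""
    items = np.asarray(items, dtype=float)
    completed = ~np.isnan(items)
    if completed.mean() < min_completion:
        return None          # respondent excluded from analyses of this scale
    return float(items[completed].mean())

# The follow-up and co-ordination subscale has five items:
print(subscale_score([1, 2, np.nan, 4, 5]))            # 4/5 answered -> 3.0
print(subscale_score([1, np.nan, np.nan, np.nan, 5]))  # 2/5 answered -> None
```

The same function applied to all 20 items yields the total score under the same criterion.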

(b) Reliability
As with many scales, multiple items are used to measure PACIC subscales, on the basis that several observations will lead to a more reliable measure. This is based on the assumption that items within a subscale are homogenous. We assessed the internal consistency of the PACIC by calculating Cronbach's alpha for the full PACIC scale and for each subscale.
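Cronbach's alpha compares the sum of the item variances with the variance of the total score; the more the items covary, the larger the total-score variance relative to the item variances, and the higher alpha. An illustrative implementation (assuming complete data on all items):

```python
import numpy as np

def cronbach_alpha(data):
    """Cronbach's alpha for a respondents x items array with no missing values."""
    data = np.asarray(data, dtype=float)
    k = data.shape[1]                            # number of items
    item_vars = data.var(axis=0, ddof=1)         # variance of each item
    total_var = data.sum(axis=1).var(ddof=1)     # variance of the summed score
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Two perfectly consistent items -> alpha = 1.0
print(cronbach_alpha([[1, 1], [2, 2], [3, 3]]))
```

In practice alpha would be computed separately for the full 20-item scale and for each subscale, on the respondents retained under the missing-data criterion.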

(c) Validity
We calculated the proportions of patients scoring at floor and ceiling for subscales and the overall scale, and explored the distribution of subscale and overall scores. Confirmatory factor analysis was used to test the hypothesised factor structure of the PACIC: a five latent-factor model of quality of care in which all latent factors were allowed to covary with one another [9]. Structural equation modelling, using AMOS (version 16.0), was used to fit and test the factor structure. We conducted two analyses. The first was a 'complete cases' analysis using only those respondents with full data on all 20 items; the second adopted a less restrictive criterion and included patients with missing data on three or fewer of the 20 items and no more than 50% of items missing on any of the five scales. STATA's method of multivariate normal regression was used to impute ratings for these cases. As the imputed data were non-integer and, in a few cases, outside the item scoring range, they were first rounded to the nearest integer and then recoded, where necessary, to the appropriate 'anchor' point. The method of maximum likelihood was adopted for parameter estimation: asymptotically distribution-free estimation was employed as a sensitivity analysis.
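The post-processing of the imputed ratings (round to the nearest integer, then recode out-of-range values to the nearest anchor) amounts to the following sketch; the helper name is ours, and the actual imputation was done in Stata rather than Python.

```python
import numpy as np

def snap_to_anchors(imputed, lo=1, hi=5):
    """Round imputed item ratings to the nearest integer and clamp them to
    the 1-5 response range. Note np.rint rounds exact halves to even."""
    return np.clip(np.rint(np.asarray(imputed, dtype=float)), lo, hi).astype(int)

print(snap_to_anchors([0.4, 2.6, 5.7]))  # -> [1 3 5]
```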
The published evidence was mixed with respect to likely associations with demographic characteristics (see Table 1). We made no specific hypotheses, but report differences in the scores of different groups using linear regression (in Stata version 11.0), taking account of the within-practice clustering. Due to skewness in the distribution of the overall PACIC scores, standard errors were calculated using a bootstrap method, free from parametric assumptions, using 10,000 bootstrap samples.
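The nonparametric bootstrap resamples respondents with replacement and takes the standard deviation of the statistic across resamples as its standard error, with no distributional assumptions. A simplified sketch for the mean of a single sample (the actual analysis bootstrapped regression coefficients and accounted for within-practice clustering, which this illustration ignores):

```python
import numpy as np

def bootstrap_se(scores, n_boot=10_000, seed=0):
    """Standard error of the sample mean via a nonparametric bootstrap."""
    rng = np.random.default_rng(seed)
    scores = np.asarray(scores, dtype=float)
    boot_means = np.array([
        rng.choice(scores, size=scores.size, replace=True).mean()
        for _ in range(n_boot)
    ])
    return boot_means.std(ddof=1)   # spread of the resampled means
```

For large samples this estimate converges to the usual analytic standard error when the data are well behaved, but remains valid under the skew observed here.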
To assess construct validity, we hypothesised significant associations with measures of shared decision-making, quality of care and satisfaction with primary care services (Table 1). We assessed these relationships using Spearman non-parametric correlations, in view of the skewed distributions in the measures.
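Spearman's correlation is Pearson's correlation computed on ranks, which is why it is robust to the skewed distributions noted above. An illustrative implementation, using average ranks for ties:

```python
import numpy as np

def rank(x):
    """Ranks from 1..n, with tied values assigned their average rank."""
    x = np.asarray(x, dtype=float)
    order = x.argsort()
    ranks = np.empty_like(x)
    ranks[order] = np.arange(1, len(x) + 1)
    for v in np.unique(x):            # average the ranks of tied values
        mask = x == v
        ranks[mask] = ranks[mask].mean()
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return np.corrcoef(rank(x), rank(y))[0, 1]
```

Any monotone transformation of either variable leaves the coefficient unchanged, so skew in the PACIC, HCCQ or satisfaction scores does not distort the association.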

Results
Responses were received from 2551 respondents (41%), although a small number with missing age and sex were removed for analysis, as were those who self reported no long-term conditions (despite being on a clinical register), leaving 2439 potential respondents for analysis (40%). Demographic and clinical data on respondents are provided in Table 3.

Acceptability
Missing data rates for the PACIC items were high (Table 2), ranging from 9.6% to 15.9% at an item level. Between 11.2% and 15.7% of subscales could not be calculated because less than 60% of items were completed and 14.6% of patients were missing a total score. Ceiling effects were generally under 5%, although significant proportions of patients scored at the floor for patient activation (20.9%), goal setting (14.2%), problem solving (14.7%) and follow-up and co-ordination (30.4%).

Descriptives
The total PACIC score showed a reasonable distribution of scores, with some positive skew. Most of the subscales were also positively skewed, most notably the goal setting and follow-up subscales. The mean overall PACIC score was 2.4 (SD 0.87), with subscale means of 2.5 for patient activation; 3.1 for delivery system design; 2.2 for goal setting; 2.5 for problem solving; and 1.9 for follow-up and co-ordination. The distribution of PACIC scores demonstrated more symmetry and smaller ceiling effects than the QIPP, HCCQ and satisfaction scores. Importantly, the distribution of the PACIC scores means the scale has much higher capacity to reflect positive changes in individual scores than the latter scales (see Additional files 1,2,3,4,5,6,7,8 and 9).
The intracluster correlation coefficient (ICC) for the total PACIC score was 0.040 (i.e. only 4% of the total variation in PACIC scores was due to differences in practice means, with the remaining 96% resulting from differences between patients), with subscale ICCs ranging from −0.042 to 0.029.
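For reference, the one-way ANOVA estimator behind this variance decomposition can be sketched as follows. This is an illustration assuming equal cluster (practice) sizes; note that this estimator can return small negative values, as seen for some subscales.

```python
import numpy as np

def icc_oneway(groups):
    """One-way ANOVA ICC(1) for equal-sized clusters.

    groups: list of per-practice score lists, all the same length."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    k = len(groups[0])                               # patients per practice
    group_means = np.array([g.mean() for g in groups])
    msb = k * group_means.var(ddof=1)                # between-practice mean square
    msw = np.mean([g.var(ddof=1) for g in groups])   # within-practice mean square
    return (msb - msw) / (msb + (k - 1) * msw)
```

All between-practice variation gives an ICC of 1; no between-practice structure pushes the estimate to zero or slightly below.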

Validity (structure)
The complete case analysis of the hypothesised PACIC factor structure utilised 75.7% of the sample (n = 1846). The model did not fit the data well according to most indices of fit (actual indices and conventional levels of 'good' fit are presented in Table 4). Although the Standardised Root Mean-Squared Residual indicated that on average, observed and predicted item variances and covariances were not too dissimilar, this masks a number of large differences on specific covariance terms. Inter-factor correlations were also generally high, ranging from 0.60 to 0.97 (between delivery system design and goal setting).
Using the less restrictive criteria a further subset of 194 patients (8%) with some missing data were added into the analysis, but the overall results in terms of indices of fit were similar (Table 4).

Validity (construct)
The high inter-correlations between PACIC subscales and the failure to confirm a five factor structure meant that analyses of construct validity focused on PACIC total scores only. Initial analysis explored associations with demographic characteristics. Females and patients aged 75 or more scored significantly lower on the total score (regression coefficients −0.18 and −0.20 respectively). The impact of increasing numbers of conditions and greater contact with a general practitioner was inconsistent. There was no association between scores and the professional most responsible for care of the long-term condition. All these relationships accounted for around 1% of the variance in PACIC scores (Table 5).
In terms of construct validity (Table 6), PACIC total scores were significantly associated with the single item measure of patient satisfaction with primary care (Spearman's correlation 0.24) and demonstrated higher correlations with shared decision-making (Spearman's correlation 0.47) and quality of care (Spearman's correlation 0.54).
The results were not markedly different when the analyses were rerun on the imputed data set (N = 1973).

Discussion
As health policy makers focus on the challenges of care for long-term conditions, significant funding is being channelled towards quality improvement in care delivery, through changes to skill mix, staff training, new technologies, and financial incentives. The success of these quality improvement efforts are in part dependent on effective measures to track current standards and assess the effectiveness of interventions. Such measures can also ensure that policy and clinical interventions are perceived by patients to be making improvements to care. This study represented a preliminary test of the utility of the PACIC for this purpose in the UK.

Summary of the results
Scores on the PACIC showed some skew, but were generally reasonably well distributed, with few scales showing the high levels of skew that are sometimes evident on other patient-reported measures of primary care. However, the amount of missing data at the item, subscale and overall levels was relatively high. This was higher than in comparable PACIC studies in the literature, where rates (when reported) were around 3-5% [7-9,11]. It was also higher than rates found in other scales in the same survey; for example, the shorter QIPP had missing data rates of 3-5% at an item level.

Limitations of the study
As noted earlier, the study was designed as a longitudinal study to assess the potential benefits of care plans. The sample was not designed as a random sample of patients with long-term conditions, and the response rate, while in line with other published studies [28], does mean that the sample cannot be considered representative. It should function as a sample for preliminary assessment of the performance of the scale, although selective non-response (i.e. among more severely ill patients) may restrict range on some variables, which could in turn impact on estimated associations. Furthermore, care must be taken in interpreting descriptive data such as mean scores. We did not have data to estimate some important aspects of reliability and validity, including test-retest reliability, criterion validity (as there is no accepted 'gold standard') or responsiveness to change. Our assessment of acceptability was limited to missing item rates, and did not explore other aspects, such as patient views of the scale, time to complete the scale, or cultural acceptability [29].

Interpretation of the results
It is not clear why non-completion rates were so much higher than in comparable studies, and there is a lack of data to provide comparisons of patient characteristics (such as education and health literacy) in the current sample which might explain these high rates. Examination of the item content of the PACIC might suggest that some phrases (such as 'nutritionist') may be unfamiliar to some patients, and others (such as 'hard times') may be interpreted differently between UK and other populations. Informal discussions took place with some patients during administration of the survey, and those discussions suggested that some items on the PACIC may make assumptions about existing care in the UK which may be inappropriate. For example, question 1 is 'Asked about my ideas when we made a treatment plan', and question 8 asks patients if they were 'given a copy of my treatment plan'. The items represent reports of care, but the response options do not offer a 'not relevant' option, hence it is possible that some 'missing' responses reflect patients who felt that the question was irrelevant to their current care, rather than just representing activities that were infrequent, as evidence suggests that written treatment plans are not a consistent part of care for long-term conditions in the UK [24]. The current response format may thus be generating missing data that in fact reflect meaningful responses. It has been suggested that response scales may reasonably be modified to suit local context, and this might improve the performance of the scale in the UK, although at some potential cost in comparability across studies [30]. This issue requires further investigation, possibly using cognitive testing and other qualitative methods to make the scale more suitable for the UK population.
If the scores of respondents can be considered meaningful, it is interesting that the scores on the PACIC in the UK are relatively low. Some scales did show a high prevalence of scores at the floor, and the mean scores were generally lower than those reported in the wider literature. For example, the mean PACIC total score was 2.4, compared to 2.6 in patients in US primary care [7], 3.3 in depressed primary care patients in Germany [8], 2.7 in patients with osteoarthritis in German primary care [19], 3.0 in patients with CHD, hypertension or diabetes in Australian general practice [14] and 3.2 in Hispanics with diabetes in hospital ambulatory settings in the US [15]. Patient activation, follow-up and co-ordination and problem solving were particularly low in the current sample. Of course, there is a lack of data on calibration of the PACIC against other measures which would allow judgements of the clinical or policy significance of such differences, even if they were statistically significant. However, the low scores may seem surprising given the importance placed on structured delivery of care for long-term conditions through the Quality and Outcomes Framework, which has seen changes to skill mix, and increased use of information technology and protocols for monitoring patients and delivering standardised care in line with the Chronic Care Model [31]. Recent evidence suggests that patients with complex care needs in the UK rate their experience of a 'patient-centered medical home' (characterised by high access, professionals who know their medical history, and care coordination) higher than those in other countries [32]. However, there is evidence that the content of care has changed, with an increased focus on biomedicine and less on self management and psychosocial issues [33-35], and it is possible that the scores reflect that.
Generally, the PACIC subscales showed appropriate levels of internal reliability. We did not set an a priori criterion for reliability prior to the analysis, although our implicit assumption was that they should be between 0.7 and 0.9, in line with published convention [29]. Cronbach's alpha for Delivery system design was lower than for the other subscales (0.68 vs. >0.80). It should be noted that this pattern is consistent with data from other studies where reported [7,8,22] and as these other studies are from the USA, Germany and Holland, the lower reliability seems unlikely to reflect the UK health service.
As indicated in Table 1, studies have reported variable relationships with demographic and clinical variables. We found lower PACIC scores in females, while the bulk of studies report non-significant relationships [8,11,12,15], although that may reflect the higher level of power in the current analysis as the proportion of variance accounted for by gender was trivial. The same patterns were in evidence for relationships with age [8,11,12,15,20] and number of conditions [7,8,11,12,15]. In terms of validity, the PACIC showed the hypothesised associations with shared decision-making and assessments of quality of care and patient satisfaction. Global measures of satisfaction generally reflect patient assessments of interpersonal care, and it appears that PACIC is not simply reflecting the quality of the doctor-patient relationship or patients' liking for their doctor, as the associations are relatively low. The different distributions of scores indicate that PACIC has the potential to add value to the assessment of practice and professional performance.
The factor analysis suggested that the five factor structure was not supported by the data. Although further analysis might formally test alternative models of the relationships between items, calculating total PACIC scores based on all 20 items might be the most appropriate scoring method. It should be noted that maximum likelihood estimation is not considered to be the best method for use with ordinal data [36], as it was developed for continuous variables with a joint multivariate normal distribution. However, the large sample size, coupled with the knowledge that we are following applied measurement practice for this instrument (i.e. item scores are simply summed to form subscale and overall scores) justify its use here.
Some previous analyses have supported the five factor structure [7,20,22], although technical aspects of these analyses have been criticised [30]. Of course, as this is the first published assessment of the PACIC scale in the UK, the failure to confirm the factor structure may reflect characteristics of the service context and the patient population, such as the gap between the assumptions inherent in PACIC items and the experience of patients that was raised in the discussion of missing data above. If patient experience of care for long-term conditions is not effectively reflected in the PACIC items, a clear factor structure may be less likely to emerge.
More fundamentally, the appropriateness of factor analysis (and internal reliability estimates) has been questioned. Underlying these techniques is the assumption that responses to individual items are caused by an underlying construct [37]. Patients reporting inconsistent patterns across related items may not reflect instrument problems, but inconsistency in their experience of separate and distinct aspects of care. If this is the case, conventional assessments of factor structure and internal reliability may be less useful [30].
Although data are available in the baseline cohort, we have not reported associations between PACIC and patient health behaviour and health outcomes. We do not feel that these are correctly conceptualised as measures of validity for a single scale: rather, the association between quality of care and patient outcomes (and the importance of care quality compared to other drivers such as demography, socio-economic status and self-management behaviour) is a core empirical question for health services research and delivery [38]. The priority is to explore whether quality of care predicts outcomes over time, where evidence is far more limited. Our longitudinal survey is designed to allow this to be estimated prospectively and we will publish data in due course.

Conclusions
In summary, the study suggests that the use of PACIC may lead to relatively high levels of missing data among UK patients, although the reasons for that would benefit from further research. However, our analyses suggest reasonable levels of reliability and validity. The instrument also demonstrates a more symmetrical distribution than most patient-reported measures and a higher capacity to capture positive change, giving the scale (and the modified version currently proposed) considerable potential as a measure of the delivery of core components of care for long-term conditions in the UK.