Measurement properties of the Health Literacy Questionnaire (HLQ) among older adults who present to the emergency department after a fall: a Rasch analysis

Background: Health literacy is an important concept associated with participation in preventive health initiatives, such as falls prevention programs. A comprehensive health literacy measurement tool, appropriate for this population, is required. The aim of this study was to evaluate the measurement properties of the Health Literacy Questionnaire (HLQ) in a cohort of older adults who presented to a hospital emergency department (ED) after a fall.
Methods: Older adults who presented to an ED after a fall had their health literacy assessed using the HLQ (n = 433). Data were collected as part of a multi-centre randomised controlled trial of a falls prevention program. Measurement properties of the HLQ were assessed using Rasch analysis.
Results: All nine scales of the HLQ were unidimensional, with good internal consistency reliability. No item bias was found for most items (43 of 44). A degree of overall misfit to the Rasch model was evident for six of the nine HLQ scales. The majority of misfit indicated content overlap between some items and does not compromise measurement. A measurement gap was identified for this cohort at mid to high HLQ score.
Conclusions: The HLQ demonstrated good measurement properties in a cohort of older adults who presented to an ED after a fall. The summation of the HLQ items within each scale, providing unbiased information on nine separate areas of health literacy, is supported. Clinicians, researchers and policy makers may have confidence using the HLQ scale scores to gain information about health literacy in older people presenting to the ED after a fall.
Trial registration: This study was registered with the Australian New Zealand Clinical Trials Registry, number ACTRN12614000336684 (27 March 2014).


Background
Falls represent the main cause of emergency department (ED) presentations for older adults [1]. However, participation in falls prevention activities following presentation to the ED with a fall is suboptimal [2]. Health literacy is an important concept associated with participation in preventive health initiatives [3]. Health literacy is defined as "the cognitive and social skills which determine the motivation and ability of individuals to gain access to, understand and use information in ways which promote and maintain good health" [4].
Adults with sub-optimal health literacy are less likely to participate in preventive health programs, such as falls prevention programs, possibly due to lack of understanding of health information and education provided [5]. Accurate measurement of health literacy prior to commencing a falls prevention program may guide clinicians to adapt provider-patient communication, such as provision of information related to falls risks and their management strategies, to match the patient's level of health literacy. This may lead to increased participation in falls prevention activities, potentially resulting in improved outcomes for these individuals.
A range of health literacy measurement tools are available. However, most tools do not reflect the multidimensional definition of health literacy, and predominantly focus on reading comprehension, pronunciation and numeracy [6,7]. The Health Literacy Questionnaire (HLQ) was developed to address the shortcomings of previous tools [8]. The HLQ comprises nine independent scales related to the understanding of, engagement with, and use of health services, from both an individual and organisational perspective.
The measurement properties of the HLQ have been explored in depth using predominantly classical test theory (CTT) approaches [8][9][10][11] and qualitative approaches [8,12]. The HLQ was originally validated using a sample from clinical, home and community care settings in Australia [8]. A highly restrictive 9-factor confirmatory factor analysis (CFA) model fitted satisfactorily, with the HLQ scales representing nine conceptually distinct areas of health literacy. Subsequent studies evaluating the psychometric properties of the HLQ, including German, Danish, and Slovakian versions, support these findings, with the HLQ demonstrating good model fit and reliability, as well as homogeneity of items within each of the HLQ scales [9][10][11][13]. Diverse cohorts were used in these studies, representing people with a range of health conditions who were receiving a variety of health services. A recent study evaluated the measurement properties of the initial version of the HLQ among people at risk of cardiovascular disease, using Rasch methods [14]. Similar to previous studies, the nine HLQ scales were found to measure nine separate constructs of health literacy with good internal consistency. Unclear distinction between some response categories in some HLQ scales was reported and the scales were deemed to be suboptimally targeted in relation to the particular cardiovascular cohort [14]. With the HLQ version used in this study, some disordered thresholds among items in scales 6 to 9 were observed. Kolarcik et al. observed this effect as well and subsequently improved the response options, which resulted in lower scores (better targeting) and improved model fit, with no disordered thresholds [13].
Rasch analysis is a modern and unique form of item response theory (IRT) [15]. It involves testing an outcome scale against a mathematical model that operationalises the key principles of good measurement [15][16][17]. Rasch analysis allows for a unified approach to evaluating several measurement issues, such as unidimensionality, local dependency, response category ordering, item bias and targeting, producing rich data that complements and adds to CTT approaches [15][16][17][18]. Rasch analysis is widely accepted as the standard for modern psychometric evaluations of outcome scales [15,19]. As such, this methodology was deemed to be the most appropriate for this study.
Previous studies provide robust evidence to guide the practical use of the HLQ among a variety of international community and clinical populations. However, the measurement properties of the HLQ have not previously been determined for older adults who have presented to an ED after a fall. The appropriateness of a tool may vary across settings; it is therefore imperative to analyse the HLQ in specific populations prior to applying the tool and interpreting scores [8,12]. The aim of this study was to use Rasch methods to evaluate the measurement properties of the HLQ in a cohort of older adults who presented to a hospital ED after a fall.

Design
This study was embedded within a multi-centre randomised controlled trial (RCT) of a patient-centred falls prevention program: RESPOND. RESPOND incorporates (1) a home-based assessment; (2) education, goal setting and telephone coaching for management of selected falls risk factors; and (3) healthcare provider communication and community linkage, delivered over 6 months [20]. Ethical approval was obtained from Alfred Health (HREC 439/13) and Royal Perth Hospital (REG 13-128), Monash University Human Research Ethics Committee (HREC) (MUHREC CF13/3869-2013001975) and Curtin University HREC (HR 43/2014).

Participants and setting
Adults aged between 60 and 90 years who presented at two Australian EDs with a fall, and had a planned discharge home within 72 h, were eligible to participate in the RESPOND trial [20]. Exclusion criteria were: current palliative care or terminal illness, requiring hands-on assistance to walk, needing an interpreter, a history of psychoses or social aggression, and cognitive impairment (Mini Mental State Examination (MMSE) <23) [21]. A total of 438 patients were recruited to the RESPOND RCT and completed the HLQ. Of these participants, five withdrew prior to completion of the trial. Data from the remaining 433 participants were used for this study.

Data collection
Demographic data were collected by members of the research team at the screening and recruitment phase at the participating hospitals, and the initial face-to-face assessment conducted at the participant's home. The home visit was planned to occur within two weeks of discharge from hospital [20]. The HLQ was self-administered by the participant either prior to or during the home visit.

The health literacy questionnaire (HLQ)
The HLQ comprises 44 items over nine independent scales, each representing a different element of the overall health literacy construct: (1) Feeling understood and supported by healthcare providers; (2) Having sufficient information to manage my health; (3) Actively managing my health; (4) Social support for health; (5) Appraisal of health information; (6) Ability to actively engage with healthcare providers; (7) Navigating the healthcare system; (8) Ability to find good health information; and (9) Understanding health information well enough to know what to do. There are four to six items in each scale. Depending upon the purpose of inquiry, the full instrument or selected scales can be used. The first five scales comprise items that ask the respondents to indicate their level of agreement on one of four response options (strongly disagree to strongly agree). The remaining scales (6-9) represent scales of self-reported capability and items within these scales are scored on one of five response options (cannot do; very difficult; quite difficult; quite easy; very easy). The full HLQ provides nine individual scores based on an average of the items within each of the nine scales. There is no overall total score for the HLQ as that could potentially mask individual needs in specific health literacy domains [22].
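The scoring rule described above (one score per scale, computed as the average of that scale's items, with no overall total) can be sketched as follows. This is a minimal illustration rather than the official HLQ scoring procedure: the numeric coding of responses (1 to 4 for the agreement scales, 1 to 5 for the capability scales), the function name `score_hlq` and the input layout are assumptions, and the per-scale item counts are taken from the scale sizes reported in the Analyses section.

```python
from statistics import mean

# Items per scale (four to six items each, 44 in total), per the scale sizes in the text.
ITEMS_PER_SCALE = {1: 4, 2: 4, 3: 5, 4: 5, 5: 5, 6: 5, 7: 6, 8: 5, 9: 5}

# Assumed numeric coding: agreement scales 1-5 use options 1..4,
# capability scales 6-9 use options 1..5.
MAX_OPTION = {scale: (4 if scale <= 5 else 5) for scale in ITEMS_PER_SCALE}


def score_hlq(responses):
    """Return {scale: mean item score} for the scales supplied.

    `responses` maps a scale number to the list of coded item responses
    for that scale. No overall total is computed, mirroring the HLQ design.
    """
    scores = {}
    for scale, items in responses.items():
        if len(items) != ITEMS_PER_SCALE[scale]:
            raise ValueError(f"scale {scale} expects {ITEMS_PER_SCALE[scale]} items")
        if any(not 1 <= r <= MAX_OPTION[scale] for r in items):
            raise ValueError(f"scale {scale} responses must be 1..{MAX_OPTION[scale]}")
        scores[scale] = mean(items)
    return scores


# Hypothetical respondent who completed only scales 1 and 7.
scores = score_hlq({1: [3, 4, 3, 4], 7: [4, 5, 3, 4, 4, 4]})
print(scores)
```

Because selected scales can be administered on their own, the sketch scores whichever scales are supplied. Handling of missing item responses is not specified in the text and is deliberately left out.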

Other measures
Socio-economic status (SES) was measured using The Index of Relative Socio-economic Advantage and Disadvantage (IRSAD) [23], a reliable and robust approach to assessing socio-economic status [24]. Data are based on participant postcodes and take into consideration socioeconomic factors such as income, education, employment, occupation and housing [23]. The 20% most advantaged, according to their IRSAD score, were considered to be a relatively high socio-economic group for the purpose of this study. The remaining participants were combined into a second group representing lower socio-economic status.
Private health insurance status and whether participants lived alone were self-reported (yes/no) at the time of the initial face-to-face assessment. Falls risk status was measured at the face-to-face interview using a reliable assessment tool: the Falls Risk for Older People - Community setting (FROP-Com) [25]. A FROP-Com score > 18 represented high falls risk [25].

Analyses
Descriptive statistics were used to profile the cohort using SPSS v22.0 (IBM Corporation, Armonk, New York). Rasch analysis was conducted using the partial credit model, as this allows the thresholds to vary for each of the individual items [26], using RUMM2030 software (RUMM Laboratory Pty Ltd., Perth, Australia). In order to determine whether the HLQ scales fit the Rasch model, response patterns to HLQ items were evaluated against the model's expectations [15]. Three statistics were considered to determine the degree of fit for each HLQ scale: overall fit; individual person fit; and individual item fit [15]. Adequate overall fit of the HLQ to the Rasch model was indicated by a non-significant Bonferroni adjusted Chi-square probability value [27] (p ≥ 0.0125 for four item scales (1 and 2); p ≥ 0.01 for five item scales (3, 4, 5, 6, 8 and 9); p ≥ 0.0083 for the six item scale (7)). Satisfactory overall item and individual fit for each scale was determined by a fit residual standard deviation (SD) value of ≤1.5 [27].
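The adjusted significance thresholds quoted above follow from dividing a family-wise alpha by the number of items in each scale. The sketch below reproduces that arithmetic; the starting alpha of 0.05 is an assumption inferred from the reported thresholds, and the helper names are illustrative only.

```python
ALPHA = 0.05  # assumed family-wise significance level, inferred from the reported thresholds


def bonferroni_alpha(n_items, alpha=ALPHA):
    """Bonferroni adjusted threshold for a scale with `n_items` items."""
    return alpha / n_items


def overall_fit_adequate(chi_square_p, n_items):
    """Overall fit is taken as adequate when the Chi-square p-value is non-significant."""
    return chi_square_p >= bonferroni_alpha(n_items)


for n_items in (4, 5, 6):
    print(n_items, round(bonferroni_alpha(n_items), 4))
```

For four, five and six item scales this gives 0.0125, 0.01 and 0.0083 respectively, matching the values used for scales 1 and 2, scales 3, 4, 5, 6, 8 and 9, and scale 7.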
Individual items were further analysed to determine whether or not each of the four to six items comprising the nine HLQ scales fit the Rasch model requirements. Individual item fit was indicated by two statistics: fit residual values; and Chi-square probability values [16]. Item fit residual values between −2.5 and 2.5 indicated adequate fit [28]. Values above this range (underfit) suggest deviation from the model; values below it (overfit) suggest that some items in the scale are similar to each other [26]. Consistent with overall fit, a non-significant Bonferroni adjusted Chi-square probability value (p > 0.0125 for scales 1 and 2; p > 0.01 for scales 3, 4, 5, 6, 8, and 9; and p > 0.0083 for scale 7) indicated adequate item fit [28].
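The two item-fit criteria can be combined into a simple screening rule, sketched below. The function name, the flag labels and the example values are hypothetical and are not taken from the study results; the family-wise alpha of 0.05 is an assumption inferred from the adjusted thresholds.

```python
def item_fit_flags(fit_residual, chi_square_p, n_items, alpha=0.05):
    """Return a list of fit concerns for one item; an empty list means adequate fit.

    fit_residual  item fit residual (adequate between -2.5 and 2.5)
    chi_square_p  item Chi-square probability value
    n_items       number of items in the item's scale (sets the Bonferroni threshold)
    alpha         assumed family-wise significance level
    """
    flags = []
    if fit_residual < -2.5:
        flags.append("overfit")      # item may overlap in content with other items
    elif fit_residual > 2.5:
        flags.append("underfit")     # responses deviate from model expectations
    if chi_square_p <= alpha / n_items:
        flags.append("chi_square")   # significant after Bonferroni adjustment
    return flags


# Illustrative values only.
print(item_fit_flags(-3.1, 0.20, 5))
print(item_fit_flags(0.4, 0.50, 4))
print(item_fit_flags(3.0, 0.001, 6))
```

Returning separate flags, rather than a single label, keeps the residual and Chi-square criteria distinct, since an item can fail one without failing the other.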
In addition to model fit the following measurement properties were analysed: unidimensionality; internal consistency reliability; response format; item bias; and targeting. Measurement properties analysed, their definitions, statistical tests used and criteria for assessment are summarised in Table 1.

Participant characteristics
The mean age of participants was 73 years, 55% were female, and 42% of participants lived alone. Most had private health insurance (61%), and most were of high SES (62%). Approximately one third (34%) were classified as being at high risk of falls. Participant characteristics and HLQ scores are presented in Table 2.

Rasch analysis
Three of the nine scales, (5) Appraisal of health information, (8) Ability to find good health information, and (9) Understanding health information well enough to know what to do, demonstrated adequate overall fit to the Rasch model, as indicated by a non-significant Bonferroni adjusted Chi-square probability value (p = 0.33, p = 0.02 and p = 0.05, respectively) (Table 3). The remaining scales demonstrated some degree of misfit between the data and the Rasch model (scales 1 and 2 p < 0.0125; scales 3, 4 and 6 p < 0.01; scale 7 p < 0.0083). The majority of item misfit, as determined by a negative item fit residual value below −2.5 (17 items), suggested overfit (Table 4). A further seven items (one item from each of scales 1, 2, 3, 4, 6, 7, and 8) demonstrated underfit with a Chi-square probability below the adjusted alpha value (scales 1 and 2 p < 0.0125; scales 3, 4, 6, and 8 p < 0.01; and scale 7 p < 0.0083) (Table 4).
Good person fit was demonstrated for the majority of the scales (1, 2, 6, 7, 8, and 9), with a person fit residual SD < 1.5 indicating that, overall, people responded to items as expected. Minor person misfit was shown across three of the nine scales: (3) Actively managing my health; (4) Social support for health; and (5) Appraisal of health information, with a person fit residual SD > 1.5 (Table 3). This suggests that some people responded in an unusual way to some items in these scales.
No item bias was evident for the majority of the HLQ items (43 out of 44), demonstrating that people with the same level of health literacy consistently responded to items in the same way, regardless of their gender or age group. Only one item, 'Get health information by yourself' from scale (8) Ability to find good health information, demonstrated item bias for gender, as indicated by a probability value below the Bonferroni adjusted probability value (p < 0.005). This means that males and females responded differently to each other despite having the same level of health literacy (nonuniform DIF) [16] (Fig. 1).
Overall, the response format was found to be satisfactory for the 'strongly disagree to strongly agree' scales (scales 1 to 5), as indicated by the near absence of disordered thresholds. Mild disordering was evident only in scale (4) Social support for health, for the following item: 'I have at least one person who can come to medical appointments with me'. Disordered thresholds predominantly occurred among the capability response categories (cannot do to very easy) for the following items: 'Discuss things with healthcare providers…' and 'Ask healthcare providers questions to get…' from scale (6) Ability to actively engage with healthcare providers; 'Find out what healthcare services you are…' from scale (7) Navigating the healthcare system; 'Find health information from several…', 'Get information about health so you are…', and 'Get health information by yourself' from scale (8) Ability to find good health information; and all items in scale (9) Understanding health information well enough to know what to do. On inspection of the category probability curves, the main issue participants had was choosing between 'very difficult' and 'quite difficult'. The HLQ authors, however, recently changed the capability response options (scales 6-9) to include elements of frequency as well as difficulty, and this was found to be better than the original options [13].

Table 1 Measurement properties analysed, with definitions and assessment criteria
Unidimensionality: … [18]. Local independence is an element of unidimensionality. This occurs where the response to one item is not dependent on the response to another item [18,26].
Internal consistency reliability: The degree to which items in each scale measure the same construct [16].
Response format: Whether or not participants are able to consistently choose a response category appropriate for their level of health literacy. The point between two response categories (such as strongly agree and agree) where either response is equally probable is known as a 'threshold' [28]. The absence of disordered thresholds on the category probability curve graphs indicates appropriate response format [34].
Item bias: Whether or not different subgroups within the sample respond differently to an item, despite having equal levels of health literacy [16,18]. This is measured using differential item functioning (DIF). Item bias for gender (male or female) and age group (60-75 and 76-90) was analysed.
Targeting: The degree to which the HLQ was appropriately targeted to the RESPOND cohort [16]. Targeting was evaluated through analysis of person-item distribution graphs [35]. The mean person location should approximate zero for a well targeted tool [16]. A positive person mean suggests that on the whole respondents found the scales easy to endorse. A negative person mean suggests that respondents found the scales difficult to endorse. A well targeted scale should see items spanning across the full range of individual person scores.
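To make the idea of a threshold and of threshold ordering concrete, the following sketch computes category probabilities under the standard partial credit model for a single item. It is an illustration of the model only, not a reproduction of the RUMM2030 estimation; the threshold values are hypothetical and the function names are assumptions.

```python
import math


def pcm_probabilities(theta, thresholds):
    """Category probabilities for one item under the partial credit model.

    theta       person location in logits
    thresholds  item thresholds in logits; k thresholds give k + 1 categories
    """
    # Exponent for category x is the sum of (theta - threshold_j) for j <= x,
    # with an exponent of zero for the lowest category.
    exponents = [0.0]
    for tau in thresholds:
        exponents.append(exponents[-1] + (theta - tau))
    largest = max(exponents)  # subtract the maximum for numerical stability
    weights = [math.exp(e - largest) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]


def thresholds_ordered(thresholds):
    """True when each threshold is higher than the one before it."""
    return all(a < b for a, b in zip(thresholds, thresholds[1:]))


# Hypothetical thresholds for a five-category capability item.
taus = [-2.0, -0.5, 1.0, 2.5]
probs = pcm_probabilities(-0.5, taus)
print([round(p, 3) for p in probs])
print(thresholds_ordered(taus), thresholds_ordered([-2.0, 0.5, -0.5, 2.5]))
```

When the person location equals a threshold, the two adjacent categories are equally probable, which is the definition of a threshold used here. When thresholds are disordered, at least one category is never the most probable response at any person location, which is how disordering appears on category probability curves.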
In terms of targeting, a positive mean person location for all nine scales (0.89-2.99) suggested that participants found some of the items easy to endorse. Person-item distribution graphs plot item difficulty and the person's level of health literacy along a common measure: logits. A logit is the unit of measurement that results when the Rasch model is used to transform raw scores from ordinal data to log odds ratios on a common scale [26]. The value of zero is allocated to the mean of the item difficulty [16,26]. There should be an even spread of HLQ items across the range of participants' health literacy levels. On inspection of these graphs there were no items matching participants' level of health literacy at approximately the one to two logit point (mid to high HLQ score) despite a number of participants at this ability level for each scale (Fig. 2).
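The logit scale can be illustrated with the log odds transformation and its inverse. The sketch below uses the dichotomous Rasch model as a deliberate simplification (HLQ items are polytomous), so it indicates only what a positive person location means relative to an item located at zero; the function names are assumptions.

```python
import math


def logit(p):
    """Log odds of a probability strictly between 0 and 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return math.log(p / (1.0 - p))


def rasch_probability(person_location, item_location):
    """Probability of endorsing a dichotomous item, with both locations in logits."""
    return 1.0 / (1.0 + math.exp(-(person_location - item_location)))


# Lowest and highest mean person locations reported across the nine scales,
# compared with an item at the mean item difficulty (zero logits).
for location in (0.89, 2.99):
    print(location, round(rasch_probability(location, 0.0), 2))
```

Under this simplification, mean person locations of 0.89 and 2.99 logits correspond to endorsement probabilities of roughly 0.71 and 0.95 for an item of average difficulty, consistent with respondents finding items easy to endorse.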

Discussion
This is the first study to assess the measurement properties of the HLQ among a cohort of older people who have presented to an ED after a fall. Health literacy is an important factor associated with participation in preventive health programs, such as falls prevention initiatives. Overall, the HLQ demonstrated good measurement properties. The summation of the HLQ items within each scale to provide scale summary scores, with each scale representing one distinct component of health literacy, is supported. This finding is consistent with previous validation studies of the HLQ [8][9][10][11]14]. This indicates that each HLQ scale measures what it purports to measure, and nothing more, providing detailed information on nine separate areas of health literacy.
Absence of item bias is considered a fundamental principle of good measurement [15,18]. It is important that items work consistently for individuals across different sub-groups, particularly if different demographic groups are to be compared [18]. Almost all the items (43 of 44) did not demonstrate item bias for the covariates assessed, with minor bias demonstrated for only one item. This suggests that unbiased estimates of health literacy across gender and age groups can be obtained from the HLQ. This finding further supports previous studies that found both the English and Slovakian versions of the HLQ to be invariant across a number of key demographic groups [9,13].
In this study, the majority of misfit suggests that the set of items within some scales may have overlapping content (overfit). Overfit does not compromise good measurement [26]. A strong rationale for including the items is provided in the development of the tool. Multiple structured processes were undertaken to develop the HLQ items, guided by the revised Bloom's taxonomy, to generate items of various difficulty. Detailed psychometric analyses were used to test and refine the items, leading to removal or re-wording of poorly performing items [8]. Given the rigorous development process of the HLQ, deletion of misfitting items is not recommended. Doing so may compromise construct coverage and result in loss of some of the tool's important items [26]. Overall misfit to the Rasch model should be treated with caution. While Chi-square probability values are recommended to determine fit, these values are sensitive to sample size [30]. Given a sufficiently large sample size (n = 433 in this study), even small deviations from model fit will be statistically significant [30].
All nine HLQ scales were found to be inadequately targeted for this sample, which is consistent with findings from Richtering et al. [14]. It is important to note that the RESPOND cohort were not representative of the general population in several ways. Firstly, the cohort consisted of participants who were taking part in a clinical trial. Those who volunteer to participate in research projects may have levels of education, motivation and engagement that differ from those who decline to participate. Secondly, due to the exclusion criteria necessary for the purpose of the RCT, the sample was underrepresented for certain subgroups known to have lower levels of health literacy. For example, those born overseas or who speak languages other than English at home, those with lower education, no private health insurance, multiple chronic conditions, and women have been found to have lower health literacy on some HLQ scales [31]. The RESPOND cohort had higher HLQ scores in seven of the nine HLQ scales (scales 1, 2, 4, 6, 7, 8, and 9), and similar levels of health literacy in two scales (3 and 5), when compared to a sample representing a diverse range of socio-economic and geographical characteristics [31]. This may explain why the RESPOND cohort appeared to find some HLQ items easy to endorse. The measurement gap identified has implications for measurement precision, which decreases at the level corresponding with this gap [32]. This means that a large change in health literacy is necessary in order to elicit a change in mid to high HLQ score for the RESPOND cohort.
The main strength of this study is that the sample was from a multi-centre trial, encompassing two geographically diverse areas of Australia. In terms of limitations, the sample size may have contributed to the significant Chi-square probability values [30]. A further limitation was that the sample under-represented a number of socioeconomic groups, limiting generalisability of the results to the broader population of older adults who present to an ED after a fall.