Developing the HLS19-YP12 for measuring health literacy in young people: a latent trait analysis using Rasch modelling and confirmatory factor analysis

Le, Christopher; Guttersrud, Øystein; Sørensen, Kristine; Finbråten, Hanne Søberg

doi:10.1186/s12913-022-08831-4

Research
Open access
Published: 06 December 2022


BMC Health Services Research volume 22, Article number: 1485 (2022)


Abstract

Background

Accurate and precise measures of health literacy (HL) support health policy making, the tailoring of health service design, and equitable access to health services. According to research, valid and reliable unidimensional HL measurement instruments explicitly targeted at young people (YP) are scarce. Thus, this study aims to assess the psychometric properties of existing unidimensional instruments and to develop an HL instrument suitable for YP aged 16–25 years.

Methods

Applying the HLS₁₉-Q47 in computer-assisted telephone interviews, we collected data in a representative sample comprising 890 YP aged 16–25 years in Norway. Applying the partial credit parameterization of the unidimensional Rasch model for polytomous data (PCM) and confirmatory factor analysis (CFA) with categorical variables, we evaluated the psychometric properties of three short versions of the HLS₁₉-Q47: HLS₁₉-Q12, HLS₁₉-SF12, and HLS₁₉-Q12-NO. A new 12-item short version for measuring HL in YP, HLS₁₉-YP12, is suggested.

Results

The HLS₁₉-Q12 did not display sufficient fit to the PCM, and the HLS₁₉-SF12 was not sufficiently unidimensional. Relative to the PCM, some items in the HLS₁₉-Q12, the HLS₁₉-SF12, and the HLS₁₉-Q12-NO discriminated poorly between participants at high and at low locations on the underlying latent trait. We observed disordered response categories for some items in the HLS₁₉-Q12 and the HLS₁₉-SF12. A few items in the HLS₁₉-Q12, the HLS₁₉-SF12, and the HLS₁₉-Q12-NO displayed either uniform or non-uniform differential item functioning. Applying one-factorial CFA, none of the aforementioned short versions achieved exact fit in terms of a non-significant model chi-square statistic, or approximate fit in terms of SRMR ≤ .080 combined with all entries in the respective residual correlation matrix ≤ .10. The newly suggested parsimonious 12-item scale, HLS₁₉-YP12, displayed sufficient fit to the PCM and achieved approximate fit using one-factorial CFA.

Conclusions

Compared to other parsimonious 12-item short versions of HLS₁₉-Q47, the HLS₁₉-YP12 has superior psychometric properties and unconditionally proved its unidimensionality. The HLS₁₉-YP12 offers an efficient and much-needed screening tool for use among YP, which is likely to be useful in developing and evaluating health policy and public health work, as well as in clinical settings.


Background

In several Western countries, young people (YP) from the age of 16 are expected to take responsibility for their own health [1]. Today, YP are frequently exposed to health-related information from different sources, such as peers, adults, social media, and commercial enterprises [2]. Several studies have shown that YP might lack sufficient health literacy (HL) to access, understand, critically appraise, and use such information [3, 4].

YP from the age of 16 report worse access to healthcare than does the adult population [1]. According to Levesque et al.’s conceptualization of access to healthcare [5], there are five corresponding abilities of the populations required to generate access: ability to perceive, ability to seek, ability to reach, ability to pay, and ability to engage. These required abilities reflect the importance of individuals’ HL in different health-related situations, e.g., accessing the health services.

Sufficient HL might empower YP to deal with health information and enable them to access health-promoting activities [6]. According to the HLS-EU Consortium, “Health literacy is linked to literacy and entails people’s knowledge, motivation and competences to access, understand, appraise, and apply health information in order to make judgments and take decisions in everyday life concerning healthcare, disease prevention, and health promotion to maintain or improve quality of life during the life course” [7]. Based on this comprehensive definition, the HLS-EU Consortium [7] developed a conceptual model and an associated framework for questionnaire item development, which combined three health domains (HDs) and four cognitive domains (CDs) operationalized into a 12-cell matrix. The 12-cell matrix thus focuses on finding (F), understanding (U), judging (appraising; J), and applying (A) health information concerning healthcare (HC), disease prevention (DP), and health promotion (HP).

Accurate and precise measurement is vital for identifying vulnerable groups with low HL that might need support in managing health issues, suggesting tailored interventions, and evaluating progress in HL promotion [8]. Only when population HL is appropriately described can public health and health care services make targeted prioritizations, become more efficient, continuously improve the quality of services towards vulnerable groups, and contribute to increasing population HL [9]. During the past decades, more than 200 tools have been developed focusing on various aspects of HL [10]. The inconsistencies due to instrument diversity have complicated the interpretation of findings across studies, as well as the choice of instruments for new studies [11, 12]. Another major challenge is that different instruments and tools measure different aspects of HL owing to different definitions, contexts, and/or subpopulations [13].

Several reviews of measurement instruments for youth HL have been published to date [14,15,16,17]. The systematic review of generic HL measurement instruments for children and adolescents [15] revealed that most instruments did not provide sufficient conceptual information, as they only measured the researchers’ own contextual understanding of HL. A more recent systematic review [18] also uncovered an inconsistency in how researchers define HL versus develop measures of HL, in which there is a high risk of missing information necessary to understand the underlying conceptualization of HL in the studies. Subsequently, Guo et al. [14] suggested that most studies on the use of HL instruments applied to children and adolescents were of poor methodological quality, and involved vague descriptions of the target population. Moreover, the best-developed HL instrument for young people (HLAT-8) identified in their review has not been tested for adolescents under 18. The instrument is multidimensional, and was not conceptually developed based on a theoretical framework.

The European Health Literacy Survey Questionnaire (HLS-EU-Q) is widely used to measure HL in adult populations. It was developed on the basis of the 12-cell conceptual model of Sørensen et al. [7], reflecting people’s proficiency in finding, understanding, appraising, and applying health information across three health domains: HC, DP, and HP. Several short versions of this comprehensive instrument have been suggested (see Table 1). As opposed to the 12-item short versions, the 16-item short version, HLS-EU-Q16, does not reflect the 12-cell matrix. The present study, therefore, excluded the 16-item version from the comparative analyses of the short versions. In 2019, the WHO Action Network on Measuring Population and Organisational Health Literacy (M-POHL) revised the HLS-EU-Q47 items for the HLS₁₉ instrument by rewording items and adding or removing instruction details, such as examples within items [19]. Furthermore, the HLS₁₉ Consortium also suggested an additional 12-item short version: HLS₁₉-Q12. The revised HLS₁₉-Q47 and the short version HLS₁₉-Q12 were applied in the HLS₁₉ survey to measure general HL in the adult population in 17 countries. Table 1 provides an overview of the HLS₁₉ instrument and its short versions.

Table 1 Overview of HLS-EU/HLS₁₉-Q47 and suggested short versions


The psychometric properties of the HLS-EU-Q47 have been widely assessed using several techniques, such as principal component analysis (PCA) [24, 25], confirmatory factor analysis (CFA) [26,27,28,29], and Rasch modelling [21, 23, 30]. In addition, several short versions of the HLS-EU-Q47, namely HLS-EU-Q16 [20], HLS-Q12 [21], HLS-SF12 [23], and HLS₁₉-Q12 [19], have been suggested and validated for adult populations [31, 32], but not in YP. Accordingly, Okan et al. [15] concluded that there still is a lack of valid and reliable unidimensional scales for measuring general HL explicitly targeted at YP.

Consequently, our aims are to: (1) evaluate the psychometric properties of the 12-item short versions of the HLS₁₉-Q47 in YP and (2) subsequently suggest a parsimonious unidimensional short version suitable for measuring general HL among YP. Specifically, the hypothesis is that when applied in YP aged 16–25, the short versions of the HLS₁₉-Q47 achieve approximate fit and display acceptable goodness-of-fit indices when evaluated using CFA, and are sufficiently unidimensional, well-targeted scales with acceptable person separation (reliability), consisting of independent and invariant items at the ordinal level (i.e., ordered response categories), each displaying sufficient fit to the unidimensional Rasch model. This hypothesis forms the basis for comparison against the psychometric properties of the subsequently suggested parsimonious unidimensional short version: HLS₁₉-YP12.

Methods

Sampling and data collection

This study used data from the Norwegian part of the Health Literacy Survey 2019–2021 (HLS₁₉) [22], which was collected during April–October 2020. The Norwegian HLS₁₉ study applied a population-based cross-sectional survey study design, and was funded by the Norwegian Directorate of Health. The survey was conducted in cooperation with Oslo Metropolitan University and Inland Norway University of Applied Sciences. A Norwegian market research agency (Norstat), with access to country representative strata, collected the data using computer-assisted telephone interviewing (CATI). The data collection was performed in two steps. In the first step (n = 3000) data on the comprehensive 47-item instrument were collected, whereas in the second step (n = 3000) data were collected only on the two short versions: HLS₁₉-Q12-NO and HLS₁₉-Q12. Out of 6000 participants, 890 participants met our inclusion criteria “YP aged 16–25 years”, and 419 responded to the comprehensive scale HLS₁₉-Q47.

Characteristics of the participants

The study’s sample included 890 participants with a slight predominance of males (Table 2). Due to the stepwise data collection, only the smaller sample (n = 419) was applicable to the scales: HLS₁₉-YP12, HLS₁₉-SF12, and HLS₁₉-Q47. Most of the participants have an education equal to upper secondary school or lower. Two-thirds report belonging to the upper social level, and above three quarters report no economic deprivation. Most of the participants also report being healthy.

Table 2 Distribution of participants’ sociodemographic factors


Measures, translation, and cultural adaptations

In combination with the HL scales, we collected person factors and covariates, such as age, gender, education, self-reported level in the society, economic deprivation, long-term illness, and health status. In addition, the HL-scales have been culturally adapted and translated into Norwegian as described below.

The HLS₁₉-Q47 and its 12-item short versions

The HLS₁₉-Q47 and its 12-item short versions (see Table 1) reflect the conceptual model of Sørensen et al. [25], and use a 4-point rating scale with the response categories: (1) very difficult, (2) difficult, (3) easy, and (4) very easy. Moreover, the “don’t know” response category was used only when stated spontaneously by the participants, and was recoded to missing data in the analyses.

Translation and cultural adaptation of the HLS₁₉-Q47

The translation of the HLS₁₉-Q47 was performed in accordance with Brislin’s protocol [33]. The questionnaire was translated from English to Norwegian by two bilingual persons (translators) independently. The concept of HL was deeply understood by the translators, and they were experienced questionnaire developers. The two translators compared their translated versions and discussed item content and wording. A third person read the Norwegian translation, made comments, and suggested amendments. A professional translator was engaged to do a back-translation when consensus had been reached. The original English version was then compared with the back-translated version, in order to gain the most semantically, technically, and contextually equivalent versions. Finally, the translation was quality-assured by the data collection agency (Norstat). To ensure that the item contents were understood and could be considered relevant also in a Norwegian context, cognitive interviews with a think aloud-procedure [34] were conducted when translating the HLS-EU-Q47 [30]. The results from these cognitive interviews were monitored as part of the translation process in the current study.

Pilot testing of the instruments

Prior to the main data collection, a pilot of the instruments was conducted in several institutions and organizations, such as municipalities, directorates, universities, NGOs, and hospitals. Some HLS₁₉-Q47 items were revised based on results from the pilot survey. These amendments were based on empirical observations interpreted in light of theoretical expectations.

Model estimation

Rasch modelling

There are three main item response theory (IRT) models: 1) the one-parameter IRT model, 2) the two-parameter model, and 3) the three-parameter model. The one-parameter IRT model corresponds to the Rasch model. Distinct from other IRT models, the Rasch models meet requirements of fundamental measurement, such as sufficiency [35], additivity [36], invariance [37], and specific objectivity [38]. On this background, the unidimensional Rasch model was applied in this study.

We tested the data against the partial credit parameterization [39] of the unidimensional Rasch model for polytomous data [40], and against the partial credit parameterization of the “between-item” “multidimensional random coefficients multinomial logit” (MRCML) model [41]. The latter was used when testing the HLS₁₉-Q47 data against a 12-dimensional model that reflects all 12 cells in the HLS-EU HL matrix: three health domains by four cognitive domains (12 correlated sub-scales). Using the unidimensional approach, we assume perfectly correlated subscales, that is, three perfectly aligned health domains (HP, DP, and HC) and/or four perfectly aligned cognitive domains (F, U, J and A). Using the three- and 12-dimensional approaches, we relax this constraint and allow health domains and/or cognitive domains to covary. Additionally, a consecutive approach (treating the subscales as orthogonal or uncorrelated) was used when assessing item invariance. Models were estimated by applying the ConQuest 5 software [42] and the RUMM2030plus software [43].
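For readers who prefer the model in symbols, the partial credit parameterization referred to above is commonly written as follows. The notation is an assumption introduced here for illustration and is not taken from the study: θ_n is the location of person n, τ_ik is the k-th threshold of item i, and m_i is the maximum score of item i (m_i = 3 for a 4-point rating scale if the categories are scored 0 to 3).

```latex
P(X_{ni} = x \mid \theta_n) =
  \frac{\exp\left( \sum_{k=1}^{x} (\theta_n - \tau_{ik}) \right)}
       {\sum_{h=0}^{m_i} \exp\left( \sum_{k=1}^{h} (\theta_n - \tau_{ik}) \right)},
  \qquad x = 0, 1, \dots, m_i,
```

where the empty sum (for x = 0 or h = 0) is defined as zero. In this form every item has its own set of thresholds but no item-specific discrimination parameter, which is what distinguishes the Rasch family from the two- and three-parameter IRT models.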

For item-location estimates, RUMM2030plus uses pairwise maximum likelihood estimation (PMLE) [44], while ConQuest 5 uses marginal maximum likelihood estimation (MMLE) [45]. Normality may be considered a prerequisite when using maximum likelihood estimation. As such, the raw data obtained from the scales measuring YP’s HL were transformed into person-location estimates (logit values) using the RUMM2030plus and ConQuest 5 software. Subsequently, the transformed data could be considered continuous and at interval level, and the histograms of the person-location estimates gave evidence of normality. For unbiased person-location estimates, both software packages apply Warm’s mean weighted likelihood estimation (WLE) [46]. The average item-location estimate was set to 0.0 in all analyses.

Using Rasch measurement theory, we evaluated dimensionality, response dependency, targeting, reliability, item fit, differential item functioning (DIF), and ordering of response categories.

Dimensionality

For each of the instrument versions, dimensionality was assessed by applying the combined principal component analysis (PCA) of residuals and paired t-test procedure [43, 47]. Based on the PCA, two subsets of items were identified. Person-location estimates on the two subsets were then compared using paired t-tests. Multidimensionality is indicated when the proportion of individuals with significantly different person-location estimates on the compared subscales exceeds 5% [47, 48]. Unidimensionality is deemed to be strictly proved as opposed to multidimensionality [49]. Given a normal distribution of the differences in person-location estimates derived from the two subsets, Tennant and Pallant [50] claimed that this approach is robust enough to detect multidimensionality. Where the proportion of individuals with significantly different person-location estimates on the compared subscales exceeded 5%, we also manually performed the binomial test, which is an exact test of the statistical significance of deviations from a theoretically expected distribution of observations into two categories. If the lower bound of the 95% confidence interval for the proportion of significant t-tests is lower than or equal to 0.05 (5%), the scale can be considered sufficiently unidimensional.
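The decision rule in the last sentence can be sketched in a few lines of Python. This is a minimal illustration and not the authors' procedure: it substitutes a normal approximation for the exact binomial test mentioned above, and the function name and the example counts are hypothetical.

```python
import math

def unidimensionality_check(n_significant, n_persons, threshold=0.05, z=1.96):
    """Return (proportion, lower CI bound, verdict) for the paired t-test rule.

    ASSUMPTION: a normal approximation to the binomial is used for the
    95% confidence interval; the study reports an exact binomial test.
    """
    p = n_significant / n_persons                # share of significant t-tests
    se = math.sqrt(p * (1.0 - p) / n_persons)    # standard error of the share
    lower = max(0.0, p - z * se)                 # lower bound of the 95% CI
    return p, lower, lower <= threshold

# Hypothetical counts: 30 significant t-tests among 419 persons.
proportion, lower_bound, sufficiently_unidimensional = unidimensionality_check(30, 419)
```

With these hypothetical counts the observed proportion exceeds 5%, but the lower bound of the interval does not, so the scale would still be treated as sufficiently unidimensional under the rule.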

Response dependency

Effective instruments do not collect redundant information and are free from response dependency, which is present when responses to an item are statistically dependent on the responses to a previous item. The average of the residual correlations added to 0.2 (average + 0.2) was used as a cut-off to indicate possible “significant” response dependency [51]. When the responses to a pair of items are locally dependent, we may construct a subtest or, when developing instruments, delete one of the items.
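The cut-off described above (average residual correlation plus 0.2) can be illustrated as follows. This is a sketch under stated assumptions: the function name is hypothetical, the matrix values are invented for illustration, and the residual correlations are assumed to have been exported from the Rasch software beforehand.

```python
def flag_dependent_pairs(resid_corr, margin=0.2):
    """Flag item pairs whose residual correlation exceeds mean + margin.

    resid_corr: symmetric matrix (list of lists) of item residual
    correlations; the diagonal is ignored.
    """
    n = len(resid_corr)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    mean_r = sum(resid_corr[i][j] for i, j in pairs) / len(pairs)
    cutoff = mean_r + margin
    flagged = [(i, j) for i, j in pairs if resid_corr[i][j] > cutoff]
    return cutoff, flagged

# Invented residual correlations for four items (illustration only).
example = [
    [1.00, -0.10, -0.05, -0.12],
    [-0.10, 1.00, 0.30, -0.08],
    [-0.05, 0.30, 1.00, -0.15],
    [-0.12, -0.08, -0.15, 1.00],
]
cutoff, dependent_pairs = flag_dependent_pairs(example)
```

Because residual correlations in a Rasch analysis tend to be slightly negative on average, the cut-off is relative to their mean rather than a fixed value.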

Targeting of persons and items

For a well-targeted scale, the distribution of the person-estimates should match the distribution of the item threshold estimates or difficulties [52]. As the scale is always centered on zero logits in the Rasch software, the mean person location value for a well-targeted scale would be close to the value of zero. Poor targeting may result in deflated variance in person estimates, which subsequently leads to poor person separation and deflated “test–retest” reliability indexes.

Reliability – internal consistency

The person separation reliability (PSR) and the person separation index (PSI) were estimated using the ConQuest 5 software and the RUMM2030plus software, respectively. In addition, Omega was estimated using the Mplus 8.6 software and a Microsoft Excel-based tool that calculates ordinal Omega from standardized factor loadings and standardized residual variances [53]. Frisbie [54] has suggested that the reliability of the sum scores should exceed 0.85 or 0.65 when drawing conclusions at the individual or group level, respectively.
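As an illustration of how Omega follows from standardized factor loadings and residual variances, the sketch below uses the standard formula for a one-factor model without correlated residuals. The function name and the loadings are hypothetical; they are not the values estimated in the study.

```python
def omega_total(loadings, residual_variances=None):
    """McDonald's omega for a one-factor model with uncorrelated residuals.

    If residual variances are not supplied, they are derived from the
    standardized loadings as 1 - loading**2 (ASSUMPTION: fully
    standardized solution).
    """
    if residual_variances is None:
        residual_variances = [1.0 - l * l for l in loadings]
    common = sum(loadings) ** 2          # variance due to the common factor
    return common / (common + sum(residual_variances))

# Hypothetical example: 12 items, each with a standardized loading of 0.6.
omega = omega_total([0.6] * 12)
exceeds_individual_level = omega > 0.85   # Frisbie's individual-level criterion
exceeds_group_level = omega > 0.65        # Frisbie's group-level criterion
```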

Individual item fit

Using ConQuest 5, the weighted mean square error (infit MNSQ), or variance-weighted fit residual, was used to indicate individual item fit to the Rasch model [55]. The expected infit MNSQ value is 1, which implies perfect data-model fit. When using instruments at the population level, we consider 0.7 < infit MNSQ < 1.3 as sufficient [32, 56]. Furthermore, item under- and over-discrimination relative to Rasch models was indicated by values significantly different from the expected value of 1, that is, with an absolute value of the T statistic higher than 1.96 [55, 57]. Under-discriminating items most likely measure too much of “something else” that does not correlate positively with the latent trait, with the result that they will not discriminate sufficiently well between persons with high and low standing on the latent trait [58].
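The infit statistic can be sketched as the sum of squared residuals divided by the sum of model variances for one item. The code below is a minimal illustration, assuming the model-expected scores and variances for each person have already been obtained from a fitted Rasch model; the function names and example numbers are hypothetical.

```python
def infit_mnsq(observed, expected, variances):
    """Variance-weighted mean square (infit) for a single item.

    observed:  observed item scores, one per person
    expected:  model-expected scores for the same persons
    variances: model variances of the item score for the same persons
    """
    squared_residuals = sum((x - e) ** 2 for x, e in zip(observed, expected))
    return squared_residuals / sum(variances)

def infit_acceptable(value, low=0.7, high=1.3):
    """Apply the population-level range 0.7 < infit MNSQ < 1.3."""
    return low < value < high

# Invented data for four persons (illustration only).
value = infit_mnsq([1, 2, 3, 0], [1.5, 1.5, 2.5, 0.5], [0.25, 0.25, 0.25, 0.25])
```

Values well above 1 indicate more unexplained variation than the model expects (under-discrimination), whereas values well below 1 indicate less (over-discrimination).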

A non-significant chi-square item fit statistic (p > 0.05) indicates good data-model fit, but the probability of detecting significant values or “misfit items” increases with the number of significance tests performed. The Bonferroni correction is one of several methods to counteract this effect [59]. For a 12-item scale, the Bonferroni-adjusted significance level is 0.05/12 ≈ 0.004.

Differential item functioning

A central requirement of the Rasch model is measurement invariance, which means that items should function in the same way across different groups of people [60], such as gender and people with different health status. Items display differential item functioning (DIF) when items have different relative difficulty (uniform DIF) or discriminate differently (non-uniform DIF) for different groups of people.

We explored whether the items displayed DIF for selected person factors by two-way analysis of variance (ANOVA) of standardized residuals and inspecting graphical displays [60]. Owing to the inclusion criteria “YP aged 16–25 years”, we dichotomized participants’ highest education level (“upper secondary school or below” versus “above upper secondary school”), and we dichotomized participants’ age accordingly (16–20 years old versus 21–25 years old). Participants’ self-reported social status on a scale from 1 to 10 was dichotomized, as the two age groups probably define their level in the society based on different criteria due to life experiences: education level, living conditions, and economic status. Economic deprivation was present, as some reported difficulties with paying bills at the end of the month. Participants described their health status (mostly healthy or increased risk of/having a chronic health problem) and reported whether they suffered from long-term illness expected to last or had lasted for at least six months.

Ordered response categories

Polytomous items (here: 4-point response scale) with ordered response categories yield categorical data at the ordinal level. This implies significantly different and ordered thresholds, where thresholds are the locations at the latent trait where adjacent response categories are equally likely [60]. Disordered thresholds indicate response categories not working as intended [61].
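To illustrate what disordered thresholds mean in practice, the sketch below computes category probabilities under the partial credit model and records which categories are ever the most likely one along the latent trait. It is an illustration only: the threshold values are invented, the categories are indexed 0 to 3 (an assumption about scoring), and the check covers only the ordering of thresholds, not whether they differ significantly.

```python
import math

def pcm_probs(theta, thresholds):
    """Category probabilities for one item under the partial credit model."""
    cum = [0.0]                                   # category 0: empty sum
    for tau in thresholds:
        cum.append(cum[-1] + (theta - tau))       # cumulative sum of (theta - tau_k)
    top = max(cum)                                # subtract max for numerical stability
    expo = [math.exp(c - top) for c in cum]
    total = sum(expo)
    return [e / total for e in expo]

def thresholds_ordered(thresholds):
    """True if every threshold is higher than the previous one."""
    return all(a < b for a, b in zip(thresholds, thresholds[1:]))

def modal_categories(thresholds, lo=-6.0, hi=6.0, steps=241):
    """Set of categories that are most likely somewhere on [lo, hi]."""
    modal = set()
    for i in range(steps):
        theta = lo + (hi - lo) * i / (steps - 1)
        probs = pcm_probs(theta, thresholds)
        modal.add(probs.index(max(probs)))
    return modal

ordered = [-1.5, 0.0, 1.5]       # invented, ordered thresholds
disordered = [-1.5, 0.5, -0.5]   # invented, second and third thresholds reversed
```

With the ordered thresholds each of the four categories is the most likely response somewhere on the trait, whereas with the reversed thresholds the category lying between them is never the most likely one.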

Confirmatory factor modelling

Using the software Mplus 8.6 [62], one- and three-factorial CFAs of the HLS₁₉-YP12, HLS₁₉-Q12, HLS₁₉-SF12, and HLS₁₉-Q12-NO data, were conducted to examine the correlation structure and item loadings in light of the theoretical framework – the HLS-EU health literacy matrix [7]. The one-, two-, three-, four- and 12-factorial CFAs of the HLS₁₉-Q47 data were supplementarily performed to assist confirmation of prior studies.

Following Asparouhov and Muthén [63], a significant model chi-square statistic implies that the suggested confirmatory factor model fails the “exact fit test”. As the data were categorical, the weighted least squares (WLS) estimator was used to obtain the model chi-square statistic [64]. Other fit indices were estimated using the robust diagonally weighted least squares (WLSMV) estimator, a default option for categorical data in Mplus 8.6. Using the WLSMV estimator with ordered-category data, polychoric correlation coefficients were estimated and are reported in Table 3.

Table 3 Descriptive statistics and correlation matrix of HLS₁₉-YP12, with variances on the diagonal


Other absolute fit indices below their target value, such as the standardized root mean square residual (SRMR ≤ 0.080) combined with small residual correlation matrix entries [63] (i.e., absolute value ≤ 0.10) [65], indicate approximate fit. Other “goodness of fit” (GOF) indices (with target value in parenthesis) may assist model evaluation, such as the root mean square error of approximation (RMSEA ≤ 0.06), comparative fit index (CFI ≥ 0.95), and Tucker-Lewis index (TLI ≥ 0.95) [66]. However, RMSEA values ≤ 0.08 may be considered acceptable in a small sample, whereas the other GOF indices suggest a good model fit. Additionally, CFI between 0.90 and 0.95 also indicates reasonable fit, while values < 0.90 are considered poor fit [67].
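The cut-offs listed above can be collected into a small helper for screening CFA output. This is a sketch, not part of the study's workflow: the function name is hypothetical, the fit indices are assumed to have been read from the software output, and the example call uses the first value of each pair reported later for the HLS₁₉-YP12 (assumed here to refer to the one-factor model), together with a made-up largest residual correlation of 0.09.

```python
def evaluate_cfa_fit(srmr, max_abs_residual, rmsea, cfi, tli):
    """Classify CFA fit using the target values stated in the text."""
    # Approximate fit: SRMR at or below 0.080 and no residual correlation above 0.10.
    approximate_fit = srmr <= 0.080 and max_abs_residual <= 0.10

    if rmsea <= 0.06:
        rmsea_label = "good"
    elif rmsea <= 0.08:
        rmsea_label = "acceptable"   # tolerated in small samples
    else:
        rmsea_label = "poor"

    if cfi >= 0.95:
        cfi_label = "good"
    elif cfi >= 0.90:
        cfi_label = "reasonable"
    else:
        cfi_label = "poor"

    return {
        "approximate_fit": approximate_fit,
        "rmsea": rmsea_label,
        "cfi": cfi_label,
        "tli_meets_target": tli >= 0.95,
    }

# max_abs_residual=0.09 is a hypothetical value; the text only states
# that no entry exceeded 0.10 for this model.
result = evaluate_cfa_fit(srmr=0.030, max_abs_residual=0.09,
                          rmsea=0.039, cfi=0.985, tli=0.981)
```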

Developing the HLS₁₉-YP12

The 12-item short version suggested in the present study was developed from analyses of the HLS₁₉-Q47 and the other three 12-item short versions, applied in YP aged 16–25 in Norway. The development was stepwise: 1) exclude items that, in the Rasch analyses, displayed poor fit, DIF, or disordered response categories, or that might collect redundant information; and 2) use CFA to assess the fit statistics, in which large residual correlation matrix entries indicate the need for model modifications. Throughout, we ensured that the items included in the suggested version reflected the conceptual 12-cell matrix.

Handling missing data

Missing data also comprises “don’t know” responses, which on average made up 2 percent of the data. The highest missing rates (5–7%) were observed for items 2, 3, 10, 11, 19 and 34, while items 8, 14, 22, 32, 33, 37, 38, 39, 40, 42, 43 and 44 had less than 1% missing values. However, using full information maximum likelihood (FIML) estimation, person-locations and item-thresholds are estimated based on available information [62].

Results

Descriptive statistics and correlations between the items of HLS₁₉-YP12

For all items, the percentage of participants who had the “difficult” and “very difficult” responses is lower than the percentage for responses of “easy” and “very easy” (Table 3). The most difficult items were item41, item10, and item18 with 46, 43, and 42% of (very) difficult responses, respectively. The easiest items were item4, item23, item46, and item13 with 86, 84, 81, and 80% of (very) easy responses, respectively. The correlations between the items of HLS₁₉-YP12 could be considered small to medium (range: 0.190 – 0.474).

Overall data-model fit and unidimensionality of 12-item short versions

The HLS₁₉-YP12, the HLS₁₉-Q12-NO, and the HLS₁₉-SF12 data displayed sufficient overall fit to the PCM (non-significant overall chi-square statistic), while the HLS₁₉-Q12 data did not. All short versions explored in our study had reliability indexes (PSR, PSI and Omega) above 0.65. The HLS₁₉-YP12, the HLS₁₉-Q12, and the HLS₁₉-Q12-NO are considered sufficiently unidimensional, while the HLS₁₉-SF12 is not (Table 4).

Table 4 Overall data-model fit, reliability, and unidimensionality by applying Rasch modelling of the 12-item short scales


No response dependency was observed for any short version, but the HLS₁₉-Q47 suffers from serious local dependency with up to 35 pairs of dependent items when applying the unidimensional PCM. For details, see Supplementary Table S1.

No short version was particularly well-targeted to the YP, but the distribution of item-threshold locations and the distribution of person locations were best aligned for the HLS₁₉-YP12 (Fig. 1); mean person location for the scales HLS₁₉-YP12, HLS₁₉-Q12, HLS₁₉-SF12, and HLS₁₉-Q12-NO were 1.035, 1.155, 1.141, and 1.084, respectively (Table 4).

Exploring dimensionality by using confirmatory factor analysis

Comparing one- and three-factorial models, only the one-factor model of HLS₁₉-YP12 achieved approximate fit with acceptable SRMR (0.030) and with no entry in the residual correlation matrix > 0.10 (Table 5). Supplementary Table S2 provides an overview of all entries in the residual correlation matrix based on all four 12-item scales, applying both one- and three-factor models. Other GOF indices indicated that the model-implied correlation matrix sufficiently well re-created the observed correlation matrix: RMSEA (0.039; 0.034), CFI/TLI (0.985/0.981; 0.989/0.986) (Table 5). Results related to the comprehensive scale HLS₁₉-Q47 are supplementarily reported in Supplementary Table S3.

Table 5 Fit statistics for different factor structures of 12-item short versions applying CFA


While all short versions: HLS₁₉-YP12, HLS₁₉-Q12, HLS₁₉-SF12, and HLS₁₉-Q12-NO, achieved SRMR < 0.080 for both one- and three-factorial models, the HLS₁₉-SF12 had most entries in the residual correlation matrix > 0.10, whereas the HLS₁₉-YP12 had none for the one-factor model and only one high entry (-0.13) for the three-factor model. Among the 12-item short scales, the HLS₁₉-YP12 obtained the most acceptable standardized factor loadings applying the one-factor structure model (all items > 0.500) (Table 6).

Table 6 Factor loadings for the items in the respective 12-item short versions when a one-factor structure model is considered


Rasch analyses at item level for HLS₁₉-YP12, HLS₁₉-Q12, HLS₁₉-SF12, and HLS₁₉-Q12-NO

Individual item fit

Applying unidimensional Rasch modelling, all items for all short versions had acceptable infit values (Tables 7, 8, 9 and 10). For the HLS₁₉-Q12, item31 had a T-value of 2.1, meaning that the item under-discriminated relative to the PCM. In addition, the chi-square statistic for item42 in the same scale was significant at the Bonferroni-adjusted level (chi-square: 21.18; p < 0.001; not reported in the Tables). The significant total item chi-square (Table 4) also indicated problems at the individual item level. Consistent with this, a Class Interval main effect indicating item misfit was observed for this item for all person factor variables: age, gender, education, economic deprivation, level in society, long-term illness, and health status. A Class Interval main effect was also observed for item45 in the HLS₁₉-SF12 scale, but only for the person factor “long-term illness”. A supplementary investigation of the HLS₁₉-Q47 showed, however, that five items (29, 34, 38, 41, 45) in the 12-dimensional model under-discriminated relative to the PCM (Supplementary Table S4).

Table 7 Item characteristics, ordering of response categories, and DIF of the 12-item short version HLS₁₉-YP12


Table 8 Item characteristics, ordering of response categories, and DIF of the 12-item short version HLS₁₉-Q12


Table 9 Item characteristics, ordering of response categories, and DIF of the 12-item short version HLS₁₉-SF12


Table 10 Item characteristics, ordering of response categories, and DIF of the 12-item short version HLS₁₉-Q12-NO


Differential item functioning—DIF

While no DIF was observed, either graphically or by significant ANOVA tests, for any item in the HLS₁₉-YP12, significant uniform DIF was observed for the HLS₁₉-Q12-NO in item14 for the “level in society” subgroups, whereas item45 in the HLS₁₉-SF12 scale displayed significant non-uniform DIF for the “long-term illness” subgroups (Fig. 2). Although not statistically significant after Bonferroni adjustment, graphical inspection of the item characteristic curves (ICCs) showed uniform DIF for the HLS₁₉-Q12 in item42 for the “level in society” subgroups and for the HLS₁₉-SF12 in item6 for the “health status” subgroups (not reported in the Figures).

Ordering of response categories

Among the four short versions, only item15 in the HLS₁₉-SF12 and item16 in the HLS₁₉-Q12 displayed disordered response categories. Figure 3 shows that response category “2” in both items was not the most likely category for any location on the continuum of person location estimates.

Discussion

In several Western health care systems, the patient role has been redefined, with patients expected to take a more active part in their own care and decision-making [68]. Accurate and precise measurement of HL supports tailoring the communication between patients and health providers along the patient pathway, and likewise supports targeted public health measures. All of this also applies to YP from the age of 16.

Although the HLS₁₉-Q47 and its short versions, HLS₁₉-Q12, HLS₁₉-SF12 and HLS₁₉-Q12-NO, have been well studied and validated for adult populations [21, 23, 31, 32], this study is, to our knowledge, the first to simultaneously assess the psychometric properties of all recently suggested 12-item versions of the HLS₁₉-Q47 applied in YP aged 16–25.

Based on data from the Norwegian HLS₁₉ study, the empirical evidence has weakened our null hypothesis associated with the psychometric properties of the previously suggested 12-item short versions of the HLS₁₉-Q47, i.e., HLS₁₉-Q12, HLS₁₉-SF12, and HLS₁₉-Q12-NO. By examining poorly fitting items identified through Rasch modelling and CFA, we successfully established a psychometrically sound parsimonious 12-item version (HLS₁₉-YP12) for use among YP aged 16–25 years.

The empirical evidence suggested that the HLS₁₉-YP12 has superior psychometric properties and convincingly outperforms other recently available 12-item short versions of the HLS₁₉-Q47, i.e., HLS₁₉-Q12, HLS₁₉-SF12, and HLS₁₉-Q12-NO.