
Developing the HLS19-YP12 for measuring health literacy in young people: a latent trait analysis using Rasch modelling and confirmatory factor analysis



Accurate and precise measurement of health literacy (HL) supports health policy making, the tailoring of health service design, and equitable access to health services. Research indicates that valid and reliable unidimensional HL measurement instruments explicitly targeted at young people (YP) are scarce. This study therefore aims to assess the psychometric properties of existing unidimensional instruments and to develop an HL instrument suitable for YP aged 16–25 years.


Applying the HLS19-Q47 in computer-assisted telephone interviews, we collected data from a representative sample of 890 YP aged 16–25 years in Norway. Applying the partial credit parameterization of the unidimensional Rasch model for polytomous data (PCM) and confirmatory factor analysis (CFA) with categorical variables, we evaluated the psychometric properties of the short versions of the HLS19-Q47: the HLS19-Q12, the HLS19-SF12, and the HLS19-Q12-NO. A new 12-item short version for measuring HL in YP, the HLS19-YP12, is suggested.


The HLS19-Q12 did not display sufficient fit to the PCM, and the HLS19-SF12 was not sufficiently unidimensional. Relative to the PCM, some items in the HLS19-Q12, the HLS19-SF12, and the HLS19-Q12-NO discriminated poorly between participants at high and at low locations on the underlying latent trait. We observed disordered response categories for some items in the HLS19-Q12 and the HLS19-SF12. A few items in the HLS19-Q12, the HLS19-SF12, and the HLS19-Q12-NO displayed either uniform or non-uniform differential item functioning. In one-factorial CFA, none of the aforementioned short versions achieved exact fit in terms of a non-significant model chi-square statistic, or approximate fit in terms of SRMR ≤ .080 combined with all entries in the respective residual matrix being ≤ .10. The newly suggested parsimonious 12-item scale, the HLS19-YP12, displayed sufficient fit to the PCM and achieved approximate fit in one-factorial CFA.


Compared with the other parsimonious 12-item short versions of the HLS19-Q47, the HLS19-YP12 has superior psychometric properties and was unconditionally found to be sufficiently unidimensional. The HLS19-YP12 offers an efficient and much-needed screening tool for use among YP, with likely applications in the development and evaluation of health policy and public health work, as well as in clinical settings.



In several Western countries, young people (YP) from the age of 16 are expected to take responsibility for their own health [1]. Today, YP are frequently exposed to health-related information from different sources, such as peers, adults, social media, and commercial enterprises [2]. Several studies have shown that YP might lack sufficient health literacy (HL) to access, understand, critically appraise, and use such information [3, 4].

YP from the age of 16 report worse access to healthcare than the adult population does [1]. According to Levesque et al.’s conceptualization of access to healthcare [5], there are five corresponding abilities that populations require to generate access: the abilities to perceive, seek, reach, pay, and engage. These abilities reflect the importance of individuals’ HL in different health-related situations, e.g., when accessing health services.

Sufficient HL might empower YP to deal with health information and to access health-promoting activities [6]. According to the HLS-EU Consortium, “Health literacy is linked to literacy and entails people’s knowledge, motivation and competences to access, understand, appraise, and apply health information in order to make judgments and take decisions in everyday life concerning healthcare, disease prevention, and health promotion to maintain or improve quality of life during the life course” [7]. Based on this comprehensive definition, the HLS-EU Consortium [7] developed a conceptual model and an associated framework for questionnaire item development, which combined three health domains (HDs) and four cognitive domains (CDs) operationalized into a 12-cell matrix. Accordingly, the 12-cell matrix covers finding (F), understanding (U), judging (appraising; J), and applying (A) health information concerning healthcare (HC), disease prevention (DP), and health promotion (HP).

Accurate and precise measurement is vital for identifying vulnerable groups with low HL that might need support in managing health issues, suggesting tailored interventions, and evaluating progress in HL promotion [8]. Only when population HL is appropriately described can public health and healthcare services make targeted prioritizations, become more efficient, continuously improve the quality of services for vulnerable groups, and contribute to increasing population HL [9]. During the past decades, more than 200 tools have been developed focusing on various aspects of HL [10]. The inconsistencies due to instrument diversity have complicated the interpretation of findings across studies, as well as the choice of instruments for new studies [11, 12]. Another major challenge is that different instruments and tools measure different aspects of HL owing to different definitions, contexts, and/or subpopulations [13].

Several reviews of measurement instruments for youth HL have been published to date [14,15,16,17]. A systematic review of generic HL measurement instruments for children and adolescents [15] revealed that most instruments did not provide sufficient conceptual information, as they only measured the researchers’ own contextual understanding of HL. A more recent systematic review [18] also uncovered an inconsistency between how researchers define HL and how they develop measures of it, with a high risk of missing information necessary to understand the underlying conceptualization of HL in the studies. Similarly, Guo et al. [14] found that most studies applying HL instruments to children and adolescents were of poor methodological quality and involved vague descriptions of the target population. Moreover, the best-developed HL instrument for young people identified in their review (the HLAT-8) has not been tested in adolescents under 18; it is multidimensional and was not conceptually developed from a theoretical framework.

The European Health Literacy Survey Questionnaire (HLS-EU-Q) is widely used to measure HL in adult populations. It was developed on the basis of the 12-cell conceptual model of Sørensen et al. [7], reflecting people’s proficiency in finding, understanding, appraising, and applying health information across three health domains: HC, DP, and HP. Several short versions of this comprehensive instrument have been suggested (see Table 1). As opposed to the 12-item short versions, the 16-item short version, the HLS-EU-Q16, does not reflect the 12-cell matrix. The present study therefore excluded the 16-item version from the comparative analyses of the short versions. In 2019, the WHO Action Network on Measuring Population and Organisational Health Literacy (M-POHL) revised the HLS-EU-Q47 items for the HLS19 instrument by rewording items and adding/removing instruction details, such as examples within items [19]. The HLS19 Consortium also suggested an additional 12-item short version: the HLS19-Q12. The revised HLS19-Q47 and the short version HLS19-Q12 were applied in the HLS19 survey to measure general HL in the adult population in 17 countries. Table 1 below provides an overview of the HLS19 instrument and its short versions.

Table 1 Overview of HLS-EU/HLS19-Q47 and suggested short versions

The psychometric properties of the HLS-EU-Q47 have been widely assessed using several techniques, such as principal component analysis (PCA) [24, 25], confirmatory factor analysis (CFA) [26,27,28,29], and Rasch modelling [21, 23, 30]. Short versions of the HLS-EU-Q47 (the HLS-EU-Q16 [20], HLS-Q12 [21], HLS-SF12 [23], and HLS19-Q12 [19]) have also been suggested [19,20,21, 23] and validated for adult populations [31, 32], but not in YP. Nonetheless, Okan et al. [15] concluded that there is still a lack of valid and reliable unidimensional scales for measuring general HL explicitly targeted at YP.

Consequently, our aims are to: (1) evaluate the psychometric properties of the 12-item short versions of the HLS19-Q47 in YP and (2) consecutively suggest a parsimonious unidimensional short version suitable for measuring general HL among YP. Specifically, the hypothesis is that, when applied in YP aged 16–25, the short versions of the HLS19-Q47 achieve approximate fit and display acceptable goodness-of-fit indices when evaluated using CFA, and are sufficiently unidimensional, well-targeted scales with acceptable person separation (reliability), consisting of independent and invariant items at the ordinal level (i.e., with ordered response categories), each displaying sufficient fit to the unidimensional Rasch model. This hypothesis forms the basis for comparison with the psychometric properties of the consecutively suggested parsimonious unidimensional short version: the HLS19-YP12.


Sampling and data collection

This study used data from the Norwegian part of the Health Literacy Survey 2019–2021 (HLS19) [22], which were collected during April–October 2020. The Norwegian HLS19 study applied a population-based cross-sectional survey design and was funded by the Norwegian Directorate of Health. The survey was conducted in cooperation with Oslo Metropolitan University and Inland Norway University of Applied Sciences. A Norwegian market research agency (Norstat), with access to country-representative strata, collected the data using computer-assisted telephone interviewing (CATI). The data collection was performed in two steps. In the first step (n = 3000), data on the comprehensive 47-item instrument were collected, whereas in the second step (n = 3000), data were collected only on the two short versions: the HLS19-Q12-NO and the HLS19-Q12. Of the 6000 participants, 890 met our inclusion criterion of “YP aged 16–25 years”, and 419 of these responded to the comprehensive scale HLS19-Q47.

Characteristics of the participants

The study’s sample included 890 participants, with a slight predominance of males (Table 2). Due to the stepwise data collection, only the smaller sample (n = 419) was applicable to the scales HLS19-YP12, HLS19-SF12, and HLS19-Q47. Most of the participants had an education equal to upper secondary school or lower. Two-thirds reported belonging to the upper social level, and more than three-quarters reported no economic deprivation. Most of the participants also reported being healthy.

Table 2 Distribution of participants’ sociodemographic factors

Measures, translation, and cultural adaptations

In combination with the HL scales, we collected person factors and covariates, such as age, gender, education, self-reported level in society, economic deprivation, long-term illness, and health status. In addition, the HL scales were culturally adapted and translated into Norwegian, as described below.

The HLS19-Q47 and its 12-item short versions

The HLS19-Q47 and its 12-item short versions (see Table 1) reflect the conceptual model of Sørensen et al. [25] and use a 4-point rating scale with the response categories (1) very difficult, (2) difficult, (3) easy, and (4) very easy. The “don’t know” response category was recorded only when stated spontaneously by the participants and was recoded to missing data in the analyses.

Translation and cultural adaptation of the HLS19-Q47

The translation of the HLS19-Q47 was performed in accordance with Brislin’s protocol [33]. The questionnaire was independently translated from English to Norwegian by two bilingual translators, both of whom had a thorough understanding of the HL concept and were experienced questionnaire developers. The two translators compared their translated versions and discussed item content and wording. A third person read the Norwegian translation, made comments, and suggested amendments. Once consensus had been reached, a professional translator performed a back-translation. The original English version was then compared with the back-translated version to obtain versions as semantically, technically, and contextually equivalent as possible. Finally, the translation was quality-assured by the data collection agency (Norstat). To ensure that the item contents were understood and considered relevant in a Norwegian context, cognitive interviews with a think-aloud procedure [34] were conducted when translating the HLS-EU-Q47 [30]. The results from these cognitive interviews were monitored as part of the translation process in the current study.

Pilot testing of the instruments

Prior to the main data collection, a pilot of the instruments was conducted in several institutions and organizations, such as municipalities, directorates, universities, NGOs, and hospitals. Some HLS19-Q47 items were revised based on results from the pilot survey. These amendments were based on empirical observations interpreted in light of theoretical expectations.

Model estimation

Rasch modelling

There are three main item response theory (IRT) models: 1) the one-parameter model, 2) the two-parameter model, and 3) the three-parameter model. The one-parameter IRT model corresponds to the Rasch model. Distinct from other IRT models, Rasch models meet the requirements of fundamental measurement, such as sufficiency [35], additivity [36], invariance [37], and specific objectivity [38]. Against this background, the unidimensional Rasch model was applied in this study.

We tested the data against the partial credit parameterization [39] of the unidimensional Rasch model for polytomous data [40], and against the partial credit parameterization of the “between-item” “multidimensional random coefficients multinomial logit” (MRCML) model [41]. The latter was used when testing the HLS19-Q47 data against a 12-dimensional model that reflects all 12 cells in the HLS-EU HL matrix: three health domains by four cognitive domains (12 correlated subscales). Using the unidimensional approach, we assume perfectly correlated subscales, that is, three perfectly aligned health domains (HP, DP, and HC) and/or four perfectly aligned cognitive domains (F, U, J, and A). Using the three- and 12-dimensional approaches, we relax this constraint and allow health domains and/or cognitive domains to covary. Additionally, a consecutive approach (treating the subscales as orthogonal, i.e., uncorrelated) was used when assessing item invariance. Models were estimated using the ConQuest 5 software [42] and the RUMM2030plus software [43].
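As a minimal illustration of the partial credit parameterization (a sketch in Python, with hypothetical threshold values that are not estimates from this study), the probability of each response category under the PCM is proportional to the exponentiated cumulative sum of (person location minus threshold) over the categories passed:

```python
import math

def pcm_probs(theta, thresholds):
    """Category probabilities under the partial credit model (PCM).

    theta      -- person location (logits)
    thresholds -- ordered item thresholds delta_1..delta_m (logits)
    Returns probabilities for categories 0..m.
    """
    # Category 0 has an empty cumulative sum, so its numerator is exp(0) = 1.
    numerators = [1.0]
    cum = 0.0
    for delta in thresholds:
        cum += theta - delta
        numerators.append(math.exp(cum))
    total = sum(numerators)
    return [n / total for n in numerators]

# Hypothetical 4-category item (three ordered thresholds, as on a 4-point rating scale)
probs = pcm_probs(theta=0.5, thresholds=[-1.0, 0.0, 1.0])
```

Note that when the person location equals a threshold, the two adjacent categories are equally likely, which is exactly how thresholds are defined later in this section.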

For item-location estimates, RUMM2030plus uses pairwise maximum likelihood estimation (PMLE) [44], while ConQuest 5 uses marginal maximum likelihood estimation (MMLE) [45]. Normality may be considered a prerequisite for maximum likelihood estimation. The raw data obtained from the scales measuring YP’s HL were therefore transformed into person-location estimates (logit values) using the RUMM2030plus and ConQuest 5 software. The transformed data can be considered continuous and at the interval level, and normal-distribution histograms provided evidence of normality. For unbiased person-location estimates, both software packages apply Warm’s mean weighted likelihood estimation (WLE) [46]. The average item-location estimate was set to 0.0 in all analyses.

Using Rasch measurement theory, we evaluated dimensionality, response dependency, targeting, reliability, item fit, differential item functioning (DIF), and ordering of response categories.


For each of the instrument versions, dimensionality was assessed by applying the combined principal component analysis (PCA) of residuals and paired t-test procedure [43, 47]. Based on the PCA, two subsets of items were identified, and person-location estimates on the two subsets were compared using paired t-tests. Multidimensionality is indicated when the proportion of individuals with significantly different person-location estimates on the compared subscales exceeds 5% [47, 48]. Strictly speaking, this procedure can demonstrate multidimensionality, whereas unidimensionality cannot be definitively proved [49]. Given a normal distribution of the differences in person-location estimates derived from the two subsets, Tennant and Pallant [50] claimed that this approach is robust enough to detect multidimensionality. Where the proportion of individuals with significantly different person-location estimates exceeded 5%, we also manually performed the binomial test, an exact test of the statistical significance of deviations from a theoretically expected distribution of observations into two categories. If the lower bound of the 95% confidence interval for the proportion of significant t-tests is lower than or equal to 0.05 (5%), the scale can still be considered sufficiently unidimensional.
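The confidence-interval check can be sketched as follows (Python; the counts are hypothetical and are not figures from the study). The exact (Clopper-Pearson) lower bound of the 95% confidence interval for the proportion of significant t-tests is found by bisection on the binomial survival function:

```python
from math import comb

def binom_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def clopper_pearson_lower(k, n, alpha=0.05):
    """Exact (Clopper-Pearson) lower bound of the (1 - alpha) CI for a proportion."""
    if k == 0:
        return 0.0
    lo, hi = 0.0, k / n
    for _ in range(100):  # bisection on P(X >= k | p) = alpha / 2
        mid = (lo + hi) / 2
        if binom_sf(k, n, mid) < alpha / 2:
            lo = mid
        else:
            hi = mid
    return lo

# Hypothetical example: 30 of 419 paired t-tests significant (about 7.2%)
lower = clopper_pearson_lower(30, 419)
```

In this hypothetical case the observed proportion exceeds 5%, but the decision rests on whether the lower confidence bound is at or below 0.05.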

Response dependency

Effective instruments do not collect redundant information and are free from response dependency, which is present when responses to an item depend statistically on the responses to a previous item. The average of the residual correlations plus 0.2 (average + 0.2) was used as a cut-off to indicate possible “significant” response dependency [51]. When the responses to a pair of items are locally dependent, one may construct a subtest or, when developing instruments, delete one of the items.
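The cut-off rule above can be sketched in a few lines (Python; the residual correlations below are made up for illustration, not values from the study):

```python
def flag_dependent_pairs(residual_corr):
    """Flag item pairs whose residual correlation exceeds the mean + 0.2 cut-off.

    residual_corr -- dict mapping (item, item) pairs to residual correlations
    """
    mean = sum(residual_corr.values()) / len(residual_corr)
    cutoff = mean + 0.2
    return [pair for pair, r in residual_corr.items() if r > cutoff]

# Hypothetical residual correlations for three item pairs
resid = {
    ("item1", "item2"): -0.10,
    ("item1", "item3"): -0.05,
    ("item2", "item3"): 0.30,
}
flagged = flag_dependent_pairs(resid)
```

Here the mean residual correlation is 0.05, so the cut-off is 0.25 and only the last pair is flagged as possibly response-dependent.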

Targeting of persons and items

For a well-targeted scale, the distribution of the person estimates should match the distribution of the item threshold estimates, or difficulties [52]. As the scale is always centered on zero logits in the Rasch software, the mean person location for a well-targeted scale will be close to zero. Poor targeting may deflate the variance in person estimates, which in turn leads to poor person separation and deflated “test–retest” reliability indexes.

Reliability – internal consistency

The person separation reliability (PSR) and the person separation index (PSI) were estimated using the ConQuest 5 and RUMM2030plus software, respectively. In addition, ordinal Omega was estimated using the Mplus 8.6 software and a Microsoft Excel-based tool that calculates ordinal Omega from standardized factor loadings and standardized residual variances [53]. Frisbie [54] suggested that the reliability of sum scores should exceed 0.85 for conclusions at the individual level or 0.65 for conclusions at the group level.
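The Omega calculation from standardized loadings and residual variances can be sketched as follows (Python; the twelve loadings below are hypothetical, not the study's estimates):

```python
def mcdonald_omega(loadings, residual_variances=None):
    """McDonald's omega from standardized factor loadings.

    If residual variances are not supplied, they are taken as 1 - loading**2,
    as in a standardized one-factor model.
    """
    if residual_variances is None:
        residual_variances = [1 - l**2 for l in loadings]
    common = sum(loadings) ** 2  # squared sum of loadings = common variance
    return common / (common + sum(residual_variances))

# Hypothetical standardized loadings for a 12-item scale
loadings = [0.55, 0.60, 0.52, 0.58, 0.63, 0.57,
            0.54, 0.61, 0.59, 0.56, 0.62, 0.53]
omega = mcdonald_omega(loadings)
```

With loadings of this magnitude, omega lands comfortably above Frisbie's 0.65 group-level criterion but below the 0.85 individual-level criterion.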

Individual item fit

Using ConQuest 5, the weighted mean square error (infit MNSQ), or variance-weighted fit residual, was used to indicate individual item fit to the Rasch model [55]. The expected infit MNSQ value is 1, which implies perfect data-model fit. For instruments used at the population level, we consider 0.7 < infit < 1.3 sufficient [32, 56]. Furthermore, item under- and over-discrimination relative to Rasch models was indicated by values significantly different from the expected value of 1, with an absolute T statistic greater than 1.96 [55, 57]. Under-discriminating items most likely measure too much of “something else” that does not correlate positively with the latent trait, and consequently do not discriminate sufficiently well between persons with high and low standing on the latent trait [58].
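In essence, the infit statistic is an information-weighted mean of squared residuals: the sum of squared differences between observed and model-expected scores, divided by the sum of model variances. A minimal sketch (Python, with made-up observed and expected values for three persons, not data from the study):

```python
def infit_mnsq(observed, expected, variances):
    """Information-weighted mean square (infit MNSQ) for one item.

    observed  -- observed item scores across persons
    expected  -- model-expected scores E[x] per person
    variances -- model variances Var[x] per person
    """
    squared_residuals = sum((x - e) ** 2 for x, e in zip(observed, expected))
    return squared_residuals / sum(variances)

# Made-up values for three persons on a single polytomous item
fit = infit_mnsq(observed=[2, 3, 1],
                 expected=[2.2, 2.8, 1.5],
                 variances=[0.8, 0.7, 0.9])
```

Values near 1 indicate that observed residual variation matches what the model predicts; values well below 1 indicate over-discrimination (muted residuals) and values well above 1 under-discrimination.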

A non-significant chi-square item fit statistic (p > 0.05) indicates good data-model fit, but the probability of detecting significant values, or “misfitting items”, increases with the number of significance tests performed. The Bonferroni correction is one of several methods to counteract this effect [59]. For a 12-item scale, the Bonferroni-adjusted chi-square probability is 0.05/12 ≈ 0.004.

Differential item functioning

A central requirement of the Rasch model is measurement invariance, which means that items should function in the same way across different groups of people [60], such as gender and people with different health status. Items display differential item functioning (DIF) when items have different relative difficulty (uniform DIF) or discriminate differently (non-uniform DIF) for different groups of people.

We explored whether the items displayed DIF for selected person factors using two-way analysis of variance (ANOVA) of standardized residuals and by inspecting graphical displays [60]. Owing to the inclusion criterion “YP aged 16–25 years”, we dichotomized participants’ highest education level (“upper secondary school or below” versus “above upper secondary school”) and participants’ age (16–20 years versus 21–25 years). Participants’ self-reported social status on a scale from 1 to 10 was also dichotomized, as the two age groups probably define their level in society based on different criteria owing to life experiences: education level, living conditions, and economic status. Economic deprivation was considered present when participants reported difficulties paying bills at the end of the month. Participants described their health status (mostly healthy, or at increased risk of/having a chronic health problem) and reported whether they suffered from a long-term illness expected to last, or that had lasted, for at least six months.

Ordered response categories

Polytomous items (here: 4-point response scale) with ordered response categories yield categorical data at the ordinal level. This implies significantly different and ordered thresholds, where thresholds are the locations at the latent trait where adjacent response categories are equally likely [60]. Disordered thresholds indicate response categories not working as intended [61].

Confirmatory factor modelling

Using the Mplus 8.6 software [62], one- and three-factorial CFAs of the HLS19-YP12, HLS19-Q12, HLS19-SF12, and HLS19-Q12-NO data were conducted to examine the correlation structure and item loadings in light of the theoretical framework, the HLS-EU health literacy matrix [7]. One-, two-, three-, four-, and 12-factorial CFAs of the HLS19-Q47 data were performed supplementarily to help confirm prior studies.

Following Asparouhov and Muthén [63], a significant model chi-square statistic implies that the suggested confirmatory factor model fails the “exact fit test”. For the categorical data, the weighted least squares (WLS) estimator was used to obtain the model chi-square statistic [64]. Other fit indices were estimated using the robust diagonally weighted least squares (WLSMV) estimator, the default option for categorical data in Mplus 8.6. Using the WLSMV estimator with ordered-category data, polychoric correlation coefficients were estimated and are reported in Table 3.

Table 3 Descriptive statistics and correlation matrix of HLS19-YP12, with variances on the diagonal

Absolute fit indices below their target values, such as the standardized root mean square residual (SRMR ≤ 0.080) combined with small entries in the residual correlation matrix [63] (i.e., absolute values ≤ 0.10) [65], indicate approximate fit. Other goodness-of-fit (GOF) indices (with target values in parentheses) may assist model evaluation, such as the root mean square error of approximation (RMSEA ≤ 0.06), the comparative fit index (CFI ≥ 0.95), and the Tucker-Lewis index (TLI ≥ 0.95) [66]. However, RMSEA values ≤ 0.08 may be considered acceptable in a small sample when the other GOF indices suggest a good model fit. Additionally, CFI between 0.90 and 0.95 indicates reasonable fit, while values < 0.90 are considered poor fit [67].
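The SRMR summarizes the residuals between the observed and model-implied correlation matrices; a minimal sketch (Python, with small hypothetical 3x3 matrices rather than the study's 12x12 ones) is:

```python
import math

def srmr(observed, implied):
    """Standardized root mean square residual between two correlation matrices.

    Averages the squared residuals over the lower triangle (including the
    diagonal) and returns the square root.
    """
    p = len(observed)
    total, count = 0.0, 0
    for i in range(p):
        for j in range(i + 1):
            total += (observed[i][j] - implied[i][j]) ** 2
            count += 1
    return math.sqrt(total / count)

# Hypothetical observed and model-implied correlation matrices
obs = [[1.0, 0.40, 0.35],
       [0.40, 1.0, 0.30],
       [0.35, 0.30, 1.0]]
imp = [[1.0, 0.42, 0.33],
       [0.42, 1.0, 0.31],
       [0.33, 0.31, 1.0]]
value = srmr(obs, imp)
```

Because every residual here is at most 0.02 in absolute value, the resulting SRMR is far below the 0.080 target, and no residual entry approaches the 0.10 screening threshold.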

Developing the HLS19-YP12

The suggested 12-item short version in the present study was developed from analyses of the HLS19-Q47 and the other three 12-item short versions, applied in YP aged 16–25 in Norway. The development was stepwise: 1) items that displayed poor fit, DIF, or disordered response categories in the Rasch analyses, or that might collect redundant information, were excluded; and 2) fit statistics were assessed using CFA, in which large entries in the residual correlation matrix indicate the need for model modifications. Throughout, we ensured that the items included in the suggested version reflected the conceptual 12-cell matrix.

Handling missing data

Missing data also comprised “don’t know” responses, which on average made up 2% of the data. The highest missing rates (5–7%) were observed for items 2, 3, 10, 11, 19 and 34, while items 8, 14, 22, 32, 33, 37, 38, 39, 40, 42, 43 and 44 had less than 1% missing values. Using full information maximum likelihood (FIML) estimation, person locations and item thresholds were nevertheless estimated based on all available information [62].


Descriptive statistics and correlations between the items of HLS19-YP12

For all items, the percentage of participants who responded “difficult” or “very difficult” was lower than the percentage who responded “easy” or “very easy” (Table 3). The most difficult items were item41, item10, and item18, with 46, 43, and 42% of (very) difficult responses, respectively. The easiest items were item4, item23, item46, and item13, with 86, 84, 81, and 80% of (very) easy responses, respectively. The correlations between the items of the HLS19-YP12 can be considered small to medium (range: 0.190–0.474).

Overall data-model fit and unidimensionality of 12-item short versions

The HLS19-YP12, the HLS19-Q12-NO, and the HLS19-SF12 data displayed sufficient overall fit to the PCM (non-significant overall chi-square statistic), whereas the HLS19-Q12 data did not. All short versions explored in our study had reliability indexes (PSR, PSI, and Omega) above 0.65. The HLS19-YP12, the HLS19-Q12, and the HLS19-Q12-NO are considered sufficiently unidimensional, while the HLS19-SF12 is not (Table 4).

Table 4 Overall data-model fit, reliability, and unidimensionality by applying Rasch modelling of the 12-item short scales

No response dependency was observed for any short version, but the HLS19-Q47 suffered from serious local dependency, with up to 35 pairs of dependent items, when applying the unidimensional PCM. For details, see Supplementary Table S1.

No short version was particularly well targeted to the YP, but the distributions of item-threshold locations and person locations were best aligned for the HLS19-YP12 (Fig. 1); the mean person locations for the HLS19-YP12, HLS19-Q12, HLS19-SF12, and HLS19-Q12-NO were 1.035, 1.155, 1.141, and 1.084 logits, respectively (Table 4).

Fig. 1
figure 1

Targeting, person-item threshold distribution of 12-item short versions. Note: Targeting of HLS19-YP12, HLS19-Q12, HLS19-SF12, and HLS19-Q12-NO reflects the person location mean in Table 4 indicating a slight right-skewness given the item location mean calibrated to be at 0.0 logits

Exploring dimensionality by using confirmatory factor analysis

Comparing the one- and three-factorial models, only the one-factor model of the HLS19-YP12 achieved approximate fit, with an acceptable SRMR (0.030) and no entry in the residual correlation matrix > 0.10 (Table 5). Supplementary Table S2 provides an overview of all entries in the residual correlation matrices for all four 12-item scales, applying both one- and three-factor models. The other GOF indices indicated that the model-implied correlation matrix re-created the observed correlation matrix sufficiently well: RMSEA (0.039; 0.034), CFI/TLI (0.985/0.981; 0.989/0.986) (Table 5). Results for the comprehensive scale HLS19-Q47 are reported in Supplementary Table S3.

Table 5 Fit statistics for different factor structures of 12-item short versions applying CFA

While all short versions (HLS19-YP12, HLS19-Q12, HLS19-SF12, and HLS19-Q12-NO) achieved SRMR < 0.080 for both the one- and three-factorial models, the HLS19-SF12 had the most entries in the residual correlation matrix > 0.10, whereas the HLS19-YP12 had none for the one-factor model and only one large entry (-0.13) for the three-factor model. Among the 12-item short scales, the HLS19-YP12 obtained the most acceptable standardized factor loadings under the one-factor model (all items > 0.500) (Table 6).

Table 6 Factor loadings for the items in the respective 12-item short versions when a one-factor structure model is considered

Rasch analyses at item level for HLS19-YP12, HLS19-Q12, HLS19-SF12, and HLS19-Q12-NO

Individual item fit

Applying unidimensional Rasch modelling, all items in all short versions had acceptable infit values (Tables 7, 8, 9 and 10). For the HLS19-Q12, item31 had a T-value of 2.1, meaning that the item under-discriminated relative to the PCM. In addition, the Bonferroni-adjusted chi-square probability for item42 in the same scale was significant (chi-square: 21.18; p < 0.001; not reported in the Tables). A significant total item chi-square (Table 4) also indicated problems at the individual item level. For item42, a Class Interval main effect indicating item misfit was also observed for all person-factor variables: age, gender, education, economic deprivation, level in society, long-term illness, and health status. A Class Interval main effect was also observed for item45 in the HLS19-SF12 scale, but only for the person factor “long-term illness”. Supplementary investigation of the HLS19-Q47 showed, however, that five items (29, 34, 38, 41, 45) in the 12-dimensional model under-discriminated relative to the PCM (Supplementary Table S4).

Table 7 Item characteristics, ordering of response categories, and DIF of the 12-item short version HLS19-YP12
Table 8 Item characteristics, ordering of response categories, and DIF of the 12-item short version HLS19-Q12
Table 9 Item characteristics, ordering of response categories, and DIF of the 12-item short version HLS19-SF12
Table 10 Item characteristics, ordering of response categories, and DIF of the 12-item short version HLS19-Q12-NO

Differential item functioning—DIF

No DIF was observed for any item in the HLS19-YP12, either graphically or by significant ANOVA tests. In contrast, significant uniform DIF was observed for item14 in the HLS19-Q12-NO for the “level in society” subgroups, whereas item45 in the HLS19-SF12 displayed significant non-uniform DIF for the “long-term illness” subgroups (Fig. 2). Disregarding Bonferroni-adjusted statistical non-significance, inspection of the item characteristic curves (ICCs) graphically revealed uniform DIF for item42 in the HLS19-Q12 for the “level in society” subgroups and for item6 in the HLS19-SF12 for the “health status” subgroups (not shown in the Figures).

Fig. 2
figure 2

Items displaying differential item functioning – DIF. Note: DIF with parallel slopes is referred to as uniform DIF, whereas non-uniform DIF is present when locations are the same but the slopes are different or both the locations and slopes are different

Ordering of response categories

Among the four short versions, only item15 in the HLS19-SF12 and item16 in the HLS19-Q12 displayed disordered response categories. Figure 3 shows that response category “2” in both items was not the most likely category for any location on the continuum of person location estimates.

Fig. 3
figure 3

Visualization of disordered response categories of item15 and item16 in the HLS19-SF12 and the HLS19-Q12, respectively. Note: Using RUMM2030plus software, we observed that the category probability curves for item15 in the HLS19-SF12 and item16 in the HLS19-Q12 indicated disordered/reversed response categories. The response category 2 in both items was not the most likely for any location on the latent trait scale and might weaken the hypothesis of ordinal data. Disordered response categories were also observed for item21 in the HLS19-Q47 applying unidimensional Rasch model


In several Western health care systems, the patient role has been redefined, with patients expected to take a more active part in their own care and decision-making [68]. Accurate and precise measurement of HL supports the tailoring of communication between patients and health providers along the patient pathway, as well as the design of targeted public health measures. All of this also applies to YP from the age of 16.

Although the HLS19-Q47 and its short versions, the HLS19-Q12, HLS19-SF12, and HLS19-Q12-NO, have been well studied and validated for adult populations [21, 23, 31, 32], this study is, to our knowledge, the first to simultaneously assess the psychometric properties of all recently suggested 12-item versions of the HLS19-Q47 in YP aged 16–25.

Based on data from the Norwegian HLS19 study, the empirical evidence weakened our null hypothesis concerning the psychometric properties of the previously suggested 12-item short versions of the HLS19-Q47, i.e., the HLS19-Q12, HLS19-SF12, and HLS19-Q12-NO. By examining poorly fitting items identified through Rasch modelling and CFA, we established a psychometrically sound, parsimonious 12-item version (HLS19-YP12) for use among YP aged 16–25 years.

The empirical evidence suggests that the HLS19-YP12 has superior psychometric properties, convincingly outperforming the other recently available 12-item short versions of the HLS19-Q47, i.e., the HLS19-Q12, HLS19-SF12, and HLS19-Q12-NO.

Psychometric properties of the 12-item versions HLS19-YP12, HLS19-Q12-NO, HLS19-Q12, and HLS19-SF12 at the overall level


Previous research has concluded that the HLS19-Q12-NO was psychometrically superior to other short versions of the HLS19-Q47 [21, 31]. However, the HLS19-Q12 was not reviewed in these studies. Nonetheless, all short versions have been suggested and validated for adult populations. Applied to data from YP, the HLS19-Q12-NO still seemed to fit the unidimensional Rasch model better than the other two scales, the HLS19-Q12 and the HLS19-SF12. Nevertheless, the present study provided evidence that the suggested HLS19-YP12 fitted the unidimensional Rasch model even better than the HLS19-Q12-NO, and unconditionally stood out as sufficiently unidimensional.

Applying the guidelines for CFA in Mplus set forth by Asparouhov and Muthén [63], approximate fit was only tenable when SRMR ≤ 0.080 and all residuals were small (rres ≤ 0.10). Asparouhov and Muthén [63] argue that it would be inaccurate to consider models with large residual values as approximately well-fitting, as large residuals indicate a major discrepancy between the model and the data. We nevertheless made an exception when only remarkably few residuals were barely above 0.10. Disregarding some residuals above 0.10, other GOF indices, such as the RMSEA, CFI, and TLI, indicated that both the one- and three-factorial models of the HLS19-Q12 and the HLS19-Q12-NO fitted the data relatively well. The HLS19-SF12 also displayed acceptable data-model fit based on these GOF indices. Nevertheless, researchers have questioned whether it is expedient to assess these other GOF indices (RMSEA, CFI, and TLI) when the criteria of a small SRMR and uniformly small residuals are not met [63]. Large residual values indicate that model modifications are needed.
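The approximate-fit rule can be sketched as a simple check over the residual correlation matrix; the matrix and SRMR value below are hypothetical, not taken from the HLS19 analyses:

```python
import numpy as np

def approximate_fit(residuals: np.ndarray, srmr: float,
                    srmr_cut: float = 0.080, res_cut: float = 0.10) -> bool:
    """Approximate fit in the sense used above: SRMR <= .080 AND every
    off-diagonal entry of the residual correlation matrix <= .10."""
    off_diag = residuals[~np.eye(residuals.shape[0], dtype=bool)]
    return bool(srmr <= srmr_cut and np.all(np.abs(off_diag) <= res_cut))

# Hypothetical residual correlation matrix for a 3-item toy model
res = np.array([[0.00, 0.04, 0.12],
                [0.04, 0.00, 0.03],
                [0.12, 0.03, 0.00]])
print(approximate_fit(res, srmr=0.045))  # False: one residual exceeds .10
```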

Based on our nationally representative sample (n = 890) of youth aged 16–25 years, there is strong evidence that the one-factorial CFA model explains the HLS19-YP12 data best in comparison with the other 12-item short versions, and that data from this scale also fitted the unidimensional polytomous Rasch model best.


All the 12-item short scales obtained a positive mean person location value, indicating that the sample as a whole was located above the average of the scale. In other words, the items are deemed too easy for the participants’ ability. The HLS19-Q12 and the HLS19-SF12 obtained the highest mean person location values, and we may have witnessed a ceiling effect (extreme person scores), in which poor targeting has caused disordered response categories [69]. Of the four 12-item short scales, the distribution of item-threshold locations and the distribution of person locations were best aligned for the HLS19-YP12 (Fig. 1), reflected in the lowest mean person location value (1.035). Even so, the instrument could benefit from adding items that are harder to endorse.
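The targeting diagnostic reduces to simple arithmetic: with item thresholds conventionally centred at zero, a positive mean of the estimated person locations signals that the sample sits above the scale average. A minimal sketch with fabricated person estimates:

```python
import numpy as np

# Fabricated person location estimates (logits); item thresholds are
# conventionally centred at 0 in the Rasch scaling used here.
persons = np.array([0.2, 0.9, 1.1, 1.5, 1.9, 0.7])
mean_loc = persons.mean()
# A positive mean person location means the sample sits above the scale
# average, i.e. the items are comparatively easy to endorse.
print(round(mean_loc, 3))  # 1.05
```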

Psychometric properties of the 12-item versions HLS19-YP12, HLS19-Q12, HLS19-Q12-NO, and HLS19-SF12 at item level

Item fit

Consistent with results from Rasch analyses of the HLS19-Q12 in adult populations [32], item31 in the HLS19-Q12 also displayed poor item fit and was the only item across all four short versions that under-discriminated. Like item31, item28 deals with difficulties in appraising and applying health information from “mass media”, and instructions were added guiding participants to recognize what mass media (i.e., newspapers, TV, or the Internet) refers to [19]. The various types of media might have caused the undifferentiated response pattern regardless of the participant’s HL level, as the difficulty of appraising or applying information from mass media may depend on the kind of media in question.
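For readers unfamiliar with the fit statistics behind “under-discrimination”, the standard infit and outfit mean-square statistics can be sketched as follows; the observed scores, expected scores, and variances below are fabricated. Mean-squares well above 1 indicate under-discrimination, well below 1 over-discrimination:

```python
import numpy as np

def item_fit_mnsq(observed, expected, variance):
    """Standard Rasch infit/outfit mean-squares for one item.
    observed, expected: observed and model-expected item scores per person;
    variance: model variance of the item score per person."""
    sq_res = (observed - expected) ** 2
    outfit = np.mean(sq_res / variance)    # unweighted mean-square
    infit = sq_res.sum() / variance.sum()  # information-weighted mean-square
    return infit, outfit

# Fabricated values for five persons on one polytomous item
obs = np.array([2.0, 3.0, 1.0, 0.0, 3.0])
exp = np.array([1.8, 2.6, 1.2, 0.5, 2.9])
var = np.array([0.7, 0.5, 0.8, 0.4, 0.3])
infit, outfit = item_fit_mnsq(obs, exp, var)
print(round(infit, 3), round(outfit, 3))
```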

Applying the one-factorial CFA model (Table 6), item28 in the HLS19-Q12-NO displayed the second-lowest factor loading, while item31 in the HLS19-Q12 had the lowest loading on their respective dimensions. Items referring to mass media, likely perceived as digital resources, may therefore be replaced by other items, as they are more likely related to, e.g., a digital HL construct, which is another aspect of overall HL.

Differential item functioning

DIF for societal levels was observed for item14 […to follow instructions on medication] and item42 […to judge how your housing conditions may affect your health and well-being] in the HLS19-Q12-NO and the HLS19-Q12, respectively. Supplementary analyses were conducted to understand why DIF for societal level appeared among YP. The results (not reported in the Tables) showed that while about 80% of the youngest subgroup placed themselves at the highest societal level, only 60% of the oldest subgroup did the same. This could be explained by different understandings of societal level between the youngest and oldest subgroups: a 16-year-old might perceive not owning a popular piece of clothing, such as an expensive jacket, as being placed at a very low level in society, whereas a 25-year-old might hold another opinion and perception based on a wider context. In turn, these different perceptions might have caused the DIF for societal levels observed in item14 and item42 in the HLS19-Q12-NO and the HLS19-Q12, respectively. However, there was no evidence of DIF for age groups.

To further investigate why DIF appeared for item14 and item42, a supplementary frequency analysis (not reported) was conducted, showing that 89% of the youngest subgroup answered (very) easy on item14: to follow instructions on medication. One explanation may be that parents play an important role in giving YP both reminders and guidance [70] concerning medications and in applying the information provided by the doctor. Surprisingly, the same proportion (80%) of both age subgroups answered (very) easy on item42, as one might have expected a higher proportion of the youngest to experience it as more difficult, considering that they are still living with their parents. This suggests that YP are as reflective as the adult population on these kinds of questions, although the phenomenon should be investigated further in more detail.

Ordered response categories

Disordered response categories might be explained by too few persons located at the specific threshold levels, and are most likely also due to poor targeting [71]. For item16 in the HLS19-Q12, the first two thresholds were very close together and slightly reversed. More severely disordered response categories were identified for item15 in the HLS19-SF12, in which the first two thresholds were clearly reversed. The latter case weakened the hypothesis of ordinal data.
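The disordering diagnostic can be sketched directly from the partial credit model: a response category is “disordered” when it is never the most likely category at any location on the latent trait. The threshold values below are illustrative, not taken from the HLS19 data:

```python
import numpy as np

def pcm_category_probs(theta, thresholds):
    """Category probabilities under the partial credit model for one item.
    thresholds: item-threshold locations tau_1..tau_m (logits)."""
    cum = np.cumsum(thresholds)
    psi = np.array([0.0] + [(k + 1) * theta - cum[k]
                            for k in range(len(thresholds))])
    expv = np.exp(psi - psi.max())  # subtract max for numerical stability
    return expv / expv.sum()

def modal_categories(thresholds, grid=None):
    """Return the categories that are modal somewhere along the trait."""
    if grid is None:
        grid = np.linspace(-6, 6, 481)
    return sorted({int(np.argmax(pcm_category_probs(t, thresholds)))
                   for t in grid})

# Ordered thresholds: every category is modal somewhere on the continuum
print(modal_categories([-1.0, 0.0, 1.0]))   # [0, 1, 2, 3]
# Reversed thresholds (tau1 > tau2): category 1 is never the most likely,
# mirroring the disordering reported for item15 and item16
print(modal_categories([-0.5, -1.0, 1.0]))  # [0, 2, 3]
```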

Content validity

Item13, item36, item41, and item46 in the HLS19-YP12 are unique items that distinguish it from the other three 12-item scales. The remaining eight items (item4, item7, item10, item18, item23, item26, item30, and item38) are found in either the HLS19-Q12, the HLS19-SF12, or the HLS19-Q12-NO. Item26 […to judge which vaccinations you or your family may need] and item36 […to find information about how to promote health at work, at school, or in the neighborhood] are particularly relevant to YP, as they still have to deal with, e.g., vaccination programs and other health-related issues at school age. Adopting these two items in the new 12-item short version responds to the critique from Bröder et al. [72] concerning the lack of attention to YP’s specific needs and social structures in most models.

However, face validity was not explicitly assessed beforehand for participants aged 16–17 years. This age group, together with 18-year-olds, most likely comprises pupils in upper secondary school, so the readability and response burden for this group were assumed not to differ critically from those for persons aged 18 years. Indeed, the median response times show that the response burden did not differ for the 16- and 17-year-old participants (17.7 and 17.3 min, respectively) compared with participants aged 18–25 (range: 16.7–18.9 min). Even though the understandability of the item content has been ensured through cognitive interviews with young adults aged 18 and above, further interviews may be considered for YP below 18 to confirm that the items are also well understood in this target group.

Notably, one of the strengths of the HLS19-YP12 is that it was developed from a definition and conceptual framework of HL, by which content validity has been ensured. Furthermore, the new instrument includes items considered more relevant to younger people, such as vaccination and health-promoting activities in school and the neighborhood. As for the scale’s targeting, the distributions of item-threshold and person locations were best aligned for the HLS19-YP12, indicating that the content of the new instrument was better adapted to the target population.

Finally, YP are expected to use social media and digital platforms actively to access health information [6, 73]. Surprisingly, items related to mass media, e.g., item28 in the HLS19-Q12-NO and item31 in the HLS19-Q12, tended to under-discriminate. A prior study [3] may offer an explanation: YP preferred to use their family as an information resource rather than social media platforms. Furthermore, YP might have perceived mass media as part of another construct relative to digital health information platforms and skills.


The sample size for the HLS19-Q47, the HLS19-YP12, and the HLS19-SF12 was limited to n = 419. Therefore, all analyses that aimed to compare the various short versions were based on this sample size. There are no strict requirements for sample size in Rasch modelling. However, a rule of thumb suggests that a useful sample for a test of 12 polytomous items with 3 thresholds each should comprise between 360 and 720 persons, corresponding to a ratio of 10 to 20 persons per threshold [60]. Mundfrom et al. [74] suggested that the minimum sample size for CFA depends on the variables-to-factors ratio and the number of factors present in the data, while Hair et al. [75] claimed that a sample size above 300 is unlikely to produce Heywood cases. Hence, we assumed that our sample size of n = 419 was sufficient for the analyses performed. Nevertheless, data-model fit and DIF analysis in Rasch modelling and exact fit in CFA are all relatively sensitive to sample size: DIF in Rasch modelling is more likely to be flagged with larger samples, whereas the significance of the model chi-square in CFA is more sensitive to smaller samples. The findings should therefore be interpreted with some caution.
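The cited rule-of-thumb range follows from simple arithmetic over the number of thresholds:

```python
# Rule of thumb [60]: roughly 10-20 persons per item threshold.
items, thresholds_per_item = 12, 3
n_thresholds = items * thresholds_per_item  # 36 thresholds in total
n_min, n_max = 10 * n_thresholds, 20 * n_thresholds
print(n_min, n_max)  # 360 720
```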

In this study, we applied approaches from both modern test theory (Rasch modelling) and classical test theory (CFA). However, future research may also consider other relevant modern short-form development techniques. Finally, the HLS19-YP12 was developed and psychometrically assessed based on national data; its psychometric properties should therefore be further assessed using multinational data.


The revised version of the HLS-EU-Q47 (HLS19-Q47) was additionally confirmed to fit a 12-dimensional model best. Hence, it is not statistically defensible to report a total score for individuals based on this scale, as person estimates of HL (person locations) cannot be derived from raw scores on a multidimensional scale. This principle also applies to all short versions that are not sufficiently unidimensional.

Remaining the best-fitting 12-item short version under both the unidimensional Rasch model and the one-factorial CFA, with all items achieving factor loadings > 0.500, the HLS19-YP12 is the first sufficiently unidimensional and conceptually developed HL instrument for young people aged 16–25. The instrument is psychometrically superior and convincingly outperformed the other three 12-item short versions. Consequently, the HLS19-YP12 offers an efficient and much-needed screening tool for use among YP, with likely applications in the development and evaluation of health policy and public health work, as well as in clinical settings.

Based on the relatively strong evidence from this study, we suggest that the HLS19-YP12 instrument (Table S5) be preferred in future studies measuring HL among YP from the age of 16.

Availability of data and materials

The datasets used and/or analyzed during the current study are not publicly available, but can be accessed by applying to the Norwegian Study Centre of HLS19 via this website:



Apply health information


Akaike’s information criterion


Two-way analysis of variance


Confirmatory factor analysis


Comparative fit index


Confidence interval


Degree of freedom


Disease prevention


Differential item functioning


Estimated parameters


Finding health information


Information maximum likelihood


Health care


Health literacy


European Health Literacy Survey Questionnaire


European Health Literacy Population Survey 2019–2021 (HLS19) Questionnaire


Health promotion


Item characteristic curve


Item response theory


Judging/appraising health information


Marginal maximum likelihood estimation


Mean square


WHO action network on measuring population and organisational health literacy

MRCML model:

“Multidimensional random coefficients multinomial logit” model


Principal component analysis


Pairwise maximum likelihood estimation


Person separation index


Person separation reliability


Root-mean-squared error of approximation


Standardized root mean square residual


Standard deviation


Tucker and Lewis fit index


Understanding health information


Warm’s mean weighted likelihood estimation


Weighted least square mean and variance estimators


  1. Mosquera PA, Waenerlund A-K, Goicolea I, Gustafsson PE. Equitable health services for the young? A decomposition of income-related inequalities in young adults’ utilization of health care in Northern Sweden. Int J Equity Health. 2017;16(1):20.


  2. Haugen ALH, Riiser K, Esser-Noethlichs M, Hatlevik OE. Developing indicators to measure critical health literacy in the context of Norwegian lower secondary schools. Int J Environ Res Public Health. 2022;19(5):3116.


  3. Riiser K, Helseth S, Haraldstad K, Torbjørnsen A, Richardsen KR. Adolescents’ health literacy, health protective measures, and health-related quality of life during the Covid-19 pandemic. PLoS ONE. 2020;15(8): e0238161.


  4. Paakkari LT, Torppa MP, Paakkari O-P, Välimaa RS, Ojala KSA, Tynjälä JA. Does health literacy explain the link between structural stratifiers and adolescent health? Eur J Pub Health. 2019;29(5):919–24.


  5. Levesque J-F, Harris MF, Russell G. Patient-centred access to health care: conceptualising access at the interface of health systems and populations. International Journal for Equity in Health. 2013;12(1):18.


  6. Paakkari L, Paakkari O. Health literacy as a learning outcome in schools. Health Education. 2012.


  7. Sørensen K, Van den Broucke S, Fullam J, Doyle G, Pelikan J, Slonska Z, et al. Health literacy and public health: a systematic review and integration of definitions and models. BMC Public Health. 2012;12(1):1–13.


  8. McCormack L, Haun J, Sørensen K, Valerio M. Recommendations for advancing health literacy measurement. J Health Commun. 2013;18(sup1):9–14.


  9. The Ministry of Health and Care Services (Norway). A Norwegian Strategy to increase Health Literacy in the Population. Norway: The Norwegian Government; 2019.


  10. Okan O, Bauer U, Levin-Zamir D, Pinheiro P, Sørensen K. International Handbook of Health Literacy: Research, practice and policy across the lifespan: Policy Press. 2019.


  11. Griffin JM, Partin MR, Noorbaloochi S, Grill JP, Saha S, Snyder A, et al. Variation in estimates of limited health literacy by assessment instruments and non-response bias. J Gen Intern Med. 2010;25(7):675–81.


  12. Haun J, Luther S, Dodd V, Donaldson P. Measurement variation across health literacy assessments: implications for assessment selection in research and practice. J Health Commun. 2012;17(sup3):141–59.


  13. Sørensen K, Pleasant A. Understanding the conceptual importance of the differences among health literacy definitions. Stud Health Technol Inform. 2017;240:3–14.


  14. Guo S, Armstrong R, Waters E, Sathish T, Alif SM, Browne GR, et al. Quality of health literacy instruments used in children and adolescents: a systematic review. BMJ Open. 2018;8(6):e020080.


  15. Okan O, Lopes E, Bollweg TM, Bröder J, Messer M, Bruland D, et al. Generic health literacy measurement instruments for children and adolescents: a systematic review of the literature. BMC Public Health. 2018;18(1):166.


  16. Ormshaw MJ, Paakkari LT, Kannas LK. Measuring child and adolescent health literacy: a systematic review of literature. Health Educ. 2013;113(5):433–55.


  17. Perry EL. Health literacy in adolescents: an integrative review. J Spec Pediatr Nurs. 2014;19(3):210–8.


  18. Urstad KH, Andersen MH, Larsen MH, Borge CR, Helseth S, Wahl AK. Definitions and measurement of health literacy in health and medicine research: a systematic review. BMJ Open. 2022;12(2): e056294.


  19. The HLS19 Consortium of the WHO Action Network M-POHL. International Report on the Methodology, Results, and Recommendations of the European Health Literacy Population Survey 2019–2021 (HLS19) of M-POHL. Vienna: Austrian National Public Health Institute; 2021.


  20. The HLS-EU Consortium. Measurement of health literacy in Europe: HLS-EU-Q47; HLS-EU-Q16; and HLS-EU-Q86. Health Literacy Project 2009–2012. Maastricht: The HLS-EU Consortium; 2012.


  21. Finbråten HS, Wilde-Larsson B, Nordström G, Pettersen KS, Trollvik A, Guttersrud Ø. Establishing the HLS-Q12 short version of the European health literacy survey questionnaire: latent trait analyses applying Rasch modelling and confirmatory factor analysis. BMC Health Serv Res. 2018;18(1):1–17.


  22. Le C, Finbråten HS, Pettersen KS, Joranger P, Guttersrud Ø. Health Literacy in the Norwegian Population. English Summary. In: Befolkningens helsekompetanse, del I. The International Health Literacy Population Survey 2019–2021 (HLS19)–et samarbeidsprosjekt med nettverket M-POHL tilknyttet WHO-EHII: The Norwegian Directorate of Health. 2021.


  23. Duong TV, Aringazina A, Kayupova G, Nurjanah F, Pham TV, et al. Development and validation of a new short-form health literacy instrument for the general public in six Asian countries. Health Lit Res Pract. 2019;3(2):91–102.


  24. van der Heide I, Rademakers J, Schipper M, Droomers M, Sørensen K, Uiters E. Health literacy of Dutch adults: a cross sectional survey. BMC Public Health. 2013;13(1):1–11.


  25. Sørensen K, Van den Broucke S, Pelikan JM, Fullam J, Doyle G, Slonska Z, et al. Measuring health literacy in populations: illuminating the design and development process of the European Health Literacy Survey Questionnaire (HLS-EU-Q). BMC Public Health. 2013;13(1):1–10.


  26. Rouquette A, Nadot T, Labitrie P, Van den Broucke S, Mancini J, Rigal L, et al. Validity and measurement invariance across sex, age, and education level of the French short versions of the European health literacy survey questionnaire. PLoS ONE. 2018;13(12): e0208091.


  27. Duong TV, Aringazina A, Baisunova G, Pham TV, Pham KM, Truong TQ, et al. Measuring health literacy in Asia: validation of the HLS-EU-Q47 survey tool in six Asian countries. J Epidemiol. 2017;27(2):80–6.


  28. Duong VT, Lin I-F, Sorensen K, Pelikan JM, Van den Broucke S, Lin Y-C, et al. Health literacy in Taiwan: a population-based study. Asia Pacific Journal of Public Health. 2015;27(8):871–80.


  29. Nakayama K, Osaka W, Togari T, Ishikawa H, Yonekura Y, Sekido A, et al. Comprehensive health literacy in Japan is lower than in Europe: a validated Japanese-language assessment of health literacy. BMC Public Health. 2015;15(1):1–12.


  30. Finbråten HS, Pettersen KS, Wilde-Larsson B, Nordström G, Trollvik A, Guttersrud Ø. Validating the European health literacy survey questionnaire in people with type 2 diabetes: Latent trait analyses applying multidimensional Rasch modelling and confirmatory factor analysis. J Adv Nurs. 2017;73(11):2730–44.


  31. Maie A, Kanekuni S, Yonekura Y, Nakayama K, Sakai R. Evaluating short versions of the European Health Literacy Survey Questionnaire (HLS-EU-Q47) for health checkups. Health Evaluation and Promotion. 2021;48(4):351–8.


  32. Guttersrud Ø, Le C, Pettersen KS, Finbråten HS. Rasch analyses of data collected in 17 countries: a technical report to support decision-making within the M-POHL consortium. In: Publications on international HLS19 results. Available from: Accessed 16 Nov 2022.

  33. Brislin RW. Back-translation for cross-cultural research. J Cross Cult Psychol. 1970;1(3):185–216.


  34. Drennan J. Cognitive interviewing: verbal data in the design and pretesting of questionnaires. J Adv Nurs. 2003;42(1):57–63.


  35. Andersen EB. Sufficient statistics and latent trait models. Psychometrika. 1977;42(1):69–81.


  36. Andrich D. Distinctions between assumptions and requirements in measurement in the social sciences. Math theor Syst. 1989;4:7–16.


  37. Andrich D. Rasch models for measurement. Newsbury Park, CA: SAGE Publications; 1988.


  38. Stenner A. Specific objectivity-local and general. Rasch Meas Trans. 1994;8(3):374.


  39. Masters GN. A Rasch model for partial credit scoring. Psychometrika. 1982;47(2):149–74.


  40. Rasch G. Probabilistic models for some intelligence and attainment tests. Chicago: University of Chicago Press; 1980.


  41. Adams RJ, Wilson M, Wang WC. The multidimensional random coefficients multinomial logit model. Appl Psychol Meas. 1997;21(1):1–23.


  42. Adams R, Cloney D, Wu M, Osses A, Schwantner V, Vista A, et al. ACER ConQuest Manual. In: ConQuest Notes and tutorials. Available from: Accessed 17 Apr 2022. 

  43. RUMM laboratory Pty Ltd. Displaying the RUMM 2030 Analysis: Plus Edition. Duncraig: RUMM laboratory Pty Ltd.; 2019.

  44. Katsikatsou M, Moustaki I, Yang-Wallentin F, Jöreskog KG. Pairwise likelihood estimation for factor analysis models with ordinal data. Comput Stat Data Anal. 2012;56(12):4243–58.


  45. Bock RD, Aitkin M. Marginal maximum likelihood estimation of item parameters: application of an EM algorithm. Psychometrika. 1981;46(4):443–59.


  46. Warm TA. Weighted likelihood estimation of ability in item response theory. Psychometrika. 1989;54(3):427–50.


  47. Smith EV Jr. Understanding Rasch measurement: detecting and evaluating the impact of multidimensionality using item fit statistics and principal components analysis of residuals. J Appl Meas. 2002;3(2):205–31.


  48. Hagell P. Testing rating scale unidimensionality using the principal component analysis (PCA)/t-test protocol with the Rasch model: the primacy of theory over statistics. Open J Stat. 2014;4(6):456–65.


  49. Strout WF. A new item response theory modeling approach with applications to unidimensionality assessment and ability estimation. Psychometrika. 1990;55(2):293–325.


  50. Tennant A, Pallant JF. Unidimensionality matters. Rasch MeasTrans. 2006;20(1):1048–51.


  51. Christensen KB, Makransky G, Horton M. Critical values for Yen’s Q3: Identification of local dependence in the Rasch model using residual correlations. Appl Psychol Meas. 2017;41(3):178–94.


  52. Tennant A, Conaghan PG. The Rasch measurement model in rheumatology: what is it and why use it? When should it be applied, and what should one look for in a Rasch paper? Arthritis Care Res. 2007;57(8):1358–62.


  53. Dueber DM. Bifactor Indices Calculator: A Microsoft Excel-based tool to calculate various indices relevant to bifactor CFA models. 2017.


  54. Frisbie DA. Reliability of scores from teacher-made tests. Educ Meas Issues Pract. 1988;7(1):25–35.


  55. Smith RM, editor. Using item mean squares to evaluate fit to the Rasch model. The annual meeting of the American educational research association; San Francisco, CA. 1995.

  56. Wright B, Linacre JM. Reasonable mean-square fit values. In: Rasch measurement transactions contents. Accessed 22 May 2022.

  57. Adams RJ, Wu ML, (August 2010). Tutorial 7 - Multidimensional models. In: ConQuest notes and tutorials. Accessed 17 Apr 2022.

  58. Masters GN. Item discrimination: when more is worse. J Educ Meas. 1988;25(1):15–29.


  59. Bland JM, Altman DG. Multiple significance tests: the Bonferroni method. BMJ. 1995;310(6973):170–1.


  60. Andrich D, Marais I. A Course in Rasch Measurement Theory: Measuring in the Educational, Social and Health Sciences. Singapore: Springer; 2019.


  61. Andrich D, de Jong J, Sheridan B. Diagnostic opportunities with the Rasch model for ordered response categories. In: Rost J, Langeheine R, editors. Applications of Latent Trait and Latent Class Models in the Social Sciences. New York, NY: Waxmann Verlag GMBH; 1997. p. 59–70.


  62. Muthén LK, Muthén BO. Mplus User’s Guide. 8th ed. Los Angeles, CA: Muthén & Muthén; 1998-2017.

  63. Asparouhov T, Muthén B, (2nd May 2018). SRMR in Mplus. In: Mplus: technical appendices related to new features in version 8. Accessed 17 Apr 2022.

  64. Asparouhov T, Muthén B, (26th April 2022). Assessing model fit for SEM models with categorical variables via contingency tables. In: Mplus: technical appendices related to new features in version 8. Accessed 15 May 2022.

  65. Kline RB. Principles and Practice of Structural Equation Modeling. 4th ed. New York: The Guilford Press; 2016.


  66. Hu Lt, Bentler PM. Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Struct Equation Model. 1999;6(1):1–55.


  67. Brown TA. Confirmatory factor analysis for applied research. New York: Guilford Publications; 2015.


  68. Håkansson Eklund J, Holmström IK, Kumlin T, Kaminsky E, Skoglund K, Höglander J, et al. “Same same or different?” A review of reviews of person-centered and patient-centered care. Patient Educ Couns. 2019;102(1):3–11.


  69. Hagquist C, Bruce M, Gustavsson JP. Using the Rasch model in nursing research: an introduction and illustrative example. Int J Nurs Stud. 2009;46(3):380–93.


  70. Domanska OM, Firnges C, Bollweg TM, Sørensen K, Holmberg C, Jordan S. Do adolescents understand the items of the European Health Literacy Survey Questionnaire (HLS-EU-Q47) – German version? Findings from cognitive interviews of the project “Measurement of Health Literacy Among Adolescents” (MOHLAA) in Germany. Arch Public Health. 2018;76(1):46.


  71. Hagquist C, Andrich D. Is the sense of coherence-instrument applicable on adolescents? A latent trait analysis using Rasch-modelling. Pers Individ Differ. 2004;36(4):955–68.


  72. Bröder J, Okan O, Bauer U, Bruland D, Schlupp S, Bollweg TM, et al. Health literacy in childhood and youth: a systematic review of definitions and models. BMC Public Health. 2017;17(1):1–25.


  73. Esmaeilzadeh S, Ashrafi-Rizi H, Shahrzadi L, Mostafavi F. A survey on adolescent health information seeking behavior related to high-risk behaviors in a selected educational district in Isfahan. PLoS ONE. 2018;13(11): e0206647.


  74. Mundfrom DJ, Shaw DG, Ke TL. Minimum Sample Size Recommendations for Conducting Factor Analyses. Int J Test. 2005;5(2):159–68.


  75. Hair JF Jr, Black WC, Babin BJ, Anderson RE. Multivariate Data Analysis. 7th ed. Upper Saddle River: Prentice Hall; 2009.




The authors thank Professor emeritus Kjell Sverre Pettersen [principal investigator of the Norwegian HLS19 study] for contributing to this research and providing feedback on the study’s conception and design as well as his contribution during the data collection.


The overall data collection was funded by the Norwegian Directorate of Health. The publication of this study was supported by the internal funding from the Inland Norway University of Applied Sciences.

Author information

Authors and Affiliations



Study conception and design, translation of the instrument, and data collection: CL, ØG, HSF; Rasch modelling, CFA, and manuscript drafting: CL; Interpretation of results: CL, ØG, HSF; All authors (CL, ØG, KS, HSF) were involved in reading and substantively revising the manuscript, and approved the final version of the manuscript for submission.

Corresponding author

Correspondence to Christopher Le.

Ethics declarations

Ethics approval and consent to participate

The study was conducted in accordance with the ethical principles outlined in the Declaration of Helsinki. The Data Protection Services at the Norwegian Centre for Research Data (NSD) were notified about the project. The study fell outside the scope of the Norwegian Act on Medical and Health Research and therefore did not require approval from the Norwegian Regional Committees for Medical and Health Research Ethics. The NSD approved the project (project number 896850); the approval covers the use of personal/private data (questionnaires, consent form, data storage, etc.). Participation was voluntary, and the questionnaire was completed anonymously. Because data were collected through telephone interviews, verbal informed consent was obtained from the participants. On 1 January 2022, NSD merged with two other Norwegian organizations, Uninett and the Directorate for ICT and Joint Services in Higher Education and Research, to form the Norwegian Agency for Shared Services in Education and Research (Sikt).

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1:

Table S1. Overall fit statistics applying unidimensional and multidimensional Rasch models of HLS19-Q47 and its short versions.

Additional file 2:

Table S2. Entries in the residual correlation matrix for the 12-item short scales.

Additional file 3:

Table S3. Fit statistics for different factor structures applying confirmatory factor analyses of the HLS19-Q47.

Additional file 4:

Table S4. Item characteristics and DIF of HLS19-Q47 applying the 12-dimensional Rasch model.

Additional file 5:

Table S5. The HLS19-YP12 instrument with response options.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article


Cite this article

Le, C., Guttersrud, Ø., Sørensen, K. et al. Developing the HLS19-YP12 for measuring health literacy in young people: a latent trait analysis using Rasch modelling and confirmatory factor analysis. BMC Health Serv Res 22, 1485 (2022).
