Developing the HLS19-YP12 for measuring health literacy in young people: a latent trait analysis using Rasch modelling and confirmatory factor analysis
BMC Health Services Research volume 22, Article number: 1485 (2022)
Accurate and precise measures of health literacy (HL) support health policy making, the tailoring of health service design, and equitable access to health services. Research indicates that valid and reliable unidimensional HL measurement instruments explicitly targeted at young people (YP) are scarce. This study therefore aims to assess the psychometric properties of existing unidimensional instruments and to develop an HL instrument suitable for YP aged 16–25 years.
Applying the HLS19-Q47 in computer-assisted telephone interviews, we collected data from a representative sample of 890 YP aged 16–25 years in Norway. Applying the partial credit parameterization of the unidimensional Rasch model for polytomous data (PCM) and confirmatory factor analysis (CFA) with categorical variables, we evaluated the psychometric properties of the short versions of the HLS19-Q47: HLS19-Q12, HLS19-SF12, and HLS19-Q12-NO. A new 12-item short version for measuring HL in YP, the HLS19-YP12, is suggested.
The HLS19-Q12 did not display sufficient fit to the PCM, and the HLS19-SF12 was not sufficiently unidimensional. Relative to the PCM, some items in the HLS19-Q12, the HLS19-SF12, and the HLS19-Q12-NO discriminated poorly between participants at high and at low locations on the underlying latent trait. We observed disordered response categories for some items in the HLS19-Q12 and the HLS19-SF12. A few items in the HLS19-Q12, the HLS19-SF12, and the HLS19-Q12-NO displayed either uniform or non-uniform differential item functioning. Applying one-factorial CFA, none of these short versions achieved exact fit in terms of a non-significant model chi-square statistic, or approximate fit in terms of SRMR ≤ .080 with all entries in the respective residual matrix ≤ .10. The newly suggested parsimonious 12-item scale, the HLS19-YP12, displayed sufficient fit to the PCM and achieved approximate fit in one-factorial CFA.
Compared to the other parsimonious 12-item short versions of the HLS19-Q47, the HLS19-YP12 has superior psychometric properties and unconditionally demonstrated its unidimensionality. The HLS19-YP12 offers an efficient and much-needed screening tool for use among YP, likely useful in the development and evaluation of health policy and public health work, as well as in clinical settings.
In several Western countries, young people (YP) from the age of 16 are expected to take responsibility for their own health. Today, YP are frequently exposed to health-related information from different sources, such as peers, adults, social media, and commercial enterprises. Several studies have shown that YP might lack sufficient health literacy (HL) to access, understand, critically appraise, and use such information [3, 4].
YP from the age of 16 report worse access to healthcare than the adult population does. According to Levesque et al.’s conceptualization of access to healthcare, five corresponding abilities of the population are required to generate access: the ability to perceive, to seek, to reach, to pay, and to engage. These required abilities reflect the importance of individuals’ HL in different health-related situations, e.g., when accessing health services.
Sufficient HL might empower YP to deal with health information and to access health-promoting activities. According to the HLS-EU Consortium, “Health literacy is linked to literacy and entails people’s knowledge, motivation and competences to access, understand, appraise, and apply health information in order to make judgments and take decisions in everyday life concerning healthcare, disease prevention, and health promotion to maintain or improve quality of life during the life course”. Based on this comprehensive definition, the HLS-EU Consortium developed a conceptual model and an associated framework for questionnaire item development, which combined three health domains (HDs) and four cognitive domains (CDs) operationalized into a 12-cell matrix. The 12-cell matrix thus covers finding (F), understanding (U), judging (appraising; J), and applying (A) health information concerning healthcare (HC), disease prevention (DP), and health promotion (HP).
Accurate and precise measurement is vital for identifying vulnerable groups with low HL that might need support in managing health issues, for suggesting tailored interventions, and for evaluating progress in HL promotion. Only when population HL is appropriately described can public health and healthcare services make targeted prioritizations, become more efficient, continuously improve the quality of services for vulnerable groups, and contribute to increasing population HL. During the past decades, more than 200 tools have been developed focusing on various aspects of HL. The inconsistencies due to instrument diversity have complicated the interpretation of findings across studies, as well as the choice of instruments for new studies [11, 12]. Another major challenge is that different instruments and tools measure different aspects of HL owing to different definitions, contexts, and/or subpopulations.
Several reviews of measurement instruments for youth HL have been published to date [14,15,16,17]. A systematic review of generic HL measurement instruments for children and adolescents revealed that most instruments did not provide sufficient conceptual information, as they only measured the researchers’ own contextual understanding of HL. A more recent systematic review also uncovered an inconsistency between how researchers define HL and how they develop measures of HL, with a high risk of missing the information necessary to understand the underlying conceptualization of HL in the studies. Subsequently, Guo et al. found that most studies on the use of HL instruments applied to children and adolescents were of poor methodological quality and involved vague descriptions of the target population. Moreover, the best-developed HL instrument for young people identified in their review (HLAT-8) has not been tested in adolescents under 18; it is multidimensional and was not conceptually developed on the basis of a theoretical framework.
The European Health Literacy Survey Questionnaire (HLS-EU-Q) is widely used to measure HL in adult populations. It was developed on the basis of the 12-cell conceptual model of Sørensen et al., reflecting people’s proficiency in finding, understanding, appraising, and applying health information across three health domains: HC, DP, and HP. Several short versions of this comprehensive instrument have been suggested (see Table 1). As opposed to the 12-item short versions, the 16-item short version, HLS-EU-Q16, does not reflect the 12-cell matrix. The present study therefore excluded the 16-item version from the comparative analyses of the short versions. In 2019, the WHO Action Network on Measuring Population and Organisational Health Literacy (M-POHL) revised the HLS-EU-Q47 items for the HLS19 instrument by rewording items and adding/removing instruction details, such as examples within items. Furthermore, the HLS19 Consortium suggested an additional 12-item short version: the HLS19-Q12. The revised HLS19-Q47 and the short version HLS19-Q12 were applied in the HLS19 survey to measure general HL in the adult population in 17 countries. Table 1 below provides an overview of the HLS19 instrument and its short versions.
The psychometric properties of the HLS-EU-Q47 have been widely assessed using several techniques, such as principal component analysis (PCA) [24, 25], confirmatory factor analysis (CFA) [26,27,28,29], and Rasch modelling [21, 23, 30]. Also, short versions of the HLS-EU-Q47 (HLS-EU-Q16, HLS-Q12, HLS-SF12, and HLS19-Q12) have been suggested [19,20,21, 23] and validated for adult populations [31, 32], but not in YP. Nonetheless, Okan et al. concluded that there is still a lack of valid and reliable unidimensional scales for measuring general HL explicitly targeted at YP.
Consequently, our aims are to: (1) evaluate the psychometric properties of the 12-item short versions of the HLS19-Q47 in YP and (2) consecutively suggest a parsimonious unidimensional short version suitable for measuring general HL among YP. Specifically, we hypothesize that, when applied in YP aged 16–25, the short versions of the HLS19-Q47 achieve approximate fit and display acceptable goodness-of-fit indices when evaluated using CFA, and that they are sufficiently unidimensional, well-targeted scales with acceptable person separation (reliability), consisting of independent and invariant items at the ordinal level (i.e., with ordered response categories), each displaying sufficient fit to the unidimensional Rasch model. This hypothesis forms the basis for comparison against the psychometric properties of the consecutively suggested parsimonious unidimensional short version: the HLS19-YP12.
Sampling and data collection
This study used data from the Norwegian part of the Health Literacy Survey 2019–2021 (HLS19), collected during April–October 2020. The Norwegian HLS19 study applied a population-based cross-sectional survey design and was funded by the Norwegian Directorate of Health. The survey was conducted in cooperation with Oslo Metropolitan University and Inland Norway University of Applied Sciences. A Norwegian market research agency (Norstat), with access to country-representative strata, collected the data using computer-assisted telephone interviewing (CATI). The data collection was performed in two steps. In the first step (n = 3000), data on the comprehensive 47-item instrument were collected, whereas in the second step (n = 3000), data were collected only on the two short versions: HLS19-Q12-NO and HLS19-Q12. Out of the 6000 participants, 890 met our inclusion criterion “YP aged 16–25 years”, and 419 of these responded to the comprehensive scale HLS19-Q47.
Characteristics of the participants
The study’s sample included 890 participants with a slight predominance of males (Table 2). Due to the stepwise data collection, only the smaller sample (n = 419) was applicable to the scales HLS19-YP12, HLS19-SF12, and HLS19-Q47. Most of the participants have an education equal to upper secondary school or lower. Two-thirds report belonging to the upper social level, and more than three-quarters report no economic deprivation. Most of the participants also report being healthy.
Measures, translation, and cultural adaptations
In combination with the HL scales, we collected person factors and covariates, such as age, gender, education, self-reported level in the society, economic deprivation, long-term illness, and health status. In addition, the HL-scales have been culturally adapted and translated into Norwegian as described below.
The HLS19-Q47 and its 12-item short versions
The HLS19-Q47 and its 12-item short versions (see Table 1) reflect the conceptual model of Sørensen et al. and use a 4-point rating scale with the response categories: (1) very difficult, (2) difficult, (3) easy, and (4) very easy. The “don’t know” response category was recorded only when stated spontaneously by the participants, and was recoded to missing data in the analyses.
Translation and cultural adaptation of the HLS19-Q47
The translation of the HLS19-Q47 was performed in accordance with Brislin’s protocol. The questionnaire was translated from English to Norwegian independently by two bilingual persons (translators), both of whom had a deep understanding of the concept of HL and were experienced questionnaire developers. The two translators compared their translated versions and discussed item content and wording. A third person read the Norwegian translation, made comments, and suggested amendments. When consensus had been reached, a professional translator was engaged to do a back-translation. The original English version was then compared with the back-translated version in order to obtain the most semantically, technically, and contextually equivalent versions. Finally, the translation was quality-assured by the data collection agency (Norstat). To ensure that the item contents were understood and could be considered relevant in a Norwegian context as well, cognitive interviews with a think-aloud procedure were conducted when translating the HLS-EU-Q47. The results from these cognitive interviews were taken into account as part of the translation process in the current study.
Pilot testing of the instruments
Prior to the main data collection, a pilot of the instruments was conducted in several institutions and organizations, such as municipalities, directorates, universities, NGOs, and hospitals. Some HLS19-Q47 items were revised based on results from the pilot survey. These amendments were based on empirical observations interpreted in light of theoretical expectations.
There are three main item response theory (IRT) models: 1) the one-parameter model, 2) the two-parameter model, and 3) the three-parameter model. The one-parameter IRT model corresponds to the Rasch model. Distinct from other IRT models, Rasch models meet the requirements of fundamental measurement, such as sufficiency, additivity, invariance, and specific objectivity. On this background, the unidimensional Rasch model was applied in this study.
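The distinction between the one- and two-parameter models can be sketched as follows. This is an illustration of the model equations only; the function names are ours and are not part of the study’s software. In the Rasch (one-parameter) model, all items share the same discrimination, which is what preserves raw-score sufficiency.

```python
import math

def rasch_prob(theta, delta):
    """One-parameter (Rasch) model: the response probability depends only
    on the difference between person location theta and item location
    delta; all items share a discrimination of 1."""
    return 1.0 / (1.0 + math.exp(-(theta - delta)))

def two_pl_prob(theta, delta, a):
    """Two-parameter model: adds an item-specific discrimination a,
    which sacrifices the raw-score sufficiency of the Rasch model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - delta)))

# When person and item locations coincide, both models give 0.5.
print(rasch_prob(0.8, 0.8))        # 0.5
print(two_pl_prob(0.8, 0.8, 2.0))  # 0.5
```

Under the two-parameter model, items with larger a separate persons more sharply around delta, which is why under-discrimination (a effectively below 1) signals misfit relative to the Rasch model.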
We tested the data against the partial credit parameterization of the unidimensional Rasch model for polytomous data, and against the partial credit parameterization of the “between-item” “multidimensional random coefficients multinomial logit” (MRCML) model. The latter was used when testing the HLS19-Q47 data against a 12-dimensional model that reflects all 12 cells in the HLS-EU HL matrix: three health domains by four cognitive domains (12 correlated subscales). Using the unidimensional approach, we assume perfectly correlated subscales, that is, three perfectly aligned health domains (HP, DP, and HC) and/or four perfectly aligned cognitive domains (F, U, J, and A). Using the three- and 12-dimensional approaches, we relax this constraint and allow health domains and/or cognitive domains to covary. Additionally, a consecutive approach (treating the subscales as orthogonal, or uncorrelated) was used when assessing item invariance. Models were estimated using the ConQuest 5 software and the RUMM2030plus software.
For item-location estimates, RUMM2030plus uses pairwise maximum likelihood estimation (PMLE), while ConQuest 5 uses marginal maximum likelihood estimation (MMLE). Normality may be considered a prerequisite when using maximum likelihood estimation. The raw data obtained from the scales measuring YP’s HL were therefore transformed into person-location estimates (logit values) using the RUMM2030plus and ConQuest 5 software. The transformed data can subsequently be considered continuous and at the interval level, and histograms of their distributions supported approximate normality. For unbiased person-location estimates, both software packages apply Warm’s mean weighted likelihood estimation (WLE). The average item-location estimate was set to 0.0 in all analyses.
Using Rasch measurement theory, we evaluated dimensionality, response dependency, targeting, reliability, item fit, differential item functioning (DIF), and ordering of response categories.
For each of the instrument versions, dimensionality was assessed by applying the combined principal component analysis (PCA) of residuals and paired t-test procedure [43, 47]. Based on the PCA, two subsets of items were identified. Person-location estimates on the two subsets were then compared using paired t-tests. Multidimensionality is indicated when the proportion of individuals with significantly different person-location estimates on the compared subscales exceeds 5% [47, 48]. Strictly, unidimensionality can only be disproved in this way, not proved. Given a normal distribution of the differences in person-location estimates derived from the two subsets, Tennant and Pallant claimed that this approach is robust enough to detect multidimensionality. Where the proportion of individuals with significantly different person-location estimates on the compared subscales exceeded 5%, we also manually performed the binomial test, an exact test of the statistical significance of deviations from a theoretically expected distribution of observations into two categories. If the lower bound of the 95% confidence interval for the proportion of significant t-tests is lower than or equal to 0.05 (5%), the scale can be considered sufficiently unidimensional.
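The decision rule in the last step can be made concrete with a small sketch. This is our illustration with hypothetical counts, not the study’s code; the study used an exact binomial test, whereas this sketch uses a normal-approximation confidence interval for the proportion.

```python
import math

def prop_ci_lower(k, n, z=1.96):
    """Lower bound of the 95% CI for a proportion (normal approximation).
    The exact binomial test used in the paper gives slightly different
    bounds; this approximation only illustrates the decision rule."""
    p = k / n
    se = math.sqrt(p * (1 - p) / n)
    return p - z * se

def sufficiently_unidimensional(k, n, cutoff=0.05):
    """Unidimensionality is retained when the lower CI bound for the
    share of significant paired t-tests does not exceed the 5% cutoff."""
    return prop_ci_lower(k, n) <= cutoff

# Hypothetical example: 30 of 419 paired t-tests significant (about 7.2%).
# The lower CI bound falls below 5%, so the scale would be retained.
print(sufficiently_unidimensional(30, 419))  # True
```

With a clearly larger share of significant tests (e.g., 60 of 419), the lower bound stays above 5% and multidimensionality would be indicated.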
Effective instruments do not collect redundant information and are free from response dependency, which is present when responses to an item are statistically dependent on the responses to a previous item. The average of the residual correlations plus 0.2 (average + 0.2) was used as a cut-off to indicate possible “significant” response dependency. When the responses to a pair of items are locally dependent, one may construct a subtest or, when developing instruments, delete one of the items.
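The cut-off described above can be sketched as follows; the three-item residual correlation matrix is invented for illustration and the function is ours, not part of RUMM2030plus.

```python
def flag_dependent_pairs(residual_corr, item_names):
    """Flag item pairs whose residual correlation exceeds the average
    off-diagonal residual correlation plus 0.2 (the cut-off described
    above). residual_corr is a symmetric matrix (list of lists)."""
    n = len(item_names)
    off_diag = [residual_corr[i][j] for i in range(n) for j in range(i + 1, n)]
    cutoff = sum(off_diag) / len(off_diag) + 0.2
    return [(item_names[i], item_names[j])
            for i in range(n) for j in range(i + 1, n)
            if residual_corr[i][j] > cutoff]

# Hypothetical residual correlations: the mean is about 0.04, so the
# cut-off is about 0.24 and only the 0.35 pair is flagged.
R = [[1.00, -0.10, 0.35],
     [-0.10, 1.00, -0.12],
     [0.35, -0.12, 1.00]]
print(flag_dependent_pairs(R, ["item1", "item2", "item3"]))  # [('item1', 'item3')]
```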
Targeting of persons and items
For a well-targeted scale, the distribution of the person estimates should match the distribution of the item-threshold estimates, or difficulties. As the scale is always centered at zero logits in the Rasch software, the mean person-location value for a well-targeted scale should be close to zero. Poor targeting may deflate the variance in person estimates, which in turn leads to poor person separation and deflated “test–retest” reliability indexes.
Reliability – internal consistency
The person separation reliability (PSR) and the person separation index (PSI) were estimated using the ConQuest 5 software and the RUMM2030plus software, respectively. In addition, Omega was estimated using the Mplus 8.6 software and a Microsoft Excel-based tool that calculates ordinal Omega from standardized factor loadings and standardized residual variances. Frisbie has suggested that the reliability of sum scores should exceed 0.85 when drawing conclusions at the individual level and 0.65 at the group level.
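The ordinal Omega computed from standardized loadings and residual variances follows a simple formula, sketched here with hypothetical loadings (not the study’s estimates): Omega is the squared sum of loadings divided by the squared sum of loadings plus the summed residual variances.

```python
def ordinal_omega(loadings):
    """Omega from standardized factor loadings; residual variances are
    taken as 1 - loading**2, as in the Excel-based tool described above."""
    s = sum(loadings)
    resid = sum(1 - l ** 2 for l in loadings)
    return s ** 2 / (s ** 2 + resid)

# Hypothetical standardized loadings for a 12-item scale.
loadings = [0.55, 0.60, 0.62, 0.58, 0.65, 0.57,
            0.61, 0.59, 0.63, 0.56, 0.64, 0.60]
print(round(ordinal_omega(loadings), 3))  # 0.871
```

In this hypothetical case Omega exceeds 0.85, which by Frisbie’s rule of thumb would suffice even for conclusions at the individual level.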
Individual item fit
Using ConQuest 5, the weighted mean square error (infit MNSQ), or variance-weighted fit residual, was used to indicate individual item fit to the Rasch model. The expected infit MNSQ value is 1, which implies perfect data-model fit. Using instruments at the population level, we consider 0.7 < infit MNSQ < 1.3 sufficient [32, 56]. Furthermore, item under- and over-discrimination relative to the Rasch model was indicated by values significantly different from the expected value of 1, with an absolute value of the T statistic higher than 1.96 [55, 57]. Under-discriminating items most likely measure too much of “something else” that does not correlate positively with the latent trait, with the result that they do not discriminate sufficiently well between persons with high and low standing on the latent trait.
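As a minimal illustration of how the infit statistic weighs residuals (our sketch; the toy numbers are invented, and ConQuest derives the expected scores and variances from the estimated model):

```python
def infit_mnsq(observed, expected, variance):
    """Information-weighted mean square: squared score residuals summed
    over responses, divided by the sum of model variances. 1.0 implies
    perfect fit; values well above 1 indicate under-discrimination and
    values well below 1 indicate over-discrimination."""
    num = sum((x - e) ** 2 for x, e in zip(observed, expected))
    den = sum(variance)
    return num / den

# Toy data: four responses with hypothetical model-expected scores
# and variances; the resulting value lies inside the 0.7-1.3 band.
obs = [3, 2, 4, 1]
exp_ = [2.4, 2.2, 3.1, 1.6]
var = [0.5, 0.4, 0.6, 0.3]
print(round(infit_mnsq(obs, exp_, var), 3))  # 0.872
```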
A non-significant chi-square item fit statistic (p > 0.05) indicates good data-model fit, but the probability of detecting significant values, or “misfit items”, increases with the number of significance tests performed. The Bonferroni correction is one of several methods to counteract this effect. For a 12-item scale, the Bonferroni-adjusted chi-square significance level is 0.05/12 ≈ 0.004.
Differential item functioning
A central requirement of the Rasch model is measurement invariance, which means that items should function in the same way across different groups of people, such as groups defined by gender or health status. Items display differential item functioning (DIF) when they have different relative difficulty (uniform DIF) or discriminate differently (non-uniform DIF) across groups of people.
We explored whether the items displayed DIF for selected person factors by two-way analysis of variance (ANOVA) of standardized residuals and by inspecting graphical displays. Owing to the inclusion criterion “YP aged 16–25 years”, we dichotomized participants’ highest education level (“upper secondary school or below” versus “above upper secondary school”) and participants’ age accordingly (16–20 years old versus 21–25 years old). Participants’ self-reported social status on a scale from 1 to 10 was dichotomized, as the two age groups probably define their level in society based on different criteria owing to life experiences: education level, living conditions, and economic status. Economic deprivation was considered present when participants reported difficulties paying bills at the end of the month. Participants described their health status (mostly healthy, or at increased risk of/having a chronic health problem) and reported whether they suffered from a long-term illness expected to last, or that had lasted, for at least six months.
Ordered response categories
Polytomous items (here: a 4-point response scale) with ordered response categories yield categorical data at the ordinal level. This implies significantly different and ordered thresholds, where thresholds are the locations on the latent trait where adjacent response categories are equally likely. Disordered thresholds indicate response categories not working as intended.
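The partial credit model’s category probabilities, and the ordering check on thresholds, can be sketched as follows; the threshold values are hypothetical and the functions are ours.

```python
import math

def pcm_category_probs(theta, thresholds):
    """Partial credit model: P(X = k) is proportional to
    exp(sum over j <= k of (theta - tau_j)), with an empty sum for k = 0."""
    logits = [0.0]
    for tau in thresholds:
        logits.append(logits[-1] + (theta - tau))
    expl = [math.exp(l) for l in logits]
    total = sum(expl)
    return [e / total for e in expl]

def thresholds_ordered(thresholds):
    """With ordered thresholds, each response category is the most likely
    one somewhere along the latent trait; disordering violates this."""
    return all(a < b for a, b in zip(thresholds, thresholds[1:]))

# Hypothetical 4-category item (3 thresholds): ordered vs disordered.
print(thresholds_ordered([-1.2, 0.1, 1.4]))   # True
print(thresholds_ordered([-0.5, -1.0, 1.2]))  # False
probs = pcm_category_probs(0.0, [-1.2, 0.1, 1.4])
print(round(sum(probs), 6))                   # 1.0
```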
Confirmatory factor modelling
Using the software Mplus 8.6 , one- and three-factorial CFAs of the HLS19-YP12, HLS19-Q12, HLS19-SF12, and HLS19-Q12-NO data, were conducted to examine the correlation structure and item loadings in light of the theoretical framework – the HLS-EU health literacy matrix . The one-, two-, three-, four- and 12-factorial CFAs of the HLS19-Q47 data were supplementarily performed to assist confirmation of prior studies.
Following Asparouhov and Muthén, a significant model chi-square statistic implies that the suggested confirmatory factor model fails the “exact fit test”. For categorical data, the weighted least squares (WLS) estimator was used to obtain the model chi-square statistic. Other fit indices were estimated using the robust diagonally weighted least squares (WLSMV) estimator, the default option for categorical data in Mplus 8.6. Using the WLSMV estimator with ordered-category data, polychoric correlation coefficients were estimated and reported in Table 3.
Absolute fit indices below their target values, such as the standardized root mean square residual (SRMR ≤ 0.080), combined with small residual correlation matrix entries (i.e., absolute value ≤ 0.10), indicate approximate fit. Other goodness-of-fit (GOF) indices (with target values in parentheses) may assist model evaluation, such as the root mean square error of approximation (RMSEA ≤ 0.06), comparative fit index (CFI ≥ 0.95), and Tucker-Lewis index (TLI ≥ 0.95). However, RMSEA values ≤ 0.08 may be considered acceptable in a small sample when the other GOF indices suggest a good model fit. Additionally, CFI between 0.90 and 0.95 indicates reasonable fit, while values < 0.90 are considered poor fit.
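The approximate-fit criterion (SRMR ≤ .080 combined with all residual entries ≤ .10) can be sketched as follows. This is a simplified SRMR for correlation matrices with hypothetical values; Mplus applies further standardization, so the sketch only illustrates the decision rule.

```python
import math

def residual_matrix(observed, implied):
    """Entry-wise difference between the observed and the
    model-implied correlation matrices."""
    n = len(observed)
    return [[observed[i][j] - implied[i][j] for j in range(n)] for i in range(n)]

def srmr(observed, implied):
    """Simplified SRMR for correlation matrices: root mean square of
    the off-diagonal residuals."""
    res = residual_matrix(observed, implied)
    n = len(res)
    vals = [res[i][j] ** 2 for i in range(n) for j in range(i + 1, n)]
    return math.sqrt(sum(vals) / len(vals))

def approximate_fit(observed, implied):
    """Approximate fit as described above: SRMR <= .080 and every
    off-diagonal residual entry <= .10 in absolute value."""
    res = residual_matrix(observed, implied)
    n = len(res)
    max_abs = max(abs(res[i][j]) for i in range(n) for j in range(i + 1, n))
    return srmr(observed, implied) <= 0.080 and max_abs <= 0.10

# Hypothetical 3-variable example with small residuals (all 0.02).
obs_corr = [[1.00, 0.42, 0.35], [0.42, 1.00, 0.38], [0.35, 0.38, 1.00]]
imp_corr = [[1.00, 0.40, 0.37], [0.40, 1.00, 0.36], [0.37, 0.36, 1.00]]
print(approximate_fit(obs_corr, imp_corr))  # True
```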
Developing the HLS19-YP12
The 12-item short version suggested in the present study was developed from analyses of the HLS19-Q47 and the other three 12-item short versions, applied in YP aged 16–25 in Norway. The development was stepwise: 1) exclude items that in the Rasch analyses displayed poor fit, DIF, or disordered response categories, or that might collect redundant information; and 2) use CFA to assess the fit statistics, in which large residual correlation matrix entries indicate the need for model modifications. Throughout this process, we ensured that the items included in the suggested version reflected the conceptual 12-cell matrix.
Handling missing data
Missing data also comprise “don’t know” responses, which on average made up 2% of the data. The highest missing rates (5–7%) were observed for items 2, 3, 10, 11, 19 and 34, while items 8, 14, 22, 32, 33, 37, 38, 39, 40, 42, 43 and 44 had less than 1% missing values. Using full information maximum likelihood (FIML) estimation, person locations and item thresholds were estimated based on all available information.
Descriptive statistics and correlations between the items of HLS19-YP12
For all items, the percentage of participants responding “difficult” or “very difficult” is lower than the percentage responding “easy” or “very easy” (Table 3). The most difficult items were item41, item10, and item18, with 46, 43, and 42% (very) difficult responses, respectively. The easiest items were item4, item23, item46, and item13, with 86, 84, 81, and 80% (very) easy responses, respectively. The correlations between the items of the HLS19-YP12 can be considered small to medium (range: 0.190–0.474).
Overall data-model fit and unidimensionality of 12-item short versions
The HLS19-YP12, the HLS19-Q12-NO, and the HLS19-SF12 data displayed sufficient overall fit to the PCM (non-significant overall chi-square statistic), while the HLS19-Q12 data did not. All short versions explored in our study had reliability indexes (PSR, PSI, and Omega) above 0.65. The HLS19-YP12, the HLS19-Q12, and the HLS19-Q12-NO are considered sufficiently unidimensional, while the HLS19-SF12 is not (Table 4).
No response dependency was observed for any short version, but the HLS19-Q47 suffers from serious local dependency with up to 35 pairs of dependent items when applying the unidimensional PCM. For details, see Supplementary Table S1.
No short version was particularly well-targeted to the YP, but the distributions of item-threshold locations and person locations were best aligned for the HLS19-YP12 (Fig. 1); the mean person locations for the HLS19-YP12, HLS19-Q12, HLS19-SF12, and HLS19-Q12-NO scales were 1.035, 1.155, 1.141, and 1.084, respectively (Table 4).
Exploring dimensionality by using confirmatory factor analysis
Comparing the one- and three-factorial models, only the one-factor model of the HLS19-YP12 achieved approximate fit, with an acceptable SRMR (0.030) and no entry in the residual correlation matrix > 0.10 (Table 5). Supplementary Table S2 provides an overview of all entries in the residual correlation matrix for all four 12-item scales, applying both one- and three-factor models. Other GOF indices indicated that the model-implied correlation matrix re-created the observed correlation matrix sufficiently well: RMSEA (0.039; 0.034), CFI/TLI (0.985/0.981; 0.989/0.986) (Table 5). Results for the comprehensive scale HLS19-Q47 are reported in Supplementary Table S3.
While all short versions (HLS19-YP12, HLS19-Q12, HLS19-SF12, and HLS19-Q12-NO) achieved SRMR < 0.080 for both one- and three-factorial models, the HLS19-SF12 had the most entries in the residual correlation matrix > 0.10, whereas the HLS19-YP12 had none for the one-factor model and only one high entry (-0.13) for the three-factor model. Among the 12-item short scales, the HLS19-YP12 obtained the most acceptable standardized factor loadings under the one-factor model (all items > 0.500) (Table 6).
Rasch analyses at item level for HLS19-YP12, HLS19-Q12, HLS19-SF12, and HLS19-Q12-NO
Individual item fit
Applying unidimensional Rasch modelling, all items in all short versions had acceptable infit values (Tables 7, 8, 9 and 10). For the HLS19-Q12, item31 had a T-value of 2.1, meaning that the item under-discriminated relative to the PCM. In addition, the Bonferroni-adjusted chi-square probability (chi-square: 21.18; p < 0.001) for item42 in the same scale was significant (not reported in the Tables). A significant total item chi-square (Table 4) also indicated problems at the individual item level. Consistent with this, a Class Interval main effect indicating item misfit was observed for item42 for all person factor variables: age, gender, education, economic deprivation, level in society, long-term illness, and health status. A Class Interval main effect was also observed for item45 in the HLS19-SF12 scale, but only for the person factor “long-term illness”. Supplementary investigation of the HLS19-Q47 showed, however, that five items (29, 34, 38, 41, 45) in the 12-dimensional model under-discriminated relative to the PCM (Supplementary Table S4).
Differential item functioning—DIF
While no DIF was observed for any item in the HLS19-YP12, neither graphically nor by significant ANOVA tests, significant uniform DIF was observed for item14 in the HLS19-Q12-NO for the “level in society” subgroups, and item45 in the HLS19-SF12 scale displayed significant non-uniform DIF for the “long-term illness” subgroups (Fig. 2). Although not significant after Bonferroni adjustment, inspection of the item characteristic curves (ICCs) graphically displayed uniform DIF for item42 in the HLS19-Q12 for the “level in society” subgroups and for item6 in the HLS19-SF12 for the “health status” subgroups (not reported in the Figures).
Ordering of response categories
Among the four short versions, only item15 in the HLS19-SF12 and item16 in the HLS19-Q12 displayed disordered response categories. Figure 3 shows that response category “2” in both items was not the most likely category for any location on the continuum of person location estimates.
In several Western healthcare systems, the patient role has been redefined, expecting patients to take a more active part in their own care and decision-making. Accurate and precise measurement of HL supports the tailoring of communication between patients and health providers during the patient pathway, and likewise supports targeted public health measures. All of this also applies to YP from the age of 16.
Despite the fact that the HLS19-Q47 and its short versions, HLS19-Q12, HLS19-SF12, and HLS19-Q12-NO, have been well studied and validated for adult populations [21, 23, 31, 32], this study is, to our knowledge, the first to simultaneously assess the psychometric properties of all recently suggested 12-item versions of the HLS19-Q47 applied in YP aged 16–25.
Based on data from the Norwegian HLS19 study, the empirical evidence weakened our null hypothesis concerning the psychometric properties of the previously suggested 12-item short versions of the HLS19-Q47, i.e., the HLS19-Q12, HLS19-SF12, and HLS19-Q12-NO. By examining poorly fitting items identified through Rasch modelling and CFA, we established a psychometrically sound, parsimonious 12-item version (HLS19-YP12) for use among YP aged 16–25 years.
The empirical evidence suggested that the HLS19-YP12 has superior psychometric properties and convincingly outperforms other recently available 12-item short versions of the HLS19-Q47, i.e., HLS19-Q12, HLS19-SF12, and HLS19-Q12-NO.
Psychometric properties of the 12-item versions; HLS19-YP12, HLS19-Q12-NO, HLS19-Q12, and HLS19-SF12 at the overall level
Previous research has concluded that the HLS19-Q12-NO was psychometrically superior to other short versions of the HLS19-Q47 [21, 31]. However, the HLS19-Q12 was not reviewed in these studies. Nonetheless, all short versions have been suggested and validated for adult populations. Applied in data from YP, the HLS19-Q12-NO still seemed to fit the unidimensional Rasch model better than the other two scales, HLS19-Q12 and HLS19-SF12. Nevertheless, the present study provided evidence that the suggested HLS19-YP12 displayed even better fit to the unidimensional Rasch model than did the HLS19-Q12-NO, and unconditionally stood out as sufficiently unidimensional.
Applying the guidelines for CFA in Mplus set forth by Asparouhov and Muthén, approximate fit was only established when SRMR ≤ 0.080 and all residuals were small (rres ≤ 0.10). Asparouhov and Muthén claim that it would be inaccurate to consider models with large residual values as approximately well-fitting, as large residuals indicate a major discrepancy between the model and the data. However, we exceptionally considered the fit acceptable when only remarkably few residuals were barely higher than 0.10. Disregarding some residuals higher than 0.10, other GOF indices, such as RMSEA, CFI, and TLI, indicated that both the one- and three-factorial models of the HLS19-Q12 and the HLS19-Q12-NO have relatively good data-model fit. Furthermore, the HLS19-SF12 also displayed acceptable data-model fit based on these GOF indices. Nevertheless, researchers have discussed whether it is expedient to assess the other GOF indices (RMSEA, CFI, and TLI) when the criteria of SRMR and all small residuals are not met. Large residual values indicate that model modifications are needed.
Based on our nationally representative sample (n = 890) of youth aged 16–25 years, there is strong evidence that, compared with the other 12-item short versions, the one-factorial CFA model best explains the data from the HLS19-YP12, and that these data also fitted the unidimensional polytomous Rasch model best.
All the 12-item short scales obtained a positive mean person location value, indicating that the data as a whole were located at a higher level than the average of the scale. In other words, the items are deemed too easy relative to the participants’ ability. The HLS19-Q12 and the HLS19-SF12 obtained the highest mean person location values, and we may have witnessed a ceiling effect (extreme person scores), in which poor targeting has caused disordered response categories. Of the four 12-item short scales, the distribution of item-threshold locations and the distribution of person locations were best aligned for the HLS19-YP12 (Fig. 1), as reflected by the lowest mean person location value (1.035). However, the instrument could benefit from adding items that are harder to endorse.
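The targeting logic can be illustrated with a toy calculation. All numbers here are hypothetical; real person locations would come from an estimation procedure such as Warm’s weighted likelihood estimation, and thresholds from the fitted PCM.

```python
import numpy as np

# Hypothetical person location estimates (logits) and item-threshold locations
person_locations = np.array([1.2, 0.8, 1.5, 0.9, 1.1])
item_thresholds = np.array([-1.0, -0.2, 0.4, 1.1])   # scale centred near zero

mean_person = person_locations.mean()
mean_threshold = item_thresholds.mean()

# A clearly positive gap means the sample sits above the scale centre:
# the items are comparatively easy to endorse, with a risk of ceiling effects.
targeting_gap = mean_person - mean_threshold
```

The closer this gap is to zero (with person and threshold distributions overlapping), the better the scale is targeted to the sample.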
Psychometric properties of the 12-item versions (HLS19-YP12, HLS19-Q12, HLS19-Q12-NO, and HLS19-SF12) at item level
In accordance with results from the Rasch analyses of the HLS19-Q12 when applied in adult populations, item31 in the HLS19-Q12 also displayed poor item fit and was the only item across all four short versions that under-discriminated. Like item31, item28 deals with difficulties in appraising and applying health information from “mass media”, and instructions were added to help participants recognize what mass media (i.e., newspapers, TV, or the Internet) refers to. The various types of media might have caused the undifferentiated response pattern regardless of the participant’s HL level, as the difficulty of appraising or applying information from mass media might depend on which kind of media is referred to.
Applying the one-factorial CFA model (Table 6), item28 in the HLS19-Q12-NO displayed the second-lowest factor loading, while item31 in the HLS19-Q12 had the lowest factor loading on their respective dimensions. Items referring to mass media, which are likely perceived as digital resources, may therefore be replaced by other items, as they are more likely related to, e.g., a digital HL construct, which is another aspect of overall HL.
Differential item functioning
DIF for societal levels was observed for item14 […to follow instructions on medication] and item42 […to judge how your housing conditions may affect your health and well-being] in the HLS19-Q12-NO and the HLS19-Q12, respectively. Supplementary analyses were conducted to understand why DIF for societal level was displayed among YP. The results (not reported in the Tables) showed that while about 80% of the youngest subgroup placed themselves at the highest societal level, only 60% of the oldest subgroup did the same. This could be explained by different understandings of societal level between the youngest and the oldest subgroups: a 16-year-old might perceive not owning a popular piece of clothing, like an expensive jacket, as being placed at a very low level in society, whereas a 25-year-old might have another opinion and perception based on the wider context. In turn, these different perceptions might have caused the DIF for societal levels observed in item14 and item42 in the HLS19-Q12-NO and the HLS19-Q12, respectively. However, there was no evidence of DIF for age groups.
To further investigate the reasons for DIF in item14 and item42, a supplementary frequency analysis (not reported) was conducted, showing that 89% of the youngest subgroup answered (very) easy on item14: to follow instructions on medication. In light of this result, parents may have played an important role in reminding and guiding YP concerning medications and in applying the information provided by the doctor. Surprisingly, the same proportion (80%) of both age subgroups answered (very) easy on item42, whereas one might have expected a higher proportion of the youngest, who typically still live at their parents’ place, to experience it as more difficult. This suggests that YP are as reflective as the adult population on these kinds of questions, although the phenomenon should be investigated further and in more detail.
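DIF detection in this framework rests on analysis of variance of person-item residuals (two-way ANOVA appears among the study’s listed methods). A stripped-down, synthetic illustration of the uniform-DIF idea, testing only the group main effect, might look like the following; the full RUMM-style procedure is a two-way ANOVA of group by class interval, where a significant interaction signals non-uniform DIF.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical standardized person-item residuals for one item, split by a
# person factor (e.g., societal-level subgroup). One subgroup systematically
# over-performs on the item relative to the model, the other under-performs.
resid_group_a = rng.normal(loc=-0.5, scale=1.0, size=120)
resid_group_b = rng.normal(loc=0.5, scale=1.0, size=150)

# A significant main effect of group on the residuals suggests uniform DIF.
f_stat, p_value = stats.f_oneway(resid_group_a, resid_group_b)
uniform_dif_flag = p_value < 0.05   # Bonferroni adjustment advisable in practice
```

With many items and several person factors, the significance level would be Bonferroni-adjusted, as done in the study.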
Ordered response categories
Disordered response categories might be explained by too few persons being located at the specific threshold levels, and are most likely also due to poor targeting. For item16 in the HLS19-Q12, the first two thresholds were very close together and slightly reversed. More severely disordered response categories were identified for item15 in the HLS19-SF12, in which the first two thresholds were clearly reversed. The latter case weakened the hypothesis of ordinal data.
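Checking for disordered categories reduces to checking that the estimated thresholds increase with category score. A minimal sketch, with hypothetical threshold values chosen to mirror the ordered and reversed cases described above:

```python
import numpy as np

def thresholds_ordered(thresholds):
    """Return True if Rasch-Andrich thresholds strictly increase with
    category score. Reversed (disordered) thresholds suggest the response
    categories do not work as intended for the sample."""
    t = np.asarray(thresholds, dtype=float)
    return bool(np.all(np.diff(t) > 0))

ordered_item = thresholds_ordered([-1.6, -0.3, 1.2])    # ordered thresholds
reversed_item = thresholds_ordered([-0.2, -0.4, 1.0])   # first two reversed
```

In practice, one would also inspect the category probability curves, since thresholds that are ordered but very close together can still indicate poorly functioning categories.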
Item13, item36, item41, and item46 in the HLS19-YP12 are unique items that distinguish it from the other three 12-item scales. The remaining eight items (item4, item7, item10, item18, item23, item26, item30, and item38) are found in either the HLS19-Q12, the HLS19-SF12, or the HLS19-Q12-NO. Item26 […to judge which vaccinations you or your family may need] and item36 […to find information about how to promote health at work, at school, or in the neighborhood] are particularly relevant to YP, as they still have to deal with, e.g., vaccination programs and other health-related issues at school age. Adopting these two items in the new 12-item short version responds to the critique from Bröder et al. concerning the lack of attention to YP’s specific needs and social structures in most models.
However, face validity was not explicitly assessed beforehand for participants aged 16–17 years. This age group, like 18-year-olds, most likely comprises pupils in upper secondary school, so the readability and response burden for this group were assumed not to differ critically from those for persons aged 18 years. Examining the median response times, the response burden for the 16- and 17-year-old participants (17.7 and 17.3 min, respectively) evidently did not differ from that of participants aged 18–25 (range: 16.7–18.9 min). Even though the understandability of the item content has been ensured through cognitive interviews with young adults aged 18 and above, further interviews may be considered for YP below 18 to confirm that the items are also well understood in this target group.
Notably, one of the strengths of the HLS19-YP12 instrument is that it was developed based on a definition and conceptual framework of HL, thereby ensuring content validity. Furthermore, the new instrument includes items that are considered more relevant to younger people, such as vaccination and health-promoting activities in school and the neighborhood. As for the scale’s targeting, the distributions of item-threshold and person locations were best aligned for the HLS19-YP12, indicating that the content of the new instrument is better adapted to the target population.
Finally, YP are expected to use social media and digital platforms actively to access health information [6, 73]. Surprisingly, items related to mass media, e.g., item28 in the HLS19-Q12-NO and item31 in the HLS19-Q12, tend to under-discriminate. A prior study may provide an explanation: YP preferred to use their families as information resources rather than social media platforms. Furthermore, YP might have perceived mass media as part of another construct relative to digital health information platforms and skills.
The sample size for the HLS19-Q47, the HLS19-YP12, and the HLS19-SF12 was limited to n = 419. Therefore, all analyses that aimed to compare the various short versions were based on this sample size. There are no strict requirements for sample size in Rasch modelling. However, a rule of thumb suggests that a useful sample for a test of 12 polytomous items with 3 thresholds each should comprise between 360 and 720 persons, corresponding to a ratio of 10 to 20 persons per threshold. Mundfrom et al. suggested that the minimum sample size for applying CFA depends on the variables-to-factors ratio and the number of factors present in the data, and Hair et al. claimed that a sample size above 300 is unlikely to produce Heywood cases. Hence, we assumed that our sample size of n = 419 was sufficient for the analyses performed. Nevertheless, both DIF analysis in Rasch modelling and the exact-fit test in CFA are relatively sensitive to sample size: DIF is more likely to be flagged in larger samples, and the significance of the model chi-square in CFA also depends on sample size. Thus, the findings should be interpreted with some caution.
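The rule of thumb above is simple arithmetic; spelled out for the present design:

```python
# Rule of thumb for polytomous Rasch models: 10-20 persons per estimated threshold.
n_items = 12
thresholds_per_item = 3                 # four response categories -> three thresholds
total_thresholds = n_items * thresholds_per_item

n_min = 10 * total_thresholds           # lower bound of the rule of thumb
n_max = 20 * total_thresholds           # upper bound

sample_size = 419                       # sample available for the comparisons
within_rule_of_thumb = n_min <= sample_size <= n_max
```

With 36 thresholds, the rule gives a range of 360 to 720 persons, so n = 419 falls comfortably within it.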
In this study, we applied both modern test theory (Rasch modelling) and classical test theory (CFA). However, future research may also consider other relevant modern short-form development techniques. Finally, the HLS19-YP12 was developed and psychometrically assessed based on national data. Hence, the psychometric properties of the instrument should be further assessed using multinational data.
The revised version of the HLS-EU-Q47 (HLS19-Q47) was supplementarily confirmed to fit a 12-dimensional model best. Hence, it is not statistically defensible to report a total score for individuals based on this scale, as person estimates of HL (person locations) cannot be derived from raw scores on a multidimensional scale. This principle also applies to all short versions that are not sufficiently unidimensional.
As the 12-item short version with the best fit to the unidimensional Rasch model and the one-factorial CFA model, achieving factor loadings > 0.500 for all items, the HLS19-YP12 is the first sufficiently unidimensional, conceptually developed HL instrument targeted at young people aged 16–25. The instrument is psychometrically superior and convincingly outperformed the other three 12-item short versions. Consequently, the HLS19-YP12 offers an efficient and much-needed screening tool for use among YP, likely useful in the development and evaluation of health policy and public health work, as well as in clinical settings.
Based on the relatively strong evidence from this study, we suggest that the HLS19-YP12 instrument (Table S5) be preferred in future studies measuring HL among YP from the age of 16.
Availability of data and materials
The datasets used and/or analyzed during the current study are not publicly available, but can be accessed by applying to the Norwegian Study Centre of HLS19 via this website: https://www.oslomet.no/forskning/forskningsprosjekter/befolkningens-helsekompetanse-hls19
Apply health information
Akaike’s information criterion
Two-way analysis of variance
Confirmatory factor analysis
Comparative fit index
Degree of freedom
Differential item functioning
Finding health information
Information maximum likelihood
European Health Literacy Survey Questionnaire
European Health Literacy Population Survey 2019–2021 (HLS19) Questionnaire
Item characteristic curve
Item response theory
Judging/appraising health information
Marginal maximum likelihood estimation
WHO action network on measuring population and organisational health literacy
MRCML model: “Multidimensional random coefficients multinomial logit” model
Principal component analysis
Pairwise maximum likelihood estimation
Person separation index
Person separation reliability
Root-mean-squared error of approximation
Standardized root mean square residual
Tucker and Lewis fit index
Understanding health information
Warm’s mean weighted likelihood estimation
Weighted least square mean and variance estimators
Mosquera PA, Waenerlund A-K, Goicolea I, Gustafsson PE. Equitable health services for the young? A decomposition of income-related inequalities in young adults’ utilization of health care in Northern Sweden. Int J Equity Health. 2017;16(1):20.
Haugen ALH, Riiser K, Esser-Noethlichs M, Hatlevik OE. Developing indicators to measure critical health literacy in the context of Norwegian lower secondary schools. Int J Environ Res Public Health. 2022;19(5):3116.
Riiser K, Helseth S, Haraldstad K, Torbjørnsen A, Richardsen KR. Adolescents’ health literacy, health protective measures, and health-related quality of life during the Covid-19 pandemic. PLoS ONE. 2020;15(8): e0238161.
Paakkari LT, Torppa MP, Paakkari O-P, Välimaa RS, Ojala KSA, Tynjälä JA. Does health literacy explain the link between structural stratifiers and adolescent health? Eur J Pub Health. 2019;29(5):919–24.
Levesque J-F, Harris MF, Russell G. Patient-centred access to health care: conceptualising access at the interface of health systems and populations. Int J Equity Health. 2013;12(1):18.
Paakkari L, Paakkari O. Health literacy as a learning outcome in schools. Health Education. 2012.
Sørensen K, Van den Broucke S, Fullam J, Doyle G, Pelikan J, Slonska Z, et al. Health literacy and public health: a systematic review and integration of definitions and models. BMC Public Health. 2012;12(1):1–13.
McCormack L, Haun J, Sørensen K, Valerio M. Recommendations for advancing health literacy measurement. J Health Commun. 2013;18(sup1):9–14.
The Ministry of Health and Care Services (Norway). A Norwegian Strategy to increase Health Literacy in the Population. Norway: The Norwegian Government; 2019.
Okan O, Bauer U, Levin-Zamir D, Pinheiro P, Sørensen K. International Handbook of Health Literacy: Research, practice and policy across the lifespan: Policy Press. 2019.
Griffin JM, Partin MR, Noorbaloochi S, Grill JP, Saha S, Snyder A, et al. Variation in estimates of limited health literacy by assessment instruments and non-response bias. J Gen Intern Med. 2010;25(7):675–81.
Haun J, Luther S, Dodd V, Donaldson P. Measurement variation across health literacy assessments: implications for assessment selection in research and practice. J Health Commun. 2012;17(sup3):141–59.
Sørensen K, Pleasant A. Understanding the conceptual importance of the differences among health literacy definitions. Stud Health Technol Inform. 2017;240:3–14.
Guo S, Armstrong R, Waters E, Sathish T, Alif SM, Browne GR, et al. Quality of health literacy instruments used in children and adolescents: a systematic review. BMJ Open. 2018;8(6):e020080.
Okan O, Lopes E, Bollweg TM, Bröder J, Messer M, Bruland D, et al. Generic health literacy measurement instruments for children and adolescents: a systematic review of the literature. BMC Public Health. 2018;18(1):166.
Ormshaw MJ, Paakkari LT, Kannas LK. Measuring child and adolescent health literacy: a systematic review of literature. Health Educ. 2013;113(5):433–55.
Perry EL. Health literacy in adolescents: an integrative review. J Spec Pediatr Nurs. 2014;19(3):210–8.
Urstad KH, Andersen MH, Larsen MH, Borge CR, Helseth S, Wahl AK. Definitions and measurement of health literacy in health and medicine research: a systematic review. BMJ Open. 2022;12(2): e056294.
The HLS19 Consortium of the WHO Action Network M-POHL. International Report on the Methodology, Results, and Recommendations of the European Health Literacy Population Survey 2019–2021 (HLS19) of M-POHL. Vienna: Austrian National Public Health Institute; 2021.
The HLS-EU Consortium. Measurement of health literacy in Europe: HLS-EU-Q47; HLS-EU-Q16; and HLS-EU-Q86. Health Literacy Project 2009–2012. Maastricht: The HLS-EU Consortium; 2012.
Finbråten HS, Wilde-Larsson B, Nordström G, Pettersen KS, Trollvik A, Guttersrud Ø. Establishing the HLS-Q12 short version of the European health literacy survey questionnaire: latent trait analyses applying Rasch modelling and confirmatory factor analysis. BMC Health Serv Res. 2018;18(1):1–17.
Le C, Finbråten HS, Pettersen KS, Joranger P, Guttersrud Ø. Health Literacy in the Norwegian Population. English Summary. In: Befolkningens helsekompetanse, del I. The International Health Literacy Population Survey 2019–2021 (HLS19)–et samarbeidsprosjekt med nettverket M-POHL tilknyttet WHO-EHII: The Norwegian Directorate of Health. 2021.
Duong TV, Aringazina A, Kayupova G, Nurjanah F, Pham TV, et al. Development and validation of a new short-form health literacy instrument for the general public in six Asian countries. Health Lit Res Pract. 2019;3(2):91–102.
van der Heide I, Rademakers J, Schipper M, Droomers M, Sørensen K, Uiters E. Health literacy of Dutch adults: a cross sectional survey. BMC Public Health. 2013;13(1):1–11.
Sørensen K, Van den Broucke S, Pelikan JM, Fullam J, Doyle G, Slonska Z, et al. Measuring health literacy in populations: illuminating the design and development process of the European Health Literacy Survey Questionnaire (HLS-EU-Q). BMC Public Health. 2013;13(1):1–10.
Rouquette A, Nadot T, Labitrie P, Van den Broucke S, Mancini J, Rigal L, et al. Validity and measurement invariance across sex, age, and education level of the French short versions of the European health literacy survey questionnaire. PLoS ONE. 2018;13(12): e0208091.
Duong TV, Aringazina A, Baisunova G, Pham TV, Pham KM, Truong TQ, et al. Measuring health literacy in Asia: validation of the HLS-EU-Q47 survey tool in six Asian countries. J Epidemiol. 2017;27(2):80–6.
Duong VT, Lin I-F, Sorensen K, Pelikan JM, Van den Broucke S, Lin Y-C, et al. Health literacy in Taiwan: a population-based study. Asia Pac J Public Health. 2015;27(8):871–80.
Nakayama K, Osaka W, Togari T, Ishikawa H, Yonekura Y, Sekido A, et al. Comprehensive health literacy in Japan is lower than in Europe: a validated Japanese-language assessment of health literacy. BMC Public Health. 2015;15(1):1–12.
Finbråten HS, Pettersen KS, Wilde-Larsson B, Nordström G, Trollvik A, Guttersrud Ø. Validating the European health literacy survey questionnaire in people with type 2 diabetes: Latent trait analyses applying multidimensional Rasch modelling and confirmatory factor analysis. J Adv Nurs. 2017;73(11):2730–44.
Maie A, Kanekuni S, Yonekura Y, Nakayama K, Sakai R. Evaluating short versions of the European Health Literacy Survey Questionnaire (HLS-EU-Q47) for health checkups. Health Evaluation and Promotion. 2021;48(4):351–8.
Guttersrud Ø, Le C, Pettersen KS, Finbråten HS. Rasch analyses of data collected in 17 countries: a technical report to support decision-making within the M-POHL consortium. In: Publications on international HLS19 results. Available from: https://m-pohl.net/Rasch_Analy. Accessed 16 Nov 2022.
Brislin RW. Back-translation for cross-cultural research. J Cross Cult Psychol. 1970;1(3):185–216.
Drennan J. Cognitive interviewing: verbal data in the design and pretesting of questionnaires. J Adv Nurs. 2003;42(1):57–63.
Andersen EB. Sufficient statistics and latent trait models. Psychometrika. 1977;42(1):69–81.
Andrich D. Distinctions between assumptions and requirements in measurement in the social sciences. Math theor Syst. 1989;4:7–16.
Andrich D. Rasch models for measurement. Newsbury Park, CA: SAGE Publications; 1988.
Stenner A. Specific objectivity-local and general. Rasch Meas Trans. 1994;8(3):374.
Masters GN. A Rasch model for partial credit scoring. Psychometrika. 1982;47(2):149–74.
Rasch G. Probabilistic models for some intelligence and attainment tests. Chicago: University of Chicago Press; 1980.
Adams RJ, Wilson M, Wang WC. The multidimensional random coefficients multinomial logit model. Appl Psychol Meas. 1997;21(1):1–23.
Adams R, Cloney D, Wu M, Osses A, Schwantner V, Vista A, et al. ACER ConQuest Manual. In: ConQuest Notes and tutorials. Available from: https://conquestmanual.acer.org/. Accessed 17 Apr 2022.
RUMM laboratory Pty Ltd. Displaying the RUMM 2030 Analysis: Plus Edition. Duncraig: RUMM laboratory Pty Ltd.; 2019.
Katsikatsou M, Moustaki I, Yang-Wallentin F, Jöreskog KG. Pairwise likelihood estimation for factor analysis models with ordinal data. Comput Stat Data Anal. 2012;56(12):4243–58.
Bock RD, Aitkin M. Marginal maximum likelihood estimation of item parameters: application of an EM algorithm. Psychometrika. 1981;46(4):443–59.
Warm TA. Weighted likelihood estimation of ability in item response theory. Psychometrika. 1989;54(3):427–50.
Smith EV Jr. Understanding Rasch measurement: detecting and evaluating the impact of multidimensionality using item fit statistics and principal components analysis of residuals. J Appl Meas. 2002;3(2):205–31.
Hagell P. Testing rating scale unidimensionality using the principal component analysis (PCA)/t-test protocol with the Rasch model: the primacy of theory over statistics. Open J Stat. 2014;4(6):456–65.
Strout WF. A new item response theory modeling approach with applications to unidimensionality assessment and ability estimation. Psychometrika. 1990;55(2):293–325.
Tennant A, Pallant JF. Unidimensionality matters. Rasch MeasTrans. 2006;20(1):1048–51.
Christensen KB, Makransky G, Horton M. Critical values for Yen’s Q3: Identification of local dependence in the Rasch model using residual correlations. Appl Psychol Meas. 2017;41(3):178–94.
Tennant A, Conaghan PG. The Rasch measurement model in rheumatology: what is it and why use it? When should it be applied, and what should one look for in a Rasch paper? Arthritis Care Res. 2007;57(8):1358–62.
Dueber DM. Bifactor Indices Calculator: A Microsoft Excel-based tool to calculate various indices relevant to bifactor CFA models. 2017.
Frisbie DA. Reliability of scores from teacher-made tests. Educ Meas Issues Pract. 1988;7(1):25–35.
Smith RM, editor. Using item mean squares to evaluate fit to the Rasch model. The annual meeting of the American educational research association; San Francisco, CA. 1995.
Wright B, Linacre JM. Reasonable mean-square fit values. In: Rasch measurement transactions contents. https://www.rasch.org/rmt/rmt83b.htm. Accessed 22 May 2022.
Adams RJ, Wu ML, (August 2010). Tutorial 7 - Multidimensional models. In: ConQuest notes and tutorials. https://conquestmanual.acer.org/s2-00.html#s2-08. Accessed 17 Apr 2022.
Masters GN. Item discrimination: when more is worse. J Educ Meas. 1988;25(1):15–29.
Bland JM, Altman DG. Multiple significance tests: the Bonferroni method. BMJ. 1995;310(6973):170–1.
Andrich D, Marais I. A Course in Rasch Measurement Theory: Measuring in the Educational, Social and Health Sciences. Singapore: Springer; 2019.
Andrich D, de Jong J, Sheridan B. Diagnostic opportunities with the Rasch model for ordered response categories. In: Rost J, Langeheine R, editors. Applications of Latent Trait and Latent Class Models in the Social Sciences. New York, NY: Waxmann Verlag GMBH; 1997. p. 59–70.
Muthén LK, Muthén BO. Mplus User’s Guide. 8th ed. Los Angeles, CA: Muthén & Muthén; 1998-2017.
Asparouhov T, Muthén B, (2nd May 2018). SRMR in Mplus. In: Mplus: technical appendices related to new features in version 8. https://www.statmodel.com/download/SRMR2.pdf. Accessed 17 Apr 2022.
Asparouhov T, Muthén B, (26th April 2022). Assessing model fit for SEM models with categorical variables via contingency tables. In: Mplus: technical appendices related to new features in version 8. https://www.statmodel.com/download/Tech10.pdf. Accessed 15 May 2022.
Kline RB. Principles and Practice of Structural Equation Modeling. 4th ed. New York: The Guilford Press; 2016.
Hu Lt, Bentler PM. Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Struct Equation Model. 1999;6(1):1–55.
Brown TA. Confirmatory factor analysis for applied research. New York: Guilford Publications; 2015.
Håkansson Eklund J, Holmström IK, Kumlin T, Kaminsky E, Skoglund K, Höglander J, et al. “Same same or different?” A review of reviews of person-centered and patient-centered care. Patient Educ Couns. 2019;102(1):3–11.
Hagquist C, Bruce M, Gustavsson JP. Using the Rasch model in nursing research: an introduction and illustrative example. Int J Nurs Stud. 2009;46(3):380–93.
Domanska OM, Firnges C, Bollweg TM, Sørensen K, Holmberg C, Jordan S. Do adolescents understand the items of the European Health Literacy Survey Questionnaire (HLS-EU-Q47) – German version? Findings from cognitive interviews of the project “Measurement of Health Literacy Among Adolescents” (MOHLAA) in Germany. Arch Public Health. 2018;76(1):46.
Hagquist C, Andrich D. Is the sense of coherence-instrument applicable on adolescents? A latent trait analysis using Rasch-modelling. Pers Individ Differ. 2004;36(4):955–68.
Bröder J, Okan O, Bauer U, Bruland D, Schlupp S, Bollweg TM, et al. Health literacy in childhood and youth: a systematic review of definitions and models. BMC Public Health. 2017;17(1):1–25.
Esmaeilzadeh S, Ashrafi-Rizi H, Shahrzadi L, Mostafavi F. A survey on adolescent health information seeking behavior related to high-risk behaviors in a selected educational district in Isfahan. PLoS ONE. 2018;13(11): e0206647.
Mundfrom DJ, Shaw DG, Ke TL. Minimum Sample Size Recommendations for Conducting Factor Analyses. Int J Test. 2005;5(2):159–68.
Hair JF Jr, Black WC, Babin BJ, Anderson RE. Multivariate Data Analysis. 7th ed. Upper Saddle River: Prentice Hall; 2009.
The authors thank Professor emeritus Kjell Sverre Pettersen [principal investigator of the Norwegian HLS19 study] for contributing to this research and providing feedback on the study’s conception and design as well as his contribution during the data collection.
The overall data collection was funded by the Norwegian Directorate of Health. The publication of this study was supported by internal funding from the Inland Norway University of Applied Sciences.
Ethics approval and consent to participate
The study was conducted in accordance with the ethical principles outlined in the Declaration of Helsinki. The Data protection services at the Norwegian Centre for Research Data (NSD) were notified about the project. The study was considered outside the scope of the Norwegian Act on Medical and Health Research and thereby did not require approval from the Norwegian Regional Committees for Medical and Health Research Ethics. The NSD approved the project (project number 896850). The approval concerns the use of personal/private data (questionnaires, consent form, storage of data, etc.). Participation was voluntary, and the questionnaire was completed anonymously. As data were collected using telephone interviews, verbal informed consent was obtained from the participants. From 1st January 2022, NSD merged with two other Norwegian organizations, Uninett and The Directorate for ICT and joint services in higher education and research, to form the new Norwegian Agency for Shared Services in Education and Research (Sikt).
Consent for publication
Competing interests
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Table S1. Overall fit statistics applying unidimensional and multidimensional Rasch models of HLS19-Q47 and its short versions.
Table S2. Entries in the residual correlation matrix for the 12-item short scales.
Table S3. Fit statistics for different factor structures applying confirmatory factor analyses of the HLS19-Q47.
Table S4. Item characteristics and DIF of HLS19-Q47 applying the 12-dimensional Rasch model.
Table S5. The HLS19-YP12 instrument with response options.
Le, C., Guttersrud, Ø., Sørensen, K. et al. Developing the HLS19-YP12 for measuring health literacy in young people: a latent trait analysis using Rasch modelling and confirmatory factor analysis. BMC Health Serv Res 22, 1485 (2022). https://doi.org/10.1186/s12913-022-08831-4