Methodological review: quality of randomized controlled trials in health literacy

Background: The growing move towards patient-centred care has led to substantial research into improving the health literacy skills of patients and members of the public. Hence, there is a pressing need to assess the methodology used in contemporary randomized controlled trials (RCTs) of interventions directed at health literacy, in particular the quality (risk of bias), and the types of outcomes reported.
Methods: We conducted a systematic database search for RCTs involving interventions directed at health literacy in adults, published from 2009 to 2014. The Cochrane Risk of Bias tool was used to assess quality of RCT implementation. We also checked the sample size calculation for primary outcomes. Reported evidence of efficacy (statistical significance) was extracted for intervention outcomes in any of three domains of effect: knowledge, behaviour, health status. Demographics of intervention participants were also extracted, including socioeconomic status.
Results: We found areas of methodological strength (good randomization and allocation concealment), but areas of weakness regarding blinding of participants, people delivering the intervention and outcomes assessors. Substantial attrition (losses by monitoring time point) was seen in a third of RCTs, potentially leading to insufficient power to obtain precise estimates of intervention effect on primary outcomes. Most RCTs showed that the health literacy interventions had some beneficial effect on knowledge outcomes, but this was typically for less than 3 months after intervention end. There were far fewer reports of significant improvements in substantive patient-oriented outcomes, such as beneficial effects on behavioural change or health (clinical) status. Most RCTs featured participants from vulnerable populations.
Conclusions: Our evaluation shows that health literacy trial design, conduct and reporting could be considerably improved, particularly by reducing attrition and obtaining longer follow-up. More meaningful RCTs would also result if health literacy trials were designed with public and patient involvement to focus on clinically important patient-oriented outcomes, rather than just knowledge, behaviour or skills in isolation.
Electronic supplementary material: The online version of this article (doi:10.1186/s12913-016-1479-2) contains supplementary material, which is available to authorized users.


Background
As populations age and live longer with (often multiple) chronic conditions, and as healthcare becomes ever more complex and fragmented, there is widespread recognition that many people lack the knowledge and skills required to best manage their own health needs, particularly in response to medical advice. This set of useful capacities is often described under the umbrella term of health literacy, which may be defined as the degree to which individuals have the capacity to obtain, process, understand and use basic health information and services needed to make appropriate health decisions [1].
Inadequate health literacy (HL) is strongly associated with poor health outcomes in populations or individuals [2], and this lack of knowledge, skills and capacity has been estimated to add an extra 3-5 % to national health care budgets [3]. Disadvantaged groups are at greater risk of having relatively low health literacy, which also makes it a social justice issue [4]. Addressing health literacy is essential to maximise the potential benefits of other popular initiatives in modern medicine, such as patient-centred care [5] and shared decision-making [6]. It has even been argued that, with regard to verifying efficacy, addressing "health literacy is as important as randomization and statistical analyses in the research design of educational interventions" [7].
Hence, there is growing focus on efforts to improve health literacy in the hope that this will also result in better patient outcomes and wider societal benefits. The policy agenda of improving population health literacy has been embraced by governments [8-10] and professional associations [1,11,12]. Academic research on HL is increasingly prominent with growing numbers of published interventional studies. Systematic reviews of health literacy interventions [2,13] report overall modestly positive impacts, but there appears to be some variation in efficacy, with inconsistent long-term improvements of health literacy skills in individuals. More importantly, reviews also repeatedly comment that there is a lack of high quality evidence about which strategies are most effective [2,9,14].
Recommendations have emerged that there should be comprehensive evaluation measures with development of experimental designs that better support the research outcomes in relation to health literacy interventions [9,15]. The lack of robust evidence creates difficulties in evidence grading and policy development [9,16] because decisions on use of HL interventions should be backed up by good evidence of benefit, with low risk of bias from methodologically rigorous trials.
Consequently, there is a pressing need to assess the methodology used in contemporary HL interventions, in particular the quality (or risk of bias [17]) within randomized controlled trials (RCTs). Identification of methodological strengths and limitations would raise awareness of design limitations, and stimulate development of rigorous and perhaps innovative strategies for high quality HL RCTs. Hence, we conducted a methodological review of recent RCTs describing HL interventions in order to determine the methodological quality and types of outcomes evaluated. Many attributes of quality of trial conduct were considered (detailed below). Also, because health literacy can be a social justice issue, it may be argued that HL interventions should prioritise targeting vulnerable individuals. Yet, other research suggests that vulnerable groups are under-represented in both health-management programmes [18] and clinical research trials [19]. Therefore, we also recorded demographic characteristics of patients in each study, in order to determine the types of populations involved in HL trials.

Selection criteria
We aimed to assess contemporary (publication year 2009-2014) randomized controlled trials of health literacy interventions in the published literature. The selection criteria were:
- Participants were adults aged 18 years and above.
- The study must describe an intervention that applies health literacy concepts (compatible with the definition of health literacy as stated earlier [1]), or uses low literacy tools to improve health-related outcomes. We only included articles where health literacy was explicitly stated and was a key component in how the intervention was designed.
- The intervention target must be the same person for whom the benefits are measured.
- Outcomes must be measured in terms of improvement in any of three areas, which we broadly summarise as 'knowledge', 'behaviour' or 'patient well-being/health'. These are described in much more detail below.
We excluded studies that featured children (under 18 years old) as targets or indirect beneficiaries, or that were applied within formal educational programmes leading to qualifications. Studies that looked at associations or correlation rather than the effect of the interventions (for instance, a health promotion project that made a link between outcomes and pre-existing health literacy levels in participants) were excluded. We excluded abstracts because we felt that there would not be sufficient space for authors to report full methodological details.

Search strategy
We searched PubMed on 1 December 2014, using a validated algorithm [20] that gives the best balance between specificity and sensitivity. Those search terms were duplicated as closely as permissible to find additional articles in the following sources (see Additional file 1: Table S1 for specific search terms): Embase, Cochrane CENTRAL, CINAHL, PsycINFO and the (United States National Cancer Institute) Research-tested Intervention Programs (RTIPS). There were no restrictions for language or country.

Study screening and data extraction
Abstracts and titles were independently duplicate-screened to remove citations that failed to meet the inclusion criteria (listed above under selection criteria). The full text version of each remaining potentially relevant article that passed title and abstract screening was read by two screeners (JB, YKL) to confirm eligibility. Data were independently extracted by at least two reviewers (JB, SHW, YKL) from all remaining eligible articles, using a customised data extraction form that recorded bibliographic details, study location(s) and funder(s), patient demographics, number of patients in each trial arm and outcomes. Disagreements at all stages were resolved by discussion or by referral to a third reviewer (YKL or CS).

Assessment of trial quality
Relevant to statistical significance, we noted whether recruitment targets were calculated to ensure that statistical calculations were adequately powered for the stated "primary" outcome, and whether actual recruitment subsequently met stated targets.
We used the Cochrane Risk of Bias tool to assess quality of RCT implementation and reporting [17]. Risk of bias (RoB) for random sequence generation was assessed as low if the authors indicated that a random number generator was used. Low RoB for allocation concealment resulted if the authors explained clearly how both participants and investigators were masked to group assignment at the moment of allocation. RoB for trial performance or detection was only assessed as low if investigators were masked during intervention delivery (performance) and monitoring (detection). Low RoB for attrition was assessed if total loss was below 20 % between intervention start and last monitoring date.
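The attrition rule described above can be illustrated with a minimal sketch. The function name and the example counts are hypothetical and only the 20 % threshold comes from the text; labelling everything at or above the threshold as "high" is a simplifying assumption, not a statement of how borderline cases were judged in the review.

```python
def attrition_risk_of_bias(n_start: int, n_final: int, threshold: float = 0.20) -> str:
    """Classify attrition risk of bias from participant counts.

    Assumption: 'low' means total loss strictly below the threshold
    (20 % in the review); any other loss is labelled 'high' here.
    """
    if n_start <= 0 or not 0 <= n_final <= n_start:
        raise ValueError("need n_start > 0 and 0 <= n_final <= n_start")
    # Proportion of participants lost between intervention start
    # and the last monitoring date.
    loss = (n_start - n_final) / n_start
    return "low" if loss < threshold else "high"


# Hypothetical trials: 15 % loss and 25 % loss.
print(attrition_risk_of_bias(100, 85))
print(attrition_risk_of_bias(200, 150))
```

Applying a rule of this kind to total loss only, as here, ignores whether losses differed between trial arms, which a fuller assessment would also consider.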
In order to detect the possibility of selective reporting bias, we recorded the prespecified outcomes in the Methods section and compared them against the list of outcomes that were actually reported in the Results section (see Additional file 1: Item S2). This was intended to allow us to judge the possibility of any missing outcome data or subsequent addition of post-hoc subgroup analyses based on presence or absence of statistically significant findings. Previous comparisons of protocols or registry entries with published reports for RCTs suggested that it is not unusual for primary outcomes in final reports to vary from those that were prespecified [21].
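The comparison of prespecified against reported outcomes amounts to two set differences. The sketch below is illustrative only: the function name, the case-insensitive matching and the example outcomes are assumptions, and in practice matching outcome names between Methods and Results sections requires reviewer judgement rather than exact string comparison.

```python
def compare_outcomes(prespecified, reported):
    """Return outcomes omitted from, and added to, the Results section."""
    # Normalise labels so that trivial differences in case or spacing
    # do not count as discrepancies (an assumption of this sketch).
    planned = {o.strip().lower() for o in prespecified}
    shown = {o.strip().lower() for o in reported}
    return {
        "omitted": sorted(planned - shown),  # planned but never reported
        "added": sorted(shown - planned),    # reported but not planned
    }


# Hypothetical trial: weight change was planned but not reported,
# while a knowledge score appears only in the Results.
result = compare_outcomes(
    ["Medication adherence", "Weight change"],
    ["medication adherence", "Knowledge score"],
)
print(result)
```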

Evidence of efficacy
We categorized outcomes of the interventions into one of three categories: knowledge, behaviour or health-related (K, B or H), by drawing on relationships between knowledge and behaviour with health outcome improvements that have been identified by others [22,23]. Knowledge encompassed both patient understanding and awareness of the disease course and aims/strategies of treatment (e.g., complications of high blood pressure and drugs that can control it). Behaviour changes included all actions that may improve health (such as using smartphone reminders for timing of blood pressure medication) or improvement in skills, self-efficacy, readiness to change or intentions. Health outcomes were defined as clinical measures in patients such as reduction in distress (or better quality of life), weight loss or improved disease indicators (such as lower blood pressure or normal laboratory test results). These changes could be assessed subjectively or objectively. We were concerned to focus on outcomes that indicate independent acquisition of understanding or skills, shown through sustained behaviour or improvements in experienced health. We did not aim to analyse or categorize some outcomes such as decisions to take up cancer screening tests [24] and accuracy in following simultaneous instructor-guided instructions [25], because these involved complex, multifaceted components, subjective value judgement or manual dexterity skills in artificial test settings. For each stated RCT outcome in the KBH categories, we extracted whether or not it was significantly better (p ≤ 0.05) in the intervention arm than in the control arm.

Demographics of participants and other study aspects
We recorded participant characteristics, including socioeconomic traits: mean age, gender balance and percentage of trial participants that were ethnic minorities (within that country), low income (defined as income < US $20,000/year), or low education (less than 12 years of education, or less than a US high school diploma or equivalent (GED)). We also noted the duration of monitoring post intervention.

Results
We identified 328 potentially relevant RCT reports from the searches, and after screening, we performed data extraction on 40 included papers (references numbered 24, 25, 28-48 and 60-76). The study selection process is shown in Fig. 1.

Study characteristics
The number of RCTs published each year generally increased over time, from three papers in 2009 to four in 2010, five in 2011, seven in 2012, 13 in 2013 and eight in 2014. Typically, interventions were educational in approach; full details of the interventions, control arms, and the relationship to health literacy are listed in Additional file 1: Table S3. Twelve papers addressed aspects of managing chronic illness (most often Type 2 diabetes). Correct administration of medication and aspects of mental health problems were each the focus of seven RCTs. Nutrition choices were the focus of six articles. Other trials covered cancer screening, physical activity, patient-provider communication, sterilisation choices and preventing cardiovascular disease. Country locations were 82.5 % USA, 15 % Australia, and one study in Iran (2.5 %). Funding mostly came from national government bodies: 55 % of studies were solely government funded, 15 % mentioned funding from only charitable groups, and 12 % listed a mix of government and charity funders, while in 18 % of studies the funding was other (usually insurance companies or academic institutions) or unclear.

Demographics of participants and other study aspects
Table 1 shows demographic information for study participants, as well as duration of monitoring period and validity of health literacy instruments used (if any). Low income and ethnic minority individuals were mostly over-represented, which is probably desirable because they are especially vulnerable both to poorer health outcomes and to the likelihood that low health literacy will increase the risk of poor health outcomes [4]. This representation pattern may reflect priorities of the funding bodies (mostly government institutions and charities). The percentage of participants with low educational attainment, and the percentage with low or inadequate HL as calculated by formal instruments, is about average for the USA and Australia [26,27]. It is not surprising that participants were mostly female and mostly middle aged or older, because this group tends to be strongly represented in educational interventions [18]. Targeting people aged 50+ is desirable because health literacy tends to decline with age and people aged 65+ are a distinct at-risk group for low health literacy [16].

Methodological characteristics/Trial quality
Many (65 %) of the RCTs reported power calculations for a target sample size on their primary outcome. However, relatively few (20 % of the total 40 studies) retained enough participants until the final monitoring date so as to be adequately powered to detect a meaningful effect. Therefore, most studies were, in at least some of their results, under-powered, thus creating uncertainty and imprecision in estimates of the intervention's effect. Table 2 summarises the distribution of Risk of Bias decisions for the 40 RCTs. Most (73 %) had low bias for random sequence generation. About half (53 %) had low bias for allocation concealment. About a fifth (23 %) had low performance bias, roughly one third had low detection bias (35 %) and two thirds had low attrition bias (68 %). Unclear reporting was a common problem in these RCTs. Risk of bias was recorded as unclear for 29-41 % of studies in the first four domains but the number of participants in each arm from initiation to final monitoring point was always clearly reported.
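To illustrate how attrition erodes power, the following sketch uses the standard normal approximation for a two-arm comparison of means with equal arm sizes. It is not the calculation used by any of the reviewed trials; the standardised effect size of 0.5, the 30 % attrition figure and the function names are assumptions chosen only for illustration.

```python
from math import ceil, sqrt
from statistics import NormalDist

_Z = NormalDist()  # standard normal distribution


def n_per_arm(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Participants needed per arm (two-sided test, normal approximation)."""
    z_alpha = _Z.inv_cdf(1 - alpha / 2)
    z_beta = _Z.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)


def achieved_power(effect_size: float, n: int, alpha: float = 0.05) -> float:
    """Approximate power with n participants retained per arm."""
    z_alpha = _Z.inv_cdf(1 - alpha / 2)
    # The negligible contribution of the opposite tail is ignored.
    return _Z.cdf(effect_size * sqrt(n / 2) - z_alpha)


# Hypothetical trial planned for a standardised effect of 0.5.
target = n_per_arm(0.5)              # per-arm recruitment target
retained = int(target * 0.70)        # assume 30 % attrition in each arm
print(target, round(achieved_power(0.5, target), 2))
print(retained, round(achieved_power(0.5, retained), 2))
```

Under these assumptions a trial that meets its recruitment target but loses 30 % of participants falls clearly below the conventional 80 % power, which is the pattern described in the paragraph above.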

Selective reporting bias
Eight studies reported unexpected (secondary) outcomes, that is, outcomes not specified in the methods section. Three were on knowledge and use of the intervention materials [28-30], while significant results for knowledge, different HL levels, and use of the intervention were reported in three studies [31-33]. One report highlighted significant interactions between patient performance on hypertension knowledge and the study interviewer [34]. Another found that demographic factors such as income, HL, age and ethnicity were significant or almost significant predictors of (medication adherence) error rate [35]. Protocols were available for only six of the 40 included studies [36-41]. Of these, only three final trial reports deviated from protocol (arguably minor deviations). Duncan et al. [37] omitted a secondary outcome (weight change), Freed et al. [38] changed the primary outcome from "comprehension" to "word recognition", while Gulliver et al. [39] reported on stigma rather than "attitude" as described in the protocol.
Table 3 describes the outcomes reported by the trials; outcomes were grouped together where very similar. In Table 3, the numbers to the left indicate how many studies reported at least one intervention benefit in K, B or H areas (statistically significant at p ≤ 0.05). The largest groups of outcomes related to mental health issues (mostly depression), nutrition, diabetes self-management and medication adherence.

Evidence of efficacy
The underlying heterogeneity of the specific outcome measures and the small pool of articles addressing each outcome make formal meta-analysis inappropriate, but some observations may be made about groups of related outcomes in Table 3. Measurement of the intervention's effects on knowledge was far more frequent than measurement of behaviour change or improved health indicators. For instance, aspects of medication adherence were a common theme. All studies that aimed to improve medication knowledge reported success, but only 2 of 8 studies that attempted to improve adherence behaviour were successful, and clinically important medication errors were not avoided in another study. Similarly, diabetes knowledge or literacy improved in 4/4 studies, but HbA1c measures in blood improved in none of three studies. An exception to this trend is with regard to attitudes towards mental illness (especially depression). Depression knowledge and literacy improved in most interventions (4/5 trials), while less stigma and increased willingness to seek or offer help for people suffering depression was also often reported in the intervention group (which we deemed a behavioural change, reported in 5/6 studies). Such a readiness to change or other improvement in aspects of self-efficacy was not as common for other health topics. Table 4 summarises how many distinct outcomes each RCT reported (within the KBH categories) that were statistically significant (better) for intervention over control … 40) had no health outcome differences between the intervention and control arm. Some studies reported up to 3 benefits in knowledge [25,39,42-44] or behaviour [45-47] outcomes, however. There were 31 trials that specified at least one primary outcome, of which statistically significant effects of the intervention were reported in 17.
Most RCTs (29/40 = 72.5 %) reported improvements in at least one of the KBH areas. Conversely, 27.5 % (11/40) of studies recorded no significant between-group differences for any of the KBH outcomes at the final monitoring point.

Follow-up analyses
Sixteen studies (40 %) had no follow-up analysis (monitoring of impacts after day of intervention end). Nine studies (22.5 %) had follow-up that was 1-4 weeks after the intervention finished, and nine further trials (22.5 %) followed up between 8 weeks and 6 months post-intervention. Six trials (15 %) had follow-up > 6 months (up to 12 months). With regard to trial quality, it is noticeable that the six trials that monitored for > 6 months all reported at least one statistically significant outcome in the KBH areas. Length of follow-up period otherwise appeared to have no association with KBH results. Some RCTs reported transient benefits soon after the intervention, but between-group differences were not found at the final monitoring date [29,31,33,45,48].

Table 3 (continued)
#SS | Outcome [reference]
 | Recall of healthy lifestyle advice [65]
0 | Attempts to comply with multifactor health lifestyle advice [65]
0 | Folate B12 and homocysteine concentration in blood [48]
0 | Reduced smoking [28]
0 | Reduced alcohol consumption [32]
0 | Appointment keeping [63]
1 | Home safety actions [46]
#SS = Number of studies that reported at least one intervention benefit (statistically significant at p ≤ 0.05) for stated outcome

Discussion
We found areas of strength (good randomization and allocation concealment), but areas of weakness regarding blinding of participants, people delivering the intervention and outcomes assessors. Substantial attrition (losses by monitoring time point) was seen in a third of RCTs. This creates difficulty in interpreting findings of lack of benefit, because they may be due to inadequate power rather than genuine absence of efficacy. These important limitations undermine the validity of recent RCTs in health literacy. Blinding to prevent performance and detection bias in educational or behaviour-related RCTs is difficult but not impossible. Attrition bias is much harder for trialists to control. Most of our reviewed RCTs had recruitment targets guided by formal power calculations, but most of these studies also failed to retain as many participants (to the final monitoring time point) as their power calculations required. It was not clear from these RCTs if there were commonly avoidable reasons for attrition. As a short term solution, it may be best that recruitment targets are raised for health literacy RCTs in order to ensure adequately powered results. In general, more research is needed about how to ensure high recruitment to RCTs [49].
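Raising a recruitment target to allow for expected losses is a simple calculation: the number required by the power calculation is divided by the proportion of participants expected to remain at the final monitoring point. The sketch below is a hedged illustration; the function name and the example figures are hypothetical and do not come from any reviewed trial.

```python
from math import ceil


def inflated_target(n_required: int, expected_attrition: float) -> int:
    """Recruitment target that still leaves n_required after attrition.

    expected_attrition is the anticipated proportion lost by the final
    monitoring point (for example 0.20 for 20 %).
    """
    if not 0 <= expected_attrition < 1:
        raise ValueError("expected_attrition must be in [0, 1)")
    # Round up, because recruiting a fraction of a participant is impossible.
    return ceil(n_required / (1 - expected_attrition))


# Hypothetical example: 126 participants needed for analysis,
# 20 % expected to be lost before the final monitoring point.
print(inflated_target(126, 0.20))
```

The adjustment protects power only if the attrition estimate is realistic, so it complements rather than replaces efforts to improve retention.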
Evidence of reporting bias was not generally found, but this does not rule out post-hoc addition or omission of intended study outcomes, because protocols were available for only six of the 40 included studies. Also, some studies only achieved statistical significance for outcomes by looking for improvements within subgroups, particularly individuals with the lowest levels of initial health literacy [28,29,43]. Subgroup analysis is helpful scientifically and justified given the potentially diverse educational needs of different population groups, but such findings need to be cautiously interpreted. Of the 31 studies that specified primary outcomes, 14 (45 %) found no intervention benefits in the primary outcome(s) at their final monitoring time point.
We are concerned by the predominance of short-term, knowledge-based outcomes in health literacy RCTs. Only 15 % of our included RCTs followed up beyond 6 months. The short time scale of many RCTs does not inform which strategies are most sustainable, and therefore may be most cost-effective and ultimately lead to maximum patient benefits. Hence it is perhaps not surprising that our included health literacy RCTs had modest success at changing clinical outcomes (that is, tangible improvement in actual patient health). Ioannidis [50] concluded that too many RCTs are problematic in execution, stating that many "…simply represent wasted effort because the questions they ask and the comparisons and outcomes they choose to study are clinically irrelevant. Looking at the many thousands of clinical trials launched annually, this irrelevance may be actually the biggest source of waste in randomized controlled trials…" Ioannidis stated that it was regrettable that few trials in the published literature are guided from inception by patient-centred outcomes [50], even though it has been accepted for some time that an evidence-based approach to patient care should be informed by "what is meaningful and valuable to the individual patient" [51]. Instead, research tends to focus on knowledge or skills tests preferred by academic and clinical researchers, ignoring the social, cultural and economic contexts in which patients live. Health literacy trials frequently focus on the individual as problematic rather than the larger context and the demands placed on patients by complex modern healthcare systems [52]. Detailed, well thought-out patient participatory process evaluations within trials are rare, which means that patient experience and understanding of important factors such as patient-clinician relationships are missed [53].
Health promotion efforts, including health literacy RCTs, are prone to an underlying bias that if people are told and trained what to do, they will both do it and become healthier, when in reality the relationship between knowledge, behaviour and health outcome is complex and highly personal [54,55]. A more sustainable strategy for effective promotion of healthy behaviour (including design of RCTs intended to address health literacy deficits) may be to involve patients in intervention design and implementation, as a form of patient-centred care [51] that considers many contextual aspects of barriers to knowledge, healthier behaviour choices and skills acquisition. Otherwise, it seems likely that without the engagement of the patient and their family, especially when increasing multimorbidity is involved, such trials will fail to reflect the reality of managing health for the individual, and weak or no positive outcomes beyond the duration of the trial are likely to follow.

Limitations of our review
We included relatively recent trials (published 2009-2014) to provide a snapshot of what is actually happening rather than create a historical perspective. Nevertheless, poor quality reporting meant that in many instances, we had to record risk of bias (RoB) domains as unclear. Such research is obviously still developing and it may be that the quality of HL research has not yet advanced to the quality standard that Cochrane RoB tools demand, or perhaps that Cochrane RoB tools are not fully suited to these types of complex or educational interventions.
We report only on published literature; there may well be unpublished trials that are of different quality. It is widely recognised that trials without significant results are less likely to be published [50], and that over-representation of successful studies undermines the reliability of scientific conclusions [56]. Almost all of the literature we found was from English-speaking countries, particularly from the USA, and to some extent the format, priorities and successes of these interventions must be biased by cultural influences.

Conclusions
The implications of our review are that while trials in HL are of growing interest, they may pose difficult methodological challenges because of the nature of the topic and its interventions. Our methodological evaluation shows that health literacy trial design, conduct and reporting could be considerably improved. To support such development, trialists can refer to many existing guidelines on good methodological practice for implementing and reporting RCTs, e.g. the CONSORT 2010 statement [57] or the COMET initiative [58].
Assessing quality of evidence (as we have done) is an essential pre-requisite before selecting and implementing interventions for patient benefit [59]. We also recommend that health literacy trials should be informed by inclusion of patient-centred health outcomes at their inception, design, delivery and evaluation stages. Without this, any findings have the potential to be meaningless.

Additional file
Additional file 1: Table S1. Search phrases and results. Item S2: Data extraction form.