Scoping review and bibliometric analysis of Big Data applications for Medication adherence: an explorative methodological study to enhance consistency in literature

Background: Medication adherence has been studied in different settings, with different approaches and different methodologies. Nevertheless, our knowledge and efficacy are quite limited when it comes to measuring and evaluating all the variables and components that affect the management of medication adherence regimes as a complex phenomenon. This study aims to map the state of the art of medication adherence measurement and assessment methods applied in chronic conditions. Specifically, we are interested in what methods and assessment procedures are currently used to tackle medication adherence. We explore whether Big Data techniques are adopted to improve decision-making procedures regarding patients’ adherence, and the possible role of digital technologies in supporting interventions for improving patient adherence and avoiding waste or harm.
Methods: A scoping literature review and bibliometric analysis were used. Arksey and O’Malley’s framework was adopted to scope the review process, and a bibliometric analysis was applied to observe the evolution of the scientific literature and identify specific characteristics of the related knowledge domain.
Results: A total of 533 articles were retrieved from the Scopus academic database and selected for the bibliometric analysis. Sixty-one studies were identified and included in the final analysis. The Morisky medication adherence scale (36%) was the most frequently adopted baseline measurement tool, and cardiovascular disease/hypertension was the most investigated illness (38%). Heterogeneous findings emerged from the types of study design and the statistical methodologies used to assess and compare the results.
Conclusions: Our findings reveal a lack of Big Data applications currently deployed to address or measure medication adherence in chronic conditions. Our study proposes a general framework to select the methods, measurements and the corpus of variables in which the treatment regime can be analyzed.


Background
Medication non-adherence (MnA) in chronic diseases is one of the most complex issues in Public Health. As highlighted by the World Health Organization (WHO) [1] in 2003, MnA and in general a low degree of adherence to treatment regimes as prescribed lead to poor health outcomes [2,3] and overall increasing health-care costs [4,5].
Although MnA is one of the most significant public health issues [6], there are still inconsistent healthcare outcomes and methodological measurements [7], and there is no widely-accepted agreement on its definition [8,9].
A recent proposal [10] defined medication adherence as the process by which patients take their medication as prescribed, and identified three phases, initiation, implementation and discontinuation, as consistent, measurable and quantifiable steps for analyzing adherence-related management [11].
In our definition of MnA we also include persistence, i.e. the length of time between initiation and the last dose prescribed [10].
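As a minimal sketch of the persistence definition above, the following Python snippet computes the number of days between initiation and the last dose. The function name and the dates are hypothetical, and treating the interval as a simple date difference (end day not counted) is an assumption that the definition itself does not settle.

```python
from datetime import date


def persistence_days(initiation: date, last_dose: date) -> int:
    """Days elapsed between treatment initiation and the last dose.

    Assumption: a plain date difference, so the end day is not counted.
    """
    if last_dose < initiation:
        raise ValueError("last dose cannot precede initiation")
    return (last_dose - initiation).days


# Hypothetical patient: initiation on 1 January 2018, last dose on 31 March 2018.
print(persistence_days(date(2018, 1, 1), date(2018, 3, 31)))
```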
Finding a gold standard to measure the rate of nonadherence, or simply a common approach to address the problem, is difficult [12]. MnA is a complex issue because diseases differ widely and patients may respond differently to their drugs and to the interactions between them. The issue is further compounded by differing beliefs [13], social support and socio-economic status. Moreover, all these variables interact at different levels [14] to determine the magnitude of medication adherence [15].
Different approaches, measurement types and methods have thus been used to explore MnA [16], though this remains an unclear and deeply fragmented field of study [17].
Progress in medical technology has created new solutions to tackle complex issues, as well as new tools and policy strategies to deliver better healthcare services [18] and to reduce the overall costs for the National Health Service (NHS), or make them more efficient [19]. The proliferation of studies and applications on health information technologies has enormously boosted the possibility to create, understand and obtain valuable information from different sources. Data are generated from medical devices, hospital databases, laboratories and the like, but also from insurance claims, smartphones, social media, and even more from real-time geolocated sources such as wireless technologies, wearables, and GPS devices [20].
This has made the concept of Big Data one of the most promising fields of application in the healthcare domain [21]. Big Data analytics [22] refers to a new generation of technologies and architectures designed to extract value from a very large volume (i.e. petabytes) of a wide variety of data, by enabling high-velocity streaming data (3 V's) across a wide range of sources. These three characteristics are the common elements that currently enable data-intensive technology analysis.
The application of such promising technologies has led to the development of new and effective applications in the healthcare sector [23], although most of the research tends to focus on technical issues [24].
Despite the technologies already in use, little has been done to develop methods, techniques or strategies to apply data-intensive analysis to investigate medication adherence issues, particularly in terms of providing long-term strategies for medication adherence. In addition, all the determinants (factors) that impact medication adherence need to be considered, especially when these factors, and all the related variables, are modifiable risk factors and sources of information. Once these factors and variables have been recognized, they can then be analyzed to determine the magnitude of the MnA, or at least to define a specific strategy to tackle it. Although strategies that define and explore types of medication adherence, measurements and barriers are quite common, what is extremely important is the ability to engage and maintain a patient's persistence for the entire period of the treatment. The ability to improve patients' self-management capabilities and to encourage any lifestyle changes (behaviors) entails a deep understanding of the determinants (factors) that drive patients' adherence. However, there are inconsistent findings on the role played by medication adherence factors; the dynamics between the factors, methods and types of measurement have been poorly investigated, and some key components have been neglected.
This paper presents the scoping review results of the methods and measurements adopted for medication adherence management in chronic disease. We highlight to what extent Big Data techniques have been deployed to improve or detect evidence-based results for medical decision-making.
In doing so, we used the taxonomy of Big Data techniques in healthcare developed by Alonso et al. [25], to compare whether Big Data techniques are deployed in medication adherence measurements for chronic diseases.

Method
We followed the PRISMA Extension for Scoping Reviews (PRISMA-ScR) [26]. We used a two-step process to cover all the aspects of our research objectives (see Fig. 1).
First step: Perform a Scoping Review adopting the Arksey and O'Malley framework [27].
Second step: A bibliometric analysis from the retrieved scoping review results, to analyze the knowledge domains, and possible future research trends.
A scoping review is a relatively new research methodology, particularly effective for summarizing and covering broad research topics, comprising a high number of previous studies, publications, methods, theories, or evidence [28].
Most importantly, a scoping review can pinpoint research gaps without losing research robustness and rigorous quality assessments [29]. Arksey and O'Malley [28] outlined the five fundamental steps for a scoping review: (1) identify the research question and operationalize the definitions, (2) identify relevant studies through electronic databases and reference lists, (3) establish inclusion-exclusion criteria for the selection of studies, (4) chart the data through a narrative review, and (5) analyze, summarize, and report the results.
To give a more comprehensive and complete overview of the extent and complexity of the topic of medication adherence, we decided to add, as a second step in our research study, a bibliometric mapping analysis [30,31] using two software programs: VOSviewer [32] and the bibliometrix package [33] in R statistical software. We used a bibliometric analysis to measure and visualize the influence of the scoping review results in the scientific community, and VOSviewer to explore keyword trends and the related concepts, as well as collaboration network maps.
Step 1: scoping review
The aims of this scoping review are to: (i) draw up a framework of the types of measurements, study design and methods that were used for medication adherence; (ii) determine whether, and if so which, Big Data techniques lend themselves to improving the decision-making procedures regarding patients' adherence; (iii) explore the research opportunities and strategies aimed at tackling medication adherence using a data-driven perspective.

Stage 1.1) development of the research questions
Rq1: What kinds of methods, measurements and approaches are applied to assess medication adherence in chronic diseases?
Rq2: Considering the Big Data techniques used in healthcare [25], are there any data mining methods or techniques used in medication adherence measurement?
Rq3: Are any new data-driven technologies or methods applied in relation to medication adherence?

Stage 1.2) framework stage: identification of electronic databases and relevant studies
This study uses articles retrieved from the Scopus electronic database [34], which offers a wider range of journals compared with PubMed and Web of Science [35], and its citation analysis is faster and includes more information [36]. This citation property is a crucial component for our research investigation that influenced many of our decisions regarding data sources.
At this stage, we defined the eligibility criteria (inclusion and exclusion), the search strategy and the keywords used to retrieve the articles. See Table 1. All the retrieved articles were exported into Mendeley, the software used to organize, select and check the articles.

Stage 1.3) screening and selection of publications
We iteratively developed an extensive list of primary and secondary key terms, connected in a Boolean logic and filtering method in order to cover as many research articles as possible linked to the scope of the study. The primary search terms focused on the most common terms in the literature on medication and drug regimes, reflecting the core concept of medication adherence (adherence, compliance and concordance). The secondary type of key terms included a broader set of keywords related to factors, variables, datasets and methodologies applied to obtain specific results on those elements in the literature. A final set of keywords was related to the chronic conditions. The filtering methods included the date range (within the last 5 years), and articles written only in English. See Table 2 for details.
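The way keyword groups combine under Boolean logic can be sketched as follows. This is a minimal illustration that builds a generic Boolean string: the primary terms are those named above, while the secondary and chronic-condition terms are hypothetical placeholders rather than the actual keywords of Table 2, and database-specific field codes and the date and language filters are left out.

```python
def or_group(terms):
    """Join synonyms with OR, quoting multi-word phrases."""
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"


def build_query(*groups):
    """AND the OR-groups together so a record must match every concept."""
    return " AND ".join(or_group(g) for g in groups)


# Primary terms are taken from the text above.
primary = ["adherence", "compliance", "concordance"]
# Hypothetical placeholders, not the actual keywords of Table 2.
secondary = ["factor", "variable", "dataset", "method"]
conditions = ["chronic disease", "chronic condition"]

query = build_query(primary, secondary, conditions)
print(query)
```

Synonyms within a concept widen the search (OR), while requiring one match per concept narrows it (AND), which is what keeps a broad keyword list tied to the scope of the study.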

Results
The initial search of the Scopus database yielded 533 articles. After an initial screening of the titles and abstracts, 285 articles were excluded either because they did not comply with the inclusion criteria and/or because they did not fit in with our study goals. A full-text screening was then conducted on 248 articles, and 187 studies were excluded based on the inclusion/exclusion criteria.
A total number of 61 articles met all the criteria identified. Using Mendeley, two reviewers first screened the titles and abstracts for eligibility before reading the full texts. In the second part, the reviewers thoroughly examined the full text of all the potentially eligible articles to confirm whether or not they should be included. Disagreement was addressed by consensus after discussion, and a third reviewer was consulted if no consensus was reached. We used the PRISMA flow diagram for the selection flow [38], see Fig. 2.
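The counts reported for the selection flow can be checked with a few lines of arithmetic. The sketch below uses the figures given above; the function and key names are illustrative only.

```python
def prisma_flow(retrieved, excluded_title_abstract, excluded_full_text):
    """Derive the number of records remaining after each screening stage."""
    full_text_screened = retrieved - excluded_title_abstract
    included = full_text_screened - excluded_full_text
    if included < 0:
        raise ValueError("more records excluded than were available")
    return {"full_text_screened": full_text_screened, "included": included}


# Figures reported in the text: 533 retrieved, 285 and 187 excluded.
flow = prisma_flow(533, 285, 187)
print(flow)  # {'full_text_screened': 248, 'included': 61}
```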

Stage 1.4) data charting
In this section and in the following one, according to Arksey and O'Malley's scoping review framework, we present a synthesis and the composition of the 61 articles selected. Although there was a degree of heterogeneity in the literature retrieved, most of the characteristics selected in the inclusion and exclusion criteria enable the results to be categorized in a detailed and homogenous manner. Table 3 categorizes the articles based on authors and title, year of publication, country (selected by the first author's location), main objective of the study, key findings identified, methods, study duration, and population considered.

Stage 1.5) analyzing data, summarizing and reporting the results
Table 3 reports the key characteristics of the selected articles including important aspects of the study design, and the types of measurements and methods applied and carried out in the articles. This provides a picture of the methods and approaches applied in published papers in order to understand and measure medication adherence in real settings. Table 4 reports the study design applied, the types of methods used to analyze the variables, the types of measurements used to assess medication adherence, and the list of the major chronic diseases in which they are deployed.
As reported by Grimes and Schulz [39], a description of clinical study designs can be used to categorize the types of evidence produced. Randomized controlled trials (RCTs) are the most appropriate in determining the causal effects and in reducing the likelihood of potential confounding results.
However, RCTs often lack the breadth and depth of insight commensurate with the complexity of the diseases or with the degree of personalization of treatment needed.
Big Data analytics can fill these knowledge gaps between controlled clinical trials results and clinical practice needs, by collecting data from different sources, and by adopting machine learning techniques able to augment data monitoring and real-world data collection. Big data can provide new insights into disease patterns and help to improve the safety and effectiveness of RCT design [41].
Tapping into these rich resources of real-world data issued through daily clinical practice or collected on a regular basis by hospitals, public bodies or through smart devices, mobile applications, should boost both the output and relevance of controlled clinical research results [42]. However, there are barriers due to regulatory, ethical, and data aspects, as well as the costs of setting up the routine data collection infrastructure [43]. For instance, data management and data linkage might be quite complex to organize and maintain, requiring significant planning and software development. Data accuracy and noisy data are the most challenging tasks to deal with.
Regulatory and ethical aspects pose a major obstacle to the safe exchange and sharing of health records if not well prepared and organized. There are no definitive practical solutions to preserve privacy and to meet the current demand for intensive data-driven solutions. Data discrimination and data breaches are the key risks to avoid when developing a valuable strategy for Big Data analytics implementation in healthcare programs and services [44].
On the other hand, observational methods such as the commonly used cross-sectional studies [45] require less organizational effort than experimental approaches like RCTs. They provide information on the presence or absence of the exposure in a specific period and act as a snapshot in time of the prevalence of an illness in the population under investigation.
It is not the goal of this study to present a list of the best solutions to assess medication adherence in terms of quality or robustness of methods. However, taking into consideration the types of research design that emerged in our scoping review, there is an interesting balance in the study design types between experimental and observational studies. This is crucial in understanding the adherence phenomenon, because even though RCTs are the gold standard for causal inference, medication adherence is still a patient's subjective choice and cannot be randomly assigned [46]. Observational studies are therefore also key to exploring such a complex phenomenon.
Looking at the types of models used to analyze and evaluate the effect of the variables on medication adherence, regression clearly dominates, in the form of both linear regression (26%) and multivariate logistic regression (26%). Choosing the most appropriate model to analyse medication adherence is a critical decision, full of uncertainty and with little consensus regarding a standard method to operationalize this measure [47].
Although our aim is not to judge the quality or the appropriateness of the models used to analyse medication adherence, we found that the most used model was the regression model, whose main purpose is to assess whether or not the independent variable influences the dependent variable [48].
The regression model investigates the relationships between variables as well as the explanatory mechanism underlying the phenomenon under investigation. However, without a strong theory (or model) in which the relationship between variables and determinants is defined, no meaningful decision or result can be made regarding the analyses carried out [49]. Most of the methods highlighted by the results are guided by regression assumptions rather than by a data-driven approach.
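To make concrete what a regression model estimates, the following sketch fits an ordinary least-squares line with a single predictor. It is an illustration only: the variable names and data are hypothetical, the reviewed studies used their own models and covariates, and a fitted slope describes an association under the model's assumptions rather than a causal effect.

```python
def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variance, so the slope is undefined")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data: number of daily medications vs. a self-reported adherence score.
n_medications = [1, 2, 3, 4, 5]
adherence_score = [8.0, 7.5, 7.0, 6.5, 6.0]

intercept, slope = fit_simple_ols(n_medications, adherence_score)
print(f"intercept={intercept:.2f}, slope={slope:.2f}")
```

In this toy data each additional medication is associated with a 0.5-point lower score; whether such a relationship is meaningful depends on the theory that motivates the choice of variables, which is the point made above.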
Despite the complexity of medication adherence phenomena, we found a lack of studies considering a multivariate approach and time-dependent analyses. For example, latent-group or latent-trajectory analyses and similar methods, sometimes coupled with other methods, seem particularly attractive for studying medication adherence. However, our review found none of these methods in the studies selected.
The capacity of IT analysis in terms of both data storage and processing is immense; however, a more effective approach for analyzing data related to medication adherence is needed in order to improve our understanding of indicators or proxies. In particular, techniques and methods to profile adherence behavior over time and among population groups (identified, for instance, by clinical characteristics, socio-demographic data, therapy characteristics) are needed in order to identify potential population risks and behaviors and to establish appropriate methods to assess medication adherence.
Step 2: the bibliometric analysis
Using R with the bibliometrix package [33] and VOSviewer [32], we analyzed the 61 articles retrieved from Scopus. Data were analyzed in terms of document statistics, collaboration index, journal impact, country productivity, document citation analysis, and keywords. The main goal was to cover the relationships, connections and clusters of scientific production in medication adherence and the use of Big Data. This kind of analysis can map and identify the hidden connections in a vast bibliography [50] and, most importantly, helps to summarize the fragmented research topic of medication adherence.
Below is a series of tables that summarize the descriptive analysis of these data.
As reported in Table 5, there were a total of 360 authors, with 5.9 authors per document. Over a five-year period, a collaboration index [51] of 5.9 represents a significant level of collaboration on the topic of medication adherence. Table 6 summarizes the main impact measures, namely the h-index score [52], the journal impact factor [53], and the total citations, for the top ten journals and authors. The results presented in this section refer to 2014 to 2018.
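Two of the descriptive indicators mentioned above are simple to compute by hand. The sketch below derives authors per document from the reported totals (360 authors, 61 documents) and implements the standard h-index definition; the citation counts passed to `h_index` are hypothetical and do not come from Table 6.

```python
def authors_per_document(total_authors, total_documents):
    """Average number of authors per document."""
    return total_authors / total_documents


def h_index(citations):
    """Largest h such that h items have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


# Totals reported in the text.
print(round(authors_per_document(360, 61), 1))  # 5.9

# Hypothetical citation counts for one journal's papers.
print(h_index([42, 10, 6, 3, 1]))  # 3
```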
Almost all of the journals publishing medication adherence-related papers fall into the specific areas of healthcare. No links are evident with the other main research areas covered by journals (e.g., social science, information technology, management, and economic research).

Mapping visualization
The 61 articles were visualized using VOSviewer to identify the most important and interesting research areas, aimed at automatically identifying the characteristics and dimensions of the country collaboration map, the document co-citation network, and the keyword trend analysis.
Using clustering techniques [54,55] the interactions between the selected items can be explored along with how they have shaped the literature, in order to map the scientific knowledge domain and reveal new emerging concepts.
A total of 24 countries with more than two publications were identified in the 61 articles. The issue of medication adherence is attracting attention globally due to its negative effects on health outcomes, and also due to its negative impact on NHS performance in terms of costs. Clearly, there are differences in socio-economic factors, healthcare systems and specific geographical areas which influence the overall effects of the treatment regime implementation. Assumptions should not be made in terms of quality or by comparing different NHS performances in too much depth. Nevertheless, variables and factors connected to the socio-economic and the healthcare architecture certainly impact greatly on the level of medication adherence. Having a public or a private NHS system, or drug insurance or easy access to treatment, makes an important difference in terms of medication adherence rate [56]. For example, one study [58] investigated the association between plan-level measures of health outcomes and medication adherence to assess the viability of adherence as a measure of plan performance, finding that plan-level averages of medication adherence were associated with lower rates of disease-related complications. Another example is Reethi N. Iyengar et al. (2015), who investigated how the dispensing channel impacts on adherence to medications using pharmacy claim data from a large national pharmacy Medicare Part D insurance plan. An enormous database, different sources of information and the related variables and factors specifically associated with the country's NHS architecture play an important role in the overall dynamics of medication adherence.
Table 7 shows the top 5 documents in terms of citations and link strength in the 61 papers between 2014 and 2018.
Bibliographic coupling deploys a similarity measure, using citation analysis to establish a similarity relationship between documents. As Fig. 4 shows, eight clusters were identified, with the most cited article being Rajpura J. et al. [59] (42 citations and total link strength, TLS = 7.00). The co-occurrence analysis was adopted to investigate the popular areas and directions of research and was key to monitoring developments in scientific areas and other disciplines. The keywords (used more than 6 times) in the titles and abstracts of the 61 publications were analyzed via VOSviewer to investigate how concepts and topics have evolved. Figure 5 reports the 71 keywords identified, grouped into approximately three clusters: Perceptions (yellow), Interventions (green), and Preferences and needs (red) (Fig. 5(a)). In the Perceptions cluster, the frequently used keywords were distress, life, health-related quality of life (HRQoL), and religious coping. For Interventions, the keywords were non-adherence, risk, medication possession ratio (MPR), and proportion of days covered (PDC). In Preferences and needs, which was the largest cluster, keywords were belief, health literacy, habit strength, illness perception, and relationship.
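The two counting ideas behind these maps can be sketched in a few lines. The snippet below is a simplified illustration with raw counts on a hypothetical toy corpus: coupling strength is taken as the number of references two documents share, and keyword co-occurrence as the number of documents in which two keywords appear together. It does not reproduce VOSviewer's own normalization, clustering or layout, and the `min_occurrences` threshold is an illustrative parameter.

```python
from collections import Counter
from itertools import combinations


def coupling_strength(refs_a, refs_b):
    """Bibliographic coupling strength: number of references two documents share."""
    return len(set(refs_a) & set(refs_b))


def keyword_cooccurrence(docs_keywords, min_occurrences=1):
    """Count how often each pair of keywords appears in the same document.

    Only keywords found in at least `min_occurrences` documents are kept.
    """
    doc_sets = [set(kws) for kws in docs_keywords]
    occurrences = Counter(kw for kws in doc_sets for kw in kws)
    kept = {kw for kw, n in occurrences.items() if n >= min_occurrences}
    pairs = Counter()
    for kws in doc_sets:
        for a, b in combinations(sorted(kws & kept), 2):
            pairs[(a, b)] += 1
    return pairs


# Hypothetical toy corpus: one keyword list per document.
docs = [
    ["belief", "health literacy", "habit strength"],
    ["belief", "habit strength"],
    ["belief", "risk"],
]
pairs = keyword_cooccurrence(docs, min_occurrences=2)
print(dict(pairs))

# Hypothetical reference lists of two documents.
print(coupling_strength(["r1", "r2", "r3"], ["r2", "r3", "r4"]))
```

With the threshold set to 2, only "belief" and "habit strength" survive the filter, so the single remaining pair is counted in the two documents where both appear.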
Co-occurrence clustering of keywords was also analyzed by color view map, based on the mean time at which they appeared (from 2014 to 2018) in all included publications. Blue indicates that the keywords appeared earlier, and yellow later. As reported in Fig. 5(b), some keywords such as adherence (or non-adherence), medicine, and habit strength underline how the use of these keywords has incorporated the shifting of concepts and definitions. For instance, adherence is sometimes used to define the drug regime therapy rather than using compliance or persistence. Two keywords, habit strength and belief, underline how important it is to take into consideration the personality and behaviours of the patients in properly managing their drug regime. The size of keywords is proportional to the occurrence rate, and in our case beliefs, relationship and effect were the highest, underlining the importance of a patient's personality in handling medication adherence.
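One plausible reading of the color scale described above is the average publication year of the documents in which each keyword occurs. The sketch below computes that quantity on hypothetical records; the interpretation of "mean time" as average publication year is an assumption, and the records are invented for illustration.

```python
from collections import defaultdict


def mean_year_per_keyword(records):
    """Average publication year of the documents containing each keyword.

    `records` is an iterable of (publication_year, keywords) pairs.
    """
    years = defaultdict(list)
    for year, keywords in records:
        for kw in set(keywords):  # count each keyword once per document
            years[kw].append(year)
    return {kw: sum(ys) / len(ys) for kw, ys in years.items()}


# Hypothetical records within the 2014-2018 window.
records = [
    (2014, ["compliance", "adherence"]),
    (2016, ["adherence", "habit strength"]),
    (2018, ["habit strength", "belief"]),
]
avg_year = mean_year_per_keyword(records)
for kw in sorted(avg_year, key=avg_year.get):
    print(kw, avg_year[kw])
```

Keywords with a lower average year would sit at the blue end of the scale and those with a higher average year at the yellow end.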

Limitations
By design, the scoping review framework does not address specific research questions in a narrow area. In fact, scoping research is meant to address broader exploratory research questions aimed at mapping key concepts, types of evidence, and gaps in the related research domain, selecting and synthesizing existing knowledge, rather than systematically searching and assessing as would take place in a systematic review. By using just one database (Scopus) as a source of literature, part of the existing knowledge was not included.
Although a rigorous and detailed selection process was adopted, the amount and heterogeneity of literature on the medical treatment regime (e.g. medication adherence, compliance, life-style recommendations) is vast, and identifying common selection criteria is complex. This means that some subjective judgments on the inclusion-exclusion criteria selection were made, thereby obtaining a more comprehensive and homogenous process.
However, bibliometric analysis can contribute to better understanding and investigating a bottleneck in the literature research methodology, which is the difficulty of summarizing in a clear and accessible way the vast amount of literature with common key elements (e.g., subject category, topics, methodology applied) [60].

Discussion
The results of this scoping review underline how heterogeneous and complex the issue of medication adherence is. A common and robust strategy to tackle this challenge entails devising a more evidence-based and shared approach to improve measurement consistency and appropriate cut-off points to facilitate comparisons among studies [61]. Our review highlighted the lack of a common definition of medication adherence, and the lack of a standard method to measure it. This lack of standardized methods and guidelines impedes the sharing of good practices and thus the ability to improve the quality of healthcare services, deliver cost-effective care, and deal with professional pharmacy services without losing the typical socio-economic organizational characteristics of healthcare entities [62].
Big Data enable organizations to analyse massive data sets from a wide range of sources to support evidence-based decision making, through predictive models and statistical algorithms powered by high-performance systems [63]. Such analyses would enable healthcare organizations to turn information into knowledge by using a combination of existing and new approaches powered by the huge amounts of data generated [64].
However, we found that no specific technology (Big Data) or data-driven solutions are currently in place that offer sufficient accuracy or methodological strength to assess medication adherence.
One of the objectives of this study was to depart from the traditional way of analyzing the literature. We used network analysis techniques to visualize and detect trends and patterns produced in a robust and replicable way. The year-by-year evolution of intellectual and scientific knowledge [65] cannot easily be identified, especially when the number of sources is very high and the concepts so fragmented. We also believe that it is extremely useful to investigate and measure different indicators, such as cluster analyses, keyword trends and other bibliometric measures in order to gather information from unexplored metrics that can offer useful insights for the research topic under investigation.
Our results highlighted that no Big Data (or data mining) methods are currently deployed in medication adherence for chronic conditions, despite their acknowledged benefits in the healthcare sector. Our bibliometric analysis, and in particular the keyword analysis (Fig. 5), underlined the importance of patients' preferences, beliefs and habits. These important subjective patient aspects were also deeply investigated by Rajpura J. (2014) [59] and Shallcross AJ. (2015) [66], two of the most-cited documents retrieved. This underlines how the influence of illness perception, beliefs and psychosocial factors associated with medication adherence is a major area in which powerful Big Data tools can offer further insight.
The literature on medication adherence is widespread and vast, comprising an interdisciplinary approach, and characterized by different research designs and methods for knowledge production. However, it is not easy to obtain a clear overall path of the trends and theoretical approaches in which Big Data analytics can produce rapid and worthwhile results. This is probably because the multifactorial and multivariable aspects that define, make up and influence medication adherence phenomena are still unknown. We believe that developing specific Big Data applications around the patient's beliefs/preferences would provide valuable insights, new solutions and better clinical feedback.

Conclusions
We have provided a literature scoping review on the methods, measurements and research design factors affecting medication adherence in chronic disease, exploiting Big Data analysis to improve the clinical decision-making process. We then used the studies selected to develop a literature-driven analysis using a bibliometric methodology to map and identify various future research directions and trends that could provide valuable insights.
Our results show that no methods are currently being implemented to approach medication adherence with Big Data analysis. Embracing a more persuasive policy plan and standardized taxonomy to tackle adherence is needed to make progress in this field, which remains at the forefront of the public health burden.
In addition, the standardized adoption of data knowledge of patients' beliefs and preferences is needed in order to involve and engage patients in long-term treatment and to understand how their personality impacts on how long they adhere to medical treatment.
Despite the study's limitations, to the best of our knowledge our scoping review and bibliometric analysis is the first study to combine these two types of methodologies. It thus provides a) a comprehensive understanding of the hotspots and research fronts of medication adherence measurements and methodology assessments; b) a taxonomy of study design, types of measurements, types of methods and variables adopted in the literature retrieved. These could be exploited as a starting point for more precise and tailored evidence-based assessment strategies regarding chronic diseases for medication adherence, which could lead to a more robust application of Big Data analytics.