Evaluation of bottom-up interventions targeting community-dwelling frail older people in Belgium: methodological challenges and lessons for future comparative effectiveness studies

Background: Optimizing the organization of care for community-dwelling frail older people is an important issue in many Western countries. In Belgium, a series of complex, innovative, bottom-up interventions was recently designed and implemented to help frail older people live at home longer. As the effectiveness of these interventions may vary between different population groups according to their long-term care needs, they must be evaluated by comparison with a control group that has similar needs.
Methods: The goal was to identify target groups for these interventions and to establish control groups with similar needs and to explore, per group, the extent to which the utilization of long-term care is matched to needs. We merged two databases: a clinical prospective database and the routine administrative database for healthcare reimbursements. Through Principal Component Analysis followed by Clustering, the intervention group was first stratified into disability profiles. Per profile, comparable control groups for clinical variables were established, based on propensity scores. Using chi-squared tests and logistic regression analysis, long-term care utilization at baseline was then compared per profile and group studied.
Results: Stratification highlighted five disability profiles: people with low-level limitations; people with limitations in instrumental activities of daily life and low-level of cognitive impairment; people with functional limitations; people with functional and cognitive impairments; and people with functional, cognitive, and behavioral problems. These profiles made it possible to identify long-term care needs. For instance, at baseline, those who needed more assistance with hygiene tasks also received more personal nursing care (P < 0.05). However, there were some important discrepancies between the need for long-term care and its utilization: while 21% of patients who were totally dependent for hygiene tasks received no personal nursing care, personal nursing care was received by 33% of patients who could perform hygiene tasks.
Conclusions: The disability profiles provide information on long-term care needs but not on the extent to which those needs are met. To assess the effectiveness of interventions, controls at baseline should have similar disability profiles and comparable long-term care utilization. To allow for large comparative effectiveness studies, these dimensions should ideally be available in routine databases.
Electronic supplementary material: The online version of this article (10.1186/s12913-019-4240-9) contains supplementary material, which is available to authorized users.


Background
The health status of older people is often characterized by the interplay between frailty, multi-morbidity, and disability. Frail older people are usually more vulnerable to stressors as a consequence of a significant reduction of physiological reserves [1,2]. Multi-morbidity is defined by the presence of two or more chronic conditions [3]. The World Health Organization (WHO) defines disability as "an umbrella term for impairments, activity limitations or participation restrictions" [4]; the last-named refers to difficulties in carrying out essential tasks for independent living and is assessed by the activities of daily life (ADL) and instrumental activities of daily life (IADL) scales [5]. However, it is worth noting that disability is strongly influenced by cognitive status [6,7].
These three distinct concepts are strong determinants of service needs and utilization, and more specifically of home care support [5]. Frailty is associated with a higher utilization of both health services (including primary care providers, clinical specialist physicians, hospital admissions, and emergency department visits) and home care services such as nursing care, home help, and meals-on-wheels [8]. People with multi-morbidity consult specialists and general practitioners more often and are more frequently admitted to hospital, where they have longer stays, on average, than people without multi-morbidity [3]. Multi-morbidity is also associated with long-term care needs. Indeed, long-term care needs are high in people suffering from neurological disorders such as dementia and Parkinson's disease, as well as in people who have had a stroke or suffer from diabetes or oncological conditions [9]. However, the level of disability is the foremost determinant of long-term care expenditure. The level of disability has different repercussions on long-term care needs [10] and determines the necessity to start using home care and to move from home care to a nursing home [11].
Finally, health services utilization is uneven among people with similar needs. Two situations of mismatch between needs and services utilization may be at play. First, the overuse of healthcare services that are unlikely to improve or may even negatively affect people's health, what are called "lower-value services" [12]; and second, the underuse of health and social services that would improve people's health. As a consequence, patterns of health service utilization should be analyzed in the light of service (mis)use.
In Belgium, the number of older persons and the number of frail or dependent older persons with multi-morbidity will rise in the coming years [13]. An increasing number of dependent people are expected to remain at home, because of the expected shortage of beds in nursing homes [14]. The majority of older people also prefer to remain in their own surroundings, rather than be admitted to a nursing home [14]. Consequently, older people with many more disabilities will live at home, supported by both formal and informal caregivers. Against this background, a nationwide program (Protocol 3 (P3)), consisting of a set of innovative, bottom-up-designed interventions that include case management, occupational therapy at home, psychological support, and respite care, was financed by the National Institute for Health and Disability Insurance and implemented in the form of pilot projects [15]. Case management intervention consisted of (a) an individualized assessment of needs and preferences, (b) planning of services, according to the results of this assessment, (c) patient-centered care coordination, and (d) re-evaluation and adjustment of care coordination. Occupational therapy provided mainly home adaptation in order to: (1) adapt patients' living environments to their current conditions and (2) offer better work conditions to home caregivers. Psychological support interventions delivered psychological and psychosocial support and psychotherapy at home, provided by a trained psychologist or psychotherapist. The focus of these heterogeneous interventions was on the support of frail older people at home with the aim of preventing the risk of institutionalization in a nursing home while maintaining a satisfactory quality of life, without increasing the burden on family carers, and fostering a more efficient use of health and social care services [15].
The evaluation of these interventions consisted of describing them, identifying the facilitators and barriers to their implementation, and assessing their impact on clinical outcomes, service utilization, and cost for different sub-populations. Given the heterogeneity of both the interventions and the targeted populations, a comparative effectiveness design and multi-embedded case studies were considered to be more appropriate than a randomized controlled trial. This paper discusses the methodological challenges of the comparative effectiveness study; the case studies are the focus of another paper [16]. We used an observational longitudinal study design, with routine data and prospective data collection, to compare frail older people receiving care and benefiting from different types of interventions (intervention group) with a group of study participants benefiting from "routine care" (control group).
Two challenges are usually encountered in designing comparative effectiveness studies: first, the relevant stratification of the overall sample so as to identify the most suitable intervention per sub-group of frail older people; second, the definition of an adequate control group in order to evaluate the consequences of the interventions.
The identification of the long-term care needs of frail older people should help in identifying appropriate variables to use in defining population subgroups and creating a control group with characteristics similar to those of the intervention groups. As mentioned above, this can be done by looking at disability, family caregiving, and health services utilization. This paper will therefore propose a systematic methodological approach to addressing the following questions: Is it possible to identify the need for long-term care? Can this lead to a meaningful stratification of frail older people and a clear definition of a control group? To what extent does the utilization of long-term care match the need for long-term care? Is it possible to use a routine database of healthcare reimbursements to select controls with long-term care needs similar to those of the intervention groups?

Overall methodological approach
We developed a step-by-step methodological process combining different approaches. First, we divided the intervention group into intragroup-homogeneous strata to define different disability profiles. Second, we carried out a one-by-one matching of the participants from the intervention group with the control group to account for people with similar long-term care needs. Third, we compared healthcare utilization between the groups studied in order to evaluate the baseline difference in how widely those needs are met (low and high value).

Inclusion criteria
The inclusion criteria for participants in the intervention and control groups were as follows [17]: being at least 60 years old and scoring at least six on the Edmonton Frail Scale [18]; having a dependency status of A, B, or C on the Katz home scale or a B, C, or Cd status on the Katz residential scale [19]; or having been diagnosed with dementia by a geriatrician, neurologist, or psychiatrist. People in both groups were living at home at baseline.

Characteristics of the databases used
Two databases were available and were merged with national registration numbers for this study.
The first is the BelRAI database which is the Belgian version of an internationally validated geriatric comprehensive assessment, the interRAI Instrument Home Care version (interRAI HC) [20]. This provided prospective data on the main clinical variables. Functional performance in activities of daily life (ADL) was evaluated by four items (personal hygiene, toilet use, locomotion, and eating) with a total score varying from 0 to 6. A total score above 3 (cut-off of ADL scale) indicates major dependency; 6 is the maximum level of dependency [21]. Functional performance in instrumental activities of daily life (IADL) was evaluated by eight items (meal preparation, ordinary housework, managing finances, managing medications, phone use, stairs, shopping, and transportation) with item scores varying from 0 to 6 and total scores varying from 0 to 48 [22]. A total score above 24 (cut-off of IADL scale) indicates major dependency; 48 is the maximum level of dependency. Cognitive status was evaluated by a cognitive performance score (CPS), which consisted of four items (decision-making, short-term memory, procedural memory, and comprehension). The total score ranges from 0 to 6; a total score above 3 (cut-off of CPS scale) indicates major cognitive impairment [23]. The depression rating scale (DRS), with total scores ranging from 0 to 14, was used to evaluate depression symptoms. A total score above 3 (cut-off of DRS scale) indicates significant depressive symptoms; a score of 14 indicates the presence of all mood symptoms in the last 3 days [24]. Behavioral problems (Behav.) were evaluated by the Aggressive Behavior Scale (ABS), which includes five items (wandering, verbal aggression, physical violence, abnormal social behavior, and resistance of care) [25]. The total score varies from 0 to 12 [25]. A total score above 0 indicates behavioral problems (cut-off of Aggressive Behavior Scale); a score of 12 indicates frequent and varied behavioral problems [25]. 
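To make the use of these cut-offs concrete, the following Python sketch flags which scales exceed their cut-off for a single assessment. It is only an illustration: the dictionary keys and the `flag_above_cutoff` helper are hypothetical names introduced here, and the example scores are invented; only the cut-off values and maximum scores are taken from the description above.

```python
# Cut-offs as described in the text: a total score strictly above the
# value flags major dependency, impairment, symptoms, or behavioral problems.
CUTOFFS = {"ADL": 3, "IADL": 24, "CPS": 3, "DRS": 3, "ABS": 0}

# Maximum total score per scale, used here only to validate the input.
MAX_SCORES = {"ADL": 6, "IADL": 48, "CPS": 6, "DRS": 14, "ABS": 12}


def flag_above_cutoff(scores):
    """Return {scale: bool}, True when the score is above the cut-off."""
    flags = {}
    for scale, value in scores.items():
        if not 0 <= value <= MAX_SCORES[scale]:
            raise ValueError(f"{scale} score {value} is out of range")
        flags[scale] = value > CUTOFFS[scale]
    return flags


# Invented assessment, not taken from the study data.
example = {"ADL": 4, "IADL": 20, "CPS": 2, "DRS": 5, "ABS": 0}
print(flag_above_cutoff(example))
```

In this invented example, only the ADL and DRS scores are above their cut-offs.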
The level of presence of a family carer, the Zarit Burden interview, and the WHO-QoL-8 were added to interRAI Instrument in the BelRAI database. The level of presence of a family carer was encoded, according to living arrangements, as a categorical variable with three levels (without a family carer, with a non-resident family carer, and with a co-resident family carer). This categorization allows the differentiation of the intensity of informal caregiving provided [26]. The 12-item version of the Zarit Burden interview (ZBI-12) was used to estimate the burden on family carers. This scale contains 12 items, with scores from 0 to 4. The total score ranges from 0 to 48; higher scores are indicative of greater burdens on family carers. A total score above 10 (cut-off of ZBI-12 scale) indicates that a family carer has significant depressive symptoms [27]. The WHO-QoL-8 was used as a concise instrument to measure the client's perceived generic quality of life [28]. Finally, an ad-hoc economic questionnaire was used to measure social care services utilization and informal aid. The interRAI HC instrument was filled out at enrolment, at the exit of the patient from the project, and 6 months after enrolment [15]. Only the baseline evaluation data were used for this paper.
Secondly, the IMA database from the Inter-mutuality Agency provided routine data on the healthcare services reimbursed by the National Institute for Health and Disability Insurance (NIHDI).
These two merged databases were cleaned to contain only complete evaluations at baseline. Data were available for 10,783 beneficiaries of Protocol 3 intervention (intervention group) and 605 older persons benefiting from "usual care", who were recruited at home by nursing care organisations (control group). Of the beneficiaries of Protocol 3 intervention, 1811 lived at home and had no family carer, 5735 had a non-resident family carer, and 3842 had a co-resident family carer. In the control group, 78 lived at home and had no family carer, 268 had a non-resident family carer, and 259 had a co-resident family carer.
In addition, the median fiscal income per household by municipality (data from statbel.fgov.be) was used as a proxy for socioeconomic level. Belgian municipalities were classified into three categories using the first and third quartiles of the median fiscal income per household: municipalities below the first quartile have a low median fiscal income per household, those between the first and third quartiles have a medium one, and those above the third quartile have a high one.
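The quartile-based classification can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function name, the municipality labels, and the income values are hypothetical, and the treatment of values falling exactly on a quartile (assigned to the lower category here) is a choice made for the sketch, not something specified in the text.

```python
import statistics


def classify_municipalities(median_income):
    """Map each municipality to 'low', 'medium', or 'high' using Q1 and Q3.

    median_income: {municipality: median fiscal income per household}
    """
    # With n=4, statistics.quantiles returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(median_income.values(), n=4)
    categories = {}
    for name, income in median_income.items():
        if income <= q1:          # boundary handling is an assumption
            categories[name] = "low"
        elif income <= q3:
            categories[name] = "medium"
        else:
            categories[name] = "high"
    return categories


# Invented incomes for seven hypothetical municipalities.
incomes = {"M1": 10000, "M2": 20000, "M3": 30000, "M4": 40000,
           "M5": 50000, "M6": 60000, "M7": 70000}
print(classify_municipalities(incomes))
```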

Stratification of the intervention group
In order to identify population groups with specific long-term care needs, we identified "natural" clustering of individuals with a similar disability profile in the intervention group, using statistical analysis to establish classification schemes [29].
The variables used for the establishment of classification schemes were inspired by existing classifications in which clinical scales (ADL, IADL, depression, and cognition) and comorbidity were included [30][31][32]. These classifications highlighted the fact that cognitive and functional limitations fit the different disability profiles better than comorbidities [30]. Indeed, the need for long-term care services is associated with specific limitations [10]: significant IADL limitations are associated with needing household aid services (domestic help, home care worker, meals-on-wheels, etc.); significant ADL limitations are associated with needing personal nursing care; and significant cognitive impairment is associated with needing supervision [33]. Hence, the comorbidities were not included in our classification. A specific scale for behavioral problems was included in our classification because people with behavioral problems represent a population with specific health and social care needs, including the need for supervision by family carers and the need for respite care for those carers [34]; in addition, this population is likely to make less use of support services [34] and imposes a greater burden on family carers [35].
The "natural" clustering was, consequently, established by combining the different scores on functional limitations, cognitive performance, and the presence of behavioral problems in a Principal Component Analysis (PCA), followed by a cluster analysis (based on the PCA correlation matrix between the initial variables and the principal components) [36]. A hierarchical algorithm (Ward algorithm) was used to define the number of groups by the decomposition of the inertia of the cloud of points while minimizing the loss of information at each additional clustering. The number of groups obtained was the result of a trade-off between intra-group homogeneity and intergroup heterogeneity [36]. This method was employed using the FactoMineR package [37] in R. Disability profiles were presented by means of the descriptive statistics of the BelRAI scales. For each scale, the median and interquartile range, both for the total score and for each item, and the proportion of individuals above the total score cut-off were presented.
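The Ward step described above can be illustrated with a short, self-contained Python sketch. It is not the FactoMineR analysis used in the study: the PCA step is omitted, the input points are invented two-dimensional coordinates standing in for principal component scores, and the function name `ward_clustering` is hypothetical. The sketch only shows the merging rule, which at each step fuses the two clusters whose merger causes the smallest increase in within-cluster inertia.

```python
def ward_clustering(points, n_clusters):
    """Agglomerative clustering with Ward's criterion.

    The increase in within-cluster inertia when merging clusters a and b is
        delta = (n_a * n_b) / (n_a + n_b) * ||centroid_a - centroid_b||^2
    and the pair with the smallest delta is merged at each step.
    """
    # Each cluster is stored as (member indices, centroid coordinates).
    clusters = [([i], list(p)) for i, p in enumerate(points)]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                (ma, ca), (mb, cb) = clusters[a], clusters[b]
                dist2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
                delta = len(ma) * len(mb) / (len(ma) + len(mb)) * dist2
                if best is None or delta < best[0]:
                    best = (delta, a, b)
        _, a, b = best
        (ma, ca), (mb, cb) = clusters[a], clusters[b]
        na, nb = len(ma), len(mb)
        # The centroid of the merged cluster is the size-weighted mean.
        centroid = [(na * x + nb * y) / (na + nb) for x, y in zip(ca, cb)]
        clusters[a] = (ma + mb, centroid)
        del clusters[b]
    return [sorted(members) for members, _ in clusters]


# Invented coordinates: two tight pairs and one isolated point.
points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
print(ward_clustering(points, 3))  # [[0, 1], [2, 3], [4]]
```

In practice the number of clusters would be chosen by inspecting the inertia gain at each merge rather than fixed in advance, which is the trade-off between intra-group homogeneity and inter-group heterogeneity mentioned above.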

One-by-one matching of participants from the intervention group with the control group
The control group was constituted by one-by-one matching of P3 beneficiaries with individuals from the control group, based on a similar level of presence of a family carer and maximum similarity in scores on the clinical scales, age, and sex.
The establishment of the control group followed a multi-stage process. The initial step consisted of two successive stratifications. First, intervention and control groups were stratified by the level of presence of a family carer to ensure similarity of the groups studied with regard to this variable. The rationale behind this initial step was twofold. First, older people with co-resident family carers were frailer than both those without family carers and those with non-resident family carers [38]. Second, the level of utilization of health and social services is associated with the level of presence of a family carer [39]. Within the intervention group, each group of participants with a particular level of presence of a family carer was, in turn, stratified by disability profile. Then a model of propensity scores (with binary dependent variable intervention vs control groups) was created for each stratum, following a step-by-step approach. Each step consisted in a forward selection of a variable among those obtained from the BelRAI scales (ADL, IADL, CPS, DRS, presence of behavioral problems) and the predisposing factors. A likelihood ratio test was then performed to determine whether there was a significant difference between the intervention and control groups with regard to the variable added. A new variable that significantly improved the model was retained. The matching function used was the one-to-one matching of the nearest neighbors with replacement to make it easier to find the best match for a P3 beneficiary among all the potential controls. This was mandatory in our case because of the smaller size of the control group relative to the intervention group. The Matching package in R was used for this purpose [40].
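The matching rule itself (one-to-one nearest neighbor with replacement) can be sketched as follows. This Python example is an illustration only and does not reproduce the Matching package in R: the propensity scores are assumed to have been estimated already, and the identifiers and score values are invented.

```python
def match_with_replacement(treated_scores, control_scores):
    """Match each treated unit to the control with the closest propensity score.

    Both arguments map a unit identifier to its propensity score. Because
    matching is done with replacement, one control may serve several
    treated units, which is what makes matching feasible when controls
    are far fewer than treated units.
    """
    if not control_scores:
        raise ValueError("at least one potential control is required")
    matches = {}
    for t_id, t_score in treated_scores.items():
        matches[t_id] = min(
            control_scores,
            key=lambda c_id: abs(control_scores[c_id] - t_score),
        )
    return matches


# Invented scores for three beneficiaries and two potential controls.
treated = {"p1": 0.80, "p2": 0.75, "p3": 0.30}
controls = {"c1": 0.78, "c2": 0.35}
print(match_with_replacement(treated, controls))
# {'p1': 'c1', 'p2': 'c1', 'p3': 'c2'}
```

Here the control "c1" is reused for two beneficiaries, which illustrates why the covariate balance has to be checked after matching.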
Finally, the evaluation of the covariate balance was carried out using the standardized mean difference (smd) and the analogous variance ratio (VR). To evaluate multiple covariates balance, the average standardized mean difference (SMD) and the Geometric Mean Variance Ratio (GMVR) were used [41,42].
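A minimal Python sketch of these balance diagnostics is given below. The formulas are assumptions, since several variants exist and the text does not spell them out: the standardized mean difference uses a pooled standard deviation, the variance ratio is the treated variance over the control variance, the average SMD is the mean of absolute values, and the GMVR is the geometric mean of the variance ratios. The function names and the data are hypothetical.

```python
import math
import statistics


def smd(treated, control):
    """Standardized mean difference with a pooled standard deviation."""
    pooled_sd = math.sqrt(
        (statistics.variance(treated) + statistics.variance(control)) / 2
    )
    return (statistics.mean(treated) - statistics.mean(control)) / pooled_sd


def variance_ratio(treated, control):
    """Ratio of sample variances (treated over control)."""
    return statistics.variance(treated) / statistics.variance(control)


def balance_summary(covariates):
    """covariates: {name: (treated_values, control_values)}.

    Returns (average absolute SMD, geometric mean of the variance ratios).
    """
    smds = [abs(smd(t, c)) for t, c in covariates.values()]
    vrs = [variance_ratio(t, c) for t, c in covariates.values()]
    return statistics.mean(smds), statistics.geometric_mean(vrs)


# Invented scores for two covariates in one stratum.
data = {
    "ADL": ([1, 2, 3, 4, 5], [2, 3, 4, 5, 6]),
    "CPS": ([0, 2, 4], [1, 2, 3]),
}
avg_smd, gmvr = balance_summary(data)
print(round(avg_smd, 3), round(gmvr, 3))
```

A stratum could then be screened by comparing these two summaries with chosen balance thresholds.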

Relationship between disability profiles and healthcare utilization
Identification and selection of relevant IMA-AIM proxies of chronic disability profiles
The selection of proxies of disability was done by experts (geriatrician, liaison nurse, clinical pharmacist) based on two questions: (1) which kinds of healthcare utilization can be identified as signs of a long-term disability in older people? (2) Which kinds of healthcare utilization allow us to distinguish between different disability profiles among older people?
To identify chronic disability status, proxies were defined based on a one-year observation period before inclusion in a P3 project. By way of illustration, medication was considered chronic when a medicine was taken for more than 3 months in a given year. This selection was confirmed by a univariate test assessing whether proxies defined as binary variables were associated with the risk of institutionalization. The latter was strongly associated with cognitive and functional impairment [43]. Therefore, the factors associated with the risk of institutionalization were considered as good proxies for disability status.
Besides the proxies for chronic disability status, multimorbidity is also frequently discussed in the literature [5]. The multi-morbidity variables were created from prescription drug data using the Anatomical Therapeutic Chemical (ATC) classification system [44], which recognizes 22 chronic conditions. The presence of two or more chronic conditions in the same individual indicates multi-morbidity [5].
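The rule "two or more chronic conditions" can be sketched in Python as follows. The mapping shown is a small illustrative subset of ATC prefixes chosen for this example; it is not the 22-condition mapping referenced above, and the function names and prescription lists are hypothetical.

```python
# Illustrative subset only: ATC prefix -> chronic condition label.
# The study's actual mapping covers 22 conditions and is not reproduced here.
ATC_PREFIX_TO_CONDITION = {
    "A10": "diabetes",
    "N04": "Parkinson's disease",
    "N06A": "depression",
    "R03": "obstructive airway disease",
}


def chronic_conditions(atc_codes):
    """Return the set of conditions suggested by a person's ATC codes."""
    found = set()
    for code in atc_codes:
        for prefix, condition in ATC_PREFIX_TO_CONDITION.items():
            if code.upper().startswith(prefix):
                found.add(condition)
    return found


def has_multimorbidity(atc_codes):
    """Two or more distinct chronic conditions indicate multi-morbidity."""
    return len(chronic_conditions(atc_codes)) >= 2


# Invented prescription lists.
print(has_multimorbidity(["A10BA02", "A10AB01"]))   # one condition only
print(has_multimorbidity(["A10BA02", "N04BA02"]))   # two conditions
```

Several drugs from the same ATC group count as a single condition, so the first invented list does not qualify as multi-morbidity while the second does.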

Comparison of healthcare utilization between different disability profiles within the intervention group
Proportions of healthcare users were presented to describe the healthcare utilization for the different disability profiles and were compared by a chi-squared test.
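As an illustration of this comparison, the Pearson chi-squared statistic for a contingency table of users and non-users per profile can be computed as in the Python sketch below. The counts are invented and the function name is hypothetical; the p-value is not computed here and would be obtained by comparing the statistic with the chi-squared distribution for the given degrees of freedom.

```python
def chi_squared(table):
    """Pearson chi-squared statistic and degrees of freedom for an r x c table.

    table: list of rows, e.g. one row per disability profile with the
    counts of [users, non-users] of a given healthcare service.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of row and column.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Invented counts: [users, non-users] in two hypothetical profiles.
stat, dof = chi_squared([[30, 70], [60, 40]])
print(round(stat, 2), dof)  # 18.18 1
```

With one degree of freedom, a statistic above 3.84 corresponds to P < 0.05, so the invented proportions in this example would be judged different.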

Comparison of healthcare utilization by groups studied
The difference in the proportion of healthcare users between the intervention and the control groups was estimated by logistic regression for all the proxies, with the intervention group status equaling one and the control group status equaling zero. The odds ratios, with their normal bootstrap percentile confidence intervals, were adjusted for socioeconomic variables (age, gender, median fiscal income by municipality, and region) and were computed using the boot package in R [38,45]. The bootstrapping method used to build the confidence intervals does not rely on the assumptions of normality and equality of variances.
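The percentile bootstrap idea can be illustrated with the simplified Python sketch below. It departs from the analysis described above in two labeled ways: the odds ratio is unadjusted (computed from a 2 x 2 table rather than from a logistic regression with socioeconomic covariates), and a 0.5 correction is added to each cell to avoid division by zero in resamples. The function names, the seed, and the data are hypothetical.

```python
import random


def odds_ratio(pairs):
    """Unadjusted odds ratio of service use, intervention versus control.

    pairs: list of (group, used) with group 1 = intervention, 0 = control,
    and used 1 = service user, 0 = non-user. A 0.5 correction is added to
    every cell (an assumption of this sketch) to avoid division by zero.
    """
    a = sum(1 for g, u in pairs if g == 1 and u == 1) + 0.5
    b = sum(1 for g, u in pairs if g == 1 and u == 0) + 0.5
    c = sum(1 for g, u in pairs if g == 0 and u == 1) + 0.5
    d = sum(1 for g, u in pairs if g == 0 and u == 0) + 0.5
    return (a * d) / (b * c)


def bootstrap_percentile_ci(pairs, n_boot=2000, alpha=0.05, seed=0):
    """Percentile confidence interval from resampling individuals."""
    rng = random.Random(seed)
    estimates = sorted(
        odds_ratio(rng.choices(pairs, k=len(pairs))) for _ in range(n_boot)
    )
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Invented data: 40/100 users in the intervention group, 20/100 in controls.
data = [(1, 1)] * 40 + [(1, 0)] * 60 + [(0, 1)] * 20 + [(0, 0)] * 80
print(round(odds_ratio(data), 3))
print(bootstrap_percentile_ci(data))
```

Because the interval is read directly from the empirical distribution of the resampled estimates, no distributional assumption on the estimator is needed.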

Description of the disability profiles of the intervention group
The cluster analysis highlighted five disability profiles (Table 1). The first disability profile (N = 2040) was called "low-level limitations (low limit.)", and consisted of individuals with no significant functional limitation or cognitive impairment.
The second disability profile (N = 1932) was the "IADL and low level of cognitive impairment (IADL (cogn.))" group. It consisted of individuals with IADL difficulties, of whom 88% were above the IADL cut-off, with difficulties mainly in preparing meals, performing ordinary housework, managing finances, managing medications, shopping, and transportation. In addition, a majority (55%) of individuals in this group had minor cognitive impairments (CPS score higher than 1), 14% had behavioral problems (mainly wandering), and 32% suffered from depressive symptoms.
The third disability profile (N = 3821), referred to as "functional limitations (func.)", was made up of individuals whose limitations were mostly functional ones. Of these, 92.5% had an IADL score above the cut-off of dependency and 71% had difficulties performing ADL, especially in hygiene tasks. Indeed, up to 64% of individuals in this group needed some help to accomplish 50% of personal hygiene-related tasks.
The fourth disability profile (N = 2316) was called "functional and cognitive impairments (func., cogn.)" and consisted of individuals with both functional limitations and cognitive impairment. In this group, 100% and 89% of individuals were above the IADL and ADL cut-offs respectively, and 82.5% had CPS scores above the cut-off that defines cognitive impairment. Additionally, over 75% of participants in this category were completely dependent for almost all IADL tasks. The main ADL limitation was related to hygiene tasks, with which 86% of the participants needed extensive assistance. Such difficulties were more pronounced in this group than among those grouped under the "functional limitations" disability profile, probably due to the concurrence of functional limitations with cognitive impairment.
Finally, the fifth disability profile (N = 674), which was named "functional, cognitive, and behavioral problems (func., cogn., behav.)", consisted of individuals who combined functional limitations, cognitive impairment, and behavioral problems. In this group, 97% of study participants were above the IADL cut-off, 72% above the ADL cut-off, and 87% above the CPS cut-off. Half of them had behavioral problems, including wandering and verbal abuse on 2 days out of the three prior to the evaluation. Individuals in this group were similar to those grouped under the "functional and cognitive impairments" disability profile with regard to functional aspects, but they suffered from more serious cognitive problems (mainly in decision-making and comprehension).
Table 1 note. Max: the maximum score of the total or of the items of the BelRAI scales; Low limit.: low-level limitations; IADL (cogn.): IADL and low level of cognitive impairment; Func.: functional limitations; Func., cogn.: functional and cognitive impairments; Func., cogn., behav.: functional, cognitive, and behavioural problems; % above the cut-off: the proportion of individuals above the cut-off of the total score of the BelRAI scales; the other lines present the median with the interquartile range of the total or item scores of the BelRAI scales.

Evaluation of the matching balance
The matching balance was respected: all the average standardized mean differences were smaller than 0.25 and all the geometric mean variance ratios were smaller than 2, except for strata with a small number of Protocol 3 beneficiaries and a small number of potential controls (e.g., where there were only 29 beneficiaries and one control without informal caregivers and with functional, cognitive, and behavioral problems, or where there were 121 beneficiaries and 9 controls without an informal caregiver and with functional and cognitive impairment) (Table 2). This can be explained by the limited number of people living alone at home who had significant cognitive impairment.

Relationship between disability profile and healthcare utilization
Description of healthcare utilization by disability profile in the intervention group
The five disability profiles allow us to identify long-term care needs. However, it was not always clear to what extent the need for healthcare of those with particular disability profiles matched utilization of that care. People who had used nursing services for hygiene tasks (personal nursing care) more than twice a week for more than 3 months in the year preceding inclusion in the study were considered to be chronic users of nursing care for dependency reasons. The greater the need of assistance with hygiene tasks, the more personal nursing care was used. Personal nursing care was given to 33% of people without hygiene task difficulties, to 52% of those in need of occasional assistance, to 71% of those whose need of assistance was judged to be major or maximal, and to 79% of people with total dependency for hygiene tasks (Table 3). People who used physical therapy for dependency reasons had a prescription as part of the treatment of severe pathologies or for 60 physical therapy sessions after experiencing a fall. This use of physical therapy increased significantly with the need for assistance with locomotion: it was seen in 40% of people needing assistance with locomotion as against 18% of those not in need of assistance with locomotion (Table 3). Parkinson's medication intake varied significantly between disability profiles, ranging from 5% for individuals with a low-level limitations profile up to 12% for people with high levels of functional and cognitive impairment (Table 3). The same variation was seen in those receiving the incontinence lump sum (a fixed payment system for incontinence material [46]), with proportions ranging from 1% among individuals with low-level limitations up to 23% for people with high levels of functional and cognitive impairment (Table 3).
Antidepressant intake was reported in 42% of the study participants (Table 4). It was more frequent (53%) among people whose depression rating score was above the cut-off. However, 38% of people who scored lower on the DRS scale took antidepressants (Table 4). Neuroleptic intake was significant among people with cognitive impairments (31% of those with CPS above the cut-off) and was higher among those with cognitive and behavioral problems, being prevalent in 41% of people who had the "functional, cognitive, and behavioral problems" disability profile. However, their use was not limited to people who were cognitively impaired: 15.5% of those without cognitive problems also took neuroleptics (Table 4). Visits to a neuropsychiatrist (at least one visit in the year preceding inclusion in the study) were more frequent among individuals suffering from cognitive problems than among people with depressive symptoms. On the other hand, consultation of a psychiatrist (at least one consultation in the year preceding inclusion) was more common in people with depressive symptoms than in those who were cognitively impaired. Day care center use was less frequent (3% of the total sample), but was higher in people with cognitive impairments (7%) (Table 4).
Geriatricians were consulted more often by people with cognitive problems, whether these were combined with functional limitations or not. The more complex the disability profile, the more often a neurologist was consulted. For people with similar disability profiles, a neurologist was more frequently consulted by those with cognitive impairment. Neurologists were more frequently visited than geriatricians (13% versus 6% of the total sample) (Table 5). The association between visits to an emergency department or out-of-hours general practitioner visits (evening, weekend, or public holiday consultations) and the severity of disabilities was unclear. Beneficiaries of Protocol 3 with functional limitations were the biggest users of emergency departments (36.62% used an emergency department at least once a year), followed by beneficiaries with functional and cognitive impairment (32.96%), followed by beneficiaries with functional, cognitive, and behavioral problems (28.45%) and beneficiaries with IADL limitations and a low level of cognitive impairment (28.17%) (Table 5). Finally, up to 92.5% of people suffered from multi-morbidity (Table 5) and the number of morbidities was roughly similar in the different disability profiles, with a median number of comorbidities equal to 4 in all disability profiles.

Disability profiles: adjusted comparison of healthcare utilization between groups studied
In spite of the extensive matching (Table 2), healthcare utilization related to disability profiles was significantly different between the intervention and control groups with respect to several types of healthcare services (Table 6). These differences were observed after adjustment for socioeconomic variables (Additional file 1 presents socioeconomic variables). Participants in the control group received significantly more personal nursing care and physical therapy and more of them received the incontinence lump sum. Parkinson's medication was prescribed more often in the intervention group, except for individuals with functional, cognitive, and behavioral problems, for whom there was no difference. Neuroleptics, a specific medication for cognitive impairments, were used more in the intervention group than in the control group by those who did not have significant cognitive impairments. Individuals with low to severe levels of cognitive impairment used day care centers significantly more often in the intervention group than in the control group. Finally, people in the intervention group consulted more specialists and used the emergency department significantly more than those in the control group, regardless of their disability profile.

Table note: The evaluation of the balance of the BelRAI covariables was carried out for each of the propensity score models, that is, for each level of presence of an informal caregiver (without a family carer, with a non-resident family carer, with a co-resident family carer) within the different disability profiles. Abbreviations: Low limit., low-level limitations; IADL (cogn.), IADL and low level of cognitive impairment; Func., functional limitations; Func., cogn., functional and cognitive impairments; Func., cogn., behav., functional, cognitive and behavioural problems; smd, standardised mean difference; VR, variance ratio; SMD, average standardized mean difference; GMVR, geometric mean variance ratio; N, number of participants per group studied.
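The balance diagnostics named above (smd, VR, SMD, GMVR) can be illustrated with a minimal Python sketch. It is not the code used in the study. It assumes the common definitions: the standardised mean difference uses a pooled standard deviation taken from the average of the two group variances, and the geometric mean variance ratio is the exponential of the mean log variance ratio. The function names, covariate names, and scores are hypothetical.

```python
import math
from statistics import mean, variance


def standardized_mean_difference(group_a, group_b):
    """Absolute difference in means divided by a pooled standard deviation."""
    pooled_sd = math.sqrt((variance(group_a) + variance(group_b)) / 2)
    if pooled_sd == 0:
        # No spread in either group: this sketch returns 0 instead of dividing by zero.
        return 0.0
    return abs(mean(group_a) - mean(group_b)) / pooled_sd


def variance_ratio(group_a, group_b):
    """Ratio of sample variances (assumes group_b is not constant)."""
    return variance(group_a) / variance(group_b)


def balance_summary(intervention, control):
    """Summarise balance over covariates given as {name: list of values}."""
    smds = [standardized_mean_difference(intervention[k], control[k]) for k in intervention]
    vrs = [variance_ratio(intervention[k], control[k]) for k in intervention]
    return {
        "average_smd": mean(smds),
        # Geometric mean computed on the log scale.
        "gmvr": math.exp(mean(math.log(v) for v in vrs)),
    }


# Hypothetical covariate scores for one disability profile (illustration only).
intervention = {"adl_score": [2, 4, 6, 8], "cognition_score": [1, 2, 3, 4]}
control = {"adl_score": [3, 5, 7, 9], "cognition_score": [1, 2, 3, 4]}
summary = balance_summary(intervention, control)
print(summary)
```

An average SMD close to 0 and a GMVR close to 1 would indicate that the matched groups are well balanced on the covariates considered. Such a summary says nothing about variables that were not included in the propensity score model, such as healthcare utilization.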

Discussion
As argued in the introduction, schematized in Fig. 1, and described in Anderson's socio-behavioral model of health service utilization [47], the need for and the utilization of long-term care depend on predisposing factors associated with the likelihood of needing health services (e.g. age or gender), on enabling factors (including the availability of health facilities and the capacity to avail of health services), and on health-related factors such as the level of disability [48]. In the context of aging, a crucial enabling factor for utilization of long-term care is social support and especially the living arrangements of a family carer. The living arrangements of a family carer influence both informal and formal care [26], as a co-resident family carer is more involved in help (i.e. intimate care) [49], while a non-resident family carer is more likely to call on health and social care services [39].
As the results showed that older persons with similar disability profiles do not necessarily utilize healthcare in similar ways, the coverage of long-term care needs could be described by two distinct pathways when healthcare utilization and disability profiles are combined. First, a virtuous pathway (Vir loop) is present when the needs for health and social care are covered. The coverage of these needs diminishes future needs and lessens the use of services that have low value. For instance, it is known that the use of both a primary physician and primary care services along with a high level of continuity of care are associated with lower rates of emergency department use [50]. Second, a vicious loop is initiated by a lack of coverage of long-term care needs, leading to an overuse of low-value services (unplanned hospitalization, out-of-hours general practitioner visits); this results either in an increase in needs (Vic loop) or a break of the vicious loop when a response is provided to initial needs (dotted arrow). This vicious loop is well documented for emergency department use. People with unmet healthcare needs and without primary and supportive care services have been shown to visit emergency departments more often. These people are extremely vulnerable and the focus on acute problems in emergency departments and inadequate follow-up can lead to recurring emergency department visits, subsequent hospitalizations, institutionalization, or even death [51]. The potential gain obtained from the coverage of previously uncovered needs by Protocol 3 interventions is not limited to the individual; these interventions can also substantially reduce the pressure on the health system. This has been supported by literature suggesting that a significant proportion of emergency department visits by older people could potentially be prevented by better primary healthcare [52].
It has also been shown that older people take up a higher proportion of emergency department use and of hospital stays [51]. Their emergency department visits are more complex and more likely to result in hospitalization than those of younger people [53].
The consequences of P3 interventions on the individual vicious loop and the systemic vicious loop can be studied adequately, provided that the three essential questions in conducting comparative effectiveness studies, and at the basis of this paper's objective, can be tackled.
We will now review key lessons and elements of responses to these questions in the light of our results.
Is it possible to identify the need for long-term care? Can this lead to a meaningful stratification of frail older people and a clear definition of a control group?
The stratification of the study population is often presented in the literature as a crucial process in order to test the effectiveness of interventions and their adaptation to the specific needs of people within groups [54]. The five disability profiles created in this study identify groups with specific long-term care needs [30]. A control group with similar characteristics in terms of disability and predisposing and enabling factors was established. However, the groups studied differ in terms of healthcare utilization. The study sheds light on the difficulties involved in combining disability profiles and healthcare utilization to define the control group, in the current context of available Belgian data.
To what extent does the utilization of long-term care match the need for long-term care?
Our results show that a database of healthcare reimbursements is not a good proxy for disability profiles (which make possible the identification of needs for long-term care). Not only are these data not specific enough to differentiate between disability profiles, but people with similar disability profiles do not necessarily utilize healthcare in similar ways.
A partial explanation of this difference could be the occurrence of acute health problems versus a progressive deterioration of health status. In our case, hospitalization within the 2 months before inclusion in the cohort has been used as a proxy for deterioration of health. This deterioration could mainly result either from acute problems for which the patient was being hospitalized [55] or from "iatrogenic disability", defined as "the avoidable dependency which often occurs during the course of care" [56]. The disability profile of older people without recent hospitalization was considered to be the consequence of a progressive deterioration and not the result of a recent acute problem. For people with such a profile, differences between the intervention and control groups could be viewed as underutilization of the healthcare system or as mismanagement of people's health problems. Another explanation could be social characteristics. Anderson's socio-behavioral model of health service utilization proposes a series of predisposing factors alongside age and gender, such as social structure components and health beliefs, and a series of other enabling factors alongside the living arrangements of the family carer, such as community resources, the organization of health and social care, knowledge of the services, autonomy in their utilization, income, and supplementary health insurance. These factors should be integrated into the analysis to better understand the utilization of and the need for health services [47]. However, these variables are not available.
To obtain more accurate information on disability profiles and more generally on health status, Electronic Health Records (EHR) may be of use in the future. However, in Belgium, these are not yet used by every practitioner and therefore they are not routinely available for scientific research [57]. The whole system (EHR and pooling of information for population health and scientific studies) is currently under development in Belgium. It is nevertheless not expected to include information collected by social workers. Indeed, in Belgium, because decision-making power is split between the federal authorities and the federated entities [46,58], the organization and the funding of formal social services are separated from healthcare services. This information is nevertheless essential to better support frail older persons at home.
Is it possible to use routine databases of healthcare reimbursement to select controls with long-term care needs similar to those of the intervention groups?
If the objective is to assess the effectiveness of an intervention, the easiest way to evaluate it is to compare the intervention group to a control group with similar baseline characteristics (in our case, with similar long-term care needs) [59]. As discussed earlier, the baseline characteristics must include disability profiles and healthcare utilization. Indeed, a similar disability profile allows us to evaluate the impact of the intervention on the evolution of clinical outcomes and on long-term care needs. On the other hand, similar baseline healthcare utilization facilitates the evaluation of the consequences of the interventions for how widely long-term care needs are met and for reducing the use of low-value healthcare services. Such a reduction can be interpreted as a decrease of the pressure on the healthcare system.
As shown in this study, some proxies for healthcare utilization appeared to be associated with some disability profiles. However, this association was not clear enough to allow the determination of disability profiles from healthcare utilization. So, in our study, we were not limited by the "technical" capacity to use variables in the control group (indeed, the propensity score method does not limit the number of control variables [60]). The limitations of this study are related to the availability of individually measured variables in the control group. There were two difficulties related to the scarcity of the variables required to create an "ideal" control group. First, people in the intervention group were recruited through different entry points, including at discharge from hospital, from the waiting lists of the nursing homes, and from other sources. People in the control group were recruited from among those receiving nursing care at home, for organizational reasons. The different recruitment points constituted an important potential source of selection bias. However, they make it possible to appreciate the importance of integrating healthcare utilization into the discussion about unmet long-term care needs. Second, the number of controls was considerably smaller than the number in the intervention group, due to resource constraints. The method of one-by-one matching of the nearest neighbors with replacement that was used in this study accounted for the replication of certain controls. Other statistical approaches were considered, including bootstrapping on the matching or changing the ratio of Protocol 3 beneficiaries to controls. However, these alternatives did not entirely resolve the scarcity of some profiles within the control group. The best solution would have been to include more controls from diverse entry points, which would have demanded considerable time and financial resources.
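To make the matching step concrete, the following Python sketch illustrates one-to-one nearest-neighbour matching with replacement on a propensity score. It is a simplified illustration rather than the procedure implemented in the study: the propensity scores are assumed to have been estimated beforehand, and the identifiers, scores, and function names are hypothetical.

```python
from collections import Counter


def match_with_replacement(intervention_scores, control_scores):
    """Pair each intervention participant with the control whose propensity
    score is closest. Controls stay in the pool, so one control can be
    matched to several intervention participants."""
    matches = {}
    for participant, score in intervention_scores.items():
        nearest = min(control_scores, key=lambda c: abs(control_scores[c] - score))
        matches[participant] = nearest
    return matches


def control_reuse(matches):
    """Count how many times each control was selected."""
    return Counter(matches.values())


# Hypothetical propensity scores (illustration only).
intervention_scores = {"p1": 0.30, "p2": 0.32, "p3": 0.70}
control_scores = {"c1": 0.31, "c2": 0.65}

matches = match_with_replacement(intervention_scores, control_scores)
reuse = control_reuse(matches)
print(matches)
print(reuse)
```

In this toy example the control `c1` is selected twice. When controls are scarce within a profile, such replication becomes frequent, which is why the reuse counts need to be taken into account when estimating effects and their uncertainty.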
Another option would be to exclusively use more diverse routine data (such data are either not yet available in Belgium or else links cannot yet be made between such data) to establish the control group. Indeed, routine data are recognized as the best tool for evaluating large-scale interventions. They are recommended in comparative effectiveness studies [61], as they are available for the whole population. The use of routine data also makes it possible to study real-world effectiveness [62,63] and helps in policy- and decision-making, both through prediction of future needs and comparison with other public programs [64].
In Belgium, routine data provide information on reimbursed healthcare consumption, including healthcare utilization and proxies for comorbidities, but not disability profiles. These data can therefore help track the utilization of healthcare services over time, but they give no indication of long-term care needs or of the coverage of those needs. Nevertheless, Belgium is trying to reorganize its health system and hopes to draw on the expertise gained from Protocol 3 projects to take a step forward in the implementation of projects designed to provide integrated care for people with chronic health problems [58]. These integrated care projects should involve a comprehensive reorganization of healthcare and social services, as well as the pooling of resources and policy adjustments. Such projects can only be successful if, at the same time, appropriate tools for data sharing are developed. These tools may be used for assessing the needs of patients and for evaluating healthcare utilization (by using population data). Clinical scales may help to better characterize disability profiles.

Conclusions
This paper raised three key questions that need to be addressed in order to understand the relationship between individual disability status, patterns of use of healthcare services, and the effectiveness of interventions. To explore the first question, we discussed an approach to stratifying and establishing a control group for the assessment of the effectiveness of interventions targeting frail older people at home. As a consequence, and in response to the second question, this study sheds more light on the association between disability profiles and healthcare utilization. Finally, and in relation to our third question, this study encourages both researchers and policy-makers to reflect on the best trade-off between using large databases with a limited number of available variables (such as healthcare utilization data without disability variables) and a relatively small sample with a sufficient number of variables (disability variables) specifically collected in an observational comparative effectiveness study. These results therefore represent an important contribution to future designs of comparative effectiveness studies. This contribution is all the more relevant at a time when important transformations in electronic records on health and healthcare use and resulting databases are under scrutiny in Belgium and other European countries. Our findings should help pave the way for better use of new routine databases to evaluate interventions.