Economic models of community-based falls prevention: a systematic review with subsequent commissioning and methodological recommendations

Background Falls impose significant health and economic burdens among older populations, making their prevention a priority. Health economic models can inform whether a falls prevention intervention represents a cost-effective use of resources and/or meets additional objectives such as reducing social inequities of health. This study aims to conduct a systematic review (SR) of community-based falls prevention economic models to: (i) systematically identify such models; (ii) synthesise and critically appraise modelling methods/results; and (iii) formulate methodological and commissioning recommendations. Methods The SR followed the PRISMA 2021 guideline, covering the period 2003–2020, 12 academic databases and grey literature. A study was included if it: targeted community-dwelling persons aged 60 and over and/or aged 50–59 at high falls risk; evaluated intervention(s) designed to reduce falls or fall-related injuries; against any comparator(s); reported outcomes of economic evaluation; used decision modelling; and had English full text. Extracted data fields were grouped by: (A) model and evaluation overview; (B) falls epidemiology features; (C) falls prevention intervention features; and (D) evaluation methods and outcomes. A checklist for falls prevention economic evaluations was used to assess reporting/methodological quality. Extracted fields were narratively synthesised and critically appraised to inform methodological and commissioning recommendations. The SR protocol is registered in the Prospective Register of Systematic Reviews (CRD42021232147). Results Forty-six models were identified. The most prevalent issue according to the checklist was non-incorporation of all-cause care costs.
Based on general-population lifetime models conducting cost-utility analyses, seven interventions produced favourable ICERs relative to no intervention under the cost-effectiveness threshold of US$41,900 (£30,000) per QALY gained; of these, results for (1) combined multifactorial and environmental intervention, (2) physical activity promotion for women, and (3) targeted vitamin D supplementation were from validated models. Decision-makers should explore the transferability and reach of interventions in their local settings. There was some evidence that exercise and home modification exacerbate existing social inequities of health. Sixteen methodological recommendations were formulated. Conclusion There is significant methodological heterogeneity across falls prevention models. This SR’s appraisals of modelling methods should facilitate the conceptualisation of future falls prevention models. Its synthesis of evaluation outcomes, though limited to published evidence, could inform commissioning. Supplementary Information The online version contains supplementary material available at 10.1186/s12913-022-07647-6.

Trial-based evidence consistently suggests that diverse types of falls prevention interventions in the community setting can significantly reduce the number of falls and fallers [21][22][23]. In England and Wales, the National Institute for Health and Care Excellence (NICE) falls prevention clinical guideline (CG161) recommends that older persons aged 65+ in the community (i.e., not in extended or institutionalised care settings such as nursing homes and hospital wards) are routinely screened for falls risk by health and social care professionals [4]. High-risk individuals should subsequently be referred to multifactorial intervention involving multidisciplinary falls risk assessment followed by tailored treatments including exercise, home assessment and modification (HAM), vision correction and medication change [4]. In addition to this proactive (i.e., initiated by professional referral) pathway, CG161 also recommends a reactive pathway for those admitted to a medical facility for a fall (multifactorial intervention and HAM) [4]. Older persons may also 'self-refer' by voluntarily enrolling in a falls prevention intervention (e.g., exercise) available in the community [24,25].
Given scarce care resources, commissioning of falls prevention should be informed by economic evaluations that consider the costs and consequences of any falls prevention strategy against the next best alternative use of resources [26]. Decision modelling is a vehicle for economic evaluation that combines multiple epidemiological, intervention and economic parameters from diverse sources in a coherent mathematical and statistical framework suitable for decision-making [27]. Relative to economic evaluations alongside a single clinical study, models can inform decisions at a broader population level (rather than for specific patient groups), incorporate the long-term costs and consequences of falls, and systematically evaluate the impact of all relevant scenarios and input parameter uncertainties, which are themselves factors relevant to commissioning [28].
A systematic review uses systematic and explicit methods to identify, select and critically appraise relevant research in the topic area, and perform data extraction and analyses [29,30]. A systematic review of community-based falls prevention decision models can perform two functions simultaneously. First, it can inform commissioning decisions by summarising all available model outcomes relevant to the decision problem and context; alternatively, it can identify an existing model that can be adapted and re-used [31]. Second, it can appraise the methodological features of models, detailing and critically appraising those features that significantly affect the evaluation results, including structural assumptions made by decision models [26,31]; this can be achieved by applying a pre-established methodological and reporting quality checklist, then conducting a narrative synthesis of the methodological features including their strengths and limitations [32]. Ideally, the systematic review should perform both functions together: the commissioners would benefit from the methodological appraisal that qualifies the model outcomes; the modellers basing the conceptualisation of future models on the reviewed methodological features would need to know how the features affect the model outcomes and therefore the commissioning strategy.
A prior systematic overview of systematic reviews of falls prevention economic evaluations assessed how well previous reviews had performed both functions [33]. Seven systematic reviews covering 21 decision models were identified [34][35][36][37][38][39][40]. The systematic overview reported that the identified systematic reviews extracted a limited range of methodological model features and evaluation outcomes to inform commissioning; for example, the extracted methodological features were limited to model type and brief summaries of data sources. A pilot Medline search by the current authors identified 10 decision models of community-based falls prevention that were not included in the aforementioned seven systematic reviews. Therefore, current systematic reviews are now outdated and provide insufficient detail.
The aim of this study is to conduct a systematic review of community-based falls prevention economic models. We systematically search for and identify community-based falls prevention decision models, then apply a pre-established checklist for assessing the reporting and methodological quality of falls prevention economic evaluations [32]. We subsequently conduct a narrative synthesis and critical appraisal of methodological features of identified models including key features of falls epidemiology, falls prevention interventions, and evaluation methods. We then formulate methodological and commissioning recommendations based on the aforementioned. This systematic review can inform commissioners and other consumers of economic evidence (e.g., care professionals and patient groups), producers of economic evidence (e.g., modellers) and systematic reviewers interested in the review methodology.

Methods
The systematic review protocol is registered on the Prospective Register of Systematic Reviews (CRD42021232147). We followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guideline and the checklist is reported in the Supplementary Materials [29,30].

Data sources and study selection
The search covered the period January 2003 to December 2020 and 12 academic databases: Medline, Embase, PubMed, CDSR, CENTRAL, EconLit, CINAHL, PsycInfo, ASSIA, CRD, CEA Registry and PEDro. Grey literature was searched from online sites of the Department of Health, Chartered Society of Physiotherapy, College of Occupational Therapy, Royal College of Nursing and Age UK. A previous systematic review to inform the NICE falls prevention clinical guideline had covered the period before 2003 and found just one decision model [34]; hence, the period from 2003 was covered. The search strategy was an intersection between terms for falls, older people, and economic evaluation. All database and grey literature search strategies are given in Tables A1.1 to A1.8 and related text in Supplementary Materials. References and citations of included studies were also searched.
Two researchers (JK and YL) independently reviewed the titles and abstracts of identified articles at the first stage and the full texts of approved articles at the second stage. Those that received two second-stage approvals were included for data extraction. Another researcher (TY) arbitrated in case of disagreement.
A study was included if it: (i) targets a population of community-dwelling (i.e., not in extended or institutionalised care settings such as nursing homes and hospital wards) older persons (aged 60+) and/or individuals aged 50-59 at high falls risk; (ii) evaluates intervention(s) designed to reduce the number of falls or fall-related injuries; (iii) against any comparator(s); (iv) reports outcomes of economic evaluation (i.e., comparative analysis of interventions in terms of their relative costs and consequences [26]); (v) uses a decision model [26]; and (vi) has English full text. The age range in criterion (i) sought to increase the evidence for primary and/or earlier-life prevention which is a key principle of geriatric public health intervention [41,42]. The Cochrane systematic reviews of community-based falls prevention randomised controlled trial (RCT) evidence had also set the lower age bound at 60 rather than 65 [21][22][23]; a previous systematic review of community-based falls prevention economic evaluations had covered the high-risk group aged 50-64 [37].
Models evaluating interventions for specific disease areas (e.g., stroke) with minor falls prevention components were excluded. Interventions aiming to reduce a specific falls risk factor (e.g., balance) and/or health consequences of falls (e.g., fear of falling) were excluded if the model did not explicitly incorporate falls as events. Economic evaluations alongside a single clinical study were excluded but their references were searched. Eligible models included in previous systematic reviews of falls prevention economic evaluations were included [34][35][36][37][38][39][40].
Table 1 shows the data fields extracted from identified models, including the following categories: (A) model and evaluation overview; (B) falls epidemiology features; (C) falls prevention intervention features; (D) evaluation methods and outcomes; and (E) key methodological challenges for public health economic models. The data extraction was primarily conducted by JK, supported by YL.

Model overview and checklist scores for reporting and methodological quality
The extracted features for model and evaluation overview under category (A) in Table 1 were reported. A checklist specifically designed to assess the reporting and methodological quality of falls prevention economic evaluations was applied after being adapted for use on decision models, as described and presented in Supplementary Materials, Table A2 [32].

Narrative synthesis of methodological features and methodological recommendations
The extracted methodological features under categories (B), (C) and (D) in Table 1 were narratively synthesised, mainly using tabular formats. The synthesised features were selected based on their potential to affect model credibility and evaluation results as noted by guidelines on conducting and reporting falls prevention economic evaluation [32], wider falls prevention literature including the NICE clinical guideline CG161 [4], and the health technology assessment (HTA) checklist for quality assessment of decision models [45]. Critical appraisal identified between-study variation in the methods used to characterise the features and their respective strengths and limitations (including those mentioned by the model's developers). Methodological recommendations for future model development were subsequently formulated by this systematic review.
Features under category (E) were informed by the systematic methodological review of key methodological challenges to public health economic modelling [46]; these features are synthesised and appraised (with associated methodological recommendations) in a future publication. Nevertheless, features that potentially affected the model outcomes significantly are discussed in this article whilst formulating commissioning recommendations.

Table 1 Data fields extracted from decision models identified by systematic review
Category: Reporting and methodological quality checklist
Data field: The checklist designed for falls prevention economic evaluations by a panel of falls prevention experts [32] was adapted to specifically suit decision models. There were 32 items, each scored 0 (recommendation not followed), 0.5 (partially followed) or 1 (fully followed), giving a maximum score of 32. See Table A2 in Supplementary Materials for the adapted version.
Category: (A) Model and evaluation overview
Data fields:
1. Bibliography: author(s); publication year
2. Setting and aim: country; region; decision-maker; evaluation aim
3. Target population demographics and comorbidities (e.g., residence [note a], age, sex, socioeconomic status, health conditions unrelated to falls risk)
4. Type of analysis: e.g., CEA; CUA; CBA; ROI [note b]
5. Perspective (e.g., public sector, societal)
6. Cost-effectiveness threshold: monetary amount and type (e.g., health opportunity cost in healthcare system, willingness to pay as consumer)
7. Model type (e.g., decision tree, Markov)
8. Model time horizon
Abbreviations: CBA Cost-benefit analysis, CEA Cost-effectiveness analysis, CUA Cost-utility analysis, DSA Deterministic sensitivity analysis, PSA Probabilistic sensitivity analysis, QALY Quality-adjusted life year, RCT Randomised controlled trial, ROI Return on investment
a Community-dwelling or institutionalised
b Cost-effectiveness analysis (CEA) uses natural health units (e.g., number of falls) as health outcomes; cost-utility analysis (CUA) uses the generic quality-adjusted life year (QALY). Cost-benefit analysis (CBA) values health outcomes using societal or consumption value of health. Return on investment analysis (ROI) only compares the net financial outcomes of two or more interventions
c Expert guideline on falls prevention economic evaluation recommends that evaluations report all-cause healthcare costs in the base case and fall-related costs in sensitivity analysis [32]. All-cause care costs comprise fall-related and comorbidity care costs
d Intervention type classification should follow the Prevention of Falls Network Europe categories [43]
e Potential intervention pathways are: proactive (initiated by professional screening/referral); reactive (initiated after medical attention for a fall); and self-referred (enrolled voluntarily by older persons)
f Falls risk screening is required if: (1) model prescribes intervention to a subset of the whole target population with certain characteristics (e.g., higher falls risk) and this subset must be identified; and (2) model's target population itself is a specific patient group (e.g., cataract patients) and this group must be identified from the general population before model baseline. Falls risk screening is distinct from falls risk assessment as part of multifactorial intervention
g This concerns models that import falls efficacy evidence from external intervention studies. Main falls incidence metrics are falls risk and falls rate, and their matching efficacy metrics are relative risk (RR) and rate ratio (RaR), respectively. Models should ensure that the external efficacy metric matches the internal falls incidence metric
h Like note g, this concerns decision models using external efficacy evidence. The fall type (e.g., hospitalised fall, fall-induced fracture) for the efficacy data should match that for the model incidence
i The effectiveness period is a function of efficacy durability and implementation sustainability. Efficacy durability should not extend beyond the intervention study's timespan unless the intervention is sustained [32]. Key determinants of sustainability are demand-side persistence and supply-side maintenance
j For example, falls prevention exercise can improve cardiovascular health [25]
k Structural or face validity concerns validity of model structure, data sources and assumptions as assessed by modelling and disease-area experts and broader stakeholders [31,44]. Structural validity can be assessed prospectively during the model development stage through proactive involvement of stakeholders in model conceptualisation; it can also be assessed retrospectively by evaluating scenarios on different structural assumptions [31]
l Internal validity concerns the accuracy of model coding; external validity concerns comparability between model and real-world results; and cross validity concerns comparability between model results and results of other models addressing the same decision problem [44]

Developing commissioning recommendations by this systematic review
Commissioning recommendations are drawn from the model evaluation results extracted under category (D) in Table 1. They are based primarily on a subset of models that targeted general older populations, as opposed to specific patient groups, and conducted analyses over a lifetime horizon. Prioritising this subset addresses the information needs of decision-makers overseeing geographically defined jurisdictions (e.g., national) [28]. Evaluation over a lifetime horizon is recommended by the expert guideline on falls prevention economic evaluation [32].
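Recommendations of this kind depend on putting each model's costs on a common price year and currency before the resulting incremental cost-effectiveness ratio (ICER) is judged against a threshold. A minimal Python sketch of that arithmetic follows. It assumes that inflation is applied as a ratio of CPI index values and that the PPP factor is quoted as local currency units per US$; the function names and every input figure are hypothetical and are not taken from any reviewed model.

```python
THRESHOLD_USD_PER_QALY = 41_900  # threshold applied in this review (GBP 30,000)


def to_usd_2021(cost_local, cpi_cost_year, cpi_2021, ppp_local_per_usd):
    """Inflate a local-currency cost to 2021 prices, then convert to US$ via PPP."""
    inflated_local = cost_local * (cpi_2021 / cpi_cost_year)
    return inflated_local / ppp_local_per_usd


def icer(incremental_cost, incremental_qalys):
    """Incremental cost per QALY gained relative to the comparator."""
    if incremental_qalys <= 0:
        raise ValueError("no QALY gain: ratio is not a cost per QALY gained")
    return incremental_cost / incremental_qalys


def is_cost_effective(incremental_cost, incremental_qalys,
                      threshold=THRESHOLD_USD_PER_QALY):
    """True if the ICER is below the threshold (cost-saving with a QALY gain also passes)."""
    return icer(incremental_cost, incremental_qalys) < threshold


# Hypothetical inputs: GBP 1,200 incremental cost per person in the cost year,
# CPI index of 100.0 (cost year) and 112.0 (2021), PPP of 0.70 GBP per US$,
# and an incremental gain of 0.08 QALYs per person.
cost_usd = to_usd_2021(1200.0, cpi_cost_year=100.0, cpi_2021=112.0,
                       ppp_local_per_usd=0.70)
print(round(cost_usd, 2), round(icer(cost_usd, 0.08), 2),
      is_cost_effective(cost_usd, 0.08))
```

The sketch deliberately refuses to compute a ratio when no QALYs are gained, since a negative or undefined ratio cannot be compared with a cost-per-QALY threshold in the same way.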
The recommendations considered all available evaluation outcomes (including not only cost-per-unit ratios but also aggregate, population-level impact and wider decisional outcomes such as impact on social inequities of health) and methodological caveats potentially affecting credibility and outcomes. Monetary outcomes were converted to 2021 US$ using the consumer price index (CPI) in the country of study to account for inflation up to 2021 [47] and the most recent purchasing power parity (PPP) exchange ratio between US$ and the original currency [48]. For cost-utility analysis (CUA), an incremental cost-effectiveness ratio (ICER) less than US$41,900 (£30,000) per quality-adjusted life year (QALY) gained was deemed cost-effective according to the threshold recommended by the NICE HTA guideline [49].

Results
Figure 1 presents the PRISMA flow diagram. In total, 15,730 titles and abstracts were screened. Ninety-two full texts were screened from which 46 decision models were identified. Six studies were identified from the grey literature and references of other studies. The main reason for exclusion at the full text screening stage was not conducting economic evaluation via decision modelling. The titles of the excluded studies are given in Table A3 in the Supplementary Materials.
Table 2 provides an overview of the 46 included models. Apart from Agartioglu [50] set in Turkey, all models were set in developed countries: 14 from the US and Canada (30.4%); 12 Australia and New Zealand (26.1%); 11 UK (23.9%); and eight Europe (17.4%). Twenty-four (52.2%) models aimed to inform decision-making at the national level, while the rest adopted more local application levels including state, city, and clinical commissioning groups in the UK.
There were four types of economic analysis: cost-effectiveness analysis (CEA), cost-benefit analysis (CBA), return-on-investment analysis (ROI), and CUA. No further types, e.g., cost-consequence analysis (CCA), were identified. There were two costing perspectives: public sector and societal. Several models adopted multiple types of analysis and perspectives, resulting in 69 distinct analyses. Of these, CUA was most used (n = 32; 46.4%), followed by ROI and CEA (each n = 17; 24.6%), and then CBA (n = 3; 4.3%). Around a third of analyses (n = 22) adopted the societal perspective.
Exercise was the most evaluated intervention type with 17 models; eight evaluated multiple exercise forms. Multifactorial intervention was the second most evaluated type with 13 models: three evaluated multiple forms [58,59,66]; two combined multifactorial intervention with environmental modifications [53,73]. Twelve evaluated multiple types of interventions: four compared multiple types directly [58,66,80,89]. The most common comparator scenario was not receiving the modelled intervention(s). Eight models described the 'usual care' (without falls prevention properties) received in the comparator scenario, e.g., non-expedited cataract surgery compared to expedited [86]; but others (24 of 32 with non-receipt scenario) were vague in the description or used 'no intervention' and 'usual care' interchangeably [34,60,61,85,93]. There were four model type categories: (1) binary decision (n = 14); (2) static (n = 9); (3) cohort-level Markov (n = 19); and (4) patient- or individual-level Markov (n = 4). Binary decision models compared the state of the world with and without the intervention and did not incorporate transition probabilities or time cycles. All static models except Smith [88] were decision trees without time cycles; Smith [88] compared several falls risk cut-off levels without time cycles. Model time horizon varied between one year and lifetime. Seventeen of 23 Markov models adopted lifetime horizons.
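To make the cohort-level Markov category concrete, the following Python sketch traces the proportion of a cohort across three states (never fallen, has fallen, dead) over annual cycles and compares usual care with an intervention. It is an illustrative assumption rather than the structure of any reviewed model: the state names, transition probabilities and the relative risk of 0.80 are hypothetical, and each person is counted as having fallen at most once.

```python
def transition_matrix(p_fall, p_die, p_die_after_fall):
    """Annual transition probabilities between never_fallen, has_fallen and dead."""
    return [
        [1.0 - p_fall - p_die, p_fall, p_die],            # from never_fallen
        [0.0, 1.0 - p_die_after_fall, p_die_after_fall],  # from has_fallen
        [0.0, 0.0, 1.0],                                  # dead is absorbing
    ]


def run_cohort(matrix, cycles, start=(1.0, 0.0, 0.0)):
    """Return the cohort distribution across states at baseline and after each cycle."""
    n_states = len(start)
    trace = [list(start)]
    for _ in range(cycles):
        prev = trace[-1]
        trace.append([
            sum(prev[i] * matrix[i][j] for i in range(n_states))
            for j in range(n_states)
        ])
    return trace


# Hypothetical parameters: 30% annual falls risk, 2% background mortality,
# 4% mortality after a fall, and an intervention relative risk of 0.80.
RELATIVE_RISK = 0.80
usual_care = run_cohort(transition_matrix(0.30, 0.02, 0.04), cycles=10)
intervention = run_cohort(
    transition_matrix(0.30 * RELATIVE_RISK, 0.02, 0.04), cycles=10
)
print("never fallen after 10 years:",
      round(usual_care[-1][0], 4), "vs", round(intervention[-1][0], 4))
```

A binary decision model, by contrast, would compare only two end states (with and without intervention) and would have no transition matrix or cycle loop.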

Checklist scores for methodological and reporting quality
Tables 3, 4 and 5 show the item-specific checklist scores for models. The overall quality score ranged between 13.5 and 27 (average 21.2) out of a maximum of 32. The lowest scored item across models was item 15, which recommends reporting total/all-cause health resource utilisation costs under base case analysis and fall-related costs under sensitivity analysis. For this, only four models (all using primary collection of cost data) incorporated all-cause healthcare costs as the main economic outcome [51,52,86,87]; six incorporated comorbidity care costs, which together with fall-related costs constitute all-cause costs [54,62,70,73,82,92]. The second lowest scored item was item 21, which recommends: (i) reporting intervention costs and all-cause/fall-related healthcare costs separately; and (ii) reporting both aggregate and mean costs. For this, eight followed both recommendations [59, 67, 69, 71-73, 85, 93], five followed (i) only [56,75,76,83,84], and four followed (ii) only [64,80,81,94]. The third lowest scored item was item 8 for clearly stating and justifying the comparator which, as discussed above, was done by fewer than half (n = 22) of studies.
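The scoring arithmetic behind these figures is simple enough to sketch. The Python example below assumes the 32-item, 0/0.5/1 scheme described for the adapted checklist; the two score sheets are hypothetical and do not reproduce any model's actual scores, and the helper names are illustrative only.

```python
N_ITEMS = 32
ALLOWED_SCORES = {0.0, 0.5, 1.0}


def total_score(item_scores):
    """Sum one model's item scores after checking the sheet is complete and valid."""
    if len(item_scores) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} item scores, got {len(item_scores)}")
    if any(score not in ALLOWED_SCORES for score in item_scores):
        raise ValueError("each item must be scored 0, 0.5 or 1")
    return sum(item_scores)


def lowest_scoring_items(score_sheets, k=3):
    """Return the k item numbers (1-based) with the lowest mean score across models."""
    n_models = len(score_sheets)
    means = {
        item: sum(sheet[item - 1] for sheet in score_sheets) / n_models
        for item in range(1, N_ITEMS + 1)
    }
    return sorted(means, key=lambda item: (means[item], item))[:k]


# Hypothetical score sheets for two models (index 14 is item 15, and so on).
model_a = [1.0] * N_ITEMS
model_a[14], model_a[20] = 0.0, 0.5
model_b = [1.0] * N_ITEMS
model_b[14], model_b[20], model_b[7] = 0.0, 0.0, 0.5

print(total_score(model_a), total_score(model_b))
print(lowest_scoring_items([model_a, model_b]))
```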

Narrative synthesis: falls epidemiology features
As detailed in Table 1, falls epidemiology features are synthesised based on: (1) characterising baseline falls risk; (2) characterising recurrent falls; (3) range of falls risk factors; (4) range of falls health consequences; (5) health utilities for CUA; and (6) range of fall-related economic consequences.

Baseline falls risk
Table 6 shows four main approaches for characterising the baseline falls risk/rate of models: (1) analysis of individual-level epidemiological data; (2) use of published epidemiological data or expert/author opinion; (3) use of internal intervention study; and (4) use of falls risk/rate from RCT control group.
Table 2 notes:
a "Markov cohort" describes cohort-level Markov models that simulate the proportion of a population that experiences an event (e.g., fall incidence) and progresses to a different model state. "Markov patient" describes patient- or individual-level Markov models that simulate the progression of individuals with a unique set of characteristics [95]
b Intervention included individually tailored education, HAM and exercise and public space safety improvement
c Binary decision models include two scenarios, with or without intervention, and no time-based cycles or probability trees
Eight models employing (1) estimated the baseline falls risk/rate by analysing individual-level data relevant to the decision-making context. One used a local survey [63], but the other seven analysed administrative healthcare ('routine') datasets. For example, the four BODE3 models developed by the same research group analysed the insurance claims data at national and state levels to estimate the incidence rates of falls requiring medical attention (i.e., MA falls). A key strength of routine data is that falls incidence is linked to consequent care utilisation and cost; the latter can then be stratified by individual-level risk factors. The routine data should contain individual identifiers to distinguish between number of fallers and falls per faller. The BODE3 models did not make this distinction, counting multiple falls per person as multiple fallers and overestimating the baseline falls risk.
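The distinction between fallers and falls per faller can be illustrated with a short Python sketch. It assumes event-level routine records that carry a person identifier and a population followed for one year each; the identifiers, counts and function name are hypothetical. The `naive_risk` entry shows the overestimate that results when every fall is counted as a new faller.

```python
from collections import Counter


def falls_metrics(event_person_ids, population_size, person_years=None):
    """Derive falls risk, falls rate and falls per faller from event-level records."""
    falls_by_person = Counter(event_person_ids)
    n_falls = sum(falls_by_person.values())
    n_fallers = len(falls_by_person)
    if person_years is None:
        person_years = float(population_size)  # assumes one year of follow-up each
    return {
        "falls_risk": n_fallers / population_size,  # proportion with at least one fall
        "falls_rate": n_falls / person_years,       # falls per person-year
        "falls_per_faller": n_falls / n_fallers if n_fallers else 0.0,
        "naive_risk": n_falls / population_size,    # every fall treated as a new faller
    }


# Hypothetical records: six medically attended falls among three people,
# drawn from a population of ten followed for one year.
events = ["p1", "p1", "p1", "p2", "p3", "p3"]
metrics = falls_metrics(events, population_size=10)
print(metrics)
```

Without person identifiers only the naive figure is available, which is why the text above treats identifiers as a requirement for routine data.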
Twenty-five models used published epidemiological evidence (n = 22) or expert opinion (n = 3) [57,58,77]. Compared to approach (1), the use of published evidence restricted the range of falls risk factors and relevant population subgroups (see below). Nevertheless, published evidence allowed parameterisation of fall-related events that are not well-observed in routine data (e.g., non-MA falls).
Nine models sourced the baseline falls risk/rate and intervention effectiveness from the same internal intervention study. For example, Albert [51] developed a decision tree model using the baseline risk, effectiveness, and costs evidence from a quasi-experimental evaluation of multifactorial intervention. The reliance on a single intervention study makes these models similar to non-modelling evaluations alongside clinical studies. Nevertheless, the nine models: explicitly developed models using internal data [51,52,76]; extrapolated results over a longer time horizon [73,86,87]; extrapolated results to national population [91]; and extrapolated results to a wider societal perspective [53]. These models assumed that the internal intervention sample is representative of the target population; this assumption would not hold if there were sampling biases.
Table 3 Results of methodological and reporting quality checklist application to included models
a See Table A2 in Supplementary Materials for item contents. A study is given a score of 1 if deemed to have followed the item recommendation fully, 0.5 if partially (light grey shading) and 0 if not followed (dark grey shading)
Four models used the falls risk/rate from the control group of an external RCT (or pool of RCTs). For example, Day [61] used the falls rate pooled from two Tai Chi RCTs to characterise the baseline rate, then applied the Tai Chi efficacy from a separate meta-analysis. Analysts can draw on diverse external RCTs to characterise the baseline risk; heterogeneous risks across subpopulations can be modelled by drawing on multiple sources simultaneously. However, this approach generally restricts the model time horizon to that of an external RCT and cannot model the long-term falls risk progression without being supplemented by longer-term observational data.
Table 7 lists the models by model type category and their features relevant to characterising recurrent falls. The first feature is the transition entity, which is either the fall event or individual. The individual-transitioning models, particularly those with cycle length of one year or longer, should ensure that recurrent falls could occur to individuals during each cycle. A qualifying factor is the type of main fall-related event: if the event is less likely to recur within a year (e.g., hip fracture), then the need to characterise recurrent falls is reduced. There were 23 models incapable of characterising recurrent falls. Seven of the 23 had fracture as the main event, which is less likely to recur within a year [66,69,70,73,78-80]; whilst 16 models with falls as the main event were incapable of characterising recurrent MA or non-MA falls. Of 13 individual-transitioning models that were capable of characterising recurrent falls, three methods were mainly used: (1) modelling separate health states for recurrent fallers; (2) assigning average number of falls per faller; and (3) incorporating cycle lengths shorter than one year. Three models used (1) [51,52,56]: e.g., CSP [56] incorporated age- and gender-specific risks of experiencing recurrent falls conditional on having fallen. Three used (2). Others used (3), incorporating the following cycle lengths: one month [74,90]; three months [94]; and six months [68,89]. Hiligsmann [68] and Zarca [94] had fractures as the main event yet incorporated short cycles. Tannenbaum [89] modelled higher falls risk in the second of the two six-month cycles for those who experienced a fall in the first. Other methods included: applying a negative binomial regression on individual-level data to adjust the falls risk for the number of falls per faller [76]; and targeting those who have experienced a fall immediately prior to the model baseline ('targeted recurrent fall' in Table 7) [71,93]. No study employed model types incorporating time-to-event data (e.g., discrete event simulation) to overcome the limitation of set cycle lengths.
Table 4 Results of checklist application to included studies
Table 5 Results of checklist application to included studies (n = 46)
a See Table A2 in Supplementary Materials for item contents. A study is given a score of 1 if deemed to have followed the item recommendation fully, 0.5 if partially (light grey shading) and 0 if not followed (dark grey shading)

Falls risk factors
Table A4 in Supplementary Materials summarises the range of risk factors for falls and fall-related events incorporated by models that conducted primary analysis of individual-level data or used published epidemiological evidence (i.e., the first two approaches for characterising baseline risk in Table 6). For the eight models that conducted primary analysis, the individual-level granularity offered greater scope for incorporating a wide range of risk factors. For example, the four BODE3 models incorporated age, sex, ethnicity, and MA falls history as risk factors for MA fall, hospitalised fall and fatal fall. Smith [88] constructed a de novo MA falls risk prediction tool using diverse variables observed in the primary and secondary care routine data, including history of fall/fracture, chronic disease diagnoses and history of all-cause secondary care utilisation.
Twenty-five models that used published evidence were more restricted in their incorporation of risk factors. Ten incorporated a single baseline risk or included age and/or sex as the only non-exogenous (i.e., not given at model baseline) risk factors [34,50,55,66,68,71,77,83,84,93]. Only four incorporated non-injurious or non-MA falls as a risk factor for further falls within model simulation [57,58,64,75]. No model incorporated fear of falling as a risk factor. Only three incorporated chronic diseases: osteoporosis [78,80]; and depression and cognitive impairment [75]. Physical impairments as risk factors included: vitamin D deficiency [74,94]; low bone mass density [80]; impaired gait or balance, leg weakness and functional impairment [75]; and functional dependency [70].
Models using internal intervention study evidence or external RCT data to characterise the baseline falls risk/rate (i.e., the last two approaches in Table 6) took the risk factors as given from the internal or external studies. For example, Day [60] used the inclusion criteria of external RCTs to define the risk profiles of six model subgroups receiving different interventions. A representative population survey was then used to estimate the subgroup sizes. Table 8 summarises the health consequences of falls explicitly incorporated by models: i.e., studies included separate model states and probabilities for the consequence.

Falls health consequences
There was noticeable between-study variation in the range of health consequences: 21 (45.7%) models included non-injurious or non-MA falls; 10 (21.7%) considered only fractures, of which six considered only hip fracture; 16 (34.8%) included fatal falls; and six (13.0%) fear of falling. In Church [57,58], and Tannenbaum [89], fear of falling was associated with non-MA and MA fall incidence; in Lee [74] and PHE [85] only with MA fall; in Eldridge [63] fear could occur independently of falls. Fifteen (32.6%) incorporated fall-induced long-term care (LTC) admission; 12 (26.1%) incorporated excess mortality associated with major injuries.
Since a narrower range of health consequences would underestimate the cost-effectiveness of falls prevention, several models highlighted the exclusion of specific health consequences as a limitation: fear of falling [62,82,92]; fatal falls [76]; and non-fracture injuries [73,78,94]. Yet others advocated a narrower range to focus on falls with discernible health consequences [88] and generate conservative results [56]. Regardless, the between-study variation impairs outcome comparisons. Table A5 in Supplementary Materials summarises the health utilities data used for CUA, the health states to which they are applied, and their sources. Twenty-nine models incorporated health utilities; 25 sourced them from external literature. EQ-5D was the most widely used instrument (17 models); other instruments included HUI2, HUI3, and SF-6D. Four models concurrently used multiple instruments [70,79,89,90]; two used values directly elicited from TTO exercises [63,70].

Health utilities
The effect of an adverse event on health utility was depicted in three main approaches: (i) assigning an absolute decrement/loss to pre-event utility level; (ii) assigning proportional (i.e., multiplier) decrement to pre-event level; and (iii) assigning a specific health utility level to post-event state. Examples of each are: EQ-5D loss of 0.200 for hip fracture in the 1st year, followed by loss of 0.060 for subsequent years [66]; multiplier 0.79 for hip fracture to pre-fracture level for 1st year, followed by multiplier 0.90 for subsequent years [94]; utility level of 0.050 for bad hip fracture requiring LTC admission [63]. These illustrate the significant between-model variation in the applied utility data, which reduces the comparability of CUA results. Table A6 summarises the economic consequences of falls from the health and social care perspective. The economic consequences were marked even if only their costs were considered without separate model states (unlike health consequences in Table 8). Care consequences directly attributed to falls are divided into six categories: (i) ambulatory care excluding emergency department (ED), e.g., GP visit and ambulance call-out; (ii) ED visit/admission; (iii) hospitalisation; (iv) rehabilitation, e.g., outpatient; (v) short-term social care, e.g., meals-on-wheels; and (vi) LTC. The cost of LTC admission was incorporated by 26 (56.5%) models. Studies noted the technical difficulty in costing LTC admission, particularly in identifying admissions directly attributable to falls and in stratifying costs by age and life expectancy at admission [56,59,62].
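The three approaches to depicting utility impact can be contrasted in a short sketch. The decrement, multiplier and level values are those quoted in the text [63,66,94]; the pre-event utility of 0.80, the three-year horizon and the absence of discounting are hypothetical assumptions made only for illustration, and the three examples concern different events, so the totals are not like-for-like comparisons.

```python
def qalys_absolute_decrement(pre_event: float, losses: list[float]) -> float:
    """Approach (i): subtract an absolute utility loss from the pre-event level each year."""
    return sum(pre_event - loss for loss in losses)


def qalys_multiplier(pre_event: float, multipliers: list[float]) -> float:
    """Approach (ii): scale the pre-event level by a proportional multiplier each year."""
    return sum(pre_event * m for m in multipliers)


def qalys_fixed_level(level: float, years: int) -> float:
    """Approach (iii): assign a specific utility level to the post-event state."""
    return level * years


PRE_EVENT_UTILITY = 0.80  # hypothetical pre-event utility, not from the source
YEARS = 3                 # hypothetical horizon, undiscounted

# (i) loss of 0.200 in year 1, then 0.060 in subsequent years [66]
absolute = qalys_absolute_decrement(PRE_EVENT_UTILITY, [0.200, 0.060, 0.060])
# (ii) multiplier 0.79 in year 1, then 0.90 in subsequent years [94]
proportional = qalys_multiplier(PRE_EVENT_UTILITY, [0.79, 0.90, 0.90])
# (iii) utility level of 0.050 for hip fracture requiring LTC admission [63]
fixed = qalys_fixed_level(0.050, YEARS)

print(round(absolute, 3), round(proportional, 3), round(fixed, 3))
```

Note that approaches (i) and (ii) depend on the pre-event utility whereas approach (iii) does not, which is one reason the choice of approach can matter for subgroups with different baseline health.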

Fall-related economic consequences
Four models incorporated all-cause ('AC'), rather than fall-specific, care consequences using primary data from intervention studies [51,52,86,87]. Six models incorporated comorbidity care costs [54,62,70,73,82,92]. The four BODE3 models incorporated annual (all-cause) healthcare cost and cost of dying that varied by age and sex; falls prevention indirectly affected these costs by changing the life expectancy and age at death via fatal fall prevention. Johansson [73] incorporated age-stratified societal costs of added life-years measured in net consumption (production value minus consumption and care costs) but not cost of dying. In Honkanen [70], the annual healthcare cost and cost of dying were stratified by functional dependency and residence (community vs. nursing home); fracture prevention indirectly affected these by lowering the risks of functional dependency and nursing home admission. Comorbidity care costs are hence relevant to models that incorporate fatal falls, excess mortality and serious injuries that contribute to increased frailty and care dependency. Yet these costs were included in only six (listed above) of 24 models that incorporated fatal falls and/or excess mortality.

Narrative synthesis: falls prevention intervention features
As detailed in Table 1, falls prevention intervention features are synthesised based on: (1) intervention access pathways; (2) falls risk identification methods; (3) intervention resource-use and cost; (4) intervention efficacy; and (5) wider health effects of interventions beyond falls prevention. Table A7 in Supplementary Materials provides additional detail on intervention components by study. Table 9 categorises all model-evaluated interventions by access pathway (reactive, proactive, self-referred or unclear) and intervention type. Of 101 interventions in total (counting multiple forms per study separately), nearly half (49) had unclear pathway descriptions. The most common pathway was proactive with 29 interventions, followed by self-referred (16) and reactive (7).

Intervention access pathway
Models with unclear access pathways frequently failed to mention how specific groups eligible for intervention were identified and recruited. For example, Church [58] evaluated group exercise, HAM, and multifactorial intervention given to the high falls risk subgroup within the target population but did not mention how this subgroup would be identified; it similarly failed to mention how specific patient groups for cataract surgery, psychotropic medication withdrawal, and cardiac pacing would be identified.
Three models considered multiple pathways for the same intervention. Eldridge [63] evaluated a falls risk screening and referral programme that encompassed all three pathways operating in tandem: falls patients at A&E and hospital would be screened by the falls risk assessment tool (FRAT) and referred to a multidisciplinary falls clinic (reactive pathway); primary care professionals would screen and refer high-risk individuals to the falls clinic or bi-disciplinary treatment (proactive); the low-risk individuals not referred could still self-refer to the bi-disciplinary treatment (self-referred). In Nshimyumukiza [80], vitamin D and calcium supplementation could be initiated proactively after fracture risk screening or reactively after fracture incidence. Wilson [92] evaluated a self-referred HAM in the base case and a proactive HAM (targeted at those with MA falls history) as an alternative scenario.

Falls risk screening
Falls risk screening is required to identify subgroups within the target population eligible for intervention or specific risk/patient groups serving as the target population itself. Four methods were used to model the screening process: (i) using primary data to assign individual-level distribution of falls risk factors; (ii) using external data to assign cohort-level distribution of falls risk factors; (iii) using external data on screening efficacy (i.e., sensitivity and specificity) without assigning distributions; and (iv) incorporating screening cost only. Two models used (i): Eldridge [63] used primary survey data to estimate falls risk according to FRAT; Smith [88] used routine data to predict falls risk. Three used (ii): Lee [74] assigned age- and sex-stratified prevalence of vitamin D insufficiency; Zarca [94] a lognormal distribution of vitamin D level; and Nshimyumukiza [80] a distribution of BMD level. Screening detected (with perfect precision) vitamin D or BMD insufficiency for intervention referrals.
Two used (iii): CSP [56] assumed that the sensitivity and specificity of timed-up-and-go (TUG) test were both 87% regardless of the underlying distribution of gait/balance impairment; following screening, the 11% highest risk individuals from each five-year age group were referred to physiotherapy. The latter assumption is problematic given that older age groups likely have higher proportions of high-risk individuals (unless the test cut-off levels varied across age groups). Franklin [65] similarly incorporated fixed efficacies for TUG and quantitative TUG (QTUG) without modelling the underlying gait/balance distribution. A disadvantage of this approach is that subgroup variation in the joint distributions of diverse falls risk factors would introduce subgroup differences in the screening efficacy not explored by Franklin [65]. Seven used (iv) [34,60,68,86,87,89,90]: e.g., RCN [34] included the cost of identifying eligible high-risk individuals. Table A8 in Supplementary Materials summarises the intervention resource-use and cost from the public sector perspective (the societal intervention costs will be presented in a future publication). The resources are divided into auxiliary resources facilitating implementation (access, compliance and long-term sustainability) and resources generating therapeutic effects. Exercise and multiple-component interventions were most likely to incorporate these auxiliary resources: e.g., marketing to assist exercise uptake [55]. Falls risk screening resources were likewise auxiliary. Two models failed to cost their screening tools [56,88]. Three models included set-up costs [63,65,77]. There were noticeable between-study variations in resource incorporation for each intervention type.
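The concern raised about method (iii), fixed screening efficacy applied without an underlying risk distribution, can be illustrated with a minimal sketch. The 87% sensitivity and specificity are the values assumed in CSP [56]; the cohort size and the age-group prevalences of high falls risk are hypothetical, as are the function and variable names.

```python
def screening_outcomes(n: int, prevalence: float,
                       sensitivity: float = 0.87,
                       specificity: float = 0.87) -> dict:
    """Expected screening results for a cohort of size n.

    prevalence is the (hypothetical) proportion truly at high falls risk.
    """
    high_risk = n * prevalence
    low_risk = n - high_risk
    true_pos = sensitivity * high_risk
    false_pos = (1.0 - specificity) * low_risk
    referred = true_pos + false_pos
    return {
        "referred_share": referred / n,
        # Positive predictive value: share of referrals truly at high risk.
        "ppv": true_pos / referred if referred > 0 else None,
    }


# Hypothetical prevalences of high falls risk by age group (not from the source).
prevalence_by_age = {"65-69": 0.10, "85-89": 0.40}

for age_group, prevalence in prevalence_by_age.items():
    result = screening_outcomes(1000, prevalence)
    print(age_group, round(result["referred_share"], 3), round(result["ppv"], 3))
```

Under these assumptions the same test refers a larger share of the older group and with a higher positive predictive value, which is difficult to reconcile with referring a fixed proportion of each age group.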

Intervention resource-use and cost
Therapeutic resources included labour, training, transport, venue and overheads, and health technology and equipment. Labour was the most widely costed resource, including labour performed by nonprofessional volunteers and reimbursed by the public sector [51,62,71,77]. Models evaluating technology-based interventions such as hip protector and gait stabiliser tended to neglect the cost of contributory labour [69,70,74,80,81,83,84,89]. Training costs were concentrated in exercise interventions; only three non-exercise evaluations incorporated them [51,55,77]. Staff transport costs were concentrated in models evaluating exercise, HAM, and multifactorial intervention. Venue costs and overheads were generally included as simple supplements to per-participant labour cost: e.g., Frick [66] increased the labour cost by 50% to account for overheads; Velde [91] by 72%. All intervention types required some technology and equipment; yet not all models detailed or costed them. For example, Frick [66] costed the labour but not the equipment for HAM.
In costing the interventions, preserving the distinction between fixed and variable (i.e., per-participant) costs had a significant impact on results. For example, Eldridge [63] incorporated the fixed cost in running the falls clinic which, under a low uptake rate (6.5% of eligible population), increased the per-participant cost and reduced the cost-effectiveness. Likewise, Comans [59] included annual fixed cost of multifactorial intervention, which determined the uptake rate required to break even financially. Despite this, 36 (78.3%) models only incorporated per-participant costs, some deliberately translating fixed costs into per-participant rates [60,61,77,92]. Table 10 specifies the fall-related event used for the intervention efficacy and, in parentheses, the main fall-related event used to characterise falls risk/rate. Twelve (26.1%) models did not incorporate matching events (highlighted in bold). Thirty-six (78.3%) sourced efficacy data from internal or external RCTs and meta-analyses, while three used observational studies [69,80,89]. On using external RCT data, several models questioned whether it can be generalised to routine practice [55,60,61,71,83,85,93]; Mori [78] down-adjusted the RCT-based efficacy by 40% for generalisation. The fifth column details the efficacy and, in parentheses, incidence metrics. The metrics did not match in 12 (26.1%) models: e.g., Deverall [62] applied RaR on individual falls risk. Table 10 also compares the model horizon with the 'effectiveness period', i.e., a function of efficacy durability and implementation sustainability. Several studies contained significant disparities between the model horizon and the effectiveness period. For example, Johansson (2017) restricted the effectiveness period to one year within lifetime horizon to produce conservative outcomes. Several lifetime models incorporated long-term effectiveness for individuals who persisted in intervention uptake [62,70,79,81,94].
Models made diverse assumptions on post-implementation efficacy often without justification [57,58,68,77,85]. For example, Church [57,58] incorporated lifetime efficacy for expedited cataract surgery and cardiac pacing but one-year efficacy for other interventions; unsurprisingly, the latter were significantly less cost-effective. Some deliberately curtailed the model horizon to reduce the discrepancy with the effectiveness period [60,61,84,85].
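The interaction between fixed costs and uptake noted earlier in this subsection can be made concrete with a small sketch. The 6.5% uptake rate is the figure reported for Eldridge [63]; all monetary values, the eligible population size and the function names are hypothetical, and the break-even calculation is one simple way of reading the Comans [59] example rather than that study's actual method.

```python
def cost_per_participant(fixed_cost: float, variable_cost: float,
                         eligible: int, uptake: float) -> float:
    """Average cost per participant when a fixed cost is spread over participants."""
    participants = eligible * uptake
    if participants <= 0:
        raise ValueError("uptake must yield at least some participants")
    return fixed_cost / participants + variable_cost


def break_even_uptake(fixed_cost: float, variable_cost: float,
                      saving_per_participant: float, eligible: int) -> float:
    """Uptake rate at which total savings equal total intervention cost."""
    net_saving = saving_per_participant - variable_cost
    if net_saving <= 0:
        raise ValueError("no break-even point: variable cost exceeds saving")
    return fixed_cost / (eligible * net_saving)


# Hypothetical inputs (not from any reviewed model).
FIXED, VARIABLE, ELIGIBLE = 100_000.0, 200.0, 10_000

low = cost_per_participant(FIXED, VARIABLE, ELIGIBLE, uptake=0.065)
full = cost_per_participant(FIXED, VARIABLE, ELIGIBLE, uptake=1.0)
threshold = break_even_uptake(FIXED, VARIABLE, saving_per_participant=400.0,
                              eligible=ELIGIBLE)
print(round(low, 2), round(full, 2), round(threshold, 3))
```

Translating the fixed cost into a single per-participant rate would hide this dependence on uptake, which is why the distinction matters when implementation levels are uncertain.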

Wider health effects of interventions
Few models incorporated wider health effects of interventions beyond falls prevention. Hiligsmann [68] evaluated a scenario where vitamin D and calcium supplementation reduced the background mortality risk. Alhambra-Borras [52] incorporated the effect of falls prevention exercise on frailty reduction. Boyd [54] allowed cataract surgery to generate QALY gain through vision improvement. Models that incorporated all-cause care costs captured wider health effects without specifying the mechanism [51,86,87]. Other models mentioned their non-incorporation as a limitation [55, 56, 60-62, 71, 73, 77, 78, 80, 94]. Deverall [62], for example, stated that the non-incorporation of exercise benefit on cardiovascular disease (CVD) risk reduction potentially biased the evaluation against the ethnic Maori subgroup who have greater CVD risk.
Two models incorporated adverse health effects and process costs of interventions. Hirst [69] considered the side-effect of transdermal buprenorphine as a replacement for (more fall-risk-inducing) tramadol in chronic pain management. Honkanen [70] expressed the process cost of hip protector use through a health utility decrement of 0.010 for each year of use. Due to the decrement, younger groups aged 65 and 70 experienced overall QALY loss from hip protector use despite fractures being prevented.

Narrative synthesis: evaluation methods
As detailed in Table 1, evaluation methods are synthesised based on three specific aspects: (i) model validation methods and results; (ii) methods for assessing parameter uncertainty; and (iii) alternative scenarios evaluated. Additionally, we focus on how different evaluation methods could lead to alternative commissioning recommendations.

Model validity
Four validity types influence the credibility of model results: structural/face; internal; external; and cross [44]. Seven models involved experts and stakeholders in model development to achieve structural validity prospectively [60,71,79,80,85,90,94]. For example, PHE [85] engaged two groups of stakeholders: a Steering Group of national falls prevention experts informing the model structure, and a User Group of local commissioners advising on model usability. Hirst [69] explicitly stated the purpose of alternative scenario analyses as retrospectively validating the model structure.
Six studies assessed the external model validity [68,73,78,80,83,94]. For example, Nshimyumukiza [80] compared the predicted fracture incidence and age-specific mortality rates to those reported in published literature and found less than 5% divergence. Only four studies reported conducting verification steps or sensitivity analyses to ensure internal validity [68,73,79,94]. Cross validity assessment by comparing the model results with those of previous models was the most common form of validation; yet 13 (28.3%) did not report having conducted it [55-57, 60, 61, 63, 65, 69, 72, 77, 81, 83, 85]. Only Zarca [94] conducted all four validations; four conducted three [68,73,79,80]. Overall, model validation is not yet a common methodological and reporting practice in this field. Table A9 in Supplementary Materials summarises the parameters unilaterally varied in deterministic sensitivity analysis (DSA) to assess their impact on outcomes. It also summarises the methods used to conduct probabilistic sensitivity analysis (PSA) assessing the impact of joint parameter uncertainty. The DSA parameters are divided into falls epidemiology and falls prevention intervention parameters. A distinction was made between parameter variations to assess parameter uncertainty and those depicting alternative scenarios based on studies' descriptions of the purpose of the variations.

Assessing parameter uncertainty
Twelve (26.1%) models conducted no assessment of parameter uncertainty. Of 21 models that conducted DSA, there was a wide between-study variation in the number of parameters assessed, ranging from two to 12. Twenty-eight (60.9%) conducted PSA. The cost-effectiveness acceptability curve (CEAC), which plots the probability of each intervention being the most cost-effective option at each cost-effectiveness threshold, was the most frequently used presentation method (n = 18). Only Agartioglu [50] plotted the cost-effectiveness acceptability frontier (CEAF), which marks the threshold at which an intervention produces the highest expected value relative to alternatives across simulated runs. Only Albert [51] conducted value of information analysis, estimating that the cost-effectiveness of multifactorial intervention would improve under simulation runs that excluded uncertainty over health utility decrement parameters.
[...] [73,80,93], there was a lack of clarity on how the scenarios were chosen among the range of possible options.
[...] Tables 8 and A6) which hamper between-study comparison. All models except Nshimyumukiza [80] and Zarca [94] were Markov cohort models but mentioned no tunnel states for age-related progression in falls risk. This likely disadvantages the outcomes for younger subgroups at baseline whose stymied age-related risk progression means smaller lifetime intervention benefit derived in terms of proportional reduction in falls risk. Only three models were validated (other than cross-validation): Johansson [73] internally and externally; Nshimyumukiza [80] structurally and externally; and Zarca [94] structurally, externally and internally. The last column of Table 11 notes the main methodological caveats for each model that are relevant for commissioning. The methodological quality checklist scores in Tables 3, 4 and 5 should also be noted.
All models that conducted CUA except Eldridge [63] produced ICER for at least one intervention relative to no intervention or usual care that can be deemed cost-effective under the NICE threshold. In the order of increasing ICER values, these interventions were:

Narrative synthesis: evaluation outcomes
• Johansson [73]: Combined multifactorial and environmental intervention for age 65+
• RCN [34]: Multifactorial intervention for high-risk group aged 60+
• Nshimyumukiza [80]: General physical activity promotion among women (without population-level fracture risk screening) aged 65+
• Honkanen [70]: Hip protector use for women aged 80 or 85 at baseline and men aged 85
• Wilson [92]: HAM for state-level population with or without MA falls history aged 65+
• Deverall [62]: Home exercise and peer-led group exercise for age 65+
• Pega [82]: HAM for national population with or without MA falls history aged 65+
• Zarca [94]: Vitamin D screening followed by supplementation for age 65+
• RCN [34]: Exercise for high-risk group aged 60+
• Farag [64]: Non-specific intervention of US$587 per-participant cost and 25% reduction in risk for age 65+
• Church [58]: Tai Chi for age 65+
Given these interventions, a key decisional factor is their aggregate impacts determined by their reaches. The combined intervention in Johansson [73] arguably has the greatest reach since it sets no risk-based eligibility criteria for multifactorial intervention, and its environmental components reduce risk factors independently of older people's demand. Therefore, the decision-maker should consult stakeholders to determine the local scalability of the combined intervention.
Consideration of aggregate impacts likewise shows that HAM in Pega [82] and Wilson [92] should not be targeted at those with MA falls history unless there are significant budget or capacity constraints: the universal approach remains highly cost-effective and produces greater aggregate impact than the targeted approach. In Honkanen [70], the sharp disparity in cost-per-unit ratios across baseline age subgroups justifies the age-based targeting of hip protector use, but the lack of age-related risk progression in the Markov cohort model may have disadvantaged the younger groups. In Zarca [94], the different reaches of alternative strategies were not clearly specified, with outcomes (incremental costs and QALYs) being reported at per-participant rates only. Universal vitamin D supplementation generated less favourable per-participant outcomes than targeted supplementation but may have produced greater aggregate benefits, especially when the model allows individuals with sufficient baseline vitamin D level (75 nmol/L) to derive fracture risk reductions from further supplementation (up to 105 nmol/L). The study's conclusion that targeted strategies are preferable to universal supplementation would be misleading if only per-participant outcomes were compared. Eldridge [63] demonstrated how considerations of cost-per-unit ratio and aggregate impact are in reality closely linked. The cost-effectiveness of the multipathway intervention was poor with 40% probability of it being cost-effective vs. usual care under the threshold of US$41,900 (£30,000) per QALY (ICER point estimate was not reported). The study attributed this to low intervention uptake (6.5% among eligible population) which interacted with the substantial fixed intervention costs to worsen the cost-effectiveness. Hence, the uptake rate was the key policy variable: the model estimated that 100% screening uptake would reduce the number of fallers by 11.3% over one year compared to 2.8% under the base case.
The potential impact on the ICER was not reported but can be anticipated to be highly positive. The study recommended a health promotion campaign to increase the uptake; the decision-maker should likewise consider investments in auxiliary implementation strategies.
Table 2 for study references; parenthesised number refers to the number of models included in the table
b All monetary units are converted to US$ in year 2021 using the average consumer price index (CPI) between the original year of reported currency to 2019 (most recent year for CPI data) [47] in the country of study and purchasing power parity (PPP) rate between the original currency and US$ in year 2020 (most recent PPP data) [48]
c Intervention reach refers to the number/proportion of persons receiving the intervention. It is a function of intervention's normative reach defined by its eligibility criteria and targeting strategy and its implementation reach determined by the level of implementation (e.g., uptake and adherence) within the eligible population
d The study does not mention how falls risk progressed with age in the absence of falls incidence (which has a separate model state). Markov model should incorporate tunnel states to allow for secular risk progression, but this is not stated or graphically illustrated
e The study evaluated counterfactual scenarios where Maori/men had equal life expectancy as non-Maori/women and found that subgroup ICERs became similar (Maori/non-Maori only in Wilson (2017) [92])
Regarding intervention impact on social inequities of health, Deverall [62], Pega [82], and Wilson [92] presented subgroup results across ethnicity (Maori vs. non-Maori) and found higher ICERs and lower health gains for Maori. The HAM and exercise interventions hence worsened the health inequity between ethnic groups relative to usual care, and this finding may generalise to other settings with similar social disparities in health opportunities. The decision-maker could choose to permit this increase in health inequity or design an alternative strategy that generates an equal or greater health gain for the socially deprived group. The latter would likely introduce an equity-efficiency trade-off relative to the base case strategy which may be accepted or rejected by stakeholders based on their inequity aversion [96]. Such scenarios were not explored by the models. Yet they identified pre-existing life expectancy differentials between ethnic subgroups as the main cause of the inequitable impact: assigning non-Maori life expectancy on the Maori subgroup nearly eliminated the health gain differentials. This presents a rationale for commissioning interventions at earlier life stages to reduce the life expectancy differential.

Methodological recommendations
Methodological recommendations are made based on accounting for falls epidemiology features, falls prevention intervention features, evaluation methods, and how evaluation outcomes are used to formulate commissioning recommendations.

Falls epidemiology features
1. Clearly state the type and source of data used to characterise the baseline falls risk and discuss the strengths and limitations of choice.
2. Use appropriate methods to characterise recurrent falls, particularly for individual-transitioning models with annual cycles.
3. Maximise the range of falls risk factors modelled including those highlighted by NICE CG161 (falls history, fear of falling, home hazards, gait deficit, balance deficit, mobility impairment, visual impairment, cognitive impairment, urinary incontinence) [4] and multivariate frailty [97]. Use individual-level data where available.
4. Maximise the range of falls health consequences modelled including the long-term impact on risks of mortality and health/functional decline.
5. For CUA, distinguish between acute and long-term impacts of fall-related events on health utility and discern whether assigning utility decrement (absolute or proportional) or level is more appropriate for each impact.
6. Maximise the range of fall-related economic consequences modelled including comorbidity care costs associated with the long-term mortality and morbidity impacts of falls. Where data permit, incorporate all-cause care costs which capture the full care consequences of falls, while also reporting fall-related care costs [32].

Falls prevention intervention features
1. Clearly describe the comparator(s); refrain from using the terms 'usual care' and 'no intervention' interchangeably and describe the usual care received [32].
2. Clearly state the access pathway(s) (reactive, proactive or self-referred) for intervention(s) and describe the mechanisms facilitating access (e.g., marketing for self-referred pathway).
3. Use appropriate methods for modelling the falls risk screening process to identify subgroups within target populations or specific patient groups serving as target populations. Resource-use associated with screening should be appropriately characterised and costed.
4. Maximise the granularity of intervention resources incorporated and costed, including auxiliary implementation resources (see expert guideline for resource types [32]). Refrain from translating fixed costs into per-participant rates to capture interaction with implementation level.
5. Ensure that the efficacy metric (i.e., RR or RaR) and fall type match the falls incidence metric (falls risk or rate) and type. Refrain from making assumptions on long-term efficacy duration without adequate evidence [32].
6. Where evidence is available, maximise the range of health effects of interventions modelled beyond falls prevention. These effects include intervention benefits on mortality and comorbidity reduction and intervention side-effects.

Evaluation methods
1. Assess and report the model's structural, internal and external validities. Reduce the structural uncertainty prospectively by involving stakeholder and expert groups in model development and retrospectively by evaluating scenarios associated with key structural assumptions.
2. Clearly state whether parameter variation represents DSA or scenario analysis. PSA should be conducted to assess the joint parameter uncertainty.
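As an illustration of how PSA output feeds a CEAC for a single intervention against one comparator, the sketch below computes, at each threshold, the share of simulated draws with positive incremental net monetary benefit. The US$41,900 threshold is the one used in this review; the distributions of incremental cost and QALYs, the seed and the function name are hypothetical and do not correspond to any reviewed model.

```python
import random


def ceac(delta_costs: list[float], delta_qalys: list[float],
         thresholds: list[float]) -> dict[float, float]:
    """Probability that the intervention is cost-effective at each threshold.

    For each threshold, counts the PSA draws in which the incremental net
    monetary benefit (threshold * delta QALY - delta cost) is positive.
    """
    if len(delta_costs) != len(delta_qalys) or not delta_costs:
        raise ValueError("need equally sized, non-empty PSA samples")
    curve = {}
    for threshold in thresholds:
        favourable = sum(
            1 for d_cost, d_qaly in zip(delta_costs, delta_qalys)
            if threshold * d_qaly - d_cost > 0
        )
        curve[threshold] = favourable / len(delta_costs)
    return curve


# Hypothetical PSA draws: incremental cost (US$) and incremental QALYs
# per participant, sampled independently for simplicity.
rng = random.Random(2022)
delta_costs = [rng.gauss(500.0, 200.0) for _ in range(5000)]
delta_qalys = [rng.gauss(0.02, 0.01) for _ in range(5000)]

curve = ceac(delta_costs, delta_qalys, [0.0, 10_000.0, 20_000.0, 41_900.0])
for threshold, probability in curve.items():
    print(int(threshold), round(probability, 3))
```

In a real PSA the cost and QALY draws would come from the same model run and would usually be correlated; independent sampling is used here only to keep the example short.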

Evaluation outcomes to formulate commissioning recommendations
1. Report per-unit (e.g., ICERs) and aggregate (e.g., total incremental net monetary benefit) outcomes separately [32]. Use aggregate outcomes to compare cost-effective interventions (or combinations of interventions) of different target population sizes (normative reaches), and to evaluate implementation strategies (altering implementation reaches).
2. Evaluate the intervention impact on social inequities of health and use evaluative frameworks, such as distributional cost-effectiveness analysis (DCEA) [96], that can incorporate the strength of the decision-maker's inequity aversion when comparing alternative intervention strategies with differing impacts on total health gain and social inequities of health.
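The distinction between per-unit and aggregate outcomes in the first recommendation can be shown with a minimal sketch. The threshold of US$41,900 per QALY is the one used in this review; the two strategies, their incremental costs and QALYs, and their reaches are entirely hypothetical, as are the function names.

```python
THRESHOLD = 41_900.0  # US$ per QALY, the threshold applied in this review


def icer(delta_cost: float, delta_qaly: float) -> float:
    """Incremental cost-effectiveness ratio (cost per QALY gained).

    Only meaningful here for interventions that cost more and gain QALYs.
    """
    if delta_qaly <= 0:
        raise ValueError("ICER is not interpretable for non-positive QALY gain")
    return delta_cost / delta_qaly


def total_inmb(delta_cost: float, delta_qaly: float,
               reach: int, threshold: float = THRESHOLD) -> float:
    """Aggregate incremental net monetary benefit across all persons reached."""
    return reach * (threshold * delta_qaly - delta_cost)


# Hypothetical per-participant outcomes and reaches (not from the source).
strategies = {
    "targeted": {"delta_cost": 300.0, "delta_qaly": 0.03, "reach": 2_000},
    "universal": {"delta_cost": 400.0, "delta_qaly": 0.02, "reach": 10_000},
}

for name, s in strategies.items():
    ratio = icer(s["delta_cost"], s["delta_qaly"])
    aggregate = total_inmb(s["delta_cost"], s["delta_qaly"], s["reach"])
    print(name, round(ratio), round(aggregate))
```

With these illustrative inputs the targeted strategy has the lower ICER while the universal strategy yields the larger aggregate net benefit, which is the situation in which reporting per-unit outcomes alone could mislead.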

Commissioning recommendations
Our commissioning recommendations for the general older population over the lifetime horizon are:
1. Decision-makers should examine the transferability and feasible reach of the following seven interventions in local settings within their budget and capacity constraints: (i) combined multifactorial and environmental intervention for age 65+; (ii) general physical activity promotion for women aged 65+; (iii) hip protectors for women aged 80+ and men aged 85+; (iv) home or peer-led group exercise for age 65+; (v) HAM for persons with or without MA falls history aged 65+; (vi) targeted vitamin D supplementation for age 65+; and (vii) Tai Chi for age 65+.
2. Where significant fixed cost investments are required, auxiliary implementation strategies should be planned to achieve adequate cost-effectiveness and aggregate impact.
3. There is some evidence that exercise and HAM exacerbate existing health inequity across social subgroups. The decision-maker should consider supplementary strategies that prioritise intervention access for the local socially marginalised groups and/or increase their upstream health opportunities. The potential equity-efficiency trade-off should be quantified.
4. Results for interventions (i), (ii) and (vi) are the most credible since they are produced by validated models; (ii) and (vi) are also from individual-level simulations that incorporated age-related progression in fracture risk. The decision-maker could also commission a de novo, validated model that addresses the methodological challenges and is suited to the local context.

Discussion
This systematic review identified 46 decision models of community-based falls prevention interventions, applied a checklist specifically designed for falls prevention economic evaluations, and synthesised the modelling methods for key features of falls epidemiology, falls prevention intervention and evaluation. It also formulated (i) 16 methodological recommendations for future model development and (ii) four commissioning recommendations around seven interventions found to be cost-effective in general population, lifetime models.
A key issue in the use of reviewed model outcomes for commissioning is the generalisability or transferability of those outcomes to the local decision-making context. The commissioning recommendations in this review adopted the cost-effectiveness threshold recommended by NICE for England and Wales [49]; decision-makers in other national settings should follow the recommendations of their respective HTA guiding bodies, such as the Pharmaceutical Benefits Advisory Committee (PBAC) in Australia [98]. As noted in the first commissioning recommendation, commissioners should actively verify the transferability and the local feasible reach of the interventions being considered. This should involve active collaboration with local stakeholders in interpreting the model evidence and even adapting an existing model to maximise its local relevance [31]. Frameworks such as the Context and Implementation of Complex Interventions (CICI) can systematically examine the influence of local context (e.g., regulation/policy on age-based targeting [99]) and supply conditions (e.g., capacity constraints) on HTA outcomes and thereby assist the assessment of transferability [100,101]. The subgroup delineator of equity relevance would also be locally specific, thus affecting the generalisability of the health equity impacts of previously modelled interventions. Other methodological features also influence generalisability, such as the evidence sources for baseline falls risk and intervention efficacy. This strengthens the rationale for the review to conduct a thorough methodological appraisal of models before formulating commissioning recommendations.
For decision-makers in England and Wales, the commissioning recommendations can be compared to those made by the existing falls prevention clinical guideline, CG161 [4]. The guideline prioritises the proactive pathway involving falls risk screening followed by multifactorial intervention. This is supported by RCN [34] findings (which informed CG161) that multifactorial intervention for high-risk individuals dominates no intervention. By contrast, Eldridge [63] found proactive multifactorial intervention to generate an unfavourable cost-effectiveness profile, likely due to the low pathway uptake that increased the per-participant cost. Hence, the simplistic costing assumptions in RCN [34] may have overestimated the cost-effectiveness of proactive multifactorial intervention. Meanwhile, the consultation of local services in Eldridge [63] to understand uptake and costs makes their result more credible. Yet multifactorial intervention for all risk groups combined with environmental modification (costed using primary data) yielded positive results in Johansson [73]. A similar intersectoral intervention was found to be cost-effective over a five-year horizon in Beard [53]. Neither model explores to what extent the positive results can be attributed to the multifactorial rather than the environmental component. This warrants further modelling work that incorporates both components and yet isolates their respective impacts. Until then, the decision-maker should commission the CG161-recommended multifactorial intervention but supplement this with environmental modifications. Specifically, local stakeholders should be consulted to verify whether the intersectoral initiatives in Johansson [73] and Beard [53] can be replicated in their local context.
In addition, auxiliary implementation strategies should be planned with stakeholders to avoid the unfavourable outcomes seen in Eldridge [63]; this would involve understanding the facilitators and barriers to implementation from older persons' and professionals' perspectives [102][103][104].
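The link between pathway uptake and per-participant cost noted above can be illustrated with a minimal sketch. It assumes a programme with a fixed set-up cost shared across participants plus a constant variable cost per participant. It also ignores any cost offsets from avoided falls. All figures, function names and the threshold value are hypothetical and are not drawn from Eldridge [63] or any other reviewed model.

```python
def cost_per_participant(fixed_cost, variable_cost, eligible, uptake):
    """Average cost per participant when a fixed programme cost is
    shared across those who actually take up the pathway."""
    participants = eligible * uptake
    if participants <= 0:
        raise ValueError("uptake must yield at least one participant")
    return fixed_cost / participants + variable_cost


def icer(incremental_cost, incremental_qaly):
    """Incremental cost-effectiveness ratio (cost per QALY gained)."""
    return incremental_cost / incremental_qaly


# Hypothetical inputs, for illustration only.
FIXED_COST = 200_000.0   # set-up cost of the screening pathway
VARIABLE_COST = 200.0    # delivery cost per participant
ELIGIBLE = 5_000         # eligible local population
QALY_GAIN = 0.02         # assumed QALY gain per participant
THRESHOLD = 30_000.0     # cost-effectiveness threshold per QALY

for uptake in (0.8, 0.4, 0.05):
    cost = cost_per_participant(FIXED_COST, VARIABLE_COST, ELIGIBLE, uptake)
    ratio = icer(cost, QALY_GAIN)
    verdict = "below" if ratio < THRESHOLD else "at or above"
    print(f"uptake={uptake:.2f}  cost/participant={cost:,.0f}  "
          f"ICER={ratio:,.0f} ({verdict} threshold)")
```

Under these assumptions the same intervention moves from a favourable to an unfavourable ratio purely because the fixed cost is spread over fewer participants, which is one reason auxiliary implementation strategies that raise uptake may matter for cost-effectiveness.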
Interestingly, there were several positive cost-effectiveness outcomes for interventions that had not been recommended by CG161. First, CG161 does not recommend unsupervised brisk walking for women and untargeted group exercise, although the 2019 surveillance for CG161 update [105] recommends physical activity promotion in line with the 2019 UK Chief Medical Officers' physical activity guidelines [25]. By contrast, Nshimyumukiza [80] found general physical activity (including daily walking) for inactive older women to dominate no intervention. Second, CG161 does not recommend vitamin D supplementation even for those with vitamin D insufficiency or deficiency due to insufficient clinical evidence; by contrast, Zarca [94] found targeted vitamin D supplementation to be highly cost-effective relative to no intervention. Likewise, hip protectors and CBT were not recommended by CG161 but found to be cost-effective in Honkanen [70] and Tannenbaum [89], respectively. These divergences reflect a difference in the underlying approach to statistics and probability. For example, CG161 is primarily informed by RCT evidence that takes the frequentist approach of drawing random samples to test the likelihoods of alternative hypotheses representing the true (fixed) state of the world, while decision models take the Bayesian approach of estimating the expected state of the world based on prior beliefs and diverse types of data (p. 323) [26]. The latter arguably better reflects the type of uncertainty faced by decision-makers and should be prioritised in commissioning considerations over clinical evidence alone, provided that the models are methodologically robust, validated and assessed for the impact of parameter uncertainty on expected outcomes [106].
This lends additional importance to a thorough methodological appraisal of models, conducted in this review using two complementary approaches: checklist application and narrative synthesis. The falls-specific rather than generic checklist helped identify features unique to falls and falls prevention [32], including whether the study gave the definition of a fall (only 19, or 41.3%, fully did) and whether the intervention(s) was classified as single, multiple-component or multifactorial (only 15, or 32.6%, did). However, the checklist, designed for both models and non-modelling evaluations, did not consider important modelling features such as baseline risk characterisation and model validation, which are included in the HTA model quality checklist [45]. Moreover, modelling features typically involve methodological nuances that cannot be summarised in ordinal scores. It is also unclear whether the unweighted sum of item scores accurately captures the methodological quality of models given the study-specific combination of methodological caveats, although of the 12 general population, lifetime models used to inform commissioning, the three that had been most thoroughly validated also had the highest checklist scores [73,80,94]. This illustrates the importance of supplementing the checklist application with narrative synthesis. The latter was more comprehensive in this systematic review than in previous systematic reviews in this topic area [33].
The checklist application nevertheless identified the most prevalent reporting and methodological limitations across models. The most prevalent issue was the non-incorporation of all-cause care costs as the main analysis cost outcome, with fall-related costs being reported in sensitivity analysis [32]. Older persons typically occupy a position on a continuous spectrum of frailty rather than one of binary healthy vs. diseased states [107][108][109]. A disease or fall incidence would shift the position on the spectrum and thereby incur myriad care costs only indirectly associated with the initial event [10,110]; incorporating all-cause care costs helps capture these impacts as well as the wider benefits of interventions beyond falls prevention. It is also consistent with the aim of wider geriatric health policies such as person-centred integrated care that emphasise holistic outcomes [41,111,112]. Yet only four models incorporated all-cause costs [51,52,86,87]; one of them even perceived all-cause costing as a limitation, compelled by lack of condition delineators in the routine data used [51]. The four also did not separately report fall-related costs, introducing difficulties in determining whether the cost reduction can be attributed to falls prevention per se rather than to wider intervention benefits. A major barrier is the lack of data on all-cause care consequences of falls, with costing studies focusing on fall-specific costs [17,18]. Indeed, all four models that incorporated all-cause costs relied on primary data.
An alternative, more feasible approach is to incorporate comorbidity care costs (e.g., costs of added life-years and costs of dying) associated with background health status and life expectancy. The proximity of these costs to the intervention effect makes their inclusion particularly important for geriatric populations [39]. The inclusion of health utilities to depict the transition in background health status also demands the inclusion of matching costs [26]. The higher cost of dying for younger age at death [113,114], as incorporated in the four BODE3 models, would improve the cost-effectiveness of interventions preventing premature mortality (e.g., of those below the average life expectancy). Yet comorbidity care costs were included by only six models (see Table A6); data availability may again be the barrier. The six models also included all-cause background costs and did not subtract fall-related costs, meaning that the latter are double-counted. Moreover, all except Honkanen [70] stratified the background costs by age and sex alone, meaning that the costs are influenced only by falls affecting mortality and not by those affecting morbidity and functional status. In all, further research is warranted to incorporate comorbidity costs in falls prevention modelling. A potential approach is to estimate the association between falls and a multivariate frailty index, which in turn would determine all-cause care costs and subsequent falls risk [97,107,115].
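The double-counting issue described above can be made concrete with a minimal one-year sketch. It assumes that the all-cause background cost already embeds the average fall-related spend implied by the baseline fall probability and a single unit cost per fall. The function names and all numerical values are hypothetical and do not reproduce the method of any reviewed model.

```python
def annual_cost_double_counted(background_all_cause, fall_prob, cost_per_fall):
    """Adds explicit fall costs on top of an all-cause background cost
    that already contains fall-related spend (the double-count)."""
    return background_all_cause + fall_prob * cost_per_fall


def annual_cost_adjusted(background_all_cause, baseline_fall_prob,
                         modelled_fall_prob, cost_per_fall):
    """Strips the fall-related share implied by baseline risk out of the
    background cost before adding the modelled fall costs back in."""
    background_non_fall = background_all_cause - baseline_fall_prob * cost_per_fall
    return background_non_fall + modelled_fall_prob * cost_per_fall


# Hypothetical inputs, for illustration only.
BACKGROUND = 4_000.0      # all-cause care cost per person-year
BASELINE_PROB = 0.30      # annual fall probability without intervention
INTERVENTION_PROB = 0.24  # annual fall probability with intervention
COST_PER_FALL = 2_000.0

for label, prob in (("no intervention", BASELINE_PROB),
                    ("intervention", INTERVENTION_PROB)):
    naive = annual_cost_double_counted(BACKGROUND, prob, COST_PER_FALL)
    adjusted = annual_cost_adjusted(BACKGROUND, BASELINE_PROB, prob, COST_PER_FALL)
    print(f"{label}: double-counted={naive:,.0f}  adjusted={adjusted:,.0f}")
```

In this single-year sketch the cost difference between arms is the same under both approaches, because the double-count is a constant added to each arm. The overstatement would matter where survival differs between arms, since each added life-year would then carry the inflated background cost.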
The next two prevalent issues were reporting aggregate outcomes and detailing the comparator scenario. The importance of comparing aggregate outcomes was discussed under section 4.4. In brief, models should assist decision-makers in estimating the aggregate, population-level impact of interventions as recommended by the NICE HTA guidelines (see points 5.12.3 to 5.12.7) [49]. This is also highlighted in the NICE guideline for local public health service commissioning, which shows how intervention rankings change by the metric chosen, including incremental cost per QALY, intervention reach (i.e., aggregate impact) and inequality impact (pp. 118–122) [116]. Regarding the comparator scenario, this should closely resemble current practice in the local setting [32,117]. To this end, it should be noted that current practice in most settings is perhaps not the total absence of falls prevention but rather the under-implementation of existing clinical guidelines [102]. Likewise, the most relevant intervention scenario is perhaps not the provision of new interventions but rather the upscaling of existing capacity and improved fidelity to recommended practice. With assistance from local stakeholders, future models should pay greater attention to the features of current practice in the decision-making setting and be more specific in the causal mechanisms being altered under intervention scenarios. As done in Eldridge [63], consultation of local services can assess current referral pathways and demand levels, and detail component-specific strategies (e.g., health promotion campaign to increase screening uptake). This greater attention would also facilitate the assessment of the transferability of specific model outcomes to different decision-making settings.
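The point made above that intervention rankings change with the metric chosen can be sketched as follows. The example compares a ranking by incremental cost per QALY with a ranking by aggregate net monetary benefit, which scales the per-participant result by local reach. The three programmes, their inputs and the threshold are hypothetical, each is assumed to be compared against the same comparator, and inequality impact is left out for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Intervention:
    name: str
    inc_cost: float  # incremental cost per participant vs comparator
    inc_qaly: float  # incremental QALYs per participant vs comparator
    reach: int       # number of people feasibly reached locally

    @property
    def icer(self) -> float:
        """Incremental cost per QALY gained."""
        return self.inc_cost / self.inc_qaly

    def aggregate_nmb(self, threshold: float) -> float:
        """Population-level net monetary benefit at a given threshold."""
        return self.reach * (threshold * self.inc_qaly - self.inc_cost)


def rank_by_icer(items):
    """Lowest cost per QALY first."""
    return [i.name for i in sorted(items, key=lambda i: i.icer)]


def rank_by_aggregate_nmb(items, threshold):
    """Largest population-level net benefit first."""
    ordered = sorted(items, key=lambda i: i.aggregate_nmb(threshold), reverse=True)
    return [i.name for i in ordered]


# Hypothetical programmes, for illustration only.
THRESHOLD = 30_000.0
PROGRAMMES = [
    Intervention("Programme A", inc_cost=100.0, inc_qaly=0.01, reach=500),
    Intervention("Programme B", inc_cost=400.0, inc_qaly=0.02, reach=5_000),
    Intervention("Programme C", inc_cost=250.0, inc_qaly=0.01, reach=3_000),
]

print("By ICER:         ", rank_by_icer(PROGRAMMES))
print("By aggregate NMB:", rank_by_aggregate_nmb(PROGRAMMES, THRESHOLD))
```

With these inputs the programme with the most favourable ratio has the smallest aggregate benefit because of its limited reach, which is why reporting aggregate outcomes alongside ratios may change commissioning priorities.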

Future research: systematic reviews and decision modelling
The results of this review offer research directions for both future systematic reviews and models. First, the methodological challenges associated with falls prevention modelling are generalisable to other geriatric syndromes including delirium, frailty and urinary incontinence [3] and other geriatric public health interventions [36,39,109]. Commissioners and modellers interested in these areas would benefit from a systematic review using similar review methodology, namely detailed methodological appraisal around epidemiology, intervention and evaluation methods, and consideration of evaluation outcomes beyond cost-per-unit ratios. Second, there is an acute scarcity of models set in the developing country context [40]. This context is likely characterised by more pronounced capacity constraints, which in turn require modelling techniques that can account for them, including discrete event simulation [95] and constrained optimisation [118].

Strengths and limitations
This systematic review is a comprehensive review of community-based falls prevention models. It includes 26 models unidentified by previous systematic reviews in this area [33]. It also provides a more detailed methodological appraisal than previous systematic reviews using both checklist application and narrative synthesis. The appraisal results arranged by topic areas should facilitate the conceptualisation and cross-validation of future models. Another strength is the consideration of a broad range of outcomes for commissioning recommendations: previous reviews focused primarily on cost-per-unit ratios [33].
This review nevertheless has several limitations. First, appraisal of previous models was limited to what the studies reported. This presented difficulties in certain areas (e.g., whether Markov models incorporated tunnel states); contacting the study authors for enquiry would have reduced ambiguity. Secondly, in several ROI analyses (e.g., [56,59,72]), it was unclear whether the analyses constituted full comparative economic evaluations or non-comparative service evaluations (i.e., a partial economic evaluation) [119]. Clearer description of the evaluation aim and detailed parameterisation of the comparator scenario would facilitate future distinctions. Thirdly, unlike previous reviews, this review excluded non-modelling evaluations which still offer useful information for commissioning despite their short time horizons [40]; however, their incommensurable methodological features relative to models would have overextended the boundary of appraisal. Fourthly, the review did not test for possible publication bias; there is hence a risk that favourable cost-effectiveness results are overrepresented. Finally, the commissioning recommendations were based solely on general population, lifetime models, even though non-general population and/or non-lifetime models still offer useful information to decision-makers. Models evaluating alternative time horizons found that longer horizons improved cost-effectiveness [60,86,87,89,92], meaning that current commissioning recommendations are over-prescriptive for decision-makers with shorter horizons. Such decision-makers are invited to utilise the outcomes gathered in Table A11 in Supplementary Materials alongside the synthesised methodological features to reach appropriate commissioning decisions.

Conclusions
There is model-based evidence that combined multifactorial and environmental intervention, general physical activity promotion for women, and targeted vitamin D supplementation are cost-effective relative to no intervention. Narrative synthesis found significant heterogeneity in modelling methods across falls epidemiology, falls prevention intervention and evaluation. This systematic review provides comprehensive catalogues of modelling methods and evaluation results for community-based falls prevention which should inform model selection and development, and commissioning strategies.