Impact of results-based financing on effective obstetric care coverage: evidence from a quasi-experimental study in Malawi

Background: Results-based financing (RBF) describes health system approaches that address both service quality and use. Effective coverage is a metric for measuring progress towards universal health coverage (UHC). Although RBF is considered a means towards achieving UHC in settings with weak health financing modalities, its impact on effective coverage has not been explicitly studied.

Methods: Malawi introduced the Results-Based Financing For Maternal and Neonatal Health (RBF4MNH) Initiative in 2013 to improve the quality of maternal and newborn health services at emergency obstetric care facilities. Using a quasi-experimental design, we examined the impact of the RBF4MNH on both crude and effective coverage of pregnant women across four districts during the two years following implementation.

Results: There was no effect on crude coverage. With a larger proportion of women in intervention areas receiving more effective care over time, the overall net increase in effective coverage was 7.1%-points (p = 0.07). The strongest impact on effective coverage (31.0%-point increase, p = 0.02) occurred only at the lower cut-off level (60% of maximum score) of obstetric care effectiveness. Design-specific and wider health system factors likely limited the program's potential to produce stronger effects.

Conclusion: The RBF4MNH improved effective coverage of pregnant women and appears to be a promising reform approach towards reaching UHC. Given the short study period, the full potential of the current RBF scheme has likely not yet been reached.

Electronic supplementary material: The online version of this article (10.1186/s12913-018-3589-5) contains supplementary material, which is available to authorized users.


Generation of composite score
The steps involved in deriving the composite score used to estimate effective coverage were adapted from the relevant literature [1][2][3] and are outlined here.

Step 1: Indicator selection
By consulting the existing literature, we ensured that the input and process indicators included in the composite covered all relevant aspects of the provision of obstetric care. We included four peer-reviewed articles published between 2013 and 2015 on the evaluation of quality of obstetric care in low-income settings [4][5][6][7]. We mapped the indicators from each of these articles and grouped them into twelve quality of care categories (see Table 1 below). We matched each indicator with available data from the case observations and facility inventories. Only a few of the initially mapped indicators were not obtainable in the available datasets and were thus excluded. The complete indicator map is shown in more detail [see Additional file 2].

Step 2: Indicator definition
For the majority of indicators identified, the measurement definition was explicit and fully aligned with the variable definitions in our datasets. To match the definition of some of the process indicators, we had to combine several process variables; this is indicated in the legend to Table 1 below. However, many of the identified input indicators relating to the availability of supply or drug items did not specify the minimum number of units a facility should provide or have in stock. We therefore created two definitions for each of these indicators: one in which a single unit of the respective item sufficed (relaxed definition), and one with an arbitrarily set minimum of ten units (strict definition). All input items for which we generated a relaxed and a strict definition are highlighted in italics in Table 1 below. The effect of each definition type was compared in the sensitivity analysis (Step 6).
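The two availability definitions can be sketched as a simple binarization rule. This is an illustrative sketch, not the authors' code; the thresholds (at least one unit for the relaxed definition, at least ten for the strict definition) are taken from the text, while the function name and example values are invented.

```python
# Hypothetical sketch of the relaxed vs. strict input-indicator
# definitions described in Step 2.
def binarize(units_in_stock, definition="strict"):
    """Return 1 if the facility meets the availability criterion, else 0."""
    threshold = 1 if definition == "relaxed" else 10
    return 1 if units_in_stock >= threshold else 0

# Example: a facility stocking 4 units of an item passes the relaxed
# definition but fails the strict one.
print(binarize(4, "relaxed"))  # 1
print(binarize(4, "strict"))   # 0
```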

Step 3: Indicator measurements
All indicators were measured as binary variables. In aligning process (i.e. information from multiple case observations per facility) and input indicators (i.e. information from facility inventory), we first treated each case as an individual observation and matched it with the respective inventory information. Once the weighting, rescaling, and aggregation steps were completed (see below), individual case observations were then averaged across each facility and time point to attain a single facility-specific score.

Step 4: Weighting
To account for the relative contribution of each identified process or input indicator to the overall effectiveness of obstetric care, we introduced different weights. In defining weights, we considered each of the four published articles identified in Step 1 as an individual expert opinion. Depending on how many of the four "experts" suggested the same indicator, we assigned weights ranging from 1 to 4. See Table 1 below for the resulting indicator weights.
Additionally, we retained an equal-weights approach, which served as an alternative in the sensitivity analysis (Step 6).
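The weighting rule can be sketched as follows. This is a hypothetical illustration of the rule described above (weight = number of source articles suggesting an indicator); the indicator names and article labels are invented for illustration.

```python
# Sketch of the literature-derived weighting from Step 4: an indicator's
# weight equals the number of "expert" articles (1-4) that suggested it.
def literature_weights(indicator_mentions):
    """Map each indicator to the count of articles suggesting it."""
    return {ind: len(articles) for ind, articles in indicator_mentions.items()}

# Invented example: which of the four articles (A-D) mention each indicator.
mentions = {
    "partograph_used": {"A", "B", "C", "D"},  # suggested by all four articles
    "oxytocin_available": {"A", "C"},         # suggested by two articles
}
weights = literature_weights(mentions)        # {"partograph_used": 4, ...}

# Equal-weights alternative used in the sensitivity analysis:
equal_weights = {ind: 1 for ind in mentions}
```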

Step 5: Aggregation and rescaling
We first aggregated by summing the weighted indicators for each of the twelve quality of care categories separately, resulting in twelve category scores for each observed case. As categories varied in the total number of indicators and to ensure each category contributed equally to the final composite score, we rescaled each category score to take a value between 0 and 1. To derive the final composite score, we further aggregated the category scores by adding them together and dividing them by the total number of categories.
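The aggregation and rescaling steps can be sketched for a single observed case. This is a minimal sketch under the assumptions stated in Steps 3-5 (binary indicator values, literature-derived weights); category names and values are invented for illustration.

```python
# Sketch of Step 5: weighted category scores rescaled to 0-1, then
# averaged so each category contributes equally to the composite.
def composite_score(categories):
    """categories: {category: [(value, weight), ...]} for one observed case,
    where value is a binary indicator (0/1) and weight its Step 4 weight."""
    category_scores = []
    for indicators in categories.values():
        achieved = sum(value * weight for value, weight in indicators)
        maximum = sum(weight for _, weight in indicators)
        category_scores.append(achieved / maximum)  # rescale to 0-1
    # Final composite: mean of category scores
    return sum(category_scores) / len(category_scores)

# Invented example with two categories:
case = {
    "monitoring": [(1, 4), (0, 2)],  # category score 4/6
    "drugs":      [(1, 3)],          # category score 3/3
}
print(round(composite_score(case), 3))  # 0.833
```

In the full procedure, these case-level scores would then be averaged across all observed cases per facility and time point (Step 3) to obtain the facility-specific score.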

Step 6: Sensitivity analysis
Each of the decisions taken in Steps 1-5 may have biased our composite to some degree relative to the alternatives forgone. Our composite was designed explicitly to serve the computation of our effect measure (effective coverage) within the scope of this study; it should not be considered a universal measure of obstetric care effectiveness applicable beyond this limited scope. We therefore limited our uncertainty analysis to comparing our "base" score (strict definition, literature-derived weights) with three alternative scores (replacing strict with relaxed definition, replacing literature-derived with equal weights, and replacing both). We assessed the stability of the base score against the three alternative scores at baseline (i.e. prior to expected changes) using both Pearson and Spearman rank correlation, defining strong correlation as coefficients greater than 0.70. All comparisons resulted in coefficients above 0.95, and we concluded that using the base score over the alternatives is unlikely to bias the resulting effective coverage measures.
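The stability check in Step 6 can be sketched as follows. This is an illustrative sketch, not the study's code: the facility score values are invented, and Spearman correlation is computed as Pearson correlation on ranks (the tie-free case, for simplicity).

```python
import math

# Sketch of the Step 6 comparison of the base score with an
# alternative scoring scenario at baseline, using the 0.70 threshold
# for "strong" correlation stated in the text.
def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den

def ranks(xs):
    """Simple rank transform (no tie handling, for illustration)."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    for rank, i in enumerate(order, start=1):
        r[i] = float(rank)
    return r

# Invented baseline facility scores under two scoring scenarios:
base        = [0.62, 0.48, 0.71, 0.55, 0.66]
alternative = [0.64, 0.50, 0.69, 0.57, 0.68]

r_pearson  = pearson(base, alternative)
r_spearman = pearson(ranks(base), ranks(alternative))  # Spearman = Pearson on ranks
strong = r_pearson > 0.70 and r_spearman > 0.70
```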
Measured obstetric care quality based on composite score

Table 2 below summarizes the measured obstetric care quality scores for the facilities in our sample.
Overall, averaged total scores (range 0-1) were similar at baseline for both intervention arms.