Data
Health facility data
A national census of health facilities carried out in 1998 by the Malawi Ministry of Health and the Japanese International Cooperation Agency (MOH & JICA 1999) listed 719 facilities in operation. For most facilities (589 of 719, 82%), the data included the date of construction. For all facilities, the data included the GPS coordinates, facility type, principal funder, and owner.
The dates of construction ranged from 1889 (when the first missionary hospital was built) to 1998. We restricted our analysis to the period from 1980 to 1998 to reduce recall bias in the mortality data (described below). Of the 589 facilities for which we have date of construction, 337 (57%) were built between 1980 and 1998. There was no apparent temporal trend in the number of facilities built per year; the number ranged from a minimum of 10 in 1987 to a maximum of 32 in 1998 (Fig. 1).
Of the 337 new facilities, 92% were one of two types: 135 (40%) were classified as “dispensary” and 175 (52%) were classified as “dispensary/maternity.” We restricted our analysis to these facilities. Dispensaries are permanent structures from which drugs are distributed. They provide outpatient care and may contain holding beds. Dispensary/maternities are similar but provide more extensive services to expectant mothers (antenatal, delivery, and postnatal care). The other facility types are district hospital, hospital, mental hospital, primary health center and urban health center. These are almost exclusively located in urban areas.
We assumed that the facilities missing a date of construction (130 of 719) were built prior to 1980. If some of these facilities were in fact built between 1980 and 1998, then some births will be erroneously coded as being closer to a health facility than they actually were. This will bias the effect estimate towards the null if the true effect of being closer to a health facility is to reduce mortality or increase utilization.
Mortality and utilization data
The data on under-5 mortality and health care utilization came from the 2000 Malawi Demographic and Health Survey (MDHS), a nationally representative survey targeting all resident women aged 15–49 [20]. Variables collected include the date of birth and date of death (if applicable) for all children ever born to respondents (n = 40,221 children). The data also included GPS coordinates for centroids of MDHS enumeration areas, which we refer to as villages in the remainder of the paper (Fig. 2). Enumeration areas were based on the 1998 census, which identified 9213 enumeration areas in total. Rural enumeration areas have populations of between 800 and 1200 persons.
For births in the 5 years prior to the survey, the MDHS also contains information on the following utilization outcomes: (1) place of delivery, (2) receiving a check-up following delivery, (3) number of antenatal visits prior to delivery, and (4) receiving skilled assistance during delivery.
Migration has the potential to cause measurement error in the treatment variable since mothers’ residences in 2000 may not be in the same location as their residences in previous years. We restricted our analysis to rural births that occurred at the same location where the mother was living at the time of the survey, as reported in an MDHS question about length of time at one’s current residence.
Operationalization of treatments and outcomes
The primary outcome of interest was the hazard of death between birth and age five, estimated from retrospective birth histories in the MDHS that included the date of birth and date of death for each child. Children still alive and under age five at the time of the survey were right-censored (i.e. they have missing survival data between the age at which the survey occurred and age five). Additional file 1 includes tests showing that recall bias is not a concern in these data.
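The right-censoring rule described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the use of whole months as the time unit, and the treatment of deaths after the fifth birthday as censored at 60 months are all assumptions made for the example.

```python
from datetime import date

MAX_AGE_MONTHS = 60  # follow-up ends at the fifth birthday


def months_between(start, end):
    """Whole calendar months from start to end (day of month ignored)."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def survival_record(birth, death, interview):
    """Return (duration_in_months, died) for one child.

    died is True only when a death is observed before age five.
    Otherwise the child is right-censored at the interview date or at
    60 months, whichever comes first (an assumption of this sketch).
    """
    if death is not None:
        age_at_death = months_between(birth, death)
        if age_at_death < MAX_AGE_MONTHS:
            return age_at_death, True
    observed = months_between(birth, interview)
    return min(observed, MAX_AGE_MONTHS), False


# A child born in June 1998 and alive at an August 2000 interview
# contributes 26 months of follow-up and no event.
print(survival_record(date(1998, 6, 1), None, date(2000, 8, 1)))
```

Each child then enters the survival model as a duration plus an event indicator, which is the input format a Cox model expects.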
For the causal analysis, the treatment of interest is the reduction in distance to the nearest health facility caused by the construction of a new facility, conditional on distance to the nearest health facility prior to the construction of a new facility. In addition to models in which the linear reduction in distance is the treatment variable, in other models we use multiple treatment variables to reflect the intuition that the benefit of a new facility depends on both the distance from the village to the old facility and the distance from the village to the new facility.
For those models, we created four distance categories (< 2 km, 2-5 km, 5-10 km, and > 10 km to nearest facility), which correspond to six possible treatments, each representing a move from one distance category to a nearer distance category: (1) > 10 km to 5-10 km, (2) > 10 km to 2-5 km, (3) > 10 km to < 2 km, (4) 5-10 km to 2-5 km, (5) 5-10 km to < 2 km, and (6) 2-5 km to < 2 km. The reference category is no change in distance category.
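The mapping from a pair of distances to one of the six treatments can be sketched as below. The function names are hypothetical, and the handling of the exact boundary values (2 km and 5 km fall in the farther category, 10 km falls in 5-10 km) is an assumption, since the text does not say which side of each cut-point is inclusive.

```python
# Distance categories ordered from nearest to farthest.
CATEGORIES = ["< 2 km", "2-5 km", "5-10 km", "> 10 km"]


def distance_category(km):
    """Assign a distance in km to one of the four categories.

    Boundary handling is an assumption of this sketch.
    """
    if km < 2:
        return "< 2 km"
    if km < 5:
        return "2-5 km"
    if km <= 10:
        return "5-10 km"
    return "> 10 km"


def treatment(old_km, new_km):
    """Return the distance change category, or 'no change'.

    A treatment is recorded only when the new distance falls in a
    nearer category than the old one.
    """
    old_cat = distance_category(old_km)
    new_cat = distance_category(new_km)
    if CATEGORIES.index(new_cat) < CATEGORIES.index(old_cat):
        return f"{old_cat} to {new_cat}"
    return "no change"


print(treatment(12.0, 7.5))  # a move from > 10 km to 5-10 km
print(treatment(4.0, 3.0))   # same category, so no change
```

Note that under this coding a village whose distance falls from 4 km to 3 km stays in the reference category, because it does not cross a category boundary.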
For each village we calculated the distance to the nearest health facility in each year from 1979 to 1998. When a new facility was built, resulting in a change in distance to nearest facility, we assigned that village to one of the above distance change categories for all remaining years. We linked the change in distance category (including no change) for each village-year to each child-year in the mortality dataset. In villages where no new facility was built, all person-time was assigned to the ‘no change’ category. In villages where a new facility was built, all person-time before the facility was built was assigned to the ‘no change’ category, and all person-time after the facility was built was assigned to the appropriate distance change category (e.g. > 10 km to 5-10 km). If a facility was built during a child’s life from age 0 to 5, the portion before the facility was built was assigned to the ‘no change’ category, and the portion after the facility was built was assigned to the appropriate distance change category. The dates of construction did not include the month of construction. Therefore, to avoid over-estimating exposure to new facilities, construction was assumed to have occurred on December 31 of the recorded year.
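A minimal sketch of the village-year distance calculation follows. It assumes great-circle (haversine) distance, which the text does not specify, and it measures distance at the end of each calendar year, so that a facility recorded as built in a given year counts from December 31 of that year. Facilities with a missing construction date are treated as always present, following the assumption stated earlier. All names and coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_by_year(village, facilities, years=range(1979, 1999)):
    """Distance from a village to its nearest facility at the end of each year.

    village: (lat, lon)
    facilities: list of (lat, lon, year_built), where year_built is None
    when the construction date is missing (treated as built before 1980).
    """
    result = {}
    for year in years:
        distances = [
            haversine_km(village[0], village[1], lat, lon)
            for lat, lon, built in facilities
            if built is None or built <= year
        ]
        result[year] = min(distances) if distances else math.inf
    return result


# Hypothetical village with one old facility and one built in 1990.
village = (-14.0, 34.0)
facilities = [(-14.0, 34.2, None), (-14.0, 34.01, 1990)]
by_year = nearest_by_year(village, facilities)
print(round(by_year[1989], 1), round(by_year[1990], 1))
```

A drop in this series between consecutive years marks the point at which a village's person-time moves out of the ‘no change’ category.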
For the secondary outcomes, we used the linear reduction in distance as our treatment variable. We do not use the multiple category distance reduction variable due to much smaller sample sizes.
Identification strategy and statistical analysis
An ideal study of the effects of new health facilities on under-5 mortality would randomly assign villages to receive a new health facility. By comparing the mortality rates before and after health facility construction in villages that did receive a facility to those that did not, the average treatment effect of a new facility could be easily calculated. In the current study, the location for new facilities may be endogenous to the under-5 mortality rate. If one were to simply carry out a cross-sectional comparison of areas with new health facilities to those without, it is unclear which direction the bias would take. For example, if facilities tend to be located in areas with higher disease burden, they may be positively associated with mortality, even if they have a beneficial effect. Conversely, if they tend to be built in wealthier areas, they may appear to be negatively associated with mortality, even if they have no effect.
We estimated the association between distance to nearest facility and mortality or utilization as a first step to investigating causality. Mortality was measured as survival time at the child-level, and many observations were right-censored (i.e. children were under five and still alive at the time of the survey). Therefore, we fit semi-parametric Cox proportional hazards models [21, 22]. Models 1–4 used linear distance to test for a non-linear relationship between distance and mortality. These include models with and without dummies for year (n = 18; to capture temporal trends in mortality unrelated to new facilities) and month (n = 11; to capture seasonality in mortality). Model 4 adds controls for child and mother characteristics. As a sensitivity analysis, we run the same models with the logarithm of distance rather than linear distance (see Additional file 1). For our secondary outcomes, which are binary and thus not right-censored, we used linear probability models.
To estimate the causal effect of changes in distance on mortality, we used stratified Cox models, with each stratum corresponding to one village. This controls for time-invariant characteristics of each village, and uses within-village variation in distance and mortality to estimate effects. In some models, we again added year dummies to capture temporal trends; this is sometimes referred to as a two-way fixed effects model (two-way referring to time and space) [23]. We thus estimated the multiplicative change in the hazard of mortality (the hazard ratio) within these villages, before and after changes in distance. For our secondary outcomes, we took a similar approach, but used linear probability models rather than Cox models. We included fixed effects for each village and each year.
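One way to write down the two specifications described above is shown below. The notation is introduced here for illustration and does not come from the source: h_{0v}(t) is a village-specific baseline hazard, \Delta d_v(t) is the reduction in distance in effect for village v when the child is aged t, \gamma denotes the year dummies, \alpha_v is a village fixed effect, and y(\cdot) maps an observation to its calendar year.

```latex
% Stratified Cox model: child i in village v at age t
h_{iv}(t) = h_{0v}(t)\,\exp\!\bigl(\beta\,\Delta d_{v}(t) + \gamma_{y(t)}\bigr)

% Linear probability model for a binary utilization outcome Y
Y_{iv} = \alpha_{v} + \gamma_{y(i)} + \theta\,\Delta d_{v,\,y(i)} + \varepsilon_{iv}
```

Under this notation, \exp(\beta) is the hazard ratio associated with a one-unit reduction in distance. Because h_{0v}(t) is left unspecified for each village, any characteristic of a village that is constant over time drops out of the partial likelihood.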
One key assumption, inherent in Cox models, is that the hazards are proportional. We tested this assumption in two ways (Additional file 1). First, we tested for non-zero slope in a generalized linear regression of the scaled Schoenfeld residuals on time [24]. Second, because that test can be “over-powered” (with many observations it may classify substantively insignificant changes in the hazard ratio as statistically significant), we visually assessed plots of the scaled Schoenfeld residuals for the covariates that the test identified as violating the proportional hazards assumption [25].
P-values for distance reduction coefficients in the categorical distance models were adjusted for multiple testing using a false discovery rate procedure [26].
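The text does not name the specific false discovery rate procedure used. As an illustration only, the sketch below assumes the Benjamini-Hochberg step-up adjustment, which is a common choice; the function name and the example p-values are hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    Each p-value is scaled by m / rank (rank 1 = smallest), and a running
    minimum taken from the largest rank downward keeps the adjusted
    values monotone.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Six hypothetical p-values, one per distance change category.
raw = [0.001, 0.008, 0.039, 0.041, 0.042, 0.60]
print([round(p, 4) for p in bh_adjust(raw)])
```

With six distance change coefficients per model, an adjustment of this kind controls the expected share of false positives among the coefficients declared significant.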
Ethical approval was obtained from Simmons University Institutional Review Board.