### Statistical analyses

This paper examines how the non-pecuniary location-specific factors associated with GP supply are distributed across geographical space by estimating reduced-form associations between GP supply and location attributes. Australian GP supply data at the level of the postal area are regressed on a set of area characteristics. The geographic nature of the data is taken into account by calculating the spatial correlation of all variables. This information is used to estimate two area-level regressions assessing the geographic level at which these variables are most strongly related to GP supply.

Because the geographic unit of analysis is small, several neighbourhoods have no GP; approximately 30 % of postal areas in Australia have no GP. This high frequency of zeros, together with the fact that GP supply cannot be negative, means that the data are not normally distributed. Rather than an ordinary least squares regression, the model used accounts for the large number of zero outcomes in order to produce consistent and asymptotically normal estimates [17]. Maximum likelihood estimation is employed using the Tobit specification, which allows for censoring at zero:

$$ {S}_i^{*}={\alpha}_0+{x}_i\beta +{u}_i,\kern0.5em \mathrm{where}\kern0.5em {u}_i\mid {x}_i\sim \mathrm{Normal}\left(0,{\sigma}^2\right) $$

(1)

where: *i* = postal area

\( S_i^{*} \) = a latent variable underlying supply of GPs in the *i*th postal area

$$ {S}_i=\left\{\begin{array}{ll}{S}_i^{*} & \mathrm{if}\ {S}_i^{*}\ge 0\\ 0 & \mathrm{if}\ {S}_i^{*}<0\end{array}\right. $$

\( S_i \) = supply of GPs in the *i*th postal area

\( x_i \) = value of vector of independent variables in postal area of interest *i*

\( \alpha_0 \), \( \beta \) and \( \sigma^2 \) = parameters to be estimated
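To make the censoring in Eq. 1 concrete, the following is a minimal sketch of the log-likelihood that maximum likelihood estimation of a Tobit model censored at zero would maximise. It is illustrative only and not the authors' implementation: the function names, the list-based data layout and the small numerical floor inside the logarithm are assumptions. Areas with positive supply contribute a normal density term; areas with zero supply contribute the probability that the latent variable falls at or below zero.

```python
import math


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_logpdf(z):
    """Log of the standard normal density."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)


def tobit_loglik(supply, X, alpha0, beta, sigma):
    """Log-likelihood of a Tobit model left-censored at zero.

    supply : observed S_i (>= 0), one value per postal area
    X      : list of covariate rows x_i, one row per postal area
    alpha0, beta, sigma : candidate parameter values (sigma > 0)
    """
    total = 0.0
    for s_i, x_i in zip(supply, X):
        mu = alpha0 + sum(b * x for b, x in zip(beta, x_i))
        if s_i > 0:
            # Uncensored area: density of the latent S_i* at the observed value.
            total += normal_logpdf((s_i - mu) / sigma) - math.log(sigma)
        else:
            # Censored area: P(S_i* <= 0) = Phi(-mu / sigma).
            # The floor (an assumption of this sketch) avoids log(0).
            total += math.log(max(normal_cdf(-mu / sigma), 1e-300))
    return total
```

In practice this function would be passed to a numerical optimiser over \( \alpha_0 \), \( \beta \) and \( \sigma \); the optimiser is omitted here.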

Given that postal areas are an arbitrary delineation of space, a problem akin to that described in the geography literature as the modifiable areal unit problem occurs [18]. To address this, we permit GP supply in a given postal area to be related to factors not only in that postal area, but also in neighbouring postal areas. Accounting for the spill-over effects between neighbourhoods reduces measurement bias in the model and accounts for the fact that GPs’ location choices could be influenced by variables at a more aggregated level than the neighbourhood where they are located. Spill-over effects between neighbouring areas have largely been ignored in the GP supply literature; existing models implicitly assume that the proximity of the observations is not important. To take spatial dependence into consideration, a spatially weighted model accounting for spatial heterogeneity is constructed by including variables that capture the characteristics of nearby neighbourhoods (i.e. spatially lagged variables).

Spatial heterogeneity is a concern when using spatially referenced data; some of the independent variables in this model are therefore expected to be spatially correlated.

The Moran’s I statistic (a measure of spatial correlation) is used to calculate the degree of spatial correlation in the independent variables. Moran’s I is expressed as:

$$ I=\frac{n}{\sum_i \sum_{j\ne i}{w}_{ij}}\,\frac{\sum_i \sum_{j\ne i}{w}_{ij}\left({x}_i-\overline{x}\right)\left({x}_j-\overline{x}\right)}{\sum_i {\left({x}_i-\overline{x}\right)}^2} $$

(2)

$$ \mathrm{where}:\ {w}_{ij}=\left\{\begin{array}{ll}1 & \mathrm{if}\ j\ \mathrm{is}\ \mathrm{a}\ \mathrm{neighbour}\ \mathrm{of}\ i\\ 0 & \mathrm{if}\ j\ \mathrm{is}\ \mathrm{not}\ \mathrm{a}\ \mathrm{neighbour}\ \mathrm{of}\ i\end{array}\right.\kern1em i=1,\dots,n;\ j=1,\dots,n $$

\( \overline{x} \) = mean of the variable *x* across all postal areas

*n* = number of postal areas
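As an illustration of Eq. 2, the sketch below computes Moran’s I for a single variable given a binary neighbour matrix. The function name, the nested-list representation of the weights and the four-area example are assumptions made for illustration; they are not taken from the data used in this paper.

```python
def morans_i(x, w):
    """Moran's I for values x and a binary neighbour matrix w (w[i][j] in {0, 1})."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    # All ordered pairs (i, j) with j != i, as in the double sums of Eq. 2.
    pairs = [(i, j) for i in range(n) for j in range(n) if j != i]
    weight_sum = sum(w[i][j] for i, j in pairs)
    cross = sum(w[i][j] * dev[i] * dev[j] for i, j in pairs)
    squared_dev = sum(d * d for d in dev)
    return (n / weight_sum) * (cross / squared_dev)


# Four hypothetical areas in a row; each is a neighbour of the adjacent ones.
w = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
smooth = morans_i([1.0, 2.0, 3.0, 4.0], w)         # similar values adjoin
alternating = morans_i([1.0, -1.0, 1.0, -1.0], w)  # dissimilar values adjoin
```

In this toy example the smoothly increasing values give a positive statistic (1/3) and the alternating values give -1.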

The Moran’s I statistic ranges from -1 to 1. A positive value indicates positive spatial correlation (i.e. spatial clustering), whereas a negative value indicates dispersion (or a competitive force). Kelejian and Prucha [19] demonstrate that the Moran’s I statistic converges in distribution to a standard normal under the assumptions of a Tobit model, so it can be used to test for spatial dependence in Tobit models. The degree of spatial correlation indicates how strongly variables and their spatial lags are correlated across the geographic plane of the data. For variables where spatial correlation is strong (i.e. Moran’s I larger than 0.7), the mean regionally weighted value (i.e. the mean of the value for the postal area and the mean value of its neighbours) is used in an extended version of Eq. 1.

Region level variables are calculated using the following formula:

$$ {\tilde{x}}_{ri}=\frac{\left({x}_i+{\overline{x}}_i\right)}{2} $$

(3)

where: \( {\tilde{x}}_{ri} \) = vector of variables measured at the region level

\( x_i \) = vector of variables in postal area *i*

\( \overline{x}_i = \frac{1}{n_i} \sum_j w_{ij} x_j \), with \( w_{ij} = 0 \) if *i* = *j*

where: \( n_i \) = number of neighbours of *i*
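The region-level calculation in Eq. 3 can be sketched as follows for a single variable. The function names and the example values are hypothetical, and the handling of an area with no neighbours (falling back to its own value) is an assumption of this sketch, since the text does not say how such areas are treated.

```python
def neighbour_mean(x, w, i):
    """Mean of x over the neighbours of area i (w[i][j] = 1 marks a neighbour)."""
    neighbours = [j for j in range(len(x)) if j != i and w[i][j] == 1]
    if not neighbours:
        # Assumption: an area without neighbours falls back to its own value.
        return x[i]
    return sum(x[j] for j in neighbours) / len(neighbours)


def regional_value(x, w, i):
    """Eq. 3: average of the area's own value and the mean of its neighbours."""
    return (x[i] + neighbour_mean(x, w, i)) / 2.0


# Four hypothetical areas in a row; each is a neighbour of the adjacent ones.
w = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
x = [1.0, 2.0, 3.0, 4.0]
regional = [regional_value(x, w, i) for i in range(len(x))]
```

Note that the area's own value receives a weight of one half regardless of how many neighbours it has, which is what Eq. 3 implies.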

The relationship between GP supply and area characteristics is estimated using two model specifications. The first is a Tobit model, which does not take account of the spatial correlation of the area (Eq. 1). The second is a spatially weighted Tobit model (Eq. 4), which takes account of both local area and neighbouring area values^{Footnote 1}:

$$ {S}_i^{*}={\alpha}_0+{x}_i\beta +{\overline{x}}_i\delta +{\tilde{x}}_{ri}\gamma +{u}_i,\kern0.5em \mathrm{where}\kern0.5em {u}_i\sim \mathrm{Normal}\left(0,{\sigma}^2\right) $$

(4)
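One way to read Eq. 4 is sketched below: the regressors for a single postal area are assembled from local values, neighbour means and Moran’s I for each variable. The 0.7 threshold and the regional average come from the text; the assumption that every other variable enters with separate local and neighbour terms is an interpretation rather than something the text states, and the function name, variable names and values are hypothetical.

```python
STRONG_CORRELATION = 0.7  # threshold on Moran's I given in the text


def spatial_regressors(local, neighbour_mean, morans_i):
    """Assemble Eq. 4 regressors for one postal area.

    local          : dict of variable name -> value in the postal area
    neighbour_mean : dict of variable name -> mean value over its neighbours
    morans_i       : dict of variable name -> Moran's I of that variable
    """
    row = {}
    for name, value in local.items():
        if morans_i[name] > STRONG_CORRELATION:
            # Strongly clustered variable: a single regional value (Eq. 3).
            row[name + "_region"] = (value + neighbour_mean[name]) / 2.0
        else:
            # Assumption: remaining variables enter as local and neighbour terms.
            row[name + "_local"] = value
            row[name + "_neighbours"] = neighbour_mean[name]
    return row


# Hypothetical values for two variables in one postal area.
row = spatial_regressors(
    local={"income": 50.0, "hospitals": 1.0},
    neighbour_mean={"income": 60.0, "hospitals": 0.5},
    morans_i={"income": 0.8, "hospitals": 0.2},
)
```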

### Data

GP supply for 2008 is measured as the number of GPs per 1,000 persons. GP supply is calculated for each postal area using the main place of work for GPs in Australia. Data were purchased from the Australasian Medical Directory, maintained by the Australasian Medical Publishing Company [20]. Australian Bureau of Statistics (ABS) Census data for 2006 are used to capture the non-pecuniary area attributes [21]. Postal area is the lowest level of geographical aggregation in the data. Postal areas are approximately equivalent to a suburb or neighbourhood in urban areas and a region in rural areas. Postal areas in Australia have an average of approximately 8,700 residents, but the numbers range widely from 56 to 85,333 residents [21].
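The supply measure described above is a simple rate. A minimal sketch follows, assuming a hypothetical function name and a hypothetical GP count; only the population figures are taken from the text.

```python
def gps_per_thousand(gp_count, population):
    """Number of GPs per 1,000 residents of a postal area."""
    if population <= 0:
        raise ValueError("population must be positive")
    return 1000.0 * gp_count / population


# A postal area of average size (about 8,700 residents) with a hypothetical 9 GPs.
average_area = gps_per_thousand(9, 8700)
# A postal area with no GP has zero supply, the censored case in the Tobit model.
no_gp_area = gps_per_thousand(0, 56)
```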

Mean taxable income for each postal area was sourced from the Australian Taxation Office [22]; it is compiled from aggregated individual income by postcode. Labour force participation rates (percentage of the population who are employed or unemployed) were sourced from the ABS Census [21]. These are used to capture the socioeconomic status of an area. Socioeconomic status has long been found to be associated with health status [23]. For example, in the UK, economically deprived areas have been found to be relatively underserved [11].

Rural and remote locations are considered to be underserved in Australia. As a result, rurality has become a central focus of health care policies and political debate. Rurality is measured by the Accessibility/Remoteness Index of Australia (ARIA), which ranges from 0 being an urban centre to 15 being very inaccessible and remote [21].

Population health status is often used as a proxy for health care need. Areas with greater health care need are expected to have higher demand for services. The only direct measure of health status available in the Census is mortality and this is disaggregated to the Statistical Local Area (SLA) level only [21]. The spatial analysis will account for the fact that SLAs are slightly larger than postal areas. Health variables are often considered to be endogenous variables when measuring GP supply [12]. Nonetheless, mortality is used to capture population health and proxy health care need in this study. It is used as a time-lagged variable (i.e. GP supply is measured in 2008 and mortality rate in 2006) to reduce the likelihood of reverse causality.

Demographic factors such as age and gender are associated with consultation rates. Data from Australia show that females account for 57 % of all GP encounters, whereas females represent 50.4 % of the total population [24]. In 2013, patients aged over 65 years accounted for approximately 33 % of all encounters but represented only 14.4 % of the total population [24]. GP consultation rates show a U-shaped distribution by age, with children and the elderly seeking more frequent consultations than individuals in the middle of the age distribution [25, 26].

Another demographic variable included is the proportion of the population that is indigenous (i.e. of Aboriginal or Torres Strait Islander descent) [21]. In Australia, indigenous populations have lower health status and life expectancy than the non-indigenous population. Given the health discrepancy between the indigenous population and the general population, the government has created indigenous health incentive programs. These programs aim to attract GPs to locate in areas with a high proportion of indigenous population [27].

The attractiveness of an area, in terms of amenities, is also expected to influence GP supply. Areas with more amenities, such as private schools, are expected to have more GPs per capita. The presence of hospitals will be positively related to GP supply if GPs consider them to be amenities or complements, and negatively related to GP supply if hospitals are substitutes or a source of competition. Benham et al. [4] suggest that physicians have a preference to be near hospitals, perhaps due to referral networks. However, they do not provide evidence for this. Including a hospital variable provides an opportunity to investigate this relationship. Local amenities are measured as the number of hospitals and private schools in the postal area, sourced from publicly available directories [28, 29].