An equitably distributed primary healthcare workforce is key to an efficient healthcare system. Family Physicians or General Practitioners (GPs) form a vital component of this workforce. Inequities in the geographical distribution of GPs are associated with poorer health outcomes [1–3]. In Australia, where a large sparsely populated hinterland and remote communities create challenges for GP access, a small but growing literature underscores the importance of geographic access to GPs [5–7].
The quality of spatial GP data is integral to adequately examining geographic access to GPs. The aim of the analyses presented here is to explore the issue of spatial GP data quality by comparing various geographically explicit GP datasets in Australia with different conceptualizations of the workforce metric (headcounts and workload aware statistics). Further, in order to understand the effect rurality has on data quality we implement our analyses across different degrees of rurality. The following discussion outlines the relevant context to this analysis. We first describe the issues salient to spatial GP data quality. We then discuss geographical GP datasets in different jurisdictions followed by a short description of GP datasets in Australia. Finally, we discuss existing research on GP datasets in Australia and elsewhere.
Geographic GP datasets: what are we measuring?
Two aspects of data quality are salient to GP accessibility studies. The first is the geographic resolution or scale. If the available GP data are aggregated to coarse scales, for example the state level, then locally relevant analyses cannot be performed. The second is the conceptualization of the workforce metric. While it is common to use GP headcounts or the mere presence of a GP as a metric of GP access, there is evidence that this may produce misleading results. In the Australian context, it is known that while the average GP works more hours per week with increasing rurality, there are also substantial numbers of GPs who provide short-term locum services (henceforth called locum GPs) in rural Australia, whose inclusion in or exclusion from simple headcounts may skew workforce analyses. Further, many GPs work in more than one location, and if these locations are in different geographic areas they can be counted in one or both of these areas, providing potentially misleading information. In addition, female GPs in Australia are more likely to work part time. Recent research supports the use of alternative workload-aware metrics such as the number of Full Time Equivalent (FTE) physicians. For example, in the United States the number of FTE GPs has been shown to be more strongly associated with health outcomes than GP headcounts. Full Time Equivalent and Full-time Workload Equivalent (FWE) are two workload-aware workforce metrics commonly used in Australia. The FWE metric, unlike FTE, does not “cap” doctors providing more than a standard full-time level of services at an upper threshold, usually of 1. Thus a GP providing 20% more than a standard full-time level of services will be 1.2 FWE but 1.0 FTE. However, a number of different methods of calculating FWE/FTE exist. Thus, it is important to have access to datasets that are at a high geographic resolution with pertinent GP workload information.
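The distinction between headcounts, FWE and FTE can be illustrated with a minimal sketch. It assumes the simplest ratio-based reading of the two metrics (services provided divided by a standard full-time level, with FTE capped at 1); since several calculation methods exist, this should not be taken as the method of any particular data custodian. The full-time threshold `FULL_TIME_SERVICES`, the function names and the three GPs below are hypothetical.

```python
from math import isclose

# Hypothetical standard: number of services per year treated as one
# full-time load. Real thresholds depend on the calculation method used.
FULL_TIME_SERVICES = 5000.0


def fwe(services: float, full_time_services: float = FULL_TIME_SERVICES) -> float:
    """Full-time Workload Equivalent: uncapped ratio to a full-time load."""
    if full_time_services <= 0:
        raise ValueError("full_time_services must be positive")
    return services / full_time_services


def fte(services: float, full_time_services: float = FULL_TIME_SERVICES,
        cap: float = 1.0) -> float:
    """Full Time Equivalent: the same ratio, capped at an upper threshold."""
    return min(fwe(services, full_time_services), cap)


# Three hypothetical GPs in one geographic area: one providing 20% more than
# the full-time level, one part-time, and one exactly full-time.
services_by_gp = [6000.0, 2500.0, 5000.0]

headcount = len(services_by_gp)
total_fwe = sum(fwe(s) for s in services_by_gp)
total_fte = sum(fte(s) for s in services_by_gp)

# The first GP reproduces the 1.2 FWE versus 1.0 FTE case from the text.
assert isclose(fwe(6000.0), 1.2) and isclose(fte(6000.0), 1.0)
```

Under these assumed numbers the same area would be described as having 3 GPs by headcount, about 2.7 by FWE and 2.5 by FTE, which is why the choice of metric can change the apparent supply of GPs.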
Ideally, in order to achieve the most accurate understanding of GP workforce availability, data are needed at the level of individual practice address(es), along with the total number of hours worked, patients seen and services rendered. Such detail is rarely available.
Geographic GP and physician datasets in the USA, Canada and Europe
Many countries have multiple sources of GP or physician data, with varying degrees of overlap, strengths and weaknesses. In the United States, limited datasets on FTE physicians geocoded to postcodes can be obtained from US-Medicare. Individual physician address information can be obtained from the American Medical Association (US-AMA) Physician Masterfile. These datasets have been used in multiple analyses of relationships with outcomes [1, 10–12]. Occasionally, surveys are also used to assess the geographical distribution of physicians. In Canada, the Canadian Institute for Health Information (CIHI) aggregates physician benefits information from provincial governments into a comprehensive database called the National Physician Database. This database offers a wealth of information on physicians geocoded to the postcode of main activity. The information can also be used to calculate FTEs. CIHI also maintains the Scotts Medical Database, which can be used to obtain physician headcounts. As in the United States, these datasets have been used to study relationships with various outcomes [14, 15] and the geographical distribution of physicians [16, 17]. Note that while the US-AMA Masterfile and Scotts Medical Database are privately sourced, the US-Medicare data and National Physician Database are organized by public bodies. Given the diversity of datasets, and the possibility of overlapping uses of these datasets, it is important that the degree of agreement or disagreement between them be understood. However, there has been limited effort in this direction in either the United States or Canada. High quality data on GP locations are available in the United Kingdom and have been used in a number of analyses [18, 19]. GP address data are available from different sources in Ireland and have been used to study issues of geographic access.
Geographic GP and physician datasets in Australia in the context of Australia’s healthcare system
As in Canada and the United States, multiple sources of physician and GP data exist in Australia. However, unlike their North American counterparts, data custodians in Australia operate a relatively restrictive data access regime, and some data custodians do not release data at small geographies either to researchers or to the public (see Discussion). Also, unlike the CIHI in Canada, no Australian body functions as a centralized aggregator of physician data. These complications result in a greater multiplicity of datasets in the context of Australia’s health system.
The backbone of Australia’s healthcare system is Medicare. Medicare is taxpayer-funded and offers universal insurance for private medical services. Almost all GP services in Australia are privately provided under a fee-for-service scheme with a rebate provided by Medicare at a level set by the Medicare Benefits Schedule (MBS). GPs may charge at the level of this rebate with no payment at point of service (approximately 80% of services in 2012) or may charge at a higher level with the patient paying the gap. This information on services provided can be used to infer GP workload (see Methods). Updated information on GPs registered for the Medicare program (which is almost all GPs in Australia) is held by the body that administers Medicare, the Department of Human Services. The Department of Health and Ageing (DoHA) also holds these data and, in addition to the data on headcounts of GPs, derives measures of full-time equivalence. These data are not publicly released at small geographies. They are occasionally released for research [9, 22] and other reports but were not made available for these analyses.
In the absence of small-area data from the Medicare data custodians, GP workforce data can be a) obtained from GP workforce surveys, b) obtained at relatively coarse geographic scales from the data custodian, c) derived indirectly from datasets reflecting numbers of services provided by GPs that are released by the Medicare data custodians, and d) obtained from private sources that use both internet-based and traditional data gathering tools to create mailing lists of GPs. In the Australian context, all of these data sources are salient to enumerating the GP workforce. These datasets are discussed in greater detail in the methods-data section.
Research on geographic GP and physician datasets
Studies of the geographic distribution of GPs and the related health workforce draw on different datasets. For example, while some studies use data from the Australian census [4, 24], others use data from surveys, or from state or territory health workforce registries [5, 26]. While survey data may provide workload and other hard-to-obtain information, they may be less complete than registry data. In contrast, data from registries or established mailing lists are likely to be more comprehensive but lack workload information. Recently a number of studies of geographical access have made use of mailing list data [6, 27]. While some studies attempt to take GP workload into account [6, 25], other studies do not. A majority of these studies are localized to specific geographic areas, making comparisons across datasets difficult.
Some researchers have attempted to describe GP data sources in Australia [28–31]. One Australian and one American study have attempted to quantify the quality of physician datasets. The American study compared US-AMA data from a single state with records from the state registry and found the US-AMA database to be almost 100% complete. The Australian study used expert local knowledge of all GPs in the Northern Tasmania DGP to compile a master (authority, or baseline) list of 139 active GPs. The researchers arrived at this number by starting with a larger list compiled from various datasets and then culling all inaccurate entries. They then assigned two quality scores, sensitivity and predictive value positive, to each GP dataset. While this is a valid approach to ascertaining the accuracy of a dataset, it also requires names and addresses to be present in multiple databases, a difficult proposition in a restrictive data access environment. Moreover, researchers are often interested in the quality of a dataset insofar as it affects the outcome of their analyses.
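The two quality scores can be made concrete with a short sketch. It uses the standard definitions: sensitivity is the share of GPs on the authority list that a dataset captures, and predictive value positive is the share of a dataset's entries that appear on the authority list. How records were matched in the cited study is not described here, so the exact-identifier matching, the function name `quality_scores` and the GP identifiers below are all hypothetical.

```python
from math import isclose, nan


def quality_scores(dataset_ids, authority_ids):
    """Return (sensitivity, predictive value positive) for one dataset.

    Both arguments are iterables of GP identifiers; matching is by exact
    identifier equality, which is an assumption made for illustration.
    """
    dataset, authority = set(dataset_ids), set(authority_ids)
    true_positives = len(dataset & authority)
    # Share of authority-list GPs that the dataset captures.
    sensitivity = true_positives / len(authority) if authority else nan
    # Share of dataset entries that are on the authority list.
    ppv = true_positives / len(dataset) if dataset else nan
    return sensitivity, ppv


# Hypothetical authority list of four active GPs, and a dataset that misses
# one of them and contains two entries that are not on the authority list.
authority_list = {"gp1", "gp2", "gp3", "gp4"}
mailing_list = {"gp1", "gp2", "gp3", "gp9", "gp10"}

sensitivity, ppv = quality_scores(mailing_list, authority_list)
```

In this made-up case the dataset captures three of the four authority-list GPs (sensitivity 0.75), while three of its five entries are valid (predictive value positive 0.6). Both scores depend on being able to match individual records, which is the requirement that a restrictive data access environment makes hard to meet.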
Aims and objectives
Health researchers across jurisdictions are interested in investigating the relationship of GP access and availability to various health outcomes [33, 34]. While there are a number of approaches to quantifying GP availability, GP density in a geographical area is a commonly used metric [33, 34]. In Australia, GP densities by geography have been used as a metric of GP demand and supply. A relevant research question in this context is whether the choice of one GP dataset over another affects the results of an analysis. If the same outcome were being studied, this would be equivalent to studying the level of agreement between the various datasets. The aim of the analysis presented in this paper is thus to explore how the various GP datasets in Australia compare across different geographies. More specifically, we are interested in evaluating the correlation of GP headcounts and total FTE/FWE GPs at different geographic scales, and in observing how these correlations vary with rurality or remoteness. We also compare total headcounts and FTEs/FWEs from the various datasets across states and territories.
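As a rough illustration of this kind of comparison, the sketch below computes the correlation between GP counts from two datasets separately for each rurality class. It assumes a Pearson correlation coefficient, which is one possible choice and is not specified in this section. The record layout, the rurality labels and all counts are hypothetical.

```python
from collections import defaultdict
from math import isclose, sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


def correlation_by_rurality(records):
    """Group (rurality, count_a, count_b) records and correlate per group."""
    groups = defaultdict(lambda: ([], []))
    for rurality, count_a, count_b in records:
        groups[rurality][0].append(count_a)
        groups[rurality][1].append(count_b)
    return {r: pearson(xs, ys) for r, (xs, ys) in groups.items()}


# Hypothetical areas: (rurality class, headcount in dataset A, FTE in dataset B).
records = [
    ("urban", 10, 20), ("urban", 20, 40), ("urban", 30, 60),
    ("remote", 1, 3), ("remote", 2, 2), ("remote", 3, 1),
]
result = correlation_by_rurality(records)
```

With these invented numbers the two datasets agree perfectly in the "urban" class and are perfectly inversely related in the "remote" class, an exaggerated version of the pattern the analysis is designed to detect: agreement between datasets that changes with remoteness.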
This is intended to be an exploratory analysis of GP datasets, and it is anticipated that the results of our analyses will assist health services researchers in Australia to make informed choices about GP datasets. These analyses can be easily extended to other jurisdictions that have multiple sources of physician data and to other data sets if and when they become available. While some of the conclusions of the study are clearly limited to the Australian context and particular data sets, the broad conclusions of the study relating to the relative interchangeability or otherwise of data sets from different public and private sources, and data sets using different measures of GP workforce, for analytic purposes may be of relevance in other jurisdictions. In addition, countries such as Canada, Australia and to a lesser extent the USA, which have a large rural hinterland, may face similar issues with geographic data from rural areas.