Predicting the demand of physician workforce: an international model based on "crowd behaviors"

Background The adequacy of the physician workforce greatly influences the quality of healthcare. When a physician shortage arises, correcting manpower always takes an extended period of time, and both the public and health personnel suffer. To calculate an appropriate Physician Density (PD) for a specific country, this study was designed to create a PD prediction model based on health-related data from many countries. Methods Twelve factors that could possibly affect the demand for physicians were chosen, and data on these factors were extracted from 130 countries (of 195 reviewed). Multiple stepwise-linear regression was used to derive the PD prediction model, and a split-sample cross-validation procedure was performed to evaluate the generalizability of the results. Results Using data from 130 countries, and taking into account the correlations between variables to prevent multi-collinearity, seven of the 12 predictor variables were selected for entry into the stepwise regression procedure. The final model was: PD = (5.014 - 0.128 × proportion under age 15 years + 0.034 × life expectancy)², with R² of 80.4%. Using the prediction equation, 70 countries had PDs with "negative discrepancy", while 58 had PDs with "positive discrepancy". Conclusion This study provided a regression-based PD model to calculate a "norm" PD for a specific country. A large PD discrepancy in a country indicates the need to examine physicians' workloads and well-being, the effectiveness and efficiency of medical care, the promotion of population health, and team resource management.


Background
Physicians are the key personnel who make medical decisions and deliver medical treatments to patients. The adequacy of a country's physician workforce greatly influences the quality of healthcare. Previous studies indicated that growth in health worker density significantly reduced the burden of disease, especially the burden associated with communicable diseases [1]. Conversely, physician shortages translated into inadequate care [2,3] and greater costs for the treatment of disease [4,5]. However, the size of the physician workforce has been reported not always to be positively related to healthcare quality.
Physicians may induce demand, and physician surpluses may drive unnecessary utilization of healthcare [6]. With the rapid progression of globalization, physician migration across national borders has become more intense than ever [7]. During the past years, there have been 384 citations with the tag of "physician supply and/or demand" [8]. Appropriately matching physician supply and demand is now a critical worldwide concern.
Estimating the number of physicians a country requires is a complex task, given the many contributing factors that affect physicians' productivity, people's expectations of healthcare quality and the utilization of healthcare resources [9,10]. These factors are theoretically divided into four domains: population, physician, healthcare system, and economics. Within the population domain, one needs to consider age, birth rate, death rate, infant mortality, life expectancy, population growth rate, incidence and prevalence of diseases, health demands by age, literacy, and health expectations. Within the physician domain, it is necessary to take into account the practicing physician's age (as a measure of the number of physicians retiring), gender, specialty and subspecialty, number of work hours per week, and clinical competence. Within the healthcare system domain, there is a need to consider healthcare accessibility, number of hospital beds, availability of resources, structure of payment, availability of support personnel (i.e., nurses, midwives, and technicians) and overall treatment capacities. Finally, one needs to consider the economics of a country, often expressed as GDP (gross domestic product), GNP (gross national product), GNPPC (GNP per person), or PPP (Purchasing Power Parity). The above factors may be correlated with the size of a country's physician workforce, but do not necessarily have a causal relationship with it. As population size is known to be an essential factor that determines physician demand, physician density (PD, defined as the number of practicing physicians per 10,000 population) is often used to estimate physician needs.
In the literature, many factors have been reported to have significant relationships with physician density. In a regression analysis of World Bank data on 250 countries, physician density was reported to be influenced by GDP, female literacy, the percentage of the female population aged over 60 years, and female life expectancy [10]. Similarly, a regression analysis for both LMICs (low and low-middle income countries) and MHICs (middle and high income countries) showed that physician density was significantly associated with several health indicators: infant mortality rate, under-5 mortality rate, maternal mortality rate, and life expectancy [1]. The number of hospital beds (per 1,000 inhabitants) had a positive impact on the growth rate of specialists [6], and disability-adjusted life years (DALYs) had a significantly inverse relationship with the density of health workers [11]. Because these factors interact with one another and with other hidden factors (e.g., economic factors underlying the child population factor), and because some are difficult to measure (e.g., physician competency), predicting manpower for future needs has become very difficult, especially in a rapidly changing health care environment. Currently, there is no tool or formula that can accurately predict the optimal PD for a given country.
To predict reality, the "Wisdom of Crowds" theory suggests aggregating information from "groups" rather than from "individuals" [12]. In the book entitled "The Wisdom of Crowds", Surowiecki argued that aggregating information in groups predicts reality better than any single member of the group does. The opening anecdote in the book described "Francis Galton's surprise that the crowd at a country fair accurately guessed the weight of an ox; when the individual guesses were averaged; the average was closer to the true butchered weight of the ox than the estimates of individual crowd members." With the rapid growth of information technology, abundant data on health indices and population demography have accumulated across country boundaries. Using aggregated international data on "crowd behaviors", the aim of the present study was to develop a PD prediction model for evaluating physician manpower needs that closely reflects reality. By calculating the discrepancy between the observed and the predicted PD, the model may be used to screen the appropriateness of physician manpower in a nation and to provide early warnings that help prevent damage to healthcare.

Methods
Twelve variables that were readily accessible on the world wide web, and having a possible impact on physicians' demands, were chosen. The variables consisted of health indicators, population demography, health care system, and socioeconomic status. The indicators were under age 5 year mortality rate, adult mortality rate, life expectancy, fertility rate, literacy, population density, proportion under age 15 years, proportion over age 60 years, gross domestic product, gross national income, purchasing power parities, and expenditure on health.
Data on PD and 12 variables were extracted from the World Health Organization (WHO), United Nations

Statistical analyses
A six-step procedure was used to derive the international prediction model for PD, which consisted of: 1) reducing the data by eliminating highly correlated variables, 2) selecting countries with complete data for the analysis, 3) dividing the countries randomly into two halves, 4) generating a prediction equation from the first half of the countries and using it to predict the observed PD for countries in the second half, 5) combining the countries into one dataset, subsequent to the split-sample validation in step 4, to derive an overall international PD model, and 6) using the model to predict PD and to calculate country-specific PD discrepancies.
Multiple stepwise-linear regression was used to derive the model that best predicted PD. A p-value of 0.15 was used for both the variable entry and variable removal criteria [13]. The sample squared multiple correlation (R²) was used to quantify the strength of the relationship in terms of the percentage of data variation explained by the regression model. Adequacy of model assumptions was assessed by a normal probability plot and by a plot of standardized residuals versus predicted values.
For the purpose of analyses, the mean PD across the 3 years (2004, 2005, and 2006) was calculated to be the outcome variable as the Pearson correlation (r) among the years was high (r > 0.97). Similarly, country-specific predictors were calculated as the mean of sex-specific and year-specific data because the correlations were also high (r > 0.96). Literacy was excluded from all the analyses because information was available only on half of the countries. Among the 11 remaining predictors, several were observed to be highly correlated with each other: under age 5 years mortality rate, adult mortality rate, and life expectancy (r > 0.93), proportion under age 15 years and fertility rate (r > 0.94), gross domestic product and gross national income (r > 0.94). As a result, the following 4 predictors were excluded from all analyses to prevent multi-collinearity: under age 5 years mortality rate, adult mortality rate, fertility rate, and gross national income. The remaining 7 predictor variables were considered for entry into the stepwise regression procedure: population density, proportion under age 15 years, proportion over age 60 years, life expectancy, gross domestic product, expenditure on health, and purchasing power parities.
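The screening for highly correlated predictors described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function names, the toy values, and the use of 0.93 as a cutoff are assumptions (the text reports the observed correlations but does not state a formal threshold), and the decision about which variable of a flagged pair to drop is left to the analyst.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def highly_correlated_pairs(columns, threshold=0.93):
    """Return (name_a, name_b, r) for every pair with |r| above the threshold.

    The default threshold is an assumption chosen for illustration.
    """
    flagged = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        r = pearson_r(a, b)
        if abs(r) > threshold:
            flagged.append((name_a, name_b, r))
    return flagged


# Invented values for five hypothetical countries (not study data).
toy_columns = {
    "proportion_under_15": [45.0, 40.0, 30.0, 20.0, 15.0],
    "fertility_rate": [6.0, 5.1, 3.2, 1.9, 1.4],
    "population_density": [80.0, 15.0, 300.0, 40.0, 120.0],
}

flagged_pairs = highly_correlated_pairs(toy_columns)
for name_a, name_b, r in flagged_pairs:
    print(f"{name_a} vs {name_b}: r = {r:.3f}")
```

In this toy example only the pair of proportion under age 15 years and fertility rate is flagged, so one of the two would be set aside before the stepwise procedure.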
A split-sample cross-validation was performed to assess the generalizability of the results [14]. The process consisted of splitting the original sample into a training set and a validation set using random sampling. A regression equation was derived in the training set and the R² between the observed and predicted response values was calculated. The regression coefficients from the training set were then used to calculate predicted values in the validation set. The cross-validation coefficient (R²*) between these predicted values and the observed values in the validation set was calculated. The shrinkage coefficient was calculated as the difference between the R² of the training set and the R²* of the validation set. The smaller the shrinkage coefficient, the more confidence one can have in the generalizability of the results. Shrinkage coefficient values less than 5% indicate a generalizable model [14]. Given a satisfactory shrinkage coefficient, the data were combined from both sets and a final regression equation was derived based upon the entire sample. The final model was then applied to all the countries to calculate country-specific PD discrepancies (observed PD minus predicted PD, so that a negative value indicates an observed PD below the predicted norm) and the predicted number of required physicians using 2009 country populations.
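The split-sample procedure and the shrinkage coefficient can be illustrated with a short sketch. It assumes a model with two predictors fitted by ordinary least squares, a fixed random seed, and synthetic data; the function names are hypothetical, and the software the authors actually used is not stated in the text.

```python
import random


def fit_two_predictor_ols(rows):
    """Fit y = b0 + b1*x1 + b2*x2 by least squares; rows are (x1, x2, y)."""
    n = len(rows)
    m1 = sum(r[0] for r in rows) / n
    m2 = sum(r[1] for r in rows) / n
    my = sum(r[2] for r in rows) / n
    s11 = sum((r[0] - m1) ** 2 for r in rows)
    s22 = sum((r[1] - m2) ** 2 for r in rows)
    s12 = sum((r[0] - m1) * (r[1] - m2) for r in rows)
    s1y = sum((r[0] - m1) * (r[2] - my) for r in rows)
    s2y = sum((r[1] - m2) * (r[2] - my) for r in rows)
    det = s11 * s22 - s12 ** 2  # determinant of the centered normal equations
    b1 = (s1y * s22 - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return my - b1 * m1 - b2 * m2, b1, b2


def squared_correlation(a, b):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    saa = sum((p - ma) ** 2 for p in a)
    sbb = sum((q - mb) ** 2 for q in b)
    return sab ** 2 / (saa * sbb)


def split_sample_shrinkage(rows, seed=0):
    """Return (R2 in training set, R2* in validation set, shrinkage)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)  # random split into two halves
    half = len(shuffled) // 2
    train, valid = shuffled[:half], shuffled[half:]
    b0, b1, b2 = fit_two_predictor_ols(train)  # coefficients from training only

    def predict(subset):
        return [b0 + b1 * x1 + b2 * x2 for x1, x2, _ in subset]

    r2_train = squared_correlation([r[2] for r in train], predict(train))
    r2_valid = squared_correlation([r[2] for r in valid], predict(valid))
    return r2_train, r2_valid, r2_train - r2_valid


# Synthetic rows (x1, x2, y) generated for illustration only; not study data.
rng = random.Random(42)
toy_rows = []
for _ in range(40):
    x1 = rng.uniform(15, 45)
    x2 = rng.uniform(45, 82)
    toy_rows.append((x1, x2, 5.0 - 0.1 * x1 + 0.05 * x2 + rng.gauss(0, 0.3)))

r2_train, r2_valid, shrink = split_sample_shrinkage(toy_rows)
print(f"R2 = {r2_train:.3f}, R2* = {r2_valid:.3f}, shrinkage = {shrink:.3f}")
```

Under the criterion cited in the text, a shrinkage value below 0.05 (5%) would be read as support for pooling the two halves and refitting on the whole sample.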
The countries were then stratified by area and economic status. Analysis of covariance (ANCOVA), adjusting for observed PD, was used to examine whether the country-specific PD discrepancies differed by continent, membership in the OECD (Organization for Economic Cooperation and Development), and by economic status (low income, middle income, high income). Least squares means (with corresponding 95% confidence intervals) were calculated. Differences among categories were tested for statistical significance using Scheffé's adjustment for multiple comparisons.

Results
Among the 195 countries, the 130 that had complete data on PD for the years 2004-2006 were included in the analyses. The 130 countries were randomly and equally split into the training set and the validation set. Descriptive statistics of the variables are shown in Table 1 and Table 2. By ANOVA, the descriptive data in the training and validation sets were not significantly different. As PD was positively skewed and the plot of standardized residuals versus predicted values showed increasing error variance, a square root transformation was applied to PD to stabilize the error variance.
The stepwise regression procedure retained the same two variables in both the training and validation sets: proportion under age 15 years and life expectancy. The univariate relationships between PD and each retained predictor variable are illustrated in Figures 1a and 1b. The regression coefficients from the multivariate analyses are shown in Table 3. The R² values were virtually identical in both sets and none of the regression coefficients differed significantly between the two sets. The shrinkage coefficient was 1.5%, indicating a high level of model generalizability. Given a low level of shrinkage, the data were combined from both sets and a final regression equation was derived based upon the entire sample of 130 countries: PD = (5.014 - 0.128 × proportion under age 15 years + 0.034 × life expectancy)². The R² of 80.4% from the final 2-variable model was virtually identical to the R² of 80.3% from a full model consisting of all 7 variables. (Note: The 7 predictor variables were: population density, proportion under age 15 years, proportion over age 60 years, life expectancy, gross domestic product, expenditure on health, and purchasing power parities).
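As a worked illustration of the final equation, the sketch below evaluates it for one hypothetical country. The function name and input values are invented, and it is assumed that the proportion under age 15 years is entered as a percentage and life expectancy in years; the text does not state the units explicitly. The guard against a negative linear predictor is an addition made here, not part of the reported model.

```python
def predicted_pd(proportion_under_15, life_expectancy):
    """Predicted physicians per 10,000 population from the reported equation.

    Assumes proportion_under_15 is a percentage (0-100) and
    life_expectancy is in years.
    """
    sqrt_pd = 5.014 - 0.128 * proportion_under_15 + 0.034 * life_expectancy
    # Guard added for this sketch: the model was fitted on the square root of
    # PD, so a negative linear predictor has no meaningful back-transform.
    sqrt_pd = max(sqrt_pd, 0.0)
    return sqrt_pd ** 2


# Hypothetical country: 20% of the population under age 15, life expectancy 78.
example_pd = predicted_pd(20.0, 78.0)
print(f"predicted PD = {example_pd:.2f} per 10,000")
```

For these inputs the linear predictor is 5.014 - 2.560 + 2.652 = 5.106, which squares to roughly 26.07 physicians per 10,000 population.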
The present study used 2009 population data and the two model variables (the proportion under age 15 years and the life expectancy) to calculate a "predicted" (the norm) PD for a country. The "predicted" PD was then compared to the observed PD for each specific country, resulting in a calculated PD discrepancy. Table 4 ranks the countries from the highest to the lowest level of PD discrepancy. For a "negative discrepancy" (the observed PD less than the predicted PD), physician manpower can be considered as "under the norm", rather than "a deficit". In contrast, a "positive discrepancy" indicates the observed PD is greater than the predicted PD, and can be considered as "above the norm". There were 70 countries (70/130, 53.8%) with observed PDs that had "negative discrepancy", and 58 that had "positive discrepancy" (58/130, 44.6%). Figure 2 shows the relationship between PD discrepancy and observed PD in the 130 countries. It's interesting to note from Figure 2 that the breakpoint for "above the norm" in PD occurs at approximately 30 physicians per 10,000 population. The scatter-plot graph is divided into four quadrants by the vertical line of 30 PD and the horizontal line of "zero" discrepancy. Few countries with PD greater than 30 (per 10,000 population) were found to be "below the norm" in PD.
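The discrepancy calculation in the paragraph above can be sketched as follows, using the sign convention stated there (a negative value when the observed PD is less than the predicted PD). The function names, the "at the norm" label for an exact match, and the example figures are hypothetical; the conversion to a physician count relies only on the definition of PD as physicians per 10,000 population.

```python
def pd_discrepancy(observed_pd, predicted_pd_value):
    """Observed minus predicted PD; negative means below the predicted norm."""
    return observed_pd - predicted_pd_value


def classify_discrepancy(discrepancy):
    """Label a discrepancy using the wording of the text."""
    if discrepancy < 0:
        return "under the norm"
    if discrepancy > 0:
        return "above the norm"
    return "at the norm"  # label assumed here for an exact match


def physicians_at_norm(predicted_pd_value, population):
    """Physician count implied by a predicted PD (per 10,000 population)."""
    return predicted_pd_value * population / 10_000


# Invented figures for one hypothetical country (not study data).
observed, predicted, population = 21.0, 26.0, 5_000_000
gap = pd_discrepancy(observed, predicted)
print(gap, classify_discrepancy(gap), physicians_at_norm(predicted, population))
```

Keeping the physician count separate from the discrepancy makes it easier to treat the discrepancy as a screening signal rather than as a staffing target.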
Statistically significant differences in PD discrepancies were observed when more "broad stroke" comparisons were made among the countries' continents, membership in the OECD, and economic levels (Table 5). In general, countries in continents that were more "westernized" (Americas and Europe) had a greater mean "deficit" of physicians (-4.3 and -6.3 physicians per 10,000) than those in other continents. This result was congruent with the results for OECD membership and high-income status.

Discussion
Physician density itself is not the only factor determining health outcomes. Evaluating physician manpower for appropriateness is a complex task that should simultaneously consider many influencing factors relating to population, physician productivity, the healthcare system, and economics. This study developed an international PD regression model that applied "Crowd Theory", which incorporated many factors affecting observed PDs in 130 countries. The "predicted PDs" derived from the regression model were used as the "norm" for the PD that countries commonly used in maintaining their healthcare. The final regression model retained two variables (proportion under age 15 years and life expectancy), which accounted for 80.4% of the total variance of PD. In predicting PD, "proportion under age 15 years" had an inverse relationship, while "life expectancy" had a positive relationship. However, correlational relationships do not imply that one factor causes the other. Children under age 15 years actually tend to utilize more medical services than the average person; thus it was expected that the larger the child population a country had, the more physicians were needed. The inverse relationship may have been due to a hidden third variable, such as a country's economics. Kwame found quantitative evidence on the relationship between certain … [15]. An inverse relationship has also been reported between birth rates and the economy, which is known to be positively correlated with physician density [16,17]. "Life expectancy", referring not only to the length but also to the quality of life, has been recognized as a standard measure in the world for measuring population health [18]. Therefore, it is no surprise that "life expectancy" was retained in the regression model.
Notes. Data are partitioned into two subsets in a cross-validation analysis. The "training set" is the subset used for analysis, and the "validation set" is used for validating the analysis.
The relationship between physician density and healthcare quality has long been a research focus. A shortage of physician manpower increases physician workloads and hampers patient safety. The criterion for determining the appropriate number of physicians is the level at which physicians can provide acute healthcare and guarantee patient safety in a hospital setting. For example, in pediatrics, patient care with safety consideration is the care delivered to all children who visit, stay or are born in a hospital, day and night, attended by health care providers under reasonable workloads. When the requirements of timeliness, effectiveness, efficiency, equitableness and patient-centered care [19] are added to patient safety, physicians' workloads often increase and a shortage of physician manpower emerges. It has been reported that heavy workloads and stress significantly impact patient care quality, physician performance, absenteeism, turnover and organizational performance [20]. Adequate physician manpower is one of the strategies to prevent physician burnout.
Early detection of physician shortage has been a great challenge. Soon after 2000, when physician supply was considered to exceed demand [21], physician shortages emerged in the United States and Canada [4]. Physician shortages have also been reported in other developed countries, such as Japan [22,23], Australia and Singapore [24]. These countries have increased student enrollment and established new medical schools to overcome the deficit. Unfortunately, correcting a physician shortage within a country takes an extended period of time, e.g., nearly 25 years in the United States.
The regression model in this study is derived from data of 130 countries around the world that reflects the … PD most in a country. Thus, the predicted PD would be better used as a warning sign rather than as an absolute number suggested for correction. A discrepancy between the predicted and the observed PD in a country indicates that physician manpower is either in surplus or in deficit, deviating from the norm of crowd behavior. A large negative discrepancy highlights the need to survey physicians' workloads and well-being, and to improve the quality of medical performance as well. There seems to be potential to improve health outcomes by increasing the number of physicians, especially now that physician shortages have become a global concern. However, as early as 1986, Perrin reported that simply increasing physician supply may not have much effect on healthcare quality [6]. To improve health status, benefits will also come from improving adherence to the standards of best medicine and from preventive efforts to diminish personal risk factors of disease (smoking, diet, and exercise). More focused efforts should also be made to improve physicians' competency, the services that they provide, and team resource management, rather than just increasing physician supply [6,25]. The above factors, being difficult to translate into "data" for analyses, may explain why some countries with a negative PD discrepancy had good health outcomes.
Table 4 The predicted and observed physician density (PD), continent, economic status, and analysis set in 130 countries, rank-ordered by the predicted-observed PD discrepancy

Conclusion
An appropriate size of physician workforce is vital to maintain a nation in good health. When facing the crisis of physician shortage, the correction of manpower always takes an extended time period, and thus both the public and health personnel suffer. The regression PD model, which provides information on how the observed PD deviates from the "norm", can be used to screen the appropriateness of physician manpower in a nation. To prevent damage to the healthcare system when a discrepancy appears between the observed and the predicted PD, we should examine not only the status of physician manpower, but also the physicians' workloads, the quality of medical performance, the physicians' well-being, the effectiveness of health promotion programs and team resource management.
TC established the research design, carried out the data collection, participated in data interpretation and composed the manuscript. ME did data analyses, revised the manuscript, and gave final approval. DF …
Figure 2 Relationship between discrepancy between the predicted and the observed PD (y axis) and observed physician density (x axis). The breakpoint for "above the norm" in physician density occurs at approximately 30 per 10,000.
Table 5 Comparison of physician density (per 10,000) discrepancies by continent, OECD status, and economics from analysis of covariance