Appendix A
Statistical methods for predictive models
The aim of this analysis was to evaluate the functional form of the effect of lead time and to test its relationship with empaneled providers, while respecting the hierarchical structure of the data: patients were nested within clinics and came from different boroughs of New York City. This was achieved by fitting a set of hierarchical generalized linear models and generalized additive models and comparing their out-of-sample predictive accuracy (OSSPA) using five-fold cross-validation. Two measures of OSSPA were used: classification rate and the Gini coefficient. The Gini coefficient is a signal-detection measure of how well a model discriminates between two outcomes, where chance performance gives a coefficient of 0 and perfect discrimination gives a coefficient of 1; it is directly linked to the AUROC, such that Gini = 2*AUROC - 1. Analyses were conducted in R (v 3.6.1). Generalized linear models were fitted with the lme4 package (v 1.1-21) and generalized additive models with mgcv (v 1.8-26). Before analysis, patients who lived outside of NYC were excluded, as were patients from Staten Island, as there were too few of the latter to obtain reliable coefficient estimates.
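The Gini-AUROC identity can be verified directly in base R. The function below is an illustrative sketch (not part of the analysis code): it computes the AUROC via the rank-sum (Mann-Whitney) identity and converts it to a Gini coefficient.

```r
# Illustrative helper: Gini coefficient from labels and scores, base R only.
# y is a 0/1 outcome vector, p a vector of predicted probabilities or scores.
gini_from_scores <- function(y, p) {
  pos <- p[y == 1]
  neg <- p[y == 0]
  # AUROC = probability that a random positive outranks a random negative,
  # counting ties as one half.
  auc <- mean(outer(pos, neg, function(a, b) (a > b) + 0.5 * (a == b)))
  2 * auc - 1  # Gini = 2 * AUROC - 1
}

gini_from_scores(c(0, 0, 1, 1), c(0.1, 0.2, 0.8, 0.9))  # perfect separation: 1
gini_from_scores(c(0, 1, 0, 1), rep(0.5, 4))            # chance-level scores: 0
```

This matches the scaling described above: chance performance maps to 0 and perfect discrimination to 1.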
The eight fitted models were as follows:
Logistic Models.
M0: Predicts no-show status from a global intercept only.
M1: Predicts no-show status from random intercepts for healthcare facility and for borough.
M2: Retains the random intercepts and adds linear fixed effects for lead time and empaneled provider.
M3: Retains the random intercepts and adds random slopes for lead time and empaneled provider (varying by both healthcare facility and borough).
M1 provided a modest but reliable improvement in OSSPA compared to M0; inspection of the variance decomposition suggested that this improvement was almost entirely driven by variation in intercepts across healthcare facilities. M2 provided a large improvement relative to M1, but M3 had slightly lower OSSPA than M2. In other words, the linear models provided strong evidence that both lead time and empaneled provider predicted no-show, but no evidence that these effects differed between healthcare facilities or boroughs, and the general weight of evidence suggested that a patient's home borough did not meaningfully predict no-show. For this reason, we focused on exploring non-linear effects based on healthcare facilities (though we retained random intercepts for boroughs, to ensure comparability with the generalized linear models).
Generalized Additive Models.
M4: As M2 but rather than assuming a linear effect for lead time, the effect of lead time was estimated with a thin plate spline.
M5: As M4 but lead time was now allowed to interact with empaneled provider, testing whether the effect of lead time was influenced by provider type.
M6: As M4 but allowing the effect of lead time to vary by medical facility, with a shrinkage parameter that limits between-facility variation of the fitted function. This is conceptually similar to a random effect in the GLM framework, where variation between clusters is allowed but constrained towards a general trend.
M7: As M6 but without a shrinkage parameter, meaning the function of lead time was fitted independently for each facility.
Though the raw data for these analyses cannot be shared because of patient privacy concerns, the annotated R code for these analyses is supplied below, to allow other public health researchers to run similar analyses or extend this paradigm to other datasets and problems.
### R code and results for the predictive models and five-fold cross-validation appear in the same typeface below; this should be noted when testing reproducibility.
### Loading packages.
library(caret)
library(glmnet)
library(pROC)
library(lme4)
library(mgcv)
### Session info.
sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17763)
Matrix products: default
locale:
[1] LC_COLLATE=English_United Kingdom.1252
[2] LC_CTYPE=English_United Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252
attached base packages:
[1] tools stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] lme4_1.1-21 pROC_1.15.3 glmnet_2.0-16 foreach_1.4.4
[5] Matrix_1.2-17 caret_6.0-83 ggplot2_3.1.1 lattice_0.20-38
[9] R.utils_2.8.0 R.oo_1.22.0 R.methodsS3_1.7.1 mgcv_1.8-26
loaded via a namespace (and not attached):
[1] Rcpp_1.0.1 nloptr_1.2.1 pillar_1.3.1 compiler_3.6.1
[5] gower_0.2.0 plyr_1.8.4 iterators_1.0.10 class_7.3-15
[9] boot_1.3-20 rpart_4.1-15 ipred_0.9-8 lubridate_1.7.4
[13] tibble_2.1.1 gtable_0.3.0 nlme_3.1-139 pkgconfig_2.0.2
[17] rlang_0.3.4 prodlim_2018.04.18 stringr_1.4.0 withr_2.1.2
[21] dplyr_0.8.0.1 generics_0.0.2 recipes_0.1.5 stats4_3.6.1
[25] nnet_7.3-12 grid_3.6.1 tidyselect_0.2.5 data.table_1.12.2
[29] glue_1.3.1 R6_2.4.0 survival_2.44-1.1 minqa_1.2.4
[33] lava_1.6.5 reshape2_1.4.3 purrr_0.3.2 magrittr_1.5
[37] ModelMetrics_1.2.2 splines_3.6.1 MASS_7.3-51.3 scales_1.0.0
[41] codetools_0.2-16 assertthat_0.2.1 timeDate_3043.102 colorspace_1.4-1
[45] stringi_1.4.3 lazyeval_0.2.2 munsell_0.5.0 crayon_1.3.4
### Setting seed to ensure reproducibility.
set.seed(278)
### Creating the folds.
Folds = groupKFold(rownames(df), k = 5)
### Specifying the generalised linear models.
Model_0 = "NoShow ~ 1" # Null model: predicts no-shows from a global intercept.
Model_1 = "NoShow ~ 1 + (1|Bourough) + (1|Facility)" # Hierarchical null model: allows the intercept to vary by borough and facility.
Model_2 = "NoShow ~ 1 + Visit_with_empanel_prov + lead_time + (1|Bourough) + (1|Facility)" # Introduces empanelled provider and lead time as fixed-effect predictors.
Model_3 = "NoShow ~ 1 + Visit_with_empanel_prov + lead_time + (Visit_with_empanel_prov + lead_time|Bourough) + (Visit_with_empanel_prov + lead_time|Facility)" # Allows the empanelled-provider and lead-time effects to vary by facility and borough.
### Fitting the generalised linear models.
fit_list = list()
gini_list = NULL
sensitivity_list = NULL
specificity_list = NULL
accuracy_list = NULL
for (i in 1:5){
### Specify the training and test subset for this fold.
Training = unlist(Folds[i], use.names = F)
Test = setdiff(rownames(df), Training)
### Fit the model on the training data.
fit = glm(Model_0, data = df[Training,], family = "binomial") # Replace Model_0 with the desired model; note that Model_1 to Model_3 contain random effects and must be fitted with lme4::glmer() rather than glm().
### Store the fit object for later inspection.
fit_list[[i]] = fit # List indexing: append() would flatten the model object into its components.
### Evaluate the OOS performance of the model, and store the metrics.
OOS_Y = as.numeric(df[Test, "NoShow"])
OOS_PRED = rep(round(mean(df[Training, "NoShow"])), length(df[Test, "NoShow"])) # For the intercept-only model, predict the rounded training-set base rate; for models with predictors, use predict() on the test data instead.
OOS_roccurve = roc(OOS_Y ~ OOS_PRED)
OOS_gini = 2*auc(OOS_roccurve)-1
gini_list = append(gini_list, OOS_gini)
OOS_sensitivity = mean(OOS_Y[OOS_Y==1] == OOS_PRED[OOS_Y==1])
sensitivity_list = append(sensitivity_list, OOS_sensitivity)
OOS_specificity = mean(OOS_Y[OOS_Y==0] == OOS_PRED[OOS_Y==0])
specificity_list = append(specificity_list, OOS_specificity)
OOS_accuracy = mean(OOS_Y == OOS_PRED)
accuracy_list = append(accuracy_list, OOS_accuracy)
}
### Specifying the generalised additive models.
Model_4 = "NoShow ~ Visit_with_empanel_prov + s(lead_time, bs='tp') + s(Bourough, bs='re') + s(Facility, bs='re')" # Allows lead_time to have a fixed non-linear effect, estimated with a thin plate spline.
Model_5 = "NoShow ~ Visit_with_empanel_prov + s(lead_time, by=Visit_with_empanel_prov, bs='tp') + s(Bourough, bs='re') + s(Facility, bs='re')" # Includes a fixed-effects interaction between lead time and empanelled provider (as well as the main effects).
Model_6 = "NoShow ~ Visit_with_empanel_prov + s(lead_time, bs='tp') + s(Bourough, bs='re') + s(lead_time, Facility, bs='fs')" # Includes a non-linear random effect for lead_time (a factor-smooth that shrinks facility-level curves towards a common trend), a fixed effect for empanelled provider, and random intercepts for borough and facility.
Model_7 = "NoShow ~ Visit_with_empanel_prov + s(lead_time, by=Facility, bs='tp') + s(Bourough, bs='re') + s(Facility, bs='re')" # Fits the lead_time smooth independently for each facility, unconstrained by hyperparameters (i.e. with no assumption that the effect of lead time is in any way consistent across facilities), plus a fixed effect for empanelled provider and random intercepts for borough and facility.
### Fitting the generalised additive models.
fit_list = list()
gini_list = NULL
sensitivity_list = NULL
specificity_list = NULL
accuracy_list = NULL
for (i in 1:5){
### Specify the training and test subset for this fold.
Training = unlist(Folds[i], use.names = F)
Test = setdiff(rownames(df), Training)
### Fit the model on the training data.
fit = gam(as.formula(Model_4), data = df[Training,], method = "REML", family = "binomial") # Replace Model_4 with the desired model; as.formula() converts the model string into the formula object gam() expects.
### Store the fit object for later inspection.
fit_list[[i]] = fit
### Evaluate the OOS performance of the model, and store the metrics.
OOS_Y = as.numeric(df[Test, "NoShow"])
new = df[Test, c("Bourough", "Facility", "lead_time", "Visit_with_empanel_prov")]
OOS_PRED = predict(fit, type = "response", newdata = new)
OOS_roccurve = roc(OOS_Y ~ OOS_PRED)
OOS_gini = 2*auc(OOS_roccurve)-1
gini_list = append(gini_list, OOS_gini)
OOS_sensitivity = mean(OOS_Y[OOS_Y==1] == round(OOS_PRED[OOS_Y==1]))
sensitivity_list = append(sensitivity_list, OOS_sensitivity)
OOS_specificity = mean(OOS_Y[OOS_Y==0] == round(OOS_PRED[OOS_Y==0]))
specificity_list = append(specificity_list, OOS_specificity)
OOS_accuracy = mean(OOS_Y == round(OOS_PRED))
accuracy_list = append(accuracy_list, OOS_accuracy)
}
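After either cross-validation loop, the fold-level metrics can be summarised across folds. A minimal sketch, using hypothetical fold values in place of the study's results:

```r
# Illustrative summary of fold-level metrics; gini_list and accuracy_list
# here are stand-in vectors with hypothetical values, not the study's results.
gini_list     <- c(0.41, 0.39, 0.44, 0.40, 0.42)
accuracy_list <- c(0.71, 0.69, 0.72, 0.70, 0.71)

# Mean and standard deviation across the five folds.
cv_summary <- function(x) c(mean = mean(x), sd = sd(x))
round(rbind(Gini = cv_summary(gini_list),
            Accuracy = cv_summary(accuracy_list)), 3)
```

Reporting the fold-to-fold standard deviation alongside the mean gives a sense of how stable each model's OSSPA is.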
Appendix B
Cost analysis
To build a complete cost profile, five variables were included in the model: (a) costs incurred and payments received per encounter, (b) no-show status per appointment, (c) appointment length, (d) payer types, and (e) visit dates. Hourly wages paid to staff members in clinics were used to estimate the staff costs associated with each appointment.
To quantify the financial impact and present the potential cost savings of reducing no-shows in this FQHC setting, available cost data and patient information on the 63,842 individuals were analyzed to retrieve precise estimates. The cost profile was based on an itemization of particular cost factors combined with assumptions on how costs should be derived. A decision tree model (see Fig. 4), which allows for the reconstruction of scenarios based on payer category (Medicaid, Medicare, Medicaid Managed Care (MMC), Private Insurance and Uninsured), appointment type (high (H), medium (M) or low (L) cost) and no-show status (Show or No-show), was developed. The decision tree model incorporates probabilities and costs for the various scenarios.
To quantify the financial impact of no-shows on the healthcare clinic and present the impact on total revenues after decreasing no-show rates, reductions of no-shows in the given setting were simulated, and the revenues for attended encounters (shows) (Rae) and for no-shows (Rns) were calculated based on the decision tree model. The following equations were used, respectively, including one additional variable (Cns) to account for the costs incurred in the case of a no-show:
Rae = (PP + NP - C) * X
Rns = (PP + NP - (C + Cns)) * X
where Rae = revenue for attended encounters, Rns = revenue for no-shows, PP = prospective payment, NP = net payment, C = staff costs, Cns = no-show costs, and X = number of appointments.
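As an illustration, the two revenue equations can be evaluated directly; the input values below are hypothetical placeholders, not the study's cost figures.

```r
# Worked example of the revenue equations with hypothetical inputs.
PP  <- 80    # prospective payment per visit (illustrative)
NP  <- 15    # net payment per visit (illustrative)
C   <- 25    # staff cost per appointment (illustrative)
Cns <- 30    # additional cost incurred by a no-show (illustrative)
X   <- 100   # number of appointments (illustrative)

Rae <- (PP + NP - C) * X          # revenue for attended encounters
Rns <- (PP + NP - (C + Cns)) * X  # revenue once no-show costs are incurred
c(Rae = Rae, Rns = Rns)           # Rae = 7000, Rns = 4000
```

The gap between Rae and Rns (here Cns * X) is what shrinks as simulated no-show rates fall.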
Figure: Decision tree model of the costs associated with no-shows across appointment and payer categories. Categories are mutually exclusive; totals sum to 101% because of rounding.
The five most common payment types for the participating health centers are Medicaid, Medicare, Medicaid Managed Care (MMC), Private Insurance and Uninsured/Self-Pay. MMC represents various insurance arrangements, combined here for simplicity. Appointment types H, M and L (high/medium/low cost) occur with probabilities of 20%, 53% and 27% within each payer type. Show and no-show probabilities are 56.93% and 43.07% for each appointment type.
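These branch probabilities can be combined into an expected revenue per payer type by weighting each appointment-type by show-status branch of the tree. A sketch in which only the probabilities are taken from the text; the per-branch revenues are hypothetical placeholders:

```r
# Branch probabilities quoted above (appointment type and show status).
p_appt <- c(H = 0.20, M = 0.53, L = 0.27)
p_show <- c(Show = 0.5693, NoShow = 0.4307)

# Hypothetical net revenue for each appointment-type x show-status branch.
revenue <- rbind(H = c(Show = 120, NoShow = -40),
                 M = c(Show =  90, NoShow = -30),
                 L = c(Show =  60, NoShow = -20))

# Joint branch probabilities, and the expected revenue for one payer type.
p_joint <- outer(p_appt, p_show)   # 3 x 2 matrix; entries sum to 1
expected_revenue <- sum(p_joint * revenue)
```

Repeating this per payer category and summing, weighted by payer mix, reproduces the tree-level expectation used in the cost model.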
In order to account for inaccuracies in the estimation of cost values, probabilistic and deterministic sensitivity analyses were conducted. All analyses and simulations were run in R.
Cost Analysis.
FQHCs operate under a unique reimbursement scheme, allowing them to provide services to patients regardless of their ability to pay. Depending on the payer category (Medicaid, Medicare, Medicaid Managed Care (MMC), Private Insurance), the clinic receives different payments, including a flat payment rate per visit. Uninsured patients without the ability to pay are covered through federal grants, received annually by the health center. The costs per appointment differ depending on the types of appointments, the length of the appointment, as well as the specialty of the physician. Our findings suggest that the marginal cost of a no-show at an FQHC, considering all payer and appointment types, is $29.82, and the average revenue per patient is $89.66.