Drivers of hospitalization cost after craniotomy for tumor resection: creation and validation of a predictive model

Background: The economic sustainability of all areas of medicine is under scrutiny. Limited data exist on the drivers of cost after a craniotomy for tumor resection (CTR). The objective of the present study was to develop and validate a predictive model of hospitalization cost after CTR.
Methods: We performed a retrospective study involving CTR patients who were registered in the Nationwide Inpatient Sample (NIS) database from 2005–2010. This cohort underwent 1:1 randomization to create derivation and validation subsamples. Regression techniques were used for the creation of a parsimonious predictive model.
Results: Of the 36,433 patients undergoing CTR, 14,638 (40.2%) underwent craniotomies for primary malignant, 9,574 (26.3%) for metastatic, and 11,414 (31.3%) for benign tumors. The median hospitalization cost was $24,504 (interquartile range (IQR), $4,265-$44,743). Common drivers of cost identified in the multivariate analyses included length of stay, number of procedures, hospital size and region, and patient income. The models were validated in independent cohorts and demonstrated final R² very similar to the initial models. The predicted and observed values in the validation cohort demonstrated good correlation.
Conclusions: This national study identified significant drivers of hospitalization cost after CTR. The presented model can be utilized as an adjunct in the cost containment debate and the creation of data-driven policies.
Electronic supplementary material: The online version of this article (doi:10.1186/s12913-015-0742-2) contains supplementary material, which is available to authorized users.


Background
The recent seismic changes in US healthcare are driven by the push for economic sustainability of the system [1,2]. Several value-based initiatives aim to minimize cost in areas of increased spending and promote rationalization of resource allocation [1]. Neurosurgical procedures are associated with significant risks and high hospitalization costs. Craniotomy for tumor resection (CTR) is one of the most common such procedures, and will be part of the cost containment debate. The estimation of the hospitalization cost for each individual CTR patient, and the identification of modifiable drivers of cost could allow physicians to understand the economic aspects of CTR, and modify their practice accordingly.
Future attempts at cost containment could focus on these factors, rather than follow an arbitrary path.
Several studies have analyzed the cost-effectiveness of different treatment modalities for brain tumors [3][4][5][6][7][8][9][10]. Others have examined the cost or charges of the hospitalization after CTR [11,12]. The latter have limited generalization since they are referring to single institutions or regional experiences, demonstrating significant selection bias. There is a paucity of national data on the hospitalization cost of patients undergoing CTR, the drivers of this cost, and predictive models at the level of the individual patient.
The Nationwide Inpatient Sample (NIS) [13] is an all-payer hospital discharge database that represents approximately 20% of all inpatient admissions to nonfederal hospitals in the United States. It allows the unrestricted study of the patient population in question. Using this database, several socioeconomic variables, as well as patient- and hospital-level factors associated with cost variability after CTR, were identified. Based on these data, a predictive model of cost after CTR was developed and validated in an independent cohort.

Nationwide Inpatient Sample (NIS) Database
All patients undergoing CTR who were registered in the Nationwide Inpatient Sample (NIS) [13] database (Healthcare Cost and Utilization Project, Agency for Healthcare Research and Quality, Rockville, MD) between 2005 and 2010 were included in the analysis. The NIS is an all-payer, prospective hospital discharge database that represents approximately 20% of all inpatient admissions to nonfederal hospitals in the US. More information about the NIS is available at http://www.ahcpr.gov/data/hcup/nisintro.htm. This database contains de-identified data (consents cannot be obtained), and has been deemed exempt from IRB approval.

Outcome variable
The primary outcome variable was the total hospitalization cost after CTR. Cost data were obtained by converting the hospital charges using the group-average cost-to-charge ratio for each hospital in the database. Both the group-average cost-to-charge ratio and the hospital charges are available in the NIS database. All costs were adjusted to their 2010 dollar value using the national consumer price index.
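The conversion described above amounts to two multiplications: charge times cost-to-charge ratio, then a price-index ratio to express the result in 2010 dollars. The following is a minimal sketch and not the authors' code; the function names, the example charge, the example ratio, and the index values are hypothetical placeholders. A real analysis would use the hospital-specific ratios distributed with the NIS and the published consumer price index series.

```python
# Illustrative only: names and numbers below are assumptions,
# not values taken from the NIS or from the study.

def charge_to_cost(total_charge, cost_to_charge_ratio):
    """Convert a billed hospital charge to an estimated cost."""
    return total_charge * cost_to_charge_ratio


def adjust_to_base_year(cost, index_discharge_year, index_base_year):
    """Express a cost in base-year dollars using a price-index ratio."""
    return cost * index_base_year / index_discharge_year


# Placeholder index values (NOT the real consumer price index).
PLACEHOLDER_INDEX = {2005: 200.0, 2010: 220.0}

# A hypothetical 2005 discharge billed at $100,000 in a hospital
# whose (hypothetical) group-average cost-to-charge ratio is 0.35.
cost_2005 = charge_to_cost(100_000.0, 0.35)
cost_2010_dollars = adjust_to_base_year(
    cost_2005, PLACEHOLDER_INDEX[2005], PLACEHOLDER_INDEX[2010]
)
print(round(cost_2005, 2), round(cost_2010_dollars, 2))
```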

Exposure variables
The association of the outcome with the pertinent exposure variables was examined in a multivariate analysis. Age was a continuous variable. Gender, race (African American, Hispanic, Asian, or other, with Caucasian being the reference value), insurance (private insurance, self pay, Medicaid, with Medicare being the reference value), and income (defined as the median income based on zip code; income was divided into quartiles, with the lowest quartile being the reference value) were categorical variables.
The hospital characteristics used in the analysis as categorical variables included hospital region (West, South, Midwest, with Northeast being the reference value), hospital location (urban teaching, urban non-teaching, with rural being the reference value), and hospital bed size (medium, large, with small being the reference value). More information on the definitions of the various categories of hospital characteristics can be found at http://www.hcup-us.ahrq.gov/db/vars/nis_stratum/nisnote.jsp.
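Each categorical variable above enters a regression as a set of 0/1 indicators, one per non-reference level, so that every coefficient is read relative to the reference category. The sketch below illustrates this coding for hospital region. The helper `dummy_code` is a hypothetical illustration and not part of the study's software (the analysis was run in SPSS).

```python
# Levels and reference category follow the text; the helper is illustrative.
REGION_LEVELS = ["Northeast", "West", "South", "Midwest"]


def dummy_code(value, levels, reference):
    """Return 0/1 indicators for every level except the reference level."""
    if value not in levels:
        raise ValueError(f"unknown level: {value!r}")
    return {level: int(value == level) for level in levels if level != reference}


south = dummy_code("South", REGION_LEVELS, reference="Northeast")
northeast = dummy_code("Northeast", REGION_LEVELS, reference="Northeast")
print(south)      # only the South indicator is set
print(northeast)  # all indicators are zero for the reference level
```

A hospital in the reference region is represented by all indicators being zero, which is why the reported effects for West, South, and Midwest are all comparisons with the Northeast.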

Statistical analysis
Continuous variables were presented as the mean and standard deviation or the median and interquartile range, whereas categorical variables were presented as percentages. Continuous variables were compared using t-tests or Mann-Whitney tests, and categorical variables were compared using Chi-square tests.
Initial analysis of cost data revealed significant positive skewness and kurtosis, and linear regression analysis using untransformed cost resulted in a heteroskedastic variance of errors. In order to achieve normality, the data were transformed using the natural logarithm (ln) transformation. Other transformations attempted included square root, cube root, and inverse transformation. These were ultimately not used because the ln transformation provided the best fit for the data. The ln transformation significantly improved the skewness and kurtosis of the distributions (skewness = 0.12, kurtosis = −0.049). Normality was also assessed using histograms and Q-Q plots. The distributions of length of stay (LOS), number of diagnoses (NDx), and number of procedures (NPx) demonstrated significant positive skewness and kurtosis as well, and were also ln transformed before the analysis to achieve normality.
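The effect of the ln transformation on skewness and kurtosis can be illustrated with a small sketch. It uses plain population-moment formulas, which is an assumption: statistical packages such as SPSS apply small-sample corrections, so their reported values can differ slightly. The cost values are made up for illustration and are not study data.

```python
import math


def _central_moment(values, order):
    """Population central moment of the given order."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** order for v in values) / len(values)


def skewness(values):
    """Moment-based skewness (0 for a symmetric distribution)."""
    return _central_moment(values, 3) / _central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Moment-based kurtosis minus 3 (0 for a normal distribution)."""
    return _central_moment(values, 4) / _central_moment(values, 2) ** 2 - 3.0


# Made-up, right-skewed cost values (each one double the previous).
costs = [1000.0, 2000.0, 4000.0, 8000.0, 16000.0, 32000.0, 64000.0]
ln_costs = [math.log(c) for c in costs]

print("raw:", skewness(costs), excess_kurtosis(costs))
print("ln: ", skewness(ln_costs), excess_kurtosis(ln_costs))
```

Because the toy values grow geometrically, their logarithms are evenly spaced and the skewness after transformation is essentially zero; real cost data would only approach symmetry.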
Our cohort was then randomized (1:1 randomization, in order to create two 50% sub-samples) to a derivation and a validation cohort. Subsequently, patients with missing values were removed from the cohort using listwise deletion. A parsimonious model was then developed in the derivation cohort by performing a stepwise linear regression including all the variables discussed previously. Dummy variables were created for nonbinary categorical variables. The level of significance used for retention in the model was 0.05. No collinearity was observed by assessing tolerance and variance inflation factor (VIF). The regression diagnostics performed were the coefficient of determination (R²) and analysis of the residuals. Normality of the distribution of residuals was verified with histograms (Additional file 1: Figures S1 and S2) and P-P plots (Additional file 1: Figures S3 and S4). Further diagnostics included scatter plots of the standardized predicted values versus the standardized residuals, which revealed a random, symmetric distribution of values very close to zero (Additional file 1: Figure S5), suggesting a linear fit to the data.
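The order of operations described above (random 1:1 split first, listwise deletion second) can be sketched as follows. This is an illustration under stated assumptions and not the authors' procedure as run in SPSS: the record layout, field names, seed, and toy values are all hypothetical.

```python
import random


def split_one_to_one(records, seed=0):
    """Randomly split records into two halves (derivation, validation)."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(records)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def listwise_delete(records, fields):
    """Drop every record that is missing a value in any analysis field."""
    return [r for r in records if all(r.get(f) is not None for f in fields)]


# Toy records; ln_los is missing for ids 0, 4, and 8.
patients = [
    {"id": i, "ln_cost": 10.0 + 0.1 * i,
     "ln_los": None if i % 4 == 0 else 1.0 + 0.05 * i}
    for i in range(10)
]

analysis_fields = ["ln_cost", "ln_los"]
derivation_raw, validation_raw = split_one_to_one(patients, seed=42)
derivation = listwise_delete(derivation_raw, analysis_fields)
validation = listwise_delete(validation_raw, analysis_fields)
print(len(derivation), len(validation))
```

Deleting incomplete records after the split means the two analysis cohorts need not be exactly equal in size, even though the randomization itself is 1:1.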
The model created in the derivation cohort was applied to the validation cohort, the R² was calculated, and residual analysis was performed. The predicted values for the validation cohort were plotted against the observed values and goodness of fit was assessed. No heteroskedasticity was observed. For reporting purposes, we back-transformed the data to demonstrate the percentage contribution of each variable to the cost value.
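The text does not spell out the back-transformation formulas. A common approach for a model with ln(cost) as the outcome, assumed here, is to report a coefficient β on a 0/1 indicator as a (e^β − 1) × 100 percent change in cost, and a coefficient on an ln-transformed predictor as the percent change in cost per 1% increase in that predictor. The sketch below implements these two conversions; the function names and coefficient values are illustrative and are not the fitted estimates.

```python
import math


def percent_change_for_indicator(beta):
    """Percent change in cost when a 0/1 indicator switches from 0 to 1,
    given a coefficient beta from a model of ln(cost)."""
    return (math.exp(beta) - 1.0) * 100.0


def percent_change_for_log_predictor(beta, percent_increase=1.0):
    """Percent change in cost for a given percent increase in a predictor
    that was itself ln transformed (log-log elasticity)."""
    return ((1.0 + percent_increase / 100.0) ** beta - 1.0) * 100.0


# Illustrative coefficients only (not taken from the study's Table 3).
beta_indicator = math.log(1.455)   # about 0.375
beta_ln_los = 0.5

print(percent_change_for_indicator(beta_indicator))
print(percent_change_for_log_predictor(beta_ln_los))
```

Under this assumption, a coefficient of roughly 0.375 on an indicator corresponds to a 45.5% higher cost, and a coefficient of 0.5 on ln(LOS) corresponds to roughly a 0.5% increase in cost per 1% increase in LOS.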
All probability values are the results of two-sided tests, and the level of significance was set at P < 0.05. Statistical analyses were performed using SPSS version 20 (IBM, Armonk, NY) and XLSTAT version 2013.6.02 (Addinsoft, New York, NY).

Patient characteristics
In the selected study period, 36,433 patients undergoing CTR were registered in the NIS (median age, 56.0 years; 53.3% female). Of these patients, 14,638 (40.2%) presented with primary malignant brain tumors, 9,574 (26.3%) with metastatic tumors, and 11,414 (31.3%) with benign tumors (Table 1). Following 1:1 randomization and subsequent listwise deletion, derivation and validation cohorts were created. Randomization resulted in no significant differences in exposure factors between these two subgroups (Table 1).

Model derivation
Several factors were retained in our parsimonious model after stepwise linear regression (Table 3). Hospitals in the West and Midwest (45.5% and 16.1% more, respectively, in comparison to the Northeast), African-Americans (3.9% more, in comparison to Caucasians), hydrocephalus (9.3% more), coagulopathy (8.4% more), post-operative neurologic complications (10.3% more), and higher income (6.2% more for the highest income quartile, in comparison to the lowest quartile) were associated with increased hospitalization cost. A 1% increase in LOS and in number of procedures was associated with a 0.5% and a 0.2% increase in cost, respectively. In contrast, hospitals in the South (5.7% less, in comparison to hospitals in the Northeast), private insurance coverage (4.0% less, in comparison to coverage by Medicare), urban non-teaching hospitals (5.7% less, in comparison to rural hospitals), and medium bed size (8.8% less, in comparison to small hospitals) were associated with decreased cost. Our model could explain a significant portion of the variance in cost, with an R² of 0.62.

Model validation
The model was validated in a random cohort of patients, and the final R² did not differ by more than 5% from the initial value (R² = 0.60). There was very good association of the predicted values with the observed values in the validation cohort (Figure 2) (Pearson's rho = 0.77, P < 0.001).
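The two validation quantities reported above, R² in the validation cohort and the correlation between predicted and observed values, can be computed as in the following sketch. The observed and predicted values are toy numbers on the ln(cost) scale, not study data, and reading "did not differ by more than 5%" as a relative difference between the two R² values is an assumption made for illustration.

```python
import math


def r_squared(observed, predicted):
    """Coefficient of determination of predictions against observations."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def pearson_correlation(x, y):
    """Pearson product-moment correlation coefficient."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def relative_change(derivation_value, validation_value):
    """Relative difference between derivation and validation statistics."""
    return abs(derivation_value - validation_value) / derivation_value


# Toy validation data on the ln(cost) scale (illustrative only).
observed_ln_cost = [9.0, 10.0, 11.0, 12.0]
predicted_ln_cost = [9.1, 9.9, 11.2, 11.8]

validation_r2 = r_squared(observed_ln_cost, predicted_ln_cost)
validation_r = pearson_correlation(observed_ln_cost, predicted_ln_cost)
print(validation_r2, validation_r)

# R-squared values reported in the text: 0.62 (derivation), 0.60 (validation).
print(relative_change(0.62, 0.60))
```

With the reported values, the relative difference is about 3.2%, which is consistent with the statement in the text under the relative-difference reading.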

Discussion
In this retrospective analysis of the NIS we developed a predictive model of hospitalization cost after CTR, and validated it in an independent cohort. The relative contribution of individual drivers of hospitalization cost after CTR was identified. In a nation that spent $2.4 trillion on health care in 2008 alone, expenditures are under increasing scrutiny. A major component of the overall economic burden of healthcare is the initial hospitalization cost [14], especially in the setting of expensive, high-risk procedures, such as CTR. Although regulatory bodies have set general targets for cost containment [15], their applicability to specific procedures is still vague. This is particularly challenging, given the limited literature on factors associated with hospitalization cost variability. Although some studies have described the regional cost after CTR [11,12], there has been no particular focus on the identification of drivers of hospitalization cost, or the prediction of its magnitude.
To address this, we identified and quantified factors associated with cost variability after craniotomy for tumor resection. The major contributor to the observed changes in cost was length of stay, after controlling for patient and hospital characteristics. Although this finding is not surprising, since more cost is incurred with longer hospitalization, its relative contribution to the overall cost has not been studied before. Despite LOS being a major target for cost containment, the focus should be on excessively lengthy hospitalizations, not justified by patient comorbidities. The comorbidities contributing to increased LOS, in the setting of CTR, have been identified in prior studies [16], and should be taken into account to avoid penalizing the care of sicker patients.
Several other factors were identified. Most importantly, location of the hospital was crucial in determining the cost after CTR. The effect of regional variation on healthcare spending is widely recognized across medical specialties [17,18]. Geographic and racial disparities reflect the efficiency of local healthcare delivery systems, and the practices of individual physician groups. Cultural characteristics, litigation environment, and established local practices guide these trends. These result in differential resource utilization, which rarely translates into improved outcomes but is associated with higher cost. Minimizing regional disparities could contribute to reduced spending [17,18]. With regard to CTR, it appears that the West and Midwest were associated with significantly higher hospitalization cost in comparison to the Northeast, whereas the South was associated with lower cost. Additionally, we quantified the association of number of procedures with increased cost. Higher income was associated with higher cost, possibly secondary to the utilization of more expensive hospitals by this population. Lastly, hydrocephalus, other postoperative neurologic complications, and coagulopathy were identified as the most significant comorbidities contributing to higher cost. The magnitude of these associations was described.
The proposed predictive model for hospitalization cost after CTR was created and validated in a statistically rigorous way. Particular attention was given to normalizing the distribution of the primary outcome and the continuous exposure variables in order to minimize errors in our regression analysis. In addition, residual analysis confirmed the linear fit of the data. The diagnostics demonstrated that in both cohorts a significant portion of the cost variation could be explained by the variables included in our regression model.
The model demonstrated good predictive ability in an independent validation cohort, with the predicted and observed values demonstrating good correlation. Although our model cannot account for the full extent of cost variation, since it is limited by the data available through NIS, this is a first step in the direction of healthcare economics at the national level. It can be utilized as an adjunct in the cost containment debate, and the creation of data-driven policies. Our model can fuel further studies in the field and provide elements for the design of prospective investigations.
The present study has limitations common to administrative databases. First, indication bias and residual confounding could account for some of the observed associations. The 1:1 randomization of the cohort, and the validation of the model in an independent cohort aimed to minimize this bias. Second, several coding inaccuracies can affect our estimates, as in other studies involving the NIS. In addition, the number of admission diagnoses depends on the coding accuracy for each case and is therefore subject to the same limitations, which are inherent to administrative data. Third, the NIS during the years studied did not include hospitals from all states [13]. However, HCUP creates the 20% sample in such a way that the hospitals included are still diverse with respect to size, region, and academic status. In addition, the structure of the NIS and the de-identification of the data do not allow longitudinal patient follow-up over time, and therefore readmissions cannot be studied. Fourth, we lack data on the degree of neurologic impairment of the brain tumor patients at presentation. Fifth, some data categories were not available for all patients. To avoid the introduction of further bias we excluded those patients from any analysis. Sixth, we recognized postoperative neurologic complications based on a single ICD-9 code (997.00), which does not allow the identification of specific subcategories of complications. Seventh, causality is very hard to establish based on ecologic data. Our target was different though, and was focused on the identification of drivers of cost and the creation of a predictive model for it.

Conclusions
The Nationwide Inpatient Sample (NIS) is a prospective all-payer, hospital discharge database that contains a representative sample of all inpatient admissions to nonfederal hospitals in the United States. By using this, several socioeconomic variables, as well as patient and hospital level factors associated with hospitalization cost variability after CTR were identified. Based on these data, a predictive model of cost after CTR was developed and validated in an independent cohort. Although the generalization of these predictions should be done with caution, the model can be utilized as an adjunct in the cost containment debate and the creation of data-driven policies. This can fuel further studies in the field and provide elements for the design of prospective investigations.

Availability of supporting data
All supporting data are provided within this manuscript, tables, figures, and supplemental files.

Additional file
Additional file 1: Table S1. Coding definitions. Table S2. The 10 most common procedures performed during hospitalization for our cohort. Figure S1. Histogram of the distribution of standardized residuals in the derivation cohort. Figure S2. Histogram of the distribution of standardized residuals in the validation cohort. Figure S3. P-P plot demonstrating the association of predicted and observed residuals in the derivation cohort. Figure S4. P-P plot demonstrating the association of predicted and observed residuals in the validation cohort. Figure S5. Scatter plot of the standardized regression residuals versus the standardized predicted values for the derivation cohort.