Early winners and losers in dialysis center pay-for-performance

Background We examined the association of dialysis facility characteristics with payment reductions and change in clinical performance measures during the first year of the United States Centers for Medicare & Medicaid Services (CMS) End Stage Renal Disease Quality Incentive Program (ESRD QIP) to determine its potential impact on quality and disparities in dialysis care. Methods We linked the 2012 ESRD QIP Facility Performance File to the 2007–2011 American Community Survey by zip code and dichotomized the QIP total performance scores—derived from percent of patients with urea reduction ratio > 65, hemoglobin < 10 g/dL, and hemoglobin > 12 g/dL—as ‘any’ versus ‘no’ payment reduction. We characterized associations between payment reduction and dialysis facility characteristics and neighborhood demographics, and examined changes in facility outcomes between 2007 and 2010. Results In multivariable analysis, facilities with any payment reduction were more likely to have longer operation (OR 1.03 per year), a medium or large number of stations (OR 1.31 and OR 1.42, respectively), and a larger proportion of African Americans (OR 1.25, highest versus lowest quartile), all p < 0.05. Most improvement in clinical performance was due to reduced overtreatment of anemia, a decline in the percentage of patients with hemoglobin ≥ 12 g/dL; for-profits and facilities in African American neighborhoods had the greatest reduction. Conclusions In the first year of CMS pay-for-performance, most clinical improvement was due to reduced overtreatment of anemia. Facilities in African American neighborhoods were more likely to receive a payment reduction, despite their large decline in anemia overtreatment.


Background
In 2012, the United States Centers for Medicare and Medicaid Services (CMS) reported outcomes for its End-Stage Renal Disease Quality Incentive Program (ESRD QIP), a pay-for-performance (P4P) program for dialysis facilities. The ESRD QIP is instructive as a case study of the ability of financial incentives to improve quality and affect disparities, and of the challenges of creating policy in changing clinical and policy contexts.
The ESRD QIP builds on prior efforts to improve the quality and value of ESRD care. In 2001, CMS began publicly reporting dialysis performance measures on Dialysis Facility Compare, a public report card of facility performance. In 2008, the US Congress passed the Medicare Improvements for Patients and Providers Act of 2008 (MIPPA), which established the ESRD QIP, a value-based purchasing or P4P program for Medicare that began in 2012 [1]. MIPPA also bundled payments to dialysis facilities starting in 2011 to reduce incentives to provide expensive erythropoietin stimulating agents (ESAs), given high ESA costs and evidence of potential harms of ESAs at high doses [2][3][4][5].
The ESRD QIP reduces Medicare ESRD payments to dialysis facilities whose total performance scores do not meet or exceed national standards. Facilities with a total performance score less than 26 (out of a possible 30) have their Medicare payments for dialysis services reduced on a sliding scale, ranging from 0.5% to a maximum of 2%. Total performance scores and payments for 2012 are based on the dialysis facilities' outcomes from 2010 for dialysis adequacy (percentage of patients with urea reduction ratio ≥ 65) and inadequate anemia management (both undertreatment, the percentage of patients on ESAs with hemoglobin (Hgb) < 10 g/dL, and overtreatment, the percentage of patients with Hgb > 12 g/dL).
Supporters of value-based purchasing or P4P programs contend that by measuring and reporting outcomes and linking them to reimbursement, dialysis facilities will have the data and motivation to improve their quality of care. Achieving targeted hemoglobin and dialysis adequacy is associated with decreased morbidity, mortality, and improved quality of life [6][7][8][9][10]. Similarly, randomized trials have shown that overtreatment of anemia (Hgb greater than 12 g/dL) in patients with chronic kidney disease is associated with increased mortality, partly due to increased erythropoietin [3]. In addition, because these outcomes are publicly reported online as part of Dialysis Facility Compare [11], consumers could make informed choices about dialysis facilities.
Critics counter that P4P might be ineffective in improving quality of care or, worse, have unintended negative consequences. P4P could be ineffective if the money at risk is not sufficient to encourage dialysis facilities to improve quality [12]. P4P could also exacerbate disparities in dialysis quality [13]. Dialysis facilities in neighborhoods with a higher proportion of African American residents have performed less favorably on quality indicators for anemia management and dialysis adequacy [14,15]. P4P could worsen disparities if dialysis facilities "cherry-pick" healthier patients, leaving patients with more medical or social disadvantage in low-performing centers. P4P could also worsen racial or socioeconomic disparities if poorly performing dialysis facilities are chosen less by well-informed, well-insured patients, receive less reimbursement, and improve less than their already high performing counterparts.
Thus we examined if quality improved and disparities changed under QIP. First we explored what dialysis facility and neighborhood characteristics are associated with total performance scores that led to payment reductions under QIP. Then we investigated changes in QIP's three clinical performance measures between 2007 and 2010.

Data
The Centers for Medicare and Medicaid Services (CMS) 2012 ESRD QIP Facility Performance File contains information on all CMS-certified dialysis facilities about anemia management and dialysis adequacy for 2010, the metrics used for P4P in fiscal year 2012. To obtain demographic information for the neighborhood where each facility is located, as well as for state and region, we geocoded each facility's street address to its census tract and linked the CMS ESRD QIP file to the 2007-2011 American Community Survey (ACS) five-year summary file. We excluded facilities that did not have complete QIP score information (n = 483, 8.7%). For facilities whose street address could not be geocoded (n = 70, 1.3%), we used the census tract based on the zip code XY centroid [16]. This study was deemed exempt by the IRB.

Outcomes and co-variates

Outcome variables
In the ESRD QIP, facilities were given a 1-10 rating for each of three criteria: the proportion of patients with Hgb less than 10 g/dL, the proportion of patients with Hgb greater than 12 g/dL (for the hemoglobin values, a lower proportion gives a better score), and the proportion of patients receiving adequate dialysis (urea reduction ratio [URR] > 65%). Per QIP, the scores were summed so that each facility received a score between 1 and 30, which was translated to a five-category measure of payment reduction for fiscal year 2012: (1) 0%, score 26-30; (2) 0.5%, score 21-25; (3) 1%, score 16-20; (4) 1.5%, score 11-15; and (5) 2%, score 10 or less.
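The score-to-payment mapping described above can be sketched in a few lines. This is a minimal illustration, not the CMS implementation: the function name `payment_reduction_percent` is hypothetical, and the cut points are taken directly from the five categories listed in the text.

```python
def payment_reduction_percent(total_score: int) -> float:
    """Map a QIP total performance score to the FY2012 payment reduction (%).

    Cut points follow the five categories given in the text; the function
    name and the range check are illustrative assumptions.
    """
    if total_score < 0 or total_score > 30:
        raise ValueError("total performance score must not exceed 30")
    if total_score >= 26:
        return 0.0   # category 1: no payment reduction
    if total_score >= 21:
        return 0.5   # category 2
    if total_score >= 16:
        return 1.0   # category 3
    if total_score >= 11:
        return 1.5   # category 4
    return 2.0       # category 5: score of 10 or less


if __name__ == "__main__":
    for score in (30, 26, 25, 16, 11, 10):
        print(score, payment_reduction_percent(score))
```

Because the tiers are defined on the absolute total score, a facility one point below 26 receives a reduction regardless of how much it improved.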
We based our outcome measures on the ESRD QIP. We reverse coded adequate dialysis to measure inadequate dialysis (URR < 65) so that the change between 2007 and 2010 would move in a consistent direction. We collapsed the 2% payment reduction into a category of 1.5% or more since there were few observations (n = 32). To determine which dialysis facility characteristics were associated with good and poor quality measures, we transformed the categorical payment reduction variable into two dichotomous outcome variables. Our first measure was 'any payment reduction' versus 'no payment reduction,' where any payment reduction represented a payment reduction percentage of 0.5% or more (coded as 1) and no payment reduction indicated 0% payment reduction (coded as 0). Our second dichotomous outcome was 'most payment reduction' versus 'other payment reduction,' where most payment reduction was defined as a payment reduction percentage of 1.5% or greater (coded as 1) and other payment reduction was defined as 1% or smaller (coded as 0).
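The two dichotomous outcomes amount to threshold rules on the payment reduction percentage. The following is a minimal sketch in Python for illustration only (the study's analyses were run in Stata); the function name `code_outcomes` and the dictionary keys are hypothetical.

```python
def code_outcomes(reduction_percent: float) -> dict:
    """Code the two binary outcomes described in the text.

    any_reduction:  1 if the reduction is 0.5% or more, else 0.
    most_reduction: 1 if the reduction is 1.5% or more (the 1.5% and 2%
                    categories collapsed), else 0.
    Names are illustrative and not taken from the study's code.
    """
    return {
        "any_reduction": int(reduction_percent >= 0.5),
        "most_reduction": int(reduction_percent >= 1.5),
    }


if __name__ == "__main__":
    for pct in (0.0, 0.5, 1.0, 1.5, 2.0):
        print(pct, code_outcomes(pct))
```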
An additional outcome was change over time of the clinical outcome. To assess direct effects of QIP on quality improvement, we compared 2007 and 2010 performance for each dialysis facility on each of the three QIP outcomes.
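To make the sign convention concrete, the sketch below reverse codes adequate dialysis and computes the paired 2007 to 2010 change for one facility. The function names and the example percentages are hypothetical; with all three measures expressed as "percent of patients with an undesirable result", a negative change indicates improvement.

```python
def inadequate_dialysis_percent(adequate_percent: float) -> float:
    """Reverse code: percent of patients with URR below the adequacy threshold."""
    return 100.0 - adequate_percent


def change_2007_to_2010(value_2007: float, value_2010: float) -> float:
    """Paired change for one facility; negative values mean the measure fell."""
    return value_2010 - value_2007


if __name__ == "__main__":
    # Hypothetical facility: 94% adequate dialysis in 2007, 96% in 2010.
    before = inadequate_dialysis_percent(94.0)
    after = inadequate_dialysis_percent(96.0)
    print(change_2007_to_2010(before, after))  # negative: fewer inadequately dialyzed
```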

Co-variates
We examined dialysis facility and neighborhood characteristics while accounting for ESRD network. Network refers to the 18 ESRD Networks, which are regional entities that contract with CMS and are responsible for organization, health planning, and quality improvement for ESRD care. Dialysis facility characteristics included profit status (i.e., for-profit versus non-profit), chain type (i.e., non-chain (independent), Chain 1, Chain 2, versus all other chains), facility size (i.e., the total number of dialysis stations at the dialysis facility by tertile), and years of operation (i.e., years since CMS certification). Chain membership is an organizational structure where a single firm owns a number of dialysis facilities, defined by USRDS as 20 or more freestanding facilities. Measures of dialysis facility neighborhood context were based on census tract. Census tract is the most commonly accepted proxy for neighborhood within social science and health services research [17]. Dialysis facility neighborhood-level characteristics included percentage of African American residents and percentage of population below the Federal Poverty Level to represent neighborhood socioeconomic context [14,15]. These variables were right-skewed; therefore, we divided them into quartiles to create categorical variables. All categorical variables (number of stations by tertile, percent African American by quartile, and percent poverty by quartile) were analyzed as indicator variables.
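The quartile coding described above can be sketched as follows. This is an illustration under stated assumptions: the text does not say how quartile cut points were computed, so the default method of Python's `statistics.quantiles` stands in for it, Quartile 1 is treated as the reference category, and the function name and example values are hypothetical.

```python
from statistics import quantiles


def quartile_indicators(values):
    """Return (cut_points, rows), one row of quartile dummies per value.

    Quartile 1 is the reference category, so only q2, q3, and q4 get an
    indicator column. The cut-point method is an assumption.
    """
    cuts = quantiles(values, n=4)  # three cut points separating four groups
    rows = []
    for v in values:
        # Quartile 1..4: one plus the number of cut points the value exceeds.
        q = 1 + sum(v > c for c in cuts)
        rows.append({f"q{k}": int(q == k) for k in (2, 3, 4)})
    return cuts, rows


if __name__ == "__main__":
    # Hypothetical right-skewed tract percentages (percent African American).
    pct_african_american = [0.5, 1, 2, 3, 5, 8, 20, 54]
    cuts, rows = quartile_indicators(pct_african_american)
    for value, row in zip(pct_african_american, rows):
        print(value, row)
```

Entering the quartiles as indicators rather than as a continuous percentage avoids assuming a linear relationship across a skewed distribution.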

Statistical analysis
Data were summarized using descriptive statistics. We used a generalized linear mixed effects model to examine associations between each outcome and dialysis facility characteristics (facility type, length of operation, total number of stations) and facility neighborhood demographics (percent African American and percent of population below the Federal Poverty Level by quartile). Each covariate of interest was evaluated separately in bivariable analyses; then they were examined simultaneously in multivariable analyses. We used logistic regression for binary outcomes (any payment reduction and no payment reduction); we used linear regression for paired data for change in clinical outcomes over time. All models included a random effect for ESRD network to account for clustering. We considered an interaction term between neighborhood demographics (proportion African American and proportion below poverty) for any payment reduction and large payment reduction because prior work showed different outcomes in African American and non-African American poor neighborhoods [15,18]. All analyses were conducted using Stata, version 12.0 (Stata Corp, College Station, TX).
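One way to write the mixed effects logistic model described above is given below. It is a sketch under stated assumptions: the notation is introduced here for illustration (i indexes facilities, j indexes ESRD networks, x_ij is the covariate vector, u_j is the network random effect), and a normally distributed random intercept is the usual default rather than something the text specifies.

```latex
\begin{align}
  \operatorname{logit}\,\Pr(Y_{ij} = 1)
    &= \beta_0 + \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + u_j,
    \qquad u_j \sim \mathcal{N}(0, \sigma_u^2), \\
  \mathrm{OR}_k &= \exp(\beta_k).
\end{align}
```

Under this form, each reported odds ratio is the exponentiated coefficient for its covariate. For example, if the effect of length of operation is log-linear, an OR of 1.03 per year corresponds to 1.03^10 ≈ 1.34 over ten years of operation.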

Baseline characteristics
Of 5089 CMS-certified dialysis facilities, the majority were for-profit (82.6%) and part of a chain (79.0%) (Table 1). The average dialysis facility had 18 stations and had been operating for 15 years, although there was wide variation. On average, facilities were located in neighborhoods that were 18.2% African American, though this distribution was right-skewed; mean proportion African American was less than 1% in Quartile 1 and 54% in Quartile 4. The facilities had a mean QIP score of 26.3, and 70.4% of facilities had 0% payment reduction.
Bi-variable and multi-variable association between any payment reduction and facility and neighborhood covariates
In bivariable analysis compared to other chains (

Bi-variable and multi-variable association between greatest payment reduction and facility and neighborhood covariates
Appendix Table 5 shows factors associated with being in the greatest payment reduction group (≥1.5%), the worst category. In bivariable analysis, facilities with longer length of operation ( , respectively) had lower odds of being in the greatest payment reduction group compared to other chains. There was no significant interaction between neighborhood demographics (proportion African American and proportion below poverty), nor between these variables and ESRD network (analysis not shown).
Table 3 shows mean change in clinical outcomes between 2007 and 2010 by facility and neighborhood. Across most categories, there was a small increase in the percent of patients with Hgb below 10 g/dL, which indicates anemia undertreatment. The change in Hgb < 10 g/dL differed significantly between the no payment reduction group, which showed a small reduction in the proportion with Hgb < 10 g/dL, and the payment reduction groups, which showed a successive increase in that proportion. In addition, most facilities saw a small decrease in the percent of patients with inadequate dialysis. The largest change in clinical outcomes from 2007 to 2010 was reduced anemia overtreatment, shown by a decrease in the proportion of patients with Hgb greater than 12 g/dL. Two large national chains had larger mean change compared to other chains or non-chains (independent) (−43.6 and −32.3 versus −30.9 and −27.4, respectively, p < 0.05).
Table 4 shows multivariable results of facility and neighborhood characteristics associated with clinical outcomes over time. After adjusting for other factors, chain type (i.e., Chain 1) was associated with clinical improvement across all categories. Similarly, neighborhoods with the greatest proportion of African Americans had significant improvement in two categories: a decrease in the percent of patients with Hgb > 12 g/dL and a decrease in the percentage of patients with urea reduction ratio less than 65.
An analysis (not shown) examining random effects within our two-level model demonstrated larger variation in our outcomes between facilities than between networks. For example, for urea reduction ratio less than 65 the network variance is 0.17 while the residual (facility) variance is 33.4. For Hgb > 12 g/dL, network variance is 0.03 and facility variance is 7.2; for Hgb < 10 g/dL, network variance is 18.4 and facility variance is 408.1. The residual variance for facility was larger than for network across all three outcomes.
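The variance components above can be summarized as the share of total variance attributable to network, an intraclass-correlation-style ratio. The study does not report this quantity; the sketch below is an illustrative calculation using only the variance components quoted in the text, and the function name `network_share` is hypothetical.

```python
def network_share(network_var: float, facility_var: float) -> float:
    """Fraction of total variance attributable to the network level."""
    return network_var / (network_var + facility_var)


if __name__ == "__main__":
    # (network variance, residual facility variance) as quoted in the text.
    components = {
        "URR < 65": (0.17, 33.4),
        "Hgb > 12 g/dL": (0.03, 7.2),
        "Hgb < 10 g/dL": (18.4, 408.1),
    }
    for outcome, (net, fac) in components.items():
        print(f"{outcome}: {network_share(net, fac):.1%} of variance at network level")
```

With these inputs the network share is below 5% for every outcome, which is the same pattern the text describes: most of the variation lies between facilities rather than between networks.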

Discussion
The first year of P4P for dialysis facilities has been a qualified success in the US. Several positive results are suggested by our data. The majority of dialysis facilities met CMS quality benchmarks related to dialysis adequacy and hemoglobin management and did not receive a payment reduction in 2012. However, total performance standards set for the 2012 payment year were modest at best. The measurement standards are based either on 2008 national benchmarks for each outcome or the actual 2007 results for each provider, whichever is lower. Thus, lower performing facilities were able to use their 2007 results for the baseline rather than being held to national performance standards.
Other clinical and regulatory forces were also at work. Announcements in 2008 about ESRD QIP and bundled payment, which reduced incentives to provide expensive erythropoietin stimulating agents (ESAs), occurred at the same time as evidence mounted of potential harms of ESAs at high doses [2,3,5]. High doses of ESAs to reach high hemoglobin levels were associated with greater risk for stroke, thromboembolic events, cardiovascular events, all-cause mortality and potentially cancer [2,3,5]. In 2007, the US Food and Drug Administration (FDA) issued a black-box warning for ESAs which called for using the lowest possible dose. As a result, many professional bodies changed their guidelines for ESA dosing for ESRD-related anemia [19][20][21]. Additionally, since 2001 CMS had been requiring dialysis facilities to monitor and report their outcomes. It is unclear whether the process of monitoring and reporting, or anticipating the potential financial penalties for QIP or bundled payment, led to improvement. Regardless of that uncertainty, a larger proportion of patients in dialysis facilities received guideline recommended care, which will likely lead to better clinical outcomes.
For-profit facilities were less likely to have received any payment reduction and were less likely to be in the largest payment reduction group. Those successes were largely due to two large national chains that represent more than half of US dialysis facilities. These large chains may have been better able to develop and disseminate effective clinical monitoring systems and clinical protocols. Our work found that the largest improvement was reduction in anemia overtreatment, and this change was significantly larger for Chain 1, a large for-profit national chain. Chain 1 also had small improvements in the two other categories (anemia undertreatment and dialysis adequacy). In prior work, for-profit facilities were associated with anemia overtreatment, higher than recommended hematocrit, and larger doses of erythropoietin-stimulating agents (ESAs), even after the FDA black box warning about ESAs' harms at high doses [22,23]. Adherence to clinical guidelines for ESAs, anticipation of bundled payments to facilities which now include ESAs, and QIP may have all contributed to this decline in anemia overtreatment [3,5,24,25]. However, this result demonstrates how aligning payments (both bundled payments and ESRD QIP) with desired clinical outcomes can motivate capable organizations to change their practices.
When evaluating whether ESRD QIP increases disparities in quality, the evidence is also mixed. Facilities in predominately African American communities showed the greatest reduction in percent of patients overtreated for anemia and a more modest increase in percent of patients with adequate dialysis. Despite these improvements, in absolute terms, facilities in predominately African American neighborhoods still had lower total performance scores. ESRD QIP rewards absolute performance rather than relative improvement. Thus, the likelihood of a facility receiving any payment reduction increased as proportion of African Americans in the neighborhood increased, even after controlling for neighborhood poverty. Our work is consistent with the growing body of literature that recognizes neighborhood context (i.e., poverty or racial composition) as a contributor to disparities in quality of care for patients with ESRD [14,18,26]. The challenge of ESRD QIP, similar to other P4P programs, is how to incentivize quality improvement without widening quality or payment disparities [27,28].
Efforts are underway to determine if and how to account for social factors that may lead to poor health care outcomes (e.g., low education, racial minority status, residence in a disadvantaged neighborhood) in P4P programs without accepting lower quality for individuals in those groups [28]. Two distinct methods, risk adjustment and payment adjustment, attempt to measure and improve quality while taking social factors into account to avoid unfairly penalizing particular providers or institutions.
Risk adjustment determines what clinical and social factors should be accounted for when reporting outcomes; that is, the quality benchmarks may differ, but payment for reaching one's designated benchmark is the same. Payment adjustment sets the same quality benchmark for every institution, but adjusts payment based on accounting for clinical or social factors. The payments can be risk adjusted, increased after accounting for social factors, or can directly fund programs to improve the quality of care for disadvantaged patients.
Our analyses have limitations. First, our study was observational. While our work shows that facility and neighborhood variation exists in quality of care, the etiology of this variation is still unclear. Further work is needed to examine how staff attitudes and practices, patient characteristics, and facility-level processes vary by high- and low-performers as well as by facility type, neighborhood, and region. In addition, our measures of dialysis facility neighborhood context were limited to census tracts. Dialysis facilities may be affected by patients and employees from a larger catchment area, and our outcomes may be more dependent on larger community infrastructure and resources [29]. In addition, patients in dialysis facilities may have greater disadvantage than the surrounding area as measured by percent African American and poverty [14]. Finally, we accounted for the effect of ESRD network overall, but we did not examine payment reductions by specific ESRD networks, states, or counties. While variation within these smaller areas may exist, we found that a greater proportion of variation in facility outcomes was explained by facility-level characteristics rather than region.

Conclusions
Pay-for-performance has arrived for dialysis facilities in the US. Centers are likely reassured that this policy change is not draconian; the majority of centers received no payment reduction. CMS policy makers are likely pleased at the improvement in anemia management. Most of the improvement appeared to be due to reduced anemia overtreatment without a large increase in anemia undertreatment. However, reports of the early successes of ESRD QIP must be tempered with caution. The large improvement in anemia management occurred with the concomitant incentives of P4P, anticipation of bundled payment for ESAs, and mounting clinical evidence of the potential harms of ESAs. In addition, facilities in largely African American communities still fared worse in this new era of P4P, despite significant quality improvement on dialysis adequacy and anemia overtreatment. In subsequent years, disparities may widen in the ESRD QIP as CMS increases the number of clinical outcomes measured and raises the thresholds that need to be met [30]. Outcomes must continue to be monitored to ensure that the program effectively improves quality of care for all patients with ESRD regardless of type of facility or neighborhood.