Comparative evaluation of hospital performance is a useful tool for improving health care quality [1, 4]. The U.S. experience has shown that public disclosure of comparative evaluation results should be managed as one component of an integrated quality improvement strategy, and that the public release of performance data is most effective at the level of the provider organization [3, 4, 37]. Regular feedback seems to increase the accountability of providers, which are sensitive to public image and potential legal risks; it can also spur quality improvement activities in health care organizations, especially when underperforming areas are identified. However, providers that are identified as poor performers are more likely to question the validity of the data, particularly when the results are first released.
In Italy, initiatives aimed at assessing the outcomes of hospital care have been undertaken at the national and regional levels only in the last decade [13, 17, 40, 41]. Building on these experiences, we developed the Regional Outcome Evaluation Program, P.Re.Val.E. The large number of patients investigated, the accuracy in the selection of the cohorts and the study outcomes, the consolidated statistical strategy, and the replication of similar findings for different clinical conditions are important elements of internal and external validity. For 2006-2009, results were obtained using direct risk adjustment for the comparative evaluation of outcomes across hospitals and areas of residence. Hospital league tables obtained by indirect standardization procedures should not be used for hospital-to-hospital comparisons: this technique can lead to biased conclusions when the distribution of risk factors, or their effects, varies between the hospitals being compared.
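The distinction between the two standardization approaches can be illustrated with a minimal numerical sketch; all stratum definitions, weights, and counts below are hypothetical and are not taken from the program:

```python
# Direct vs. indirect standardization for hospital mortality comparison.
# Illustrative two-stratum (age group) example; every number is hypothetical.

# Reference (regional) population: stratum weights and stratum-specific rates.
reference_weights = {"age<75": 0.6, "age>=75": 0.4}
reference_rates = {"age<75": 0.02, "age>=75": 0.10}

# One hospital's observed data: admissions and deaths per stratum.
hospital = {
    "age<75": {"n": 100, "deaths": 3},
    "age>=75": {"n": 400, "deaths": 44},
}

# Direct standardization: apply the hospital's stratum-specific rates
# to the reference population's stratum weights, so every hospital is
# evaluated against the same case mix.
directly_standardized_rate = sum(
    reference_weights[s] * hospital[s]["deaths"] / hospital[s]["n"]
    for s in hospital
)

# Indirect standardization (SMR): observed deaths divided by the deaths
# expected if the reference rates applied to the hospital's OWN case mix.
observed = sum(v["deaths"] for v in hospital.values())
expected = sum(reference_rates[s] * hospital[s]["n"] for s in hospital)
smr = observed / expected

print(f"directly standardized rate: {directly_standardized_rate:.3f}")
print(f"SMR: {smr:.2f}")
```

Because each SMR is weighted by that hospital's own case mix, two hospitals' SMRs are not mutually comparable unless their risk-factor distributions, and the effects of those factors, coincide, which is the source of the bias noted above.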
After receiving comparative reports based on standardized performance measures, hospitals that began as low-level performers tended to improve faster than those that started at higher levels of performance.
Since studies have shown little correlation between measured quality of care and Standardized Mortality Ratios (SMRs) [43, 44], we used different outcome and/or process measures for each studied condition to better identify the hospitals/geographic areas that needed health care quality improvement. This study used data from several health care information systems, and most of the indicators were based on the concept of first hospital access, corresponding to patient admission to an acute inpatient facility or to emergency department access. The indicators for which time to death or to surgery was calculated from first hospital access provide a measure of the appropriateness and efficacy of the health care process that begins when a patient arrives at a given facility. Use of the MIS and EIS databases, in addition to the HIS database, allowed more accurate identification of 30-day mortality.
To monitor the time trends of the different outcomes, we used VLAD charts, a type of quality control chart well suited to monitoring the occurrence of an event over time while adjusting for patient risk [34, 35, 45, 46]. VLAD charts provide an easy-to-understand and up-to-date view that allows early detection of runs of good or bad outcomes and thus can prompt timely intervention for critical situations. The VLAD charts also highlight small variations over time in observed events compared to expected events; this information is often obscured by the corresponding synthetic indicator.
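A VLAD curve is the running sum, over consecutive patients, of each patient's model-predicted risk of the adverse event minus the observed outcome (1 if the event occurred, 0 otherwise). The following sketch uses entirely hypothetical risks and outcomes to show the construction:

```python
from itertools import accumulate

# VLAD (variable life-adjusted display): cumulative expected-minus-observed
# adverse events over consecutive patients. Hypothetical data: each tuple is
# (predicted 30-day death risk from the risk-adjustment model, observed death).
patients = [
    (0.05, 0), (0.20, 1), (0.10, 0), (0.30, 0), (0.15, 1),
    (0.05, 0), (0.25, 0), (0.10, 0), (0.40, 1), (0.05, 0),
]

# Each step adds (expected - observed): the curve drifts up when patients
# survive (lives "gained" relative to expectation) and drops after a death.
vlad = list(accumulate(p - y for p, y in patients))

for i, v in enumerate(vlad, start=1):
    print(f"patient {i:2d}: VLAD = {v:+.2f}")
```

A sustained downward run signals more adverse events than the risk model predicts and can trigger internal review well before the corresponding annual indicator is published.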
The P.Re.Val.E. results for 2008-2009 provided an overview of the hospital care heterogeneity in the Lazio region. The results do not constitute a "league table" of performance, but instead reveal numerous instances of high-quality care as well as problem areas that merit further analysis and internal and external auditing as part of an increasingly well-developed program of clinical governance.
P.Re.Val.E. is an outcome research program conceived mainly as a tool for promoting discussion among healthcare managers and professionals in the Lazio region. Given the complexities of accurately comparing provider outcomes, we published the methods used for developing the program in detail, following others' suggestions for the public reporting of comparative health outcome evaluations, so that the face validity of the results could be evaluated. We also used various tools to present the results and make them accessible; in particular, we used bar graphs to clearly display adjusted estimates, as well as tables reporting the number of admissions, the crude and adjusted estimates, and the statistical significance of the results.
Public disclosure of the 2006-2009 results to clinicians, health care managers, and policy makers was aimed at creating an incentive to improve results. In fact, studies describing the effect of public reporting on consumers' choices, effectiveness, patient safety and patient-centeredness have shown that the public release of performance data mainly stimulates change at the hospital level. There have been some negative reactions, but in most cases, the results have stimulated increased review among health care organizations and professionals. Clinicians and program developers have met several times to discuss the methodology as well as negative findings, such as poor performance or poor coding accuracy. Physicians have made suggestions about more accurate selection criteria for some indicators. As a consequence of the extremely low proportion of hip fracture operations in the elderly performed within 48 hours in most facilities, the regional authority decided that hospitals with performance results below a given standard would be penalized economically by a reduction in fees corresponding to specific Diagnosis Related Groups (DRGs).
P.Re.Val.E. is a program in progress, and it will be updated and further developed by the definition and calculation of additional indicators, e.g., those aimed at evaluating health care quality for oncology patients, and by the use of regional drug dispensing registries to more accurately identify patient comorbidities. The impact of health care performance information disclosure to the general public should be evaluated. Evidence suggests that this information has only a limited impact on consumer decision-making since people have limited access to data on health care providers; however, studies suggest that people are interested in comparative information [51, 52].
These analyses have explicit limitations, especially with regard to the marked variability in the coding accuracy of current health care information systems. This issue is critical for ensuring accurate risk adjustment and, correspondingly, reliable comparative quality ratings. In the past, administrative databases have too frequently been used exclusively as tools to claim financial reimbursement for services provided, without concern for their roles as epidemiologic sources and as essential instruments for clinical governance. Routinely collected administrative data have some important advantages: they are inexpensive to collect, provide information about large populations, do not depend on voluntary participation by individual clinicians and providers, and can be used to predict risk of death with discrimination comparable to that obtained from clinical databases. However, the use of routinely collected administrative data in comparative outcome evaluations has been criticized for the following reasons: there is an absence of clinical information needed to adequately adjust for patients' conditions [55, 56]; there is an inability to distinguish between a disease present at admission (a comorbidity, i.e. a true patient risk factor) and one that occurred during the hospital stay (i.e. a complication) [56, 57]; and some chronic comorbidities, such as hypertension and diabetes, are known to be currently under-reported at admission, mainly in more severely affected patients [58, 59]. The first problem could be overcome in the P.Re.Val.E. update for 3 conditions, namely AMI, aortocoronary bypass, and hip fracture, since some clinical information (e.g., systolic blood pressure, ejection fraction, creatinine) has recently been added to the HIS.
Moreover, in these 3 conditions, illnesses present at the time of patients' admissions can be distinguished from complications since "present on admission" (POA) flags have been added to discharge diagnoses. The problem of under-recording is also partially solvable by using prior patient hospitalization records to identify comorbidities independent of patient severity at the current admission, as well as emergency department visits to collect additional information about patient risk factors. However, coding accuracy may differ widely among facilities, and this could lead to biased comparisons. Even though the possibility of gaming of the data in response to the performance evaluation cannot be excluded, previous studies did not find evidence of gaming. Some studies have reported that changes in data accuracy may partially explain quality improvement. However, we did not find relevant changes in the recording of co-morbidities in our study population over the years (data not shown). In agreement with previous reports, the prevalence of certain co-morbidities and risk factors was relatively low in our study population, indicating underreporting of co-morbidities and detailed clinical information in the administrative database. However, underreporting was non-differential in the years included in our analysis. Moreover, a previous Italian study assessing clinical performance in cardiac surgery demonstrated that the use of an administrative database produced league tables similar to those obtained from a more complex specialized database.
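The lookback approach described above can be sketched as follows; the record layout, the two-year window, and the ICD-9-CM code prefixes are illustrative assumptions, not the program's actual specification:

```python
from datetime import date, timedelta

# Hypothetical sketch: flag comorbidities from prior hospitalizations so that
# a patient's risk profile does not depend only on coding at the index stay.
LOOKBACK = timedelta(days=730)  # illustrative two-year window
COMORBIDITY_PREFIXES = {"diabetes": ("250",), "hypertension": ("401", "402")}

def comorbidities(index_date, prior_admissions):
    """Return comorbidities coded in any admission within the lookback window.

    prior_admissions: list of (admission_date, [ICD-9-CM diagnosis codes]).
    """
    found = set()
    for adm_date, codes in prior_admissions:
        if index_date - LOOKBACK <= adm_date < index_date:
            for name, prefixes in COMORBIDITY_PREFIXES.items():
                if any(c.startswith(p) for c in codes for p in prefixes):
                    found.add(name)
    return found

history = [
    (date(2008, 3, 1), ["41071", "4019"]),   # AMI + hypertension, in window
    (date(2005, 6, 1), ["25000"]),           # diabetes, outside the window
]
print(comorbidities(date(2009, 1, 15), history))
```

Because the flag is derived from earlier admissions, it is unaffected by how thoroughly comorbidities are coded at the current (possibly more severe) admission, which is the point made in the text.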
Finally, since health care services can only be evaluated by empirical measurements, inevitably there will be errors (systematic and random). This represents a clear limitation of our analysis, as well as of others of this type. We agree with Shahian et al. that hospital mortality estimates could vary, sometimes widely, based on the different case-selection criteria and statistical methods, leading to divergent inferences about relative hospital performance. Despite these concerns, some findings could be useful to potential users or to facilities. P.Re.Val.E. openly declares its data sources and methods, allowing external review of the biases and distortions implicit in the evaluation process. The next P.Re.Val.E. analysis, expected in November 2011, will include improvements in methods and procedures; of course, we cannot say that other biases will not be introduced, simply that they will be different. It is our conviction, however, that P.Re.Val.E. is an important operative tool that should be used to promote clinical and organizational monitoring of health care providers, to support political decision-making processes, and to stimulate a sense of healthy, productive competition aimed at improving healthcare efficacy and equity. We hope that this program will encourage a value often neglected within the Italian NHS: accountability.