Introduction
Administrative data routinely collected at hospitals are attractive to researchers: they are large, often exhaustive, and relatively easy to access. However, they are not intended for research, and they lack the clinical detail of observational studies or clinical trials. Researchers thus face a trade-off between large but incomplete databases and detailed but often poorly representative ones. A major limitation of the missing information in administrative data is that endogeneity cannot be corrected, because some patient characteristics are not observed.
Suppose that we seek to evaluate the impact of a given treatment on a patient's health. In real practice, contrary to what occurs in clinical trials, the decision to treat a patient is not random. In the "real world", patients are selected into treatment arms based on their expected outcomes. Hence, the explanatory variable (treatment) is endogenous, as it depends on the expected value of the dependent variable (outcome). This problem would be solved if one could control for a large array of patient characteristics when estimating the difference in outcomes between the treated and the untreated. Unfortunately, this is not possible with administrative data.
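The selection mechanism described above can be illustrated with a minimal simulation. This is a sketch under stated assumptions, not the estimation strategy of the present study: the binary "frailty" indicator, the treatment probabilities, the baseline mortality rates and the true treatment effect are all hypothetical values chosen for illustration. Frail patients are assumed to be treated less often and to die more often, so a naive comparison of treated and untreated patients overstates the benefit of treatment, whereas a comparison within frailty strata recovers the assumed effect.

```python
import random

# Hypothetical effect of treatment on the probability of death.
TRUE_EFFECT = -0.03


def simulate(n=200_000, seed=0):
    """Return (naive, adjusted) estimates of the treatment effect on mortality."""
    rng = random.Random(seed)
    # counts[frail][treated] = [number of patients, number of deaths]
    counts = {f: {t: [0, 0] for t in (0, 1)} for f in (0, 1)}

    for _ in range(n):
        # Frailty is observed by the clinician but (by assumption) not
        # recorded in the administrative database.
        frail = 1 if rng.random() < 0.5 else 0
        # Selection: frail patients are less likely to be treated.
        p_treat = 0.2 if frail else 0.8
        treated = 1 if rng.random() < p_treat else 0
        # Outcome: frailty raises mortality, treatment lowers it.
        p_death = 0.05 + 0.20 * frail + TRUE_EFFECT * treated
        died = 1 if rng.random() < p_death else 0
        cell = counts[frail][treated]
        cell[0] += 1
        cell[1] += died

    def rate(cells):
        patients = sum(c[0] for c in cells)
        deaths = sum(c[1] for c in cells)
        return deaths / patients

    # Naive estimate: treated vs. untreated, ignoring frailty.
    naive = (rate([counts[0][1], counts[1][1]])
             - rate([counts[0][0], counts[1][0]]))

    # Adjusted estimate: difference within each frailty stratum,
    # weighted by the share of patients in that stratum.
    adjusted = 0.0
    for f in (0, 1):
        share = (counts[f][0][0] + counts[f][1][0]) / n
        adjusted += share * (rate([counts[f][1]]) - rate([counts[f][0]]))

    return naive, adjusted


if __name__ == "__main__":
    naive, adjusted = simulate()
    print(f"assumed true effect: {TRUE_EFFECT:+.3f}")
    print(f"naive estimate:      {naive:+.3f}")
    print(f"adjusted estimate:   {adjusted:+.3f}")
```

The adjusted estimate is only available here because the simulation records frailty. With administrative data the stratifying variable is missing, which is precisely why the naive comparison cannot be corrected by simple conditioning.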
In the present study, however, we postulate that appropriate statistical techniques can help reduce this problem. To this end, we examine the impact of two invasive treatments for cardiovascular disease, percutaneous coronary intervention (PCI) and coronary artery bypass grafting (CABG), on in-patient mortality, using administrative data from Portuguese NHS hospitals. We examine how the results vary depending on whether or not we account for endogeneity. We then examine how the selection bias spreads to other indicators, namely the difference between men's and women's mortality following invasive treatments.