Feasibility and validity of International Classification of Diseases based case mix indices

Background: Severity of illness is an omnipresent confounder in health services research. Resource consumption can be applied as a proxy of severity. The most commonly cited hospital resource consumption measure is the case mix index (CMI) and the best-known illustration of the CMI is the Diagnosis Related Group (DRG) CMI used by Medicare in the U.S. For countries that do not have DRG type CMIs, the adjustment for severity has been troublesome for either reimbursement or research purposes. The research objective of this study is to ascertain the construct validity of CMIs derived from International Classification of Diseases (ICD) in comparison with DRG CMI.
Methods: The study population included 551 acute care hospitals in Taiwan and 2,462,006 inpatient reimbursement claims. The 18th version of GROUPER, the Medicare DRG classification software, was applied to Taiwan's 1998 National Health Insurance (NHI) inpatient claim data to derive the Medicare DRG CMI. The same weighting principles were then applied to determine the ICD principal diagnoses and procedures based costliness and length of stay (LOS) CMIs. Further analyses were conducted based on stratifications according to teaching status, accreditation levels, and ownership categories.
Results: The best ICD-based substitute for the DRG costliness CMI (DRGCMI) is the ICD principal diagnosis costliness CMI (ICDCMI-DC) in general and in most categories, with Spearman's correlation coefficients ranging from 0.462 to 0.938. The highest correlation appeared in the non-profit sector. ICD procedure costliness CMI (ICDCMI-PC) outperformed ICDCMI-DC only at the medical center level, which consists of tertiary care hospitals and is more procedure intensive.
Conclusion: The results of our study indicate that an ICD-based CMI can quite fairly approximate the DRGCMI, especially ICDCMI-DC. Therefore, substituting ICDs for DRGs in computing the CMI ought to be feasible and valid in countries that have not implemented DRGs.

be masked or distorted. Therefore, the ability to measure the heterogeneous severity of illness of hospitals has been recognized as a critical ingredient for improving the management of hospitals and achieving equity in hospital reimbursements under case payment schemes [1].
Severity of illness measurements can be separated into two levels: individual level and hospital level. At the patient level, the severity of each individual patient is assessed independently based upon clinical findings and personal characteristics, which are mostly extracted from medical records; whereas hospital severity of illness describes the aggregate difficulty in the treatment of the diseases presenting themselves at each hospital as compared to other hospitals and most of the information utilized for this purpose comes from administrative data. What we are interested in, from the perspective of healthcare system management and health service research, is how the aggregate severity of illness at the hospital level can be easily derived from the administrative data.
Severity can be defined from various perspectives. Resource consumption has been applied as a proxy of severity, especially at the hospital level, for various purposes. The most commonly cited and acclaimed measure of hospital resource consumption is the case mix index (CMI). CMI is an administrative data-driven approach based upon patient classifications. The term "case mix" has emerged to reflect the fact that, within a population, individual patients may have a range of risks, and the aggregate outcome reflects the aggregate risks; as a result, case mix is a useful concept when comparing the performance of hospitals and clinicians [2].
The best-known illustration of the CMI as a resource consumption adjustment standard is the Diagnosis Related Group (DRG) CMI used by Medicare in the US. A hospital's DRG CMI measures the complexity of cases treated at that particular hospital relative to the average complexity in a peer group of hospitals [3]. There are two ways to derive a DRG-based CMI: costs and length of stay (LOS) [3]. The DRG-based costliness CMI, which is actually based on charges instead of costs, is used in Medicare's Prospective Payment System [4]. Medicare DRG cost-based CMIs are best known for their use in estimating resource consumption in prospective payment systems and, in our study, are presumed to be the criterion standard among resource consumption measures that can be derived from administrative data. The typical equation used is as follows [3]:
$$\mathrm{CMI}_h = \frac{\sum_g W_g N_{gh} \big/ \sum_g N_{gh}}{\sum_g W_g N_{gn} \big/ \sum_g N_{gn}}$$
where $h$ is the hospital for which the index is being calculated; $W_g$ is the weight associated with DRG $g$, i.e., in our study, the amount of basic case payment before adjustment for each DRG; $N_{gh}$ is the number of cases in DRG $g$ in hospital $h$; and $N_{gn}$ is the number of cases in DRG $g$ in the entire country.
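The calculation described above can be illustrated with a minimal Python sketch. It assumes the index is the hospital's case-weighted mean weight divided by the national case-weighted mean weight, which is our reading of the variable definitions given in the text. The function name `case_mix_index`, the hospital labels, and the toy weights are hypothetical and are not taken from the NHI data.

```python
from collections import defaultdict


def case_mix_index(cases, weights):
    """Compute a CMI per hospital.

    cases:   iterable of (hospital_id, group) pairs, one per discharge
    weights: dict mapping group -> national weight W_g
    Assumption: CMI_h = (hospital mean weight) / (national mean weight).
    """
    hosp_sum = defaultdict(float)  # sum of W_g over the hospital's cases
    hosp_n = defaultdict(int)      # number of weighted cases per hospital
    nat_sum = 0.0
    nat_n = 0
    for hospital, group in cases:
        if group not in weights:
            continue  # group has no weight (e.g. dropped as low volume)
        w = weights[group]
        hosp_sum[hospital] += w
        hosp_n[hospital] += 1
        nat_sum += w
        nat_n += 1
    if nat_n == 0:
        raise ValueError("no cases with a known weight")
    national_mean = nat_sum / nat_n
    return {h: (hosp_sum[h] / hosp_n[h]) / national_mean for h in hosp_sum}


# Toy data: two hospitals, two groups (hypothetical values).
cases = [("A", "g1"), ("A", "g2"), ("A", "g2"),
         ("B", "g1"), ("B", "g1"), ("B", "g2")]
weights = {"g1": 1.0, "g2": 3.0}
cmi = case_mix_index(cases, weights)
# Hospital A: (7/3) / 2, about 1.17; hospital B: (5/3) / 2, about 0.83
print(cmi)
```

A value above 1 indicates a case mix that is more resource intensive than the national average, and a value below 1 the opposite.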
One of the primary advantages of classifying patients before determining resource consumption, as shown in the calculation process above, is that the inefficiency of each individual hospital will have less influence on the determination of the index, since the weight for each group of patients is based upon the national average instead of a hospital average. Although Medicare DRGs do not differentiate patients' severity gradients in detail, Medicare DRG CMI has been shown to be able to approximate hospital severity level to a satisfactory degree. For instance, a study conducted in Philadelphia to estimate hospital inefficiency indicated that the addition of the Medicare DRG CMI reduced estimated inefficiency by more than 50% and that the incremental effect of adding a severity of illness variable to an equation with CMI was very small [5]. It was relatively easy for researchers in the US to account for severity from a resource consumption perspective by using Medicare DRG CMI. For healthcare researchers around the world who did not have a ready-made CMI at hand, however, the search for a valid substitute has always been a painstaking process. Yet the idea of CMI as a resource consumption and severity adjustment standard does not belong exclusively to DRG-like systems. So long as patients can be classified to a certain extent, one can derive a CMI specific to that classification method.
For the purpose of patient classification, the diagnosis is always the starting point. The standard diagnostic lexicon originated from the International Classification of Diseases (ICD) code, which is applied worldwide. Although the ICD does not take severity of illness or resource consumption into consideration in classifying patients [6], if averaging based upon classification can diminish the variation of treatment patterns, one should be able to apply the ICD to derive the CMI as well. That is to say, we should be able to compute ICD-specific CMIs in a hospital population if the mathematical algorithm is determined and manageable within our computer's computing power. The Medicare DRGs were originally derived by reducing the ICD codes to smaller subsets and weighting the patients in each subset by cost. Therefore, we can reasonably expect correlations between ICD CMI and DRG CMI to a certain degree. The research objective of our study is to ascertain the construct validity of CMIs derived from ICD in comparison with DRG CMI. Our hypothesis was that CMIs derived from the ICD classification should correlate with the Medicare DRG CMI to a degree that justifies the application of ICD-based CMIs.
Taiwan has been able to accumulate a significant amount of computerized billing files since the initiation of the National Health Insurance (NHI) program in 1995. Therefore, it would also be interesting to determine whether Taiwan's administrative data set can be utilized to provide meaningful information for computing ICD-based CMI in the search for a valid substitute for Medicare CMI in non-American healthcare settings.

Methods
One of the main thrusts of this study is to apply the US Medicare DRG system to Taiwan's inpatient claims to derive a Medicare DRG CMI against which the performance of all the alternative indices can be compared. Since the Medicare DRG system has not been formally adopted in Taiwan, we had to import this tool for our research purposes. In this study, the 18th version of GROUPER, the Medicare DRG classification software, was used for assigning each patient a DRG code. GROUPER is software programmed according to the classification guidelines issued annually by the US Centers for Medicare & Medicaid Services (CMS).
During the process of computing DRG weights, we deviated from the original Medicare design in two respects. First, in the Medicare design, outliers, defined as cases with costs more than three standard deviations away from the geometric mean for each DRG, are not included in the computation. Medicare leaves outliers out of the calculation so that they can be reimbursed separately, which does not serve our purpose of approximating severity. Therefore, outliers were included in our calculations. Second, the Medicare design also deletes low-volume DRGs. Low-volume DRGs are defined as DRGs for which the number of cases is less than a DRG-specific constant based on the mean and standard deviation; the constant is chosen so as to guarantee estimate precision of plus or minus 10% of the mean value for a 90% confidence interval around the average cost for each remaining DRG [7]. According to the literature, dropping DRGs with fewer than 5 to 10 cases from the weighting computation is roughly compatible with the original design [7]. For the sake of simplicity, the cutoff point was set at 5 cases in this study. The DRG costliness CMIs were obtained subsequently.
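The weighting step can be sketched in Python as follows. This is an illustration under stated assumptions rather than the procedure actually run on the NHI claims: it assumes each group's weight is the national mean of a per-case value (payment or LOS), keeps outliers as the text describes, and drops groups with fewer than 5 cases. The function name `group_weights`, the group labels, and the sample values are hypothetical.

```python
from collections import defaultdict

MIN_CASES = 5  # cutoff stated in the text: groups with fewer cases are dropped


def group_weights(records, min_cases=MIN_CASES):
    """Return {group: national mean value} for groups with enough cases.

    records: iterable of (group, value) pairs, where value is the
             per-case payment or length of stay.
    Outliers are deliberately NOT trimmed, following the study's choice.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for group, value in records:
        totals[group] += value
        counts[group] += 1
    return {g: totals[g] / counts[g]
            for g in counts if counts[g] >= min_cases}


# Hypothetical sample: DRG_A has 5 cases and is kept, DRG_B has 4 and is dropped.
records = [("DRG_A", v) for v in (10, 20, 30, 40, 50)]
records += [("DRG_B", v) for v in (5, 5, 5, 5)]
weights = group_weights(records)
print(weights)  # {'DRG_A': 30.0}
```

The same function can be reused for three-digit ICD groups, since the text applies the same weighting principles to both classifications.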
The next step was to determine the ICD case mix. For the purpose of this study, principal diagnoses and procedures in the claim data were each used in the grouping process. In the first approach, the first three digits of the ICD code of the principal diagnosis were used to group inpatient cases. Adopting only the first three digits is not an arbitrary choice; it has been applied on other occasions as well. For example, in order to define what constitutes an unexpected readmission for the same diagnosis within 14 days of discharge, the Bureau of National Health Insurance (BNHI) mandated that if the first three digits of the patient's ICD-9-CM principal diagnoses were the same in two sequential admissions, the second admission was considered a readmission [8].
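The three-digit grouping can be illustrated with a short Python sketch. It assumes diagnosis codes are stored as strings with leading zeros preserved and with or without the decimal point; the function name `icd3` and the sample codes are hypothetical and are not drawn from the claim file.

```python
from collections import Counter


def icd3(code):
    """Return the three-character ICD-9-CM category of a diagnosis code.

    Assumes string codes such as '250.01', '25001' or 'V30.00'.
    """
    cleaned = code.strip().upper().replace(".", "")
    if len(cleaned) < 3:
        raise ValueError(f"code too short to group: {code!r}")
    return cleaned[:3]


# Hypothetical principal diagnoses for a handful of discharges.
principal_dx = ["250.01", "25000", "V30.00", "v3001", " 0031 "]
group_counts = Counter(icd3(c) for c in principal_dx)
print(group_counts)  # 250 and V30 each appear twice, 003 once
```

Storing codes as integers would lose leading zeros (for example 003 would become 3), so string handling is the safer assumption here.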
For coding diagnoses, there are ICD disease codes from 001 to 999 and supplementary V codes ranging from V01 to V82. Codes with fewer than 5 patients were not used for the calculation. The ICD CMI can be derived through the same equation used for the DRG CMI calculation, as follows:
$$\mathrm{CMI}_h = \frac{\sum_g W_g N_{gh} \big/ \sum_g N_{gh}}{\sum_g W_g N_{gn} \big/ \sum_g N_{gn}}$$
where $h$ is the hospital for which the index is being calculated; $W_g$ is the weight associated with ICD group $g$, i.e., in our study, the national mean of LOS or of payments for each three-digit ICD group; $N_{gh}$ is the number of cases in ICD group $g$ in hospital $h$; and $N_{gn}$ is the number of cases in ICD group $g$ in the entire country.
The same principles applied in computing the DRG weighting can also be used in computing the ICD weighting. Both the ICD principal diagnosis LOS CMI and the ICD principal diagnosis costliness CMI can be calculated. A second possible ICD classification is based on the procedure code only. We only took into account cases that received ICD-coded procedures to come up with a procedure CMI. Both the ICD procedure LOS CMI and the ICD procedure costliness CMI were also calculated.
The administrative data used in this study was the reimbursement claim dataset of the NHI. The BNHI provided the 1998 NHI claim data. The total data set from the BNHI included two files: DD199801_0.DAT and MAB_HOSBSC.DAT. DD199801_0.DAT is the inpatient claim file which records the summary reports of all inpatient discharges islandwide. The total number of entries is 2,462,006. Most variables applied in our study came from this file. MAB_HOSBSC.DAT records the contracting health care providers' characteristics pertinent to the use of health insurance claim processing.
The study population included all hospitals of the general acute care type that have participated in the Department of Health (DOH) accreditation, excluding chronic and specialty hospitals. In total, 551 hospitals were included. Of the 551 hospitals, private hospitals constituted the largest portion of the population, at 73.3%, in terms of ownership. According to Taiwan's Medical Care Act sections 3, 4, and 5, there are three classes of ownership: public hospital, non-profit hospital, and private hospital [9]. According to this classification, public hospitals constituted only 15.4% and non-profit hospitals 11.3% of the total (see Table 1). On average, non-profit hospitals took care of the most patients. Non-profit hospitals as a whole accounted for 42.7% of the inpatient market, and the distribution of inpatient volumes among them is less skewed than in the other two ownership categories (see Table 1).
The other hospital category is accreditation level. The three basic accreditation levels are medical center, regional hospital and local hospital. Most hospitals were accredited; only 6.5% of hospitals were not accredited. Of the total, 2.5% were medical centers; 9.4% regional hospitals; and 81.5% local hospitals. On average, at the individual facility level, medical centers took care of the most patients, whereas local hospitals as a whole comprised 40.6% of the inpatient market. Due to the advantage of accessibility, local hospitals obviously took a larger market share not only in the inpatient department but also in the outpatient department and the ER. However, the inpatient market share percentage is the lowest among the three. That is to say, local hospitals are more outpatient oriented than inpatient oriented. Another feature worth noting is that the distribution of service volumes in each accreditation level is less skewed as a whole compared to all the other classifications (see Table 1).
The other important hospital classification is teaching status. Teaching hospitals constituted 24.5%, whereas nonteaching 75.5% of the total. In addition, teaching hospitals see more patients than non-teaching institutions both in terms of average service volume and market share (see Table 1). All the operational definitions and codings of CMIs and hospital characteristics variables are listed in Table 2.

Results and discussion
After the construction process, the spectra of all the CMIs were analyzed and are displayed in Table 3. The DRGCMI ranged from 0.32 to 2.83 for the whole hospital population. The ICDCMI-DC ranged from 0.39 to 3.20. The ICDCMI-DL ranged from 0.35 to 4.60. The ICDCMI-PC ranged from 0.27 to 3.06. The ICDCMI-PL ranged from 0.32 to 1.63. Another phenomenon worth noting is that the averages of the CMIs are found to be in the following descending order across the board: medical centers, regional hospitals, local hospitals and non-accredited hospitals. None of these CMIs was normally distributed according to the one-sample Kolmogorov-Smirnov (K-S) test.
The first step of the analyses was conducted on the entire dataset, namely all 551 hospitals, without further stratification. The second step of the analyses was performed after stratifying the hospitals according to their teaching status, ownership, and accreditation level. The values of each severity variable were not normally distributed. As a result, we had to apply distribution-free statistics, i.e. non-parametric statistics, in our analyses.
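Since the abstract names Spearman's correlation coefficient as the non-parametric statistic, a minimal pure-Python sketch of it may help make the comparison concrete. It ranks both series (averaging ranks for ties) and then computes the Pearson correlation of the ranks. The function names and the sample numbers are hypothetical; a statistics package would normally be used in practice.

```python
def average_ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the block of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical CMI values for five hospitals under two indices.
drg_cmi = [0.6, 0.9, 1.1, 1.4, 2.0]
icd_cmi = [0.8, 0.7, 1.3, 1.2, 1.9]
rho = spearman(drg_cmi, icd_cmi)
print(round(rho, 3))  # 0.8
```

Because only ranks enter the calculation, the coefficient is unaffected by the skewed, non-normal distributions of the indices.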
Overall, DRGCMI was significantly correlated with all other CMIs. The highest was between DRGCMI and ICD principal diagnosis based costliness CMI (ICDCMI-DC) at 0.688 (see Table 4). The correlations are fairly high in general and the correlations for CMIs based on cost are higher than those involving length of stay. Also, correlations based on diagnosis are higher than those based on procedure.
The second step of the analyses was to stratify the hospitals according to their teaching status, ownership, and accreditation level (see Table 5). In the teaching stratification, the general pattern applies to both teaching and non-teaching hospitals but correlations are higher in the teaching hospitals. With respect to ownership, the general pattern still holds true and correlations are much higher in the non-profit sector. For instance, in the non-profit category, the highest was between DRGCMI and ICD principal diagnosis based costliness CMI (ICDCMI-DC), which reached 0.938. Analysis by accreditation status produces somewhat more ambiguous results. For instance, the highest correlation at the medical center level was found between DRGCMI and ICD procedure based costliness CMI (ICDCMI-PC) at 0.767 instead of between DRGCMI and ICD principal diagnosis based costliness CMI (ICDCMI-DC). But the highest correlations tend to be at medical centers and the lowest at non-accredited institutions. The biggest contrast is found with respect to LOS where the medical center correlation is 0.534, compared to 0.046 at non-accredited institutions.
An overview of subgroup analyses (see Table 5) reveals that diagnosis based indices produce higher correlations than procedure based indices in all cases except for medical centers, and there the differences are slight and probably non-significant. With this rather unimportant exception, it is noteworthy that the correlations for ICD principal diagnosis based costliness CMI (ICDCMI-DC) are always higher than any of the others. In the LOS analyses, the diagnosis based indices generally produce higher correlations than the procedure based indices with three exceptions: teaching hospitals, regional hospitals, and non-accredited hospitals.
In summary, generally speaking, the best substitute for DRGCMI is ICD principal diagnosis based costliness CMI (ICDCMI-DC). After stratification, the best substitute for DRGCMI in both the teaching and non-teaching categories is still ICD principal diagnosis based costliness CMI (ICDCMI-DC). The same applies to the public sector, the non-profit sector, and the private sector. For medical centers, the best substitute for DRGCMI is ICD procedure based costliness CMI (ICDCMI-PC). For regional, local and non-accredited hospitals, the best substitute for DRGCMI is still ICD principal diagnosis based costliness CMI (ICDCMI-DC).
In our final ranking, correlation coefficients above 0.8 were deemed as good, between 0.8 and 0.5 as fair, and below 0.5 as poor. The reasoning behind this classification is that a correlation coefficient of 0.5 only explains 25% of the variation. Any approximation catching less than 25% of the variation can hardly be described as having good performance. On the other hand, when assessing multicollinearity for regression analyses, a frequent practice is to examine the bivariate correlations among the independent variables, looking for coefficients of about 0.8 or larger [10]. As a result, if the correlation coefficient was higher than 0.8, we were inclined to conclude that the measure was a good proxy or substitute for the well-established standard which was DRGCMI in this instance.
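The rating rule above can be written as a small Python helper. The text does not say how values exactly equal to 0.8 or 0.5 are treated; the sketch assumes both fall in the "fair" band. The function name `rate_correlation` is hypothetical, and the coefficients in the usage example are values reported elsewhere in this paper.

```python
def rate_correlation(rho):
    """Classify a correlation coefficient as 'good', 'fair' or 'poor'.

    Assumption: boundary values 0.8 and 0.5 are both rated 'fair'.
    """
    if rho > 0.8:
        return "good"
    if rho >= 0.5:
        return "fair"
    return "poor"


# A coefficient of 0.5 corresponds to 25% of the variation explained.
variance_explained_at_cutoff = 0.5 ** 2

# Coefficients reported in the paper: non-profit sector ICDCMI-DC,
# overall ICDCMI-DC, lowest ICDCMI-DC stratum, non-accredited LOS.
ratings = {r: rate_correlation(r) for r in (0.938, 0.688, 0.462, 0.046)}
print(ratings)
```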
If we simply look at the sample hospitals as a whole, the best alternative to DRGCMI was ICD principal diagnosis based costliness CMI (ICDCMI-DC), as indicated in Table 4. By the performance assessment standard above, all of the ICD-based CMIs were shown to be fairly well correlated with DRGCMI. However, among non-DRG methods, the ICD principal diagnosis based costliness CMI (ICDCMI-DC) appears to be the best among those fair substitutes for the DRGCMI. In the comparison between the ICD principal diagnosis based CMIs and ICD procedure based CMIs, the ICD principal diagnosis based CMIs perform better in terms of correlation coefficients under most circumstances. Not all inpatient admissions have been accorded procedure codes; for instance, most medical inpatients will not have procedure codings. This is probably why ICD principal diagnosis based CMIs correlate better with DRGCMI. The other finding worth noting is that cost based indices perform better than LOS based indices. Since our gold standard, DRGCMI, is in effect a cost based index, it is reasonable that it correlates better with cost based CMIs than with LOS based CMIs.
On the other hand, the performances of all indices could also be compared from different angles through stratification. After stratification, all in all, ICDCMIs are mostly fair substitutes for DRGCMI, and ICD principal diagnosis based costliness CMI (ICDCMI-DC) still appears to be the best among those non-DRG based CMIs. ICD principal diagnosis based costliness CMI (ICDCMI-DC) even had good performance in the non-profit sector. Non-profit hospitals have the largest market share in Taiwan. The patient profiles of the non-profit sector might be more representative of the general patient population, which adds to the validity of substituting ICDCMI-DC for DRGCMI. ICD procedure based costliness CMI (ICDCMI-PC) only outperforms ICD principal diagnosis based costliness CMI (ICDCMI-DC) in medical centers. Medical centers are supposed to be tertiary care hospitals, and as a result are likely to be more procedure intensive. So it makes sense that, in our study, the severity gradient among medical centers can be differentiated better from the procedure perspective. All in all, the correlations are lowest for the non-accredited hospitals. This is likely due to the fact that non-accredited hospitals comply with fewer standards and there is less consistency across their operations. However, the different performances of the same method in various strata suggest that there is no panacea in our search. We simply have to apply the most appropriate method under given circumstances.
[11]. The APR-DRG is considered to be an enhancement of DRG structure by adding four severity subclasses to each DRG [12]. The ICISS proved to be superior to both the DRG and APR-DRG in predicting all three.
The foremost limitation of this study is the reliability of coding by hospitals. We relied heavily on the administrative data submitted by hospitals to the BNHI. There are several potential problems with the ICD coding. First of all, hospitals might upcode intentionally to gain reimbursement advantages. Second, hospital staff might not be trained sufficiently well to code correctly. Third, some of the data we needed in our study were not mandated by the government to be coded. For instance, the procedure code is not required to be recorded other than for some specific procedures. Most hospitals tend to code only major procedures, such as operations. Therefore, we might have underestimated the procedure intensity in many facilities. Nonetheless, administrative coding data are the most convenient health services research data sources. The validity of applying administrative coding data has been established to a certain extent in various settings. For instance, a Canadian study indicated that administrative data generally agree with patient chart data for recording of comorbidities in calculating the Charlson index, with κ values ranging from 0.34 to 0.87, although comorbidities tend to be under-reported in administrative data [13].
Applying Medicare DRGs to a non-American system also has inherent problems that limit the generalizability of our results. First of all, Medicare DRGs primarily apply to the aged, disabled, and patients with end stage renal disease (ESRD). If we simply based our analyses on the subset of Taiwanese patients that were aged, disabled or with ESRD, we might be able to anticipate stronger correlations in our comparisons. However, since our goal was to come up with a generic measure at the hospital level and Medicare DRG is merely the tool we chose to validate our approach, we opted for not leaving out any groups of inpatients so as to achieve the objective of finding a solution that can represent the whole hospital. Secondly, the reality that the different versions we used cannot be 100% matched poses another threat to reliability. For instance, Taiwan was still using the 1992 ICD-9-CM for reimbursement coding in 1998, and we had to apply the 2000 GROUPER to a 1998 Taiwan data set. However, in dealing with a huge data set in an international comparative study, we can only try to safeguard the reliability as much as possible. If all the above-mentioned coding problems can be more appropriately dealt with in future studies, it would be all for the better.
Another limitation warranting further study is the substitution of charge for cost in our analyses. As indicated above, this study basically followed Medicare's approach in deriving the costliness CMI, i.e., Medicare uses charges to estimate the severity of the case mix instead of costs, even though it operates under the guise of a cost-based system. Correlations between cost-based and charge-based weights have been shown to be very high in numerous studies. It has been said that within each year, the correlation between these two bases exceeds 0.997 [14]. However, we still cannot ignore the fact that charge is not cost per se. That is exactly why, under the same reimbursement scheme and with all other factors being equal, some hospitals can make a profit and some cannot. Medicare's current approach only accounts for the changes of case mix in reimbursement adjustment, yet fails to take into account real cost fluctuations. Therefore, if we were able to construct CMIs based on real cost, the CMI-based adjustment should be much more realistic and equitable.
Although cost data are quite difficult to acquire, a cost-based approach is certainly the better one and worth acknowledging.

Conclusion
This study does not advocate that purely ICD-based CMIs can replace all the other more sophisticated systems, such as the Medicare DRG system or 3M's APR-DRG. However, our findings certainly strengthen the legitimacy of applying the ICD system plainly for the purposes of establishing a resource consumption index to approximate severity. The results of our study indicate that an ICD-based CMI can quite fairly approximate the Medicare DRG CMI, especially the ICD principal diagnosis based costliness CMI (ICDCMI-DC). Therefore, the idea of substituting ICDs for DRGs in computing the CMI ought to be feasible and valid in countries that have not implemented DRGs.

Declaration of competing interests
The author(s) declare that they have no competing interests.