The data consist of episodes of patient care drawn from two separate six-month periods. The first period provides the analysis (prognostic) sample, and the second provides a follow-up sample for evaluating consistency over time. The same six months from adjacent years are used to minimize any seasonal bias across samples. This study is based on deidentified data and as such was granted exemption from ethical review by Case Western Reserve University.
The start of any given episode of care is defined as the first recorded OASIS assessment during the sample time frame for start of care or resumption of care. Resumption of care occurs when patients return to home health care after a transfer to an institutional setting. The end of a given episode of care is defined as the first discharge OASIS assessment occurring on or before 60 days following the start of the episode; if no discharge has occurred by 60 days, the closest OASIS assessment available at or before 60 days is used. Following the implementation of the home health prospective payment system for Medicare home health, CMS required agencies to conduct OASIS assessments every 60 days for patients still in care. Thus, the maximum length of an individual episode is 60 days. Because the time frame for each sample is six months, patients may experience multiple episodes within a sample.
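The end-of-episode rule above can be sketched in a few lines of Python. This is only an illustration of the rule as worded, not the code used in the study: the function name, the tuple layout, and the assessment label "discharge" are assumptions introduced for the example.

```python
from datetime import date, timedelta

MAX_EPISODE_DAYS = 60  # maximum episode length stated in the text


def select_episode_end(start, assessments):
    """Return the (date, kind) assessment that closes an episode, or None.

    `assessments` is a list of (date, kind) tuples. The label "discharge"
    is a hypothetical coding used only in this sketch.
    """
    cutoff = start + timedelta(days=MAX_EPISODE_DAYS)
    # Keep only assessments after the start and on or before day 60.
    window = sorted(a for a in assessments if start < a[0] <= cutoff)
    # Rule 1: the first discharge assessment inside the window.
    for assessed_on, kind in window:
        if kind == "discharge":
            return (assessed_on, kind)
    # Rule 2: otherwise, the latest assessment at or before day 60.
    return window[-1] if window else None
```

An episode with no assessment inside the 60-day window returns None here; how such episodes are handled in practice is not specified in the text.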
Sample
There were a total of 54,732 and 51,560 patient episodes for the prognostic and follow-up datasets, respectively. Of these, 5,295 were excluded from the prognostic dataset and 3,876 were excluded from the follow-up dataset because they lacked outcome data (e.g., the patient died or was hospitalized), yielding totals of 49,437 and 47,684 episodes of care for the prognostic and follow-up datasets, respectively (see Figure 1).
Dependent variables
As a first step toward developing alternatives to the conservative CMS-based index, we evaluate two-unit change frequencies for each individual ADL (e.g., minimum two-unit declines in categories such as grooming, toileting, bathing, and transferring). The frequencies of one-unit and three-unit declines in each of the five ADLs are also evaluated. These univariate analyses are done for two reasons:
- to determine which ADLs are driving the indices (i.e., accounting for the largest frequency);
- to evaluate the frequencies of the magnitude of functional decline (e.g., to determine whether most of the declines were one-unit, two-unit, or three-unit changes).
The univariate analyses also inform the choice of how large a change must be to count as a decline. Because home health care agencies rely on many nurses and practitioners to provide care, a one-unit change is not considered a sufficient difference: it may reflect different clinicians' views of dependency rather than a true change in function. Although physical functioning items are considered among the most reliable outcome measures in the OASIS [7–9], a one-unit change may still be attributable to measurement error arising from clinical judgment and from the method by which the data are coded (i.e., observation versus interview).
At the other extreme, a three-unit decline is considered too large a change, because very few episodes meet it. Three-unit declines in three or more ADLs are extremely infrequent (less than 0.20 percent of episodes). Three-unit declines in two or more ADLs are more frequent (more than 0.50 percent) but still too rare to support meaningful statistical analyses. Consequently, for all three indices, "substantial" indicates a minimum two-unit decline.
Beyond the size of the change, a second issue is the number of ADLs that must show a substantial decline, where substantial is a minimum two-unit decline. One of the alternative indices is defined as substantial decline in two or more ADLs and is referred to as gt2_ADLs. A second alternative index is defined as substantial decline in one or more ADLs and is referred to as gt1_ADLs.
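As an illustration, the indices can be computed from start-of-episode and end-of-episode ADL scores as sketched below. Several points are assumptions made for the example rather than details taken from the study: higher scores are taken to mean greater dependency (so a decline is an increase in score), gt3_ADLs is taken to require substantial decline in three or more ADLs, and the key "other_adl" is a placeholder for the fifth ADL.

```python
def substantial_declines(start_scores, end_scores, min_units=2):
    """Count ADLs that worsened by at least `min_units` (two by default).

    Assumes higher scores mean greater dependency, so worsening is
    measured as end score minus start score.
    """
    return sum(
        1
        for adl, start in start_scores.items()
        if end_scores[adl] - start >= min_units
    )


def decline_indices(start_scores, end_scores):
    """Return the three binary indices for one episode of care."""
    n = substantial_declines(start_scores, end_scores)
    return {
        "gt1_ADLs": n >= 1,  # substantial decline in one or more ADLs
        "gt2_ADLs": n >= 2,  # substantial decline in two or more ADLs
        "gt3_ADLs": n >= 3,  # assumed: three or more ADLs
    }


# Hypothetical episode; "other_adl" stands in for the fifth ADL.
start = {"grooming": 0, "toileting": 1, "bathing": 2, "transferring": 0, "other_adl": 1}
end = {"grooming": 2, "toileting": 1, "bathing": 5, "transferring": 1, "other_adl": 1}
indices = decline_indices(start, end)
```

In this hypothetical episode only grooming and bathing worsen by two or more units, so the episode counts as a decline under gt1_ADLs and gt2_ADLs but not under gt3_ADLs.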
Independent variables
There were 112 available covariates (including referent categories) drawn from OASIS and other agency administrative data. These covariates fall into seventeen main categories, listed here with the number of covariates in each category in parentheses: demographics (11), physician availability (2), financial factors (4), language (3), payment source (5), referral and personal health at baseline (16), pain assessment (4), integumentary status (3), respiratory status (1), sensory status (3), elimination status (5), housing and living arrangements (4), support and assistance in the home (8), neurological/behavioral/emotional status (5), medication assessment (7), ICD-9 primary diagnosis codes (22), and ICD-9 primary diagnosis codes for the main diagnoses for this particular home health care agency (9). Physician availability is defined as the number of physicians available per 100,000 people per ZIP code and is divided into quintiles.
Univariate analyses
Because it is unclear which of the five ADLs is most appropriate, it is not desirable to use an individual ADL outcome [10, 11]. However, incremental changes in ADLs are evaluated to determine whether any specific ADL is a major contributor to the declines captured by the indices. An additional concern when examining ADL outcomes is the number of cases excluded due to 'floor' effects. If an ADL as measured at the start of an episode is already at a level of functioning such that it is impossible for a decline to be identified by an index, it is dropped from consideration. However, this introduces a potential bias, because episodes that begin with greater severity in ADL functioning are excluded from the calculation of the respective indices. Thus, gt3_ADLs can only be calculated for episodes that begin with the possibility of worsening in three of the five ADLs; therefore, episodes at the 'floor' for gt3_ADLs are those unable to worsen in three or more ADLs. Likewise, for gt2_ADLs, 'floor' episodes are unable to worsen in at least four ADLs; and 'floor' episodes for gt1_ADLs are unable to decline in any of the ADLs.
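The floor rules can be made concrete with a short sketch. It assumes that an ADL "can worsen" when at least two units remain between its starting score and the worst possible score (matching the two-unit definition of substantial decline), and that higher scores mean greater dependency. The maximum scores and ADL keys are hypothetical values chosen for the example.

```python
def adls_able_to_decline(start_scores, max_scores, min_units=2):
    """Count ADLs with room left for a decline of at least `min_units`."""
    return sum(
        1
        for adl, start in start_scores.items()
        if max_scores[adl] - start >= min_units
    )


def at_floor(start_scores, max_scores):
    """Flag, for each index, whether the episode starts at the floor.

    An episode is at the floor for an index when fewer ADLs can still
    worsen than the index requires (1, 2, or 3 of the five ADLs).
    """
    able = adls_able_to_decline(start_scores, max_scores)
    return {
        "gt1_ADLs": able < 1,  # no ADL can worsen
        "gt2_ADLs": able < 2,  # at least four ADLs cannot worsen
        "gt3_ADLs": able < 3,  # at least three ADLs cannot worsen
    }


# Hypothetical scales: every ADL scored 0 (independent) to 5 (dependent).
max_scores = {"adl_1": 5, "adl_2": 5, "adl_3": 5, "adl_4": 5, "adl_5": 5}
start_scores = {"adl_1": 5, "adl_2": 4, "adl_3": 5, "adl_4": 0, "adl_5": 4}
floors = at_floor(start_scores, max_scores)
```

Here only one ADL has room for a two-unit decline, so the episode would be excluded from gt2_ADLs and gt3_ADLs but retained for gt1_ADLs.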
Multivariate analyses
We estimate a series of multivariate logistic regression models using SAS version 8.0 [12]. Variables included in the initial models are selected from the full set of independent measures using three criteria: 1) a frequency of at least 5%; 2) a correlation below 0.80 with other risk factors, since risk factors with a correlation of 0.80 or higher presumably measure the same item [13], and when this occurs one of the variables is randomly eliminated; and 3) bivariate analyses that include the weighted baseline ADL status [14], which is included because of its well-documented association with functional decline [15–17], with p < 0.10 used as the selection criterion because the association between functional decline and many risk factors is unclear. Baseline ADL scores are weighted by dividing the score for each individual ADL by the total possible score for that ADL; the five weighted scores are then summed [18]. In the final step, forward stepwise multivariate logistic regression is run with a selection criterion of p < 0.05 [19].
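The weighting of baseline ADL scores can be illustrated as follows. This is a minimal sketch of the calculation as described, in Python rather than SAS; the ADL keys and maximum scores are hypothetical and are not taken from the OASIS item definitions.

```python
def weighted_baseline_adl(scores, max_scores):
    """Sum of each ADL score divided by its maximum possible score.

    With five ADLs the result ranges from 0 (best possible score on
    every ADL) to 5 (worst possible score on every ADL).
    """
    return sum(scores[adl] / max_scores[adl] for adl in scores)


# Hypothetical baseline scores and scale maxima for one episode.
baseline = {"adl_1": 1, "adl_2": 2, "adl_3": 0, "adl_4": 3, "adl_5": 5}
maxima = {"adl_1": 4, "adl_2": 4, "adl_3": 5, "adl_4": 6, "adl_5": 5}
weighted = weighted_baseline_adl(baseline, maxima)  # 0.25 + 0.5 + 0 + 0.5 + 1.0
```

Dividing by each scale's maximum puts ADLs with different numbers of response levels on a common 0 to 1 range before they are summed.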
Analytical measures
For the multivariate logistic regression analyses, the c-statistic and the Hosmer-Lemeshow Goodness of Fit test are compared across models. The c-statistic, calculated as the area under the Receiver Operating Characteristic (ROC) curve, represents the concordance between predicted probabilities and observed outcomes across all possible pairs of patients with differing outcomes [20, 21]. A c-statistic of 1.0 indicates perfect predictive discrimination, while a value of 0.50 indicates that the model performs no better than chance alone. A c-statistic over 0.70 is considered acceptable, while values higher than 0.80 are considered excellent [22].
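The pairwise interpretation of the c-statistic can be shown with a small sketch. It compares every episode that experienced the outcome with every episode that did not and counts ties in predicted probability as half concordant, a common convention that is assumed here. The function name is illustrative, and the quadratic loop is meant for exposition, not for samples of the size used in this study.

```python
def c_statistic(outcomes, predicted):
    """Share of (event, non-event) pairs ranked correctly by the model.

    `outcomes` holds 1 for episodes with the outcome and 0 otherwise;
    `predicted` holds the model's predicted probabilities.
    """
    events = [p for y, p in zip(outcomes, predicted) if y == 1]
    non_events = [p for y, p in zip(outcomes, predicted) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for p_event in events:
        for p_non in non_events:
            if p_event > p_non:
                concordant += 1.0  # event ranked above non-event
            elif p_event == p_non:
                concordant += 0.5  # tie counted as half
    return concordant / (len(events) * len(non_events))


# Four hypothetical episodes: 3.5 of the 4 pairs are concordant.
example_c = c_statistic([1, 0, 1, 0], [0.9, 0.2, 0.4, 0.4])
```

This pairwise proportion equals the area under the ROC curve, which is why the two descriptions are used interchangeably.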
The Hosmer-Lemeshow Goodness of Fit test examines model calibration across a range of predicted probabilities [13], producing a single summary measure of the match between predicted and actual outcomes within deciles of the data. Within each decile, deviations are measured between the observed and expected numbers of episodes experiencing functional decline and between the observed and expected numbers of episodes not experiencing functional decline. To determine whether these deviations are larger than expected, they are summed over the ten groups and compared to a χ² distribution with 8 degrees of freedom [22]. For the Hosmer-Lemeshow test, a p-value that is not statistically significant is indicative of reasonable model fit. Consistency over time is assessed by comparing the models' results from the prognostic and follow-up datasets.
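A minimal sketch of the Hosmer-Lemeshow calculation follows, using only the Python standard library. It is an illustration of the general technique, not a reproduction of the SAS output: the function names are assumptions, groups are formed by sorting on predicted probability and splitting into equal-sized blocks, and the χ² tail probability uses a closed form that holds only for an even number of degrees of freedom (such as the 8 used here).

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability, valid for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def hosmer_lemeshow(outcomes, predicted, groups=10):
    """Return (statistic, df, p_value) for the Hosmer-Lemeshow test."""
    if len(outcomes) < groups or any(not 0.0 < p < 1.0 for p in predicted):
        raise ValueError("need n >= groups and probabilities in (0, 1)")
    pairs = sorted(zip(predicted, outcomes))
    n = len(pairs)
    statistic = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        size = len(chunk)
        observed_events = sum(y for _, y in chunk)
        expected_events = sum(p for p, _ in chunk)
        observed_non = size - observed_events
        expected_non = size - expected_events
        # Squared deviation over expected count, for declines and non-declines.
        statistic += (observed_events - expected_events) ** 2 / expected_events
        statistic += (observed_non - expected_non) ** 2 / expected_non
    df = groups - 2
    return statistic, df, chi2_sf_even_df(statistic, df)
```

With the default of ten groups the statistic is referred to a χ² distribution with 8 degrees of freedom, as in the text; a large p-value means the observed and expected counts do not differ by more than chance would allow.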