Health and resource burden of a cancer diagnosis on the caregiver: an analysis of administrative claims data

Purpose Cancer diagnosis is known to affect the family; however, administrative claims data are not commonly used to evaluate the broader impact of cancer diagnosis. This study was designed to evaluate the feasibility of using claims data to explore the impact of cancer diagnosis on the caregiver.
Methods IBM MarketScan data were used to identify eligible cancer patients, who were required to have a second adult over the age of 18 (defined as “caregiver” for this study) covered by the same healthcare policy. Eligible control pairs included any two adults in the same policy with no evidence of cancer; for each pair one adult was randomly assigned to be the “patient control” while their partner was assigned as “caregiver control”. Probabilistic stratified sampling was used to select control pairs for analysis by matching the relative frequencies within sex and age group strata to those of patient/caregiver pairs. Eligible control pairs were probabilistically sampled without replacement until the first stratum with at least 0.5 % relative frequency had been completely sampled. Caregiver and caregiver control healthcare resource utilization (HCRU), new diagnoses, and healthcare costs were compared during the 12-month post-diagnosis period. Subgroup analyses were conducted by cancer subtypes (breast, colorectal, lung, gastric, sarcoma) and by sex of the patient and caregiver.
Results A total of 62,893 patient/caregiver pairs and 449,177 control pairs were included. Overall, caregivers used slightly fewer healthcare resources and incurred lower costs during the 12-month period after the cancer diagnosis than controls (physician visits: 85.8 % vs. 95.7 %; hospitalizations: 5.4 % vs. 7.0 %; emergency room visits: 15.7 % vs. 16.2 %; all p ≤ 0.001). This finding was consistent in all subgroup analyses. New diagnoses were lower in the caregiver cohort, except for mental disorders, which were higher than controls (14.3 % vs. 9.9 %, p < 0.0001). Psychotherapeutic/antidepressant utilization occurred among 21.0 % of caregivers versus 17.2 % of caregiver controls during this period.
Conclusions It is feasible to use administrative claims data to evaluate the impact of a cancer diagnosis on the caregiver, including outcomes such as HCRU, diagnoses, and costs. These findings raise hypotheses about deferment of health care and increased mental distress during the caregiving period.


Background
In the United States (U.S.), there are approximately 1,898,160 new cancer diagnoses and 608,570 cancer deaths expected in 2021 alone [1]. Unfortunately, a cancer diagnosis not only affects the person receiving the diagnosis but has an impact on the entire family unit. The burden of a cancer diagnosis to the broader family, and particularly to the adult partner or caregiver, is often underrecognized in retrospective observational research, largely because few real-world data sources can be used to quantify the broader impact of a cancer diagnosis.
Caregiving is typically defined as informal support from family members whose time and efforts are not covered by insurance. These efforts may include increased financial responsibilities, driving to and from health care appointments, increased responsibilities in the home, such as cleaning and meal preparation, as well as ensuring medication and nutrition intake is maintained. Unlike home health or nursing support, the cost of caregiving is not a reimbursable expense, and individuals caring for a family member with cancer have been documented to suffer loss of employment and reduced productivity, and to work extra hours and at lower paying jobs to accommodate the schedule needed to care for a loved one with cancer [2,3].
Caregivers overall are generally female (65 %) with an average age of 69.4 years. Only 9 % of caregivers have self-identified as lesbian, gay, bisexual or transgender (LGBT) [4]. While research has quantified the potential costs of time and resources used for informal caregiving [5], few studies have evaluated the impact of caregiving at a population level. What is known about caregivers has been obtained through surveys or qualitative interview data, which have established the range of challenges faced by caregivers. Caregivers of cancer patients have reported anxiety, depression, sleep disturbances, as well as declining quality of life and mental health [6][7][8]. The evidence related to physical health is less consistent, with only about half of all studies in a review of the literature finding associations between morbidity and caregiving [8].
This study was designed to explore the feasibility of using large administrative claims databases to quantify the impact of cancer diagnosis on an adult caregiver using data resources that allow for large, representative samples of unselected individuals. The goal of this research was to determine if the quantification of caregiver burden associated with cancer may be improved by using large databases for research. This study therefore investigated the hypothesis that health care resource utilization, new diagnoses, and costs would be higher among caregivers compared to matched controls.

Database
This was a retrospective observational study that utilized the IBM (International Business Machines) Health, formerly Truven, MarketScan® databases, which were used under license for the current study. These databases contain de-identified, HIPAA-compliant, fully integrated patient-level inpatient, outpatient, and drug data from commercial, Medicaid and employer-sponsored Medicare supplemental plans. The databases reflect the real-world healthcare experience of employees, retirees, and dependents covered by the health benefit programs of large employers. The data are collected from approximately 350 different insurance companies and third-party administrators. MarketScan databases have been used in over 300 peer-reviewed articles published in leading journals since 1990. De-identified data are not considered human subjects research according to the U.S. Code of Federal Regulations [9].

Cohort identification and inclusion criteria
Family units were identified in the database as two adults over the age of 18 recorded within the same health care policy (one adult was required to be the primary policy holder). To be considered for the cancer cohort, one of the two adults in the family unit was required to have at least two cancer codes reflecting the same anatomical site (e.g. breast, lung, colorectum) on different dates within a 91-day window. The initial cancer diagnosis was required to occur between January 1, 2011 and December 31, 2018. Additionally, the cancer patient was required to have ICD codes for metastatic disease. The second adult in the family unit covered by the health policy was defined as the caregiver. The index date was defined as the first date when a metastatic cancer diagnosis code was observed for the patient. A minimum of 180 days of pre-index continuous enrollment was required for both the cancer patient and the caregiver. Caregivers were further required to have no evidence of cancer within ± 12 months of the index date and to have at least 180 days of post-index continuous enrollment. No a priori sample size was fixed, since the intention of the study was to include the maximum number of eligible pairs given the large number of patients in the MarketScan database.
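The two-code rule for the cancer cohort can be illustrated with a minimal sketch. It assumes each claim has already been reduced to a (service date, anatomical site) pair and reads "within a 91-day window" as at most 91 days apart; the function name and the input layout are hypothetical and are not taken from the study's SAS programs.

```python
from datetime import date

def meets_cancer_definition(claims, window_days=91):
    """Return True if any one site has two codes on different dates
    no more than `window_days` apart.

    claims: iterable of (service_date, site) tuples (hypothetical layout).
    """
    dates_by_site = {}
    for service_date, site in claims:
        # A set drops repeated codes on the same date for the same site.
        dates_by_site.setdefault(site, set()).add(service_date)

    for dates in dates_by_site.values():
        ordered = sorted(dates)
        # The closest pair of distinct dates is always an adjacent pair.
        for earlier, later in zip(ordered, ordered[1:]):
            if (later - earlier).days <= window_days:
                return True
    return False

# Two breast cancer codes 59 days apart satisfy the rule.
example = [(date(2015, 1, 1), "breast"), (date(2015, 3, 1), "breast")]
print(meets_cancer_definition(example))
```

Codes for different sites, or repeated codes on a single date, do not satisfy the rule under this reading.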
Control pairs were selected according to a two-step process. First, similar to the cancer cohort, a set of eligible adult control pairs (family units) was identified, with one of each pair randomly assigned as the "patient control" and the other assigned as "caregiver control". To be eligible, each control pair must have similarly consisted of two adults who were part of the same health care policy. Each control pair was required to have records within the same time period as specified for the patient/caregiver pairs. Each caregiver control was required to have 180 days of continuous enrollment both prior to and following a randomly selected index date. Control pairs were excluded from eligibility if either adult had any evidence of cancer at any time in the database. Second, a probabilistic stratified sampling method was applied to match control pairs to patient/caregiver pairs. Strata were identified among the patient/caregiver cohort based on 4 variables: the sex and age of each of the patient and caregiver. Age in years was categorized into 4 groups (< 40, 40-54, 55-64, or ≥ 65), so that there were potentially up to 64 strata as determined by all possible combinations of age category and sex within pairs. Probabilistic stratified sampling of control pairs was intended to replicate the relative percentages of patient/caregiver pairs between strata, while also maximizing the total quantity of control pairs included in the study. Therefore, the probability of sampling control pairs within any particular stratum was set equal to the relative percentage of patient/caregiver pairs from that stratum, with eligible control pairs sampled without replacement until the first sufficiently large stratum (one with at least 0.5 % relative frequency) had been completely sampled.
This procedure ensured nearly identical relative frequencies of representation between patient/caregiver pairs and control pairs within all large strata, reasonably balanced representation between patient/caregiver pairs and control pairs within smaller strata, while approximately maximizing the sample size of control pairs overall. Index dates for the control pair were randomly selected between January 1, 2011 and December 31, 2018, as no metastases were available to define the index date in the control pair as they were required to have no evidence of cancer. For all cohorts, if there was more than one adult family member holding the same policyholder value, these cases were excluded due to potential data entry errors and lack of ability to clearly define a single 'caregiver.' Follow-up data were available through December 31, 2019 at the time of analysis.
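The probabilistic stratified sampling described above can be sketched as follows. This is one possible reading of the procedure, not the study's actual implementation: the function name, the stratum labels, and the choice to skip draws from exhausted small strata are all assumptions made for illustration.

```python
import random
from collections import Counter

def sample_controls(case_strata, control_pool, min_share=0.005, seed=0):
    """Draw control pairs so that stratum shares track the case cohort.

    case_strata: one stratum label per patient/caregiver pair.
    control_pool: dict mapping stratum label -> list of eligible control ids.
    Sampling stops when the first "large" stratum (case share >= min_share)
    has no eligible controls left.
    """
    rng = random.Random(seed)
    counts = Counter(case_strata)
    total = sum(counts.values())
    shares = {s: n / total for s, n in counts.items()}
    large = {s for s, p in shares.items() if p >= min_share}
    if not large:
        raise ValueError("no stratum reaches the minimum relative frequency")

    # Shuffle each stratum's pool once; popping then samples without replacement.
    remaining = {s: rng.sample(control_pool.get(s, []), len(control_pool.get(s, [])))
                 for s in shares}
    strata = list(shares)
    weights = [shares[s] for s in strata]

    selected = []
    while True:
        # Pick a stratum with probability equal to its share among cases.
        s = rng.choices(strata, weights=weights, k=1)[0]
        pool = remaining[s]
        if pool:
            selected.append((s, pool.pop()))
        # Assumption: an exhausted small stratum is skipped, a large one ends sampling.
        if s in large and not pool:
            break
    return selected

cases = ["F<40/M<40"] * 6 + ["F40-54/M40-54"] * 4
pool = {"F<40/M<40": list(range(100)), "F40-54/M40-54": list(range(100, 300))}
picked = sample_controls(cases, pool)
print(Counter(s for s, _ in picked))
```

Because draws stop as soon as one large stratum is exhausted, the other large strata end up close to their target shares, which matches the behaviour the text describes for large strata.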

Statistical analysis
The overall goal of the analysis plan was to compare health care resource utilization (HCRU), any new diagnoses, and costs between caregivers versus matched controls (caregiver controls). The study objectives were designed to test the hypothesis that each of these would be higher among caregivers than among caregiver controls.
Baseline demographic characteristics of the patient, patient control, caregiver, and caregiver control were compared using unadjusted comparisons from Student's t-test for continuous measures and chi-squared test for categorical measures.
HCRU outcomes were compared between the caregiver and caregiver control, including medication utilization, physician visits, emergency room visits, urgent care visits, hospitalizations, and surgical procedures. Comparisons of the matched cohorts were conducted using Student's t-test for continuous measures and chi-squared test for categorical measures.
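For a binary HCRU outcome (for example, any physician visit: yes or no) the chi-squared comparison of two cohorts reduces to a 2x2 table. The sketch below uses the Pearson statistic without continuity correction, which is an assumption since the paper does not state which variant was used, and the counts in the example are hypothetical rather than study data.

```python
import math

def chi2_2x2(a, b, c, d):
    """Pearson chi-squared test (no continuity correction) for the table
    [[a, b], [c, d]], e.g. rows = cohort, columns = outcome yes/no.

    Returns (statistic, p_value) with 1 degree of freedom.
    """
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, the upper tail probability is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Hypothetical counts: 20 of 100 caregivers vs. 10 of 100 controls with the outcome.
stat, p_value = chi2_2x2(20, 80, 10, 90)
print(round(stat, 3), round(p_value, 4))
```

With cohorts as large as those in this study, even very small differences in proportions produce small p-values, so the size of the difference matters as much as the test result.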
New health care diagnoses were evaluated by grouping diagnostic codes consistent with the Current Procedural Terminology (CPT) manual. New diagnoses were those that first appeared on or after the index date, and were compared between the caregiver and caregiver control using Student's t-test for continuous measures and chi-squared test for categorical measures.
Costs (both payer and patient out of pocket) were compared between the caregiver and caregiver control using Student's t-test. Because costs are often not normally distributed, the non-parametric Wilcoxon rank-sum test was also conducted. All costs were adjusted to 2018 U.S. dollars using the Medical Care Component of the Consumer Price Index. The primary analysis was limited to the 1-year post-index period; however, additional analyses were conducted throughout the follow-up period. Due to differential follow-up after the initial 1-year period, costs were also evaluated as average monthly costs.
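The two cost transformations mentioned above, inflation adjustment to 2018 dollars and conversion to an average monthly cost, can be sketched as below. The index values are placeholders chosen for easy arithmetic and are not actual Consumer Price Index figures; the function names and the 30.4375-day month are likewise assumptions.

```python
def to_base_year_dollars(cost, year, index_by_year, base_year=2018):
    """Rescale a cost from `year` dollars to `base_year` dollars using a price index."""
    return cost * index_by_year[base_year] / index_by_year[year]

def average_monthly_cost(total_cost, follow_up_days, days_per_month=30.4375):
    """Divide total cost by follow-up time expressed in months.

    days_per_month = 365.25 / 12 is an assumed convention.
    """
    return total_cost / (follow_up_days / days_per_month)

# Placeholder index values, NOT real Medical Care CPI figures.
illustrative_index = {2011: 400.0, 2018: 500.0}

adjusted = to_base_year_dollars(100.0, 2011, illustrative_index)
monthly = average_monthly_cost(1200.0, 365.25)
print(adjusted, monthly)
```

Expressing costs per month lets caregivers with different lengths of follow-up beyond the first year be compared on the same scale.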
Lastly, while the actual dates of death of cancer patients in this study were not a part of the database, the last activity date in the database might reasonably be assumed to approximate date of death for many of the cancer patients with advanced (metastatic) disease. For those caregivers who remained in a health care plan after the last activity date of the cancer patient, HCRU, new diagnoses, and costs were described to allow evaluation of caregiver outcomes after the possible death of the cancer patient. No imputation was made for missing variables. All analyses were conducted using SAS Enterprise Guide 7.1.

Subgroup analyses
Due to the heterogeneity of the set of diseases within the broad definition of 'cancer,' a series of subgroup analyses were planned a priori, and included analyses by primary cancer site (breast, colorectal, gastric, lung cancer, and sarcoma) as well as by sex (male cancer patient, female cancer patient, and cancer patient-caregiver pairs of the same sex). For each cancer subtype, control pairs were re-selected following the same general procedure as described above for the overall study. Within each cancer site, the probability of sampling control pairs within any particular stratum was set equal to the relative percentage of patient/caregiver pairs from that stratum within that cancer subtype, with eligible control pairs sampled without replacement until the first sufficiently large stratum (one with at least 0.5 % relative frequency) had been completely sampled.

Results
A total of 62,893 patient/caregiver pairs and 3,054,094 control pairs were eligible for inclusion in this study. The cohort eligibility diagram is presented in Fig. 1.
After applying the selection process there were 449,177 control pairs included in the study. Of the eligible patient/caregiver pairs: 13,174 were included in the breast cancer subgroup; 7,128 in the colorectal cancer subgroup; 1,308 in the gastric cancer subgroup; 9,600 in the lung cancer subgroup; and 907 in the sarcoma subgroup. There were 29,841 in the male patient subgroup, 33,052 in the female patient subgroup, and 458 where the cancer patient and caregiver were the same sex. The results of the selection process for control pairs are summarized in Table 1 and the results by subgroup are summarized in Table 2.

Health care resource utilization and costs
The most common medications used by caregivers and by caregiver controls during the 6-month pre-index period as well as during the 12-month post index period are summarized in Table 3.
During the pre-index period, caregivers and caregiver controls had similar medication use (all within 1-2 percentage points). During the 12-month post-index period, slight numeric differences were observed, with psychotherapeutics/antidepressants utilized among 21.0 % of caregivers versus 17.2 % of caregiver controls, and benzodiazepines used among 12.9 % of caregivers versus 9.0 % of caregiver controls. Differences in medication utilization were most pronounced in the gender subgroups, with 28.9 % of caregivers of male patients using psychotherapeutics/antidepressants during the 12-month post-index period versus 23.0 % of caregiver controls of male patient controls. Among caregivers and caregiver controls of female cancer patients/controls, these drugs were used by 13.8 % and 11.9 %, respectively, during the 12-month post-index period. Among same-sex patient/caregiver and control pairs, utilization was nearly identical, with 27.5 % of caregivers versus 27.0 % of caregiver controls receiving psychotherapeutics/antidepressants during the 12-month post-index period.
There were slightly fewer health care encounters among caregivers versus controls during the 12-month post-index period. Physician office visits occurred among 85.9 % of caregivers versus 95.7 % of controls.
Fig. 1 Cohort eligibility diagram. Each N represents a pair of individuals within the same healthcare policy
Not including systemic cancer therapies; generic drug names were used to identify unique drugs

New diagnoses
There were statistically significant differences in several new diagnoses between caregivers and caregiver controls, with caregivers having a greater number of new diagnoses in the code range for mental disorders versus controls (14.3 % versus 9.9 %, p < 0.0001). All other categories of diagnoses did not vary more than 1-2 % between caregivers and controls. The raw difference in new mental disorder diagnoses varied by subgroup, but all followed a similar pattern with caregivers having a greater proportion of new mental disorder diagnoses; all p < 0.0001 versus caregiver controls other than the same-sex patient subgroup, which was not statistically significant (Fig. 3).

Caregiver outcomes after the cancer patient death
Of all the caregivers, 19,823 remained in the health plan after the cancer patient death (estimated based on last activity date of the cancer patient). The characteristics of this caregiver group were similar to the overall caregiver cohort; 49.3 % were female and surviving caregivers had a mean age of 59.2 years (SD = 11.1) (Table 4). The median duration of follow-up after the estimated death of the cancer patient was 18.1 months (interquartile range, IQR = 7.0-36.1). Healthcare resource utilization during this period is summarized in Fig. 4. During the initial 12-month period after the death of the cancer patient, physician office visits were observed among 78.7 % and emergency room visits occurred among 15.6 %. New diagnoses are summarized in Table 5 for the overall caregiver cohort both during the caregiving period as well as after the death of the cancer patient.
The most common diagnoses during the 12-month post-death period observed were symptoms, signs, and abnormal clinical and laboratory findings (27.9 %), diseases of the musculoskeletal system and connective tissue (21.8 %), endocrine, nutritional/metabolic and immunity disorders (20.5 %), and diseases of the nervous system and sense organs (20.2 %).

Discussion
This study examined whether administrative claims data can be used to evaluate the impact of a cancer diagnosis on caregivers (defined as adult co-policy holders of the cancer patient in this study). While the amount of time invested in informal caregiving is not recorded in claims databases, the findings from this study suggest that during the year following diagnosis, adult caregivers may forego health care for themselves as their focus is on the health and wellbeing of the cancer patient. This was demonstrated by the consistently lower rate of health care resources and costs expended versus a matched control cohort overall as well as across tumor site- and sex-specific subgroup analyses.
Despite less frequent health care encounters, caregivers had significantly more diagnoses in the range of mental disorders during this time period. These findings are consistent with prior published literature that has reported caregiver anxiety, depression, and declining mental health [6][7][8]. In the current study, the observed differences were largest among caregivers of male patients, most of whom were female caregivers. This pattern was also observed among cancers that were more often diagnosed among men, such as lung cancer. The only caregiver subgroup that did not show any significant difference in mental disorders versus controls was the same-sex cancer patient/caregiver subgroup. This may in part be due to the small sample size that limits the ability to detect differences, or simply due to the true lower frequency of diagnoses among caregivers during this one-year period. Same-sex caregivers did not appear to have different rates of healthcare encounters (e.g. physician visits) than the other caregiver groups during the observation period. The exact diagnoses observed were not evaluated in this study, but warrant further investigation. The diagnoses within the range of mental disorders and specific medications prescribed should be explored in future study to better understand what is occurring.
Table 3 Most commonly used medications (by class) among caregivers and caregiver controls during the 6-month pre-index and 12-month post-index period, respectively
Table 4 Characteristics of caregivers remaining in health plans after the death of the cancer patient
Due to limited variables in the database, some assumptions had to be made when interpreting the variables in this dataset. There are reasons other than death why the cancer patient may no longer be covered by insurance after diagnosis. It is possible the cancer patient discontinued their health care plan to receive Medicare without continuing the commercial supplement, while other household members stayed on the commercial plan. If this were the case, the cancer patient may no longer be observed in the database and could have incorrectly been assumed to have died. In this study, the assumption was that most people with cancer who were no longer observed in the database while the caregiver partner continued to have claims submitted would be due to death, but no data were available to further clarify if a death had occurred. Therefore, the cohort of caregivers followed after the death of the cancer patient could have included some individuals who were continuing to be caregivers for the cancer patient but whose care was no longer being recorded in claims.
Additionally, the relationship of the adult caregiver to the patient is unknown. It would be expected that for a health care policy to be shared among adults that most of these individuals would be spouses or domestic partners; however, adult children could have been included in the policy. In the case of multiple adults within the same policy, the caregiver was selected as policy holder 2, which is typically the spouse/partner. However, an adult child could have been policy holder 2 in the case of a single-parent household with coverage through the Affordable Care Act, which extended health care coverage through age 26. While the assumption was made that all caregivers are likely spouses or adults in domestic partnerships, and the age distribution of caregivers suggests this assumption was not incorrect, the nature of the cancer patient-caregiver relationships could not be verified in this database. While the risk of including a child age 18-26 was low due to the higher age of onset of metastatic cancer diagnoses as observed in this study, future research of diseases more common in younger adults may wish to exclude or further evaluate the cases that include a partner who is younger than 27 years of age to determine the risk of inclusion of a child-parent relationship.
These data also do not contain information to verify actual caregiving activities. There may have been other formal or informal caregivers who performed these activities for the individual diagnosed with cancer. Therefore, the caregiver in this study can only be verified as an adult member of the household. Attributing a caregiver role to this individual assumes that some responsibilities were taken for the care of the patient during this time, but this also is not verifiable in this database.
Fig. 4 Healthcare resource utilization among caregivers after the death of the cancer patient (n = 19,823). Death was assumed based on the last activity date of the cancer patient in the database
A large database provides more representative and generalizable data about the impact of a metastatic cancer diagnosis on adult caregiver family members than previously published. This study suggests that even with the limitations of the variables collected, this impact can be investigated. However, the strength of a large dataset also leads to many statistically significant findings that may not be meaningful, simply due to the power of a large dataset to detect very small differences. To avoid overstating the role of statistical significance, we did not report all significant findings, but only those that were also associated with a difference in rates or point estimates that may be large enough to represent meaningful differences between groups.

Conclusions
It is feasible to use administrative claims data to evaluate the impact of caring for a patient with a metastatic cancer diagnosis. These findings raise hypotheses about the potential deferment of health care during the caregiving period, and the increased distress of this time as observed by mental disorders diagnosed and medications utilized. This study establishes the strengths of claims data to further investigate the challenges of caregiving to provide data that can inform the development of novel solutions to care for the caregiver during a time when their own wellbeing may be neglected.
Abbreviations
CPT: Current Procedural Terminology; HCRU: Health care resource utilization; IBM: International Business Machines; LGBT: Lesbian, gay, bisexual or transgender; SD: Standard deviation
Table 5 New healthcare diagnoses among caregivers during the 12-month post-index period and after the death of the cancer patient
Columns: Diagnostic Category; During the 12-month post-index period (n = 62,893), n (%); During the initial 12-month period after the death of the cancer patient (n = 19,823), n (%); Any time after the death of the cancer patient (n = 19,823), n (%)