Development of a multivariable prediction model to identify patients unlikely to complete a colonoscopy following an abnormal FIT test in community clinics

Background Colorectal cancer (CRC) is the third leading cancer killer among men and women in the US. The Strategies and Opportunities to STOP Colon Cancer in Priority Populations (STOP CRC) project aimed to increase CRC screening among patients in Federally Qualified Health Centers (FQHCs) through a mailed fecal immunochemical test (FIT) outreach program. However, rates of completion of the follow-up colonoscopy following an abnormal FIT remain low. We developed a multivariable prediction model using data available in the electronic health record to assess the probability of patients obtaining a colonoscopy following an abnormal FIT test. Methods To assess the probability of obtaining a colonoscopy, we used Cox regression to develop a risk prediction model among a retrospective cohort of patients with an abnormal FIT result. Results Of 1596 patients with an abnormal FIT result, 556 (34.8%) had a recorded colonoscopy within 6 months. The model showed adequate separation of patients across risk levels for non-adherence to follow-up colonoscopy (bootstrap-corrected C-statistic > 0.63). The refined model included 8 variables: age, race, insurance, GINI income inequality, long-term anticoagulant use, receipt of a flu vaccine in the past year, frequency of missed clinic appointments, and clinic site. The probability of obtaining a follow-up colonoscopy within 6 months varied across quintiles; patients in the lowest quintile had an estimated 18% chance, whereas patients in the top quintile had a greater than 55% chance of obtaining a follow-up colonoscopy. Conclusions Knowing who is unlikely to follow up on an abnormal FIT test could help identify patients who need an early intervention aimed at completing a follow-up colonoscopy. Trial registration This trial was registered at ClinicalTrials.gov (NCT01742065) on December 5, 2012. The protocol is available. Supplementary Information The online version contains supplementary material available at 10.1186/s12913-020-05883-2.


Background
Colorectal cancer (CRC) is the third leading cancer killer in the United States. Mailed fecal immunochemical testing (FIT) outreach programs can effectively increase CRC screening rates among underserved populations [1][2][3]. The Strategies and Opportunities to STOP Colon Cancer in Priority Populations (STOP CRC) project aimed to increase CRC screening among patients in Federally Qualified Health Centers (FQHCs) through mailed FIT outreach [4]. The FIT looks for hidden blood in the stool, which may be a sign of polyps or cancer in the colon or rectum. An abnormal test means that blood was found in the stool. For these patients, a follow-up colonoscopy is recommended, yet rates of completion of the follow-up colonoscopy remain low [5][6][7]. Among patients who receive care in community clinics in the United States, follow-up colonoscopy rates are consistently as low as 50% [7][8][9][10]. Liss and Chubak reported follow-up colonoscopy rates after an abnormal FIT test of 54% and 50%, respectively, among community health center patients [11,12]. This is of concern because delaying a follow-up colonoscopy up to 12 months following an abnormal fecal test is associated with increased cancer diagnoses and advanced cancer stage at the time of diagnosis [13].
Barriers can inhibit patients' ability to complete this follow-up colonoscopy. Patient-level barriers may include fear of results, inability to take time off work, the cost of preparation supplies or the colonoscopy, inability to complete adequate bowel prep, difficulty finding a driver on the day of the procedure, competing health concerns, and lack of understanding that the procedure was necessary [7,10,14,15].
Provider and system-level barriers may include limited colonoscopy capacity, failure to refer the patient to the specialist or schedule the procedure, failure to communicate expectations about the procedure or preparation for the procedure, and lack of adequate workflows to complete the referral [6][7][8]. Interventions like patient navigation, where a navigator helps address these barriers to screening, can close the gaps and improve follow-up rates [15,16].
While it is likely cost-prohibitive to "navigate" all patients with an abnormal FIT, identifying the patients in the greatest need of navigation could target resources to close gaps in screening. The use of the electronic health record (EHR) to identify patients at risk for failure to follow up on abnormal screening, who may be candidates for personalized interventions, may improve the precision of healthcare delivery [17]. Therefore, we aimed to develop a multivariable prediction model, using only patient-level data available in the EHR, to identify patients who are unlikely to undergo colonoscopy following an abnormal FIT test. We hypothesized that we could accurately predict which patients have a low probability of obtaining a colonoscopy.
Knowing who may be at risk for not adhering to recommendations for a follow-up colonoscopy after an abnormal FIT test could help providers and clinics identify patients in need of early interventions (including patient navigation) aimed at completing a colonoscopy. Precision delivery of interventions to those most likely to benefit might optimize patient outcomes and enhance opportunities to sustain successful interventions in low-resource settings.

Methods
To predict each patient's probability of obtaining a colonoscopy, we developed a risk prediction model using data from patients with an abnormal fecal test at the 26 STOP CRC clinics. We followed the guidelines set forth by the Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD) statement [18,19]. This model was designed to be put into practice at community clinics using data available in the EHR. Our objective was to identify patients who may benefit from interventions to complete the recommended follow-up.

Setting and participants
This retrospective analysis used data from the STOP CRC project and included eligible patients who returned a FIT with an abnormal result during the study period. OCHIN, formerly the Oregon Community Health Information Network, is a nonprofit health information technology services provider that provides a centrally hosted EHR for primary care clinics. The STOP CRC project included 26 clinics in Oregon and California that served as the setting for this cohort. This project was approved by the Kaiser Permanente Northwest IRB (Protocol #4364). Clinics operated in diverse settings, varied in size, and were part of 8 health centers.
To be eligible for the STOP CRC study, patients had to be 50-75 years old and not up to date with CRC screening, including fecal testing in the past 11 months or colonoscopy in the past 9 years. Patients were excluded from STOP CRC if they had co-morbid conditions that would make screening inappropriate, such as a history of CRC, colectomy, or dialysis. Our complete inclusion criteria are described elsewhere [20]. We then assembled a complete retrospective cohort of STOP CRC patients who subsequently completed CRC screening by FIT test and obtained an "abnormal" result. All patients with at least one abnormal FIT result from February 4, 2014, through February 28, 2016, were identified (n = 1723). For patients with more than one abnormal test result, the date of the most recent result was time zero, the start of follow-up.

Outcome and duration of follow-up
The outcome measure of interest was whether a patient received a colonoscopy within 6 months of receiving their abnormal FIT test result. For the Cox model, the outcome was defined as a colonoscopy completed within 180 days following the abnormal FIT test. Patients were not censored for loss to follow-up, as community clinics do not track membership. Completed colonoscopies were identified through procedure codes in the EHR.
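As an illustration of the outcome definition above, the following minimal Python sketch derives an event indicator and a follow-up time from two dates. It is not the study's code (the analysis was run in SAS). The function name `follow_up_outcome`, the inclusive handling of day 0 and day 180, and the choice to stop follow-up at 180 days for patients without a recorded colonoscopy are assumptions made for this example.

```python
from datetime import date
from typing import Optional, Tuple

FOLLOW_UP_DAYS = 180  # follow-up window stated in the text


def follow_up_outcome(fit_date: date,
                      colonoscopy_date: Optional[date]) -> Tuple[bool, int]:
    """Return (event, days) for one patient.

    event is True when a colonoscopy is recorded within 180 days after
    time zero (the abnormal FIT result date). days is the time to the
    colonoscopy, or 180 when no colonoscopy falls inside the window.
    Boundary handling (day 0 and day 180 both count) is an assumption.
    """
    if colonoscopy_date is not None:
        elapsed = (colonoscopy_date - fit_date).days
        if 0 <= elapsed <= FOLLOW_UP_DAYS:
            return True, elapsed
    # No colonoscopy in the window: follow-up ends at 180 days (assumed).
    return False, FOLLOW_UP_DAYS
```

For example, a hypothetical patient with an abnormal FIT on January 1, 2015 and a colonoscopy on March 1, 2015 would be coded as an event at day 59, while a patient with no recorded colonoscopy would contribute 180 days without an event.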

Predictor characteristics
We selected variables for our risk prediction model based on previous studies that identified predictors of failure to complete CRC screening or colonoscopy, but limited variables to those that would be available in the EHR in these community clinics (Table 1). Predictor characteristics were measured during the year before time zero unless otherwise specified. Predictors included clinic systems, patient demographics, community-level characteristics, self-reported behavior (e.g., smoking history), clinical findings (e.g., body mass index and the number of missed appointments), medications (e.g., antihypertensive medications), and diagnoses (e.g., history of cardiovascular disease). All coding and measurement of variables are described in Additional file 1. Community data variables were collected at the Census tract level, except for emergency department (ED) visits per 1000 enrollees, which was collected at the county level. Community-level variables were obtained from the ADVANCE Clinical Data Research Network, a data source integrated into the OCHIN data [21].

Statistical analysis
We evaluated the characteristics predicting follow-up colonoscopy using a Cox proportional hazards model and a logistic model in SAS® System Software. We fit a full model among patients with complete data and used a step-down process, manually removing the weakest characteristics one covariate at a time, to simplify the model so that the final model retained at least 90% of the variation explained by the full model. For the final model, we calculated the mean observed risk of completing the colonoscopy and plotted mean observed and predicted risks in quintiles using risk predictiveness curves that showed the distribution of predicted risks of completing the colonoscopy [22]. Discrimination was measured by a bootstrap-corrected C-statistic. Variation explained was measured with an R² statistic. The Cox regression coefficients were then translated into a simplified point-based risk scoring system to make the model easier to use in the clinical setting. A higher number of points means a higher likelihood of completing a follow-up colonoscopy [23]. This allows clinicians to translate the model into practice without calculating the regression equation exactly. Table 2 shows the expected and observed probability of completing a colonoscopy within 6 months of an abnormal FIT test by points. The points assignment reflects the variations in hazard ratios across patient characteristics. The clinician could add up the points to determine the likelihood of completing the follow-up colonoscopy.
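To make the discrimination measure concrete, the sketch below computes a C-statistic for a binary outcome, which corresponds to the logistic-model setting rather than the Cox model. It is a simplified illustration and not the study's SAS procedure: it ignores event times and censoring and does not apply the bootstrap correction. The function name `c_statistic` and the toy inputs are assumptions for this example.

```python
from itertools import product
from typing import Sequence


def c_statistic(predicted: Sequence[float], observed: Sequence[int]) -> float:
    """Concordance for a binary outcome.

    Among all pairs of one patient with the event (observed == 1) and one
    without (observed == 0), return the share in which the patient with the
    event has the higher predicted probability. Ties count as one half.
    """
    events = [p for p, y in zip(predicted, observed) if y == 1]
    non_events = [p for p, y in zip(predicted, observed) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")

    concordant = 0.0
    for p_event, p_non_event in product(events, non_events):
        if p_event > p_non_event:
            concordant += 1.0
        elif p_event == p_non_event:
            concordant += 0.5
    return concordant / (len(events) * len(non_events))
```

A value of 0.5 corresponds to no discrimination and 1.0 to perfect separation, which is the scale on which the reported values should be read.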

Results
Of 11,622 patients with a completed fecal test, 1723 (14.8%) had an abnormal result, and 699 (40.6%) of those had a subsequently completed colonoscopy in their EHR record within 12 months (Fig. 1). However, only 597 (34.6%) of those patients had a record of a completed colonoscopy within 6 months of their abnormal FIT test. For this analysis, one small clinic system was excluded due to low numbers of patients with abnormal FIT results (n = 13). We also only included patients with nonmissing data for all predictors (n = 1596). Of the 1596 patients included in the final model, 34.8% (n = 556) had recorded completed colonoscopies within 6 months. Table 1 illustrates all baseline characteristics for the entire cohort and the subgroup that had a recorded completed colonoscopy within 6 months. Overall, patients were typically white (83.3%), aged 50-64 (81.5%), and had a low rate of preventive care (flu shots, 14.3%; prior CRC screening, 38.3%). Only eight variables were retained for the final model as they contributed to the explained variation in risk.
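The percentages in the cohort description above can be checked directly from the reported counts. The short Python sketch below does only that arithmetic; the helper name `pct` and the variable names are introduced here for illustration.

```python
def pct(numerator: int, denominator: int) -> float:
    """Percentage rounded to one decimal place, as reported in the text."""
    return round(100 * numerator / denominator, 1)


# Counts taken from the text.
completed_fit = 11622
abnormal_fit = 1723
colonoscopy_12_months = 699
colonoscopy_6_months = 597
final_model_patients = 1596
final_model_colonoscopy_6_months = 556

abnormal_rate = pct(abnormal_fit, completed_fit)
rate_12_months = pct(colonoscopy_12_months, abnormal_fit)
rate_6_months = pct(colonoscopy_6_months, abnormal_fit)
final_model_rate = pct(final_model_colonoscopy_6_months, final_model_patients)
```

Each computed value matches the figure reported in the text (14.8%, 40.6%, 34.6%, and 34.8%, respectively).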
The eight characteristics retained in the final Cox regression model were age, race, insurance, GINI income inequality, long-term anticoagulant use, receipt of a flu vaccine in the past year, frequency of missed clinic appointments, and health center (Table 3). No notable differences were found when the model was run for men and women separately, so we combined men and women to develop one model. Table 3 also shows hazard ratios, confidence intervals, and the number of risk points assigned to each characteristic. The hazard ratios and risk score points for the final prediction model indicated that health center, age, long-term anticoagulant use, and receipt of a flu vaccine in the past year were the variables with the highest points assigned in the model.
The mean predicted probability of completing a colonoscopy was 34.8%, and the model was able to accurately predict the patients who were least likely to receive a follow-up colonoscopy (lowest two quintiles, 15.9% and 28.5%, respectively). The likelihood of obtaining a follow-up colonoscopy within 6 months varied across quintiles: patients with the highest predicted risk of non-adherence (bottom quintile) had an estimated 16% chance of obtaining a colonoscopy, whereas patients with the lowest predicted risk of non-adherence (top quintile) had a greater than 55% chance of obtaining a follow-up colonoscopy. Figure 2 shows the predictiveness curve for colonoscopy completion. The open circles (o) are the observed proportions, and the line represents the predicted probability of colonoscopy completion. Risk score points can be assigned to a patient to determine their probability of completing a colonoscopy. For example, we can score a patient who is on Medicaid (15 points), white (34 points), 54 years old (83 points), receives his care at health center 3 (100 points), has not missed appointments (31 points), has received a flu shot (40 points), is not on anticoagulants (54 points), and lives in an area with low income inequality (21 points). His total point count is 378, which predicts that he has an 81% probability of completing a colonoscopy, compared to the 35% likelihood of the average patient (data not shown).
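The worked example above amounts to summing one point value per characteristic. The Python sketch below reproduces that sum using the point values quoted in the text. The dictionary keys and the function name `total_points` are labels invented for this illustration, and the sketch does not convert points to a probability because that mapping (Table 2) is not reproduced here.

```python
from typing import Dict

# Point values for the example patient, as quoted in the text.
example_patient_points: Dict[str, int] = {
    "insurance: Medicaid": 15,
    "race: white": 34,
    "age: 54 years": 83,
    "health center: 3": 100,
    "missed appointments: none": 31,
    "flu shot in past year: yes": 40,
    "long-term anticoagulant use: no": 54,
    "income inequality: low": 21,
}


def total_points(points_by_characteristic: Dict[str, int]) -> int:
    """Add up the risk score points across all model characteristics."""
    return sum(points_by_characteristic.values())


example_total = total_points(example_patient_points)
```

The total for this patient is 378 points, matching the text; higher totals indicate a higher likelihood of completing the follow-up colonoscopy.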
The model showed modest separation of patients across risk levels for non-adherence to follow-up colonoscopy (C-statistic > 0.66, bootstrap-corrected C-statistic > 0.63) and excellent calibration or high agreement between observed and predicted risk. The R² statistic, derived from the D-statistic, showed only 14% of the variation in outcome was explained in this model (R² (95% CI) = 14.03 (10.17-18.18), D (95% CI) = 0.83 (0.69-0.96)). A logistic regression, predicting the completion of a colonoscopy, showed similar results for non-adherence to follow-up colonoscopy (C-statistic = 0.66, bootstrap-corrected C-statistic > 0.64).

Discussion
This model was created to identify patients in the greatest need of targeted interventions, such as patient navigation, to complete the screening process for CRC. We recognize that the performance of the model has limitations. The C-statistic, while suboptimal, shows adequate separation of patients across risk levels for non-adherence to follow-up colonoscopy, yet the R² indicates the discrimination and calibration could be further improved. However, focusing efforts on improving follow-up colonoscopy among patients in the lowest quintiles could provide value in the population most in need of understanding the importance of follow-up. Identifying the barriers among these patients and targeting interventions could produce improvements. When putting the model into practice, targeting the lowest probability groups could result in the greatest improvements. Both patient- and system-level characteristics were included in the final model, indicating the importance of recognizing multilevel barriers in adherence to colonoscopy following an abnormal FIT. The strongest predictors were age, health center, anticoagulant use, and flu vaccination. The youngest patients were more likely to obtain colonoscopies, which may be explained by their more recent entry into screening eligibility and better overall health. However, recognizing a lower likelihood of follow-up screening among older patients could help in efforts to close the gap in colonoscopy completion. Patients' likelihood of receiving a colonoscopy varied by health center, a complex variable that could represent a variety of system- and patient-level factors.
System-level factors affecting colonoscopy completion can include access to colonoscopy, location, community characteristics, the referral process, scheduling, waitlists, and capacity. Completion is also affected by many patient-level barriers reflected in the health system, such as transportation barriers, inability to take time off work, and mistrust of the system. The health system variable is important in this model and should be further examined.
Patients recorded as users of anticoagulant medications were less likely to obtain a colonoscopy following an abnormal FIT. Taking a patient off an anticoagulant may be required before a colonoscopy and could be a deterrent to completing the test. Patients who were vaccinated for the flu in the past year were more likely to complete a colonoscopy, indicating a pattern of adherence to recommended preventive care. Patients with Medicaid insurance, free or low-cost health coverage for low-income patients, were more likely to complete the colonoscopy. This may indicate program effectiveness or the effect of removing cost barriers. The GINI Income Inequality Ratio is a statistical measure of income inequality where a measure of 1 indicates total inequality and a measure of 0 indicates total equality. This analysis shows that living in an area of greater income inequality (closer to 1.0) was associated with a lower likelihood of completing a colonoscopy. Finally, patients who were non-white were less likely to complete a colonoscopy. Addressing issues among non-white populations through interventions is necessary to close the gap in disparities in CRC screening. We believe this is the first model to predict the likelihood of follow-up after an abnormal fecal test. Further research is needed to test the effectiveness of interventions for patients who have a low or moderate probability of completing follow-up colonoscopy.

Limitations
There are known deficiencies in capturing completed colonoscopies and referrals in the EHR, especially in the community clinic setting, where most colonoscopies are referred to outside providers or specialties without a direct link to the EHR. Therefore, updating records relies on clinic processes. While chart abstraction of colonoscopies could serve as the gold standard, it only explains why patients may or may not have colonoscopies recognized by the physician in obscure data points in the EHR [24]. This population is primarily an FQHC population, so the findings may not be generalizable to patients who obtain care in other types of clinics or healthcare settings. Further, this population is primarily in Oregon and Northern California, indicating regional limitations to generalization. Other populations and settings may not have EHR records that capture the predictors in our model. The health center variable is highly collinear with the other variables except for age. System-level predictors may matter more than patient-level predictors, and these models may need to be validated for each population and setting where they will be put into practice. We sought to develop a model that will transport to other health systems, clinics, and populations. External validation would increase the validity of the analysis and could support widespread use.

Table 3 Hazard ratios and risk score points for the final prediction model (columns: Variable; Hazard Ratio (95% CI); Likelihood ratio p-value; Points)