We found that the models that best predicted non-attendance in dermatology and pneumology outpatient services were based on decision trees. The dermatology model included the following variables: patient’s history of previous attendance, major ambulatory surgery, status of the last appointment, number of previous visits, and age. The pneumology model included patient’s history of previous attendance, lead time, status of the last appointment, number of previous visits, and number of days since the last visit. Using the prediction models to identify individuals at high risk of non-attendance for selective phone call reminders reduced the non-attendance rate by approximately 50% in dermatology and 40% in pneumology.
The systematic review conducted by Carreras et al. showed that at least half of the studies on no-show prediction identified age, gender, distance from home to the healthcare center, weekday, visit time, lead time, and history of previous attendance as predictors of non-attendance; marital status and visit type (first or successive) were also frequently used [14]. Our findings were mostly in line with the results reported by Carreras et al., although we did not find an association between gender and non-attendance, as reported elsewhere [18, 19]. Other studies described that non-attendance was associated with the number of previous appointments [20, 21], the status of the last appointment [22, 23], and the treatment category (e.g., surgery) [24], which was also consistent with our results. Regarding the relative importance of each variable in the model, the status of the last appointment, age, time of the day, lead time, and history of previous attendance are among the most important variables in the non-attendance predictive models presented in various analyses [12, 22, 25]. In our study, the history of previous attendance and the status of the last appointment also had a high weight in both models. In contrast, lead time and age were mainly important in pneumology and dermatology models, respectively. The time of the day had a small weight in both models.
Based on the performance results of the training algorithms, we chose decision trees to build our models. Decision trees were the second most frequently used algorithm to develop predictive models in the review of Carreras et al., after logistic regression [14]. The accuracy values reported in the review for models based on decision trees ranged from 76.5% to 89.6%, higher than the accuracy found in our analysis. However, most studies had a limited sample size and/or used the same dataset for training algorithms and assessing their performance. In contrast to this approach, which may lead to overfitting, we used an independent dataset for model validation. Therefore, although our accuracy was lower than reported elsewhere, we think our results may better reflect the expected accuracy of the model when applied in the real world.
Regardless of the validation approach, most studies reported accuracy values lower than the attendance rate [14]. This trend, also observed in our analysis, may be explained by the lack of data from other domains such as social, cultural, and socioeconomic factors that might have a relevant contribution to non-attendance behavior. Finally, we observed a poorer performance of the pneumology model compared with the dermatology model, which might also be due to differences in outpatient procedures and patient complexity between services. These findings suggest that service-specific characteristics and predictors from other domains should be included in the development of prediction models for non-attendance.
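The comparison between accuracy and attendance rate can be made concrete with a small sketch. A trivial rule that predicts attendance for every appointment reaches an accuracy equal to the attendance rate, yet identifies no non-attenders. The counts below are invented for illustration only and are not data from our study or from the cited review; the function names are likewise hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn), with 1 = non-attendance and 0 = attendance."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def accuracy(y_true, y_pred):
    """Share of appointments whose outcome was predicted correctly."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + tn + fn)


def recall(y_true, y_pred):
    """Share of actual non-attendances that were flagged."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn)


# Invented toy data: 100 appointments, 20 non-attendances (attendance rate 80%).
y_true = [1] * 20 + [0] * 80

# Trivial baseline: predict that everyone attends.
baseline = [0] * 100

# Hypothetical model: flags 14 of the 20 non-attenders and 18 of the 80 attenders.
model = [1] * 14 + [0] * 6 + [1] * 18 + [0] * 62

print(accuracy(y_true, baseline), recall(y_true, baseline))
print(accuracy(y_true, model), recall(y_true, model))
```

In this toy case the hypothetical model has a lower accuracy than the attendance rate but still flags most non-attenders, which suggests that accuracy alone may understate the usefulness of a model intended to select patients for reminders.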
As in our pilot study, other authors have reported non-attendance reductions after implementing reminder strategies based on phone calls [26] or, most frequently, short message service (SMS) [9, 10, 11]. However, phone calls are more expensive than SMS [9, 27], and both interventions have high costs for healthcare centers. Irrespective of the type of reminder, predictive algorithms may help to prioritize patients at higher risk of non-attendance, which is likely to improve the cost-effectiveness of the intervention. Furthermore, the quantitative approach to the prediction of non-attendance makes it possible to combine more or less intensive interventions according to different thresholds of non-attendance risk (e.g., SMS for a risk between 50% and 90%, and phone calls for a risk ≥90%).
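A tiered strategy of this kind could be sketched as follows. The 50% and 90% thresholds are the illustrative values from the example above, not validated cut-offs, and the function name and return labels are hypothetical.

```python
def choose_reminder(risk, sms_threshold=0.50, call_threshold=0.90):
    """Map a predicted non-attendance risk (0 to 1) to a reminder type.

    Thresholds are illustrative assumptions; in practice they would need
    to be tuned for each service.
    """
    if not 0.0 <= risk <= 1.0:
        raise ValueError("risk must be a probability between 0 and 1")
    if risk >= call_threshold:
        return "phone_call"  # highest-risk patients get the costlier reminder
    if risk >= sms_threshold:
        return "sms"         # intermediate risk gets the cheaper reminder
    return "none"            # low risk: no reminder


# Hypothetical predicted risks for four appointments.
for risk in (0.95, 0.72, 0.50, 0.10):
    print(risk, choose_reminder(risk))
```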
A notable consequence of our intervention for reducing non-attendance was the overloading of hospital agendas, highlighted during the debriefing held after the pilot study. This perception, which is consistent with the effectiveness of the measure, indicates that medical appointments were routinely scheduled on an overbooking basis, assuming a certain level of non-attendance. Hence, the potential consequences of improving efficiency in healthcare systems should be considered before implementing these types of solutions. Another concern raised during the debriefing session was the cost (in terms of time spent by administrative staff) associated with phone calls to individuals at higher risk of non-attendance. The economic impact of this solution can be minimized by implementing call centers shared by various centers or by investigating the optimal cut-off of non-attendance risk for a patient to be included in the intervention. For cut-off selection, other approaches such as the efficiency curve (similar to the Lorenz curve used in economics) could be explored [28]. Nevertheless, cost-effectiveness analyses that consider the cost associated with non-attendance should be conducted before drawing conclusions on the actual economic impact of this intervention.
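One way to explore the trade-off behind cut-off selection is a cumulative curve that relates the share of patients contacted, ranked from highest to lowest predicted risk, to the share of non-attendances captured. The sketch below is one plausible reading of such a curve and is not necessarily the construction described in [28]; the data are invented and the function names are hypothetical.

```python
def gains_curve(risks, no_show):
    """Return (fraction contacted, fraction of non-attendances captured) pairs.

    Patients are ranked by predicted risk, highest first; no_show holds
    1 for a non-attendance and 0 for an attended appointment.
    """
    total = sum(no_show)
    if total == 0:
        raise ValueError("at least one non-attendance is required")
    order = sorted(range(len(risks)), key=lambda i: risks[i], reverse=True)
    points, captured = [], 0
    for k, i in enumerate(order, start=1):
        captured += no_show[i]
        points.append((k / len(risks), captured / total))
    return points


def smallest_fraction_to_capture(points, target):
    """Smallest share of patients to contact to capture the target share."""
    for frac_contacted, frac_captured in points:
        if frac_captured >= target:
            return frac_contacted
    return 1.0


# Invented toy data: ten appointments with predicted risks and outcomes.
risks = [0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10, 0.05]
no_show = [1, 1, 0, 1, 0, 0, 0, 1, 0, 0]

points = gains_curve(risks, no_show)
print(smallest_fraction_to_capture(points, 0.75))
```

The risk of the last patient contacted at the chosen fraction would then serve as the candidate cut-off, to be weighed against the staff time required for the corresponding number of calls.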
The interpretation of our results is limited by the simultaneous assessment of the predictive model and the intervention itself (i.e., phone call reminder), which precluded appraising the contribution of each component to the non-attendance reduction. However, the main purpose of our pilot study was to assess the applicability of the whole concept to day-to-day practice. Another limitation was the unavailability of data with potential influence on the non-attendance rate, such as economic status [29, 30], education level [31, 32], or certain medical conditions [20, 33]. As discussed previously, the lack of social information is common in the development of predictive algorithms elsewhere. Regardless of the future inclusion of these data, the model should be retrained periodically to ensure its validity over time, taking into account seasonality, which is likely to influence the outcomes. The model must also handle new patients and new categorical features, and be trained on up-to-date data that capture the latest trends of non-attendance in each hospital service. Alternative analytical approaches, such as logistic regression analysis, could also be explored.