  • Research article
  • Open access

An analytical approach to aggregate patient inflows to a simulation model over the radiotherapy process



In meeting input data requirements for a system dynamics (SD) model simulating the radiotherapy (RT) process, the number of patient care pathways (RT workflows) needs to be kept low to simplify the model without affecting the overall performance. A large RT department can have more than 100 workflows, which results in a complex model structure if each is to be handled separately. Here we investigated effects on model performance by reducing the number of workflows for a model of the preparatory steps of the RT process.


We created an SD model sub-structure capturing the preparatory RT process. Real data for patients treated in 2015-2016 at a modern RT department in Sweden were used. RT workflow similarity was quantified by averaged pairwise utilization rate differences (%) and the size of corresponding correlation coefficients (r). Grouping of RT workflows was determined using two accepted strategies (80/20 Pareto rule; merging all data into one group) and a customized algorithm with r≥0.75:0.05:0.95 as criteria for group inclusion by two strategies (A1 and A2). The number of waiting patients for each grouping strategy was compared to the reference of all workflows handled separately.


There were 128 RT workflows for 3209 patients during the studied period. The 80/20 Pareto rule resulted in 14/8/21 groups for curative/palliative/disregarding treatment intent. Correspondingly, A1 and A2 resulted in 7-40/≤4-36/7-82 groups depending on r cutoff. Results for the Pareto rule and A2 at r≥0.85 were comparable to the reference.


The performance of a simulation model over the RT process will depend on the grouping strategy of patient input data. Either the Pareto rule or the grouping of patients by resource use can be expected to better reflect overall departmental effects of various changes than merging all data into one group. Our proposed approach to identify groups based on similarity in resource use can potentially be used in any setting with variable incoming flows of objects that go through a multi-step process comparable to RT, where the aim is to reduce the complexity of associated model structures without compromising overall performance.



Radiotherapy (RT) is used for approximately 50% of cancer patients to cure the disease or to ameliorate associated symptoms [1, 2]. With a growing cancer incidence, demands on RT are increasing. One immediate problem is how to maximize the utilization rates of available resources whilst maintaining high treatment quality and staff satisfaction [3]. RT is one of the most complex disciplines of healthcare and understanding departmental responses to various changes can be challenging. Simulation models, as suggested by system dynamics methodology or other methods within the field of operations research (OR), can help to increase this understanding [4, 5]. To create such a model, the referral of patients to RT, which is the first step of the RT process and determines the input data format, must be thoroughly understood.

The RT process involves numerous tasks from treatment preparation to treatment delivery. The treatment is typically fractionated, i.e. given once daily during a 5-7 week period, and delivered using linear accelerators (“linacs”) [6]. Preparations are undertaken to assure that the desired treatment is given to the intended anatomical region. Different imaging modalities are used to obtain a 3D-image representation of the patient’s anatomy on which contours of the tumor to be treated and the organs that must be avoided during treatment are overlaid. Based on this information, a team of physicians, physicists, and nurses (specialized in radiation oncology or radiation physics) then decides the treatment setup and calculates the dose distribution. Each RT task needs to be coordinated in time with additional treatments such as surgery or chemotherapy, with the aim of assuring that the scheduled activities meet the needs of each patient. The referral pattern of patients to RT largely determines whether this is feasible, given diagnosis- and treatment-intent-specific dependencies in tasks as well as the balance between existing workload and available capacity at the department.

When addressing patient referral patterns to an RT department from a modelling perspective with the ultimate aim to understand the overall structure, it may be challenging to keep the model parsimonious. Patients are referred to RT in inherently variable volumes [7]. Non-uniformity can be expected with respect to time and with respect to composition of diagnoses and of treatment intents. Temporal variations mainly depend on availability of staff, both for clinical assessments and for diagnostics; these effects are to some extent predictable. Effects due to vacation periods during summer or longer holidays are also predictable, with the number of referrals tending to increase before affected dates whilst decreasing to reach a steadier state after the period in question has passed. The composition of referrals has a more random structure in the short perspective but also tends to stabilize over time given a particular RT department's profile. Complicated patient cases typically demand non-standard approaches and such cases are often directed to larger hospitals for centralized care. The mix of cancer diagnoses and treatment intents at a particular RT department is therefore a consequence of the available treatments. Taken together, some treatments are common at certain RT departments but may not exist at all at others. In addition, the number of patients within a specific group needs to be acknowledged.

In order not to inflate the size of a simulation model more than necessary and make it difficult to build, analyze and understand, applying the 80/20 Pareto rule to the input data is one strategy [8]. For the current problem, this would mean using the distribution of incoming referrals for 80% of the patients and rescaling this volume to match the total number of patients. Another strategy could be to combine the remaining 20% of patients into one group. A drawback with these approaches is that effects for patients who are treated for rare diagnoses may not be properly acknowledged. This is problematic if different operational or capacity policies involving them are to be tested. To compensate for this, model complexity could be reduced without changing the effects of the overall input data by grouping patients who are comparable in some aspect, e.g. who utilize the same RT resources to a similar degree.

OR methods in RT have historically mainly focused on resource planning and resource use for purposes such as optimizing staff allocation, scheduling of patients or understanding the RT process from a strategic perspective [9]. Work on OR models for RT done by Vieira et al. used the referral pattern of patients to RT for one month and, based on this, they could assume a daily patient volume based on the Poisson distribution with a mean number of patient arrivals corresponding to observed rates of a particular weekday [7]. Discrete-event simulations by Kapamara et al. used data for one year (2005); uniform and exponential probability distributions were used to estimate input values for various variables including number of patients and their characteristics [10]. Using the same discrete-event simulation computer package (Simul8 Corporation, Boston, MA, U.S.A.), Proctor et al. also designed a model to identify factors affecting how a patient moves through the RT department from initial consultation/referral to the last treatment fraction [11]. They used data from a two-linac department, also in the U.K. (1997/2001), to estimate the performance of the department with an increased level of demand. Patient characteristics including requirements of RT resources were determined from a probability (profile) distribution. These are some of only a few examples that address the referral pattern of patients to RT in some detail, and to the best of our understanding it is unknown how different approaches to group patients of an RT department compare to each other in an OR setting.

In this work, we took an analytic approach to systematically quantify the input data to a sub-structure of a system dynamics (SD) model which mimics the preparatory steps of a seven-linac RT department in Sweden. This is one of the larger departments in this country, located at one of our university hospitals, and it offers treatments for all kinds of cancer diagnoses and treatment intents. Our aim was to identify an aggregated input data set, which reproduced and simulated effects according to existing patient referral patterns without having to handle each diagnosis and treatment intent separately. Understanding the effects by aggregating the input data for this problem will increase the knowledge about how to create future full-scale system dynamics models of the whole RT process or other systems with similar characteristics.

Materials and methods


Patient data were retrieved from the Oncological Information System ARIA® (Varian Medical Systems, Inc., Palo Alto, CA, U.S.A.) during January 1st 2015 to April 30th 2016. All patients referred to RT at Sahlgrenska University Hospital, Gothenburg, Sweden during this period (70 weeks) were sorted according to cancer diagnosis, treatment intent, and utilization of departmental resources for the preparatory steps of the RT process. Patients who had been registered in the system during the investigated period, but where no treatments had been delivered, were also included.

Cancer diagnoses were sorted according to ICD-10 codes [12] and treatment intent was acknowledged as “curative” or “palliative”. Each diagnosis- and treatment-intent-specific care pathway was defined as a separate workflow with the types of resources needed to prepare such patients for RT (hereafter referred to as an RT workflow). Resource utilization rates were quantified based on percentage use of associated operations or tasks given the total number of patients referred to the RT workflow in question (later referred to as PPRCT, Percentage patients requiring capacity). Some patients may require more than one appointment, hence PPRCT can exceed 100%. Assessed tasks were categorized into: 1. Positioning aid (Mould), 2. Positron-emission tomography (PET), 3. Computed tomography (CT), 4. Magnetic resonance imaging (MRI), 5. Target definition (TD), 6. Treatment planning (TP), and 7. Patient quality assurance (QA).
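As a minimal sketch of the PPRCT definition above, the following Python snippet computes per-task utilization rates for one RT workflow. The record format (a flat list of task names, one entry per appointment), the function name `pprct` and the example numbers are illustrative assumptions and are not taken from the study data.

```python
from collections import Counter

# The seven preparatory tasks listed in the text.
TASKS = ["Mould", "PET", "CT", "MRI", "TD", "TP", "QA"]

def pprct(appointments, n_patients):
    """Percentage of patients requiring capacity, per task, for one RT workflow.

    appointments: list of task names, one entry per appointment
                  (hypothetical record format).
    n_patients:   number of patients referred to this workflow.
    """
    counts = Counter(appointments)
    return {task: 100.0 * counts[task] / n_patients for task in TASKS}

# Invented workflow with 4 patients; one patient needed a second CT appointment.
example = ["CT"] * 5 + ["TD"] * 4 + ["TP"] * 4 + ["QA"] * 4 + ["MRI"] * 1
profile = pprct(example, n_patients=4)
print(profile)
```

Because repeat appointments are counted, the CT value in this invented example exceeds 100%, which mirrors the remark that PPRCT is not bounded by 100%.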


Similarity between different RT workflows was calculated using pairwise comparisons between resource utilization rates as quantified by Mann-Whitney U tests and Pearson’s linear correlations (r). Absolute percentage differences were also calculated for each task and as a total summary metric over all tasks. RT workflows with quantitatively similar patterns, i.e. similar use of resources, were then identified and aggregated based on increasingly stronger correlation cutoffs (r≥0.75:0.05:0.95); a customized algorithm was developed to systematically investigate these as termination criteria for group inclusion (details on the algorithm available in Additional file 1). Two grouping strategies were investigated: 1. All elements correlated with one main element (GroupingStrategy 1, referred to as A1 below) and 2. All elements correlated with each other (GroupingStrategy 2, referred to as A2 below). To minimize potential effects of the selected starting point for this procedure, random permutations of RT workflows were done 10 000 times per r-value cutoff. RT workflows were primarily required to have the same treatment intent to be paired, but effects of disregarding treatment intent were also investigated.
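The difference between the two inclusion rules can be illustrated with a short Python sketch. This is not the customized algorithm of Additional file 1: the first-fit assignment, the function names (`pearson`, `group_once`, `fewest_groups`) and the four invented utilization profiles are assumptions made purely for illustration. The sketch only shows how "correlates with one main element" (A1) and "correlates with every member" (A2) can yield different group counts, and why the ordering is permuted.

```python
import random
from math import sqrt

def pearson(x, y):
    """Pearson's linear correlation between two equally long profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / sqrt(sxx * syy)

def group_once(profiles, order, cutoff, strategy):
    """First-fit grouping for one ordering (assumed rule, for illustration)."""
    groups = []
    for name in order:
        for group in groups:
            # A1: compare with the main (first) element only; A2: with all members.
            members = group[:1] if strategy == "A1" else group
            if all(pearson(profiles[name], profiles[m]) >= cutoff for m in members):
                group.append(name)
                break
        else:
            groups.append([name])  # no existing group fits: start a new one
    return groups

def fewest_groups(profiles, cutoff, strategy, n_orderings=200, seed=0):
    """Try random orderings and keep the solution with the fewest groups."""
    rng = random.Random(seed)
    names = sorted(profiles)
    best = None
    for _ in range(n_orderings):
        order = names[:]
        rng.shuffle(order)
        groups = group_once(profiles, order, cutoff, strategy)
        if best is None or len(groups) < len(best):
            best = groups
    return best

# Invented PPRCT profiles over the seven tasks (not study data).
profiles = {
    "W_a": [100, 60, 60, 20, 20, 10, 10],
    "W_b": [70, 70, 30, 30, 30, 30, 20],
    "W_c": [70, 30, 70, 30, 30, 20, 30],
    "W_d": [30, 50, 50, 70, 70, 75, 75],
}
a1 = fewest_groups(profiles, cutoff=0.75, strategy="A1")
a2 = fewest_groups(profiles, cutoff=0.75, strategy="A2")
print(len(a1), a1)
print(len(a2), a2)
```

In these made-up profiles, W_b and W_c each correlate with W_a above the cutoff but not with each other, so A1 can merge all three around W_a whereas A2 cannot. Whether A1 finds that solution depends on W_a being encountered first, which is the kind of order dependence the random permutations are meant to reduce.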

The data handling and the calculations were performed in Microsoft Excel (2016) or in MATLAB (MATLAB R2018a version, The MathWorks Inc., Natick, MA, U.S.A.). P-values≤0.05 were considered to indicate statistical significance and r-values≥0.70 were considered to indicate strong correlations. Descriptive statistics were reported using mean and standard deviation or median and range, whichever most suitable based on the underlying data distribution. Performance between grouping strategies was quantified by absolute and relative differences at each time point with respect to a reference grouping strategy, which handled all workflows separately.
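The comparison against the reference can be sketched as follows. The exact definition of the relative difference is not spelled out in the text, so dividing by the reference value and skipping weeks where the reference is zero are assumptions, as are the function name `waiting_differences` and the example series.

```python
from statistics import mean, pstdev

def waiting_differences(candidate, reference):
    """Per-week differences in patients waiting between a grouping strategy
    and the reference (all workflows handled separately).

    Returns mean and standard deviation of the absolute differences and the
    mean relative difference in percent. Weeks with a zero reference value
    are skipped for the relative measure (an assumption of this sketch).
    """
    abs_diff = [abs(c - r) for c, r in zip(candidate, reference)]
    rel_diff = [100.0 * d / r for d, r in zip(abs_diff, reference) if r != 0]
    return {
        "mean_abs": mean(abs_diff),
        "sd_abs": pstdev(abs_diff),
        "mean_rel_pct": mean(rel_diff) if rel_diff else None,
    }

# Invented weekly numbers of patients waiting at one task.
reference = [10, 12, 8, 10]
candidate = [11, 12, 6, 10]
summary = waiting_differences(candidate, reference)
print(summary)
```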

Model and modelling scenarios

We investigated the effects of using the different grouping strategies to aggregate RT workflows as input data to a sub-structure of an SD model capturing the preparatory steps of the RT process, which will be developed for system dynamics analysis of the whole RT process in future work. The present model provided an overview of patient flows and the number of patients waiting through subsequent operational steps required to prepare the patients for RT. The input data were handled in weekly batches. The starting point of the proposed seven-step model was when a treatment decision was taken as a documented referral date to the RT department; the end point was when a date for completed QA was documented, confirming that the patient was ready to start treatment (Fig. 1; Additional file 1). Available resources at each operational step were calibrated to the same capacity setting, which allowed for patient throughput at a rate that did not build queues in the worst-case scenario for each treatment intent. The simulations were performed in Stella Architect (v.1.7.1 isee systems, Lebanon, NH, U.S.A.) and run on a MacBook Pro 2016 (MacOS v.10.13.1).

Fig. 1

A sub-structure of a system dynamics model capturing the preparatory steps of the RT process, built over seven RT preparatory tasks, evaluating the number of waiting patients in scenarios defined by various capacity settings combined with actual patient volumes and corresponding capacity demands. Note that the flow starts at the top left of the graph and finishes at the top right. If the workflow in question does not include a certain operation, the patients flow through this task without any capacity demand; this never happens at the boundaries of the model (referral and ready to start treatment). Symbols: Rectangles = points in the flow where patients accumulate; blue thick lines with valve symbol = route for patients transferring through a capacity-restricted operation; red fine lines with arrowhead = transfer of information held in the circular symbol to be utilized in a computation at the point of the arrowhead. At each capacity-restricted operation, two variables are combined to determine what flows through: the capacity for the operation and the variable PPRCT x, which determines the percentage of the waiting patients who will require capacity in operation x. Abbreviations: CT=Computed Tomography, MRI=Magnetic Resonance Imaging, PET=Positron Emission Tomography, PPRCT=Percentage patients requiring capacity, QA=Quality Assurance, and RT=Radiotherapy

Model outputs were compared over all time points based on the resulting number of patients waiting, as a measure of the ability of each grouping strategy to match the number of patients waiting in the best-case scenario where all workflows were handled separately (reference). Aggregation of RT workflows by the customized algorithm is denoted by the grouping strategy number, i.e. A1 or A2, and the utilization rate correlation coefficient cutoff, i.e. 75, 80, 85, 90 or 95. A1_75, therefore, refers to results by A1 with cutoff at r=0.75. For comparison, all RT workflows were also grouped according to the Pareto rule (80% of workflows handled separately and the remaining 20% merged into one joint group; Pareto_80/20). All RT workflows were also merged into one group as an estimate of the expected least representative worst-case scenario (all-in-one).
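One way to read the Pareto_80/20 grouping is sketched below: workflows are ranked by patient volume, the largest ones are kept separate until they cover 80% of the patients, and the rest are merged into one joint group. Cutting on cumulative patient share (rather than on the number of workflows) is an assumption of this sketch, and the patient counts attached to the ICD-10 codes are invented.

```python
def pareto_groups(patients_per_workflow, share=0.80):
    """Keep the largest workflows separate until they cover `share` of all
    patients; merge the remaining workflows into one joint group."""
    total = sum(patients_per_workflow.values())
    ranked = sorted(patients_per_workflow.items(), key=lambda kv: kv[1], reverse=True)
    separate, rest, covered = [], [], 0
    for name, n in ranked:
        if covered < share * total:
            separate.append([name])
            covered += n
        else:
            rest.append(name)
    return separate + ([rest] if rest else [])

# Invented patient counts per workflow (not study data).
counts = {"C50": 500, "C61": 250, "C34": 150, "C20": 60, "C71": 30, "C24": 10}
groups = pareto_groups(counts)
print(groups)
```

Merging everything into a single group, as in the all-in-one scenario, corresponds to the limiting case where no workflow is kept separate.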


Data overview

In total, 3209 patients were referred to the RT department during the studied period. Of these, 2094 (65%)/1115 (35%) were planned for treatment with curative/palliative intent. There were 72/56 different cancer diagnoses resulting in 128 separate RT workflows; the distribution of patients per RT workflow is shown in Fig. 2a-b. The majority of curative patients were to undergo a single treatment course (98%), but almost one of three palliative patients underwent more than one treatment course (32%). The treatments for five patients planned for seven treatment courses were cancelled although preparations for RT were completed.

Fig. 2

Number of patients included in each RT workflow for curative (a) and palliative (b) intent

Similarity between workflows with and without consideration of treatment intent

For curative treatments, similarity between utilization rates of resources between RT workflows could generally not be ruled out (averaged minimum p=0.057). In two situations, however, candidates for grouping could be excluded (≥1 diagnosis/RT workflow; p<0.042). For the remaining comparisons, statistically significant correlations were generally strong (averaged median r=0.87, range: 0.78-0.97) and typically offered multiple candidates for grouping (median: 18 pairs; Table 1). The RT workflow for C61 (prostate cancer) showed no statistically significant correlations with any other workflow. The RT workflows for C24 (bile duct cancer) and L91 (hypertrophic skin disorder) correlated with only one other workflow. The average absolute percentage difference for total resource utilization rate for all curative RT workflows, calculated as median of all mean values, was 38% (range: 22-238%).

Table 1 Averaged statistically significant correlations for 72 RT workflows where patients were treated with curative intent at the Sahlgrenska University Hospital in Sweden during 2015-2016

For palliative treatments, similarity between utilization rates of resources between RT workflows could, as for curative treatments, generally not be ruled out (averaged minimum p=0.1820). There was only one situation where candidates for grouping could be excluded (3 diagnoses; p<0.034). Statistically significant correlations between RT workflows were strong (averaged median r=0.90, range: 0.77-0.96) and offered several candidates for grouping (median 42 pairs; Table 2). The RT workflows for C56 (ovarian cancer) and C64 (kidney cancer) showed no statistically significant correlations with any other workflow. The average absolute percentage difference for total resource utilization rate for all palliative RT workflows was 20% (range: 8-95%).

Table 2 Averaged statistically significant correlations for 56 workflows where patients were treated with palliative intent at the Sahlgrenska University Hospital in Sweden during 2015-2016

The pattern of similarity between utilization rates of resources remained when treatment intent was disregarded (averaged minimum p=0.073). There were five situations where candidates for grouping could be excluded. Correlations between RT workflows were strong (averaged median r=0.88, range: 0.78-0.97) and offered more candidates for grouping than either of the two treatment intents separately (median: 50 pairs; Table 3).

Table 3 Averaged statistically significant correlations for 128 workflows disregarding treatment intent for patients at the Sahlgrenska University Hospital in Sweden during 2015-2016

However, the RT workflow for C61 (prostate cancer) with curative treatment intent had no potential grouping candidate; remaining workflows had ≥2 candidates for grouping. Averaged absolute percentage difference for total resource utilization rate for all workflows was 17% (range: 8-100%).

Aggregating RT workflows

Using A1, where all elements in an RT workflow group were to correlate with one main element, and r≥0.75/0.80/0.85/0.90/0.95 resulted in a minimum of 7/9/12/18/28 groups for curative intent (Table 4, left). Corresponding figures for palliative intent were 4/4/5/7/12 groups and 7/10/13/21/34 groups when disregarding treatment intent. Within each group of similar characteristics, absolute percentage differences for resource utilization rates between included elements were on average at most 107/113/132/202/279% for curative intent, 39/37/39/38/63% for palliative intent, and 71/67/69/65/123% when treatment intent was disregarded.

Table 4 Number of groups and averaged within-group differences by the customized algorithm and correlation coefficient cutoffs for patients treated for either curative or palliative intent at the Sahlgrenska University Hospital in Sweden during 2015-2016

Using A2, where all elements in an RT workflow group were to correlate with each other, the different r-value cutoffs resulted in a minimum of 7/15/24/33/40 groups for curative intent, 4/8/15/24/36 groups for palliative intent, and 7/29/46/59/82 when treatment intent was disregarded (Table 4, right; details for A2_85 in Fig. 3a-c). Corresponding absolute percentage differences for resource use between included elements were 109/160/190/258/279%, 39/32/35/45/92%, and 70/100/156/197/252%, respectively.

Fig. 3

Total number of patients (left y-axis) for the groups aggregated together in A2_85 and the corresponding number of RT workflows per group (right y-axis) for curative treatment intent (a), palliative treatment intent (b) and disregarding treatment intent (c)

Using the 80/20 Pareto rule resulted in 14 RT workflow groups for curative intent, 8 groups for palliative intent, and 21 groups when treatment intent was disregarded, with an inflow pattern as illustrated in Fig. 4a-c (curative, palliative and total). Since the number of groups by the customized algorithm at r=0.85 was comparable to or exceeded the number of groups by the 80/20 Pareto rule, modelling results were compared and reported for the two grouping strategies up to r≤0.85 only. The number of patients waiting was subsequently investigated for nine RT workflow group scenarios including the reference which handled all RT workflows separately: reference, Pareto_80/20, all-in-one, A1_75, A2_75, A1_80, A2_80, A1_85, and A2_85.

Fig. 4

Inflow pattern of patients by intent: curative (a), palliative (b) and in total disregarding intent (c). Figure key shows Group# and corresponding ICD-10 code and for (c) the treatment intent is indicated by a C or P (curative or palliative respectively) before the ICD-10 code

Modelling results

For A1_75, there were ≤28 RT workflows for each of the seven curative groups (1 single), ≤49 RT workflows for the four palliative groups (2 singles), and ≤80 RT workflows for the seven groups when treatment intent was disregarded (1 single). For A1_80, corresponding figures were ≤25, ≤49, and ≤71 (each with 2 singles) and for A1_85, there were ≤19, ≤39, and ≤54 (3, 2, and 1 single(s), respectively).

For A2_75, there were ≤31 RT workflows for each of the seven curative groups (1 single), ≤49 RT workflows for the four palliative groups (2 singles), and ≤72 RT workflows for the seven groups when treatment intent was disregarded (1 single). For A2_80, corresponding figures were ≤15, ≤32, and ≤19 (3, 3, and 12 singles, respectively) and for A2_85, there were ≤11, ≤11, and ≤14 (9, 2, and 21 singles, respectively).

Performance of different grouping strategies

The overall performance of the different grouping strategies compared with the reference is presented in Fig. 5 and Tables 5, 6 and 7.

Fig. 5

Number of patients waiting at various steps of the RT process by week as given by the output of the proposed simulation model over RT preparatory steps. RT tasks from left to right: 1. Mould, 2. PET, 3. CT, 4. MRI, 5. Target definition (TD), 6. Treatment planning (TP), and 7. QA. Top row corresponding to results for curative treatment intent (a), middle row for palliative treatment intent (b) and bottom row when disregarding treatment intent (c). The experiment includes nine different group scenarios: reference=each RT workflow handled separately, Pareto_80/20=80% of workflows handled separately and remaining 20% merged into one additional group, A1/2_75/80/85=proposed grouping strategies where all elements correlated with one main element (A1)/correlated with each other (A2) for utilization rate correlation coefficient cutoffs up to r=0.75/0.80/0.85, and all-in-one=all RT workflows merged into one group. Abbreviations: CT=Computed Tomography, MRI=Magnetic Resonance Imaging, PET=Positron Emission Tomography, QA=Quality Assurance, RT=Radiotherapy, TD=Target definition and TP=Treatment planning

Table 5 Absolute and relative differences for number of patients waiting between the investigated RT workflow grouping strategies and the reference scenario for curative treatment intent. Bold numbers indicate top three performing (lowest relative absolute mean) grouping strategies per task
Table 6 Absolute and relative patient differences for number of patients waiting between the investigated RT workflow grouping strategies and the reference scenario for palliative treatment intent. Bold numbers indicate top three performing (lowest relative absolute mean) grouping strategies per task
Table 7 Absolute and relative differences for number of patients waiting between the investigated workflow grouping strategies and the reference scenario disregarding treatment intent. Bold numbers indicate top three performing (lowest relative absolute mean) grouping strategies per task

When assessing the mean absolute differences in number of patients waiting between all grouping strategies compared with the reference, all-in-one consistently performed worst and A2_85 generally performed best (curative/palliative/disregarding treatment intent: 7.4±3.8/2.0±1.1/10.6±5.2 patients versus 1.5±1.2/0.6±0.5/2.3±1.8 patients; Fig. 5; Tables 5, 6 and 7). The performance of Pareto_80/20 was generally second best to A2_85 (3.4±2.0/0.5±0.5/2.1±1.7 patients). For the same r-value cutoff, grouping strategy A2 typically performed better than grouping strategy A1.

When assessing the impact of the grouping strategies on mean relative differences in number of patients waiting for each RT preparatory task, the smallest differences (least dependent on grouping strategy) were generally found for 5. Target definition and the largest (most dependent on grouping strategy) for 7. QA (Fig. 5; Tables 5, 6 and 7). However, patterns varied between treatment intents: the overall smallest differences occurred, in at least one case, for every task except 1. Mould, and the overall largest differences for every task except 6. Treatment planning.

The performance of the grouping strategies based on median values instead of mean values was also investigated, with comparable results (data not shown).


In this work, we investigated the performance of various strategies to group care pathways for patients with different cancer diagnoses and treatment intents referred to RT (RT workflows). Our aim was to identify an aggregated input data set to a sub-structure of an SD model capturing the preparatory steps of the RT process, which produced similar results to the original patient referral pattern. Using data from 3209 patients treated in 2015-2016 through 128 care pathways at a modern seven-linac RT department, we found that the accepted 80/20 Pareto rule performed well with respect to number of patients waiting, but that even better results could be achieved by using a grouping strategy that recognized similarities in resource use. The latter, however, resulted in a somewhat more extensive input data set compared with the former.

Needing to reduce dimensions of input data prior to modelling is a common problem when working with real-world data [13]. A well-designed dimension reduction strategy creates a smaller input data set, which provides the same modelling results as the original representation. To this end, some measurement of similarity is important to use, and for the problem addressed in this work there is to our knowledge no recommended way of deciding this (a PubMed search on March 21st, 2019, gave zero relevant hits for various combinations of “radiation therapy”, “simulation model”, “input data set”, “operations research”). Neither of the accepted strategies to handle input data of a simulation model, the Pareto rule or the merging of all data into one group, uses a similarity measure when reducing the input data structure. We, therefore, proposed to use pairwise correlations between groups based on RT preparatory step resource use for a number of reasons. As patients pass through the different steps of the RT process, available resources set the limit for the extent to which the demand can be met. If the treatments of multiple cancer diagnoses require a similar amount of resources, scheduling them for RT can be simplified by grouping them together. Conditions for RT also change over time, with emerging data for instance motivating fewer treatments for prostate cancer [14]. This is a large group of patients referred to RT and a similarity measure based on resource utilization rates can assist in understanding to which extent existing treatments already follow this pattern. If so, such a similarity measure can guide whether a new patient care pathway needs to be created or if those for another cancer diagnosis or treatment intent can be reused. Finally, the underlying data for our proposed similarity measure can easily be extracted from existing oncological information systems and the measure in itself is straightforward to calculate.

Although the purpose of our system dynamics model over the RT process is not to find an optimal solution to the problem at hand, it is interesting to note that none of the aforementioned OR models for RT by Vieira [7], Kapamara [10] and Proctor [11] discuss grouping strategies or similarity measures when quantifying patient volumes as either an alternative to or together with probability distributions as estimated from observed data.

The level of data aggregation needed to reduce model complexity will depend on the amount of detail the model is intended to capture. System dynamics is an OR method that can help to understand the behaviors of a complex system in various scenarios [15]. Systems can be complex for several reasons, for instance non-linear relations between cause and effect. There is a difference between complicated and complex systems, where a complicated system can be hard to understand while a complex system can consist of many easy steps influencing each other in a non-predictable, and even in a counterintuitive way. For the understanding of system behaviors after grouping data in the investigated RT scenario, a high level of aggregation will probably work to capture overall effects during long time periods. However, the total capacity needed to handle groups of patients requiring different resources or resource use can be different from the average level. With too high a level of aggregation, details get averaged out and when investigating effects for certain groups the result may not be noticeable. Different inflow patterns will also have an impact in this context as shown by our data. Fluctuating incoming flows are common in RT [7] and a steady state typically never exists. This needs to be kept in mind when considering how to aggregate data for situations where incoming flows are stable over time since the impact of grouping strategy may be of less importance. For a general setting in line with RT, where there might be variable incoming flows of objects instead of patients and those objects go through a multiple-step process with different requirements of resource use at each step, it is likely that the choice of aggregation method will have an effect on the end results as will the investigated period of time. Which aggregation method is most appropriate needs, however, to be determined for that specific situation based on the characteristics of available data.

Strengths of this study include the use of real-world data from a large modern RT department, which captured yearly variations in the referral pattern of patients to RT, and an objective approach to evaluate accepted and new grouping strategies as a means to reduce the input data set. By including all patients with an ICD-code registered in ARIA®, and who were planned for RT during the investigated period, the effect of rarely treated diagnoses could be acknowledged for various grouping strategies. We handled the data in weekly batches in our model, although other studies have identified daily variations in referral patterns [7]. Potential effects of daily variations will be further investigated in a full-scale SD model rather than in the current model, which was primarily designed to illustrate effects of grouping strategies in a generic simulation modelling context. A future refined version of the simulation model will include feedback loops that allow patients to move backwards in the process when tasks need to be redone, rather than the single-direction patient flow supported by the current version. The current design may also explain why we found the largest dependence on grouping strategy in the last step of the RT preparatory tasks (QA), where the number of waiting patients can be expected to accumulate under the chosen design. With the more than 100 patient care pathways identified in this study, conditions for model building were favorable and the effects of different grouping strategies could be tested thoroughly. However, the investigated correlations were strong, which made the Mann-Whitney statistic somewhat redundant as a guide to similarity in the pre-processing step of our customized grouping algorithm. This step would be more important for less strongly correlated data than for the data we explored here.
We selected the smallest number of groups for each scenario for analysis without further evaluation of the included diagnoses or the within-group differences. If these or other criteria had been used for selection, larger groups would have been the consequence. Since we wanted to identify as few groups as possible, detailed investigation of other selection criteria was outside the scope of this study. Furthermore, the results of the grouping strategies of our customized algorithm depended on the ordering of the data, and a permutation strategy was introduced to minimize these effects. The total number of ways to order a set of 100 or more elements is far larger than can be investigated exhaustively; we therefore decided on the proposed strategy with 10000 randomly selected orderings. Increasing the number to 100000 orderings had no impact on the reported results; however, reducing the number to 1000 failed to identify some of the smaller solutions (data not shown). Finally, the results presented here are specific to the investigated RT department. Produced results will always mirror the character of the incoming flow of patients during the studied period of time. The proposed aggregation strategy, however, can be expected to work with a different dataset in any general setting where resource usage can be used to quantify the similarity of a process.
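The order dependence and the permutation strategy can be sketched as follows. This is a minimal illustration and not the authors' algorithm, which is described in Additional file 1: the greedy rule (a workflow joins the first group in which it correlates at or above the cutoff with every member), the function names, and the toy correlation values are all assumptions made for this example.

```python
import random


def greedy_groups(order, corr, cutoff):
    """Assumed greedy rule: place each workflow in the first group whose
    members all correlate with it at or above the cutoff."""
    groups = []
    for w in order:
        for g in groups:
            if all(corr[frozenset((w, m))] >= cutoff for m in g):
                g.append(w)
                break
        else:
            # No existing group fits, so the workflow starts a new group.
            groups.append([w])
    return groups


def smallest_grouping(workflows, corr, cutoff, n_orderings=10_000, seed=0):
    """Try many random orderings and keep the grouping with the fewest groups."""
    rng = random.Random(seed)
    order = list(workflows)
    best = None
    for _ in range(n_orderings):
        rng.shuffle(order)
        groups = greedy_groups(order, corr, cutoff)
        if best is None or len(groups) < len(best):
            best = groups
    return best


# Hypothetical pairwise correlations between four workflows.
corr = {
    frozenset(("a", "b")): 0.9,
    frozenset(("b", "c")): 0.9,
    frozenset(("c", "d")): 0.9,
    frozenset(("a", "c")): 0.5,
    frozenset(("a", "d")): 0.5,
    frozenset(("b", "d")): 0.5,
}

in_order = greedy_groups(list("abcd"), corr, cutoff=0.85)
reordered = greedy_groups(list("bcad"), corr, cutoff=0.85)
best = smallest_grouping("abcd", corr, cutoff=0.85, n_orderings=200)
print(len(in_order), len(reordered), len(best))
```

In this toy case the same greedy rule yields two groups for one ordering and three for another, which is why several orderings are sampled and the smallest solution is kept. With only four workflows all 24 orderings could be enumerated; random sampling becomes necessary for sets of 100 or more workflows, as noted in the text.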


Conclusions

Grouping incoming patient flows to a simulation model over the RT process is important to maintain simplicity whilst acknowledging the highly variable data. The accepted 80/20 Pareto rule can be expected to perform well as a grouping strategy for this purpose, but even better results may be achieved with a grouping strategy in which a similarity measure between RT care pathways is used to identify groups. Even if the latter requires pre-processing of data, finding similarities between groups, with or without consideration of treatment intent, may also prove clinically useful in other situations. Examples include capacity planning of RT and dynamic re-planning of different RT booking scenarios, where reducing the number of alternatives may be of importance whilst still needing to acknowledge the autonomy of each patient care pathway.
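One common reading of the 80/20 Pareto rule as a grouping strategy can be sketched as below. The interpretation (keep the most frequent workflows that together cover 80% of patients as separate groups and merge the remainder), the function name, and the patient counts are assumptions for illustration; the exact procedure used in the study may differ.

```python
def pareto_groups(counts, share=0.8):
    """Split workflows into those kept separate and those merged into one
    residual group, based on cumulative share of patients."""
    total = sum(counts.values())
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    kept, covered = [], 0
    for name, n in ranked:
        if covered >= share * total:
            break
        kept.append(name)
        covered += n
    # Everything after the kept workflows forms the merged residual group.
    rest = [name for name, _ in ranked[len(kept):]]
    return kept, rest


# Hypothetical number of patients per workflow (illustrative only).
counts = {"w1": 50, "w2": 25, "w3": 15, "w4": 6, "w5": 4}
kept, rest = pareto_groups(counts)
print("separate groups:", kept)
print("merged residual group:", rest)
```

Here the three most frequent workflows are handled separately and the two rare ones are merged, so five workflows reduce to four groups.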

Availability of data and materials

Additional datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.



Abbreviations

CT: Computed tomography
MRI: Magnetic resonance imaging
OR: Operations research
PET: Positron-emission tomography
QA: Quality assurance
RT: Radiotherapy
SD: System dynamics
TD: Target definition
TP: Treatment planning

References

  1. Delaney G, Jacob S, Featherstone C, Barton M. The role of radiotherapy in cancer treatment: estimating optimal utilization from a review of evidence-based clinical guidelines. Cancer. 2005;104(6):1129–37.

  2. Hall EJ, Giaccia AJ. Radiobiology for the radiologist, vol. ix. 6th ed. Philadelphia: Lippincott Williams & Wilkins; 2006. p. 546.

  3. Winkfield KM, Gabeau D. Why workforce diversity in oncology matters. Int J Radiat Oncol Biol Phys. 2013;85(4):900–1.

  4. Sterman JD. Business dynamics: systems thinking and modeling for a complex world. Boston: Irwin/McGraw-Hill; 2000.

  5. Law AM. Simulation modeling and analysis. 4th ed. New York: McGraw-Hill Publishing Co; 2006.

  6. Steel GG. Basic clinical radiobiology, vol. viii. 3rd ed. London: Arnold; 2002. p. 262.

  7. Vieira B, Demirtas D, van de Kamer JB, Hans EW, van Harten W. A mathematical programming model for optimizing the staff allocation in radiotherapy under uncertain demand. Eur J Oper Res. 2018;270(2):709–22.

  8. Koch R. The 80/20 principle: the secret of achieving more with less. Updated 20th anniversary ed. London: Nicholas Brealey Publishing; 2017. p. 413.

  9. Vieira B, Hans EW, van Vliet-Vroegindeweij C, van de Kamer J, van Harten W. Operations research for resource planning and -use in radiotherapy: a literature review. BMC Med Inf Decis Mak. 2016;16(1):149.

  10. Kapamara T, Sheibani K, Petrovic D, Haas O, Reeves C. A simulation of a radiotherapy treatment system: a case study of a local cancer centre. ORP3. Guimarães: EURO; 2007.

  11. Proctor S, Lehaney B, Reeves C, Khan Z. Modelling patient flow in a radiotherapy department. OR Insight. 2007;20(3):6–14.

  12. World Health Organization. ICD-10 International statistical classification of diseases and related health problems 2016 [Available from:].

  13. Fodor IK. A survey of dimension reduction techniques. Livermore: U.S. Department of Energy: University of California, Lawrence Livermore National Laboratory; 2002.

  14. Morgan SC, Hoffman K, Loblaw DA, Buyyounouski MK, Patton C, Barocas D, et al. Hypofractionated radiation therapy for localized prostate cancer: an ASTRO, ASCO, and AUA evidence-based guideline. J Urol. 2019;201:528–34.

  15. Sterman JD. System dynamics modeling: tools for learning in a complex world. Calif Manag Rev. 2001;43(4):8–25.


Acknowledgements

The authors would like to thank Frida Johansson for helpful technical assistance.


Funding

This project was funded by The Swedish Research Council (2017-01735), ALF Western Sweden Healthcare region (ALFGBG-720111), and Assar Gabrielsson’s Foundation for Clinical Cancer Research (FB 18-41). Open Access funding provided by University of Gothenburg.

Author information

Authors and Affiliations



Contributions

All authors collaborated in the conception and design of the study. JL and CO created and ran the study-specific algorithms; SH developed and ran the simulation model. JL and CO analyzed data and wrote the manuscript; PH, SH, and TBE assisted in interpreting data and in editing the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Jesper Lindberg.

Ethics declarations

Ethics approval and consent to participate

The study was approved by the Regional ethical review board in Gothenburg, Sweden (Dnr: 841-16+T640-17; the Regional review board was replaced by the Swedish Ethical Review Authority in 2019/2020). No other administrative permissions were required to access and use the medical records described in this study.

Consent for publication

Not applicable

Competing interests

The authors declare that they have no competing interests

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1.

Grouping algorithm and simulation model description.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. A copy of this licence is available from the Creative Commons website. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article


Cite this article

Lindberg, J., Holmström, P., Hallberg, S. et al. An analytical approach to aggregate patient inflows to a simulation model over the radiotherapy process. BMC Health Serv Res 21, 207 (2021).
