The aims of this paper were to define suitable resource categories for incomplete cost data collection, to assess the validity of alternative strategies, and to point out the consequences with respect to efficiency.
To determine which resource categories are suitable for incomplete data collection, inherent recall bias and mean differences between quarters should be assessed. In the case of minor recall bias, Clarke et al. recommend collecting only complete data by extending recall windows, so that no information is lost and data are still collected less frequently. Applied to our study, this means that data collection in the first quarter for the previous 3 months and in the fourth quarter for the previous 9 months would be preferable to incomplete data collection. Recall may be influenced by the frequency and severity of events, [9, 11] so that less frequent visits and more serious events improve the memory of resource use.
It became apparent that data on hospitalisation, rehabilitation and care insurance benefits should not be collected using an incomplete algorithm; instead, the entire target period should be covered by using longer recall periods. These cost categories differed relevantly between quarters, and the associated events were less frequent or more serious. Inpatient care indicates a serious event, and patients who had at least one admission to hospital had on average 1.9 stays per year (Table 2). As rehabilitation usually occurs only once a year, it is a rare event; only five patients reported two stays per year. Care insurance benefits are granted by care insurance funds, following a lengthy needs assessment process. For these categories, recall bias is unlikely. Bhandari et al. also stated in their review that hospitalisation is a salient event that may be gathered accurately by applying longer recall periods without an appreciable increase in recall bias. In addition, patient reports on hospital stays could be cross-validated against electronic hospital records where these exist. Clarke et al. recommended trading off recall bias against the information loss caused by incomplete data collection. A short time period for cost assessment should be preferred only if the variation introduced by incomplete data collection is smaller than the bias introduced by recall error.
Attention should be paid to the fact that care insurance benefit is a proxy for informal care. If costs incurred for informal care are determined by the hours of care provided by relatives, neighbours or friends, incomplete data collection can be used in a similar way to the collection of outpatient nursing services or paid household help.
The resource categories physiotherapy, ambulatory clinic in hospital, medication, consultations (physicians), outpatient nursing service and paid household help are deemed to be appropriate for incomplete data collection for several reasons. First, there is no evidence for differences in mean costs between the middle quarters (quarters two and three) (Table 3). Second, these categories account for about 30% of total costs, so that an inaccuracy of 5% in these categories causes a difference of only 1.5% in total costs. The main cost-driving events are hospitalisation and rehabilitation, accounting for about 70%. Ridyard et al. advise in their systematic review balancing resource use data collection between the main cost-driving events, the frequency of data collection and the burden on the researcher. Third, existing differences in mean costs between quarter one and all the other quarters do not become important in the case of extrapolation. Although cost data from quarter one are employed in the case of interpolation, only the sum of direct healthcare costs differed significantly from complete cost collection (Table 5, Alt 4) when data were collected three times.
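The second argument can be written out as a short worked calculation. The symbols s (share of total costs collected incompletely) and ε (relative inaccuracy within that share) are introduced here for illustration only, and the calculation assumes that the remaining 70% of costs are measured without additional error:

```latex
\[
  \Delta_{\text{total}} \;=\; s \cdot \varepsilon
  \;=\; 0.30 \times 0.05
  \;=\; 0.015 \;=\; 1.5\,\%
\]
```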
The lower costs in the resource categories physiotherapy, ambulatory clinic in hospital, medication, consultations (physicians), outpatient nursing service, and paid household help in quarter one (Table 3) can be explained by increased use of rehabilitation and hospital admissions during this time, which supplants ambulatory resource use.
Significant differences in mean costs between quarter four and quarters two or three occur in the case of medication data (Table 3). This reflects the drug regimen guidelines [35, 36] for acute coronary syndrome, which recommend the use of clopidogrel, an antiplatelet drug, as follow-up treatment for up to 9 months only.
In our analysis, extrapolation turned out to be the better instrument for replacing the omitted periods, because quarter one showed consistently lower costs and data from quarter one were used for interpolation but not for extrapolation. As a result, mean differences between complete and incomplete cost collection were mostly smaller for extrapolation than for interpolation (Tables 4 and 5). Furthermore, medication data proved to be suitable only for omitting quarter two and replacing it with quarter three, as quarter four is not a representative quarter.
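The difference between the two replacement strategies can be illustrated with a minimal sketch. It assumes that extrapolation replaces the omitted quarter with the value of one observed donor quarter, and that interpolation replaces it with the mean of the two neighbouring quarters; the exact algorithms are defined elsewhere in the paper and may differ. The function names and the cost figures are hypothetical.

```python
def annual_cost_extrapolated(costs, omitted, donor):
    """Annual cost when the omitted quarter is replaced by an observed donor quarter.

    Assumption for illustration: extrapolation copies the donor quarter's value.
    """
    observed = {q: c for q, c in costs.items() if q != omitted}
    return sum(observed.values()) + observed[donor]


def annual_cost_interpolated(costs, omitted):
    """Annual cost when the omitted quarter is replaced by the mean of its neighbours.

    Assumption for illustration: interpolation averages the adjacent quarters.
    """
    observed = {q: c for q, c in costs.items() if q != omitted}
    imputed = (observed[omitted - 1] + observed[omitted + 1]) / 2
    return sum(observed.values()) + imputed


# Hypothetical quarterly costs with a lower first quarter.
costs = {1: 80.0, 2: 120.0, 3: 120.0, 4: 120.0}

complete = sum(costs.values())
extrapolated = annual_cost_extrapolated(costs, omitted=2, donor=3)
interpolated = annual_cost_interpolated(costs, omitted=2)
print(complete, extrapolated, interpolated)
```

In this made-up example, interpolation pulls the lower quarter-one value into the estimate for quarter two and therefore underestimates the annual cost, whereas extrapolation from quarter three does not.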
A comparison of our results is restricted by the lack of published empirical analyses of incomplete data collection. In a study of 174 patients with a stable chronic disease (fibromyalgia and low back pain), Goossens et al. found no significant differences between multiple time frames of incomplete data collection. Thus, the authors concluded that, for patients with chronic diseases, incomplete cost data collection poses no problem for economic evaluations. They compared differences in medians using Wilcoxon's signed rank test but not differences in arithmetic means, and they did not distinguish between different cost categories. As the comparison of means is central to any economic evaluation, non-parametric tests that address differences in the median and analyses of log-transformed costs that address differences in the geometric means are not well suited for this purpose. Moreover, the sample size of 174 participants was smaller than in our study, which poses severe limitations with regard to the statistical power of the analysis.
Nevertheless, the authors indicated that, in the case of acute diseases, randomised clinical trials and chronic diseases with seasonal effects, the necessary assumptions of agreement between the different time periods could not be met. Clarke et al. argued that irregular consumption patterns add estimation error. Seasonal effects only become important if the time frame of recruitment is relatively short. If recruitment or the start of the intervention covers a 1-year period, seasonal effects occur for individuals but not for groups. As the comparison of arithmetic means between groups is central to any economic evaluation, group estimates have to be valid, but the results for individuals may differ from each other. Our analysis shows that, in the case of acute myocardial infarction with a 1-year follow-up, some resource categories are more appropriate for incomplete cost data collection than others. Generalisation of our findings is limited to elderly populations with acute diseases followed by a chronic course associated with a continuous treatment scheme, as patients with acute myocardial infarction have similar patterns in the long-term course of disease and treatment. When applying incomplete data collection, several points have to be considered in choosing the method (inter- or extrapolation) and the quarters to be omitted. Only those periods can be omitted for which it can be assumed that they are represented by other periods. Equally, only periods for which one may assume that they represent the omitted periods can be used to replace them. For other studies, the choice of omitted periods may depend on the disease and the intervention, so that these assumptions must be tested in pilot studies or based on expert opinion or literature research.
It is important to reduce the burden on study participants, especially older participants, by decreasing the frequency of data collection. Because this can be achieved either by incomplete data collection or by extending recall windows, one should carefully consider which time frame and method of cost data collection are appropriate for the respective resource category. As an example, Heinrich et al. assessed resource use by employing different time frames for the respective resource category, whereas Hakkaart-van Roijen et al. and Kimman et al. did not distinguish between different resource categories.
A further problem resulting from incomplete cost collection arises from the withdrawal of informed consent or death of the study participants, as it can be assumed that missing data increase because of longer time periods between data collection. For this reason, we recommend not omitting the first quarter.
Although incomplete cost collection is generally expected to increase the variance of the estimate, [13, 14] we found larger standard deviations than in complete data collection only in some cases. Goossens et al. exclusively found smaller standard deviations, which they attributed to random error, and they recommended including 'more' patients. To our knowledge, no calculation concerning increasing standard deviations and sample size has been published so far. If we assume that a suitable resource category is collected incompletely, our estimate indicates that the sample size has to be increased by about 3% at most. Nevertheless, more time would be saved as a result of incomplete data collection than would be required for assessing the additional patients. Furthermore, the burden on study participants and clinical investigators can be diminished by reducing the effort of economic data collection. When conducting economic analyses alongside clinical trials by means of incomplete data collection, the sample size calculation has to be modified accordingly.
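How a larger standard deviation translates into a larger sample size can be sketched as follows. This is not necessarily the calculation used for the 3% estimate above; it relies on the standard result that, for a comparison of means at fixed power and significance level, the required sample size is proportional to the variance. The function name and the input values are hypothetical.

```python
import math


def inflated_sample_size(n_complete, sd_complete, sd_incomplete):
    """Sample size needed under incomplete collection to keep the same power.

    Assumption for illustration: required n scales with the variance, so
    n is multiplied by (sd_incomplete / sd_complete) squared and rounded up.
    """
    variance_ratio = (sd_incomplete / sd_complete) ** 2
    return math.ceil(n_complete * variance_ratio)


# Hypothetical values: a 1.5% larger standard deviation corresponds to
# roughly a 3% larger sample size.
n_needed = inflated_sample_size(n_complete=200, sd_complete=1000.0, sd_incomplete=1015.0)
print(n_needed)
```

Under this assumption, a sample size increase of about 3% corresponds to a standard deviation that is about 1.5% larger than under complete data collection.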