Accuracy of responses from postal surveys about continuing medical education and information behavior: experiences from a survey among German diabetologists

Background: Postal surveys are a popular instrument for studies about continuing medical education habits, but little is known about the accuracy of responses in such surveys. The objective of this study was to quantify the magnitude of inaccurate responses in a postal survey among physicians.
Methods: A sub-analysis of a questionnaire about continuing medical education habits and information management was performed. The five variables used for the quantitative analysis are based on a question about the knowledge of a fictitious technical term and on inconsistencies in contingency tables of answers to logically connected questions.
Results: Response rate was 52%. Non-response bias is possible but does not seem very likely, since an association between demographic variables and inconsistent responses could not be found. About 10% of responses were inaccurate according to the definition.
Conclusion: A sub-analysis of a questionnaire makes it possible to quantify inaccurate responses in postal surveys. This sub-analysis revealed that a notable portion of responses in a postal survey about continuing medical education habits and information management was inaccurate.


Background
Postal questionnaire surveys of physicians are a popular instrument to gather information [1,2]. They are often used for studies about continuing medical education (CME) habits and information management since they are relatively inexpensive and easy to handle [2]. A major problem of such surveys is the low response rate [1]. Besides the resulting non-response bias [3,4], a number of other factors restrict the conclusions that can be drawn from postal surveys. Alreck and Settle, for example, describe ten such sources of response bias [5], and there appear to be even more. The most important are: the tendency to give socially desired responses (especially in surveys on sensitive subjects like drug abuse or sexual habits) [6], acquiescence or the tendency to give only yes-or-no responses [7,8], and failure in self-perception or (technically) inaccurate statements (e.g. because of willful lies or inaccurate memories) [9][10][11]. Most studies of potential biases are restricted to questionnaires for patients. Accordingly, a MEDLINE search revealed only articles dealing with the accuracy of statements by physicians in general; it produced no satisfactory results for articles about potential inaccuracies in postal surveys about CME habits or information management (search terms, MeSH: "Physicians", "Reproducibility of Results", "Questionnaire", "Bias", "Quality Control"; title words: "Questionnaire", "Postal Survey*", "Validity", "Bias", "Inaccura*", "Accura*") [12][13][14][15][16][17]. This is also reflected in the fact that authors of postal surveys in this field often do not discuss the problem of inaccuracies [e.g. [18,19]] or, if it is discussed, do not quantify it [e.g. [20,21]].
For this reason a sub-analysis of a questionnaire survey about CME and information habits of German diabetologists was performed. Its primary aim was to determine how accurate the information in this study was and whether the responses were credible. Furthermore, I tried to evaluate whether these inaccuracies could be attributed to the socially desired response bias.
The following report focuses on the sub-analysis and not on other results from the survey which are (partly) reported elsewhere [22].

Questionnaire
The data used for this sub-analysis were collected in an explorative survey about information management and CME habits (for details see [22]). For this survey a new questionnaire had to be developed. Initially a preliminary questionnaire was developed on the basis of three already published surveys [19,20,23]. It was discussed with members of the research group and sent to experts for comments (practicing diabetologists and experts in evidence-based medicine, technology assessment, survey methodology, and continuing medical education). After these comments had been incorporated, the questionnaire consisted of 92 items. It can be divided into the following sections [22]: CME in general, therapeutic decision making and problem-solving behavior, use of databases, reading habits, knowledge of technical terms and critical appraisal, and personal data. Three of the 92 items were open-ended questions.

Sample and send out
The sample comprised 461 diabetologists in the northern part of Germany. It was selected from a database of German diabetologists (Diabetologe DDG) (URL: http://www.diabetesweb.de). The sample represented 29% of all 1585 diabetologists in the database. Sample size was calculated with regard to confidence intervals for estimated population frequencies (95% CI): a maximal margin of error for proportions of ± 6.25% for questions answerable dichotomously was considered narrow enough (i.e. the maximum width of the 95% CI for proportions should be 12.5% for questions with only two response categories, e.g. yes/no). Given the population of 1585 diabetologists this required a sample size of 213. Response rates of prior surveys ranged from 50% to 70%, which results in a sample size of at least 416 persons. For technical reasons it was not possible to draw a random sample. Therefore the sample was determined by the first figure of the zip code (code 1-3).
The questionnaire was first distributed in October 2000. One week later a reminder postcard was sent to all participants, and after three weeks a new questionnaire was sent to all non-respondents. A cover letter and a metered self-addressed envelope were enclosed. Coding by numbers for response control was explicitly mentioned, but the analysis was fully anonymous.
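The required sample size of 213 can be reproduced with a short calculation. The text does not state which formula was used; the sketch below assumes the common formula for estimating a proportion (worst case p = 0.5, z = 1.96 for a 95% CI) with a finite population correction. The function name and its parameters are illustrative only.

```python
import math


def sample_size_for_proportion(population, margin, z=1.96, p=0.5):
    """Sample size needed to estimate a proportion within +/- margin.

    Assumption: standard formula for a proportion with finite population
    correction; the article does not name the formula it used.
    """
    # Sample size for an infinite population (worst case at p = 0.5).
    n0 = z ** 2 * p * (1 - p) / margin ** 2
    # Finite population correction for a known population size.
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)


# Figures taken from the text: 1585 diabetologists, margin of error 6.25%.
required = sample_size_for_proportion(population=1585, margin=0.0625)
print(required)  # 213
```

Under these assumptions the result agrees with the figure reported above.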

Sub-analysis Variables used
The analysis is mainly based on contingency tables of answers to logically connected questions. Inconsistent responses were denoted "positive". The following variables were used (see the original questions in additional file 1):
1. Did respondents who stated that systematic reviews/meta-analyses had a strong influence on their therapeutic decision making report that they knew these two terms?
2. Did respondents who stated that published clinical trials and systematic reviews/meta-analyses had a strong influence on their therapeutic decision making report that they read these kinds of articles?
3. A question about the knowledge of technical terms was asked (as suggested by McColl and colleagues [19]). A contingency table was created with answers to the terms absolute risk reduction (ARR) and number needed to treat (NNT). Respondents who stated that they could explain the number needed to treat but could not explain absolute risk reduction were labeled positive (the number needed to treat is the reciprocal of the absolute risk reduction).
4. There was a question on the knowledge of a fictitious technical term (the McNemar-Quality-Scale; explanation of terms was not required). Respondents who stated that they knew this scale were labeled positive.
5. Did respondents who stated that they appraised the scientific value of an article by evaluating its methods section (as suggested by Williamson and colleagues [20]) report that they read this section of an article?
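The logic of variable 3 can be illustrated with a minimal sketch. All data, field names, and category labels below are hypothetical and are not taken from the questionnaire. The sketch also assumes that any answer other than "can explain" for absolute risk reduction counts as "could not explain", which the text does not specify.

```python
from collections import Counter

# Hypothetical category labels (not the questionnaire's wording).
CAN_EXPLAIN = "can explain"
SOME = "some understanding"
NONE = "do not understand"

# Hypothetical respondents; "nnt" and "arr" hold the self-reported knowledge.
responses = [
    {"id": 1, "nnt": CAN_EXPLAIN, "arr": CAN_EXPLAIN},
    {"id": 2, "nnt": CAN_EXPLAIN, "arr": SOME},
    {"id": 3, "nnt": SOME, "arr": NONE},
    {"id": 4, "nnt": CAN_EXPLAIN, "arr": NONE},
]


def is_positive_variable_3(response):
    """Inconsistent: claims to explain NNT but not ARR (assumed reading)."""
    return response["nnt"] == CAN_EXPLAIN and response["arr"] != CAN_EXPLAIN


# Contingency table of (NNT answer, ARR answer) pairs.
table = Counter((r["nnt"], r["arr"]) for r in responses)
positives = [r["id"] for r in responses if is_positive_variable_3(r)]
print(positives)  # [2, 4]

# Why the two terms are logically connected: NNT is the reciprocal of ARR.
arr = 0.05  # hypothetical absolute risk reduction of 5 percentage points
nnt = 1 / arr
```

With an absolute risk reduction of 0.05, the number needed to treat is 20, so a respondent who can explain one term should be able to explain the other.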

Test for socially desired response bias
The assumption was that the tendency to give socially desired responses would be the most dominant response bias in this survey. It was also presumed that this would be most prevalent in the question about technical terms. Two tests were used to support these assumptions:
1. A knowledge-score was calculated for each respondent using responses to the question about technical terms. All items/technical terms were included except the McNemar-Quality-Scale. Every cross at the category "I understand this term and could explain it to others" was valued with one point. Every cross at the category "I have some understanding" was valued with half a point. The sum was rounded, so the maximum score was twelve points. This knowledge-score was cross-tabulated with the positive answers for variable 4 (knowledge of the McNemar-Quality-Scale).
2. A contingency table was created with answers to the fictitious McNemar-Quality-Scale and the least known technical term. This term was the Alpha-error/Type-I-error. Only 50% (117/233) of all respondents knew this term.
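The scoring rule for the knowledge-score can be sketched as follows. The category keys are hypothetical shorthand for the response categories. The text does not say how halves were rounded, so the sketch assumes rounding half up.

```python
import math

# Points per response category (keys are illustrative shorthand).
POINTS = {
    "explain": 1.0,  # "I understand this term and could explain it to others"
    "some": 0.5,     # "I have some understanding"
    "none": 0.0,     # any other category
}


def knowledge_score(answers):
    """Sum the points over all terms and round to a whole number.

    Assumption: halves are rounded up; the article only says the sum
    was rounded.
    """
    total = sum(POINTS[a] for a in answers)
    return math.floor(total + 0.5)


# Hypothetical respondent: twelve real terms (fictitious scale excluded).
answers = ["explain"] * 7 + ["some"] * 3 + ["none"] * 2
print(knowledge_score(answers))  # 7 + 1.5 = 8.5, rounded up to 9
```

A respondent who could explain all twelve terms reaches the maximum score of twelve.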

Statistical Analysis
Descriptive statistics were mainly used. The chi-squared test was used for comparison of categorical data (with Yates' continuity correction for comparisons with 1 degree of freedom). Fisher's exact test was used if expected cell values were less than 5. The median test was used to compare the knowledge-scores because the distributions were neither normal nor comparable [24]. Two-sided p-values < 0.05 were considered significant. Analyses were performed with EpiInfo 2000, version 1.0.4, and KyPlot, version 2.0.
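The choice between the two tests for a 2x2 table can be sketched in plain Python. This is not the procedure implemented in EpiInfo or KyPlot; it is an illustration of the rule stated above, with hypothetical function names and counts. The two-sided Fisher p-value is computed here as the sum of the probabilities of all tables that are no more likely than the observed one, which is one common definition.

```python
from math import comb, erfc, sqrt


def expected_counts(a, b, c, d):
    """Expected cell counts of the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    return [r * k / n for r in rows for k in cols]


def yates_chi2(a, b, c, d):
    """Yates-corrected chi-squared statistic and p-value (1 df).

    Requires all row and column totals to be non-zero.
    """
    n = a + b + c + d
    r1, r2, c1, c2 = a + b, c + d, a + c, b + d
    chi2 = n * max(abs(a * d - b * c) - n / 2, 0) ** 2 / (r1 * r2 * c1 * c2)
    # Survival function of the chi-squared distribution with 1 df.
    return chi2, erfc(sqrt(chi2 / 2))


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test via the hypergeometric distribution."""
    r1, r2, c1 = a + b, c + d, a + c
    total = comb(r1 + r2, c1)

    def prob(x):
        return comb(r1, x) * comb(r2, c1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    # Sum over tables at most as likely as the observed one.
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))


def compare_2x2(a, b, c, d, alpha=0.05):
    """Use Fisher's test if any expected count is below 5, else chi-squared."""
    if min(expected_counts(a, b, c, d)) < 5:
        name, p = "fisher", fisher_two_sided(a, b, c, d)
    else:
        name, p = "chi2_yates", yates_chi2(a, b, c, d)[1]
    return name, p, p < alpha


# Hypothetical counts, not data from the survey.
print(compare_2x2(20, 30, 30, 20))
print(compare_2x2(3, 0, 0, 3))
```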

Knowledge of influential factors
Of the respondents who stated that meta-analyses and systematic reviews have a strong or very strong influence on their therapeutic decision making, 15% (35/232) and 23% (53/230) respectively had no or only a rough understanding of the meaning of these types of articles.

Reading and influence of different article types
The rates of respondents who stated that the different article types have a strong or very strong influence on their therapeutic decision making but who did not read these articles were very low. Rates were 3% (7/235) for clinical trials and less than 1% (1/235) for systematic reviews/meta-analyses; for narrative reviews there was no discrepancy.

3./4. Knowledge of the NNT and a fictitious term
16% (38/234) could explain the number needed to treat but could not explain absolute risk reduction (Table 3). Overall 7% (17/234) of the respondents allegedly had at least some understanding of the McNemar-Quality-Scale, one of whom stated that he/she could explain this scale to others (24 respondents (10%) reported that they knew the scale but did not understand it).
Table 4 shows whether respondents who reported that they evaluated article quality by examining the methods section actually read this part of an article. 13% (22/172) of responses were contradictory if they were interpreted strictly. When responses were categorized in two groups (always/often and seldom/never), 8% (14/172) remained contradictory.

Test for socially desired response bias
The median knowledge-score in the group positive for variable 4 (knowledge of the McNemar-Quality-Scale) was 10 (IQR: 8-11.5; range: 6-12; mode 10 and 12). In comparison the median knowledge-score of the other respondents was 6 (IQR:

Methodological issues
Because selection of the sample was not randomized, systematic biases are possible. Only limited demographic data are available for German diabetologists as a whole, so the representativeness of the sample can be assessed only to a limited extent. The different proportions of general practitioners and pediatricians can be considered a bias, but whether this is relevant for this analysis remains questionable (an association between positive responses and specialty could not be found; checked for variables 3, 4, and 5; data not shown). The response rate is below the average of other surveys [25,26], but no major differences in the four available demographic characteristics could be detected between the respondents and the sample (Table 1). The relatively higher rate of undeliverable questionnaires among hospital-based physicians is certainly negligible since the number of persons is too small. Non-response bias may be another problem, but its relevance also seems questionable because an association between proportions of positive answers and sex, work place, or location of work place could not be found (checked for variables 3, 4, and 5; data not shown) (see also [22]). Nevertheless, caution should be applied when generalizing the results of this survey, and rates or numbers should be interpreted as a trend rather than at face value.
Another limitation lies in the methodology of this analysis. Since the actual procedures of physicians were not observed (e.g. how they read journal articles), inaccuracies can only be determined indirectly. Although an observational study would be preferable, it is not feasible for practical reasons. Furthermore, this analysis allows no extensive conclusions about the nature of the inaccuracies [27]. Although an attempt was made to evaluate the tendency to give socially desired responses, it is not possible to conclude definitively which biases contribute to the inaccurate responses. Qualitative methods would be needed for this kind of study.

Interpretation of findings
1. Knowledge of influential factors
The rate of physicians who ascribed a high impact on their therapeutic decision making to factors not well known was very high with values of 15% and 23% respectively. These rates decreased to 2% and 4% respectively if one concedes that factors which were only roughly known can also have a strong influence.

Reading and influence of different article types
The rate of respondents who stated that published clinical trials had a strong influence on their therapeutic decision making but who read such articles only infrequently was very low. It should be taken into account, however, that most of the surveyed physicians read this kind of article always or often if it appears in journals to which they subscribe (207/237; 87%).

3./4. Knowledge of the NNT and a fictitious term
The rate of respondents who allegedly could explain the number needed to treat but could not explain absolute risk reduction was very high. As McColl and colleagues did not perform an analysis like this, a comparison between the two studies is restricted; such an analysis requires raw data. But the data in their publication indicate that there were also inconsistencies. In their survey 35% of respondents could explain the term number needed to treat but only 31% could explain the term absolute risk [19]. The alleged knowledge of the McNemar-Quality-Scale was lower than the knowledge of the NNT, but the value was also around 10%. One might argue that positive respondents confused the fictitious term with McNemar's statistical test or that they thought the researchers had been confused. But this does not seem very likely, since nobody referred to this potential problem during the development of the questionnaire. Moreover, somebody who knows a statistical test would know the term Alpha-Error/Type-I-Error, which was not the case for the majority of the positive respondents.
The proportion of inaccurate responses to this knowledge question should be viewed as a very conservative estimate. A recently published study found that virtually nobody who stated that he/she understood the technical terms of the questionnaire developed by McColl et al. actually did so [28].

Examining and reading the methods section of articles
As for the other variables, the proportion of positive answers was about 10%.

Test for socially desired response bias
Given the other results of this analysis and the kind of response categories in this survey, it seemed reasonable to assume that the tendency to give socially desired responses would be the most prominent response bias. To attribute the alleged knowledge of the McNemar-Quality-Scale to this tendency, it must be interpreted in association with the knowledge-score and the responses to the least known term. If the knowledge-scores were low among those respondents who allegedly knew the McNemar-Quality-Scale, this response behavior could not be interpreted as socially desired, and other explanations would have to be considered instead. But the analysis showed that their knowledge-scores were well above those of the other respondents. This could lead to the conclusion that these 17 respondents (7%) had a tendency to give socially desired responses. On the other hand, their responses to the Alpha-Error/Type-I-Error indicate that these physicians were quite willing to admit knowledge gaps, because they reported a lack of knowledge or understanding more frequently than the others. Therefore it seems unlikely that the inaccurate answers can be attributed to the socially desired response bias.
Acquiescence [29], another potential and important response bias, is also unlikely given the other findings of this analysis and the response categories in the questionnaire. The tendency to give only yes-or-no responses can be ruled out, as only 7 questions with a yes/no response category were asked. Thus it is believed that the inaccuracies in this survey are rather a problem of careless reading/answering (which in turn might have resulted from the long questionnaire or busy respondents, although an association between the weekly hours of work and positive responses could not be found; checked for variable 4; data not shown) or of a failure in self-perception/overestimation of competency. Furthermore, misunderstanding of questions or of specific terms might also have contributed to the inaccuracies, as was shown in a recent study [28].

Conclusions
According to this analysis the proportion of inaccurate or illogical responses in a survey about CME habits and information management of physicians was around ten percent. Although some researchers try to correct such inaccuracies [11], it remains to be determined how accurate such methods are.
It seems unlikely that respondents had a significant tendency for socially desired responses. The analysis indicates that it rather seems to be a problem of careless reading/answering of questions, a failure in self-perception or a misunderstanding about specific terms or questions. However in order to understand response biases and the processes involved qualitative studies are needed.
The method described is considered appropriate and feasible for evaluating the accuracy of responses in surveys but further research is necessary to validate it. It should be applied to future questionnaire surveys about CME habits and information management of physicians to enable appropriate assessments of such studies.

Competing interests
None declared.