Test property | Measure used | Performance of Berlin test [10] | Performance of Fresno test [9] |
---|---|---|---|
Content validity (test covers entire topic of interest) | Expert opinion | Expert opinion (five teachers in EBP) | Revisions based on experts' suggestions |
Inter-rater reliability (degree to which 2 scorers rate a single performance similarly) | Inter-rater correlation | Total score 0.96 (95% CI: 0.92–0.98) | Ranged from 0.76 to 0.98 for individual items; total score 0.98 |
Internal reliability (degree to which all questions on the test measure a single construct) | Cronbach's α (average of all possible split-half correlations) | 0.75 | 0.88 |
Item difficulty (relative difficulty of each item) | % of candidates who achieve a passing score | Not given | Ranged from moderate (73%) to difficult (24%); no easy items |
Item discrimination (ability of each item to discriminate between those with overall high scores and those with overall low scores) | Item discrimination index (ranges from -1.0 to 1.0) | Not given | Ranged from 0.41 to 0.86; no items had negative or weak discrimination |
Construct validity (evidence that the test measures the construct it intends to) | Mean scores of experts and novices compared by t test | Significant difference, higher expert scores than novices | Significant difference, higher expert scores than novices |
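
The item-level measures in the table can be illustrated with a minimal Python sketch. It is not the procedure used by the authors of either test: the function names, the toy score matrix, the pass mark, and the choice to split candidates into upper and lower halves for the discrimination index are all assumptions made for illustration. Cronbach's α is computed with the standard variance-based formula.

```python
from statistics import pvariance

def cronbach_alpha(scores):
    """Cronbach's alpha for a matrix of item scores (one row per candidate)."""
    k = len(scores[0])  # number of items
    item_vars = [pvariance([row[i] for row in scores]) for i in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

def item_difficulty(scores, item, passing):
    """Proportion of candidates whose score on `item` reaches `passing`."""
    return sum(1 for row in scores if row[item] >= passing) / len(scores)

def item_discrimination(scores, item, passing):
    """Pass rate in the upper half minus pass rate in the lower half.

    Assumption: candidates are ranked by total score and split into halves;
    other group sizes (for example the top and bottom 27%) are also common.
    """
    ranked = sorted(scores, key=sum)
    half = len(ranked) // 2
    lower, upper = ranked[:half], ranked[-half:]
    return item_difficulty(upper, item, passing) - item_difficulty(lower, item, passing)

# Hypothetical data: 4 candidates x 3 items, each scored 0 or 1.
toy_scores = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]

alpha = cronbach_alpha(toy_scores)
difficulty_item0 = item_difficulty(toy_scores, item=0, passing=1)
discrimination_item1 = item_discrimination(toy_scores, item=1, passing=1)
```

With real data, each row would hold one candidate's rubric scores and `passing` would be the pass mark defined for that item.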