Explicit criteria development
The criteria for measuring the appropriateness of cataract surgery were developed according to a previously described explicit method [6], i.e., the RAND appropriateness method, which consists of the following steps.
First, an extensive literature review was performed to summarize existing knowledge on the efficacy, effectiveness, risks, costs, and opinions about the use of phacoemulsification.
Second, from this review, a comprehensive and detailed list of mutually exclusive and clinically specific scenarios (indications) was developed in which cataract surgery by phacoemulsification might be performed. This list contained 765 indications in three categories: simple cataract (cataract with no other ocular pathologies that may affect the visual prognosis), cataract with diabetic retinopathy, and cataract with other ocular pathologies that may affect the visual prognosis. Each indication was specified in sufficient detail that patients within a given indication were reasonably homogeneous. The indications included the following variables. For patients with simple cataract, these were: best-corrected visual acuity in the cataractous eye (three subgroups: ≥0.5, 0.2–0.4, ≤0.1); best-corrected visual acuity in the contralateral eye (three subgroups: ≥0.5, 0.2–0.4, ≤0.1); visual function (four categories: no impairment, glare, difficulty with recreational activities, or difficulty with activities of daily living); surgical complexity of the cataract procedure (three categories: (a) no surgical complications or minor complexity anticipated, such as the presence of a narrow anterior chamber (corneal amplitude-iris ≤2), deep-set eyes, extreme myopia without retinal involvement, posterior synechiae, or a small pupil; (b) medium complexity anticipated: pseudoexfoliation with mydriasis >3 mm and without subluxation of the crystalline lens, dense cataract, poor pupil dilatation (mydriasis >3 mm, according to the dilatation guidelines), vitrectomized eye, poor patient cooperation during examination, or the presence of two or more minor factors; (c) high complexity anticipated: subluxation of the crystalline lens, fibrosis of the anterior capsule of the crystalline lens, brunescent cataract, posterior polar cataract, or the presence of two or more factors of medium complexity); and laterality of cataract (unilateral or bilateral).
For patients with diabetic retinopathy that may affect the visual prognosis and for patients with other ocular pathologies, the same variables were studied plus the anticipated visual acuity after intervention (three subgroups: ≥0.5, 0.2–0.4, ≤0.1).
The 765 indications resulted from all possible combinations of the variables described and the respective categories. Additional file 1 (Appendix 1) contains a description of the variables and their categories. Cases in which phacoemulsification was performed in combination with other ophthalmic surgical techniques were excluded.
Third, we compiled a national panel of ophthalmologists recognized in the field, including both those who performed cataract extraction and those who did not, whose names were provided by their respective medical societies and by members of our research team. The panelists were provided with the literature review and the list of indications, and they rated each indication for the appropriateness of performing phacoemulsification, considering the average patient and average physician in the year 2004. Appropriateness was defined as meaning that the "expected health benefit exceeds the expected negative consequences by a sufficiently wide margin to make cataract surgery worth performing."
Ratings were scored on a 9-point scale. Cataract surgery for a specific indication was considered appropriate if the panel's median score was between 7 and 9 without disagreement, inappropriate if the median was between 1 and 3 without disagreement, and uncertain if the median was between 4 and 6 or if the members of the panel disagreed. Disagreement was defined as occurring when at least four panelists rated an indication from 1 to 3 and at least another four rated it from 7 to 9. Agreement was defined as occurring when fewer than four panelists rated the indication outside the 3-point region (1–3, 4–6, or 7–9) containing the median. The result was considered indeterminate if neither agreement nor disagreement was found. This method did not attempt to force panelists to reach agreement on appropriateness.
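The classification rule described above can be sketched in a few lines of Python. This is only an illustration of the stated definitions, not the software used in the study. The function names are hypothetical, and the treatment of half-point medians (3.5 or 6.5, which can occur with an even number of panelists) as uncertain is an assumption, because the text does not say how such values were handled.

```python
from statistics import median


def has_disagreement(ratings, threshold=4):
    """True if at least `threshold` ratings fall in 1-3 and at least `threshold` in 7-9."""
    low = sum(1 for r in ratings if 1 <= r <= 3)
    high = sum(1 for r in ratings if 7 <= r <= 9)
    return low >= threshold and high >= threshold


def classify_indication(ratings):
    """Classify one indication from the panelists' 1-9 ratings."""
    # Disagreement makes the indication uncertain regardless of the median.
    if has_disagreement(ratings):
        return "uncertain"
    m = median(ratings)
    if 7 <= m <= 9:
        return "appropriate"
    if 1 <= m <= 3:
        return "inappropriate"
    # Assumption: medians of 4-6, and half-point medians such as 3.5 or 6.5,
    # are all treated as uncertain.
    return "uncertain"
```

For example, twelve ratings of which four are 1 and eight are 8 have a median of 8 but still meet the disagreement definition, so the indication is classified as uncertain.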
The ratings were confidential and took place in two rounds, using a modified Delphi process. The first round was performed by mail before the members of the panel met. The results were collated and presented to the 12 panelists at the 1-day second-round meeting. Each panelist also received the anonymous ratings of the other panelists and a reminder of his or her own ratings. After extensive discussion, the panelists revised the indications according to the above-mentioned definition of appropriateness. Each panelist rated 765 separate indications.
To determine how many of the theoretical indications actually occur in clinical practice, data on the algorithm variables were gathered for 1,053 patients on a waiting list to undergo cataract extraction by phacoemulsification in the ophthalmology services of six area hospitals. These data were collected prospectively by the ophthalmologists of each center. The number of theoretical indications used in clinical practice was calculated for each of the three diagnostic groups.
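Counting the indications used in practice amounts to counting the distinct combinations of variable values observed in each diagnostic group. A minimal sketch follows, assuming each patient record has been reduced to a (diagnostic group, indication key) pair; the function name and the data layout are hypothetical and are not taken from the study.

```python
from collections import defaultdict


def indications_used(patients):
    """Count distinct indications observed per diagnostic group.

    `patients` is an iterable of (group, indication_key) pairs, where
    indication_key is any hashable value identifying one indication,
    for example a tuple of the category chosen for each variable.
    """
    used = defaultdict(set)
    for group, key in patients:
        used[group].add(key)
    return {group: len(keys) for group, keys in used.items()}
```

Dividing each count by the number of theoretical indications in the corresponding group would give the proportion of indications that occur in practice.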
Statistical analysis
The mean appropriateness rating across all indications and the mean change from round 1 to round 2 were calculated for each panelist. The mean difference between each panelist's score and the panel median for each indication also was measured for both rounds. A "conformity score" [7], describing each panelist's tendency to change his or her ratings in the direction of the round 1 panel median rating, also was calculated. This score was defined as the decrease in mean absolute deviation from the round 1 median between rounds 1 and 2. The higher the conformity score, the more the individual's round 2 ratings shifted toward the round 1 median.
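The conformity score can be illustrated with a short Python sketch that follows the definition given above. The function name and the input layout (one list of ratings per indication, with panelists in the same order in both rounds) are assumptions made for the example.

```python
from statistics import mean, median


def conformity_scores(round1, round2):
    """Conformity score per panelist.

    round1 and round2 are lists of indications; each indication is a list
    of ratings with one entry per panelist, in the same panelist order.
    The score is the mean absolute deviation from the round 1 panel median
    in round 1 minus the same quantity computed on round 2 ratings.
    """
    medians_r1 = [median(ratings) for ratings in round1]
    n_panelists = len(round1[0])
    scores = []
    for p in range(n_panelists):
        mad_r1 = mean(abs(ind[p] - m) for ind, m in zip(round1, medians_r1))
        mad_r2 = mean(abs(ind[p] - m) for ind, m in zip(round2, medians_r1))
        scores.append(mad_r1 - mad_r2)
    return scores
```

A positive score means that the panelist's round 2 ratings lie closer to the round 1 median than his or her round 1 ratings did; a score of zero means no net movement toward it.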
We assessed the reliability of the 12 panelists' second-round scores by calculating the intraclass correlation coefficient.
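The text does not state which form of the intraclass correlation coefficient was used. As an illustration only, the sketch below assumes the two-way random-effects, single-rater form, often written ICC(2,1), computed from the mean squares of a two-way layout with indications as rows and panelists as columns. The function name is hypothetical.

```python
def icc_two_way_random(scores):
    """ICC(2,1) for a complete n x k table (rows: indications, columns: raters).

    Assumption: two-way random-effects, absolute-agreement, single-rater form.
    The table must have no missing values and must show some variation.
    """
    n = len(scores)
    k = len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Sums of squares for rows (indications), columns (raters), and residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
```

With a different assumed form (for example, a consistency rather than an absolute-agreement definition) the denominator changes, so the value returned here should not be read as the coefficient reported by the study.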
Validity of the explicit criteria: Determinants of the appropriateness scores and their contribution to the variability explained by the model were assessed with a least-squares regression model [10], with the median of the panelists' ratings for each indication as the dependent variable and the variables in the algorithm as covariates. Ordinal logistic regression also was used, with the classification of the panelists' scores as appropriate, uncertain, or inappropriate as the dependent variable [11]. The two models were compared with respect to the degree of variability explained by each variable, using the R-square and -2 log L statistics, respectively.
Algorithms in decision tree form, which should permit rapid estimation of appropriateness in practice, were compiled from the final results by classification and regression trees (CART) analysis [12]. CART was used to build a classification tree with the panel's appropriateness classification (appropriate, uncertain, or inappropriate) as the categorical dependent variable. The misclassification error of the CART, with the original panel classification as the gold standard, was calculated as the number of indications erroneously classified by the classification tree divided by the total number of indications.
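The misclassification error defined above is a simple proportion. A minimal sketch, assuming the panel classifications and the tree's predicted classifications are available as two equally long lists in the same indication order (the function name is hypothetical):

```python
def misclassification_error(panel_labels, tree_labels):
    """Proportion of indications whose tree label differs from the panel label."""
    if len(panel_labels) != len(tree_labels):
        raise ValueError("label lists must have the same length")
    if not panel_labels:
        raise ValueError("at least one indication is required")
    wrong = sum(1 for p, t in zip(panel_labels, tree_labels) if p != t)
    return wrong / len(panel_labels)
```

For instance, if the tree assigns a different category than the panel to one of four indications, the function returns 0.25.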
All statistical analyses were performed using SAS for Windows, version 8, except for the CART analysis, for which we used S-Plus 2000 (MathSoft Inc., 1999) statistical software.