Braun Tibor, Schubert András (eds.): Peer Review in Scientific Research: Selected Papers from the Literature on the Topic (MTAK Informatics and Science Analysis Series 7, 1993)

DOMENIC V. CICCHETTI: The Reliability of Peer Review for Manuscript and Grant Submissions: A Cross-Disciplinary Investigation

Table 4. NSF and COSPUP reviews: Summary of interreferee consensus levels

Area of study            No. of proposals   No. of reviews   Mean no. of reviews per proposal   R_i
A. NSF open reviews
   Chemical dynamics            50               242                    4.84                   .25
   Economics                    42               155                    3.69                   .37
   Solid state physics          50               192                    3.84                   .32
B. COSPUP open reviews
   Chemical dynamics            50               213                    4.26                   .32
   Economics                    49               181                    3.69                   .36
   Solid state physics          49               189                    3.86                   .34
C. COSPUP blind reviews
   Chemical dynamics            50               212                    4.24                   .18
   Economics                    49               198                    4.04                   .37
   Solid state physics          50               203                    4.06                   .33

Note: All R_i values are statistically significant at beyond the .005 level.

These R_i values range from .18 for COSPUP blind reviews of grants submitted in the area of chemical dynamics to .37 for NSF open and COSPUP blind reviews of grants submitted in the field of economics. Similarly, the data on the reliability of peer review of AHA grants (the final calculated priority score) are expressed by Wiener et al. (1977, p. 309, Table 1) in terms of an intraclass correlation coefficient of .37 (p < .001).

4.7. Statistical meaning of low levels of reviewer agreement. The available data are clear. Quite low levels of chance-corrected interreviewer agreement are obtained in every area of scientific inquiry, for abstract, manuscript, and grant reviews. What does this mean from a biostatistical point of view? First, it must be understood that R_i (or kappa) statistics are omnibus indexes, meaning that they reflect only the overall level of chance-corrected agreement. "Overall" means reviewer agreement averaged over all possible rating categories (e.g., "accept," "resubmit," "reject" for manuscripts, or "high," "medium," or "low" approval, or disapproval, for grants). It has been shown (e.g., by Cicchetti 1985, in the context of journal peer review, and by Cicchetti 1988, more generally) that the overall level of agreement is nothing more than a weighted average of agreement on all possible rating categories (see also Fleiss 1981; Spitzer & Fleiss 1974). It has also been demonstrated (again, Cicchetti 1985; 1988) that low levels of R_i or kappa can be produced not only by low levels of overall agreement, but also by large discrepancies in agreement on the various rating categories available to reviewers. We are referring specifically to wide discrepancies in reviewer agreement levels on approval (acceptance) categories as compared to rejection (disapproval) categories.

Some of the available literature on the reliability of peer review (e.g., Cicchetti 1985; Ingelfinger 1974) suggests, indirectly, that reviewer agreement on decisions to reject manuscripts is appreciably higher than agreement on acceptance. Is this true in general? Based on the available data, is there an analogue for the peer review of grant proposals?

To address these questions more specifically, one must develop rational criteria for dichotomizing reviewer agreement levels as "accept" or "reject." The analogue for grant reviews would be to dichotomize reviewer agreement between proposals receiving high ratings and those with low ratings. In the case of the Journal of Abnormal Psychology, 86% of those 203 manuscripts receiving ratings of either "accept" or "accept subject to revision" by both reviewers were accepted for publication. Analogously, of those 803 manuscripts receiving a rating of "resubmit" by both reviewers, or "reject" by one and "resubmit" by the other, or "reject" by both, 95% were rejected by the editor.
This provides a rationale for combining "accept" and "accept subject to revision" into an "accept" category, and "resubmit" and "reject" into a "reject" category. With respect to grant reviews, Cole et al. (1978) note that of those NSF applicants receiving evaluations of "very good" to "excellent" (40-50), 92% were awarded grants. Conversely, of those applicants with grades ranging from "poor" (10-19) through "fair" (20-29) to "good" (30-39), 86% were denied grants. This provided a rationale for dichotomizing on the basis of peer-review scores of 40-50 (high probability of approval) and 10-39 (high probability of disapproval). The number of individual NSF and COSPUP reviews for any given grant varied between 1 and 8. Since the mean number of ratings for NSF open and COSPUP open reviews was quite similar (see Table 4), it seemed reasonable to use these more robust scores in our analyses.

The results based on these dichotomies are presented in Table 5 for manuscript reviews and in Table 6 for grant reviews (again, reported here for the first time). When one considers the manuscripts judged acceptable by one reviewer and then compares them to the corresponding set of manuscripts considered acceptable by a second reviewer, the agreement levels for "accept" vary between 44% and 66%. When the same set of analyses is performed on manuscripts classified in the "reject" category, however, the agreement levels vary between 70% and 78%. Direct comparisons between the proportions of reviewer agreement on accept versus reject recommendations produce chi-squared values with corresponding p levels ranging between .10 and < .00001. As expected,
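The weighted-average decomposition invoked in section 4.7 can be written out explicitly. The following LaTeX sketch uses the standard two-reviewer notation of Fleiss (1981); the symbols are chosen here for illustration and do not appear in the excerpt: p_jj is the proportion of submissions placed in category j by both reviewers, and p_{j.}, p_{.j} are the corresponding marginal proportions.

% Sketch of the omnibus kappa as a weighted average of category-specific kappas
\[
  \kappa = \frac{p_o - p_e}{1 - p_e},
  \qquad
  p_o = \sum_j p_{jj},
  \qquad
  p_e = \sum_j p_{j\cdot}\, p_{\cdot j},
\]
\[
  \kappa_j = \frac{2\,(p_{jj} - p_{j\cdot}\, p_{\cdot j})}
                  {p_{j\cdot} + p_{\cdot j} - 2\, p_{j\cdot}\, p_{\cdot j}},
  \qquad
  w_j = p_{j\cdot} + p_{\cdot j} - 2\, p_{j\cdot}\, p_{\cdot j},
  \qquad
  \kappa = \frac{\sum_j w_j\, \kappa_j}{\sum_j w_j}.
\]

Each kappa_j is the kappa obtained by collapsing the rating scale into "category j versus all others," so a modest omnibus kappa can coexist with high agreement on rejection categories and much lower agreement on acceptance categories.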
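To make the dichotomized comparison concrete, here is a minimal Python sketch with hypothetical counts (none of the figures are taken from the article). It computes the proportions of specific agreement on "accept" and on "reject," Cohen's kappa for the dichotomized table, and a generic two-proportion chi-squared comparison, which is only one plausible way to formalize the "direct comparisons" described above.

"""
Illustrative sketch only: dichotomized two-reviewer agreement,
proportions of specific agreement, kappa, and a chi-squared comparison.
"""
import math

# Hypothetical 2 x 2 table of dichotomized recommendations.
both_accept = 40      # both reviewers recommend acceptance
accept_reject = 30    # reviewer 1 accepts, reviewer 2 rejects
reject_accept = 25    # reviewer 1 rejects, reviewer 2 accepts
both_reject = 105     # both reviewers recommend rejection
n = both_accept + accept_reject + reject_accept + both_reject
disagree = accept_reject + reject_accept

# Proportions of specific agreement: agreement on a category relative to
# all occasions on which either reviewer used that category.
p_accept = 2 * both_accept / (2 * both_accept + disagree)
p_reject = 2 * both_reject / (2 * both_reject + disagree)

# Cohen's kappa for the dichotomized table.
p_o = (both_accept + both_reject) / n
row_accept = (both_accept + accept_reject) / n
col_accept = (both_accept + reject_accept) / n
p_e = row_accept * col_accept + (1 - row_accept) * (1 - col_accept)
kappa = (p_o - p_e) / (1 - p_e)

def chi2_two_proportions(x1, n1, x2, n2):
    """Generic two-proportion chi-squared test, 1 df, no continuity correction."""
    p = (x1 + x2) / (n1 + n2)
    observed = [x1, n1 - x1, x2, n2 - x2]
    expected = [n1 * p, n1 * (1 - p), n2 * p, n2 * (1 - p)]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_value = math.erfc(math.sqrt(chi2 / 2))  # chi-squared survival function, 1 df
    return chi2, p_value

chi2, p_value = chi2_two_proportions(
    2 * both_accept, 2 * both_accept + disagree,
    2 * both_reject, 2 * both_reject + disagree,
)

print(f"specific agreement, accept: {p_accept:.2f}")
print(f"specific agreement, reject: {p_reject:.2f}")
print(f"Cohen's kappa (dichotomized): {kappa:.2f}")
print(f"chi-squared = {chi2:.2f}, p = {p_value:.5f}")

With these hypothetical counts the sketch reproduces the qualitative pattern discussed in the text: agreement on "reject" is appreciably higher than agreement on "accept," while the omnibus kappa remains modest.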
