Braun Tibor, Schubert András (eds.): Peer Review in Scientific Research: Selected Papers from the Literature of the Field (MTAK Informatics and Science Analysis Series 7, 1993)

DOMENIC V. CICCHETTI: The Reliability of Peer Review for Manuscript and Grant Submissions: A Cross-Disciplinary Investigation

shown in Table 1, assuming equal marginal rates for the two reviewers. As shown in the table of expected frequencies, by chance there would be 35% agreement for acceptances and 65% agreement for rejections. If the observed agreement rates are corrected for chance using kappa, the result is identical values of .14, which coincide with the values of the omnibus kappa and the intraclass R reported by Cicchetti.

A moment's reflection (and a little algebra) reveals that for the dichotomous case it could not be otherwise. Any disagreement is simultaneously a disagreement about acceptance and about rejection. Reviewers cannot in principle disagree at different rates for the two categories, once the chance level of agreement is taken into account. Because there are additional degrees of freedom when three or more categories are analyzed, it is possible that differential agreement could have been identified had the data not been dichotomized. Data on submissions to the American Psychologist, presented by Whitehurst (1984) and analyzed by Cicchetti (1985, Table 4, p. 567), provide an example. Table 2 shows the five rating categories, their prevalence, percentages of observed agreement, chance agreement, and chance-corrected agreement. Of the observed percentages, the highest is clearly for outright rejection. This category was used 50% of the time by the reviewers, however, and therefore its expected agreement is also high. The chance-corrected agreement rates are quite comparable for unconditional acceptance and unconditional rejection.

Although Cicchetti is careful to interpret his results cautiously and note limitations on generalization, he speculates that journals with higher acceptance rates might demonstrate the "reverse phenomenon," higher agreement rates for acceptance than rejection.
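The identity of the two chance-corrected rates can be checked numerically. The following is a minimal sketch (not part of the original commentary; the function name is mine) that applies the category-specific kappa correction to the cell counts of Demorest's Table 1:

```python
# Chance-corrected agreement (kappa) per category for a 2 x 2 reviewer table,
# using the Journal of Abnormal Psychology counts from Demorest's Table 1.
# n[i][j] = number of manuscripts reviewer 1 placed in category i and
# reviewer 2 placed in category j (0 = reject, 1 = accept).

def category_kappa(n, cat):
    """Kappa for one category: (p_obs - p_chance) / (1 - p_chance)."""
    other = 1 - cat
    total = n[0][0] + n[0][1] + n[1][0] + n[1][1]
    row_total = n[cat][cat] + n[cat][other]   # reviewer 1's uses of cat
    col_total = n[cat][cat] + n[other][cat]   # reviewer 2's uses of cat
    # Observed: agreements divided by the average number of uses of cat.
    p_obs = 2 * n[cat][cat] / (row_total + col_total)
    # Chance: expected agreements divided by the same average.
    p_chance = 2 * row_total * col_total / (total * (row_total + col_total))
    return (p_obs - p_chance) / (1 - p_chance)

n = [[599, 258],   # reviewer 1 reject: 599 joint rejects, 258 splits
     [258, 204]]   # reviewer 1 accept: 258 splits, 204 joint accepts

k_reject = category_kappa(n, 0)
k_accept = category_kappa(n, 1)
print(round(k_reject, 2), round(k_accept, 2))  # → 0.14 0.14
```

As the text argues, the two values are not merely close but exactly equal in the dichotomous case.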
Indeed, just such a prediction would be made from a consideration of chance levels of agreement if the base rate for acceptance exceeds 50%. If correction for chance is deemed appropriate when evaluating overall agreement among reviewers, it must surely be relevant when considering category-specific rates of agreement. Although reviewers may use different criteria and judgment processes for negative and positive evaluations, substantive interpretation of differences in agreement rates for acceptance and rejection should be suspended pending evidence that the differences exceed what would be expected by chance.

Table 1 (Demorest). Reviewer agreement for the "Journal of Abnormal Psychology" (from Cicchetti et al. Table 5: Review Number 2)

Observed Frequencies
                            Category
                     Reject   Accept   Total   % Agreement
Review No. 1 Accept     258      204     462        44
             Reject     599      258     857        70
             Total      857      462   1,319

Expected Frequencies
                            Category
                     Reject   Accept   Total   % Agreement
Review No. 1 Accept     300      162     462        35
             Reject     557      300     857        65
             Total      857      462   1,319

Note: Intraclass R = .14; kappa for agreement on acceptance = (.44 - .35)/(1 - .35) = .14; kappa for agreement on rejection = (.70 - .65)/(1 - .65) = .14.

Table 2 (Demorest). Category-specific agreement (%) for American Psychologist submissions (from Cicchetti 1985)

                        Type of Agreement
                                              Corrected
Category   Prevalence   Observed   Chance    for Chance
    1         10.3        55.6      10.2        50.5
    2          8.6        26.7       6.7        21.4
    3         18.4        68.8      18.3        61.7
    4         12.6        54.5      12.5        48.0
    5         50.0        75.9      50.0        51.7

Note: Categories are defined as 1 = accept as is, 2 = accept with minor revisions, 3 = reject and recommend resubmission after revision, 4 = reject and recommend resubmission to another journal, 5 = reject.

APPENDIX

The kappa coefficient for category-specific agreement may be formed in the same manner as the conventional kappa statistic: (p_o - p_c)/(1 - p_c).
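Applied row by row to the Table 2 percentages, this correction reproduces the chance-corrected column. A small sketch (mine, not from the original; small discrepancies against the published column reflect rounding in the table):

```python
# Chance-corrected agreement for each American Psychologist category,
# using the observed % and chance % from Demorest's Table 2.

def corrected(observed_pct, chance_pct):
    """Kappa-style correction (p_o - p_c)/(1 - p_c), expressed in percent."""
    p_o, p_c = observed_pct / 100, chance_pct / 100
    return 100 * (p_o - p_c) / (1 - p_c)

# category: (observed %, chance %)
table2 = {1: (55.6, 10.2), 2: (26.7, 6.7), 3: (68.8, 18.3),
          4: (54.5, 12.5), 5: (75.9, 50.0)}

for cat, (obs, chance) in table2.items():
    print(cat, round(corrected(obs, chance), 1))
# Category 5 (outright rejection) drops from 75.9% observed agreement to
# roughly 52% once its 50% chance level is removed.
```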
Letting n11, n12, n21, and n22 represent the frequencies in a 2 x 2 agreement matrix, the observed agreement rate for category 1 is the number of agreements in category 1, divided by the average number of category 1 ratings made by the two observers (Cicchetti 1985):

p_o(1) = n11 / {[(n11 + n12) + (n11 + n21)]/2} = 2n11 / (2n11 + n12 + n21).

The number of agreements expected by chance for category 1 is (n11 + n12)(n11 + n21)/(n11 + n12 + n21 + n22). Thus, the chance agreement rate for category 1 is:

p_c(1) = 2(n11 + n12)(n11 + n21) / [(n11 + n12 + n21 + n22)(2n11 + n12 + n21)].

Substituting these values in the formula for kappa and simplifying yields:

K1 = 2(n11 n22 - n21 n12) / [2 n11 n22 + n21^2 + n12^2 + (n11 + n22)(n21 + n12)].

The same result emerges if the subscripts are interchanged and K is calculated for category 2 rather than category 1. Thus for the dichotomous case, the category-specific agreement rates, K1 and K2, are identical.

When nonreliability of reviews indicates solid science

Douglas Lee Eckberg
Department of Sociology, Winthrop College, Rock Hill, SC 29733

Cicchetti begins by arguing that low interrater reliability in reviews is a scientific "problem" that challenges science's claim of special knowledge. Should we accept this justification for their research? I believe not. I reject the idea that interrater agreement has a relationship to the validity of scientific knowledge at any but the extremes of reliability. There are several interlocking reasons for this.

Decisions to accept or reject submissions seldom rest on whether or not findings are "true." One seldom sees patently silly submissions; on this "truth" basis almost all submissions could be accepted. But as Cicchetti points out, decisions are
