Braun Tibor, Schubert András (eds.): Expert review (peer review) in scientific research: Selected papers from the literature on the topic (MTAK Informatics and Science Analysis Series 7, 1993)
DOMENIC V. CICCHETTI: The Reliability of Peer Review for Manuscript and Grant Submissions: A Cross-Disciplinary Investigation
CICCHETTI: THE RELIABILITY OF PEER REVIEW

Table 2. Levels of reviewer agreement in the evaluation of the scientific merit of submitted manuscripts

A. Behavioral Science

Journal                                            No. of Reviews   R(I) or Kappa   Sources
"Social Problems" (1958-61)                        193              .40(b)          Smigel & Ross (1970)
"Journal of Personality and Social Psychology"     286              .26(a)          Scott (1974)
"Sociometry"                                       140              .21(a)          Hendrick (1976)
"Personality and Social Psychology Bulletin"       177              .21(a)          Hendrick (1977)
"Journal of Abnormal Psychology" (1973-8)          1319             .19(a)          Cicchetti & Eron (1979; and unpublished)
"American Psychologist" (1977-8)                   87               .54(a)          Cicchetti (1980); Scarr & Weber (1978)
"American Psychologist" (1978-9)                   72               .38(a)          Cicchetti (unpublished)
"Journal of Educational Psychology" (1978-80)      325              .34(a)          Marsh & Ball (1981)
"Developmental Review"                             72               .44(a)          Whitehurst (1983; 1984)
"American Sociological Review"                     22               .29(a)          Hargens & Herting (1990)
"Law & Society Review"                             251              .23(a)          Hargens & Herting (1990)

B. Medicine

Journal                                            No. of Reviews   R(I) or Kappa   Sources
2 Untitled Biomedical Journals                     1572             .34(b)          Orr & Kassab (1965)
"New England Journal of Medicine"                  496              .26(b)          Ingelfinger (1974)
A Major Medical Subspecialty Journal               866              .37(a)          Cicchetti & Conn (1978)
"British Medical Journal"                          707              .31(b)          Lock (1985)
"Physiological Zoology"                            209              .31(a)          Hargens & Herting (1990)

Note: (a) Intraclass R values; (b) kappa values. The criteria of Cicchetti & Sparrow (1981) and Fleiss (1981), in which kappa or R(I) values < .40 = POOR; .40-.59 = FAIR; .60-.74 = GOOD; and .75-1.00 = EXCELLENT. Note that levels of observed agreement (where available) ranged between 68.30% and 77.00%, and the levels of chance-corrected agreement were all significant at or beyond the .05 level. Note also that the R(I) value of .54 for reviews of the manuscripts submitted to the "American Psychologist" dropped to .38 on replication.
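The qualitative labels in the table note can be made explicit. The following helper is my own illustration of the Cicchetti & Sparrow (1981) / Fleiss (1981) criteria quoted above, not code from the paper:

```python
def agreement_level(value):
    """Classify a kappa or intraclass R value per the cited criteria:
    < .40 = POOR; .40-.59 = FAIR; .60-.74 = GOOD; .75-1.00 = EXCELLENT."""
    if value < 0.40:
        return "POOR"
    if value < 0.60:
        return "FAIR"
    if value < 0.75:
        return "GOOD"
    return "EXCELLENT"

# Most journal values in Table 2 fall in the POOR range:
print(agreement_level(0.19))  # POOR
print(agreement_level(0.54))  # FAIR
```

By these criteria, only a handful of the journals listed reach even the FAIR band.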
sion seems to be based on a statement made some years ago about one of the most prestigious journals in the physical sciences:

    We have found, for example, that in a sample of 172 papers evaluated by two referees for the Physical Review (in the period 1948-56), agreement was very high. In only five cases did the referees fully disagree, with one recommending acceptance and the other, rejection. For the rest, the recommended decision was the same, with two-thirds of these involving minor differences in the character of proposed revisions (Zuckerman & Merton 1971, p. 67).

Unfortunately, this brief analysis provides no answers to some very basic questions: (1) What type of rating system was used by the referees? (2) Given the high acceptance rates of Physical Review, how much agreement between reviewers would one expect on the basis of chance alone? (3) What is meant by "minor differences in the character of proposed revisions"? and (4) How representative a subset is this sample of all the manuscripts submitted at that time?

The question of representativeness seems the most important. Commenting recently on this issue, Hargens (1988) and Hargens & Herting (1990b, p. 17) note the following:

    One reason that studies of referee reliability are relatively rare for physical-science journals is that such journals often use the single initial referee system. Thus, data on pairs of referee assessments of all submissions are unavailable for these journals. Those manuscripts that do receive at least two independent referee evaluations under this system are an unrepresentative subset of all manuscripts. Thus, nonexperimental data on referee agreement for these journals, such as the evidence reported by Zuckerman and Merton, should be viewed with caution.

Hargens is right in his conclusions, especially with respect to the structure of the journal Physical Review during the early study period (1948-56) from which the Zuckerman & Merton (1971) data were derived.
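Question (2) above can be illustrated numerically. A minimal sketch, using hypothetical figures (not data from the paper): when the acceptance rate is high, two independent referees will often agree by chance alone, so raw agreement overstates reliability, which is what Cohen's kappa corrects for.

```python
def cohens_kappa(p_observed, p_chance):
    """Chance-corrected agreement: (P_o - P_e) / (1 - P_e)."""
    return (p_observed - p_chance) / (1.0 - p_chance)

# Suppose each referee independently recommends acceptance 80% of the time
# (an assumed rate, in the spirit of Physical Review's high acceptance rates).
p_accept = 0.80

# Chance agreement: both accept, or both reject.
p_chance = p_accept**2 + (1 - p_accept)**2   # 0.64 + 0.04 = 0.68

# Even 85% observed agreement then yields only a FAIR-range kappa.
kappa = cohens_kappa(0.85, p_chance)
print(round(p_chance, 2))  # 0.68
print(round(kappa, 2))     # 0.53
```

This is why the table note reports both observed agreement (68-77%) and the much lower chance-corrected values: the two statistics answer different questions.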
From that time until 1969, the Physical Review did not allocate separate sections to physics specialty areas or subfields. Beginning in 1970, however, and continuing to the present, the Physical Review has allocated its total pages to four distinct subfields: general physics, condensed matter, nuclear physics, and particles and fields. Data from the Physical Review and Physical Review Letters, Annual Report 1986, indicate that although the overall acceptance rate of Physical Review for 1986-7 (75%) remained consistent with previous years (an average of 77% between 1969 and 1986), the percentage of manuscripts accepted in the four subfields varied rather widely. These data indicate that the acceptance rates were 81% for nuclear physics and 78% for condensed matter,