DOMENIC V. CICCHETTI: The Reliability of Peer Review for Manuscript and Grant Submissions: A Cross-Disciplinary Investigation
because there are social and other disincentives to strong disagreement, panel ratings cannot be used as evidence of real consensus. Nevertheless, in a period in which the ratio of available funds to trained scientists is shrinking, it seems worthwhile to consider full scientific investigation of this review process. For example, it is at least possible that averaged ratings from four independent reviewers of a proposal would agree with an average from four other independent reviewers at a higher level than would two independent reviews carried out in the current fashion. A larger number of noninteracting "primary" reviewers of grant proposals might be no more demanding of either scientists' time or of agency funds than the current procedure.

Other investigations of potential biases or sources of unreliability in the present grant review process are also easily imagined. For example, it is widely thought that inexperienced reviewers may be "tougher" on proposals than more experienced reviewers. A peer review committee officer has also told me that grants reviewed early in a session tend to be discussed more thoroughly, and thus evaluated more critically, than those reviewed later. Information on both of these issues could be compiled readily. If they were found to be nontrivial sources of bias, reviewers could be made aware of these problems in an attempt to minimize them.

Finally, it is worth emphasizing that measures of agreement may overestimate the reliability of the judgments concerning grants or manuscripts in the critical region of decision where the cut-off occurs. In the present framework of extremely tight funding, a cool reception by a single reviewer may be sufficient to preclude a fundable priority. Therefore, differences between the rating-scale habits of different committee members can make some members much more influential with regard to the funding outcome than others. If ratings were standardized for each committee member before they were combined, or if members were each to place grants into the same preset distribution, potential abuses relating to this likely source of error would be eliminated.
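A minimal simulation sketch of these two suggestions, for illustration only: the rating model, noise levels, and panel sizes below are assumptions, not values from the text.

```python
# Illustrative sketch: ratings modeled as latent merit + reviewer noise.
import random
import statistics

random.seed(1)
N = 500
quality = [random.gauss(0, 1) for _ in range(N)]  # latent proposal merit

def review(q, noise_sd=1.5):
    # One rating: latent merit plus random reviewer error.
    return q + random.gauss(0, noise_sd)

def corr(x, y):
    # Pearson correlation, used here as the agreement measure.
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / (sum((a - mx) ** 2 for a in x)
                  * sum((b - my) ** 2 for b in y)) ** 0.5

# Point 1: averages of four independent reviewers agree across two
# independent panels better than two single reviewers do.
one_a = [review(q) for q in quality]
one_b = [review(q) for q in quality]
four_a = [statistics.fmean(review(q) for _ in range(4)) for q in quality]
four_b = [statistics.fmean(review(q) for _ in range(4)) for q in quality]
print(f"single reviewer vs. single reviewer: r = {corr(one_a, one_b):.2f}")
print(f"4-reviewer mean vs. 4-reviewer mean: r = {corr(four_a, four_b):.2f}")

# Point 2: a member who spreads ratings widely dominates a raw combined
# score; z-standardizing each member's ratings first equalizes influence.
def zscores(xs):
    m, s = statistics.fmean(xs), statistics.pstdev(xs)
    return [(x - m) / s for x in xs]

wide = [review(q, noise_sd=3.0) for q in quality]    # uses the whole scale
narrow = [review(q, noise_sd=0.3) for q in quality]  # clusters ratings
raw = [w + n for w, n in zip(wide, narrow)]
std = [w + n for w, n in zip(zscores(wide), zscores(narrow))]
print(f"raw combination:  r(wide, total) = {corr(wide, raw):.2f}, "
      f"r(narrow, total) = {corr(narrow, raw):.2f}")
print(f"standardized:     r(wide, total) = {corr(wide, std):.2f}, "
      f"r(narrow, total) = {corr(narrow, std):.2f}")
```

Under these assumed parameters, two four-reviewer averages agree roughly twice as well as two single reviewers, and z-standardization leaves the two committee members with equal influence over the combined score.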
Consensus and the reliability of peer-review evaluations

Stephen Cole
Department of Sociology, State University of New York at Stony Brook, Stony Brook, NY 11794

Research I have conducted suggests that the low levels of reliability in peer-review evaluations described by Cicchetti are not an artifact of the peer-review system or of reviewer bias, but reflect the low levels of cognitive consensus that exist at the research frontier of all scientific disciplines (Cole 1983; Cole et al. 1981). I have argued that the level of cognitive consensus in the social sciences is not significantly lower than that in the natural sciences. Cicchetti presents some evidence supporting this view. Discussing the peer-review system used by Physical Review Letters, he quotes from the editors' policy statement on how difficult it is to make decisions; in only 10 to 15% of submissions do the two referees agree on whether the article should be accepted or rejected. Cicchetti concludes that if a more systematic study were undertaken, "we would predict that levels of referee consensus for Physical Review Letters would be of the same relatively low order of magnitude . . . characterizing general journals in many other disciplines" (sect. 4.5).

The assumption that the natural sciences have higher levels of consensus than the social sciences has been used to explain and justify the higher rejection rates of social science journals (Hargens 1988). I see the difference in rejection rates between natural and social science journals as resulting from differences in the amount of space available, the diffuseness of a field's journal system, and, most important, norms concerning the desirability of making Type I or Type II errors (Cole et al. 1978; 1988). Natural scientists prefer to make Type I errors (publishing papers that do not merit it); social scientists, Type II errors (rejecting papers that do).

My analysis leads me to be more critical than Cicchetti of current journal practices. He believes that since most articles in high-rejection-rate fields are eventually published, and since authors of many rejected articles in low-rejection-rate fields do not resubmit, the system is working well. I agree with him for the low-rejection-rate fields but disagree for the high-rejection-rate fields. If levels of consensus are approximately equally low in fields like physics and sociology, then physics journals are publishing papers that many physicists believe are of little significance, and sociology journals are rejecting papers that many sociologists would find useful. The policy followed in physics allows the diverse scientific community to decide what is useful and to neglect the published articles that are not. The policy followed by the sociology journals allows a sample of two or three referees, influenced by norms calling for high rejection rates, to make this decision. This has many negative consequences for the development of the field. We must realize that, as a result of the lack of consensus and the norms supporting high rejection rates, many of these rejections are "unjustified," giving the field a pervasive sense of inequity, bias against some work styles, and irrationality. This reduces motivation and seriously interferes with the communication of ideas.

In physics, two journals, the Physical Review and the Physical Review Letters, publish a large portion of all the literature. By monitoring what is published in these journals, physicists can be sure of being up to date on their research interests. In sociology the two leading journals publish a very small portion of the literature in the field. Much research that would be of use to some segments of the community is rejected from high-visibility journals and must be published in obscure sources. This makes it more difficult to keep up with the latest developments in areas of interest. Communication is further hampered by long delays, sometimes amounting to years, resulting from the inefficient publication system.

The only disadvantage for a field like sociology in switching its publication system to one more similar to that used by the physicists would be the increased cost of journal publication. But given the importance of publication for advancing one's career, it would seem that most authors would be willing to reduce the length of their papers and pay modest page charges, even if they had to pay these out of their own pockets. Another possible argument against increasing the acceptance rate in high-rejection-rate fields would be the potential decrease in the quality of published articles.
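A minimal simulation sketch of this Type I versus Type II trade-off under low referee consensus; the merit model, noise level, and acceptance rates below are illustrative assumptions of mine, not figures from the commentary.

```python
# Illustrative sketch: two noisy referees plus a field-level acceptance norm.
import random
import statistics

random.seed(2)
N = 10_000
merit = [random.gauss(0, 1) for _ in range(N)]   # latent paper merit
worthy = [m > 0 for m in merit]                  # assume half merit publication
# Two referees whose judgments track merit only weakly (low consensus).
ref1 = [m + random.gauss(0, 2.0) for m in merit]
ref2 = [m + random.gauss(0, 2.0) for m in merit]
score = [statistics.fmean(pair) for pair in zip(ref1, ref2)]

def accept_top(scores, rate):
    # Accept the top `rate` fraction of papers by mean referee score.
    cutoff = sorted(scores, reverse=True)[int(rate * len(scores)) - 1]
    return [s >= cutoff for s in scores]

for label, rate in [("high-rejection policy (20% accepted)", 0.20),
                    ("low-rejection policy (80% accepted)", 0.80)]:
    acc = accept_top(score, rate)
    type1 = sum(a and not w for a, w in zip(acc, worthy)) / N
    type2 = sum(w and not a for a, w in zip(acc, worthy)) / N
    print(f"{label}: Type I (weak paper published) = {type1:.0%}, "
          f"Type II (worthy paper rejected) = {type2:.0%}")
```

Under these assumptions the two policies misclassify comparable shares of submissions; the field's norms mainly choose which error is made, and only the low-rejection policy leaves the final sorting to the wider reading community.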
Among those who argue that journal rejection rates result from the level of disciplinary consensus there is the implicit assumption that, because of a lack of agreed-on criteria in the social sciences, most articles submitted to the journals are of "poor" quality and not "really" publishable. There are two problems with this assumption. First, it assumes that, because most of the articles submitted to natural science journals are accepted, they "really" deserve to be published. Many studies of citation patterns, however, have shown that the bulk of articles published in physics journals, for example, are rarely if ever cited (Meyer 1979). There is also qualitative evidence that natural scientists are just as likely as social scientists to disparage the quality of articles in their journals. For example, Mulkay and Williams (1971), in their study of physicists in England, report that "all our respondents thought that the vast majority of papers in the journals which they read were of poor quality or of little significance" (p. 74).

The second assumption is that the articles rejected by the social science journals "deserve" to be rejected. Stinchcombe and Ofshe (1969) conducted an analysis in which they assumed