Braun Tibor, Schubert András (eds.): Expert Review (Peer Review) in Scientific Research: Selected Studies from the Literature on the Subject (Informatics and Science Analysis Series of the MTAK, vol. 7, 1993)
DOMENIC V. CICCHETTI: The Reliability of Peer Review for Manuscript and Grant Submissions: A Cross-Disciplinary Investigation
the validity of a judgment of an article to be about .70. (We know from the data presented by Cicchetti that it is actually much lower.) They then showed that, given this assumption and the fact that only about 15% of submitted articles are published, almost as many papers that "truly" deserve to be published will be rejected as will be accepted. Given the real reliability of judgments, it is probable that more papers that "truly" deserve to be published are rejected than accepted.

Even under the current system, most sociologists believe that the bulk of the articles published in the leading journals are of poor quality and of little interest. As a result of low levels of consensus, these feelings are probably common to all scientific fields. Additional evidence against the view that lower rejection rates would reduce quality is the finding of Garvey et al. (1970) that a significant portion of articles published in "core" social science journals had previously been rejected by one or more journals. I am not suggesting that journals in fields like sociology publish all or even a majority of the articles submitted. I am suggesting, however, that they gradually increase the proportion of submissions published.

If low levels of peer review reliability are caused by a lack of consensus, is there anything we can do to improve the reliability? Cicchetti suggests increasing the number of reviewers. Because the selected reviewers are essentially a small sample from the population of eligible reviewers, the larger this sample is, the more likely it is that the sample statistic (the mean rating of the reviewers) will approximate the population statistic (the mean rating we would obtain if all eligible reviewers participated in the evaluation process). But this would not necessarily help us make a "better" decision about whether to publish the paper.
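The arithmetic behind the rejection claim can be checked with a small Monte Carlo sketch. The distributional assumptions here are mine, not the commentators': paper quality is taken as standard normal, and the reviewer's rating is constructed to correlate with quality at the stated validity of .70; the 15% acceptance rate is from the text.

```python
import math
import random

random.seed(0)
N = 100_000          # simulated submissions
validity = 0.70      # assumed correlation between rating and true quality
accept_rate = 0.15   # roughly 15% of submitted articles are published

# True quality, and a reviewer rating correlated with it at `validity`
truth = [random.gauss(0, 1) for _ in range(N)]
noise_sd = math.sqrt(1 - validity ** 2)
rating = [validity * t + noise_sd * random.gauss(0, 1) for t in truth]

# Journals accept the top 15% by reviewer rating; a paper "truly
# deserves" publication if its quality is in the top 15%.
k = int(N * accept_rate)
rating_cut = sorted(rating, reverse=True)[k]
truth_cut = sorted(truth, reverse=True)[k]

deserving = [i for i in range(N) if truth[i] >= truth_cut]
accepted = sum(1 for i in deserving if rating[i] >= rating_cut)
rejected = len(deserving) - accepted
print(f"deserving papers accepted: {accepted}, rejected: {rejected}")
```

Under these assumptions, the simulation bears out the argument: roughly half of the truly deserving papers fall below the acceptance cutoff, and lowering the validity below .70 tips the balance so that more deserving papers are rejected than accepted.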
Would we want to make publication contingent on the relative proportion of the population who would recommend publication? Following such a policy, innovative work that goes against current ways of thinking might not be published.

The situation for the distribution of grants is different. Here there is a limited amount of money to be distributed, and the scientific community does not have the power to increase the size of this pool. It is therefore necessary to be able to give priority ranks to submitted proposals. Because of the inherent lack of consensus on research-frontier science, it is inevitable that many worthwhile proposals will be rejected and that some proposals of little value will be funded. This was the major finding of my peer review study (Cole et al. 1981). The problem here is the failure to recognize lack of consensus as the reality we must deal with. If we recognize this, there are a number of steps we can take to reduce (but never eliminate) the impact of random factors on the allocation of grant funds. The most important step is for such funding agencies as the National Science Foundation to recognize publicly that many rejected proposals are as worthy of funding as many accepted proposals. If they were to do this, they could set up an appeals procedure in which appeals would be treated sympathetically instead of as the complaints of "cranks." If such an appeals system were functioning properly, a significant portion of appeals should result in the awarding of grants, even at the expense of reducing the amount of funds available for the next round of new proposals.

In summary, the data suggest that the reliability of peer review can be improved by increasing the number of reviewers, but that, given the inherent lack of consensus in science, this will not help solve the problem.
Lack of consensus must be recognized as a reality; we can then introduce policies to minimize its effect on the development of knowledge and the careers of individual scientists.

Unreliable peer review: Causes and cures of human misery

Andrew M. Colman
Department of Psychology, University of Leicester, Leicester LE1 7RH, England

According to John Ziman (1968), the referee involved in the process of peer review is "the linchpin about which the whole business of science is pivoted" (p. 111). But, as the same commentator pointed out, "the most vexed and contentious topic in the business of scientific communication is the role of the referees, their danger as censors of new ideas, the procedures for appeal against their decisions, and so on" (Ziman 1976, p. 104). Cicchetti has marshalled a considerable body of evidence showing referees' evaluations of scientific documents to be lamentably unreliable, and the topic is more vexed and contentious than ever. I shall confine my commentary to two possible remedies, only one of which was discussed by Cicchetti, and to what I see as the root cause of the problem.

Cicchetti summarized several arguments for and against blind review, which is designed to eliminate the effect of referee bias toward individual authors or institutions. The debate about blind review is somewhat scholastic, in my view, because there is little evidence to show that this kind of crude referee bias is a significant factor. Even Peters and Ceci's (1982) well-known data on the fate of published articles resubmitted with fictitious authors and institutional affiliations can best be explained in terms of random error without invoking referee bias, and Occam's razor bids us reject the bias hypothesis in favor of the simpler random error null hypothesis (Colman 1982b).
One important point worth adding to Cicchetti's remarks about blind review is that a grant applicant's past record of research could, with some justification, be considered a significant factor in predicting the likely outcome of any new award the applicant might receive, and ought, perhaps, to be taken into account by the referees. Blind review entails the deliberate concealment of this potentially relevant information.

The use of multiple (more than two) independent referees is not a remedy that appeals to me, although it has its supporters, including Behavioral and Brain Sciences (BBS). My reservations about multiple refereeing are based partly on the findings of research in social psychology and partly on commonsense considerations. Experimental evidence suggests that the involvement of several referees would produce a well-documented decrease in individual effort known as "social loafing" (Latané et al. 1979) and would also encourage diffusion of responsibility (Darley and Latané 1968). Both of these phenomena are likely to undermine the general quality, and hence the reliability, of referees' reports. People tend to apply themselves more diligently and to behave with greater social responsibility when they feel that their input is important and that their efforts are likely to be instrumental in influencing outcomes (Colman 1982a, Chapter 9), but in the peer review process this feeling of instrumentality is bound to be an inverse function of the number of referees.

Second, multiple refereeing tends to increase the nonproductive component of scientists' workloads. The volume of material that requires refereeing is already daunting: Some 40,000 scientific journals currently publish approximately two new articles per minute (Mahoney 1982). Refereeing manuscripts and grant applications is difficult, time-consuming, and generally unrewarding work.
What is worse, conscientious refereeing is an ultimately self-defeating activity because it tends to generate ever-increasing workloads. Conscientious referees find their popularity with editors increasing and more and more manuscripts landing on their desks, long after their own research has begun to suffer, until they cannot even cope with their refereeing work efficiently. It is clear that the reinforcement structure of science punishes virtuous behavior and rewards sloppy,