Braun Tibor, Schubert András (eds.): Szakértői bírálat (peer review) a tudományos kutatásban [Peer Review in Scientific Research]: Selected Studies from the Literature on the Topic (MTAK Informatics and Science Analysis Series 7, 1993)

DOMENIC V. CICCHETTI: The Reliability of Peer Review for Manuscript and Grant Submissions: A Cross-Disciplinary Investigation

(… that Cole et al. [1981] exaggerated the role of chance in the NSF grant-review process by basing their analysis on the unrealistic assumption that the reliability of the entire NSF peer-review process equalled that of a single referee's evaluation). For example, under classical test-theory assumptions, if the average referee's reliability equals only .25, a composite formed from 2 evaluations should have a reliability of .40, and a composite based on 3 should have a reliability of .50 (Hargens & Herting 1990b; a short sketch of this arithmetic follows this commentary). For most biomedical and behavioral science journals, editors reject papers that receive two negative evaluations, solicit a third evaluation for papers that receive split reviews, and personally read papers that receive two favorable reviews in order to advise authors about what, if any, revisions should be made before the paper will be accepted. Thus, most papers that eventually appear in these journals receive at least three evaluations; those receiving split reviews from the initial referees are sometimes reviewed by four or five people before they are accepted. Only papers that receive negative evaluations from both initial referees are evaluated by only two people, but, as Cicchetti shows, negative evaluations are substantially more reliable than other evaluations at these journals. Thus, the overall reliability of the peer-review process as a whole should be significantly higher than the levels of individual referee reliability implied by the reported associations between referee recommendations.

I think it likely that Cicchetti will prove right in predicting that studies of referee agreement in general physical science journals, such as Physical Review Letters, will yield associations similar to those in the medical and behavioral sciences. I also suspect, however, that his speculation that specialized physical science journals will show greater agreement on positive compared to negative recommendation categories is incorrect. The data in Cicchetti's Table 6 and data on referee agreement for Physiological Zoology (Hargens & Herting 1990a) are inconsistent with that claim. If my suspicion is correct, even specialized physical science journals are likely to show modest levels of referee agreement. In part, this should happen because referee reports on submissions to such journals contain a relatively low proportion of negative evaluations (Hargens 1990), which will in turn tend to limit the overall reliability of individual referees' evaluations, because there are relatively few of the most reliable recommendations. Once again, however, the peer-review process at such journals tends to mitigate the damage that low associations between referee recommendations might cause. Specifically, these journals usually use a "when in doubt, accept" decision rule (Zuckerman & Merton 1971), and require at least two negative recommendations to reject a paper. As a consequence, these journals typically publish a substantial majority of submissions. If it is true that negative recommendations are more reliable than positive recommendations at these journals, then final editorial decisions will be largely determined by the most reliable (negative) recommendations rather than by less reliable (but more positive) ones.

Regardless of the fate of these speculations, however, it is clear that the various elements of a peer-review system are interrelated, and that an assessment of any one element must place it in the context of the entire system.
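A minimal sketch of the composite-reliability arithmetic cited above, assuming the classical Spearman-Brown prophecy formula from test theory (the journal-level decision rules discussed in the commentary are not modeled here):

    # Spearman-Brown prophecy formula: the reliability of a composite
    # of k parallel evaluations, each with single-referee reliability
    # r, is r_k = k * r / (1 + (k - 1) * r).
    def composite_reliability(r, k):
        return k * r / (1 + (k - 1) * r)

    # Reproduces the figures cited from Hargens & Herting (1990b):
    # a single referee at r = .25 implies .40 for a composite of 2
    # evaluations and .50 for a composite of 3.
    for k in (1, 2, 3):
        print(k, round(composite_reliability(0.25, k), 2))
    # -> 1 0.25   2 0.4   3 0.5

The same formula shows why soliciting a third or fourth evaluation for split reviews pushes the reliability of the process as a whole well above that of any single referee.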
Confusion between reviewer reliability and wise editorial and funding decisions

Charles A. Kiesler
Psychology Department, Vanderbilt University, Nashville, TN 37240
Electronic mail: kieslec1@vuctrvax.bitnet

Reviewing manuscripts for publication and grant proposals for funding is merely a means to an end, not an end in itself. The desired end product should be wise decisions about what is published and funded. Defining the reliability of such reviews as the correlation between reviewer ratings confuses process with outcome, and this is Cicchetti's main problem. In addition, there are important differences between the reliability of grant reviews and the reliability of article reviews; differences sufficiently important that I shall tackle them separately. Let me take up the issue of journals first.

A high correlation between reviewer ratings of submitted manuscripts should be neither expected nor desired. The expectation that these ratings should be highly correlated is naive; it almost assumes that reviewers are randomly drawn by the editor. As an editor, I intentionally act in ways that lower the correlation between ratings. For example, I give the manuscript to reviewers who have very different strengths or skills to bring to the manuscript: one might be a very sophisticated statistician, another a Freudian theorist. One would not expect a high correlation between them because they are evaluating different aspects of the manuscript, a valuable service to the editor (a small simulation below illustrates the point). Sometimes, I also give manuscripts to two reviewers who I know will represent quite different points of view. I might select one reviewer who I know agrees with the general theoretical orientation and another who argues strongly against it. In this manner, I can see at one time both the very best and the very worst things one could say about the manuscript, and therefore make some judgment about how different and innovative it is. Furthermore, since I think a scientific field should develop an expanding pool of educated reviewers, I often give a manuscript to two sophisticated and experienced reviewers, and to another person (usually young) whom I have not consulted before. In this way I can discover good new reviewers, as well as show young people what is expected in a review. (I return all the reviews to their authors, so they can see each other's comments.) All of these actions, which I submit contribute to making wise decisions about whether or not one accepts a manuscript for publication, are certainly counterproductive if one is seeking a high correlation between reviewers' judgments. But they are typical behaviors of a good editor who intends to play an active and decisive role in the final evaluation of a manuscript.

This notion of the editor having a very active role in the judgment of a manuscript seems lost on Cicchetti. Not only does he not discuss the kinds of processes described above, but he even ignores the role of the editor in making the judgment. For example, he recommends that there be three reviewers rather than two, to avoid a one-to-one vote. In my own case, when sending a manuscript out for review, I try to read just enough to make the judgment about whom to select as a reviewer. Then, when the reviews come back to me, I set them aside and review the manuscript myself with reference to the reviews. That's why my reviewers are always N + 1, and I can easily compare and contrast what the reviewers contribute.
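A small illustrative simulation of Kiesler's mismatched-reviewers point, under hypothetical assumptions of my own (manuscript merit has two independent components, and each reviewer tracks only one of them); the numbers are not Kiesler's:

    import random

    def pearson(xs, ys):
        # Plain Pearson correlation coefficient.
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        vx = sum((x - mx) ** 2 for x in xs)
        vy = sum((y - my) ** 2 for y in ys)
        return cov / (vx * vy) ** 0.5

    random.seed(1)
    n = 5000
    # Each manuscript has two independent merit dimensions, e.g.
    # statistical soundness and theoretical interest (hypothetical).
    stats_q = [random.gauss(0, 1) for _ in range(n)]
    theory_q = [random.gauss(0, 1) for _ in range(n)]
    overall = [s + t for s, t in zip(stats_q, theory_q)]

    # Reviewer A (a statistician) rates mostly the first dimension;
    # reviewer B (a theorist) rates mostly the second.
    rater_a = [s + random.gauss(0, 0.5) for s in stats_q]
    rater_b = [t + random.gauss(0, 0.5) for t in theory_q]

    print(round(pearson(rater_a, rater_b), 2))  # near 0
    print(round(pearson(rater_a, overall), 2))  # clearly positive
    print(round(pearson(rater_b, overall), 2))  # clearly positive

Under these assumptions each rating correlates with overall merit (roughly .6) even though the two ratings are essentially uncorrelated with one another, so a low inter-reviewer correlation is compatible with each review being informative.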
The proportion of manuscripts one can accept is also a critical part of any investigation of reliability. I had just finished a stint as associate editor when Scott (1974) came out with his original article criticizing the low reliability of reviewers for the Journal of Personality and Social Psychology (JPSP), and I noted an interesting phenomenon. In only about 15% of the cases did reviewers of manuscripts for JPSP agree that the manuscript should definitely be published. Cicchetti would regard that as an unreliable review process. Only about 15% of the manuscripts could be accepted for publication in JPSP, however. If the review process is supposed to lead to a wise decision rather than producing a high correlation between ratings, the reliability of the JPSP process was very good (a numerical sketch follows below). The outcomes were right in line with the needs of the editor to publish only a small subset of the manuscripts submitted. Whether that is the 15% to be published depends on whether or not editors see themselves as playing a very active role in the process. I think the […]
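A sketch of Kiesler's base-rate point using hypothetical figures (only the 15% acceptance rate comes from the commentary): when a quota that low dominates the marginals, raw agreement is driven by joint rejections, and a chance-corrected statistic such as Cohen's kappa can look modest even when the outcome matches the quota exactly.

    def kappa(p_both_accept, p_accept_1, p_accept_2):
        # Cohen's kappa for a 2x2 accept/reject table, given the
        # joint probability that both referees recommend acceptance
        # and each referee's marginal acceptance rate.
        p_both_reject = 1 - p_accept_1 - p_accept_2 + p_both_accept
        p_observed = p_both_accept + p_both_reject
        p_chance = (p_accept_1 * p_accept_2
                    + (1 - p_accept_1) * (1 - p_accept_2))
        return (p_observed - p_chance) / (1 - p_chance)

    # Two referees who each independently endorse 15% of papers
    # already agree 74.5% of the time by chance alone (kappa = 0):
    print(round(kappa(0.15 * 0.15, 0.15, 0.15), 2))  # 0.0

    # Hypothetical JPSP-like case: each referee endorses 25% of
    # papers and both endorse the same 15% the journal can publish;
    # kappa is a modest .47 although the decision quota is met:
    print(round(kappa(0.15, 0.25, 0.25), 2))  # 0.47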
