Braun Tibor, Schubert András (szerk.): Szakértői bírálat (peer review) a tudományos kutatásban : Válogatott tanulmányok a téma szakirodalmából (A MTAK Informatikai És Tudományelemzési Sorozata 7., 1993)
DOMENIC V. CICCHETTI: The Reliability of Peer Review for Manuscript and Grant Submissions: A Cross-Disciplinary Investigation
advancement of science, in which case it seems far less serious that "good" articles would be rejected by unreliable reviews. I submit that the latter view only makes sense if one also endorses two unlikely additional assumptions: (1) that rejecting some of the few "good" articles along with the many "bad" ones will still reduce the absolute amount of false leads chased down, while not unduly delaying the eventual resubmission and publication (presumably by someone else) of the previously rejected, good idea; and (2), that researchers who cannot get funded or published will neither be missed nor unduly hurt when they then leave the society of scientists.
Pending the arrival of a better "science" of science, a better psychology of science, even a better political science of science, one can only observe that, in addition to all of its personal satisfactions, science is a complex social activity offering diverse social rewards of great value and diverse punishments of unsuspected force. The true sources of the most fruitful ideas and work in science remain mysterious. Scientists differ in temperament, in their values, and in their skills at communication. There is no one way in science. There are no 10 ways. Scientists have embraced electronic mail and have welcomed multiple new journals despite the undeniable difficulties and frustrations created by the information glut. One fact seems clear, then: Most of us will always want to know what others are thinking, doing, and finding. And whatever journal we are reading, whoever its reviewers or gatekeepers may be, still, we decide for ourselves what is flawed design, what is misleading interpretation, and what is "good" science.
As electronic means for sharing papers and creating reprints evolve further, I submit that it makes sense (every way except politically, which is, alas, perhaps the most significant way) to separate reviewing, editing, and publication.
Let us have multiple professional reviewers, some of whom advise authors prior to publication and some of whom rate articles once they appear in print. But let us normally publish, electronically, all submitted papers when the author, as editor, thinks each is ready. Let us include with each paper careful abstracts, conscientious keyword lists, and brief professional reviewer ratings whenever available, to serve us as guides through the great wilderness of articles we might all then fear. Let there be a gatekeeper to electronic publication to keep out undue repetition from authors, and, when necessary, to enforce basic conventions of style and some reasonable quotas on how often one may publish in the system. Then, let authors vie not for space in prestigious journals, but for the attention of prestigious reviewers and readers. History and citations can later tell us what was useful and what was not. The problem of reliability would then be all but finessed.
Replication, reliability and peer review: A case study
Michael E. Gorman
Humanities Division, School of Engineering and Applied Science, University of Virginia, Charlottesville, VA 22903
Electronic mail: meg3c@prime.acc.virginia.edu
Cicchetti begins his paper with a brief discussion of Mertonian norms. His concern is to see whether these norms are being followed by scientific journals. An issue he leaves for others is the question of how to evaluate empirically the efficacy of such norms. For example, Cicchetti discusses the way in which journals are biased against negative findings. Experimental simulations of scientific reasoning, in which science and engineering students work on abstract tasks, provide independent support for the value of seeking negative results, and also specify under what conditions a disconfirmatory strategy will be most useful (see Gorman & Gorman 1984; Klayman & Ha 1987). Similarly, Cicchetti points out that there appears to be a very strong bias against replication studies.
In a series of experiments I found that a strategy I called "replication-plus-extension" was superior to straight replication (Gorman 1989). Consider, for example, a student who wants to make sure that the triple "2,4,6" is really an instance of an abstract rule, given that the student knows as many as 1 triple in 5 will be subject to what Doherty and Tweney (1988) have called "system-failure error," that is, if it appears to be correct, it will actually be classified as incorrect and vice-versa. This student could propose "2, 4, 6" again. But what if the cost of an additional experiment is high? Then it makes more sense to propose a similar triple, "10, 12, 14" for example, which will not only replicate the previous one but extend the pattern to new instances. From a logical standpoint, this is a flawed strategy: "10, 12, 14" does not really replicate "2, 4, 6." But from a satisficing standpoint (see Giere 1988), the strategy makes good sense. In fact, students working on more complex tasks of this sort can employ it effectively (again, see Gorman 1989). Cicchetti points out that one strategy for getting around journals' bias against replications is to embed the replication within another study or studies. Replication-plus-extension is an alternate strategy. Obviously, the two can be combined. I am aware of no empirical data regarding journals' preference for either strategy; future research should be directed at this question. For example, one could investigate replication-plus-extension through experimental or quasi-experimental designs by sending three versions of the same results to a wide range of journals, one of which was deliberately written as a replication, another as a replication-plus-extension and still another embedded with a novel finding. 
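The contrast between straight replication and replication-plus-extension in the triple task can be illustrated with a small simulation. What follows is only a hedged sketch, not the procedure used in Gorman (1989): the hidden rule (`hidden_rule`, here "strictly ascending numbers"), the function names, and the trial count are all assumptions introduced for illustration. Only the error rate of 1 in 5 and the two example triples come from the text.

```python
import random

ERROR_RATE = 0.2  # "as many as 1 triple in 5", per the text


def hidden_rule(triple):
    """Assumed target rule (hypothetical): strictly ascending numbers."""
    a, b, c = triple
    return a < b < c


def noisy_feedback(triple, rng, error_rate=ERROR_RATE):
    """Return the verdict on a triple, inverted with probability error_rate."""
    verdict = hidden_rule(triple)
    if rng.random() < error_rate:
        return not verdict  # system-failure error: the true verdict is flipped
    return verdict


def run_strategy(triples, rng, error_rate=ERROR_RATE):
    """Test each proposed triple once.

    Returns the list of verdicts and the number of distinct instances covered.
    """
    verdicts = [noisy_feedback(t, rng, error_rate) for t in triples]
    return verdicts, len(set(triples))


def both_wrong_rate(triples, trials=20000, seed=0):
    """Estimate how often every verdict in a run is misleading."""
    rng = random.Random(seed)
    wrong = 0
    for _ in range(trials):
        verdicts, _ = run_strategy(triples, rng)
        if all(v != hidden_rule(t) for v, t in zip(verdicts, triples)):
            wrong += 1
    return wrong / trials


# Two experiments under each strategy, i.e. the same cost.
straight = [(2, 4, 6), (2, 4, 6)]       # straight replication
extended = [(2, 4, 6), (10, 12, 14)]    # replication-plus-extension

if __name__ == "__main__":
    for name, triples in (("straight", straight), ("extended", extended)):
        _, covered = run_strategy(triples, random.Random(0))
        print(name, "distinct instances:", covered,
              "both-wrong rate:", both_wrong_rate(triples))
```

Under these assumptions both strategies have the same chance of two misleading verdicts (0.2 × 0.2 = 0.04 when errors are independent), but the extended strategy covers two distinct instances for the same number of experiments. That is the satisficing trade-off described above, and the sketch says nothing about how journals would receive either kind of study.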
Such reliability studies as Cicchetti's are important, but they should be complemented by two additional kinds of research: (1) Experimental studies directed at determining the normative value of philosophical or sociological prescriptions about science (see Fuller 1989, for a discussion). (2) Qualitative "biographies" of manuscripts, in which the same paper is followed through the revision and publication process, often spanning several journals. Cicchetti disparages these sorts of studies, yet they can reveal aspects of the peer-review process inaccessible to quantitative studies and suggest variables like replication-plus-extension that merit more rigorous exploration in quantitative designs.
Is there an alternative to peer review?
Richard Greene
Veterans Affairs Central Office, Washington, DC 20420
Cicchetti's target article is a major contribution to the peer-review literature. It is especially useful in its collection and analysis of the critical research studies in this field and raises a number of important criticisms of the peer-review process. I will concentrate my comments on the grant-review process, because this relates to my experience managing the national research program of the Department of Veterans Affairs (DVA). Cicchetti refers to a number of studies showing that reliability among peer reviewers is highest when considering what is unworthy of support. The real problem comes in assigning a scientific priority to a set of studies that are all, or mostly all, supportable. The problem is well described by the author's citation of James B. Wyngaarden (Culliton 1984) referring to distinguishing "'shades of excellence' among competing grants that are all at the top." There is a consensus in the scientific community that the peer-review process was not designed to measure the difference between two highly meritorious projects, one with a 155 priority score that will be funded and one with a 156 score that will not receive support. And yet, the
