Braun Tibor, Schubert András (eds.): Peer Review in Scientific Research: Selected Papers from the Literature on the Subject (MTAK Informatics and Science Analysis Series 7, 1993)
DOMENIC V. CICCHETTI: The Reliability of Peer Review for Manuscript and Grant Submissions: A Cross-Disciplinary Investigation
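As a check on the arithmetic, the brief sketch below recomputes the agreement statistics reported in Table 3 (below) from its cell counts. It assumes the standard formulas: p_o is the proportion of observed agreement, p_c is the chance agreement derived from the marginal proportions, and kappa = (p_o - p_c)/(1 - p_c); the category-specific agreements divide each agreement cell by the mean of the corresponding marginals, as in the table. The function name agreement_stats is illustrative, not from the original article.

    def agreement_stats(cells):
        """Agreement statistics for a 2x2 reviewer-agreement table.

        cells = ((aa, ar), (ra, rr)): rows = first review (accept, reject),
        columns = second review (accept, reject).
        """
        (aa, ar), (ra, rr) = cells
        n = aa + ar + ra + rr
        p_o = (aa + rr) / n                    # observed overall agreement
        # chance agreement from the products of the marginal totals
        p_c = ((aa + ar) * (aa + ra) + (ra + rr) * (ar + rr)) / n ** 2
        kappa = (p_o - p_c) / (1 - p_c)
        p_acc = aa / (((aa + ar) + (aa + ra)) / 2)   # accept-specific agreement
        p_rej = rr / (((ra + rr) + (ar + rr)) / 2)   # reject-specific agreement
        return p_o, p_c, kappa, p_acc, p_rej

    panel_a = ((181, 172), (173, 470))   # two independent reviews only
    panel_b = ((181, 172), (173, 803))   # 333 editorial rejections added: 470 + 333

    for label, cells in (("A", panel_a), ("B", panel_b)):
        p_o, p_c, kappa, p_acc, p_rej = agreement_stats(cells)
        print(f"Panel {label}: p_o={p_o:.1%} p_c={p_c:.1%} kappa={kappa:.2f} "
              f"p_o(accept)={p_acc:.1%} p_o(reject)={p_rej:.1%}")

Run as written, this reproduces kappa = .24 for Panel A and .34 for Panel B, along with the category-specific agreements of 51.2%, 73.2%, and 82.3%; the chance-agreement terms come out at 54.2% and 61.0%, within one rounding step of the printed 54.1% and 60.9%.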
Table 3. Effect of editorial summary rejection of 333 manuscripts on the overall reliability of peer review of manuscripts submitted to the Journal of Abnormal Psychology (1973-1978)

A. Based on two independent reviews

                           Second Review
    First Review      Accept   Reject   Total
      Accept             181      172     353
      Reject             173      470     643
      Total              354      642     996

    p_o(overall) = 65.4%    p_c(overall) = 54.1%    Kappa (or R_i) = .24
    p_o(accept) = 181/353.5 = 51.2%    p_o(reject) = 470/642.5 = 73.2%

B. Adding the 333 editor's rejections to the reject-reject cell

                           Second Review
    First Review      Accept   Reject   Total
      Accept             181      172     353
      Reject             173      803     976
      Total              354      975    1329

    p_o(overall) = 74.0%    p_c(overall) = 60.9%    Kappa (or R_i) = .34
    p_o(accept) = 51.2%    p_o(reject) = 803/975.5 = 82.3%

The great majority of commentators viewed the target article as a worthwhile endeavor, although they differed in their specific interpretations of what the results mean; two remaining commentators, however, Kiesler and Bailar, questioned the value of such research. These two commentators share the minority view that the only meaningful goal of peer review is to improve decisions about which submissions should be accepted (or approved) and which should be rejected (or disapproved). As such, the issue of reliability is essentially irrelevant to them. They also express the view that high levels of agreement signal that there is too much redundancy in the peer review process, that it is not working well, and that a balanced review has not been achieved.

Kiesler is convinced at a basic conceptual level that high levels of reliability are incompatible with what he terms "wise" editorial and funding decisions. He states specifically that to expect high levels of reviewer agreement is "naive" because it falsely assumes that reviewers are randomly drawn by editors. I would submit that herein lies the most serious error in Kiesler's reasoning. In fact, if he were to choose reviewers randomly in his own general area of focus (the broad field of psychology), this procedure would almost guarantee levels of reviewer agreement even lower than what has been reported. Given that Kiesler needed a Freudian theorist as well as a sophisticated statistician to obtain a balanced review (in his hypothetical example), the probability that such expertise could be obtained on the basis of purely random selection procedures would indeed approach zero. In fact, any set of reviewers selected at random in any general focus area (behavioral science, medicine, general subfields of physics) would, almost perforce, be expected to disagree to a greater extent than those chosen specifically for their areas and levels of expertise. Rourke correctly intimates that the validity of the comments of randomly selected reviewers would also be compromised because of insufficient knowledge about the area they would have been asked to evaluate. (A similar view is expressed by Lock.) In short, the balanced selection of reviewers should, if anything, enhance both the reliability and the validity of the resulting reviews.

If we accept Bailar's commentary at face value, then to expect the peer review process to be "reliable," "fair," and "objective" would be considered an "inappropriate" goal. A careful reading of Bailar's comments suggests that as an editor he chose to work around the obvious unreliability, unfairness, and subjectivity of the peer review process at the Journal of the National Cancer Institute (JNCI).
As one example, his regular use of reviewers who were clearly biased (i.e., who would never recommend publication or would never criticize their colleagues) would prompt other commentators to act quite differently (I agree). Thus Kraemer would remove reviewers who "condemn everything" or who have an apparent conflict of interest with the author(s) of the paper under review. Similarly, other commentators would rather remove than live with or "work around" other obvious biases in the peer review system (I again agree). These biases include "confirmatory bias" against "negative" research findings, against well-conceived replication studies (Gorman, Lock, Salzinger, Schönemann, Zentall), and against innovative research (Armstrong & Hubbard, Lock); the time of day at which grants are evaluated; the subjective "rating scale use habits" of grant reviewers; and the hypothesized harsher (more negative) evaluations provided by less experienced grant reviewers (Cohen). In summary, for Bailar to allow individuals who are clearly biased, or who may have a potential conflict of interest, to remain as "regular" reviewers stretches my limits of permissible peer review practice to the breaking point. Consistent with the views of peers at large, I am totally opposed to the practice.

It is also somewhat curious that Bailar voices concern that ethical issues were not discussed in the target article. His comments follow closely upon his expressed frustration at being unable to discuss such issues directly in connection with the Peters & Ceci (1982) publication about eight years ago. The fact of the matter is that about 20% of the authors' reply was devoted to the ethical issue. Mahoney addresses the ethical issue more broadly, and I heartily endorse his sanguine remarks.

Another issue that both Kiesler and Bailar seem to have overlooked is that high-quality research (worthy of support) is integrally related to: (a) asking important questions; (b) designing and executing the research in an exemplary manner (utilizing proper controls); (c) using state-of-the-art instrumentation (and/or test materials); (d) writing clearly and succinctly; and (e) presenting a compelling discussion of the results and their implications (or heuristic value) for furthering scientific advancement in the field. Because of the interrelatedness of these five evaluation attributes, my many years of experi-