Tibor Braun, András Schubert (eds.): Peer Review (szakértői bírálat) in Scientific Research: Selected Papers from the Literature on the Subject (MTAK Informatics and Science Analysis Series 7, 1993)
DOMENIC V. CICCHETTI: The Reliability of Peer Review for Manuscript and Grant Submissions: A Cross-Disciplinary Investigation
Blind reviewer. Would reviews be fairer if the reviewers could not hide behind their anonymity? I suspect that the resulting fear of retribution would greatly reduce referees' participation in the review process. The cure may be worse than the disease.

Bias against negative findings. The bias against negative findings is more complex. Would a nonsignificant difference between groups be significant with greater power (e.g., more subjects)? Could the failure represent a Type II error (a failure to observe a difference when a real difference exists)? Negative findings sometimes occur when research is not done carefully (resulting in increased within-group variance), or they may be due to inadvertent fluctuations in the experimental treatment (resulting in reduced between-group variance). There is a reluctance to publish negative results because there are many more ways to fail to observe an effect than ways to observe it. In some cases, failure to replicate represents the useful establishment of the boundary conditions of a phenomenon (i.e., when an effect is found under some conditions but not others). On the other hand, when the negative findings occur under conditions comparable to those in which the original findings were reported, and those negative findings are not just an example of Type II error (i.e., they can be replicated), they should be suitable for publication (see, e.g., Roberts 1976).

Improvements. How can the system be improved? One of Cicchetti's suggestions is an appeal process. An informal appeal process already exists, in which authors who feel that an incorrect decision has been made can appeal to the editor. To give newer contributors better access to the appeal process, perhaps it should be formalized (e.g., authors whose submissions have been rejected would be informed that they have the option of responding to the reviewers' comments).
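The Type II error question raised above can be made concrete with a small simulation. The sketch below is not part of the original commentary; the effect size, the crude normal-approximation test, and all numbers are illustrative assumptions. It estimates how often a real between-group difference goes undetected at different sample sizes:

```python
import random
import statistics

def estimated_power(n_per_group, effect=0.5, sims=2000, seed=1):
    """Estimate the power of a two-sample comparison by simulation.

    Uses a crude normal-approximation z-test (|z| > 1.96, i.e.,
    two-sided alpha = .05) to stay standard-library only.  The
    Type II error rate is 1 minus the returned value.
    """
    random.seed(seed)
    detections = 0
    for _ in range(sims):
        # Group b genuinely differs from group a by `effect` SDs.
        a = [random.gauss(0.0, 1.0) for _ in range(n_per_group)]
        b = [random.gauss(effect, 1.0) for _ in range(n_per_group)]
        se = ((statistics.variance(a) + statistics.variance(b))
              / n_per_group) ** 0.5
        z = (statistics.mean(b) - statistics.mean(a)) / se
        if abs(z) > 1.96:
            detections += 1
    return detections / sims

small = estimated_power(20)   # modest power: many misses
large = estimated_power(100)  # same real effect, rarely missed
```

With 20 subjects per group the simulated power is modest, so a "negative finding" is often just a missed real effect; with 100 per group the same effect is detected nearly every time, which is the sense in which greater power can turn a nonsignificant difference into a significant one.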
A second suggestion by Cicchetti is to increase the number of reviewers. The larger the sample of reviewers, the more reliable their combined judgment is likely to be. The increase in reliability would, I think, offset the added cost in reviewer time. Third, I would like to see distributed to reviewers a set of guidelines that warn of potential biases and suggest that reviewers try to avoid them. This may appear too simplistic, but it is a cost-effective strategy that could result in a significant reduction of unfair biases.

As to the lack of reviewer agreement, it may be that such variability is an inherent characteristic of the field. There does not appear to be good agreement on what constitutes quality research, what are minor methodological flaws, or what are important findings. Much of this disagreement reflects strong theoretical and methodological (confirmation) biases that, realistically, cannot be eliminated. It may be possible for editors to reduce the effect these biases have on the review of a manuscript through the careful selection of reviewers who do not have strong biases against the kind of research or direction of findings submitted, and by directing reviewers to avoid introducing their biases into the review process.

Author's Response

Reflections from the peer review mirror

Domenic V. Cicchetti
VA Medical Center, West Haven, CT 06516
Electronic mail: cicchetti@yalevm.bitnet

In an earlier BBS target article on peer review it was noted that "the area that seems to be most promising - that of cross-disciplinary comparisons - is still relatively unresearched" (Peters & Ceci 1982, p. 252).
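The commentator's point above, that a larger pool of reviewers yields a more reliable combined judgment, is the classic Spearman-Brown relation from psychometrics. A minimal sketch follows; the single-reviewer reliability of .30 is an illustrative assumption, not a figure from the article:

```python
def spearman_brown(r_single, k):
    """Reliability of the mean of k parallel judgments, given the
    reliability r_single of a single judge (Spearman-Brown prophecy
    formula): k*r / (1 + (k - 1)*r)."""
    return k * r_single / (1 + (k - 1) * r_single)

# If a lone reviewer's reliability is .30, pooling judges raises it:
one = spearman_brown(0.30, 1)    # 0.30
three = spearman_brown(0.30, 3)  # 0.5625
six = spearman_brown(0.30, 6)    # 0.72
```

The gain is steep at first and then flattens, which is why adding a third reviewer buys considerably more reliability per unit of reviewer time than adding a seventh.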
Within this suggested framework, a number of hypotheses received support in the current target article, namely, that across the various disciplines: (1) agreement is better on manuscript and grant submissions of perceived poor quality than on submissions of good quality; (2) better-defined (specific and specialized) areas of scientific inquiry have higher acceptance rates and use fewer reviewers than less well-defined (general and less focused) areas of scientific interest; and (3) levels of chance-corrected interreferee agreement are rather low (R_I usually ≤ .40). The disciplines thus far investigated have included psychology, sociology, medicine, and physics. These issues were discussed in the context of research design considerations, statistical or data analytic approaches, and suggestions for improving the quality of peer review. Another important issue, discussed briefly, was how editors or granting officials use the information supplied by referees (reliable or not) to arrive at publication or funding decisions.

I am gratified by the generally positive evaluations of my work and its perceived heuristic value in generating follow-up research. My Response focuses on the areas of concern expressed by the various commentators. These fall into five categories: (1) methodological, statistical, and data analytic strategies; (2) interpretation of the results; (3) using peer reviews to improve editorial/funding decisions; (4) improving the peer review process; and (5) future research in peer review.

1. Methodological, statistical, and data analytic strategies

1.1. Corrigenda. Let me begin by pointing out several minor errors of omission and commission that have been corrected in the revised target article (compared to the preprint that was circulated to the commentators).
The first, my own discovery, pertains to the data reported in Table 3, section B, which depicts the parallel relationship between acceptance rates for manuscripts submitted to Physical Review and the use of one or more reviewers. The significant relationships now become even more apparent because the last two subfields appearing in the table (Particles & Fields, General Physics) interchange positions to reflect the same ordering as in Table 3, section A. This means (as previously) that as the subfields tend toward a more general focus (Nuclear Physics, Condensed Matter, General Physics, Particles & Fields), both the percentage of accepted manuscripts and the percentage of manuscripts using a single reviewer decrease significantly (p < .000001 in the former case, p = .0003 in the latter).

The second error, in Table 5, was caught by one of the commentators, Eckberg. In the second row of the table, the number of rejected manuscripts should read 577 rather than 578. Since the correct N of 577 was used in the calculations, the resulting chi-square value of 57.895 (p < .00001) is correct.

The third error is that the ordering of the second and third endnotes had to be reversed to be consistent with correct footnote citation in the text. Finally, through another typographical error that escaped my review, the denominator of the formula for R_I (Model I), now the third footnote, had to be amended by removing the previously
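The corrigendum above turns on whether the chi-square statistic was computed from the correct cell counts. A minimal Pearson chi-square routine shows the computation; the example table is hypothetical, since Table 5's cell counts are not reproduced here:

```python
def pearson_chi_square(table):
    """Pearson chi-square statistic for a two-dimensional
    contingency table, given as a list of rows of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical 2x2 table of observed counts (not the article's data):
stat = pearson_chi_square([[10, 20], [20, 10]])  # 20/3 ~ 6.67
```

Because every expected count is derived from the row and column totals, a one-manuscript change in a marginal N (577 vs. 578) shifts every expected cell slightly, which is why it matters that the correct N was the one actually used in the calculation.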