Braun Tibor, Schubert András (eds.): Szakértői bírálat (peer review) a tudományos kutatásban [Peer review in scientific research: Selected papers from the literature on the subject] (MTAK Informatics and Science Analysis Series 7, 1993)
DOMENIC V. CICCHETTI: The Reliability of Peer Review for Manuscript and Grant Submissions: A Cross-Disciplinary Investigation
based on numerous criteria. Is the work sufficiently new, complete, and logically consistent? Is there adequate awareness of the literature and few enough errors of calculation or citation? Does the author address conventions and disagreements on methods? Is the methodology strong enough to support the research claims? Does it meet all of these criteria well enough to be accepted into an underfunded journal with limited space? Especially where journals are relatively general, there are many grounds for disagreement, and for good reasons. Research does not follow a precise model. Individual reviewers cannot be well versed in all the techniques and literature on which they must pass judgment. Given these impossibilities, one relies on one's general disciplinary training and looks for evidence of the above criteria where one can pass judgment without looking foolish later (or one may simply withdraw from the review, which does not solve the editor's problem). Different reviewers will therefore see different points, and may draw different conclusions about "worth." As with judgments of athletic performances at high levels of complexity (e.g., Olympic ice dancing or diving), there may be little difference between the performances of winners and losers, but "everyone" may still distinguish careful work from that of hack writers.

What is acceptable to a reviewer or editor is not a problem of agreement per se, but of journal economics. As Cicchetti tells us (sect. 8), a great many "losers" ultimately are published. The cost of delay? Time (perhaps occasionally a career, but this is a different issue). The gains? Truly "bad" work is screened out. For other work, the increased feedback by peers can lead to the benefits of substantial reworking. (The author of the target article says this is infrequent, but he gives no hard figures.) It is hard to make a case that this harms science. Even most "winners" pass quickly from sight, with hardly the ripple of a citation or recognition beyond the tenure committee (e.g., Crane 1972; Merton 1973, p. 448). What difference can it make to the general progression of science which of two or more highly complex and generally decent but not earth-shattering works is published first? Science progresses because researchers are motivated to continue trying to publish, to continue to take part in the great discourse on nature, and, very occasionally, to be brilliant.

I shall use Cicchetti's article as an example of reviewer problems. I would not have accepted the article for publication, but would have asked for revision and resubmission, along with a scathing note. Why? First, my weaknesses. I am not terribly familiar with the kappa statistic or the various models of intraclass correlation. In any case, the work that goes into constructing them is invisible, and I accept them at face value. I am also not up on the peer review literature, though I did take part in the discussion of Peters & Ceci's work (Eckberg 1982). I therefore cannot judge most of Cicchetti's citations. I am impressed by the number of citations, however.

I can judge some things. I can tell that the article begins by drawing on the literature in philosophy of science concerning the special validity of scientific knowledge, but that this is not thorough and is basically dropped later.
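To make concrete what a chance-corrected agreement statistic of the kind just mentioned does, here is a minimal sketch (in Python, with invented counts) of Cohen's kappa for two reviewers judging the same manuscripts. The poor/fair/good/excellent cut-offs below are the conventional benchmarks for such statistics, assumed here for illustration rather than taken from Cicchetti's Table 2.

```python
# Minimal sketch of Cohen's kappa for two reviewers rating the same manuscripts
# "accept" (0) or "reject" (1).  All counts here are invented for illustration.

def cohens_kappa(table):
    """table[i][j] = number of manuscripts reviewer A rated i and reviewer B rated j."""
    n = sum(sum(row) for row in table)
    p_observed = sum(table[k][k] for k in range(2)) / n                        # raw agreement
    row_margins = [sum(table[i]) / n for i in range(2)]                        # reviewer A's base rates
    col_margins = [sum(table[i][j] for i in range(2)) / n for j in range(2)]   # reviewer B's base rates
    p_chance = sum(row_margins[k] * col_margins[k] for k in range(2))          # agreement expected by chance
    return (p_observed - p_chance) / (1 - p_chance)

def benchmark(kappa):
    """Conventional verbal labels for strength of agreement (assumed cut-offs)."""
    if kappa < 0.40:
        return "poor"
    if kappa < 0.60:
        return "fair"
    if kappa < 0.75:
        return "good"
    return "excellent"

# Both hypothetical reviewers reject far more often than they accept, so raw
# agreement looks respectable while chance-corrected agreement stays low.
table = [[10, 15],   # A accept & B accept,  A accept & B reject
         [20, 55]]   # A reject & B accept,  A reject & B reject
k = cohens_kappa(table)
print(f"raw agreement = {(10 + 55) / 100:.2f}, kappa = {k:.2f} ({benchmark(k)})")
```

With these invented counts, raw agreement is 65% yet kappa is only about .12; that gap between "agreement" and chance-corrected agreement is exactly what a high base rate of rejection can produce, a point taken up below.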
I take umbrage at the fact that in Table 2 the author imposes on the reader evaluative criteria for strength of agreement (from poor to excellent), and that this seems to spring from the author's brow. I am bothered that in section 4.5 the author "would predict" certain findings in an area where no research has been done. This is pure speculation.

I am especially bothered that the numbers in some tables simply do not appear to add up. The numbers of reviews/manuscripts for the Journal of Abnormal Psychology and the Journal of Personality and Social Psychology differ between Tables 1 and 2. Why? The R_i's in Tables 2 and 5 are different. Is this because of the dichotomization of data? Tables 5 and 6 purport to show differences in proportions of agreement between reviews, on whether to accept or reject manuscripts or to give high or low scores to grant applications, using χ² as the test of significance. In the lower two rows of Table 5 and all rows of Table 6, one can reconstruct category frequencies. In the upper rows of Table 5 one can quickly produce all the possible sets of frequencies fitting the presented data. There are some real problems here for this reviewer. [Note: This issue has been clarified by Cicchetti in sect. 1.4 of his response, Ed.]

In ascending order of importance: (1) Why does the author (in sect. 4.7) believe the evidence shows a greater propensity to "agree" on rejections, when the results can be interpreted more parsimoniously as showing that these reviewers simply reject more often than they accept? Chance overlap would yield the same patterns. (2) How does the author decide who the "two" reviewers will be, given that (at least in the case of the grant reviews; Table 4) about four reviewers is the norm? We are not told. (3) Why is it that in all cases where frequencies can be determined, the n of disagreements is precisely the same in both the Acceptance and Rejection (or High Ratings and Low Ratings) columns? This should not be the case if reviewers are selected randomly. It appears that the author merely divides the splits equally by hand. Even so, why are there always even numbers of splits? (4) I find that I can reproduce none of the χ² scores in Tables 5 and 6, though I have tried several different techniques. Some of my scores are close to those the author provides, but none are precise and a few are off substantially. Either the author has been very sloppy with his calculations, in which case all of the original data are suspect, or he is using conventions with which a given social scientist might be unfamiliar, in which case he should explain his usage.
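For concreteness, the kind of arithmetic check attempted in points (3) and (4) can be sketched as follows. The counts are invented, not reconstructed from Cicchetti's tables, and Yates' continuity correction is included only as one example of a convention whose use or omission might account for discrepancies of the sort described.

```python
# Minimal sketch of the check described in points (3) and (4): build the 2 x 2
# table of reviewer-pair agreements and disagreements on acceptance vs. rejection
# and compute the Pearson chi-square, with or without Yates' continuity correction.
# The counts below are invented; they are not reconstructed from Cicchetti's tables.

def chi_square_2x2(a, b, c, d, yates=False):
    """Table layout:
                 agree   disagree
        accept     a        b
        reject     c        d
    Returns the Pearson chi-square statistic (optionally Yates-corrected)."""
    n = a + b + c + d
    observed = [[a, b], [c, d]]
    rows = [a + b, c + d]
    cols = [a + c, b + d]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            diff = abs(observed[i][j] - expected)
            if yates:
                diff = max(diff - 0.5, 0.0)   # continuity correction
            chi2 += diff * diff / expected
    return chi2

# Hypothetical counts: 20 agreements / 30 disagreements on acceptances,
# 60 agreements / 30 disagreements on rejections.
print(chi_square_2x2(20, 30, 60, 30))               # uncorrected
print(chi_square_2x2(20, 30, 60, 30, yates=True))   # Yates-corrected
```

Running the same invented table with and without the correction gives noticeably different statistics (roughly 9.3 versus 8.3 here), which is one way two calculators can disagree without either being "sloppy."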
Some of my problems with this article are mere quibbles; others concern differences in interpretation. The methodological issues may be cleared up by the author in his replies; certainly, either no one else noticed the χ² "problem," or they followed the same conventions as the author and so had no "problem." From this experienced reviewer's standpoint, there would have been sufficient questions on enough issues to have warranted sending the piece back to the author. The article is long and complex enough, however, for me to realize that another reviewer and I might disagree on the importance or significance of various points. We might even agree to disagree. This is the nature of professional judgment; it does not mean that science is in trouble.

Journal availability and the quality of published research

Jack M. Fletcher
Department of Pediatrics, University of Texas Medical School, Houston, TX 77030

In concluding his review of the reliability of various peer review systems, Cicchetti recommends a focus on the relationship of peer review and grant funding, fearing that the unreliability of peer review leads to a failure to fund worthwhile grant applications. This focus is certainly justified in the current climate of research funding, particularly since publication and funding mechanisms use peer review procedures that Cicchetti justifiably identifies as poorly understood and potentially unreliable. One recommendation is always that more funds should be made available to reduce the probability that important research is not funded. If the relationship between space availability in journals and the quality of published research were examined, it would become apparent that the availability of journal space does not ensure that most quality research is published. Unfortunately, more journal space also ensures that considerably more research of poor quality will be published (Lock 1985). If the quality of funded research were also evaluated according to changes in the availability of funds, similar conclusions would be forthcoming. More emphasis should be placed on the goals and internal mechanisms of the journals and their pub-