Braun Tibor, Schubert András (eds.): Peer Review in Scientific Research: Selected Papers from the Literature of the Field (MTAK Series in Informatics and Science Analysis 7, 1993)
DOMENIC V. CICCHETTI: The Reliability of Peer Review for Manuscript and Grant Submissions: A Cross-Disciplinary Investigation
Peer review: Explicit criteria and training can help

Fred Delcomyn
Department of Entomology and Neuroscience Program, University of Illinois, Urbana, IL 61801
Electronic mail: bugoutfd@uxl.cso.uiuc.edu

I doubt there is an active scientist in the United States who could not write an essay on peer review at least as long as Cicchetti's excellent target article. It is inevitable that our passions will be aroused by a process that can determine the success or failure of applications for funding or the acceptance or rejection of journal articles. After all, careers are at stake.

Does peer review work? Yes, up to a point, as the target article shows. Peer review can be used to classify documents such as grant applications into broad categories such as excellent, fair, and poor. Expecting it to allow one to distinguish among applications that are all in the excellent category, however, is like expecting to be able to measure the diameter of a nerve cell with a meter stick. I think the prospects for refining peer review for the selection or rejection of manuscripts for publication are greater. Cicchetti makes several good suggestions, but other things can be done as well. I have two specific proposals: make the criteria to be used in a review more uniform and explicit, and "train" reviewers.

Explicit criteria. What is the point of making review criteria more explicit? We all know what constitutes a good paper, right? Wrong. As Cicchetti points out, two reviewers can make similar comments about a paper and yet have opposite recommendations as to its acceptability. Let's cut down on this kind of conflicting advice by agreeing on the ground rules. These rules may differ across disciplines, even across fields, but there is no reason journals cannot develop an explicit set of guidelines for acceptable manuscripts, guidelines that can be published in the journals themselves.
What should these criteria or guidelines be? Some journals already provide a partial list in their forms to reviewers. The Journal of Experimental Biology, for example, asks reviewers for an assessment of experimental techniques, presentation of data, and quality of reasoning, among others. There are clearly many other aspects of a paper that can be evaluated. Based on my experience as a physiologist, I will make three specific suggestions. (1) Do the experiments whose results are reported answer the questions set out in the introduction? I see no point in cluttering an already crowded literature with a publication that confuses issues by seeming to address one question when it actually addresses another (or none at all). (2) Are the experiments carefully executed and controlled? The issue here is whether conclusions can confidently be drawn from the results, or whether the procedures are so flawed that no firm conclusions can be made. (3) Are the conclusions that the author(s) draw supported by the data actually presented? There is a place for speculation and the formulation of new hypotheses, but authors must obviously take care to separate their conclusions from their speculations.

What if the data seem to contradict someone's favorite hypothesis? Too often one hears of the struggles of a researcher to publish work that disputes someone else's data or interpretations. Here is where explicit criteria would be so helpful. If the research is well done, and the answer to each of the three questions above is affirmative, then there is no reason not to publish the work. It should not be the job of the referees or the editor to settle scientific disputes. As long as no error can be identified in the work, let it be published, and let those whose work is called into question do the necessary experiments to settle the matter. That is the best way to make progress. Using explicit criteria will not eliminate the editor's ability to select what to publish.
Other, more subjective criteria such as importance or timeliness can still be used. Explicit criteria will simply cut down on rejections based on the controversial nature of someone's work.

Reviewer training. Do we really need to "train" reviewers? Of course we do. I doubt that anyone who has been reviewing for more than a few years would say that their early reviews were as good as their later ones. Even experienced reviewers find their approach to papers changing with time. One not only learns what to look for in a paper, one also learns how to phrase a criticism so that it does not seem like a personal attack on the author. What can be done to train reviewers? First, journals can draw up more explicit and detailed instructions to reviewers than are presently sent out. (In neurobiology, my experience is that usually no instructions at all are sent.) A copy of the criteria or guidelines for what constitutes a good paper would be a start. An explicit statement that reviewers should stick to objective descriptions of the paper, and not make derogatory comments about the author(s), might also help. Another approach I have found useful as a reviewer is to receive copies of the remarks of the other reviewer(s) after I have sent in mine. Cicchetti mentions this possibility. Out of curiosity, it would be interesting to know who the other reviewer was, but this is not really necessary. What is important is seeing a colleague's opinion of the paper, to see whether you missed an important point or, for younger reviewers, just to see how someone else handles the entire review process.

It is not likely that the system of peer review will change any time soon. If we have to live with it, the least we can do is organize it in such a way that we all play by the same agreed-on rules. Periodic evaluations of the peer review system, such as this target article, are important steps toward this goal.
Different rates of agreement on acceptance and rejection: A statistical artifact?

Marilyn E. Demorest
Department of Psychology, University of Maryland Baltimore County, Catonsville, MD 21228
Electronic mail: demorest@umbc.bitnet

An important substantive finding that emerges from Cicchetti's target article is that reviewers of manuscripts and grant proposals appear to have higher rates of agreement on rejection/disapproval than on acceptance/approval. This conclusion is based on category-specific rates of agreement, as shown in Tables 5 and 6: given that one reviewer makes a particular recommendation, what percentage of the time does the second reviewer agree? The data clearly indicate higher percentage agreement for negative recommendations (70%–83%) than for positive ones (41%–60%).

A statistical interpretation of these findings is that the higher agreement rates on negative recommendations reflect their higher prevalence. The omnibus agreement statistics reported throughout the review (intraclass correlation and weighted or unweighted kappa) correct for chance levels of agreement. (Indeed, it is the adoption of chance agreement as the null model for evaluating observed agreement that makes the reliability of peer reviews appear so dismally low!) The same standard has not been applied in evaluating agreement on a category-by-category basis, however. When category-specific agreement rates are corrected for chance, they are shown to be identical for acceptance and rejection. To illustrate, consider the data presented for the Journal of Abnormal Psychology. The reconstructed agreement matrix is