Tibor Braun, András Schubert (eds.): Expert Evaluation (Peer Review) in Scientific Research: Selected Papers from the Literature on the Topic (MTAK Informatics and Science Analysis Series 7, 1993)

DOMENIC V. CICCHETTI: The Reliability of Peer Review for Manuscript and Grant Submissions: A Cross-Disciplinary Investigation

superficial, casual, thoughtless, insensitive, inefficient, and therefore unreliable refereeing. Any increase in the number of manuscripts and grant applications that scientists are called upon to referee as a result of the introduction of multiple refereeing is likely to exacerbate the malaise and eat further into the time they should be devoting to doing science. In summary, multiple refereeing seems both counterproductive and gratuitously labor intensive.

The most important safeguard against bias and incompetence on the part of referees, not even mentioned by Cicchetti, would be an automatic author's right of reply to referees' criticisms. Under the peer review system in its conventional form, authors of scientific papers and grant applicants often find themselves in a Kafkaesque situation analogous to that of a person prosecuted and condemned in a court of law without any right of defense. Sometimes scientific work is rejected on grounds that the authors believe, rightly or wrongly, to be demonstrably invalid. In my view, before reaching a final judgment, editors and those who award research grants should routinely solicit the authors' responses to the referees' criticisms, and if necessary the referees' replies to the authors' responses, until a clear resolution of the issue emerges. It may sometimes be necessary to submit the original manuscript, together with the referees' reports, the authors' responses, and the referees' replies, to a qualified independent arbiter before a fair decision can be reached. This procedure was implemented when I was editor of Current Psychology: Research & Reviews. I found it immensely helpful, and there is no doubt that at the very least it increased the face validity of the manuscript evaluation process.
Although this is clearly no panacea, I feel sure that if it were generally implemented, it would make authors, editors, and even referees feel happier about the peer review process. I am reasonably optimistic that the reliability and validity of the process would correspondingly improve.

Evaluating scholarly works: How many reviewers? How much anonymity?

John D. Cone
School of Human Behavior, United States International University, 10455 Pomerado Road, San Diego, CA 92131

Cicchetti documents fairly convincingly that researchers agree on the "normative" criteria to apply in judging a paper's scholarly worthiness; they disagree on the application of these criteria to given manuscripts and on the publishability of given papers. Cicchetti also asserts the commonly held belief that "levels of interreferee agreement are substantially higher for journals in the physical sciences." It would be of some interest to know more about interreferee agreement on judgments about manuscripts submitted to physical science journals. Conducting such studies would require care in controlling certain likely confounding factors, however. For example, in comparing agreement for relatively focused journals (e.g., Nuclear Physics, Condensed Matter) with relatively more diffuse ones (e.g., General Physics, Particles and Fields), the number of reviewers would need to be held constant.

The common belief that reviewing is more reliable in the physical sciences may stem from the greater use of the single initial reviewer system in those fields. Such a system might yield higher acceptance rates, because acceptance rates might be higher when reviewing is less critical. The basis for this reasoning is the assumption that reviewing is at least partially under audience control. If so, the mere presence of another reviewer could lead to more critical reviews and, in turn, to higher rates of rejection.
If audience control were a factor, the "partial anonymity of the reviewer case" should lead to greater rejection rates than the "total anonymity case." It would be interesting to investigate this prediction. A well-designed study would vary both the number of reviewers and the level of anonymity and use acceptance rates and interreviewer reliability as its dependent variables. My prediction would be for the lowest acceptance and highest agreement rates for multiple reviewers subjected to only partial anonymity, because reviewers who know that others are performing the same task and that agreement is to be checked will tend to be more conscientious. The increased vigilance associated with such reviewing will turn up more concerns about aspects of the submission and lead to a greater probability of rejecting it. Related data on this issue are available in the direct observation assessment literature, where it has been shown that observers who know they are being checked for agreement tend to be more reliable and to record more of the behavior being observed (e.g., Romanczyk et al. 1973).

Cicchetti provides no evidence for his assertion that "manuscripts requiring more than one reviewer tend to be those that are problematic." It could be that using multiple reviewers merely turns up more problems. This being the case, the use of more than one reviewer should be associated with lower rates of acceptance, as Cicchetti's Table 3 indeed reveals.

An undiscussed variable in the Cicchetti review is submission rate. Journals with fewer submissions might be expected to have higher rates of acceptance, a supposition given some support by the data in Table 3. In behavioral psychology the proliferation of journals has led to correspondingly fewer submissions to any one journal, and associated rates of acceptance have therefore gone up. Research on reviewer reliability needs to take this into account.
A journal with relatively fewer submissions (e.g., the Nuclear Physics section of Physical Review) will tend to have higher acceptance rates than one with two or three times the submissions (e.g., General Physics, Condensed Matter). Acceptance rates or judgments and their reliability should be compared for journals with equivalent submission rates; this would help control for any tendency toward leniency just to keep the pages filled.

Another variable worthy of study is the acceptance/rejection base rate of a particular journal and the reviewers' knowledge thereof. While base rates can be adequately controlled with appropriate statistics (e.g., kappa, R_i) in the computation of agreement, the reviewers' judgments themselves may be partly determined by their knowledge of base-rate acceptance levels for the particular journal. The base-rate problem has long been studied in clinical decision making in psychology; it is well established that clinicians' "hit" rates for particular diagnoses vary with the base rates of the diagnoses in the population. If agreement with the editor's ultimate decision is viewed as a "hit," and something reviewers strive to accomplish, base rates would need to be controlled when comparing acceptance and, possibly, reviewer agreement across journals.

Finally, while I am generally sympathetic to Cicchetti's observations and recommendations and found his review a good stimulus for some of my own verbal behavior, I did puzzle over his summary of Mahoney's studies. He asserts that the best available evidence shows that reviewers apply subjective criteria in judging scholarly submissions. As support for this assertion he points to the fact that manuscripts were "accepted or rejected on the basis of whether the findings were positive, negative, or mixed, rather than on the basis of their worthiness." It is not clear what is subjective about this. Indeed, basing decisions on outcome should be one of the more objective approaches to the process.
Moreover, contrary to Cicchetti, it should have a positive influence on the reliability and validity of peer review. After all, at least in the behavioral sciences, it is not obvious that there is all that much that is worthy about a study that fails to reject the null hypothesis.
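The base-rate correction Cone alludes to can be illustrated concretely. Cohen's kappa discounts the agreement two reviewers would reach by chance alone given their marginal (base) rates of acceptance; the sketch below uses a hypothetical 2x2 table of reviewer decisions (the counts are invented for illustration, not taken from Cicchetti's data):

```python
# Sketch: raw agreement vs. chance-corrected agreement (Cohen's kappa)
# for two reviewers at a journal with a skewed acceptance base rate.
# The counts below are hypothetical.

def cohens_kappa(table):
    """table[i][j] = number of manuscripts reviewer A rated i and
    reviewer B rated j (0 = reject, 1 = accept)."""
    n = sum(sum(row) for row in table)
    observed = sum(table[i][i] for i in range(2)) / n
    # Chance agreement from each reviewer's marginal (base) rates.
    expected = sum(
        (sum(table[i]) / n) * (sum(row[i] for row in table) / n)
        for i in range(2)
    )
    return (observed - expected) / (1 - expected)

# A low-acceptance journal: both reviewers reject almost everything.
# rows: reviewer A (reject / accept); columns: reviewer B.
skewed = [[85, 5],
          [5, 5]]
raw = sum(skewed[i][i] for i in range(2)) / 100  # 0.90 raw agreement
kappa = cohens_kappa(skewed)                     # ~0.44 after correction
print(raw, round(kappa, 2))
```

With 90% of decisions being rejections, the reviewers agree on 90% of manuscripts, yet most of that agreement is expected by chance (0.9 x 0.9 + 0.1 x 0.1 = 0.82), so kappa falls to about 0.44. This is exactly why agreement statistics must correct for base rates before acceptance judgments are compared across journals.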
