Braun Tibor, Schubert András (eds.): Szakértői bírálat (peer review) a tudományos kutatásban: Válogatott tanulmányok a téma szakirodalmából [Peer review in scientific research: Selected papers from the literature on the subject] (MTAK Informatics and Science Analysis Series 7, 1993)

DOMENIC V. CICCHETTI: The Reliability of Peer Review for Manuscript and Grant Submissions: A Cross-Disciplinary Investigation

negative results rarely get submitted to begin with. Hence, rejection of papers that meet objective evaluation criteria, but contain negative results, may be an outcome that rarely occurs in actual practice. The point is that investigators seem to act as if they have identified a single or small set of measurable characteristics of the reviewers contributing to experimental effects, when the need to maintain the conviction regarding the potency of these selected variables may be greater than the evidence to support it. In short, we are still left with the questions: "What process do reviewers undertake when they perceive information and selectively weigh its importance against arbitrary evaluation criteria?" and, "Which reviewer variables are associated with different review outcomes?" Answers to these questions pertain to independent variables and, as such, would serve to broaden our understanding of what leads to high or low levels of reliability or differential outcomes by discipline subspecialty. Examining process variables may not be the answer, but peer review, by virtue of being a classification system, involves a process or activity leading to an outcome or decision. To date, the study of process variables in peer review has been largely neglected. Kuhn (1962) reminds us that, in a preparadigmatic state, the "real" solution in any field cannot be negotiated by a representative panel of experts. Cicchetti outlines historical constraints operating within the system of peer review and invites us to break away with some concrete recommendations for change. An additional recommendation should be to examine process variables believed to be associated with levels of expected agreement. The "black box" remains as long as creative efforts to examine and improve the system of peer review are neglected.

Is unreliability in peer review harmful?

Henry L. Roediger III
Department of Psychology, Rice University, Houston, TX 77251-1892
Electronic mail: roedige@ricevm1.rice.edu

Cicchetti's target article provides an excellent analysis of studies assessing the reliability of peer review in journal and conference submissions and grant proposals. Even the best studies show modest levels of reliability, a fact decried by many who see arbitrariness in the peer review system. The underlying assumption behind the gloom that studies of peer review cast is that the publication (or granting) process would somehow be more accurate and fairer if the reliabilities involved in peer review were improved, say to .70 or .80. To me, this state of affairs seems unlikely to occur under any realistic set of conditions. Furthermore, I remain unconvinced that it would even be desirable, in the long run, for the scientific enterprise, even though it might make life easier for editors and grant administrators. Below I will provide underpinnings for these opinions.

Cognitive psychologists have long been interested in the processes involved in judgment and decision making in complex realms (e.g., hiring decisions, picking stocks, making clinical diagnoses). The literature is replete with findings of poor reliability and validity of human judgments when people, even experts in a field, are faced with complex, multiattribute decisions (e.g., Kahneman et al. 1982; Nisbett & Ross 1980). Given this backdrop, a finding of high reliability in peer review judgments would come as a surprise.
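The reliability coefficients at issue here are typically inter-rater agreement statistics, such as intraclass correlations, computed over independent reviewers' ratings of the same set of submissions. As a rough illustration only, the sketch below computes a one-way random-effects intraclass correlation for two reviewers rating eight manuscripts; the ratings are invented for the example, and this coefficient is just one of several used in the peer review literature.

import numpy as np

# Hypothetical ratings: each row is one manuscript, each column one reviewer
# (1-5 scale). The numbers are invented purely for illustration.
ratings = np.array([
    [4, 3],
    [2, 2],
    [5, 3],
    [1, 2],
    [3, 5],
    [4, 4],
    [2, 1],
    [3, 2],
], dtype=float)

n, k = ratings.shape                 # n manuscripts, k reviewers
grand_mean = ratings.mean()
row_means = ratings.mean(axis=1)

# One-way random-effects ANOVA decomposition
ss_between = k * ((row_means - grand_mean) ** 2).sum()
ss_within = ((ratings - row_means[:, None]) ** 2).sum()
ms_between = ss_between / (n - 1)
ms_within = ss_within / (n * (k - 1))

# Single-rater intraclass correlation
icc = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
print(f"Single-rater ICC = {icc:.2f}")

On such a measure, single-rater values in the .2 to .3 range, like those cited for peer review studies, indicate that any one reviewer's rating captures only a small share of the systematic differences among submissions.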
One reason for unreliability in peer review that may not pertain to other areas of judgment concerns how reviewers are selected by editors (see Roediger 1987). I spent five years as editor (and another three as associate editor) of a journal referred to by Cicchetti as a "specific focus journal" (the Journal of Experimental Psychology: Learning, Memory, and Cognition). Although perhaps specific in some sense, the topics under consideration seemed broad enough to me: reading, attending, learning, remembering, decision making, judging, problem solving, categorizing, perceptual-motor skill learning, and other topics.

As editor, I would skim each submission to assign reviewers. A common scenario would be as follows: The authors of the paper would be examining a particular theory or line of thought about some phenomenon, or they would be contrasting two or more viewpoints. Based on a series of several experiments, they would usually reach some conclusion on the phenomenon in question. As editor, I would try to pick reviewers who would come at the paper from different viewpoints. If the authors eventually concluded that their results supported Theory X, then usually I would have someone associated with Theory X as one reviewer, and someone associated with Theory Y (or some other approach) as another reviewer. If the paper had some fatal flaw (poor reasoning, improper methods, inappropriate statistics, inconsistent results across experiments), both reviewers would probably argue against publication. This is just what Cicchetti shows: Peer reviews are quite consistent on flawed papers.

But suppose the paper did not suffer from any obvious flaws. A typical (but not universal) pattern for such a paper supporting Theory X would be for another proponent of Theory X to evaluate the paper positively, whereas a "Theory Y reviewer" might recommend against publication, suggesting further research. As Cicchetti notes, the reviewers may not even disagree in their assessments of the facts, but rather in the weightings given to them. Of course, these "unreliable" judgments seem perfectly sensible to anyone editing a journal. Further, both reviewers are often right, in the sense that most papers (excluding the truly bad ones weeded out by peer review) have some merits and some demerits to which reviewers can point.

If this scenario is representative, then some unreliability in the peer review system may be occasioned by editors seeking the advice of experts with varying points of view on the topic at issue. This process may occasion unreliability of peer judgment, but probably provides better information to the editor and the authors. If this is one cause of reviewer unreliability, then one way to enhance reliability would be for editors to try to identify reviewers who had in the past consistently agreed or disagreed with the position argued by the author in the manuscript under review and to send the paper only to like-minded reviewers. I assume no one would seriously argue for this proposal, which shows the danger of emphasizing reviewer reliability at the cost of other considerations (such as providing a variety of perspectives).

Finally, consider the neglected issue of the validity of peer review. Can scientists really predict accurately which manuscripts or grant proposals will lead to surer progress in the field? Can any reviewer validly discriminate the top 20% of the papers or proposals from the next 20%, which is often the task in the behavioral sciences with their high rejection rates?
Given that peer judgments are unreliable, asking questions about validity is even more hazardous, especially since there is likely to be disagreement about the criterion variable. For example, suppose that reviewers or editors were asked to predict the number of cumulative citations over a 10-year period for papers accepted for publication. Would the resulting correlations between predicted and actual citations even approach the modest .2 to .3 we have come to expect from peer review studies? I doubt it.

My skepticism about the outcome of such a study is based in part on informal observations of colleagues discussing controversial papers that have been published and have then shaped the direction of my field (cognitive psychology). Often, years later, one will still hear debates about the original paper, whether or not it should have been accepted, and whether the resulting approach has been worthwhile or a blind alley. If scientists cannot agree, even in retrospect, that heavily cited and important papers were indeed worthy, then what hope do we have of deciding such matters a priori? (See Roediger, 1987,
