Braun Tibor, Schubert András (eds.): Peer Review in Scientific Research: Selected Papers from the Literature on the Topic (The MTAK Informatics and Science Analysis Series 7, 1993)

DOMENIC V. CICCHETTI: The Reliability of Peer Review for Manuscript and Grant Submissions: A Cross-Disciplinary Investigation

experimental studies (Abramowitz et al. 1975, and Cicchetti & Conn 1976). The bias against manuscripts reporting negative findings is consistent with the earlier work of Bozarth and Roberts (1972); Hunt (1975); Kerr et al. (1977); Reid et al. (1981); Rowney and Zenisek (1980); Smart (1964); and Sterling (1959). The related issue of bias against replication studies is still being debated in more recent literature (e.g., Bernstein 1984; Casrud 1984; Furchtgott 1984; Garber 1984; Heskin 1984; Sommer & Sommer 1984). With few exceptions (e.g., Rourke & Costa 1979), the apparent bias against replication studies is very strong (on the part of both reviewers and editors).

With respect to the testing of major theories or hypotheses in a given field of scientific inquiry, one would be most concerned about the literature being glutted with Type I errors, that is, rejecting the null hypothesis (that there are no statistically significant differences) when the hypothesis is true (e.g., see Greenwald 1975; and most recently, Soper et al. 1988). A successful strategy has been simply to build the replication study into the first part of the research design, followed by the main study. Although referees and editors, in our experience, seem willing to accept replication studies embedded in an overall research design, they are quite unwilling to accept them alone. (For recent empirical data underscoring the vital need for replications in the examination of dominant theories or hypotheses, see again, Soper et al. 1988.)

Finally, in a qualitative evaluation of reviewers' comments, Mahoney noted the wide variability in responses. When examining the comments in isolation, he noted, "one would hardly think that very similar or even identical manuscripts were being evaluated" (Mahoney 1977, p. 171).
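The worry about a literature glutted with Type I errors can be made concrete with a short simulation (illustrative only; the setup and numbers are not from the source). If every hypothesis under study were in fact null, a valid test at alpha = .05 would still reject about 5% of the time, and a journal policy of publishing only significant findings would fill the published record entirely with those false positives:

```python
import random

random.seed(1)

def run_null_study(alpha=0.05):
    """Simulate one study in which the null hypothesis is true.

    A valid test on a true null rejects with probability alpha,
    so we model the Type I error directly as a Bernoulli draw.
    Returns True when the study (falsely) reaches significance.
    """
    return random.random() < alpha

studies = [run_null_study() for _ in range(10_000)]

# A journal that favors positive findings "publishes" only the
# significant results.
published = [s for s in studies if s]

type1_rate_all = sum(studies) / len(studies)
type1_rate_published = sum(published) / len(published)

print(f"Type I rate among all studies:       {type1_rate_all:.3f}")
print(f"Type I rate among published studies: {type1_rate_published:.3f}")
```

Routinely published replications would tend to catch such false positives; a selection filter against negative and replication results removes exactly that correction.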
In conclusion, the results of Mahoney's experiment indicate a strong reviewer bias against both negative and mixed results, with an opposite bias in favor of manuscripts reporting positive results. Mahoney describes this phenomenon as confirmatory bias, or the tendency to evaluate positively those results that are consistent with one's own beliefs and to evaluate negatively those that are inconsistent with them. (See also Beck 1976; Goodstein & Brazis 1970; and, most recently, Greenwald et al. 1986, for a critical discussion of the broader corpus of literature in which confirmatory bias and other theoretical biases are seen as obstructing scientific progress.)

In a second experimental study by Mahoney et al. (1978), 68 volunteer referees for two behavioristic psychology journals were sent experimental manuscripts that were identical in content, except that half the referees were randomly assigned manuscripts in which the alleged authors supported their arguments by citing their "in press" publications. The remaining referees received manuscripts in which "self-citation" was not used by the fictitious author. In addition, half the manuscripts in each group were given a prestigious author affiliation, while the remainder were described as having come from a "relatively unknown college." Referees were again asked to rate the manuscript using various evaluative criteria and to provide a summary recommendation concerning the article's publishability potential ("accept," "accept with minor revisions," "accept with major revisions," or "reject"). Statistically significant results (p < .05) indicated that articles in which the fictitious author provided self-citations were rated as more innovative and publishable than those in which no self-references were cited; institutional prestige, whether high or low, bore no significant relationship either to the reviewers' evaluation of the manuscript's normative attributes or to the reviewers' summary recommendations.
Mahoney and colleagues note what may have been an unintended flaw in the design of the study, however, namely, "the fact that none of the four institutions was known to specialize in behavioristic psychology so that - from the reviewer's perspective - there may have been little perceived variation in 'relevant' prestige" (Mahoney et al. 1978, p. 70). Despite this possible shortcoming, Mahoney's experimental research on peer review can still be appropriately described by the double entendre "rare," but "well done."

How do the Mahoney studies help us understand the low levels of reviewer agreement in the evaluation of scientific merit? Earlier (sect. 5.2), we noted that the low levels of reviewer agreement were difficult to interpret because we could not determine how much of the unreliability was due to differences in such important variables as the reviewers themselves (e.g., harsh vs. lenient critic), the manuscripts rated (e.g., some manuscripts were technically or otherwise more difficult to review than others), or the availability of author identity and affiliations (some journals use blind review, others do not). Because such variables were controlled in the Mahoney experiments, the low levels of reliability reported earlier are now easier to accept as probably nonartifactual.

In summary, on the basis of the best controlled studies of the peer-review process to date, we are forced to conclude that referees do at times apply subjective criteria, which cannot be described as "fair," "careful," "tactful," or "constructive," despite the fact that such traits are widely accepted as desirable characteristics of referees (e.g., Gordon 1977; Hall 1979; Jones 1974; Lindsey 1978; Merton 1973). The clearest instance of this phenomenon was that manuscripts were likely to be accepted or rejected on the basis of whether their findings were positive, negative, or mixed, rather than on the basis of their worthiness.
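The role such shared subjective criteria play in reviewer agreement can be sketched with a small simulation (hypothetical reviewers and numbers, not data from the studies cited): two reviewers who apply the same biased criterion, a preference for positive findings, agree with each other quite well, yet their verdicts track manuscript merit no better than chance.

```python
import random

random.seed(0)

# Hypothetical manuscripts: a latent quality score and a direction of
# findings, assumed independent of quality for this illustration.
papers = [{"quality": random.random(), "positive": random.random() < 0.5}
          for _ in range(20_000)]

def review(paper, criterion, noise=0.1):
    """Accept per `criterion`, with a small chance of a random flip."""
    verdict = criterion(paper)
    return (not verdict) if random.random() < noise else verdict

prefers_positive = lambda p: p["positive"]        # shared biased criterion
judges_quality   = lambda p: p["quality"] > 0.5   # merit-based criterion

def agreement(criterion):
    """Percent agreement between two independent reviewers."""
    return sum(review(p, criterion) == review(p, criterion)
               for p in papers) / len(papers)

def validity(criterion):
    """How often a verdict matches true merit (quality > 0.5)."""
    return sum(review(p, criterion) == (p["quality"] > 0.5)
               for p in papers) / len(papers)

print(f"biased pair agreement: {agreement(prefers_positive):.2f}")  # high
print(f"biased validity:       {validity(prefers_positive):.2f}")   # ~chance
print(f"merit pair agreement:  {agreement(judges_quality):.2f}")
print(f"merit validity:        {validity(judges_quality):.2f}")
```

The biased pair's agreement is inflated by the criterion they share, which is why high inter-reviewer reliability alone does not certify the validity of the verdicts.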
Such subjective considerations, when they affect one reviewer or both, may have a negative influence on both the reliability and the validity of the peer-review process. Somewhat paradoxically, the consistent application of the same biased criterion (say, a preference for positive findings) to a given set of manuscripts would inflate the reliability of the peer-review process while potentially compromising its validity (i.e., falsely assuming that positive results are always more worthy of publication than negative ones).

6.2. Further reasons for the low reliability of peer reviews. As we have seen, the list of subjective criteria detected by the better controlled manuscript-review studies includes the extent of "confirmatory bias," "self-citation" bias, and "prestige of author and affiliation" bias. Although many will argue that better research emanates from more prestigious institutions, the categorical acceptance of such research, coupled with a summary rejection of research produced at less prestigious institutions, will build an inevitable bias into the peer-review process. Although comparable quasi-experimental or experi-
