Braun Tibor, Schubert András (eds.): Peer Review in Scientific Research: Selected Papers from the Literature on the Subject (MTAK Informatics and Science Analysis Series 7, 1993)

DOMENIC V. CICCHETTI: The Reliability of Peer Review for Manuscript and Grant Submissions: A Cross-Disciplinary Investigation

pulling out one ball. If the ball is white, the reviewer says yes; if black, the reviewer says no. This process gives a random-agreement benchmark that can be compared with the actual data. This comparison reveals the following effects:

The apparently greater rejection agreement over acceptance agreement is an illusion. Two outcomes support this conclusion when the data are examined in the metric of percentages: First, the difference between acceptance agreement and rejection agreement is lower in the real data (22%) than it is in the random benchmark (30%). Second, actually reading the proposals produces less of an improvement over benchmark in rejection agreement (11%) than in acceptance agreement (19%). Both effects are opposite to what one would expect from the target article narrative.

The benchmark difference between acceptance and rejection agreements is a simple function of the acceptance rate. This can be seen by examining the generalized random result, as illustrated in Figure 3: The result is obtained by expressing the number of proposals as n, the typical peer reviewer's yes ratings as p (in proportions), and the reviewer's no ratings as q = 1 − p. Then the acceptance agreement is np²/np = p, the rejection agreement is q, and the difference is q − p = 1 − 2p. These expressions show that acceptance agreement, of course, goes to zero as the proportion accepted goes to zero. At the same time, rejection agreement goes to 1.0 and so does the difference. Equality of acceptance agreement and rejection agreement occurs in the random benchmark only for the special case when p = q = 0.5.

Figure 3 (Wasserman). Flow chart representing the generalized random-agreement benchmark; its branches give the acceptance agreement (= p), the disagreement, the rejection agreement (= q), and the difference (= 1 − 2p). Note that agreement here is expressed in proportions, not in percentages. The number of items reviewed is n, the acceptance proportion of the average reviewer is p, and the rejection proportion is q = 1 − p. See text for detailed explanation.

A metric does exist in which one finds what intuition predicts, namely, a perfect equivalence between acceptance agreement and rejection agreement. One obtains this result if one avoids the slippery ground of percentages and proportions and instead simply counts proposals. Doing this shows that, relative to the chance count in Figure 2, peer review in Figure 1 increases the count for both forms of agreement by exactly 10 proposals each.

The general expression for this total increase in agreement count is 2npqφ, where n, p, and q are as defined above, and φ is the fourfold-point correlation between reviewers (McNemar 1955, p. 202). (For the data of Table 6 and Figure 1, φ = .29.) This expression is easily interpreted by reference to Figure 3: There are two ways of reaching a disagreement, and when φ is zero, as it is in both Figures 2 and 3, each way produces a count of npq. If the correlation were perfect (φ = 1.0), however, then no disagreements would exist. In that case, the acceptance agreement count would be given by np² + npq = np, and the rejection agreement count would be given by nq² + nqp = nq. This confirms the intuition about symmetry: In general, peer review increases each form of agreement by npqφ counts. In the particular case of Figure 1, npqφ = 150 × 0.35 × 0.65 × 0.29 ≈ 10.
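The arithmetic above is easy to verify. What follows is a minimal sketch in Python, not part of the original commentary; the function name and output format are illustrative, and the only inputs assumed are the quantities quoted in the text (n = 150, p = 0.35, φ = .29). Setting φ to zero reproduces the random benchmark of Figure 2; setting φ to .29 reproduces the increase of roughly 10 proposals in each form of agreement.

# Expected agreement and disagreement counts for two reviewers who each
# accept a proportion p of n proposals, with fourfold-point correlation phi
# between their yes/no ratings (phi = 0 gives the random benchmark).
def agreement_counts(n, p, phi=0.0):
    q = 1.0 - p
    return {
        "acceptance agreement": n * (p * p + p * q * phi),  # both say yes
        "rejection agreement":  n * (q * q + p * q * phi),  # both say no
        "disagreement":         n * 2 * p * q * (1 - phi),  # split verdicts
    }

n, p, phi = 150, 0.35, 0.29              # quantities quoted in the text
benchmark = agreement_counts(n, p)        # random benchmark (phi = 0)
observed  = agreement_counts(n, p, phi)   # correlated reviewers

for key in benchmark:
    print(f"{key}: {benchmark[key]:.1f} -> {observed[key]:.1f}")
# Each form of agreement rises by n*p*q*phi, about 10 proposals,
# i.e. roughly 20 extra agreements in all out of 150 proposals.

Dividing the acceptance agreement count by np and the rejection agreement count by nq recovers the percentage metric used earlier: roughly 54% and 75%, improvements of about 19 and 10 percentage points over the 35% and 65% benchmark values, close to the figures quoted above.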
The real peer reviewers described in Table 6 and Figure 1 do agree slightly more often than one would expect from the random benchmark in Figure 2. The effect is very slight, however: They only get together on 20 more proposals out of 150. By contrast, it will not escape notice that the random benchmark accounts for most of the variance in reviewer behavior. Hence, in this particular case at least, the benchmark is a fair model of a peer reviewer.

What to do about peer review: is the cure worse than the disease?

Thomas R. Zentall
Department of Psychology, University of Kentucky, Lexington, KY 40506
Electronic mail: zentall@ukcc.bitnet

Peer review is among the most important professional services that scientists provide. It determines what research gets funded, what research gets published, and in what journals. As Cicchetti so carefully documents, it is a system that is seriously flawed because of inherent subjectivity and reviewer bias. The question is, what changes can be made in the system to eliminate the flaws? Any major change in the peer review process is likely to create its own problems, perhaps even more serious ones. Furthermore, given that the review process depends on the voluntary contribution of reviewer time, one needs to weigh the potential benefits that might accrue from change against the costs involved.

The two major issues raised by Cicchetti are (a) the surprising, but well-documented, low reliability of grant and manuscript reviews, and (b) unfairness in the review process due to reviewer bias. Many of Cicchetti's suggestions address reviewer bias, but any variable that brings out a pervasive reviewer bias is likely to increase reliability, though perhaps at the expense of fairness. Thus, the issues of reviewer bias and fairness may be negatively correlated. Increasing fairness in the review process may be a valid goal, but can these biases be removed and, if so, at what cost?

Blind review. Voluntary blinding defeats its purpose because those most likely to benefit from their reputation would be least likely to blind, and, as Cicchetti notes, mandatory blinding may be impossible to enforce. On the other hand, is it really unfair to include knowledge of the author's reputation in one's judgment of suitability for publication? Just as statistical tests address the question of the reliability of findings, so too the reputation of the author may provide indirect, supplementary information about the reliability of the findings (though clearly, the latter should be given less weight). Cicchetti also notes a related bias due to self-citations of "in press" research. Shouldn't the fact that related findings have gone through a (typically stringent) review process argue for their increased reliability? Ideally, research findings should be able to stand on their own, but in reality experimental results are usually evaluated in the context of prior research, and in-press self-citations are a part of that literature. Because of the inaccessibility of these papers, however, it is reasonable, albeit cumbersome, for editors to request that preprints of such citations be included with the manuscript to be reviewed.
