Braun Tibor, Schubert András (eds.): Szakértői bírálat (peer review) a tudományos kutatásban: Válogatott tanulmányok a téma szakirodalmából [Peer review in scientific research: Selected papers from the literature of the field] (MTAK Informatics and Science Analysis Series 7, 1993)

DOMENIC V. CICCHETTI: The Reliability of Peer Review for Manuscript and Grant Submissions: A Cross-Disciplinary Investigation

... submission. Reviewers are not selected to reproduce each other's results, but to supplement and complement each other. This is particularly true for submissions that are interdisciplinary. For example, a study reporting the results of a randomized clinical trial investigating the efficacy of an educational intervention for low-birth-weight, premature infants for enhancing cognitive development in the first three years of life might well require review by a biostatistician, a neonatologist, a pediatrician, an educator, and a psychologist to fully and fairly assess its quality. By soliciting reviews from such varied professional fields, editors are acting in a way that might minimize reproducibility of the results. Are they wrong? I think not.

A simple-minded illustration: Suppose $x_i$ represents the true scientific quality of submission $i$ (sampled from those sent to a particular journal or agency), and $X_{ij}$ the assessment of reviewer $j$ (sampled from competent reviewers) of submission $i$, where

$X_{ij} = x_i + e_{ij}$,  (1)

with $e_{ij}$ representing the error of reviewer $j$'s evaluation of submission $i$, that portion of the reviewer's assessment that is independent of the quality of the submission (which includes bias and other such errors, not all of which are random). The validity of a reviewer's assessments of the scientific quality may be assessed by

$\mathrm{Correlation}(X_{ij}, x_i) = r^{1/2}$,  (2)

where

$r = \mathrm{Variance}(x_i)\,/\,[\mathrm{Variance}(x_i) + \mathrm{Variance}(e_{ij})]$,  (3)

and the reliability between reviewers by

$\mathrm{Correlation}(X_{ij}, X_{ik}) = r + (1 - r)t, \quad j \neq k$,  (4)

where $t$ is the correlation coefficient between reviewers' errors made for a submission. If the errors are completely independent, then the reliability and the validity are closely related ($r$ and $r^{1/2}$). At the other extreme, if the errors are perfectly correlated ($t = 1$), the reliability may be perfect, but the validity may well be near zero.

Now suppose we were to select a panel of $R$ qualified reviewers and to use their mean as the assessment of quality. Then the validity of this mean as a measure of true quality would be

$\mathrm{Correlation}(\bar{X}_i, x_i) = [Rr\,/\,\{Rr + (1 - r)(1 + (R - 1)t)\}]^{1/2}$.  (5)

One can see (Equation 5) that if $t = 1$, the reliability may be perfect (Equation 4), but there is no improvement in validity gained by soliciting more than one reviewer, and that validity may be near zero. The lower the correlation of errors ($t$), the lower the reliability may be (Equation 4), but if $r$ is nonzero ($r > 0$), the greater the increase in validity may be with each additional reviewer (Equation 5). Maximal validity is obtained when the errors are independent and one has as many reasonably reliable reviewers as possible (cf. sects. 7.1, 7.2).

Consequently, if editors do indeed deliberately select multiple reviewers to cover the various professional fields relevant to a submission, they are thereby probably minimizing the correlation of errors, thus maximizing the validity of the overall assessment, but thereby possibly decreasing the interreviewer reliability as well. With that in mind, do we really want more reliable reviewers? Perhaps those reliabilities reported in the .2-.4 range are of no concern. Not so, for the goal is to improve the validity of the review process. To the extent that this goal can be achieved by improving the reliability of individual reviewers, yes, we do want more reliable reviewers.
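A short numerical check may make the trade-off concrete. The following Python sketch is my own illustration, not part of the commentary; the parameter values (r, t, R) and the simulation approach are assumed for demonstration only. It simulates reviewer scores under the model of Equation 1 with a chosen error correlation t, then compares the simulated interreviewer reliability and panel validity with Equations 4 and 5.

```python
import numpy as np

rng = np.random.default_rng(0)

n_sub = 200_000   # number of simulated submissions (illustrative)
R = 3             # reviewers per submission
r = 0.5           # share of assessment variance due to true quality
t = 0.4           # correlation between reviewers' errors

var_x, var_e = r, 1.0 - r          # scale so that Variance(X_ij) = 1

# True quality x_i and errors e_ij with pairwise correlation t
x = rng.normal(0.0, np.sqrt(var_x), n_sub)
cov_e = var_e * (np.full((R, R), t) + (1 - t) * np.eye(R))
e = rng.multivariate_normal(np.zeros(R), cov_e, n_sub)
X = x[:, None] + e                 # Equation 1: X_ij = x_i + e_ij

# Reliability of two reviewers (Equation 4) and validity of the panel mean (Equation 5)
rel_sim = np.corrcoef(X[:, 0], X[:, 1])[0, 1]
val_sim = np.corrcoef(X.mean(axis=1), x)[0, 1]
rel_th = r + (1 - r) * t
val_th = np.sqrt(R * r / (R * r + (1 - r) * (1 + (R - 1) * t)))

print(f"reliability: simulated {rel_sim:.3f}, Equation 4 {rel_th:.3f}")
print(f"panel validity: simulated {val_sim:.3f}, Equation 5 {val_th:.3f}")
```

With these assumed values (r = 0.5, t = 0.4, R = 3), Equation 5 gives a panel validity of roughly .79; rerunning with t = 0 raises it to roughly .87, consistent with the point that less correlated errors make each additional reviewer more worthwhile.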
The strategies discussed in section 7 should, however, be assessed and amplified with specific strategic goals in mind:

(1) To increase reliability by increasing the sensitivity of reviewers to the differential quality of submissions (increase Variance($x_i$)). For example, both reviewers who commend everything (sect. 7.6) and those who condemn everything should be removed from the review process. Reviewers with "blind spots" should excuse themselves from reviews in that area. My own "blind spots" include applications of LISREL models; I have yet to see one whose scientific quality I have not questioned. Respected colleagues may have no problems in this area but might have trouble with meta-analyses or quasi-experimental or observational studies (sect. 3.1), areas in which I believe I am able to distinguish "good" from "bad."

(2) To increase reliability by decreasing reliance on factors irrelevant to the quality of the submissions (decrease Variance($e_{ij}$)). Double-blinding, despite its weaknesses, remains the prime strategy here. Thus I agree with section 7.3 and disagree with section 7.4. Any reviewers can choose to reveal their identities to the author at any time; it need not be made a formal part of the review process. Use of multiple reviewers (sect. 7.2) also serves this purpose. Finally, in journal review, one might add strategies already common in grant review: no editor or reviewer from the same institution as the submitters should participate in the review process, and reviewers or editors should excuse themselves from the review of submissions of close personal friends or frequent professional collaborators, or from any other situation in which there might be an appearance of a conflict of interest.

(3) To increase validity but to decrease apparent reliability, by decreasing the correlation of errors (decrease $t$). No two reviewers from the same institution, or who are close collaborators, should review the same submission. Effort should be made to select reviewers across the broadest possible spectrum of specialties pertinent to the submission.

Ultimately, however, strategies to improve the review process focused on individual reviewers are not, I think, likely to optimize it. Again, let me propose a simple-minded illustration: Classify submissions as either Flawed or Nonflawed, and characterize the review process as in Table 1. There are two possible errors: that of accepting a flawed submission (impaired sensitivity to flaws) and that of rejecting a nonflawed submission (impaired specificity to flaws). I would argue that the Type I error, that is, the more serious error, is that of accepting a flawed paper, for such papers can mislead an entire field and may delay or derail progress. If the flaw is later detected and revealed, such a paper is an embarrassment to the authors, as well as to those who recommended acceptance. For a flawed grant proposal, time and money are wasted that might have been better invested elsewhere. On the other hand, rejecting a nonflawed submission (Type II error, I propose) frequently means only a delay, an annoyance to the submitter. In the long run, many such papers are published elsewhere (sect. 8); many such proposals are resubmitted and funded later.

The kappa coefficient used here (the so-called "unweighted" form) places equal weight on the Type I and II errors, whereas, on the basis of the argument above, I would prefer a form that

Table 1 (Kraemer). A model for the evaluation of the probabilities describing the review process.
                          Decision of Review Process
Submissions          Accepted          Rejected          Total
Flawed               P(1-SE)           P(SE)             P
Nonflawed            (1-P)(SP)         (1-P)(1-SP)       1-P
Total                Q                 1-Q               1

P is determined by the submissions to the journal or agency. Q is determined by the resources of the journal (space) or agency (funding). SE represents the sensitivity of the review process to flaws in submissions; SP, its specificity.
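To make the kappa argument concrete, here is a small Python sketch of my own (not from the commentary); the function names, the probability values P = .30, SE = .70, SP = .80, and the 3:1 penalty are all assumptions for illustration. It builds the cell probabilities of Table 1 and, purely as an example, applies Cohen's kappa in its disagreement-weight form to the decision-versus-truth table, first with equal weights (the "unweighted" kappa) and then with a heavier weight on accepting a flawed submission.

```python
import numpy as np

def review_table(P, SE, SP):
    """Cell probabilities of Table 1: rows = (Flawed, Nonflawed), cols = (Accepted, Rejected)."""
    return np.array([
        [P * (1 - SE),  P * SE],              # flawed submissions
        [(1 - P) * SP,  (1 - P) * (1 - SP)],  # nonflawed submissions
    ])

def weighted_kappa(p, w):
    """Cohen's kappa, disagreement-weight form: w[i, j] is the cost of cell (i, j), zero where decision matches truth."""
    p_row = p.sum(axis=1, keepdims=True)   # P, 1-P
    p_col = p.sum(axis=0, keepdims=True)   # Q, 1-Q
    expected = p_row @ p_col               # chance cell probabilities
    return 1.0 - (w * p).sum() / (w * expected).sum()

# Illustrative values only: 30% of submissions flawed, flaws detected
# with sensitivity .70 and specificity .80.
p = review_table(P=0.30, SE=0.70, SP=0.80)

# Error cells are (Flawed, Accepted) and (Nonflawed, Rejected).
w_equal = np.array([[1.0, 0.0],
                    [0.0, 1.0]])           # unweighted kappa: both errors cost 1
w_asym  = np.array([[3.0, 0.0],
                    [0.0, 1.0]])           # Type I error (accepting a flawed paper) costs 3x

print("unweighted kappa:", round(weighted_kappa(p, w_equal), 3))
print("asymmetric kappa:", round(weighted_kappa(p, w_asym), 3))
```

Changing the weights shifts the resulting coefficient, which is the commentary's point: a weighted form can be chosen so that the error judged more serious, accepting a flawed submission, counts more heavily against the review process.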
