Braun Tibor, Schubert András (eds.): Peer Review in Scientific Research: Selected Studies from the Literature of the Field (MTAK Informatics and Science Analysis Series 7, 1993)
DOMENIC V. CICCHETTI: The Reliability of Peer Review for Manuscript and Grant Submissions: A Cross-Disciplinary Investigation
often without further revisions, and usually in journals as prestigious as the rejecting journals to which they were originally submitted. The situation seems to be very different in the major astrophysics and astronomy journals: only about one-third of the rejected manuscripts are subsequently published in other journals (Abt 1988). As noted recently by Hargens (1990), this phenomenon is consistent with the notion that, unlike social and medical scientists, astrophysicists and astronomers are more likely to conclude that their rejected work does not merit being published elsewhere. This may in turn reflect greater agreement on evaluation standards in well-defined areas of the physical sciences than in either less well defined areas of the same discipline (e.g., general physics, cross-disciplinary physics) or the more general areas of medicine or behavioral science that have been investigated to date.

In considering the implications of such findings for peer review in general, one should probably be less concerned with the high rejection rates of general journals in social and behavioral science and medicine than with the overall increased rejection rates for grants submitted in all three areas. First, despite previous arguments to the contrary (e.g., Cole 1978; 1983), rejection rates for manuscripts, both between journals and within the same journal, have remained remarkably constant over time. Hargens (1988; 1990) reports that the rejection rates for the late 1960s and the early 1980s for 30 leading U.S. journals in a wide range of disciplines were very highly correlated (Pearson r = .94). Moreover, these rejection rates were not significantly associated with either changes in journal submission rates between the two time periods or with whether journals levied page charges. Our analysis of the rejection rates in the 10 subfields covered by Physical Review Letters shows that between 1981 and 1986, all possible rank-order correlations of rejection rates (i.e., comparing each year's rankings with those of each remaining year) vary between .91 and .99. Considering both the relative ease with which authors in many areas of social and medical science succeed in publishing their previously rejected articles in other prestigious journals and the extent to which authors in the physical sciences choose not to do so (tending to regard their rejections as decisive), the focus of concern should be on the problems associated with the rather arbitrary rejection of grant submissions, a phenomenon that cuts across various disciplines (e.g., physics, chemistry, economics) and may prevent or seriously delay the implementation of worthy research endeavors.¹,²

ACKNOWLEDGMENTS

The authors gratefully acknowledge the extensive computer programming and data-analytic contributions of Robert Heavens, James Owen, and Lorraine Gambino. This research was supported by a Veterans Administration Merit Review Grant, MRIS 1416 (Dr. Cicchetti).

NOTES

*Senior Research Psychologist, Biostatistician, and Senior Research Scientist, West Haven VAMC and Yale University, 350 Campbell Ave., West Haven, CT 06516.

1. For a more detailed description of normative attributes and specific criteria for guiding referees and editors in the review of scientific manuscripts, see Bowen et al. (1972); Chase (1970); Cicchetti & Conn (1976); Cicchetti & Eron (1979); Gottfredson (1978); Greenwald (1976); Maher (1978); Scott (1974); Whitehurst (1983); and Wolff (1973).
For corresponding information pertaining to grant reviews, see Allen (1960); Cole & Cole (1981; 1985); Cole et al. (1978); Mitroff & Chubin (1979); Noble (1974); and Wiener et al. (1977).

2. The general formula for the kappa or weighted kappa statistic is:

    Kappa = (PO - PC)/(1 - PC), in which:

PO refers to the proportion of observed (or actual) rater (reviewer) agreement; PC refers to the proportion of agreement expected on the basis of chance alone; and 1 - PC refers to the maximum possible difference between observed and chance agreement. The level of statistical significance of kappa is determined by dividing kappa by its standard error (s.e.) and referring the resulting Z value to a table of areas under the normal curve to determine the p value of kappa (e.g., a Z of kappa of 1.96 is statistically significant at the .05 level); a computational sketch is given in the first example following these notes. The validity of this procedure was empirically demonstrated by Cicchetti (1981) and Cicchetti & Fleiss (1977). For weighting systems to be used with the kappa statistic, see Cicchetti (1976); Cicchetti et al. (1977); Cicchetti & Heavens (1979); Cicchetti (1978); Cicchetti & Sparrow (1981); and Heavens & Cicchetti (1978).

3. The formula for R_I (Model II), used when the same set of raters (reviewers) evaluates each subject, derives from a two-way repeated-measures ANOVA and can be defined as (see the second sketch following these notes):

    R_I (Model II) = (MSS - MSE)/[MSS + (R - 1)MSE + R(MSR - MSE)/N], in which:

MSS = mean square between subjects; MSE = mean square error (or residual); MSR = mean square between raters (or reviewers); R = number of raters (reviewers); N = number of subjects (abstracts, manuscripts, grants).

4. The formula for the intraclass correlation coefficient (R_I), Model I, used when different sets of raters or reviewers evaluate each subject (e.g., abstract, manuscript, grant proposal), derives from a one-way repeated-measures (e.g., across reviewers) analysis of variance (ANOVA) model and can be defined as:

    R_I (Model I) = (MSS - MSE*)/[MSS + (R - 1)MSE*], in which:

MSS = mean square between subjects; MSE* = mean square error; and R = the number of ratings (e.g., reviews) per subject (e.g., abstract, manuscript, grant proposal). The level of statistical significance of a given R_I (Model I) value is determined by referring the quantity MSS (with its number of degrees of freedom [df]) divided by MSE* (with its df) to a standard ANOVA table.

*Note. For the R_I (Model I) case, MSE* pools the variance associated with raters with the variance associated with residual.

5. The formula for R_I for determining the reliability of dimensionally scaled data when the numbers and specific sets of examiners may vary at each assessment (e.g., in the usual peer-review process for evaluating grants) also derives from a one-way repeated-measures ANOVA model and can be expressed as (Notes 4 and 5 are illustrated together in the third sketch below):

    R_I (Model I, varying raters) = (MSS - mMSE)/[MSS + m(R₀ - 1)MSE], in which:

MSS and MSE are defined as in R_I (Model I); R₀ = the average number of raters per subject; m = N(R₀ - 1)/[N(R₀ - 1) - 2]; and N = the number of subjects (e.g., NSF grants). The level of statistical significance of R_I (Model I, varying raters) is determined by application of the formula F = MSS/MSE, which, with (N - 1) and (M - 1) degrees of freedom, is referred to the appropriate F table.
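To make the computation in Note 2 concrete, here is a minimal Python sketch, assuming a square contingency table of two reviewers' category assignments. The function name cohen_kappa, the 2 x 2 accept/reject example, and the simple large-sample null standard error are illustrative assumptions rather than the article's own procedure; Cicchetti & Fleiss (1977) examine more refined variance estimates.

    import math

    def cohen_kappa(table):
        """Unweighted kappa for a square table of two reviewers' category
        assignments (rows: reviewer 1, columns: reviewer 2)."""
        n = float(sum(sum(row) for row in table))
        k = len(table)
        # PO: observed proportion of agreement (the diagonal cells).
        po = sum(table[i][i] for i in range(k)) / n
        # PC: chance-expected agreement, from the two reviewers' marginals.
        row_marg = [sum(row) / n for row in table]
        col_marg = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
        pc = sum(r * c for r, c in zip(row_marg, col_marg))
        kappa = (po - pc) / (1 - pc)
        # Simple large-sample s.e. of kappa under the null hypothesis kappa = 0
        # (an assumption of this sketch; Note 2's Z test divides kappa by its s.e.).
        se0 = math.sqrt(pc / (n * (1 - pc)))
        z = kappa / se0
        p = math.erfc(abs(z) / math.sqrt(2))  # two-tailed p value
        return kappa, z, p

    # Hypothetical example: two reviewers classify 200 manuscripts.
    #                reviewer 2: accept, reject
    table = [[30, 20],    # reviewer 1: accept
             [25, 125]]   # reviewer 1: reject
    kappa, z, p = cohen_kappa(table)  # kappa is about 0.42; z is about 4.7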
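The Model II coefficient of Note 3 can likewise be computed directly from the two-way ANOVA mean squares. The sketch below illustrates the formula and is not the article's own software; the helper name icc_same_raters and the NumPy layout (an N x R array with rows as subjects and columns as raters) are assumptions.

    import numpy as np

    def icc_same_raters(x):
        """R_I (Model II) of Note 3: the same R raters rate all N subjects.
        x is an (N, R) array; rows are subjects, columns are raters."""
        n, r = x.shape
        grand = x.mean()
        ss_subj = r * ((x.mean(axis=1) - grand) ** 2).sum()   # between subjects
        ss_rater = n * ((x.mean(axis=0) - grand) ** 2).sum()  # between raters
        ss_err = ((x - grand) ** 2).sum() - ss_subj - ss_rater
        mss = ss_subj / (n - 1)                # MSS
        msr = ss_rater / (r - 1)               # MSR
        mse = ss_err / ((n - 1) * (r - 1))     # MSE (residual)
        return (mss - mse) / (mss + (r - 1) * mse + r * (msr - mse) / n)

    # Hypothetical example: 4 reviewers rate 5 grant proposals on a 1-5 scale.
    ratings = np.array([[4, 4, 5, 4],
                        [2, 3, 2, 2],
                        [5, 5, 4, 5],
                        [3, 2, 3, 3],
                        [1, 2, 1, 2]])
    print(icc_same_raters(ratings))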
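Notes 4 and 5 can be handled together, since both derive from the same one-way between-subjects/within-subjects decomposition. In this sketch the helper name icc_one_way and the ragged-list input are assumptions, R₀ is taken as the simple mean number of raters per subject as Note 5 defines it, and the correction factor m follows the reconstruction given there.

    import numpy as np

    def icc_one_way(ratings):
        """Notes 4 and 5: one-way (Model I) intraclass correlation, where a
        different set of raters may evaluate each subject.  `ratings` is a
        list of per-subject rating lists whose lengths may differ."""
        n = len(ratings)
        groups = [np.asarray(g, dtype=float) for g in ratings]
        grand = np.concatenate(groups).mean()
        sizes = np.array([g.size for g in groups], dtype=float)
        means = np.array([g.mean() for g in groups])
        # MSS: mean square between subjects (one-way ANOVA, unequal group sizes).
        mss = (sizes * (means - grand) ** 2).sum() / (n - 1)
        # MSE*: within-subject mean square, pooling rater and residual variance.
        mse = sum(((g - mu) ** 2).sum() for g, mu in zip(groups, means)) / (sizes.sum() - n)
        r0 = sizes.mean()  # average number of raters per subject
        if np.all(sizes == sizes[0]):
            icc = (mss - mse) / (mss + (r0 - 1) * mse)      # Note 4 (equal sets)
        else:
            m = n * (r0 - 1) / (n * (r0 - 1) - 2)           # Note 5 correction
            icc = (mss - m * mse) / (mss + m * (r0 - 1) * mse)
        f = mss / mse  # refer F to a standard table for significance
        return icc, f

    # Hypothetical example: three grant proposals scored by 3, 4, and 3 referees.
    scores = [[3.0, 4.0, 3.5], [2.0, 2.5, 3.0, 2.0], [4.5, 5.0, 4.0]]
    icc, f = icc_one_way(scores)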