Braun Tibor, Schubert András (eds.): Expert evaluation (peer review) in scientific research: Selected studies from the literature on the topic (MTAK Informatics and Science Analysis Series 7, 1993)

DOMENIC V. CICCHETTI: The Reliability of Peer Review for Manuscript and Grant Submissions: A Cross-Disciplinary Investigation

years back (Cicchetti & Eron 1979) were described in more detail in a subsequent BBS commentary (Cicchetti 1982). We found that although there were high correlations between what reviewers perceived as "important" and "well-designed" studies and their tendency to recommend publication (rs between .62 [research design] and .73 [importance]), the reliability of these ratings was appreciably lower (.19 and .28, respectively, as given in Table 1 of the target article). Although reviewers were asked to use specific rating forms describing "importance," "research design," and other manuscript attributes, it is entirely possible that they first read the manuscript, decided on their recommendation, and then filled out the form to be consistent with that recommendation (e.g., if one thinks the article is worth publishing, then it must be important, well enough designed, and of appropriate reader interest).

Regardless of which specific interpretations may be appropriate, more formal training of reviewers would probably enable them to use the same set of specific evaluative criteria more consistently. The process that reviewers use to arrive at a recommendation would then become standard, reliable, and, if applied appropriately (e.g., using prototypic reviews as standards), valid.

5. Concluding comments

A number of commentators have suggested further investigations to place the area of peer review on an ever more solid scientific foundation. Given the interdisciplinary nature of science, my strongest appeal is that the cross-disciplinary approach taken in this target article be further encouraged in future investigations. I would simply refer the interested reader to the specific commentaries of Bornstein, Cohen, Gorman, Hargens, Lock, Marsh & Ball, Nelson, and Salzinger.

In my opinion, the training of reviewers, as well as of editors, authors, and consumers of research, is pivotal in increasing both the reliability and the validity of peer review. I have recently come across the first study of which I am aware that broaches this topic directly. Oxman et al. (in press) were able to train successfully three classes of referees ("experts in research methodology," "MDs with research training," and "research assistants" - three in each group) to assess the overall scientific quality and other evaluative attributes of 36 review articles published in a wide range of journals in medicine (e.g., New England Journal of Medicine), psychiatry (e.g., American Journal of Psychiatry), and psychology (e.g., Psychological Bulletin).

Following specific training (or practice) on review articles and an additional one-hour training session, the 36 articles were evaluated independently by the nine reviewers. For level of "overall scientific quality," the intraclass R_I across the nine examiners was .71; the R_I values for each of the three groups of reviewers separately were as follows: "experts in research methodology," R_I = .77 (EXCELLENT); "MDs with research training," R_I = .74 (GOOD); and "research assistants," R_I = .62 (GOOD).
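As a rough illustration of how such an intraclass R_I is obtained, the sketch below computes a one-way random-effects intraclass correlation for a set of articles each rated by the same pool of reviewers, and maps the result onto the qualitative benchmarks (POOR, FAIR, GOOD, EXCELLENT) used throughout this section. The toy ratings, the function names, and the choice of the single-rater one-way model are illustrative assumptions on my part; the exact reliability model used by Oxman et al. may differ.

    import numpy as np

    def benchmark(r):
        # Rule-of-thumb labels for reliability coefficients, matching their
        # usage in the text: below .40 POOR, .40-.59 FAIR, .60-.74 GOOD,
        # .75 and above EXCELLENT.
        if r < 0.40:
            return "POOR"
        if r < 0.60:
            return "FAIR"
        if r < 0.75:
            return "GOOD"
        return "EXCELLENT"

    def icc_oneway(ratings):
        # One-way random-effects intraclass correlation, ICC(1,1).
        # `ratings` is an (n articles x k raters) array: one row per
        # reviewed article, one column per reviewer.
        ratings = np.asarray(ratings, dtype=float)
        n, k = ratings.shape
        grand_mean = ratings.mean()
        row_means = ratings.mean(axis=1)
        # Between-article and within-article mean squares (one-way ANOVA).
        ms_between = k * np.sum((row_means - grand_mean) ** 2) / (n - 1)
        ms_within = np.sum((ratings - row_means[:, None]) ** 2) / (n * (k - 1))
        # Reliability of a single rater's scores.
        return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

    # Invented data: six articles rated by three reviewers on a 7-point scale.
    ratings = [[6, 5, 6],
               [2, 3, 2],
               [5, 5, 4],
               [1, 2, 2],
               [7, 6, 6],
               [3, 3, 4]]
    r_i = icc_oneway(ratings)
    print(f"R_I = {r_i:.2f} ({benchmark(r_i)})")

The coefficient above is the reliability of a single typical reviewer; the reliability of the average rating of all k reviewers, (MS_between - MS_within) / MS_between, is always higher, and which of the two a reported R_I refers to depends on the model the investigators adopted.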
Nine additional evaluative attributes were measured on 7-point ordinal scales, with four anchorage points provided for the scoring of each attribute (e.g., see Cicchetti et al. 1987). The nine attributes and their average R_I values across the nine judges concerned the extent to which: (1) search methods were reported (R_I > .8, or EXCELLENT); (2) a comprehensive search of the literature was conducted (R_I > .6, or GOOD); (3) inclusion criteria were reported (R_I > .8, or EXCELLENT); (4) selection biases were avoided (R_I > .6, or GOOD); (5) validity criteria were reported (R_I > .6, or GOOD); (6) validity data were reported (R_I > .6, or GOOD); (7) findings were combined appropriately (R_I = .5, or FAIR); (8) methods for combining the data were reported (R_I > .6, or GOOD); and (9) conclusions were supported by the data (R_I = .40, or FAIR).

Although there were somewhat lower levels of agreement among the research assistants than within the other two groups of reviewers, for eight of the 10 evaluations (items 1-6, item 8, and the overall evaluation of scientific quality) the differences in R_I values were small. The lower average R_I values for the remaining two attributes, however, were due to the very low R_I levels achieved by the "research assistants" relative to the other two groups of evaluators. For rating the extent to which "the findings were combined appropriately," the R_I for "experts" was .6 (GOOD) and for "MDs with research training" it was > .9 (EXCELLENT); the corresponding R_I for "research assistants," however, was in the very POOR range, at just over .2. Similarly, for the extent to which "conclusions were supported by the data," the R_I values for both "experts" and "MDs with research training" were beyond .6 (GOOD), whereas the corresponding R_I for "research assistants" was again in the very POOR range, barely beyond the .1 level.

These results are, to my knowledge, the first to demonstrate that reviewers of different levels of experience can be taught to evaluate the same scientific documents reliably. It is hoped that additional investigations of this kind will be undertaken across a broad range of research topics, both within and across disciplines. Following the lead of commentator Lock, I would also hope that the important issue of training peer reviewers will be discussed at the 1992 Second World Conference on Peer Review.

Finally, in the open forum of "creative disagreement," I would extend a special invitation to the two commentators (Bailar and Kiesler) who were the most dubious about the need to study further the reliability and validity of the peer review process. I hope the panorama of ideas expressed by commentators across disciplines will convince them of the need to turn some of their own anecdotal experiences into further valuable research in this area. As editors of prestigious journals in behavioral science and medicine, their future insights and empirical investigations can make major contributions to the further understanding of the vicissitudes of peer review.

NOTE

1. The author is also affiliated with Yale University.
