Braun Tibor, Schubert András (eds.): Szakértői bírálat (peer review) a tudományos kutatásban: Válogatott tanulmányok a téma szakirodalmából [Peer Review in Scientific Research: Selected Papers from the Literature] (MTAK Informatics and Science Analysis Series 7, 1993)

DOMENIC V. CICCHETTI: The Reliability of Peer Review for Manuscript and Grant Submissions: A Cross-Disciplinary Investigation

rejected interdisciplinarians resubmit and are accepted elsewhere, while the specialists tend not to. On the surface, it looks as though the interdisciplinary way of doing things ensures that innovative scholarship is not lost, as opposed to the way of specialties, which are more likely to stifle any innovative impulse at the very start.

That is not the end of the story, however. There is one more market to consider, namely, knowledge consumers (i.e., reading scientists) who must choose from among the variety of knowledge products the ones that are best suited to their cognitive needs. The interdisciplinary markets are flooded with more articles in more obscure journals than the specialty markets are. Since the cognitive limitations of the consumer remain fixed as the number of markets and products grows, physical access is becoming an increasingly important determinant of which research turns out to be influential. Is it published in a journal that I routinely peruse? Is the journal copy readily available in the library? Does the article appear indexed in many databanks? The answers to these questions depend on issues quite incidental to the intellectual merits of a given article: Can I afford the journal, and does it publish other articles I normally find interesting? Is the current periodicals section properly policed and updated? Does the title of the article contain words that make the right associations with other words in the databank? All the best laid plans to reform peer review will have been for naught if the high quality journal that publishes the high quality scholarship turns out to be low on physical access.

My point, then, is that interdisciplinary research may give an illusory sense of preserving good scholarship simply because of its more liberal publication policies. This illusion is fostered by focusing on the editorial office as the only clearinghouse for knowledge products. It is not that interdisciplinarians do bad work, but rather that their work is so diffusely placed that access to such work, and hence its ultimate impact, is limited. Given the inaccessibility of some journals, the work might as well never have made it into print. This suggests some policy implications:

1. Editors should forge closer links to the library and information systems that will determine the access that potential consumers have to journals and books.

2. The goals of peer review should be oriented more to the interests of a given journal's readership. At the moment, when there is a conflict of aims, peer review aims more at publishing papers that exemplify the methodological standards of the journal's discipline than papers that are likely to be taken up by the readership in their own research.

3. Regardless of whether one thinks that more scientists make for better science, the growing number of paper submissions may, at some point in the future, have to be checked by requiring that authors restrict the number of papers they publish over a given period. (In fact, Donald Campbell has suggested on occasion that the use authors make of such self-restraint could be weighed in tenure and promotion decisions.)

4. Tighter control of the knowledge markets, including increased self-selection of paper submissions, could encourage specialization and rigidify disciplinary boundaries.
Much depends here on whether journal editorial policies are dictated more by the character of the field of study, or whether the field of study comes to have its character in virtue of the journal editorial policies. This is one of many intriguing issues that Cicchetti leaves hanging.

On forecasting validity and finessing reliability

J. Barnard Gilmore
Department of Psychology, University of Toronto, Toronto, Ontario, Canada M5S 1A1
Electronic mail: gilmore@psych.utoronto.ca

Reliability is of such great concern in judgments of scientific worth because validity is. Where there is no reliability, there can be no validity. Even where reliability is found, there could be, and often there appears to be, distressingly low validity. The issue facing both peer reviewers and those who engage them is whether or not validity has been achieved with a given set of ratings. These truths are familiar. They have been brought home to us many times in the past, as in the thoughtful work by Gottfredson (1978) and Mahoney (1985). Moreover, these are brutal truths: brutal because achieving validity appears to be beyond all reasonable hope insofar as predicting the eventual importance of scientific work would require one to predict an unpredictable future.

Let us have no illusions. The cherished arguments for generously supporting pure research, the arguments concerning the unpredictable sources of new scientific understanding, are the same arguments for doubting that we can forecast in advance which work needs financing and which work needs publication. Still, yes, choose we must. But nothing requires us to assert that we will have chosen well. And nothing forces the conclusion that it would be the least bit foolish to make many of our choices by drawing lots.

Consequently, the concern with improving the reliability of potentially invalid ratings made by multiple reviewers, and with improving the measures of whatever reliability we do have, must not be overemphasized. Cicchetti asserts, for example, that some statistics are appropriate for measuring reliability and some are not. Instead, one might assert that the appropriateness or the lack thereof is more often to be found in the meaning ascribed to the statistics rather than in the choice of the statistics themselves. A careful reading of Garner & McGill (1956) makes it clear that some important differences in meaning and interpretation are appropriate to measures that are variance-based, such as kappa, versus measures derived from information theory, which reflect the proportional shared uncertainty (measured in bits) among raters. The "reliability" captured by an uncertainty statistic is the proportion of our total uncertainty about the judgment of another rater that is reduced by the information contained in knowledge of an earlier rating. I would submit that this shared uncertainty index is closer to what we always intended to mean by "agreement" than is the percentage-of-total-variance index implicit in most kappas.

There are sound metric reasons, too, for sometimes preferring the uncertainty statistic to kappa. Garner and McGill remind us that variance-based statistics require an interval scale substrate to justify many of their interpretations, whereas uncertainty measures are always metric free, generalizable, and mutually comparable. The significant and sad fact is that most reliabilities measured with the shared uncertainty statistic turn out to be "lower" in relative size than those reported using kappas. (For a clear example of this, see Gilmore 1979.)
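To make the contrast concrete, the following minimal sketch computes both kinds of index for two raters' categorical verdicts: Cohen's kappa as the chance-corrected, variance-style agreement statistic, and a shared-uncertainty index obtained by normalizing the raters' mutual information (in bits) by one rater's entropy. It is an illustration only, not taken from Garner & McGill (1956) or Gilmore (1979); the toy ratings, category labels, and the particular choice of normalization are assumptions made for concreteness.

```python
# Illustrative sketch: a variance-style agreement index (Cohen's kappa)
# versus a shared-uncertainty index (normalized mutual information, in bits)
# for two raters judging the same set of manuscripts.
from collections import Counter
from math import log2

def cohen_kappa(r1, r2):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(r1)
    categories = set(r1) | set(r2)
    observed = sum(a == b for a, b in zip(r1, r2)) / n
    m1, m2 = Counter(r1), Counter(r2)
    expected = sum((m1[c] / n) * (m2[c] / n) for c in categories)
    return (observed - expected) / (1 - expected)

def shared_uncertainty(r1, r2):
    """Mutual information between the raters, normalized by the entropy of
    rater 2: the proportion of uncertainty about one rating that is removed
    by knowing the other (one possible reading of Gilmore's index)."""
    n = len(r1)
    joint = Counter(zip(r1, r2))
    m1, m2 = Counter(r1), Counter(r2)
    h2 = -sum((c / n) * log2(c / n) for c in m2.values())
    mi = sum((c / n) * log2((c / n) / ((m1[a] / n) * (m2[b] / n)))
             for (a, b), c in joint.items())
    return mi / h2 if h2 > 0 else 0.0

# Hypothetical verdicts (accept / revise / reject) for eight manuscripts.
rater_a = ["accept", "revise", "reject", "revise", "accept", "reject", "revise", "accept"]
rater_b = ["accept", "revise", "revise", "revise", "reject", "reject", "revise", "accept"]

print("kappa:", round(cohen_kappa(rater_a, rater_b), 3))
print("shared uncertainty:", round(shared_uncertainty(rater_a, rater_b), 3))
```

On data such as these, the two figures need not coincide, and the uncertainty-based index will often come out somewhat lower than kappa, which is the pattern noted above.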
Thus, the data presented in the target article may well deserve an interpretation that is even less optimistic than the marginally optimistic interpretations offered there. In the conclusion of the target article, the author notes that one of two assumptions (see Harnad 1986) prevails. One may assume that most published research contributes to the advancement of scientific work, in which case the rejection by journals of what would otherwise have proven to be helpful new data or new perspectives is indeed a serious matter. Conversely, one may assume that most published research does not contribute to the
