Braun Tibor, Schubert András (szerk.): Szakértői bírálat (peer review) a tudományos kutatásban : Válogatott tanulmányok a téma szakirodalmából (A MTAK Informatikai És Tudományelemzési Sorozata 7., 1993)
DOMENIC V. CICCHETTI: The Reliability of Peer Review for Manuscript and Grant Submissions: A Cross-Disciplinary Investigation
When items or ratings by different reviewers are combined, reliability systematically increases. Applying the Spearman-Brown prophecy formula (Gulliksen 1987) to a reliability of .30 for the ratings of 1 reviewer, the estimated reliability is .46 for the combined ratings of 2 reviewers, .56 for the ratings of 3, .75 for the ratings of 7, and so on. The use of three reviewers is increasingly common, and the reliability in this case, though less than ideal, is substantially better than that for a single referee.

2. Disagreement among reviewers is useful. Journal editors often select reviewers deliberately for their dissimilarity. Reviewers may be chosen because they differ in the nature of their expertise - one a substantive specialist, another a methodologist, for example - or in their theoretical viewpoints (e.g., Bakanic et al. 1987). It is not surprising, then, that reviewers disagree when assessing the same apparently carefully defined components, such as "importance" or "design and analysis." And, insofar as the reviewers attend to different aspects of a manuscript's acceptability, the validity of the combined evaluations is improved (e.g., Harnad 1985), just as using predictors that measure different portions of the criterion variance maximizes the multiple correlation of the predictors with the criterion.

3. Editorial decisions should not be based solely on reviewers' ratings. Concern with the agreement among reviewers seems linked in large part to a model of the journal editor as a kind of psychometric clerk who simply adds up the scores that a manuscript gets from each reviewer and then accepts the paper if it achieves a passing score. Numerous anecdotes (e.g., Goodstein 1982), as well as the close associations reported by Cicchetti and others between reviewers' recommendations and editors' decisions, suggest that this model may indeed describe the behavior of some editors. But good editors are not clerks.
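The Spearman-Brown projection described above is simple enough to verify directly. A minimal sketch (the function name and the rounding to two decimals are mine, not from the text), taking the single-reviewer reliability of .30 used in the passage:

```python
def spearman_brown(r: float, n: int) -> float:
    """Spearman-Brown prophecy formula: estimated reliability of the
    combined ratings of n reviewers, given single-rating reliability r."""
    return n * r / (1 + (n - 1) * r)

# Single-reviewer reliability of .30, as in the passage:
for n in (2, 3, 7):
    print(n, round(spearman_brown(0.30, n), 2))
```

Note that the estimate rises steeply at first (adding a second reviewer gains .16) but with diminishing returns thereafter, which is why three reviewers is a reasonable practical compromise.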
They read the manuscript, appraise the reasons reviewers give for their recommendations, and weigh all the information about it (e.g., Goodstein 1982). This kind of active decision making takes time and specialized knowledge, and may be too much of a burden for the sole editor of a journal that receives many submissions. The workload can always be divided up among a set of associate editors, however, each of whom has complete responsibility for processing papers in a particular area, a practice followed by the Personality and Social Psychology Bulletin and some other journals.

NOTE

1. The anomalous and unreplicable intraclass correlation of .54 for American Psychologist manuscripts (Cicchetti 1980, and unpublished; Scarr & Weber 1978) may arise because nine of the 87 papers were resubmissions. The nine were presumably revised in response to the initial reviews, making it likely that these papers would receive favorable ratings, especially if the original referees were used. When these manuscripts are excluded, the correlation drops to .45.

Chairman's action: The importance of executive decisions in peer review

Peter Tyrer
St. Charles Hospital, London W10 6DZ, England

Cicchetti has provided valuable data in support of a maxim I sometimes repeat to disconsolate writers of rejected manuscripts: "A determined author can get any rubbish published." The low levels of reviewer agreement found in this wide-ranging review may be regarded as unsatisfactory by some, but encouraging to potential authors. After noting that worthlessness appears easier to detect than excellence, our author-in-waiting must be reassured by levels of agreement between assessors that barely exceed those of chance when several referees are used. Even taking into account the omnibus quality of the R_i and kappa statistics, which can obviously conceal islands of excellent agreement, the levels of agreement cannot be regarded as good by any scale of values.
Bearing in mind that research careers and the funding of departments depend so much on peer review of scientific papers and grant applications, it is sad that apparently random factors play such a major part in success. The target article also explodes the myth that papers concerned with the "hard" physical sciences are assessed with greater levels of agreement than papers on "soft" social and psychological subjects. The reasons for poor agreement, to paraphrase Shakespeare's Cassius, "lie not in the words but in ourselves, that we are underlings."

The editor of a scientific journal and the chairman of a grant-giving body are faced with much conflicting information before coming to a final judgment. Cicchetti outlines a number of ways of improving the reliability of peer review, but even if levels of agreement are improved, the position of the editor (or chairman) remains a very important one. Disagreement is often resolved by the taking of "chairman's action," whereby an executive decision is made to set aside one or more of the referees' views in coming to a decision or, alternatively, to send the manuscript (or application) to another referee independently. It needs to be appreciated that the editor usually has a completely free hand in choosing referees for any article. The bias of the editor can influence whether an article he would like to see published goes to a reviewer who is likely to provide a favourable report. Alternatively, a paper the editor does not want published can go to a tough and critical referee. The opinion of the editor is particularly important when contentious papers are being reviewed.

The vagaries of editorial and referee judgment are particularly important for a young worker on the threshold of a research career. At this vulnerable stage there is a danger that one or two rejections may mean the abandonment of a research endeavour, when a more hardened worker would be inclined to persist.
To make allowances for the poor levels of agreement, and for the importance of editorial interest and bias, it would be valuable for potential authors to be aware of the particular interests of the journals for which they are writing. For example, one important journal in the United Kingdom not only tries to write the first and last words on any relevant topic, but is prepared to take considerable risks in trying to achieve this. Another journal will bend over backwards to include topical material in its columns, and so the time of submission is all-important. Others, judging from their contents, consistently give proportionately much more space to one or two aspects of a subject even though there is no indication of this in the guidance to contributors. For example, on controversial issues such as the merits and disadvantages of community care in psychiatry, one well-known journal has a bias toward its merits and another toward its disadvantages. Only an informed author knows which to select first. The fact that many papers rejected by one journal are subsequently published in other equally prestigious journals (Wilson 1978) suggests that bias of interest is much more potent than assessment of merit. In view of this, much more attention should be paid to giving guidance to potential authors, particularly young research workers, in preparing their manuscripts and choosing the appropriate journal for their submission (Freeman & Tyrer 1989).

One other implication of the findings, which is also a major criticism of peer review, is the low likelihood of approval for papers and grant submissions concerned with "ground-breaking" research. Cicchetti provides data suggesting that agreement about such submissions is likely to be poor and that the "safe" option of rejection is most likely.
In such circumstances the whole policy of peer review appears stultifying, and one sometimes longs for the good old undemocratic days when the publication of a paper was dependent only on the editor's whim or on the cranky beliefs of millionaire philanthropists. Such