Peer Review
June 5, 2002

Measuring the Quality of Editorial Peer Review

Author Affiliations: Health Reviews Ltd, Rome, Italy (Dr Jefferson); Sideview, Princes Risborough, England (Ms Wager); Annals of Internal Medicine, Philadelphia, Pa (Dr Davidoff).

JAMA. 2002;287(21):2786-2790. doi:10.1001/jama.287.21.2786

Context The quality of a process can only be tested against its agreed objectives. Editorial peer review is widely used, yet there appears to be little agreement about how to measure its effects or processes.

Methods To identify outcome measures used to assess editorial peer review as performed by biomedical journals, we analyzed studies identified from 2 systematic reviews that measured the effects of editorial peer review on the quality of the output (ie, published articles) or of the process itself (eg, reviewers' comments).

Results Ten studies used a variety of instruments to assess the quality of articles that had undergone peer review. Only 1 study, which was nonrandomized, compared the quality of articles published in peer-reviewed and non–peer-reviewed journals. The others measured the effects of variations in the peer-review process or used a before-and-after design to measure the effects of standard peer review on accepted articles. Eighteen studies measured the quality of reviewers' reports under different conditions such as blinding or after training. One study compared the time and cost of different review processes.

Conclusions Until we have properly defined the objectives of peer review, it will remain almost impossible to assess or improve its effectiveness. The research needed to understand the broader effects of peer review poses many methodologic problems and would require the cooperation of many parts of the scientific community.

A fundamental tenet of all scientific and scholarly work is that every aspect of it must be subjected to critical appraisal; only those findings and principles that withstand such appraisal become established. Although much appraisal occurs as work is in progress (and some after it has been published), work that is submitted for publication undergoes critical appraisal, known as peer review, as part of the editorial process.

Editorial peer review is therefore an extension of the basic principles of science and scholarship. It has existed for more than 200 years1 and has achieved near universal application for assessing research reports before publication. Despite its wide acceptance, peer review has been subjected to a variety of criticisms,2 and, indeed, surprisingly little is known about its effects on the quality and utility of published information,3 much less about its beneficial or adverse social, psychological, or financial effects.

The same can be said about critical appraisal in scholarly work generally. However, uncertainty about the effects of peer review is not simply a matter for academic concern. Clinical decisions must be made on the best available evidence, usually systematic reviews and meta-analyses, but these can be misleading if they are based on invalid, incomplete, inaccurate, or duplicate information, or if the review articles themselves are poorly done. Any process affecting the assessment and dissemination of clinical evidence therefore has a direct bearing on patient care.

In this article we review the criteria used by others to measure the effects of peer review, consider what this implies about the aims of peer review, especially in relation to clinical evidence, and suggest ways in which its effects might be measured more rigorously.


In 2 systematic reviews of the effects of editorial review and technical editing, we identified published articles that evaluated the peer-review process and recorded the criteria used in those studies to evaluate peer review. Our first review considered processes that occur between submission of a paper and a decision on publication; the second considered the processes that occur between acceptance and publication. Both systematic reviews were performed using Cochrane methodology. The methods and primary findings of the reviews are published elsewhere.3-6


We included 19 studies in our systematic review of the effects of peer review; these are described separately.3 Two studies were identified from our review on technical editing since they included information about changes that occur to papers between submission and acceptance or did not distinguish the preacceptance and postacceptance processes.5 We identified 8 other studies that measured the quality of papers or reviews but did not compare peer-review processes. The outcome measures used in these studies are shown in online Table A (available in PDF format).7-35 Brief descriptions of the 8 studies not described in companion papers are also shown.

Ten studies measured various aspects of the quality of papers that had undergone peer review. Only 1 study8 compared the quality of articles published in peer-reviewed and non–peer-reviewed journals, but it used a nonrandomized design and the findings may have been confounded by other factors, such as differences in the quality of studies submitted to the different journals. The other studies measured the effects of variations in the peer-review process or used a before-and-after design to measure the effects of standard peer review in a particular journal. A major limitation of most studies is that they assessed the quality only of accepted papers and measured the changes that took place between submission or acceptance and publication. Only the studies of economic submissions12 and statistical quality9 included papers that were rejected by the target journal. Only 2 of the 10 studies13,14 used journal readers to assess the quality of papers; the others were based on editors' assessment. Virtually every study used its own rating instrument. These included between 7 and 36 items rated using 2- to 10-point scales. Most scales appeared to be unvalidated and, in the 1 case in which the scoring system was tested, it was found to have low reliability.10 The 2 studies of readability7,15 used published scales that have not been validated for use in this setting.

Eighteen studies measured the quality of reviewers' reports under different conditions such as blinding or training.17-34 Three of these included an assessment by the authors whose work had been reviewed.27,28,33 The others used editors to judge the quality of reviews. Instruments used to rate review quality ranged from 2- to 10-item scales; most were rated using a 2- to 5-point system, but 1 used a visual analog scale and 1 used ratings from 1 to 100. One of these scales had a published validation.36 Four studies examined the amount of agreement among reviewers,30,31 between reviewers and editors,19 or between reviewers and readers.27 One study compared the time and cost of different review processes.35
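As an illustration only, agreement between 2 reviewers who rate the same manuscripts can be summarized with a chance-corrected statistic such as Cohen's kappa, defined as (observed agreement − chance agreement) / (1 − chance agreement). The sketch below is not taken from any of the studies reviewed here, which did not necessarily use this statistic; the function name `cohen_kappa` and the reviewer recommendations are hypothetical.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters who each rated the same items."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two equal-length, non-empty rating lists")
    n = len(ratings_a)
    # Observed agreement: share of items given the same category by both raters.
    p_observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: product of the two raters' marginal proportions,
    # summed over categories.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    p_chance = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_chance == 1.0:
        raise ValueError("kappa is undefined when both raters use one category")
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical recommendations from two reviewers on the same 10 manuscripts.
reviewer_1 = ["accept", "reject", "revise", "reject", "accept",
              "revise", "reject", "reject", "accept", "revise"]
reviewer_2 = ["accept", "reject", "reject", "reject", "revise",
              "revise", "reject", "accept", "accept", "revise"]

kappa = cohen_kappa(reviewer_1, reviewer_2)
print(f"kappa = {kappa:.2f}")
```

In this made-up example the reviewers agree on 7 of 10 manuscripts, but because some agreement is expected by chance, kappa is lower than the raw agreement of 0.70.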

The aspects of reviews most commonly rated were those relating to the methodological soundness of the reviewed study, its importance, originality and presentation. Several studies also attempted to assess the tone or courteousness of the review. One study measured the number of errors that a review detected.25 Three considered the speed of review.11,23,35 Aspects of articles examined were more wide ranging, including quality assessments of each section (introduction, methods, results, and discussion) and also subjective measures of the article's relevance, overall quality, readability, and comprehensibility.


Analysis of published studies on editorial peer review reveals the diversity of study questions and end points. This suggests that peer review is expected to have a wide range of effects, that its true effects have not been determined, or that the aims of peer review have not been identified properly. Our review also showed that the term peer review is used to describe a number of processes, most commonly gathering opinions from external experts, but also review by in-house editors, and that it may not always be possible to make a clear distinction between peer review and technical editing.

Based on our reviews of studies and the larger literature of opinion about peer review, we suggest that its aims may be categorized as (1) selecting submissions for publication (by a particular journal) and rejecting those with irrelevant, trivial, weak, misleading, or potentially harmful content, and (2) improving the clarity, transparency, accuracy, and utility of the selected submissions.

The selection of submissions depends on assessment of their quality and how well they match the journal's scope and aims. The quality criteria may be categorized as the importance, relevance, usefulness, and methodological and ethical soundness of the research and the clarity, accuracy, and completeness of the report.

The main purpose of medical research is to improve health or the delivery of health care. If peer review is regarded as one stage in this process, it might be expected to have measurable effects on health status. However, outcomes such as this are difficult to assess because they are affected by numerous other factors. Surrogate outcomes, such as process measures, are much easier to assess, but may not provide a reliable measure of more meaningful indicators of success.

In Table 1 we summarize the possible effects of editorial peer review on the quality of reports of clinical research, provide definitions for these, and suggest indicators that could be used to assess them.

Table. Indicators of the Quality of the Output of Editorial Peer-Review of Clinical Studies and Methods to Assess Them

Research so far has measured only certain aspects of peer review, largely focused on variations in the processes used rather than comparing the effects of peer review with those of other systems. Given the resources spent on peer review and the importance placed on it, this is unsatisfactory even though it may reflect the fact that no part of the process of scientific evaluation has been rigorously studied. Ironically, however, the fact that peer review is so well entrenched makes it harder to study, since scientists and editors may be unwilling to take part in randomized studies if they believe that the current system serves them well. How, therefore, should the scientific community proceed in its evaluation of peer review?

Ideally, the effects of peer review would be assessed by large-scale, long-term research into 2 cohorts of studies, randomized to undergo either peer review or an alternative method of assessment, such as random selection for publication. Given the complexity of factors at play, a multivariate analysis may be necessary. However, researchers might not be prepared to accept such randomization, and knowledge of the trial could bias the results. It would be important to ensure that both groups of studies were of equal average quality, for example, by examining submissions to a single journal. The follow-up period would have to be lengthy to allow for changes in health status or health care delivery to occur as a consequence of publication.
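The allocation step of such a trial could be sketched as follows. This is a minimal illustration under stated assumptions, not a protocol: the function name `allocate`, the arm labels, the submission identifiers, and the seed are all hypothetical, and a real trial would need concealed allocation and probably stratification.

```python
import random


def allocate(submission_ids, seed=0):
    """Randomly split submissions into two arms of (near) equal size."""
    # A fixed seed makes the allocation reproducible so it can be audited.
    rng = random.Random(seed)
    shuffled = list(submission_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {
        "peer_review": sorted(shuffled[:half]),
        "alternative": sorted(shuffled[half:]),
    }


# Hypothetical: 20 submissions to a single journal, numbered 1 to 20.
arms = allocate(range(1, 21), seed=2002)
print(arms)
```

Drawing both arms from submissions to a single journal, as in this example, is one way to keep the average quality of the 2 groups comparable.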

Another difficulty in studying the effects of peer review is that the quality components of a manuscript are often interlinked, and it is meaningless to study them in isolation. For example, a methodologically flawed study or incomplete report will detract from the publication's usefulness.

If the scientific community could agree on the objectives of peer review and collaborate in the assessment of its effects, we could start identifying some of the practices for which evidence of effect is better than that of controls. We propose that the following questions should be tested collaboratively across journal settings:

  • Does peer review identify submissions of higher quality than other selection methods (or chance)?

  • Does peer review improve the clarity, transparency, accuracy, and utility of published papers meaningfully beyond that of the submitted version?

This would involve assessing the quality of both published and unpublished submissions using well-validated instruments. We suggest that measures of quality should include the importance, relevance, usefulness, and methodological and ethical soundness of clinical studies. Such research would also involve tracking submissions between journals since submissions rejected by one journal often go on to be published by another.37

Although there is some evidence that peer review and editing improve articles between submission and publication, the effectiveness of the selection and filtering functions of peer review remains virtually untested.3,5 Yet, despite this lack of evidence, peer review is well established in most academic disciplines. It is therefore possible that peer review is retained for reasons other than those stated. For example, it may serve to protect journals' reputations or to provide acceptability for commercially funded studies. Using unpaid reviewers probably reduces some aspects of the work of in-house editors, although it also carries administrative costs. Peer review is also so well established that it has become part of the system for assessing academic merit in appointments and promotions. The broader functions of peer review, including its social and psychological effects such as increasing the credibility and prestige of published work, are rarely acknowledged and have not, to our knowledge, ever been seriously studied.


Given the widespread use of peer review, it is surprising that so little is known of its aims or effects, although the same might be said of several other well-established processes of scientific appraisal. The financial costs of peer review to the scientific community are difficult to estimate but should not be ignored.38 There is also anecdotal evidence that peer review has shortcomings and may even have harmful effects.2 Yet, until we have properly defined the aims of peer review, it will remain almost impossible to estimate the effectiveness of the process or to improve it systematically.

The research needed to evaluate the effects of peer review poses many methodological problems and would require the cooperation of large numbers of authors and editors. The growth of electronic publishing has increased the urgency of establishing an effective and efficient system for evaluating scientific information but may also offer opportunities to explore alternatives to the current peer-review system.39 Until such research is undertaken, the ability of peer review to improve the quality of published research and, ultimately, improve the dissemination of reliable health information will remain uncertain.

1. Kronick DA. Peer-review in 18th-century scientific journalism. JAMA. 1990;263:1321-1322.
2. Wager E, Jefferson T. The shortcomings of peer review. Learned Publishing. 2001;14:257-263.
3. Jefferson T, Alderson P, Wager E, Davidoff F. Effects of editorial peer review: a systematic review. JAMA. 2002;287:2784-2786.
4. Alderson P, Davidoff F, Jefferson TO, Wager E. Editorial peer review for improving the quality of reports of biomedical studies [protocol for Cochrane Methodology Review on CD ROM]. Oxford, England: Cochrane Library, Update Software; 2001;issue 3.
5. Wager E, Middleton P. Technical editing in biomedical journals. JAMA. 2002;287:2821-2824.
6. Wager E, Middleton P. Technical editing of research reports in biomedical journals [Protocol for a Cochrane Methodology Review on CD ROM]. Oxford, England: The Cochrane Library Update Software; 2001;issue 3.
7. Biddle C, Aker J. How does the peer review process influence AANA Journal article readability? AANA J. 1996;64:65-68.
8. Elvik R. Are road safety evaluation studies published in peer reviewed journals more valid than similar studies not published in peer reviewed journals? Accid Anal Prev. 1998;30:101-118.
9. Gardner MJ, Bond J. An exploratory study of statistical assessment of papers published in the British Medical Journal. JAMA. 1990;263:1355-1357.
10. Goodman SN, Berlin J, Fletcher SW, Fletcher RH. Manuscript quality before and after peer review and editing at Annals of Internal Medicine. Ann Intern Med. 1994;121:11-21.
11. Jadad AR, Cook DJ, Jones A, Klassen TP, Tugwell P, Moher M, Moher D. Methodology and reports of systematic reviews and meta-analyses: a comparison of Cochrane reviews with articles published in paper-based journals. JAMA. 1998;280:278-280.
12. Jefferson T, Smith R, Yee Y, Drummond M, Pratt M, Gale R. Evaluating the BMJ guidelines for economic submissions: prospective audit of economic submissions to BMJ and The Lancet. JAMA. 1998;280:275-277.
13. Justice AC, Berlin JA, Fletcher SW, Fletcher RH, Goodman SN. Do readers and peer reviewers agree on manuscript quality? JAMA. 1994;272:117-119.
14. Pierie JP, Walvoort HC, Overbeke AJ. Readers' evaluation of effect of peer review and editing on quality of articles in the Nederlands Tijdschrift voor Geneeskunde. Lancet. 1996;348:1480-1483.
15. Roberts JC, Fletcher RH, Fletcher SW. Effects of peer review and editing on the readability of articles published in Annals of Internal Medicine. JAMA. 1994;272:119-121.
16. Rochon PA, Gurwitz JH, Cheung CM, Hayes JA, Chalmers TC. Evaluating the quality of articles published in journal supplements compared with the quality of those published in the parent journal. JAMA. 1994;272:108-113.
17. Bingham CM, Higgins G, Coleman R, Van der Weyden MB. The Medical Journal of Australia Internet peer-review study. Lancet. 1998;352:441-445.
18. Blank RM. The effects of double-blind versus single-blind reviewing; experimental evidence from the American Economic Review. Am Econ Rev. 1991;81:1041-1067.
19. Callaham ML, Wears RL, Waeckerle JF. Effect of attendance at a training session on peer reviewer quality and performance. Ann Emerg Med. 1998;32(3 pt 1):318-322.
20. Das Sinha S, Sahni P, Nundy S. Does exchanging comments of Indian and non-Indian reviewers improve the quality of manuscript reviews? Natl Med J India. 1999;12:210-213.
21. Ernst E, Resch KL. Reviewer bias against the unconventional? a randomized double-blind study of peer review. Complement Ther Med. 1999;7:19-23.
22. Ernst E, Resch KL. Reviewer bias: a blinded experimental study. J Lab Clin Med. 1994;124:178-182.
23. Feurer ID, Becker GJ, Picus D, Ramirez E, Darcy MD, Hicks ME. Evaluating peer reviews: pilot testing of a grading instrument. JAMA. 1994;272:98-100.
24. Fisher M, Friedman SB, Strauss B. The effects of blinding on acceptance of research papers by peer review. JAMA. 1994;272:143-146. [published correction appears in JAMA. 1994;272:1170].
25. Godlee F, Gale CR, Martyn CN. Effect on the quality of peer review of blinding peer reviewers and asking them to sign their reports: a randomized control trial. JAMA. 1998;280:237-240.
26. Jadad AR, Moore RA, Carroll D, et al. Assessing the quality of reports of randomized clinical trials: is blinding necessary? Control Clin Trials. 1996;17:1-12.
27. Justice AC, Cho MK, Winker MA, Berlin JA, Rennie D. Does masking author identity improve peer review quality? a randomized controlled trial. JAMA. 1998;280:240-242.
28. McNutt RA, Evans AT, Fletcher RH, Fletcher SW. The effects of blinding on the quality of peer-review. a randomized trial. JAMA. 1990;263:1371-1376.
29. Nylenna M, Riis P, Karlsson Y. Multiple blinded reviews of the same two manuscripts: effects of referee characteristics and publication language. JAMA. 1994;272:149-151.
30. Oxman AD, Guyatt GH, Singer J, et al. Agreement among reviewers of review articles. J Clin Epidemiol. 1991;44:91-98.
31. Strayhorn J Jr, McDermott JF Jr, Tanguay P. An intervention to improve the reliability of manuscript reviews for the Journal of the American Academy of Child and Adolescent Psychiatry. Am J Psychiatry. 1993;150:947-952.
32. van Rooyen S, Godlee F, Evans S, Smith R, Black N. Effect of blinding and unmasking on the quality of peer review: a randomized trial. JAMA. 1998;280:234-237.
33. van Rooyen S, Godlee F, Evans S, Black N, Smith R. Effect of open peer review on quality of reviews and on reviewers' recommendations: a randomised trial. BMJ. 1999;318:23-27.
34. Walsh E, Rooney M, Appleby L, Wilkinson G. Open peer review: a randomised controlled trial. Br J Psychiatry. 2000;176:47-51.
35. Neuhauser D, Koran CJ. Calling Medical Care reviewers first: a randomized trial. Med Care. 1989;27:664-666.
36. van Rooyen S, Black N, Godlee F. Development of the review quality instrument (RQI) for assessing peer reviews of manuscripts. J Clin Epidemiol. 1999;52:625-629.
37. Ray J, Berkwits M, Davidoff F. The fate of manuscripts rejected by a general medical journal. Am J Med. 2000;109:131-135.
38. Donovan B. The truth about peer review. Learned Publishing. 1998;11:179-184.
39. Bingham C. Peer review on the internet: are there faster, fairer, more effective methods of peer review? In: Godlee F, Jefferson TO, eds. Peer Review in Health Sciences. London, England: BMJ Books; 1999;205-223.