Author Affiliations: Health Reviews Ltd, Rome, Italy (Dr Jefferson); Sideview, Princes Risborough, England (Ms Wager); Annals of Internal Medicine, Philadelphia, Pa (Dr Davidoff).
Context The quality of a process can only be tested against its agreed objectives.
Editorial peer review is widely used, yet there appears to be little agreement
about how to measure its effects or processes.
Methods To identify outcome measures used to assess editorial peer review as
performed by biomedical journals, we analyzed studies identified from 2 systematic
reviews that measured the effects of editorial peer review on the quality
of the output (ie, published articles) or of the process itself (eg, reviewers' reports).
Results Ten studies used a variety of instruments to assess the quality of articles
that had undergone peer review. Only 1 nonrandomized study compared the quality
of articles published in peer-reviewed and non–peer-reviewed journals.
The others measured the effects of variations in the peer-review process or
used a before-and-after design to measure the effects of standard peer review
on accepted articles. Eighteen studies measured the quality of reviewers'
reports under different conditions such as blinding or after training. One
study compared the time and cost of different review processes.
Conclusions Until we have properly defined the objectives of peer review, it will
remain almost impossible to assess or improve its effectiveness. The research
needed to understand the broader effects of peer review poses many methodologic
problems and would require the cooperation of many parts of the scientific community.
A fundamental tenet of all scientific and scholarly work is that every
aspect of it must be subjected to critical appraisal; only those findings
and principles that withstand such appraisal become established. Although
much appraisal occurs as work is in progress (and some after it has been published),
work that is submitted for publication undergoes critical appraisal, known
as peer review, as part of the editorial process.
Editorial peer review is therefore an extension of the basic principles
of science and scholarship. It has existed for more than 200 years1 and has achieved near universal application for assessing
research reports before publication. Despite its wide acceptance, peer review
has been subjected to a variety of criticisms,2
and, indeed, surprisingly little is known about its effects on the quality
and utility of published information,3 much
less about its beneficial or adverse social, psychological, or financial effects.
The same can be said about critical appraisal in scholarly work generally.
However, uncertainty about the effects of peer review is not simply a matter
for academic concern. Clinical decisions must be made on the best available
evidence, usually systematic reviews and meta-analyses, but these can be misleading
if they are based on invalid, incomplete, inaccurate, or duplicate information,
or if the review articles themselves are poorly done. Any process affecting
the assessment and dissemination of clinical evidence therefore has a direct
bearing on patient care.
In this article we review the criteria used by others to measure the
effects of peer review, consider what this implies about the aims of peer
review, especially in relation to clinical evidence, and suggest ways in which
its effects might be measured more rigorously.
In 2 systematic reviews of the effects of editorial review and technical
editing, we identified published articles that evaluated the peer review process
and noted the criteria used in those studies to evaluate peer review.
Our first review considered processes that occur between submission of a paper
and a decision on publication; the second considered the processes that occur
between acceptance and publication. Both systematic reviews were performed
using Cochrane methodology. The methods and primary findings of the reviews
are published elsewhere.3-6
We included 19 studies in our systematic review of the effects of peer
review; these are described separately.3 Two
studies were identified from our review on technical editing since they included
information about changes that occur to papers between submission and acceptance
or did not distinguish the preacceptance and postacceptance processes.5 We identified 8 other studies that measured the quality
of papers or reviews but did not compare peer-review processes. The outcome
measures used in these studies are shown in
online Table A (available in PDF format).7-35
Brief descriptions of the 8 studies not described in companion papers are
Ten studies measured various aspects of the quality of papers that had
undergone peer review. Only one study8 compared
the quality of articles published in peer-reviewed and non–peer-reviewed
journals, but it used a nonrandomized design and the findings may have been
confounded by other factors, such as differences in the quality of studies
submitted to the different journals. The other studies measured the effects
of variations in the peer-review process or used a before-and-after design
to measure the effects of standard peer review in a particular journal. A
major limitation of most studies is that they assessed the quality only of
accepted papers, and measured the changes that took place between submission
or acceptance and publication. Only the studies of economic submissions12 and statistical quality9
included papers that were rejected by the target journal. Of the 10 studies,
only 2 used journal readers to assess the quality of papers13,14; the others
were based on editors' assessment.
Virtually every study used its own rating instrument. These included between
7 and 36 items rated using 2- to 10-point scales. Most scales appeared to
be unvalidated, and in the 1 case in which the scoring system was tested, it was found
to have low reliability.10 The 2 studies of
readability7,15 used published
scales that have not been validated for use in this setting.
Eighteen studies measured the quality of reviewers' reports under different
conditions such as blinding or training.17-34
Three of these included an assessment by the authors whose work had been reviewed.27,28,33 The others used editors
to judge the quality of reviews. Instruments used to rate review quality ranged
from 2- to 10-item scales; most were rated using a 2- to 5-point system, but
1 used a visual analog scale, and 1 used ratings from 1 to 100. One of these
scales had a published validation.36 Four studies
examined the amount of agreement among reviewers,30,31
between reviewers and editors,19 or between
reviewers and readers.27 One study compared
the time and cost of different review processes.35
The aspects of reviews most commonly rated were those relating to the
methodological soundness of the reviewed study, its importance, originality,
and presentation. Several studies also attempted to assess the tone or courteousness
of the review. One study measured the number of errors that a review detected.25 Three considered the speed of review.11,23,35
Aspects of articles examined were more wide ranging, including quality assessments
of each section (introduction, methods, results, and discussion) and also
subjective measures of the article's relevance, overall quality, readability,
Analysis of published studies on editorial peer review reveals the diversity
of study questions and end points. This suggests that peer review is expected
to have a wide range of effects, that its true effects have not been determined,
or that the aims of peer review have not been identified properly. Our review
also showed that the term peer review is used to
describe a number of processes, most commonly gathering opinions from external
experts, but also review by in-house editors, and that it may not always be
possible to make a clear distinction between peer review and technical editing.
Based on our reviews of studies and the larger literature of opinion
about peer review, we suggest that its aims may be categorized as (1) selecting
submissions for publication (by a particular journal) and rejecting those
with irrelevant, trivial, weak, misleading, or potentially harmful content,
and (2) improving the clarity, transparency, accuracy, and utility of the
The selection of submissions depends on assessment of their quality
and how well they match the journal's scope and aims. The quality criteria
may be categorized as the importance, relevance, usefulness, and methodological
and ethical soundness of the research and the clarity, accuracy, and completeness
of the report.
The main purpose of medical research is to improve health or the delivery
of health care. If peer review is regarded as one stage in this process, it
might be expected to have measurable effects on health status. However, outcomes
such as this are difficult to assess because they are affected by numerous
other factors. Surrogate outcomes, such as process measures, are much easier
to assess, but may not provide a reliable measure of more meaningful indicators
In Table 1 we summarize
the possible effects of editorial peer review on the quality of reports of
clinical research, provide definitions for these, and suggest indicators that
could be used to assess them.
Research so far has measured only certain aspects of peer review and has largely
focused on variations in the processes used rather than on comparing the effects
of peer review with those of other systems. Given the resources spent on peer
review and the importance placed on it, this is unsatisfactory even though
it may reflect the fact that no part of the process of scientific evaluation
has been rigorously studied. Ironically, however, the fact that peer review
is so well entrenched makes it harder to study, since scientists and editors
may be unwilling to take part in randomized studies if they believe that the
current system serves them well. How, therefore, should the scientific community
proceed in its evaluation of peer review?
Ideally, the effects of peer review would be assessed by large-scale, long-term research into
2 cohorts of studies, randomized to undergo either peer review or an alternative
method of assessment, such as random selection for publication. Given the
complexity of factors at play, a multivariate analysis may be necessary. However,
researchers might not be prepared to accept such randomization, and knowledge
of the trial could bias the results. It would be important to ensure that
both groups of studies were of equal average quality—for example by
examining submissions to a single journal. The follow-up period would have
to be lengthy to allow for changes in health status or health care delivery
to occur as a consequence of publication.
Another difficulty in studying the effects of peer review is that the
quality components of a manuscript are often interlinked, and it is meaningless
to study them in isolation. For example, a methodologically flawed study or
incomplete report will detract from the publication's usefulness.
If the scientific community could agree on the objectives of peer review
and collaborate in the assessment of its effects, we could start identifying
some of the practices for which evidence of effect is better than that of
controls. We propose that the following questions should be tested collaboratively
across journal settings:
Does peer review identify submissions of higher quality than other
selection methods (or chance)?
Does peer review improve the clarity, transparency, accuracy,
and utility of published papers meaningfully beyond that of the submitted
This would involve assessing the quality of both published and unpublished
submissions using well-validated instruments. We suggest that measures of
quality should include the importance, relevance, usefulness, and methodological
and ethical soundness of clinical studies. Such research would also involve
tracking submissions between journals since submissions rejected by one journal
often go on to be published by another.37
Although there is some evidence that peer review and editing improve
articles between submission and publication, the effectiveness of its selection
and filtering functions remains virtually untested.3,5
Yet, despite this lack of evidence, peer review is well established in most
academic disciplines. It is therefore possible that peer review is retained
for reasons other than those stated. For example, it may serve to protect
journals' reputations or to provide acceptability for commercially funded
studies. Using unpaid reviewers probably reduces some aspects of the work
of in-house editors, although it also carries administrative costs. Peer review
is also so well established that it has become part of the system for assessing
academic merit in appointments and promotions. The broader functions of peer
review, including its social and psychological effects such as increasing
the credibility and prestige of published work, are rarely acknowledged and
have not, to our knowledge, ever been seriously studied.
Given the widespread use of peer review, it is surprising that so little
is known of its aims or effects, although the same might be said of several
other well-established processes of scientific appraisal. The financial costs
of peer review to the scientific community are difficult to estimate but should
not be ignored.38 There is also anecdotal evidence
that peer review has shortcomings and may even have harmful effects.2 Yet, until we have properly defined the aims of peer
review it will remain almost impossible to estimate the effectiveness of the
process or to improve it systematically.
The research needed to evaluate the effects of peer review poses many
methodological problems and would require the cooperation of large numbers
of authors and editors. The growth of electronic publishing has increased
the urgency of establishing an effective and efficient system for evaluating
scientific information but may also offer opportunities to explore alternatives
to the current peer-review system.39 Until
such research is undertaken, the ability of peer review to improve the quality
of published research and, ultimately, improve the dissemination of reliable
health information will remain uncertain.
Jefferson T, Wager E, Davidoff F. Measuring the Quality of Editorial Peer Review. JAMA. 2002;287(21):2786-2790. doi:10.1001/jama.287.21.2786