Context.— Selecting peer reviewers who will provide high-quality reviews is a
central task of editors of biomedical journals.
Objectives.— To determine the characteristics of reviewers for a general medical
journal who produce high-quality reviews and to describe the characteristics
of a good review, particularly in terms of the time spent reviewing and turnaround
time.
Design, Setting, and Participants.— Surveys of reviewers of the 420 manuscripts submitted to BMJ between January and June 1997.
Main Outcome Measures.— Review quality was assessed independently by 2 editors and by the corresponding
author using a newly developed 7-item review quality instrument.
Results.— Of the 420 manuscripts, 345 (82%) had 2 reviews completed, for a total
of 690 reviews. Authors' assessments of review quality were available for
507 reviews. The characteristics of reviewers had little association with
the quality of the reviews they produced (explaining only 8% of the variation),
regardless of whether editors or authors defined the quality of the review.
In a logistic regression analysis, the only significant factor associated
with higher-quality ratings by both editors and authors was reviewers trained
in epidemiology or statistics. Younger age also was an independent predictor
for editors' quality assessments, while reviews performed by reviewers who
were members of an editorial board were rated of poorer quality by authors.
Review quality increased with time spent on a review, up to 3 hours but not
beyond.
Conclusions.— The characteristics of reviewers we studied did not identify those who
performed high-quality reviews. Reviewers might be advised that spending longer
than 3 hours on a review on average did not appear to increase review quality
as rated by editors and authors.
ALTHOUGH all editors would like to know how to select good reviewers,
there have been only 3 attempts to identify their characteristics.1-3 Two of these studies
found that the best-quality reports were provided by reviewers who were young1,2 and, therefore, of junior academic
status,1 particularly if they were working
at a top academic institution or were known to the editors.2
The third study demonstrated that younger reviewers with considerable refereeing
experience provided stricter assessments of manuscripts than other reviewers.3 None of the other characteristics examined (such as
research training and postgraduate qualifications) were associated with review
quality.
The process of peer reviewing has also received little attention. While
3 studies have reported on the time spent by reviewers on the task,4-6 none examined the relationship
between time spent and review quality.
Our principal objective was to determine the characteristics of reviewers
who produce high-quality reviews. In addition, we considered the characteristics
of good reviews in terms of the time spent by the reviewer and time taken
to deliver it to the journal.
Consecutive manuscripts (research papers) submitted to BMJ and sent for review between January and June 1997 were eligible
for inclusion. Each manuscript was sent to 2 reviewers (selected from the
existing reviewer database as having an interest in and knowledge of the subject
matter of the manuscript) as part of a randomized trial of blinding and unmasking
to coreviewer.7 Reviewers were supplied with
the journal's standard advice and asked to return their review within 3 weeks.
In addition, two thirds of the reviewers (the remaining third, who formed the "uninformed"
portion of the trial, had to be omitted so that any Hawthorne effect could be assessed)
were asked to report how long they spent carrying out the review (including
reading the manuscript, making notes, and writing the review). The time taken
to return the review to the journal was recorded.
The quality of each review was assessed independently by 2 editors and
by the corresponding author of the manuscript, using a new review-quality
instrument. It considers the following 7 aspects of a review: the extent to
which the reviewer addressed the importance of the research question, the
originality of the question, the strengths and weaknesses of the method, the
presentation of the paper (writing, organization, illustrations), the interpretation
of the results, and the extent to which the reviewer provided constructive
comments and substantiated the comments. Each item is scored on a 5-point
Likert scale (1=poor, 5=excellent). The total score is the mean of the 7
item scores. In addition, there is a global item seeking an overall assessment
of the quality of the review. The internal consistency of the instrument is
high (Cronbach α=.84) as is the interrater reliability of the total
score (Kendall coefficient, τ=0.83). Full details of its psychometric
properties are available from the authors.
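As a rough illustration of the scoring described above, the following sketch computes a total score as the mean of 7 item scores and a Cronbach α across a set of reviews. The function names and the example ratings are hypothetical and invented for illustration; they are not the authors' code or study data.

```python
from statistics import mean, pvariance


def total_score(item_scores):
    """Total score for one review: the mean of its 7 item scores (each 1 to 5)."""
    if len(item_scores) != 7:
        raise ValueError("expected 7 item scores")
    if any(not 1 <= s <= 5 for s in item_scores):
        raise ValueError("item scores must lie between 1 and 5")
    return mean(item_scores)


def cronbach_alpha(ratings):
    """Cronbach alpha for a list of reviews, each a list of k item scores."""
    k = len(ratings[0])
    # Variance of each item across reviews, and variance of the summed score.
    item_vars = [pvariance([r[i] for r in ratings]) for i in range(k)]
    total_var = pvariance([sum(r) for r in ratings])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical ratings, invented for illustration only (not study data).
example = [
    [4, 3, 4, 3, 4, 4, 3],
    [2, 2, 3, 2, 2, 3, 2],
    [5, 4, 5, 4, 4, 5, 5],
    [3, 3, 3, 2, 3, 3, 3],
]
print([round(total_score(r), 2) for r in example])
print(round(cronbach_alpha(example), 2))
```

Alpha approaches 1 when the 7 items rise and fall together across reviews, which is what "high internal consistency" means here.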
Information on the characteristics of reviewers was obtained from a
mailed questionnaire survey carried out in 1996. This covered demographic
characteristics (sex, age, country of residence), education and familiarity
with research (age when received first degree, postgraduate qualifications,
postgraduate training in epidemiology or statistics, current academic appointment,
appointment in university or teaching center, current research investigator),
publication experience (number of peer-reviewed research publications in past
5 years, number of papers reviewed in past year, number of journals where
participated as a reviewer, member of a journal editorial board, member of
a research funding body), and willingness to review blinded papers and to
have identity revealed to authors.
Review quality was measured in the following 2 ways: the mean of the
2 editors' total scores and the author's total score. Initially, the relationship
between the total quality score and each reviewer characteristic was assessed
using linear regression analysis. Characteristics that were statistically
significant (P<.01) were then entered stepwise
using multiple regression (SPSS [Statistical Package for the Social Sciences]
for Microsoft Windows, Release 6.1, SPSS, Inc, Chicago, Ill). The significance
of comparisons of categorical variables was assessed using χ² tests.
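The first step of this analysis, regressing the quality score on one characteristic at a time, can be sketched as follows. This is a minimal illustration only: it reports r² for each characteristic and omits the significance tests and the stepwise multiple regression, which the authors ran in SPSS. The function names and the data are hypothetical.

```python
from statistics import mean


def r_squared(x, y):
    """Share of the variance in y explained by a simple linear regression on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


def screen(characteristics, quality):
    """Return {name: r2} for each characteristic, largest r2 first."""
    scores = {name: r_squared(values, quality)
              for name, values in characteristics.items()}
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))


# Hypothetical data, invented for illustration only (not study data).
# Binary characteristics are coded 0/1.
quality = [3.1, 2.6, 3.4, 2.9, 2.7, 3.0]
characteristics = {
    "age": [38, 55, 41, 47, 62, 50],
    "trained_in_epidemiology_or_statistics": [1, 0, 1, 0, 0, 1],
}
for name, r2 in screen(characteristics, quality).items():
    print(name, round(r2, 3))
```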
Recruitment and Response Rate
An estimated 420 eligible manuscripts were submitted to the journal
during the recruitment period, and 2 reviews were obtained for 345 (82%) of them.
Of these 690 reviews, information on the characteristics of reviewers was
available for 670. The analyses presented here are based on all 670 reviews
for the editors' assessment of quality and on the first 507 reviews for which
the corresponding author's assessment was available. The only exception is
the analysis of time spent carrying out the review, for which the sample was
restricted to the 438 (editors' assessment) and 342 (author's assessment)
reviewers who answered that question.
Characteristics of a Good Reviewer
Editors' Assessments.— Four of the 16 characteristics of reviewers were significantly associated
(P<.05) with the editors' assessment of review
quality: age (r²=0.03), resident in North
America (mean score, 3.22 vs 2.88; r²=0.02), training in epidemiology or statistics (3.03 vs 2.74; r²=0.04), and current research investigator (2.92 vs 2.74; r²=0.006). Age showed a quadratic relationship
in which younger reviewers up to about 60 years were more likely to produce
higher-quality reviews (Figure 1),
beyond which there was no statistically significant change in review quality.
When entered in a multiple regression model, only 2 characteristics
remained significantly associated (P<.01) with
higher-quality reviews: younger age and having training in epidemiology or
statistics (Table 1). Together
with the characteristic "resident in North America," these 3 were, however,
of very limited predictive power (r²=0.08).
Reviewer Characteristics Associated With Higher-Quality Reviews Based on Editors' (n = 670) and Author's Assessments (n = 507)
Author's Assessments.— Four of the 16 characteristics of reviewers were significantly associated
(P<.05) with the author's assessment of review
quality: age (r²=0.008), training in epidemiology
or statistics (2.94 vs 2.77; r²=0.009),
not a member of a journal editorial board (2.96 vs 2.81; r²=0.011), and resident in North America (3.13 vs 2.85; r²=0.014). When entered in a multiple regression
model, 2 characteristics remained significantly associated with higher-quality
reviews (Table). As above, these
were, however, of very limited predictive power (r²=0.02).
Characteristics of a Good Review
There was no association between the editors' assessment of review quality
and the time taken by reviewers to return their reviews. There was, in contrast,
a clear nonlinear relationship with the time spent by reviewers on their reviews.
Review quality improved with increasing time up to about 3 hours, but not
beyond.
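One simple way to express a relationship that rises and then levels off is a straight-line fit on time capped at a knot. The sketch below assumes a knot at 3 hours, taken from the text above; the model form, the function name `fit_plateau`, and the data are hypothetical and are not the authors' analysis.

```python
from statistics import mean


def fit_plateau(hours, quality, knot=3.0):
    """Least-squares fit of quality = intercept + slope * min(hours, knot)."""
    # Time beyond the knot is treated as adding nothing further.
    x = [min(h, knot) for h in hours]
    mx, my = mean(x), mean(quality)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, quality))
             / sum((a - mx) ** 2 for a in x))
    intercept = my - slope * mx
    return intercept, slope


def predict(hours, intercept, slope, knot=3.0):
    """Predicted quality score for a given number of hours spent."""
    return intercept + slope * min(hours, knot)


# Hypothetical data, invented for illustration only (not study data).
hours = [0.5, 1.0, 2.0, 3.0, 4.0, 6.0]
quality = [2.3, 2.5, 2.8, 3.1, 3.0, 3.1]
intercept, slope = fit_plateau(hours, quality)
print(round(predict(2.0, intercept, slope), 2))
print(round(predict(5.0, intercept, slope), 2))
```

Under this form, predictions for any time at or beyond the knot are identical, which mirrors the "up to about 3 hours, but not beyond" pattern.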
The characteristics of reviewers considered in this study had little
association with the quality of the reviews they produced. This was true regardless
of whether editors or authors defined the quality of the review. The only
consistent finding was that reviewers trained in epidemiology or statistics
were more likely to produce good reviews. While age was associated with review
quality according to editors' assessments (consistent with previous studies1,2), it was not when reviews were assessed
by authors. The converse was true for the characteristic "not a member of
an editorial board." In regard to what makes a good review, the longer the time
spent on the task (up to about 3 hours), the better the review.
The lack of association between review quality and certain reviewer
characteristics was surprising. We had expected that those actively involved
in research, those occupying academic positions, and members of research funding
bodies would have made better reviewers than others. This was not so. We did
not seek to categorize the prestige of academic institutions so were unable
to investigate the previously reported association between high-quality reviews
and highly prestigious institutions. The association with North American residency
was probably confounded given that such reviewers are more highly selected
by a British journal than their British counterparts.
Before discussing the implications of these findings, 2 potential methodological
limitations need to be considered. First, two thirds of the reviewers knew
they were participating in a study, which may have affected the quality of
their review, and the time spent carrying it out. Concern about a Hawthorne
effect, however, appears to be unfounded as data presented elsewhere demonstrate.7 The mean total score for the unblinded, masked reviewers
was 2.79 and for the uninformed reviewers was 2.87 (difference, 0.08; 95%
confidence interval, −0.06 to 0.22). Second, our findings depend crucially
on the review-quality instrument. Full details of its development and validation
are available from the authors. It has good internal consistency and interrater
reliability, and we believe it was sufficiently accurate and robust for the
purposes of this study. However, it should be noted that the instrument can
only assess review quality in terms of content and completeness, not in terms
of whether the reviewer's judgment was correct.
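The Hawthorne comparison reported above (mean total scores of 2.79 and 2.87, with a 95% confidence interval of −0.06 to 0.22 for the difference) can be checked with a few lines of arithmetic. The sketch assumes a symmetric normal-approximation interval, which the source does not state; the function name `implied_se` is hypothetical.

```python
def implied_se(lower, upper, z=1.96):
    """Standard error implied by a symmetric interval of the form estimate +/- z * SE."""
    return (upper - lower) / (2 * z)


# Figures quoted in the text.
mean_informed = 2.79
mean_uninformed = 2.87
lower, upper = -0.06, 0.22

difference = mean_uninformed - mean_informed
midpoint = (lower + upper) / 2          # should match the reported difference
se = implied_se(lower, upper)           # assumes a normal approximation
includes_zero = lower <= 0 <= upper     # interval compatible with no difference

print(round(difference, 2), round(midpoint, 2), round(se, 3), includes_zero)
```

The interval is centred on the reported difference and spans zero, which is consistent with the statement that concern about a Hawthorne effect appears unfounded.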
So, what makes a good reviewer and a good review? Our failure to explain
more than 8% of the variation in review quality arises either because we
did not measure the relevant factors or because no consistent pattern exists. In other
words, there are almost as many types of good reviewers as there are good
reviews. If true, the implication for editors is that they simply have to
try new reviewers, assess their performance, and decide whether to continue
to use them. This course of action raises the question of how new reviewers
might learn their trade. It may be time for journals to start training their
reviewers—though this assumes peer review is worthwhile and that people
can be trained. Meanwhile, one suggestion that we can offer editors is to
recruit reviewers with training in epidemiology or statistics, and probably
to enlist people nearer 40 than 60 years of age. Reviewers might also be advised
to spend no longer than 3 hours on their task.
Finally, it is unclear whether these findings are applicable to the
vast majority of biomedical journals that have a specialized rather than a
general focus and some of which may not provide guidance to their reviewers.
Further research in this area might usefully include both types of journal.
1. Stossel TP. Reviewer status and review quality: experience of the Journal of Clinical Investigation. N Engl J Med. 1985;312:658-659.
2. Evans AT, McNutt RA, Fletcher SW, Fletcher RH. The characteristics of peer reviewers who produce good quality reviews. J Gen Intern Med. 1993;8:422-428.
3. Nylenna M, Riis P, Karlsson Y. Multiple blinded reviews of the same two manuscripts: effects of referee characteristics and publication language. JAMA. 1994;272:149-151.
4. Yankauer A. Who are the peer reviewers and how much do they review? JAMA. 1990;263:1338-1340.
5. Lock S, Smith J. What do peer reviewers do? JAMA. 1990;263:1341-1343.
6. McNutt RA, Evans AT, Fletcher RH, Fletcher SW. The effects of blinding on the quality of peer review: a randomized trial. JAMA. 1990;263:1371-1376.
7. van Rooyen S, Godlee F, Evans S, Smith R, Black N. Effect of blinding and unmasking on the quality of peer review: a randomized trial. JAMA. 1998;280:234-237.