Context.— Little research has been conducted into the quality of peer review and,
in particular, the effects of blinding peer reviewers to authors' identities
or masking peer reviewers' identities.
Objective.— To determine whether concealing authors' identities from reviewers (blinding)
and/or revealing the reviewer's identity to a coreviewer (unmasking) affects
the quality of reviews, the time taken to carry out reviews, and the recommendation
regarding publication.
Design and Setting.— Randomized trial of 527 consecutive manuscripts submitted to BMJ, each sent to 2 peer reviewers.
Interventions.— Manuscripts were randomized as to whether the reviewers were unmasked,
masked, or uninformed that a study was taking place. Two reviewers for each
manuscript were randomized to receive either a blinded or an unblinded version.
Main Outcome Measures.— Mean total quality score, time taken to carry out the review, and recommendation
regarding publication.
Results.— Of the 527 manuscripts entered into the study, 467 (89%) were successfully
randomized and followed up. The mean total quality score was 2.87. There was
little or no difference in review quality between the masked and unmasked
groups (scores of 2.82 and 2.96, respectively) and between the blinded and
unblinded groups (scores of 2.87 and 2.90, respectively). There was no apparent
Hawthorne effect. There was also no significant difference between groups
in the recommendations regarding publication or time taken to review.
Conclusions.— Blinding and unmasking made no editorially significant difference to
review quality, reviewers' recommendations, or time taken to review. Other
considerations should guide decisions as to the form of peer review adopted
by a journal, and improvements in the quality of peer review should be sought
via other means.
PEER REVIEW has a key role in determining which original research is
published and thus becomes part of the accepted body of scientific knowledge.
Despite its central role, little research has been conducted into the relative
benefits or effectiveness of different approaches to peer review.1,2 We decided to examine 2 questions:
what are the effects of blinding reviewers to the identity of the authors
of a manuscript and of unmasking (revealing the identity of) reviewers to
their coreviewers?
There are several reasons for believing that blinding may be beneficial.
First, blinded reviewers may provide less biased reviews.3
Second, some editors believe blinding improves the quality of reviews,1,4-9
a belief supported by one small randomized controlled trial.10
Finally, articles that appear in journals that use blinded review are more
likely to be cited than those published in journals that use nonblinded review.11 Unmasking the identity of reviewers to one another
has not previously been studied, although many journals already carry out
the practice and believe it to result in higher-quality reviewing.
By means of a randomized trial, we set out to evaluate the effects of
blinding (concealing the identity of authors from a reviewer), unmasking (revealing
the identity of a reviewer to a coreviewer), and a combination of the 2 on
the quality of reviews. The study also sought to establish the feasibility
of successful blinding.
Methods
Consecutive manuscripts in the categories of research articles, short
reports, and research articles from general practice (also known as family
medicine or primary care) received by BMJ and sent
by editors for peer review between January and June 1997 were eligible for
inclusion. Manuscripts were randomized (stratified by the 3 above-mentioned
categories) into 1 of 3 groups: 2 intervention groups (masked and unmasked)
and an uninformed group (Figure 1).
The randomization was undertaken, using a computerized minimization program
with a random component, by a researcher who was independent of the editorial
decision-making process. Each manuscript was sent to 2 paid clinical reviewers
selected by whichever of the 11 editors was responsible for the particular
manuscript. In both the masked and unmasked groups the reports of pairs of
reviewers were exchanged. Reviewers in the unmasked group were asked to consent
to their identity being revealed to their coreviewer. Ethical approval was
granted by a university ethics committee.
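As an illustration only (the weighting used by the actual program is not reported, so the 80% figure below is an assumption), stratified minimization with a random component can be sketched as follows:

```python
import random
from collections import defaultdict

GROUPS = ["masked", "unmasked", "uninformed"]

# Running allocation counts per (manuscript category, group) stratum.
counts = defaultdict(int)

def allocate(category, p_minimize=0.8):
    """Assign a manuscript to the group that best balances the allocation
    within its category most of the time; otherwise allocate at random."""
    totals = {g: counts[(category, g)] for g in GROUPS}
    least_filled = [g for g in GROUPS if totals[g] == min(totals.values())]
    if random.random() < p_minimize:
        group = random.choice(least_filled)   # minimize imbalance
    else:
        group = random.choice(GROUPS)         # random component
    counts[(category, group)] += 1
    return group

# Example: allocate a newly submitted short report.
print(allocate("short report"))
```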
Once the manuscripts had been randomly allocated, the reviewers in the masked
and unmasked groups were randomized to receive either a blinded or an unblinded
version of the manuscript. Blinding consisted of removing authors' details
from the title page and acknowledgments. No attempt was made to remove authors'
details from within the text of the manuscript, the illustrations, or the
references. Blinded reviewers were asked whether they thought they knew the
identity of the author(s), and if so, to detail the name(s) and/or the institution
and to explain why they thought they could tell. All reviewers in the intervention
groups were also asked to record how long they spent on the review and their
recommendation regarding publication of the manuscript. If 1 of the 2 reviewers
of a manuscript in the unmasked group withheld consent, the manuscript was
transferred into a preference arm, and the reviewer's identity was kept concealed.
Since awareness of being in a study might affect the reviewers' behavior,
an uninformed group was included that allowed us to test for a Hawthorne effect.
Manuscripts in the uninformed group were sent to 2 reviewers who were not
informed that a study was taking place. Care was taken that those who had
reviewed manuscripts in the masked or unmasked group were not subsequently
selected to review manuscripts in the uninformed group. At no stage were editors
or authors aware of the group to which a manuscript had been allocated.
On receipt of both reviews of a manuscript, the reviews, with authors'
details removed from them, were passed together with the manuscript to the
responsible editor, who was asked to assess the quality of the reviews. All
the documents were subsequently returned to the researcher, who passed the
manuscript and reviews to a second editor, randomly selected from the remaining
10 editors taking part in the study, for a second, independent evaluation. The quality
of the reviews was assessed using a validated review quality instrument developed
from an instrument used in a previous study.10
A decision on whether to publish the article was made in the journal's usual
manner. At least 10 days after the decision had been communicated to the authors,
the corresponding author was asked to evaluate the 2 reviews using the review
quality instrument. The authors were also asked whether they thought each
reviewer had been blinded to their identity.
The review quality instrument consisted of 7 items (importance of the
research question, originality, methodology, presentation, constructiveness
of comments, substantiation of comments, interpretation of results), each
scored on a 5-point Likert scale (1=poor, 5=excellent). The total score was
the mean of the 7 item scores. A full version of the instrument has been
reported elsewhere.12 In addition,
a global item seeking an overall assessment of the quality of the review was
included. The quality of each review was based on the means of the 2 editors'
scores, for each item and for the total score, and on the corresponding author's scores.
This article considers only the editors' assessments of review quality. The
means of 2 editors' scores were used to improve the reliability of the method.
Data collection from authors is still continuing and will be reported later.
Two additional outcome measures were used: the time taken to carry out the
review and the recommendation regarding publication (publish with minor amendment,
publish with major amendment, reject).
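As a minimal sketch of the scoring arithmetic described above (the item scores below are hypothetical), the total score is the mean of the 7 item scores and a review's quality is the mean of the 2 editors' total scores:

```python
from statistics import mean

ITEMS = ["importance", "originality", "methodology", "presentation",
         "constructiveness", "substantiation", "interpretation"]

def total_score(item_scores):
    """One editor's total score: the mean of the 7 item scores (each 1-5)."""
    assert set(item_scores) == set(ITEMS)
    return mean(item_scores.values())

def review_quality(editor1_items, editor2_items):
    """Quality of a review: the mean of the 2 editors' total scores."""
    return mean([total_score(editor1_items), total_score(editor2_items)])

# Hypothetical scores: editor 1 rates every item 3; editor 2 rates methodology 2.
e1 = dict.fromkeys(ITEMS, 3)
e2 = {**dict.fromkeys(ITEMS, 3), "methodology": 2}
print(round(review_quality(e1, e2), 2))   # 2.93
```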
It was calculated in advance that in order to detect an editorially
significant difference in review quality scores of 0.4 (α=.05, β=.10,
SD=1.5), 148 manuscripts would be required in each of the masked, unmasked,
and uninformed groups. Recruitment of manuscripts was continued until we were
certain of retaining at least 148 in each group after taking account of exclusions
and losses after randomization.
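The formula behind this calculation is not stated; however, the standard two-sample normal approximation reproduces the figure of 148 manuscripts per group if the 2 reviews of each manuscript are treated as independent observations (an assumption of this sketch):

```python
from scipy.stats import norm

alpha, power = 0.05, 0.90        # alpha = .05, beta = .10
delta, sd = 0.4, 1.5             # editorially significant difference and SD

z_a = norm.ppf(1 - alpha / 2)    # about 1.96
z_b = norm.ppf(power)            # about 1.28

# Reviews needed per group for a two-sample comparison of means.
n_reviews = 2 * (z_a + z_b) ** 2 * (sd / delta) ** 2
print(round(n_reviews))          # about 296 reviews per group
print(round(n_reviews / 2))      # about 148 manuscripts per group (2 reviews each)
```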
Analysis used independent comparisons of outcome measures between masked
and unmasked reviewers (excluding manuscripts for which 1 reviewer had withheld
consent for unmasking), paired comparisons between blinded and unblinded reviewers,
and independent comparisons between masked unblinded reviewers and uninformed
reviewers, using t tests. Two-way analysis of variance
was used to compare the 2 factors in the 4 intervention arms of the study.
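A minimal sketch of this analysis plan, run on simulated review-level data (the study's own data are not reproduced here; pandas, SciPy, and statsmodels are assumed to be available):

```python
import numpy as np
import pandas as pd
from scipy import stats
import statsmodels.api as sm
from statsmodels.formula.api import ols

rng = np.random.default_rng(0)

# Simulated data: one row per review, 2 reviews (1 blinded, 1 unblinded) per manuscript.
rows = []
for m in range(150):
    masking = rng.choice(["masked", "unmasked"])
    for blinding in ("blinded", "unblinded"):
        rows.append({"manuscript": m, "masking": masking, "blinding": blinding,
                     "score": float(np.clip(rng.normal(2.9, 1.5), 1, 5))})
df = pd.DataFrame(rows)

# Independent comparison: masked vs unmasked reviews (t test).
t_ind, p_ind = stats.ttest_ind(df.loc[df.masking == "masked", "score"],
                               df.loc[df.masking == "unmasked", "score"])

# Paired comparison: blinded vs unblinded reviewer of the same manuscript.
wide = df.pivot(index="manuscript", columns="blinding", values="score")
t_pair, p_pair = stats.ttest_rel(wide["blinded"], wide["unblinded"])

# Two-way analysis of variance on the masking and blinding factors.
model = ols("score ~ C(masking) * C(blinding)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))
print(p_ind, p_pair)
```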
Results
Recruitment and Randomization
Between January and June 1997, an estimated 570 eligible manuscripts
were sent for peer review. Of these, 43 were not entered into the study, either
as a result of an administrative error or because, in the case of 5 pairs
of articles by the same authors, a decision was made that only the first article
would be included. The 527 included manuscripts (92% of those eligible) consisted
of 393 research articles, 74 short reports, and 60 general practice research articles. Of
these 527 manuscripts, 60 were excluded after randomization, either because
it proved impossible to obtain 2 suitable reviews without causing an unacceptable
delay in the editorial decision-making process or because a reviewer who was
randomized to receive a blinded manuscript had the authors' identity revealed
in error. The distribution of short reports, research articles, and general
practice articles was similar for the exclusions and for the total sample.
The remaining 467 manuscripts were randomized to the masked group (n=149),
the unmasked group (n=160), and the uninformed group (n=158). For the 160
manuscripts in the unmasked group, 10 of the 320 reviewers did not give consent
to their identity being revealed. The 10 manuscripts they were reviewing were
transferred to the preference arm (Figure 1). Successful
follow-up was achieved for all 467 manuscripts.
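The reported counts and proportions can be reproduced directly from the figures above:

```python
eligible, not_entered = 570, 43
entered = eligible - not_entered                        # 527 manuscripts
excluded_after_randomization = 60
followed_up = entered - excluded_after_randomization    # 467 manuscripts

print(entered, f"{entered / eligible:.0%}")             # 527, 92% of eligible manuscripts
print(followed_up, f"{followed_up / entered:.0%}")      # 467, 89% of those entered
print(149 + 160 + 158)                                  # 467 across the 3 groups
```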
In order to assess the success of randomization, we compared characteristics
of the manuscripts (geographic origin) and the reviewers (mean age, residence
in North America, postgraduate training in epidemiology or statistics, and
involvement in medical research). There were no striking differences between
groups, and there was no evidence that the exclusions had introduced bias.
Success of Blinding. Of the 309 blinded reviewers, 293 (95%) replied to the question concerning
whether they could identify the authors of the manuscript. With successful
blinding defined as either author not identified or author identified incorrectly,
170 reviewers (58%) were successfully blinded (Table 1). The main reasons given for being able to identify the
author included self-referencing, clues contained within the text of the manuscript,
and a small research field. If successful blinding is extended to include
those who were only partially successful in identifying authorship (for example,
named one author correctly but others incorrectly), then 196 reviewers (67%)
were successfully blinded.
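As a check on the denominators left implicit in these percentages (this reading is an assumption), the figures are consistent with the 293 reviewers who replied being the denominator for both definitions of successful blinding:

```python
blinded_reviewers, replied = 309, 293
success_strict, success_extended = 170, 196

print(f"{replied / blinded_reviewers:.0%}")     # 95% of blinded reviewers replied
print(f"{success_strict / replied:.0%}")        # 58% successfully blinded
print(f"{success_extended / replied:.0%}")      # 67% under the extended definition
```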
Table 1.—Assessment of the Success of Blinding
Extent of a Hawthorne Effect. There was no evidence of any difference between masked unblinded and
uninformed reviewers and therefore no detectable Hawthorne effect (Table 2).
Table 2.—Comparison of Review Quality and Review Time*
Effect of Blinding and Unmasking on Review Quality. The mean total quality score was 2.87. There was little or no difference
in total or item scores between blinded and unblinded reviewers or between
masked and unmasked reviewers (Table 2). The largest difference in mean total score was only 0.14. Although
some of the differences were statistically significant (P<.05), showing that unmasking tended to produce higher-quality
reviews, in absolute terms these differences were not editorially significant.
Although 2-factor analysis of variance revealed a statistically significant
difference between the masked and unmasked groups (P=.04),
absolute differences were of no editorial significance (blinded/unblinded P=.26, interaction P=.14, overall P=.05). Analyses based only on those successfully blinded
(170 reviewers) led to similar results.
Effect of Blinding and Unmasking on Editorial Decision and Review Time. No significant difference was found between the blinded and unblinded
groups or between the masked and unmasked groups in the time taken for the
reviewers to complete their reports (Table
2). Similarly, χ2 tests found no significant differences
in the recommendations regarding publication (publish with minor amendment,
publish with major amendment, reject) between the blinded and unblinded groups
(P=.24, df=2), between the
masked and unmasked groups (P=.65, df=2), or among the 4 intervention arms (P=.66, df=6). Analyses based only on those successfully blinded
led to similar results.
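For illustration, a comparison of this kind can be carried out with a χ2 test on a contingency table; the counts below are hypothetical, not the study's data:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical counts: rows = blinded / unblinded reviewers,
# columns = publish with minor amendment / major amendment / reject.
table = np.array([[40, 120, 140],
                  [45, 115, 140]])

chi2, p, dof, expected = chi2_contingency(table)
print(dof)            # 2 degrees of freedom, as reported for the 2-group comparisons
print(round(p, 2))
```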
Comment
Blinding and unmasking have little effect on the quality of reviews
of manuscripts. Any differences that have statistical significance are too
small to be of any practical significance in editorial decision making. The
only previous randomized trial10 reported higher-quality
reviews when reviewers were blinded. This difference may have arisen either
because the previous study was based on a more specialized journal, in which
reviewers and authors would be more likely to be familiar with one another's
work, or because of differences in the way review quality was assessed (the
psychometric properties of the instrument used in the earlier study are unknown).
Before discussing the implications of these findings, potential methodologic
shortcomings need to be considered. First, can the sample of manuscripts and
their reviewers be considered truly random? There is no evidence of bias at
any stage. However, during the later stages of recruitment it became difficult
to find suitable reviewers for the uninformed arm, because so many reviewers
had already been recruited to the intervention arms; this could be one reason
why we failed to find a Hawthorne effect. Of eligible manuscripts, 92% were recruited,
and 89% of those were successfully followed up. The distribution of the manuscripts
excluded and unavailable for follow-up was similar to that of those followed
up.
Second, the results concerning review quality are completely dependent
on the review quality instrument. This has been validated and has good internal
consistency and interrater and intrarater reliability, and we believe it to
be sufficiently accurate and robust to discriminate between reviews of differing
quality for the purposes of this study. Full details of its development and
validation will be reported elsewhere.
Third, the success rate for blinding is within the range found in previous
studies. Although we were successful with only 58% of reviewers, analyses
based on those actually blinded produced similar results to analyses based
on the intention to blind.
Fourth, the views of authors, which have yet to be analyzed since the
data are still incomplete, may differ from the views of editors. These will
be reported in a subsequent article.
There is little evidence from this study to support changing current
practice by blinding or unmasking to improve the quality of reviews. Blinding
or unmasking might, however, have other advantages in the peer review process,
such as ensuring that the review process is seen to be fair. In view of the
difference between the results of this study and previous research, it is
not possible to generalize from this study to other settings, particularly
the many biomedical journals that are more specialized. Further research should
encompass a wide variety of different types and sizes of journals.
1. Lock S. A Difficult Balance: Editorial Peer Review in Medicine. London, England: Nuffield Provincial Hospitals Trust; 1985.
2. Kassirer JP, Campion EW. Peer review: crude and understudied, but indispensable. JAMA. 1994;272:96-97.
3. Fisher M, Friedman SB, Strauss B. The effects of blinding on acceptance of research papers by peer review. JAMA. 1994;272:143-146.
4. Strasburger VC. Righting medical writing. JAMA. 1985;254:1789-1790.
5. Yankauer A. Peer review again. Am J Public Health. 1982;72:239-240.
6. Shapiro S. The decision to publish: ethical dilemmas. J Chronic Dis. 1985;38:365-372.
7. Ingelfinger FJ. Peer review in biomedical publication. Am J Med. 1974;56:686-692.
8. Robin ED, Burke CM. Peer review in medical journals. Chest. 1987;91:252-255.
9. Feinstein AR. Some ethical issues among editors, reviewers and readers. J Chronic Dis. 1986;39:491-493.
10. McNutt RA, Evans AT, Fletcher RH, Fletcher SW. The effects of blinding on the quality of peer review. JAMA. 1990;263:1371-1376.
11. Laband DN, Piette MJ. A citation analysis of the impact of blinded peer review. JAMA. 1994;272:147-149.
12. Black N, van Rooyen S, Godlee F, Smith R, Evans S. What makes a good reviewer and a good review for a general medical journal? JAMA. 1998;280:231-233.