Context.— All authors may not be equal in the eyes of reviewers. Specifically,
well-known authors may receive less objective (poorer quality) reviews. One
study at a single journal found a small improvement in review quality when
reviewers were masked to author identity.
Objectives.— To determine whether masking reviewers to author identity is generally
associated with higher quality of review at biomedical journals, and to determine
the success of routine masking techniques.
Design and Setting.— A randomized controlled trial performed on external reviews of manuscripts
submitted to Annals of Emergency Medicine, Annals of Internal
Medicine, JAMA, Obstetrics & Gynecology, and Ophthalmology.
Interventions.— Two peers reviewed each manuscript. In one study arm, both peer reviewers
received the manuscript according to usual masking practice. In the other
arm, one reviewer was randomized to receive a manuscript with author identity
masked, and the other reviewer received an unmasked manuscript.
Main Outcome Measure.— Review quality on a 5-point Likert scale as judged by manuscript author
and editor. A difference of 0.5 or greater was considered important.
Results.— A total of 118 manuscripts were randomized, 26 to usual practice and
92 to intervention. In the intervention arm, editor quality assessment was
complete for 77 (84%) of 92 manuscripts. Author quality assessment was complete
for 40 (54%) of 74 manuscripts. Neither editors (mean difference, 0.1; 95% confidence
interval [CI], −0.2 to 0.4) nor authors (mean difference, −0.1;
95% CI, −0.5 to 0.4) perceived a significant difference in quality between
masked and unmasked reviews. We also found no difference in the degree
to which the review influenced the editorial decision (mean difference, −0.1;
95% CI, −0.3 to 0.3). Masking was often unsuccessful (overall, 68% successfully
masked; 95% CI, 58%-77%), although 1 journal had significantly better masking
success than others (90% successfully masked; 95% CI, 73%-98%). Manuscripts
by generally known authors were less likely to be successfully masked (odds
ratio, 0.3; 95% CI, 0.1-0.8). When analysis was restricted to manuscripts
that were successfully masked, review quality as assessed by editors and authors
still did not differ.
Conclusions.— Masking reviewers to author identity as commonly practiced does not
improve quality of reviews. Since manuscripts of well-known authors are more
difficult to mask, and those manuscripts may be more likely to benefit from
masking, the inability to mask reviewers to the identity of well-known authors
may have contributed to the lack of effect.
It has been suggested that masking reviewers to author identity would
improve the fairness and the quality of peer review1
because well-known authors' work may be reviewed less critically. Yet only
a small fraction of journals routinely mask reviewers.2,3
When editors are asked why they do not mask, they cite an "overwhelming burden"
associated with masking.2,3 Some
question whether it is possible to mask successfully.2,4
One study, conducted at a single journal,5
demonstrated that the quality of masked reviews was statistically higher than
that of unmasked reviews, although that difference was small. We tested the
hypothesis that masking peer reviewers to author identity improves the quality
of peer review at 5 biomedical journals. To increase the generalizability
of our study, we used a masking procedure that is commonly practiced.
Five journals participated in the study: Annals of
Emergency Medicine, Annals of Internal Medicine, JAMA, Obstetrics & Gynecology, and Ophthalmology. Only 1 of these journals, Annals of
Emergency Medicine, routinely masks reviewers to author identity.
Manuscript Enrollment. Eligible manuscripts were submitted between November 1995 and March
1996 and met the following inclusion criteria: (1) the manuscript reported
original research, including meta-analyses but excluding case reports or letters,
(2) the manuscript was sent for external peer review, and (3) the authors
did not object to having their manuscripts enrolled. Authors were notified
that their manuscripts would be included in a study of the peer review process
unless they declined, and that declining to have their manuscripts enrolled
would not affect any editorial decisions regarding their manuscript. No authors
Masking Procedure. Each journal followed a standardized masking procedure that involved
removing author and institutional identity from the title page, running headers
or footers, and acknowledgments of the manuscripts. Self-references in the
text were not removed. In addition, the managing editor at Annals of Internal Medicine removed names and journal identification
(but not titles or other reference information) from self-references in the
text and reference section. Annals of Emergency Medicine also stated in their "Information for Authors" that authors should
not include author names in the running heads.
Study Design. At each journal, the editor followed the journal's usual procedure in
selecting manuscripts for review and identified 2 reviewers for each manuscript.
Once reviewers had agreed to review the manuscript, 2 randomizations were
performed. The first was weighted to assign 25% of manuscripts to usual practice.
These manuscripts were reviewed according to the journal's usual practice
(ie, all but those at Annals of Emergency Medicine
were unmasked reviews). The manuscripts randomized to the intervention arm
had 1 of 2 reviewers randomly selected to receive a manuscript from which
the author identity and institution had been removed. The other reviewer received
the manuscript with author and institution identified. Randomization was performed
using random number tables. In all cases, both reviewers were sent a questionnaire
along with a statement that the manuscript was part of a study of peer review
(described herein). The reviewers returned the manuscripts, reviews, and questionnaire
to the journal. Before sending the reviews to editors and authors, the managing
editors at each journal removed from the reviews any information that would
reveal whether the manuscript had been masked. The manuscript editor and the
corresponding author rated the quality of each review, unaware of the group
to which the review had been assigned.
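The two randomizations described above can be sketched as follows. This is a minimal illustration, not the procedure the journals ran: the study used random number tables, whereas this sketch assumes a seeded pseudo-random generator, and the function names and the dictionary layout are hypothetical.

```python
import random


def randomize_manuscript(rng, usual_practice_share=0.25):
    """Two-stage assignment for one manuscript that has two reviewers."""
    # Stage 1: weighted draw; about 25% of manuscripts go to usual practice.
    if rng.random() < usual_practice_share:
        return {"arm": "usual_practice", "masked_reviewer": None}
    # Stage 2: in the intervention arm, pick which of the 2 reviewers
    # receives the masked copy; the other receives the unmasked copy.
    return {"arm": "intervention", "masked_reviewer": rng.choice([1, 2])}


def randomize_all(n_manuscripts, seed=0):
    """Assign every manuscript using one seeded generator (an assumption)."""
    rng = random.Random(seed)
    return [randomize_manuscript(rng) for _ in range(n_manuscripts)]


assignments = randomize_all(118, seed=1)
n_usual = sum(a["arm"] == "usual_practice" for a in assignments)
print("usual practice:", n_usual, "intervention:", len(assignments) - n_usual)
```

Because the first draw is weighted rather than fixed, the realized split will vary around 25% from run to run.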
Questionnaires. Questionnaires were completed by editors, authors, and reviewers. Because
of a miscommunication with the managing editor, no authors at the Annals of Emergency Medicine were sent surveys. Editor and author surveys
each included 4 specific ratings of review quality: how well the review addressed
the clinical or research importance of the study, how well it identified the
study's strengths and weaknesses, whether reviewers were courteous, and whether
reviewers supplied evidence to support their statements. Authors and editors
responded to these questions using a 5-point Likert scale for which a score
of 5 represented the best quality. Authors and editors also were asked to
provide an overall rating of review quality on a 5-point Likert scale. Editors
were asked how much the review had influenced their decision. These questions
were chosen to parallel those used in the prior study of masking.5
To determine masking success, reviewers in the masked group were asked
whether they thought they could identify any of the authors or their institutions,
and if so, to list the authors and institutions. Reviewers in both groups
were asked whether they were familiar with the authors, their previous work,
or the reviewed work.
Analysis. The primary outcome was the difference in quality as assessed by the
editor and author between the masked and unmasked review for each manuscript.
Because this analysis uses a comparison between the quality of a masked and
an unmasked review for each manuscript, a paired t
test was used. The sample size for this analysis was the total number of manuscripts
randomized to the intervention for which masking status and editor's quality
score were complete. A positive difference means that masked reviews were
of better quality than unmasked reviews; a negative difference means that
masked reviews were of lesser quality than unmasked reviews. The Wilcoxon
matched-pairs signed rank test was used to verify the result of the paired t test. A difference of 0.5 or greater on the 5-point
Likert scale was considered editorially important.
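The paired comparison can be illustrated with a short sketch. The ratings below are hypothetical, the function names are ours, and the t critical value is supplied by the caller (here 2.365, the two-sided 95% value for 7 degrees of freedom) rather than computed, so only the standard library is needed.

```python
import math
import statistics


def paired_difference_ci(masked, unmasked, t_crit):
    """Mean paired difference (masked minus unmasked) with a t-based CI.

    t_crit is the two-sided critical value for n - 1 degrees of freedom,
    taken from a t table by the caller.
    """
    if len(masked) != len(unmasked):
        raise ValueError("each manuscript needs one masked and one unmasked score")
    diffs = [m - u for m, u in zip(masked, unmasked)]
    mean_diff = statistics.mean(diffs)
    std_err = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return mean_diff, mean_diff - t_crit * std_err, mean_diff + t_crit * std_err


def excludes_important_gain(ci_upper, threshold=0.5):
    """True when the CI upper bound rules out a gain of `threshold` or more."""
    return ci_upper < threshold


# Hypothetical editor ratings for 8 manuscripts (5 = best quality).
masked = [4, 3, 5, 3, 4, 2, 4, 3]
unmasked = [3, 3, 4, 4, 4, 3, 4, 3]
mean_diff, ci_low, ci_high = paired_difference_ci(masked, unmasked, t_crit=2.365)
print(mean_diff, round(ci_low, 2), round(ci_high, 2))
```

A positive mean difference favors masked reviews. Checking the upper confidence bound against 0.5 mirrors how the editorially important difference is used to interpret the result.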
The secondary outcome was masking success. We defined a review as successfully
masked if the reviewer did not guess the author's identity or if the reviewer
guessed incorrectly. Exact 95% confidence intervals (CIs) were calculated
using the binomial distribution. This analysis did not require a pair of reviews
per manuscript. Thus, the sample size for this analysis was all reviews that
were randomized to be masked, either because of the usual practice at that
journal or because that reviewer was randomized to receive the masking intervention.
We analyzed masking success overall and by journal and tested for differences
across the 5 journals using a χ² test with 4 degrees of freedom. We subsequently
identified 1 journal with a significantly higher success rate by comparing each
journal with all other journals using a χ² test with 1 degree of freedom.
Author renown was determined by whether the randomly unmasked reviewer was
familiar with the author, their prior work, or the current work. Logistic
regression, with variance estimates adjusted for multiple observations per
manuscript, was used to explore whether the higher success of masking at the Annals of Emergency Medicine could be explained by differences
in author renown and to test whether author renown was associated with reduced
A total of 118 manuscripts were randomized, 26 to usual practice and
92 to the intervention. Of those randomized to the intervention, 77 (84%)
of 92 had sufficient data to compare review quality based on the editor's
judgment. Only 40 (54%) of 74 had sufficient data to compare review quality
based on the author's judgment (the denominator for author judgment is smaller
because no authors received surveys from Annals of Emergency
Medicine). Of those reviewers randomized to receive masked manuscripts,
99 (93%) of 106 had sufficiently complete data to evaluate the success of
masking reviewers to author identity.
Review Quality. Editors perceived no significant difference in quality between masked
and unmasked reviews (mean difference, 0.1; 95% CI, −0.2 to 0.4). Results
were similar when editors rated the degree to which the review influenced
their decision (mean difference, −0.1; 95% CI, −0.3 to 0.3). Authors
also perceived no overall difference in quality between masked and unmasked
reviews (mean difference, −0.1; 95% CI, −0.5 to 0.4). Differences
in quality did not vary substantially by journal (Table 1). When the analysis of review quality was restricted to
pairs for which masking was successful, no difference in quality was found.
There were also no significant quality differences between masked and unmasked
reviews for the 4 specific components of quality (data not shown). All results
were confirmed when tested using the nonparametric signed rank test.
Table 1.—Mean Difference in Quality Between Masked and Unmasked Reviews*
Masking Success. Success in masking reviewers to author identity was generally low (68%;
95% CI, 58%-77%) and fairly consistent across all but 1 participating journal, Annals of Emergency Medicine, which achieved a masking
success rate of 90% (95% CI, 73%-98%; Table
2). This corresponds to odds of success 6.5 times that of the other
journals (95% CI, 1.8-23.7). When this journal was excluded from the calculation
of overall masking success, the rate dropped to 58% (95% CI, 45%-70%) and
a χ² test for significant differences among the remaining journals
was not significant.
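The exact binomial confidence intervals reported for masking success can be approximated with the sketch below, which finds the Clopper-Pearson limits by bisection using only the standard library. The counts (27 of 30) are hypothetical: they were chosen only because they give a 90% success rate, and the per-journal counts are not given in this text. With those counts the interval comes out close to the 73% to 98% reported above.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def exact_binomial_ci(k, n, alpha=0.05, iters=60):
    """Clopper-Pearson (exact) two-sided CI for k successes in n trials."""
    # Lower limit: the p at which P(X >= k) equals alpha / 2.
    if k == 0:
        lower = 0.0
    else:
        lo, hi = 0.0, 1.0
        for _ in range(iters):
            mid = (lo + hi) / 2
            upper_tail = 1 - binom_cdf(k - 1, n, mid)  # P(X >= k), rises with p
            if upper_tail < alpha / 2:
                lo = mid
            else:
                hi = mid
        lower = (lo + hi) / 2
    # Upper limit: the p at which P(X <= k) equals alpha / 2.
    if k == n:
        upper = 1.0
    else:
        lo, hi = 0.0, 1.0
        for _ in range(iters):
            mid = (lo + hi) / 2
            if binom_cdf(k, n, mid) > alpha / 2:  # P(X <= k) falls with p
                lo = mid
            else:
                hi = mid
        upper = (lo + hi) / 2
    return lower, upper


# Hypothetical counts: 27 successfully masked reviews out of 30.
lower, upper = exact_binomial_ci(27, 30)
print(round(lower, 2), round(upper, 2))
```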
Table 2.—Success of Masking Reviewers to Author Identity*
Author Renown. Manuscripts by authors with whom the unmasked reviewer was familiar
(n=43) were less likely to be successfully masked (53%) (that is, the masked
reviewer was more likely to correctly guess author identity) than those of
authors who were not known to the unmasked reviewer (79%; P=.008; odds ratio [OR], 0.3; 95% CI, 0.1-0.8).
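As a rough arithmetic check, the odds ratio can be recovered from the two masking success rates just quoted. This is a crude calculation from rounded percentages; the reported estimate and its confidence interval come from logistic regression with variance adjusted for multiple observations per manuscript, which this sketch does not reproduce. The function names are ours.

```python
def odds(p):
    """Convert a probability to odds."""
    return p / (1 - p)


def odds_ratio(p_group, p_reference):
    """Ratio of the odds in one group to the odds in the reference group."""
    return odds(p_group) / odds(p_reference)


# Masking success: 53% when the author was known to the unmasked reviewer,
# 79% when the author was not known.
or_known_author = odds_ratio(0.53, 0.79)
print(round(or_known_author, 1))  # prints 0.3
```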
We also considered whether author renown explained the apparent discrepancy
in masking success rates between the Annals of Emergency
Medicine and the other participating journals. When an indicator variable
for Annals of Emergency Medicine was used to predict
masking success in a logistic model, the unadjusted OR was 5.9 (95% CI, 1.6-21.1).
When a multivariate logistic model was used to adjust for author renown, the
OR was reduced but remained significant (OR, 4.8; 95% CI, 1.2-19.3).
McNutt and colleagues5 found that reviews
of masked manuscripts were of marginally higher quality than reviews of unmasked
manuscripts (3.5 vs 3.1 on a 5-point scale). However, they recognized that
this difference was small. We believed it was too small to justify the added
time and cost of masking. We sought to determine whether the difference in
quality due to masking, studied among several journals, might be larger. In
this first multijournal study of peer review at biomedical journals, our 95%
CI excluded an overall improvement in peer review quality of 0.5 or greater
on a 5-point Likert scale, the difference we considered editorially significant.
Poor overall masking success, in combination with the observation that
an author's renown is strongly associated with masking failure, is a possible
explanation for this finding. Notably, the average masking success in our
study was similar to that achieved by McNutt et al5
and Yankauer6 and was obtained using commonly
used procedures that are generalizable to standard journal practices.
The participation of multiple journals, sufficient overall sample size,
and a feasible method of masking author identity make it likely that these
findings are valid for most biomedical journals. However, our study was limited
to biomedical journals and our sample size was too small to eliminate the
possibility of an editorially important difference in quality for individual
journals. Although we achieved a good response rate for editor's quality ratings
(84%), the rate for authors was low (54%). Thus, our conclusions for author
evaluations may not generalize well. Additionally, if the major improvement
in review quality provided by masking would be expected to occur for manuscripts
of renowned authors, then we cannot exclude the possibility that increasing
the rate of successfully masking renowned authors could improve review quality.
Finally, reviewers were aware that they were participating in a study. The
effect of such knowledge on the quality of review is unknown.
Our study did not directly address the question of whether masking improves
fairness. However, if masking is frequently unsuccessful, especially among
well-known authors, it is not likely to improve fairness, no matter how fairness
might be defined. The only potential benefit to a policy of masking that is
largely unsuccessful is the appearance of fairness.
We conclude that masking reviewers to author identity as commonly practiced
does not improve review quality. Further, masking as commonly practiced fails
to hide the identity of renowned authors and therefore may also fail to improve
the fairness of review. Techniques to improve masking success are needed to
determine whether masking the identity of renowned authors improves review
quality or fairness.
1. Fletcher RH, Fletcher SW. Evidence for the effectiveness of peer review. Sci Eng Ethics. 1997;3:35-43.
2. Cleary JD, Alexander B. Blind versus nonblind review: survey of selected medical journals. Drug Intell Clin Pharm. 1988;22:601-602.
3. Pitkin RM. Blinded manuscript review: an idea whose time has come? Obstet Gynecol. 1995;85:781-782.
4. Moossy J, Moossy YR. Anonymous authors, anonymous referees: an editorial exploration. J Neuropathol Exp Neurol. 1985;44:225-228.
5. McNutt RA, Evans AT, Fletcher RH, Fletcher SW. The effects of blinding on the quality of peer review. JAMA. 1990;263:1371-1376.
6. Yankauer A. How blind is blind review? Am J Public Health. 1991;81:843-845.