Peer Review Congress
July 15, 1998

Effect of Blinding and Unmasking on the Quality of Peer Review: A Randomized Trial

Author Affiliations

From BMJ (Ms van Rooyen and Drs Godlee and Smith) and the London School of Hygiene and Tropical Medicine (Mr Evans and Dr Black), London, England.

JAMA. 1998;280(3):234-237. doi:10.1001/jama.280.3.234
Abstract

Context.— Little research has been conducted into the quality of peer review and, in particular, the effects of blinding peer reviewers to authors' identities or masking peer reviewers' identities.

Objective.— To determine whether concealing authors' identities from reviewers (blinding) and/or revealing the reviewer's identity to a coreviewer (unmasking) affects the quality of reviews, the time taken to carry out reviews, and the recommendation regarding publication.

Design and Setting.— Randomized trial of 527 consecutive manuscripts submitted to BMJ, which were randomized and each sent to 2 peer reviewers.

Interventions.— Manuscripts were randomized as to whether the reviewers were unmasked, masked, or uninformed that a study was taking place. Two reviewers for each manuscript were randomized to receive either a blinded or an unblinded version.

Main Outcome Measures.— Mean total quality score, time taken to carry out the review, and recommendation regarding publication.

Results.— Of the 527 manuscripts entered into the study, 467 (89%) were successfully randomized and followed up. The mean total quality score was 2.87. There was little or no difference in review quality between the masked and unmasked groups (scores of 2.82 and 2.96, respectively) and between the blinded and unblinded groups (scores of 2.87 and 2.90, respectively). There was no apparent Hawthorne effect. There was also no significant difference between groups in the recommendations regarding publication or time taken to review.

Conclusions.— Blinding and unmasking made no editorially significant difference to review quality, reviewers' recommendations, or time taken to review. Other considerations should guide decisions as to the form of peer review adopted by a journal, and improvements in the quality of peer review should be sought via other means.

PEER REVIEW has a key role in determining which original research is published and thus becomes part of the accepted body of scientific knowledge. Despite its central role, little research has been conducted into the relative benefits or effectiveness of different approaches to peer review.1,2 We decided to examine 2 questions: what are the effects of blinding reviewers to the identity of the authors of a manuscript and of unmasking (revealing the identity of) reviewers to their coreviewers?

There are several reasons for believing that blinding may be beneficial. First, blinded reviewers may provide less biased reviews.3 Second, some editors believe blinding improves the quality of reviews,1,4-9 a belief supported by one small randomized controlled trial.10 Finally, articles that appear in journals that use blinded review are more likely to be cited than those published in journals that use nonblinded review.11 Unmasking the identity of reviewers to one another has not previously been studied, although many journals already carry out the practice and believe it to result in higher-quality reviewing.

By means of a randomized trial, we set out to evaluate the effects of blinding (concealing the identity of authors from a reviewer), unmasking (revealing the identity of a reviewer to a coreviewer), and a combination of the 2 on the quality of reviews. The study also sought to establish the feasibility of successful blinding.

Method

Consecutive manuscripts in the categories of research articles, short reports, and research articles from general practice (also known as family medicine or primary care) received by BMJ and sent by editors for peer review between January and June 1997 were eligible for inclusion. Manuscripts were randomized (stratified by the 3 above-mentioned categories) into 1 of 3 groups: 2 intervention groups (masked and unmasked) and an uninformed group (Figure 1). The randomization process was undertaken by a researcher who was independent of the editorial decision-making process, using a computerized minimization program with a random component. Each manuscript was sent to 2 paid clinical reviewers selected by whichever of the 11 editors was responsible for the particular manuscript. In both the masked and unmasked groups, the reports of pairs of reviewers were exchanged. Reviewers in the unmasked group were asked to consent to their identity being revealed to their coreviewer. Ethical approval was granted by a university ethics committee.
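The article names the allocation method (minimization with a random component, stratified by manuscript category) but gives no further detail. The following is a minimal sketch of how such a scheme can work, not the program used in the study. The function name `assign_arm`, the 0.8 probability of following the minimization rule, the tie-breaking rule, and the seed are all assumptions made for illustration.

```python
import random
from collections import Counter

ARMS = ("masked", "unmasked", "uninformed")

def assign_arm(category, counts, rng, p_best=0.8):
    """Pick an arm for one manuscript.

    counts maps (category, arm) -> manuscripts already allocated.
    With probability p_best the least-filled arm within this category is
    chosen (ties broken at random); otherwise any arm is chosen at random.
    """
    fewest = min(counts[(category, arm)] for arm in ARMS)
    best = [arm for arm in ARMS if counts[(category, arm)] == fewest]
    arm = rng.choice(best) if rng.random() < p_best else rng.choice(ARMS)
    counts[(category, arm)] += 1
    return arm

rng = random.Random(1997)  # fixed seed (assumed) so the sketch is reproducible
counts = Counter()

# Category sizes as reported for the 527 entered manuscripts.
categories = ["research"] * 393 + ["short report"] * 74 + ["general practice"] * 60
rng.shuffle(categories)  # arrival order is unknown; shuffled for illustration
for category in categories:
    assign_arm(category, counts, rng)

totals = {arm: sum(counts[(c, arm)] for c in set(categories)) for arm in ARMS}
print(totals)
```

Keeping the counts per category is what makes the allocation stratified: balance is sought within research articles, short reports, and general practice articles separately, while the random component stops the next allocation from being fully predictable.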

Randomization of manuscripts to masked and unmasked review. Each manuscript was sent to 2 reviewers, 1 blinded and 1 unblinded to author identity (except for reviewers in the uninformed arm, who all received unblinded versions).

After the manuscripts had been randomly allocated, the reviewers in the masked and unmasked groups were randomized to receive either a blinded or an unblinded version of the manuscript. Blinding consisted of removing authors' details from the title page and acknowledgments. No attempt was made to remove authors' details from within the text of the manuscript, the illustrations, or the references. Blinded reviewers were asked whether they thought they knew the identity of the author(s), and if so, to detail the name(s) and/or the institution and to explain why they thought they could tell. All reviewers in the intervention groups were also asked to record how long they spent on the review and their recommendation regarding publication of the manuscript. If 1 of the 2 reviewers of a manuscript in the unmasked group withheld consent, the manuscript was transferred into a preference arm, and the reviewer's identity was kept concealed.

Since awareness of being in a study might affect the reviewers' behavior, an uninformed group was included that allowed us to test for a Hawthorne effect. Manuscripts in the uninformed group were sent to 2 reviewers who were not informed that a study was taking place. Care was taken that those who had reviewed manuscripts in the masked or unmasked group were not subsequently selected to review manuscripts in the uninformed group. At no stage were editors or authors aware of the group to which a manuscript had been allocated.

On receipt of both reviews of a manuscript, the reviews, with authors' details removed from them, were passed together with the manuscript to the responsible editor, who was asked to assess the quality of the reviews. All the documents were subsequently returned to the researcher, who passed the manuscript to a second editor randomly selected from the remaining 10 editors taking part in the study for a second, independent evaluation. The quality of the reviews was assessed using a validated review quality instrument developed from an instrument used in a previous study.10 A decision on whether to publish the article was made in the journal's usual manner. At least 10 days after the decision had been communicated to the authors, the corresponding author was asked to evaluate the 2 reviews using the review quality instrument. The authors were also asked whether they thought each reviewer had been blinded to their identity.

The review quality instrument consisted of 7 items (importance of the research question, originality, methodology, presentation, constructiveness of comments, substantiation of comments, interpretation of results), each scored on a 5-point Likert scale (1=poor, 5=excellent). A total score was based on the mean of the 7 item scores. A full version of the instrument has been reported elsewhere.12 In addition, a global item seeking an overall assessment of the quality of the review was included. The quality of each review was based on the means of the 2 editors' scores for each item and total score and on the corresponding author's scores. This article considers only the editors' assessments of review quality. The means of 2 editors' scores were used to improve the reliability of the method. Data collection from authors is still continuing and will be reported later. Two additional outcome measures were used: the time taken to carry out the review and the editorial decision (accept, revise, reject).
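The scoring rule described above (7 items on a 1 to 5 scale, a total equal to the mean of the 7 items, and each review's score taken as the mean of 2 editors' ratings) can be sketched as follows. The short item keys, the function names, and the example ratings are hypothetical; the ratings are not data from the study.

```python
from statistics import mean

ITEMS = (
    "importance", "originality", "methodology", "presentation",
    "constructiveness", "substantiation", "interpretation",
)

def total_score(item_scores):
    """Mean of the 7 item scores, each on a 1-5 scale."""
    for item in ITEMS:
        if not 1 <= item_scores[item] <= 5:
            raise ValueError(f"{item} must be between 1 and 5")
    return mean(item_scores[item] for item in ITEMS)

def review_quality(editor_a, editor_b):
    """Average the two editors' ratings item by item, then take the total."""
    per_item = {item: mean((editor_a[item], editor_b[item])) for item in ITEMS}
    return per_item, total_score(per_item)

# Illustrative ratings only; these are not data from the study.
editor_a = dict(zip(ITEMS, (3, 2, 4, 3, 3, 2, 4)))
editor_b = dict(zip(ITEMS, (2, 2, 3, 3, 4, 3, 3)))
per_item, total = review_quality(editor_a, editor_b)
print(per_item)
print(round(total, 2))
```

Because the total is a plain mean, averaging the editors item by item and then taking the total gives the same number as averaging the two editors' totals.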

It was calculated in advance that in order to detect an editorially significant difference in review quality scores of 0.4 (α=.05, β=.10, SD=1.5), 148 manuscripts would be required in each of the masked, unmasked, and uninformed groups. Recruitment of manuscripts was continued until we were certain of retaining at least 148 in each group after taking account of exclusions and losses after randomization.
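The article does not say which formula produced the figure of 148. As a plausibility check only, the sketch below applies the standard normal-approximation formula n = ((z for 1-α/2) + (z for 1-β))² × (SD/difference)²; the choice of formula and the function name `n_required` are assumptions. With the stated inputs the one-sample (paired) form gives 148, and the two-independent-groups form gives 296, which equals 148 manuscripts with 2 reviews each.

```python
import math
from statistics import NormalDist

def n_required(delta, sd, alpha=0.05, beta=0.10, two_sample=False):
    """Normal-approximation sample size for detecting a mean difference delta.

    two_sample=False: one-sample or paired design (n observations or pairs).
    two_sample=True:  two independent groups (n per group).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_beta = NormalDist().inv_cdf(1 - beta)        # power = 1 - beta
    n = ((z_alpha + z_beta) * sd / delta) ** 2
    if two_sample:
        n *= 2
    return math.ceil(n)

print(n_required(0.4, 1.5))                   # 148
print(n_required(0.4, 1.5, two_sample=True))  # 296
```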

Analysis used independent comparisons of outcome measures between masked and unmasked reviewers (excluding manuscripts for which 1 reviewer had withheld consent for unmasking), paired comparisons between blinded and unblinded reviewers, and independent comparisons between masked unblinded reviewers and uninformed reviewers, using t tests. Two-way analysis of variance was used to compare the 2 factors in the 4 intervention arms of the study.
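Because each manuscript in the intervention groups had 1 blinded and 1 unblinded reviewer, the blinded versus unblinded comparison is paired within manuscript. A minimal sketch of a paired t statistic is given below; the function name `paired_t` and the scores are hypothetical and are not data from the study.

```python
import math
from statistics import mean, stdev

def paired_t(x, y):
    """Return (t statistic, degrees of freedom) for paired samples x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [a - b for a, b in zip(x, y)]
    se = stdev(diffs) / math.sqrt(len(diffs))  # standard error of the mean difference
    return mean(diffs) / se, len(diffs) - 1

# Illustrative total quality scores for 5 manuscripts; not study data.
blinded = [3.0, 2.5, 3.5, 2.0, 3.0]
unblinded = [2.5, 2.5, 3.0, 2.5, 2.0]
t_stat, df = paired_t(blinded, unblinded)
print(round(t_stat, 3), df)
```

The statistic would then be compared with a t distribution on the returned degrees of freedom. Pairing removes between-manuscript variation from the comparison, which is why it is used here instead of an independent-samples test.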

Results

Recruitment and Randomization

Between January and June 1997, an estimated 570 eligible manuscripts were sent for peer review. Of these, 43 were not entered into the study, either as a result of an administrative error or because, in the case of 5 pairs of articles by the same authors, a decision was made that only the first article would be included. The 527 manuscripts (92%) included consisted of 393 research articles, 74 short reports, and 60 general practice research articles. Of these 527 manuscripts, 60 were excluded after randomization, either because it proved impossible to obtain 2 suitable reviews without causing an unacceptable delay in the editorial decision-making process or because a reviewer who was randomized to receive a blinded manuscript had the authors' identity revealed in error. The distribution of short reports, research articles, and general practice articles was similar for the exclusions and for the total sample.

The remaining 467 manuscripts had been randomized to the masked group (n=149), the unmasked group (n=160), and the uninformed group (n=158). For the 160 manuscripts in the unmasked group, 10 of the 320 reviewers did not give consent to their identity being revealed. These 10 manuscripts were included in the preference arm (Figure 1). Successful follow-up was achieved for all 467 manuscripts.
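The recruitment figures above can be checked for internal consistency with a few lines of arithmetic. The variable names are illustrative; all counts are taken from the text.

```python
eligible = 570
not_entered = 43
entered = eligible - not_entered                        # 527
by_category = {"research": 393, "short report": 74, "general practice": 60}
excluded_after_randomization = 60
followed_up = entered - excluded_after_randomization    # 467
arms = {"masked": 149, "unmasked": 160, "uninformed": 158}

assert sum(by_category.values()) == entered
assert sum(arms.values()) == followed_up
assert min(arms.values()) >= 148  # target from the sample size calculation

print(f"entered: {entered}/{eligible} = {entered / eligible:.0%}")           # 92%
print(f"followed up: {followed_up}/{entered} = {followed_up / entered:.0%}")  # 89%
```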

To assess the success of randomization, we compared characteristics of the manuscripts (geographic origin) and the reviewers (mean age, residence in North America, postgraduate training in epidemiology or statistics, involvement in medical research). There were no striking differences between groups. Exclusions did not introduce any bias.

Success of Blinding. Of the 309 blinded reviewers, 293 (95%) replied to the question concerning whether they could identify the authors of the manuscript. With successful blinding defined as either author not identified or author identified incorrectly, 170 reviewers (58%) were successfully blinded (Table 1). The main reasons given for being able to identify the author included self-referencing, clues contained within the text of the manuscript, and a small research field. If successful blinding is extended to include those who were only partially successful in identifying authorship (for example, named one author correctly but others incorrectly), then 196 reviewers (67%) were successfully blinded.
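The blinding percentages are consistent with using the 293 reviewers who answered the question as the denominator, not all 309 blinded reviewers. The sketch below makes that explicit. It assumes that the 309 figure arises from 1 blinded reviewer for each of the 149 masked and 160 unmasked manuscripts; the variable names are illustrative.

```python
blinded_reviewers = 149 + 160  # assumed: 1 blinded reviewer per intervention manuscript
responded = 293
strict = 170    # author not identified, or identified incorrectly
lenient = 196   # also counts partially correct identifications

print(f"response rate: {responded / blinded_reviewers:.0%}")              # 95%
print(f"strict definition: {strict / responded:.0%}")                     # 58%
print(f"lenient definition: {lenient / responded:.0%}")                   # 67%
print(f"strict, over all blinded: {strict / blinded_reviewers:.0%}")      # 55%
```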

Table 1.—Assessment of the Success of Blinding

Extent of a Hawthorne Effect. There was no evidence of any difference between masked unblinded and uninformed reviewers and therefore no detectable Hawthorne effect (Table 2).

Table 2.—Comparison of Review Quality and Review Time*

Effect of Blinding and Unmasking on Review Quality. The mean total quality score was 2.87. There was little or no difference in total or item scores between blinded and unblinded reviewers or between masked and unmasked reviewers (Table 2). The largest difference in mean total score was only 0.14. Although some of the differences were statistically significant (P<.05), showing that unmasking tended to produce higher-quality reviews, in absolute terms these differences were not editorially significant. Although 2-factor analysis of variance revealed a statistically significant difference between the masked and unmasked groups (P=.04), absolute differences were of no editorial significance (blinded/unblinded P=.26, interaction P=.14, overall P=.05). Analyses based only on those successfully blinded (170 reviewers) led to similar results.

Effect of Blinding and Unmasking on Editorial Decision and Review Time. No significant difference was found between the blinded and unblinded groups or between the masked and unmasked groups in the time taken for the reviewers to complete their reports (Table 2). Similarly, χ² tests found no significant differences in the recommendations regarding publication (publish with minor amendment, publish with major amendment, reject) between the blinded and unblinded groups (P=.24, df=2), between the masked and unmasked groups (P=.65, df=2), or among the 4 intervention arms (P=.66, df=6). Analyses based only on those successfully blinded led to similar results.
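The degrees of freedom reported above follow from the table shapes: 2 groups by 3 recommendations gives (2-1)×(3-1)=2, and 4 arms by 3 recommendations gives (4-1)×(3-1)=6. A minimal sketch of the Pearson χ² calculation is shown below. The function name `chi_square` and the counts are hypothetical; the counts are not data from the study.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

# Hypothetical counts per row: minor amendment, major amendment, reject.
two_groups = [[30, 60, 60], [35, 55, 60]]
four_arms = [[15, 30, 30], [15, 30, 30], [18, 27, 30], [17, 28, 30]]

print(chi_square(two_groups))
print(chi_square(four_arms)[1])
```

The statistic would then be referred to a χ² distribution with the returned degrees of freedom to obtain a P value.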

Comment

Blinding and unmasking have little effect on the quality of reviews of manuscripts. Any differences that have statistical significance are too small to be of any practical significance in editorial decision making. The only previous randomized trial10 reported higher-quality reviews when reviewers were blinded. This difference may have arisen either because the previous study was based on a more specialized journal, in which reviewers and authors would be more likely to be familiar with one another's work, or because of differences in the way review quality was assessed (the psychometric properties of the instrument used in the earlier study are unknown).

Before discussing the implications of these findings, potential methodologic shortcomings need to be considered. First, can the sample of manuscripts and their reviewers be considered truly random? There is no evidence of bias at any stage. However, so many reviewers had already been recruited to the intervention arms that suitable reviewers for the uninformed arm became difficult to find during the latter stages of recruitment, which could have been one reason why we failed to find a Hawthorne effect. Of eligible manuscripts, 92% were recruited, and 89% of those were successfully followed up. The distribution of the manuscripts excluded and unavailable for follow-up was similar to that of those followed up.

Second, the results concerning review quality are completely dependent on the review quality instrument. This has been validated and has good internal consistency and interrater and intrarater reliability, and we believe it to be sufficiently accurate and robust to discriminate between reviews of differing quality for the purposes of this study. Full details of its development and validation will be reported elsewhere.

Third, the success rate for blinding is within the range found in previous studies. Although we were successful with only 58% of reviewers, analyses based on those actually blinded produced similar results to analyses based on the intention to blind.

Fourth, the views of authors, which have yet to be analyzed since the data are still incomplete, may differ from the views of editors. These will be reported in a subsequent article.

There is little evidence from this study to support changing current practice by blinding or unmasking to improve the quality of reviews. Blinding or unmasking might, however, have other advantages in the peer review process, such as ensuring that the review process is seen to be fair. In view of the difference between the results of this study and previous research, it is not possible to generalize from this study to other settings, particularly the many biomedical journals that are more specialized. Further research should encompass a wide variety of different types and sizes of journals.

References
1. Lock S. A Difficult Balance: Editorial Peer Review in Medicine. London, England: Nuffield Provincial Hospitals Trust; 1985.
2. Kassirer JP, Campion EW. Peer review: crude and understudied, but indispensable. JAMA. 1994;272:96-97.
3. Fisher M, Friedman SB, Strauss B. The effects of blinding on acceptance of research papers by peer review. JAMA. 1994;272:143-146.
4. Strasburger VC. Righting medical writing. JAMA. 1985;254:1789-1790.
5. Yankauer A. Peer review again. Am J Public Health. 1982;72:239-240.
6. Shapiro S. The decision to publish: ethical dilemmas. J Chronic Dis. 1985;38:365-372.
7. Ingelfinger FJ. Peer review in biomedical publication. Am J Med. 1974;56:686-692.
8. Robin ED, Burke CM. Peer review in medical journals. Chest. 1987;91:252-255.
9. Feinstein AR. Some ethical issues among editors, reviewers and readers. J Chronic Dis. 1986;39:491-493.
10. McNutt RA, Evans AT, Fletcher RH, Fletcher SW. The effects of blinding on the quality of peer review. JAMA. 1990;263:1371-1376.
11. Laband DN, Piette MJ. A citation analysis of the impact of blinded peer review. JAMA. 1994;272:147-149.
12. Black N, van Rooyen S, Godlee F, Smith R, Evans S. What makes a good reviewer and a good review for a general medical journal? JAMA. 1998;280:231-233.