Context Peer review should evaluate the merit and quality of abstracts but may be biased by geographic location or institutional prestige. The effectiveness of blinded peer review at reducing bias is unknown.
Objective To evaluate the effect of blinded review on the association between abstract characteristics and likelihood of abstract acceptance at a national research meeting.
Design and Setting All abstracts submitted to the American Heart Association's annual Scientific Sessions research meeting from 2000-2004. Abstract review included the author's name and institution (open review) from 2000-2001, and this information was concealed (blinded review) from 2002-2004. Abstracts were categorized by country, primary language, institution prestige, author sex, and government and industry status.
Main Outcome Measure Likelihood of abstract acceptance during open and blinded review, by abstract characteristics.
Results The mean number of abstracts submitted each year for evaluation was 13 455 and 28.5% were accepted. During open review, 40.8% of US and 22.6% of non-US abstracts were accepted (relative risk [RR], 1.81; 95% confidence interval [CI], 1.75-1.88), whereas during blinded review, 33.4% of US and 23.7% of non-US abstracts were accepted (RR, 1.41; 95% CI, 1.37-1.45; P<.001 for comparison between peer review periods). Among non-US abstracts, during open review, 31.1% from English-speaking countries and 20.9% from non-English-speaking countries were accepted (RR, 1.49; 95% CI, 1.39-1.59), whereas during blinded review, 28.8% and 22.8% of abstracts were accepted, respectively (RR, 1.26; 95% CI, 1.19-1.34; P<.001). Among abstracts from US academic institutions, during open review, 51.3% from highly prestigious and 32.6% from nonprestigious institutions were accepted (RR, 1.57; 95% CI, 1.48-1.67), whereas during blinded review, 38.8% and 29.0% of abstracts were accepted, respectively (RR, 1.34; 95% CI, 1.26-1.41; P<.001).
Conclusions This study provides evidence of bias in the open review of abstracts, favoring authors from the United States, English-speaking countries outside the United States, and prestigious academic institutions. Moreover, blinded review at least partially reduced reviewer bias.
Peer review of research should be based solely on scientific merit and research quality. Bias occurs when a review is influenced by other criteria, such as geographic location or institutional prestige. Several studies have examined the likelihood of acceptance of openly reviewed manuscripts or brief reports to journals by author characteristics. They found differences in acceptance between authors from prestigious and nonprestigious institutions1 and between submissions from within and outside the United States,2-4 but no differences between male and female authors.3,5 However, these studies could not distinguish whether the differences reflected quality or bias.
Blinded peer review, in which the author's identity and institutional affiliation are concealed from the reviewer, is commonly used to reduce reviewer bias.6,7 However, blinded review is variably used by journals and scientific meetings.8-10 The reluctance to adopt blinded review may, in part, be because little is known about the effectiveness of blinded review at reducing bias.
Our objective was to evaluate the effect of blinded review on the likelihood of abstract acceptance to the American Heart Association's Scientific Sessions, an annual meeting attended by more than 30 000 health care professionals that includes the presentation of nearly 4000 research abstracts. We hypothesized that certain characteristics would be associated with a greater likelihood of abstract acceptance during open review than during blinded review, providing evidence of both reviewer bias during open review and the effectiveness of blinded review at reducing bias.
In 2000 and 2001, abstracts submitted to the American Heart Association's Scientific Sessions were reviewed openly: the author's name and institution were included with the abstract for evaluation. However, in 2002, after membership stimulated an internal debate about the influence of reviewer bias, the review policy was changed. From 2002 through 2004, abstracts were reviewed blindly, concealing the author's name and institution. This policy change presented a unique opportunity to study the effect of blinded review. Using American Heart Association databases created each year to track abstract submissions from 2000 through 2004, all submitted electronically, we conducted a retrospective analysis of all submitted abstracts using a pre-post design. Yale University Human Investigation Committee approval was obtained prior to the study.
Each abstract submitted to the Scientific Sessions was independently evaluated by 8 to 10 reviewers. During the study period, reviewers scored abstracts from 1 to 10 (1 = poor, 10 = excellent). Reviewers were instructed to evaluate an abstract's scientific merit and research quality based on the following: organization, practicality, presentation, and technical quality. Furthermore, the reviewers were guided to score 25% of abstracts 8 or greater (“must/should accept”) and another 10% to 15% equal to 7 (“accept only if space”). There was no predetermined acceptance rate because it varies slightly from year to year, reflecting convention center size and scheduling logistics. Each reviewer evaluated 100 or more abstracts within a research category. The research categories were consistent from year to year, numbering approximately 100, and were distributed among 21 cardiology subspecialties within the basic, clinical, and population sciences. Finally, throughout the study period, reviewers were instructed to recuse themselves from evaluating abstracts recognized from their own institution or if there were other conflicts of interest.
All abstracts were categorized by several characteristics, using the first name and institution of the corresponding author. Abstracts were categorized by country as being from the United States or elsewhere. Non-US abstracts were categorized by the country's official language as English or non-English.11 Countries whose official language is English and from which abstracts were received included the following: Antigua and Barbuda, Australia, Canada, Grenada, India, Ireland, New Zealand, South Africa, Trinidad and Tobago, and the United Kingdom (England, Scotland, and Wales).
All US abstracts from academic institutions were categorized by institution prestige. For this categorization, we created a composite score based on the mean monetary value of research and training grants and contracts funded by the National Institutes of Health (NIH) for fiscal years 2000 through 200412 and the mean “heart and heart surgery” hospital rankings by US News & World Report from 2000 through 2004.13 These scores reflect investigative success and clinical reputation. An institution was given 2 points for receiving mean NIH awards exceeding $300 million or 1 point for exceeding $100 million. An institution was given 2 points for having an affiliated hospital with a mean ranking by US News & World Report in the top 10 for heart and heart surgery facilities or 1 point for being in the top 30. Each institution received a total score from 0 to 4. Based on the abstract acceptance distribution during open peer review, institutions were subsequently categorized as highly prestigious for scoring 3 or 4 points (n = 12) or moderately prestigious for scoring 1 or 2 points (n = 41), and the remainder were categorized as nonprestigious.
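The composite prestige score described above can be sketched as a small scoring function. The point thresholds (mean NIH awards exceeding $300 million or $100 million; mean US News & World Report ranking in the top 10 or top 30) and the 3-level categorization come from the text; the function name, parameter names, and the treatment of unranked hospitals are illustrative assumptions, not the authors' actual implementation.

```python
def prestige_category(mean_nih_awards_usd, mean_usnews_rank):
    """Composite institution prestige score, as described in the Methods.

    mean_nih_awards_usd: mean value of NIH research/training grants and
        contracts, fiscal years 2000-2004, in US dollars.
    mean_usnews_rank: mean US News & World Report "heart and heart surgery"
        ranking, 2000-2004 (None if the affiliated hospital was unranked;
        this handling is an assumption).
    """
    score = 0
    # NIH funding component: 2 points for >$300 million, 1 for >$100 million.
    if mean_nih_awards_usd > 300e6:
        score += 2
    elif mean_nih_awards_usd > 100e6:
        score += 1
    # Hospital ranking component: 2 points for top 10, 1 for top 30.
    if mean_usnews_rank is not None:
        if mean_usnews_rank <= 10:
            score += 2
        elif mean_usnews_rank <= 30:
            score += 1
    # Total score ranges from 0 to 4 and maps to 3 categories.
    if score >= 3:
        return "highly prestigious"
    if score >= 1:
        return "moderately prestigious"
    return "nonprestigious"
```

Under this rule, an institution with both large NIH funding and a top-10 hospital scores 3 or 4 and is categorized as highly prestigious, matching the 12 institutions the authors identified.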
All US abstracts were categorized by author sex as male, female, or uncertain. These categorizations were intended to capture first names whose sex would be clear to the predominance of reviewers. For example, "David" and "Susan" were thought to be easily assigned as male and female, respectively. In contrast, "Sydney," "Biykem," and "Tomoyuki" were thought not to be easily assigned and were categorized as uncertain. The 200 most frequent author first names were categorized independently by 2 investigators, with 95.5% agreement; disagreements were resolved by consensus. Non-US abstracts were not categorized by sex because of the high proportion of uncertain categorizations.
The institution type for all US abstracts was categorized as academic, government agency, or industry. Abstracts were received from several US government agencies, but they were primarily from the NIH or the Centers for Disease Control and Prevention. Abstracts were received from a wide array of private corporations, predominantly from the pharmaceutical and biotechnology industries. We were unable to perform an analysis of study sponsorship because the submission process did not require information detailing receipt of public or private funding.
The main outcome measure was abstract acceptance for presentation. We categorized the peer review period as open (2000-2001) or blinded (2002-2004). We used descriptive statistics to summarize the total number of abstracts submitted and the overall proportion accepted, as well as the distribution by abstract characteristics. We then assessed the relative risk (RR) of acceptance within categories (eg, US vs non-US abstracts) during open and blinded review. Finally, we used the Breslow-Day test for homogeneity to examine whether the RR of acceptance differed between open and blinded review. We performed exploratory subgroup analyses by submission category (basic, clinical, or population sciences) for all abstract characteristics except government or industry status because of small sample sizes. Data analysis was performed using SAS version 9.1 (SAS Institute Inc, Cary, NC). All statistical tests were 2-tailed. The a priori level of significance was set at P<.05.
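The core calculation, a relative risk with its 95% confidence interval from the standard log-RR variance, can be sketched as follows. The counts in the usage note are illustrative, not the study's data, and the comparison function uses a simple z-test on the two log RRs as a stand-in for the Breslow-Day homogeneity test the authors actually ran in SAS.

```python
import math

def relative_risk(acc1, n1, acc2, n2, z=1.96):
    """RR of acceptance for group 1 vs group 2, with a 95% CI
    computed on the log scale (standard log-RR variance)."""
    rr = (acc1 / n1) / (acc2 / n2)
    se = math.sqrt(1/acc1 - 1/n1 + 1/acc2 - 1/n2)
    return rr, rr * math.exp(-z * se), rr * math.exp(z * se)

def compare_log_rr(rr1, se1, rr2, se2):
    """Two-sided z-test that two log RRs differ (a simpler
    alternative to the Breslow-Day test used in the study)."""
    z = (math.log(rr1) - math.log(rr2)) / math.sqrt(se1**2 + se2**2)
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p
```

For example, if 40 of 100 abstracts in one group and 20 of 100 in another were accepted, `relative_risk(40, 100, 20, 100)` gives an RR of 2.0 with a CI excluding 1, and `compare_log_rr` would then test whether that RR changed between the open and blinded periods.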
The mean number of abstracts submitted each year for evaluation was 13 455 (range, 13 023-13 878), totaling 67 275. The total number and proportion of submitted abstracts categorized by abstract characteristics were consistent over the study period (Table 1). There were 19 198 total abstracts accepted for presentation (mean, 28.5%; range, 26.7%-30.3%), although the proportion declined slightly from 29.7% during open review to 27.8% during blinded review. The mean proportion of reviewers from the United States was 85.0% (range, 84.4%-85.7%); among non-US reviewers, the mean proportion from English-speaking countries was 38.9% (range, 37.8%-40.9%; Table 2).
During open review, 40.8% of US and 22.6% of non-US abstracts were accepted. After implementation of blinded review, 33.4% and 23.7% of abstracts were accepted, respectively. Blinding significantly attenuated the association between country and likelihood of abstract acceptance, as the RR of acceptance for US compared with non-US abstracts decreased significantly from open to blinded review (RR, 1.81; 95% confidence interval [CI], 1.75-1.88 vs RR, 1.41; 95% CI, 1.37-1.45; P<.001; Table 3). In subgroup analyses by submission category, blinding significantly attenuated the association between country and likelihood of acceptance for basic science (RR, 1.83 [95% CI, 1.73-1.94] vs RR, 1.38 [95% CI, 1.31-1.45]; P<.001) and clinical science (RR, 1.80 [95% CI, 1.71-1.89] vs RR, 1.41 [95% CI, 1.35-1.47]; P<.001) abstracts, and it tended toward significant attenuation for population science abstracts (RR, 1.86 [95% CI, 1.59-2.17] vs RR, 1.54 [95% CI, 1.37-1.73]; P = .05).
Among non-US abstracts, during open review, 31.1% from English-speaking countries and 20.9% from non-English-speaking countries were accepted. After implementation of blinded review, 28.8% and 22.8% of abstracts were accepted, respectively. Blinding significantly attenuated the association between language and likelihood of abstract acceptance, as the RR of acceptance decreased significantly from open to blinded review (Table 3). In subgroup analyses by submission category, blinding significantly attenuated the association between language and likelihood of acceptance for basic science (RR, 1.42 [95% CI, 1.28-1.57] vs RR, 1.19 [95% CI, 1.09-1.30]; P = .02) and clinical science (RR, 1.58 [95% CI, 1.45-1.73] vs RR, 1.30 [95% CI, 1.20-1.41]; P = .001) abstracts, but not for population science (RR, 1.39 [95% CI, 1.07-1.80] vs RR, 1.41 [95% CI, 1.15-1.73]; P = .86) abstracts.
Effect by Institutional Prestige
Among abstracts from US academic institutions, during open review, 51.3% from highly prestigious, 42.7% from moderately prestigious, and 32.6% from nonprestigious institutions were accepted. After implementation of blinded review, 38.8%, 34.3%, and 29.0% of abstracts were accepted, respectively. Blinding significantly attenuated the association between institution prestige and likelihood of abstract acceptance, as the RR of acceptance decreased significantly from open to blinded review when comparing highly with nonprestigious institutions (P<.001), moderately with nonprestigious institutions (P = .002), and highly with moderately prestigious institutions (P = .02; Table 3). In subgroup analyses by submission category, for the comparison of highly with nonprestigious institutions, blinding significantly attenuated the association between prestige and likelihood of acceptance for basic science (RR, 1.58 [95% CI, 1.45-1.73] vs RR, 1.30 [95% CI, 1.19-1.41]; P<.001) and clinical science (RR, 1.58 [95% CI, 1.45-1.72] vs RR, 1.34 [95% CI, 1.23-1.45]; P<.001) abstracts, but not for population science (RR, 1.44 [95% CI, 1.11-1.86] vs RR, 1.52 [95% CI, 1.28-1.81]; P = .76) abstracts. For the comparison of moderately with nonprestigious institutions, in contrast, the association was significantly attenuated by blinding only for clinical science abstracts (RR, 1.35 [95% CI, 1.24-1.47] vs RR, 1.18 [95% CI, 1.09-1.28]; P = .008), and not for basic (RR, 1.25 [95% CI, 1.14-1.37] vs RR, 1.16 [95% CI, 1.07-1.26]; P = .15) or population science (RR, 1.47 [95% CI, 1.20-1.81] vs RR, 1.30 [95% CI, 1.10-1.54]; P = .29) abstracts.
Among US abstracts, during open review, 41.7% from male authors and 41.6% from female authors were accepted. After implementation of blinded review, 33.1% and 33.4% of abstracts were accepted, respectively. No association was determined between sex and likelihood of abstract acceptance during either open or blinded review (Table 3).
Effect by Institution Type
Among US abstracts, during open review, 65.2% of abstracts from government agencies and 41.0% of abstracts not from government agencies were accepted. After implementation of blinded review, 45.5% and 33.5% were accepted, respectively. Blinding significantly attenuated the association between government status and likelihood of abstract acceptance, as the RR of acceptance decreased significantly from open to blinded review (RR, 1.59 [95% CI, 1.42-1.79] vs RR, 1.36 [95% CI, 1.18-1.57]; P = .02; Table 3). Similarly, during open review, 41.0% of abstracts not from private industry and 29.3% of abstracts from industry were accepted. After implementation of blinded review, 33.5% and 29.1% were accepted, respectively. Blinding significantly attenuated the association between industry status and likelihood of abstract acceptance, as the RR of acceptance decreased significantly from open to blinded review (RR, 1.40 [95% CI, 1.22-1.61] vs RR, 1.15 [95% CI, 1.02-1.29]; P = .02; Table 3).
Among abstracts submitted to the American Heart Association's Scientific Sessions, blinded peer review significantly attenuated associations between abstract acceptance and nearly all abstract characteristics. Although we were unable to assess abstract quality, variations in quality over time are unlikely to account for our results because the proportion of abstracts accepted and the proportions of abstracts submitted by country, language, institutional prestige, author sex, and government and industry status were all consistent over our short study period. In addition, the American Heart Association's policy of reviewer recruitment was not formally altered from 2000-2004, and the proportions of reviewers by country and language were consistent over this period. Hence, it is unlikely that our findings resulted from a change among reviewers. Therefore, these results provide evidence of reviewer bias in the open review of abstracts, favoring authors from the United States, English-speaking countries outside the United States, and prestigious academic institutions, and likely favoring authors from US government agencies and authors not from private industry. We found no evidence of bias by sex among US authors. In addition, we found that blinded review at least partially reduced reviewer bias.
Blinded review attenuated but did not eliminate differences in the likelihood of abstract acceptance. The associations found during blinded review may reflect true differences in the quality of research. Research quality may vary by authors at prestigious vs nonprestigious institutions or US vs non-US institutions. Quality may be associated with institutional funding for facilities and staff, better educated or trained faculty and staff, better or more widely available mentoring, an institution culture prioritizing research, or many other reasons. Thus, there may be genuine quality differences in abstract submissions, and these differences may account for the persistent differences by author characteristics.
Nevertheless, we cannot exclude the possibility of residual reviewer bias. Successful blinding may require more than removing the author's name and institution. Reviewers may have inferred an author's identity or location from the abstract's content. However, blinding abstracts is far more straightforward than blinding manuscripts, which can involve the line-by-line removal of study setting references, sample descriptors, data sources, and citations. In fact, blinded reviewers of manuscripts have been found to correctly identify between 20% and 60% of authors, varying widely by journal.14,15 Moreover, successful abstract blinding may be easier to achieve for larger than for smaller scientific meetings, just as it may be easier to achieve for basic or clinical science than for population science submissions. The strategy used by the American Heart Association, concealing an abstract author's name and institution from the reviewer, is simple, practical, and electronically straightforward, allowing blinded abstract review at minimal complication and cost.
Our study focused on abstract review, which may be more susceptible to bias than manuscript review. First, abstract review relies on brief summaries of scientific work, likely making assessments more variable. Second, abstract review requires reviewers to evaluate many submissions, possibly leading to time constraints that make a reviewer more likely to use criteria other than scientific merit and research quality in an evaluation. Finally, abstract reviewers are responsible for a broad category of submissions, rather than a specific submission topic, so reviewers may have less expertise in the subject. It is unclear if our results can be generalized to manuscript review. However, even if manuscript review is not as susceptible to reviewer bias and is more difficult to blind effectively, future research should evaluate the effect of blinded peer review on manuscript reviewer bias.
Other limitations need to be considered. Our study evaluated one scientific meeting. However, our findings are unlikely to be exceptional, and the scientific community should address potential bias at research meetings. In addition, we used a nonvalidated measure to assess institution prestige. Our purpose was to create a simple assessment with face validity. Our categorization approach nevertheless appears appropriate: we found increasing abstract acceptance at each successive level of prestige, whether prestige was categorized by 5 levels or by 3 levels (as presented), during both open and blinded review.
Our study provides evidence of reviewer bias in the open review of abstracts, favoring authors from the United States, from English-speaking countries outside the United States, and from prestigious academic institutions and likely favoring authors from US government agencies and not from private industry. Also, blinded review at least partially reduces bias. Our results suggest that adoption of blinded peer review by scientific research meetings is a reasonable, low-cost intervention with substantial benefit.
Corresponding Author: Harlan M. Krumholz, MD, SM, Yale University School of Medicine, 333 Cedar St, PO Box 208088, New Haven, CT 06520-8088 (harlan.krumholz@yale.edu).
Author Contributions: Drs Ross and Krumholz had full access to all of the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis.
Study concept and design: Ross, Gross, Grant, Daniels, Krumholz.
Acquisition of data: Ross, Hong, Gibbons, Krumholz.
Analysis and interpretation of data: Ross, Gross, Desai, Daniels, Hachinski, Gibbons, Gardner, Krumholz.
Drafting of the manuscript: Ross, Krumholz.
Critical revision of the manuscript for important intellectual content: Ross, Gross, Desai, Hong, Grant, Daniels, Hachinski, Gibbons, Gardner, Krumholz.
Statistical analysis: Ross, Gross, Desai, Krumholz.
Administrative, technical, or material support: Ross, Hong, Krumholz.
Study supervision: Krumholz.
Financial Disclosures: None reported.
Funding/Support: The American Heart Association was involved in the collection of the data and approved the manuscript but provided no financial or material support for the work. This project was not directly supported by any external grants or funds. Dr Ross is a scholar in the Robert Wood Johnson Clinical Scholars Program at Yale University sponsored by the Robert Wood Johnson Foundation.
Role of the Sponsor: Neither the American Heart Association nor the Robert Wood Johnson Foundation had any role in the design and conduct of the study; management, analysis, or interpretation of the data; or preparation or review of the manuscript.
Previous Presentation: Presented at the International Congress on Peer Review and Biomedical Publication; September 16, 2005; Chicago, Ill.
1. Garfunkel JM, Ulshen MH, Hamrick HJ, Lawson EE. Effect of institutional prestige on reviewers' recommendations and editorial decisions. JAMA. 1994;272:137-138.
3. Olson CM, Rennie D, Cook D, et al. Publication bias in editorial decision making. JAMA. 2002;287:2825-2828.
4. Kliewer MA, DeLong DM, Freed K, Jenkins CB, Paulson EK, Provenzale JM. Peer review at the American Journal of Roentgenology: how reviewer and manuscript characteristics affected editorial decisions on 196 major papers. AJR Am J Roentgenol. 2004;183:1545-1550.
5. Gilbert JR, Williams ES, Lundberg GD. Is there gender bias in JAMA's peer review process? JAMA. 1994;272:139-142.
6. Godlee F, Jefferson T. Peer Review in Health Sciences. 2nd ed. London, England: BMJ Publishing Group; 2003:101-102.
7. Kassirer JP, Campion EW. Peer review: crude and understudied, but indispensable. JAMA. 1994;272:96-97.
8. Hojat M, Gonnella JS, Caelleigh AS. Impartial judgment by the "gatekeepers" of science: fallibility and accountability in the peer review process. Adv Health Sci Educ Theory Pract. 2003;8:75-96.
9. Bachand RG, Sawallis PP. Accuracy in the identification of scholar and peer-reviewed journals and the peer-review process across disciplines. Ser Libr. 2003;45:39-59.
10. Eldredge J. Characteristics of peer reviewed clinical journals. Med Ref Serv Q. 1999;18:13-21.
11. TIME 2005 Almanac. Needham, Mass: Pearson Education Inc; 2004.
14. Cho MK, Justice AC, Winker MA, et al; PEER Investigators. Masking author identity in peer review: what factors influence masking success? JAMA. 1998;280:243-245.
15. Fisher M, Friedman S, Strauss B. The effects of blinding on acceptance of research papers by peer review. JAMA. 1994;272:143-146.