von Elm E, Poglia G, Walder B, Tramèr MR. Different Patterns of Duplicate Publication: An Analysis of Articles Used in Systematic Reviews. JAMA. 2004;291(8):974–980. doi:10.1001/jama.291.8.974
Author Affiliations: Division of Anesthesiology, Department of Anesthesiology, Pharmacology, and Surgical Intensive Care, Geneva University Hospitals, Geneva, Switzerland. Dr von Elm is now with the Department of Social and Preventive Medicine, University of Bern, Bern, Switzerland; and Mrs Poglia is now with the Department of Psychiatry, Geneva University Hospitals, Geneva, Switzerland.
Context Duplicate publication is publication of an article that overlaps substantially
with an article published elsewhere. Patterns of duplication are not well understood.
Objective To investigate duplication patterns and propose a decision tree for their classification.
Data Sources We searched a comprehensive list of systematic reviews (1989 through
August 15, 2002) in anesthesia and analgesia that is accessible on the Internet.
We selected published full articles of duplicates that had been identified
in these systematic reviews. Abstracts, letters, or book chapters were excluded.
Study Selection and Data Extraction Authors of 56 (40%) of 141 systematic reviews acknowledged identification
of duplicates. Duplication patterns were identified independently by all investigators,
who compared the samples and outcomes of pairs of duplicates and main articles. Information
on cross-reference, sponsorship, authorship, and publication characteristics
was extracted from the articles.
Data Synthesis The 56 systematic reviews included 1131 main articles (129 337
subjects) and excluded 103 duplicates (12 589 subjects) that originated
from 78 main articles. Sixty articles were published twice, 13 three times,
3 four times, and 2 five times. We identified 6 duplication patterns: (1A)
identical samples and identical outcomes (n = 21); (1B) same as 1A but several
main articles assembled into 1 duplicate (n = 16); (2) identical samples and different outcomes
(n = 24); (3A) increasing sample and identical outcomes (n = 11); (3B) decreasing
sample and identical outcomes (n = 11); (4) different samples and different
outcomes (n = 20). The prevalence of covert duplicate articles (without a
cross-reference to the main article) was 5.3% (65/1234). Of the duplicates,
34 (33%) were sponsored by the pharmaceutical industry, and 66 (64%) had authorship
that differed partly or completely from the main article. The median journal
impact factor was 1.8 (range, 0.1-29.5) for duplicates and 2.0 (range, 0.4-29.5)
for main articles (P = .13). The median annual citation
rate was 1.7 (range, 0-27) for duplicates and 2.1 (range, 0-31) for main articles
(P = .45). The median number of authors was 4 (range,
1-14) for duplicates and 4 (range, 1-15) for corresponding main articles (P = .02). The median delay in publication between main
articles and duplicates was 1 year (range, 0-7 years).
Conclusions Duplication goes beyond simple copying. Six distinct duplication patterns
were identified after comparing study samples and outcomes of duplicates and
corresponding main articles. Authorship was an unreliable criterion. Duplicates
were published in journals with similar impact factors and were cited as frequently
as main articles.
Duplicate publication is the publication of an article that overlaps
substantially with an article published elsewhere.1 This
practice may be acceptable in particular situations. However, authors must
acknowledge the main article overtly by using a cross-reference. Covert duplicate
publication has been widely disapproved.2,3 This
practice is wasteful of the time and resources of editors, peer reviewers,
and readers, and it is misleading because undue weight is given to observations
that are being reported repeatedly. When duplicates are inadvertently included
in a systematic review, the conclusion of that systematic review may change.4 Finally, covert duplicate publication is dishonest;
it undermines the integrity of science.5
Little is known about patterns of duplicate publication. Also, characteristics
of duplicates are not well understood, and there is no common agreement on
how to classify them. We set out to investigate patterns of duplicate publication
and to propose a decision tree for their classification. We have chosen systematic
reviews as a source of information because duplicates are often identified
during the rigorous process of a systematic review.6
We used a comprehensive list of systematic reviews (1989 through August
15, 2002) in perioperative medicine (anesthesia, analgesia, and critical care)
that is regularly updated through searches in electronic databases, hand-searching
of specialty journals, and contact with experts.7 The
average methodological quality of these reviews was considered satisfactory.8
We selected all systematic reviews of anesthesia and analgesia topics
that acknowledged identification of duplicates. If a review gave no information
on duplicates, we contacted its authors and asked whether they had identified
any. We only considered duplicates that were published as full articles.
We excluded abstracts, letters, and book chapters. We also disregarded a duplicate
when it was excluded from a review for reasons that were not related to duplication
(ie, for validity reasons). We regarded an article as a duplicate irrespective of
whether it had a cross-reference. If systematic reviews had overlapping
topics and included the same articles, we considered each article only once.
We did not search systematically for additional duplicates. We obtained hard
copies of all duplicates and of corresponding main articles.
We identified clusters (ie, groups of ≥2 articles) that originated
from a single study. We then designated duplicates and corresponding main
articles within each cluster. Several duplicates could originate from a main
article, and several main articles could be the origin of a duplicate. We
regarded the oldest or the largest article of a cluster as the main article,
irrespective of whether it had been considered the main article or duplicate
by the authors of the systematic reviews.
From each main article and duplicate, we extracted information on cross-reference,
sponsorship, publication characteristics, and authors. A cross-reference was
considered clear if the corresponding article was acknowledged and referenced
(for instance, in the bibliography or in a footnote) and the link between
the 2 articles was evident.1 It was considered
unclear if the corresponding article was referenced, but the relationship
between the articles was obscured. Pharmaceutical sponsorship was assumed
if (1) it was disclosed as such; (2) a pharmaceutical company provided funds
or study material; (3) the publishing journal was sponsored (for instance,
the article appeared in a journal supplement); or (4) an author was an employee
of a pharmaceutical company. All other funding was regarded as nonpharmaceutical.
Impact factors were taken from the Institute for Scientific Information Journal Citation Report,9 but
they were coded as "missing" for journal supplements. Citation numbers were
taken from the Science Citation Index Expanded10 and were converted to annual rates. We compared authors
of duplicates and main articles; depending on the degree of similarity, we
distinguished between complete, incomplete, and no matching of authorship.
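The sponsorship and authorship rules above can be expressed as small predicates. The Python sketch below is illustrative only: the `ArticleRecord` fields and both function names are hypothetical, and it assumes that author names have already been normalized and that "complete" matching means identical sets of authors regardless of order, which the text does not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArticleRecord:
    """Hypothetical extraction record for one article."""
    authors: frozenset[str]
    disclosed_pharma_sponsorship: bool = False  # criterion 1: disclosed as such
    company_funds_or_material: bool = False     # criterion 2: funds or study material
    sponsored_journal: bool = False             # criterion 3: such as a journal supplement
    author_employed_by_company: bool = False    # criterion 4: author is a company employee


def pharma_sponsored(a: ArticleRecord) -> bool:
    """Any 1 of the 4 criteria suffices; all other funding counts as nonpharmaceutical."""
    return (a.disclosed_pharma_sponsorship or a.company_funds_or_material
            or a.sponsored_journal or a.author_employed_by_company)


def authorship_match(dup: ArticleRecord, main: ArticleRecord) -> str:
    """Assumed rule: identical author sets are complete, partial overlap is incomplete."""
    if dup.authors == main.authors:
        return "complete"
    if dup.authors & main.authors:
        return "incomplete"
    return "none"


main = ArticleRecord(authors=frozenset({"Author A", "Author B", "Author C"}))
dup = ArticleRecord(authors=frozenset({"Author A", "Author D"}), sponsored_journal=True)
print(pharma_sponsored(dup), authorship_match(dup, main))  # True incomplete
```

Treating the 4 sponsorship criteria as a logical OR mirrors the rule that any one of them was sufficient for an article to be classed as pharmaceutically sponsored.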
Using a randomly chosen subset of 25 clusters, 2 of the authors (G.P.
and B.W.) searched for suitable criteria to define the link between duplicates
and main articles. The matching of study samples and the matching of study
outcomes were the best criteria that we found. Four combinations and thus
duplication patterns were possible: (1) identical samples and identical outcomes;
(2) identical samples and different outcomes; (3) different samples and identical
outcomes; and (4) different samples and different outcomes.
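As a minimal sketch, the 2-by-2 scheme amounts to a lookup on 2 yes/no judgments. The enum and function names below are hypothetical; in the study, deciding whether samples or outcomes matched was a judgment the investigators made while reading the articles, not an automated step.

```python
from enum import Enum


class Pattern(Enum):
    """The 4 combinations of sample and outcome matching (hypothetical names)."""
    P1 = "identical samples, identical outcomes"
    P2 = "identical samples, different outcomes"
    P3 = "different samples, identical outcomes"
    P4 = "different samples, different outcomes"


def classify(same_sample: bool, same_outcomes: bool) -> Pattern:
    """Map 2 yes/no judgments about a duplicate-main article pair onto a pattern."""
    if same_sample:
        return Pattern.P1 if same_outcomes else Pattern.P2
    return Pattern.P3 if same_outcomes else Pattern.P4


# Enumerate all 4 combinations.
for sample in (True, False):
    for outcomes in (True, False):
        print(sample, outcomes, classify(sample, outcomes).name)
```

The loop prints one line per combination, from `True True P1` to `False False P4`.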
All investigators independently read all main articles and duplicates,
designated clusters, assembled pairs of duplicates and main articles within
each cluster, and applied the proposed decision tree to assign each duplicate
to 1 of the 4 proposed duplication patterns. Consensus was reached by discussion.
If there was uncertainty about duplication, we asked the authors of the suspected
articles for clarification. Data from pairs of main articles and duplicates
were compared using the Wilcoxon signed rank test; for clusters with 2 or
more duplicates, the mean of the duplicates' values was used. P<.05 indicated statistical significance. All statistical analyses
were performed using STATA statistical software (Version 8, STATA Corp, College Station, Tex).
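The paired comparison can be sketched as follows, assuming each cluster is given as a main-article value and a list of duplicate values (for instance, numbers of authors). The function names and the data are hypothetical. The test uses the normal approximation with a correction for tied ranks and drops zero differences; Stata's `signrank` command may treat zero differences differently, so P values from this sketch need not match the article's exactly.

```python
import math
from statistics import mean


def collapse_clusters(clusters):
    """Each cluster is (main_value, [duplicate_values]); duplicates are averaged
    so that every cluster contributes exactly 1 (main, duplicate) pair."""
    return [(main, mean(dups)) for main, dups in clusters]


def wilcoxon_signed_rank(pairs):
    """Two-sided Wilcoxon signed rank test, normal approximation with tie correction.
    Zero differences are dropped before ranking."""
    diffs = [m - d for m, d in pairs if m != d]
    n = len(diffs)
    if n == 0:
        raise ValueError("all differences are zero")
    order = sorted(range(n), key=lambda k: abs(diffs[k]))
    ranks = [0.0] * n
    tie_term = 0
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied absolute differences.
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j + 2) / 2  # positions i..j hold 1-based ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean_w = n * (n + 1) / 4
    var_w = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (w_plus - mean_w) / math.sqrt(var_w)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return w_plus, z, p_two_sided


# Hypothetical data: number of authors of each main article and its duplicate(s).
clusters = [(5, [4]), (4, [4, 3]), (6, [5]), (3, [3]), (4, [2]), (7, [6, 6])]
w, z, p = wilcoxon_signed_rank(collapse_clusters(clusters))
print(f"W+ = {w}, z = {z:.2f}, P = {p:.3f}")
```

Averaging within a cluster keeps the pairs independent, which the signed rank test requires; counting every duplicate separately would let large clusters contribute more than 1 pair.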
Of 141 systematic reviews, 42 reported spontaneously on duplicates (Figure 1). We contacted the principal authors
of the other 99, and 69 (70%) responded. Fourteen of these had identified duplicates
without reporting on them. Thus, authors of 56 (40%) of 141 systematic reviews
acknowledged identification of duplicates.11-66 These
reviews were published between 1989 and 2000 and covered a wide range of topics
in anesthesia and analgesia (Table 1).
Forty-six reviews (82%) considered data from randomized controlled trials
and included a meta-analysis. The authors of the 56 reviews regarded 1234 articles
as potentially valid and eligible for inclusion. Of these, 1131 were main articles
with data on 129 337 subjects, and 103 were recognized as duplicates with
data on 12 589 subjects. Thus, the prevalence of duplicates, irrespective
of whether they had a cross-reference, was 8.3%, and the proportion of duplicated
data was 8.9% (Table 1).
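The prevalence figures are simple proportions of the counts reported in this article. The sketch below reproduces them; it assumes the denominator for duplicated data is all subjects in the eligible articles (main articles plus duplicates), which yields the reported 8.9%, and it uses the count of 65 covert duplicates (no cross-reference at all) reported in the Results.

```python
# Counts reported in this article
main_articles = 1131
duplicates = 103
main_subjects = 129_337
duplicate_subjects = 12_589
covert_duplicates = 65  # duplicates with no cross-reference at all

eligible = main_articles + duplicates  # articles judged eligible for inclusion
duplicate_prevalence = duplicates / eligible
duplicated_data = duplicate_subjects / (main_subjects + duplicate_subjects)
covert_prevalence = covert_duplicates / eligible
covert_share = covert_duplicates / duplicates

print(f"eligible articles: {eligible}")                            # 1234
print(f"duplicate prevalence: {duplicate_prevalence:.1%}")          # 8.3%
print(f"duplicated data: {duplicated_data:.1%}")                    # 8.9%
print(f"covert prevalence: {covert_prevalence:.1%}")                # 5.3%
print(f"duplicates without cross-reference: {covert_share:.0%}")    # 63%
```

The two covert figures use different denominators: 5.3% is relative to all eligible articles, whereas 63% is relative to duplicates only.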
The 103 duplicates originated from 78 main articles; thus, there were
181 articles in 78 clusters. Data from 60 articles were published twice, 13
three times, 3 four times, and 2 five times.
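The cluster counts are internally consistent, which can be checked in a few lines. This sketch assumes, as the totals imply, that each cluster contains exactly 1 main article and that every other article in the cluster is a duplicate.

```python
# Number of clusters by cluster size (articles per cluster), from the Results
clusters_by_size = {2: 60, 3: 13, 4: 3, 5: 2}

n_clusters = sum(clusters_by_size.values())  # assumed: 1 main article per cluster
n_articles = sum(size * count for size, count in clusters_by_size.items())
n_duplicates = n_articles - n_clusters  # all other articles are duplicates

print(n_clusters, n_articles, n_duplicates)  # 78 181 103
```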
Sixty-three percent of the duplicates had no cross-reference at all
(Table 2); the prevalence of covert
duplication in this cohort of systematic reviews was 5.3% (65/1234). Twelve
percent of the duplicates were translations. Of those, only 1 had a clear
cross-reference. Thirty-four (33%) declared pharmaceutical sponsorship; of
those, 16 were published in a supplement. The number of authors of main articles
was higher (median, 4; range, 1-15) than that of corresponding duplicates
(median, 4; range, 1-14) (P = .02). For 64% of pairs
of duplicates and corresponding main articles, there was no matching or only
incomplete matching of authorship. The median number of duplicated subjects
was 56 (range, 1-1044). The median year of publication was 1989 (range,
1971-1998) for main articles and 1990 (range, 1973-1999) for duplicates. The median
delay in publication between duplicates and corresponding main articles was
1 year (range, 0-7 years). Two thirds of duplicates were published within
2 years before or after the corresponding main article (Figure 2). Median annual
citation rates of duplicates and main articles were similar (1.7 vs 2.1; P = .45).
Median impact factors of the publishing journals were also similar (1.8 vs 2.0;
P = .13). Most journals that published duplicates and main articles belonged to
the Journal Citation Report subject categories of anesthesiology, critical care
medicine, general and internal medicine, pharmacology and pharmacy, and surgery.9
The median impact factor of all journals in these categories was 0.9 (range, 0-29.5).
The decision tree eventually yielded 6 distinct patterns of duplication
(ie, the 4 patterns defined by combining similarity of study sample with similarity
of study outcomes, with patterns 1 and 3 each divided into 2 subgroups) (Figure 3). All pairs of duplicates and main
articles could be assigned to 1 of the 6 patterns; none of the articles fell
into more than 1 category.
Each pattern had distinctive characteristics (Table 2). A pattern 1A duplicate was a reproduction of an already
published article using an identical sample and outcomes. The first published
article was considered the main article. Most 1A duplicates (76%) had no cross-reference
at all to the main article. Almost one third (29%) were translations; only
1 had a cross-reference. Pattern 1B duplicates were similar to 1A duplicates.
However, 2 or more main articles were assembled to produce yet another article.
We regarded all contributing articles as main articles, independent of order
of publication. Pattern 1B duplicates had the highest proportion of pharmaceutical
sponsorship (81%); 63% were published in supplements and had pharmaceutical
sponsorship. Pattern 1B had the highest number of duplicated subjects (median,
169) and the shortest delay in publication (median, 0.5 years). These duplicates also
had the smallest number of authors (median, 1) and the highest proportion
of articles with nonmatching authorship (31%). Pattern 2 duplicates originated
from 1 study sample but reported on different outcomes. The first published
article was considered the main article. Pattern 2 duplicates had the highest
proportion of unclear cross-references (25%), a high proportion of nonpharmaceutical
sponsorship (38%), and the lowest annual citation rate (median, 1.1). Pattern
3A and 3B duplicates concerned increasing and decreasing trial size, respectively. Pattern
3A consisted of expanded articles that were written when new data were added
to a preliminary article. These articles had the longest delay in publication
(median, 2 years), the highest annual citation rate (median, 3.2), and were
published in journals with the highest impact factors (median, 2.7). Pattern
3B consisted of articles that documented parts of a large trial and reported
identical outcomes. None of these duplicates had a clear cross-reference,
55% declared pharmaceutical sponsorship, 45% were translations (none of which
had a cross-reference), and the number of duplicated subjects was high (median,
105). In two 3B clusters, authors selected 2 or more (but not all) groups
of an already published randomized trial to produce a new article. For both
3A and 3B patterns, the article reporting on the largest study sample was
regarded as the main article, irrespective of whether it was published before or
after the duplicate. In pattern 4 duplicates, both samples and outcomes were
different from the main article. Confirmation of duplication was only possible
through contact with the original authors. These duplicates had a high proportion
of nonpharmaceutical sponsorship (55%) and incomplete matching of authors
(75%). The article reporting on the largest study sample was regarded as the main article.
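Combining the 2 matching criteria, the subgroup splits, and the main-article rules gives the sketch below. It is illustrative, not the authors' instrument: all names are hypothetical, and the `smaller_is_preliminary` flag is an assumed stand-in for the reader's judgment of whether the smaller article was an interim report later expanded (3A) or a part of a larger trial (3B). Ties in publication year or sample size are broken by list order, a case the article does not address.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Article:
    label: str
    year: int                # year of publication
    sample_size: int
    assembled: bool = False  # True for an article assembled from others (pattern 1B)


@dataclass(frozen=True)
class Comparison:
    same_sample: bool
    same_outcomes: bool
    n_sources: int = 1                    # articles assembled into one; >= 2 means 1B
    smaller_is_preliminary: bool = False  # hypothetical flag: 3A if True, else 3B


def pattern(c: Comparison) -> str:
    """Six-pattern decision tree: sample, then outcomes, then the subgroup split."""
    if c.same_sample and c.same_outcomes:
        return "1B" if c.n_sources >= 2 else "1A"
    if c.same_sample:
        return "2"
    if c.same_outcomes:
        # Increasing sample (interim report later expanded) vs decreasing (part of a trial)
        return "3A" if c.smaller_is_preliminary else "3B"
    return "4"


def designate(p: str, articles: list[Article]) -> tuple[list[Article], list[Article]]:
    """Split a cluster into (main articles, duplicates) with the pattern-specific rule."""
    idx = range(len(articles))
    if p == "1B":
        # Every contributing article is a main article, whatever its publication order.
        main_idx = {i for i in idx if not articles[i].assembled}
    elif p in ("1A", "2"):
        # The first published article is the main article.
        main_idx = {min(idx, key=lambda i: articles[i].year)}
    else:
        # 3A, 3B, and 4: the article with the largest sample is the main article.
        main_idx = {max(idx, key=lambda i: articles[i].sample_size)}
    mains = [a for i, a in enumerate(articles) if i in main_idx]
    dups = [a for i, a in enumerate(articles) if i not in main_idx]
    return mains, dups


cluster = [Article("interim report", 1990, 40), Article("final report", 1992, 120)]
p = pattern(Comparison(same_sample=False, same_outcomes=True, smaller_is_preliminary=True))
mains, dups = designate(p, cluster)
print(p, [a.label for a in mains], [a.label for a in dups])
# 3A ['final report'] ['interim report']
```

Keeping pattern assignment separate from main-article designation reflects the text: publication order decides the main article for patterns 1A and 2, whereas sample size decides it for patterns 3A, 3B, and 4.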
We systematically analyzed a cohort of 103 duplicates and 78 corresponding
main articles and were able to identify 6 mutually exclusive patterns of duplication.
Our decision tree was based on 2 criteria: similarity of study samples and
similarity of study outcomes. Authorship was not a suitable criterion; depending
on the duplication pattern, authorship matched completely in only 18% to 57%
of pairs of duplicates and main articles.
Some duplication patterns showed typical features. A pattern 1A duplicate,
for instance, corresponds to what is usually known as a copy. For pattern
1B, it may be assumed that an author who is not necessarily involved in research
or development of a drug is asked by a pharmaceutical company to assemble
some main articles on that drug for a publicity article. The duplicate is
then typically published in a sponsored supplement. Both the short delay between
the publication of main articles and duplicates and the change in authorship make
it difficult to identify duplication through peer review. Also, supplements
are not always peer-reviewed.67 Pattern 2 duplicates
represent the well-known fragmentation of scientific information; this may
lead to what was previously termed the least publishable
unit.68 This practice was associated
with nonpharmaceutical sponsorship and unclear or missing cross-references.
Publicly funded researchers may be driven to produce multiple articles to
justify previously received grants. Pattern 3A duplication has been described
as a meat extender.69 It
consists of expanding a preliminary article through the addition of more data
to produce the definitive article. These duplicates were published in journals
with the highest impact factors and the articles had the highest citation
rates, suggesting that they were about new and perhaps innovative treatments.
Early dissemination of preliminary data about new treatments is often warranted.
However, this does not justify the high rate of articles without a cross-reference
to the preliminary article. Pattern 3B may be seen as the opposite of pattern
3A. Typically, a multicenter trial (the main article) is fragmented and individual
parts (the duplicates) are published separately; this has been called disaggregation.5 Multicenter
trials are often multinational, large, and sponsored by pharmaceutical companies.
Indeed, 3B duplicates were often translations, included a high number of duplicated
subjects, and were frequently sponsored by the pharmaceutical industry. Pattern
4 was the most chaotic practice described herein; both study sample and outcomes
of duplicates and main articles were different despite evidence that both
articles originated from the same study. Definite confirmation was only possible
through contact with the authors. It was particularly disturbing that all
pattern 3B and 4 duplicates were described as randomized controlled trials,
although it was impossible to maintain the initial study architecture, and
thus randomization. Authors of systematic reviews may choose to exclude a
duplicate cluster when these patterns are involved.
We do not know if these cases of duplication happened deliberately,
accidentally, or by negligence. Duplicate publication may be acceptable to
foster dissemination of important scientific information, for instance, through
translation of a pivotal trial into another language. Then, however, we would
expect a cross-reference to unmistakably show the relationship between the
translation (ie, the duplicate) and the main article. There were 12 translations;
only 1 had a cross-reference to the main article. Seventeen percent of duplicates
cited the corresponding main article but left the reader unaware of the relationship
between the 2 articles. In another study, partial referencing was found in
11% of duplicates.70 Sixty-three percent of
the duplicates had no cross-reference at all to the main article; the prevalence
of covert duplication in the reviewed literature was 5%. Estimates of duplicate
publication have been reported by others who concentrated on (1) a particular
drug class6 or a single drug4;
(2) the literature in nursing71 or surgery72; or (3) 1 journal.70,73-75 To
adequately appraise the significance of our estimate, 2 issues have to be
considered. First, as in previous studies,6 our
estimate concerns covert duplication. Second, we focused on published full
articles that were considered for inclusion in systematic reviews. Most reviews
included randomized controlled trials only. Also, impact factors of the journals
that published main articles and duplicates were higher than those of all
journals in the corresponding categories of the Journal
Citation Report. This suggests that these systematic reviews mainly included
articles from higher-ranking journals. Impact factors and citation
rates of duplicates and main articles were similar; it is tempting to speculate
that duplicates are often cited erroneously by authors who believe that they
are citing a main article.4
There are several limitations to this study. First, we focused on articles
from anesthesia and analgesia; the proposed classification may not be generalizable
to other clinical areas. Second, further studies may find yet another pattern
of duplication or a combination of those described herein; additional criteria
may help to refine our classification. Third, we analyzed duplicates that
were identified in a published list of systematic reviews in perioperative
medicine7; selection bias cannot be ruled out.
However, this is unlikely because these reviews had been gathered through
systematic searches and the list had not been compiled with the purpose of
studying duplicate publication. Fourth, our estimate of covert duplicate publication
may be flawed. We may have overestimated the true prevalence because we included
systematic reviews only when duplication was acknowledged. Or, we may have
underestimated the true prevalence because many reviews did not include any
statement about duplicate articles. Even after contact with the authors of
the reviews, it was sometimes unclear if they knew about this potential pitfall.
Some had identified duplicates but they, or the editors and peer reviewers,
did not judge the information important enough to be mentioned in the article.
Also, we ignored duplicates that were excluded from the systematic reviews
for validity reasons or that were abstracts, letters, or book chapters. Fifth,
we did not quantify the impact of covert duplicate publication on systematic
reviews. It has been shown previously that removing duplicated data may change
the results of a systematic review.4 It was
not our intention to replicate this finding.
Duplicate publication in the medical literature is a reality, but it
may not necessarily be harmful. However, to produce an article that overlaps
substantially with an already published article without adequate cross-referencing
is misconduct. We have shown that duplication goes beyond simple copying.
The proposed classification distinguishes mutually exclusive patterns of duplicate
publication and may become a useful tool for those who have to deal with duplication
in systematic reviews (ie, editors, peer reviewers, and authors). Because
systematic reviews are produced by conducting an exhaustive literature search
and critical appraisal, they are an effective way to unearth duplication.
Indeed, authors of systematic reviews frequently encounter serious difficulties
while dealing with duplicate articles;4,5,76,77 they
should be encouraged to make duplication public. A statement on duplication
could also be included in the Quality of Reporting of Meta-analyses checklist
for the reporting of systematic reviews.78 Exposing
cases of duplication is likely to raise awareness of, and lower tolerance
for, this publication malpractice.