Campbell EG, Clarridge BR, Gokhale M, Birenbaum L, Hilgartner S, Holtzman NA, Blumenthal D. Data Withholding in Academic Genetics: Evidence From a National Survey. JAMA. 2002;287(4):473-480. doi:10.1001/jama.287.4.473
Author Affiliations: Institute for Health Policy, Massachusetts General Hospital, Boston (Drs Campbell and Blumenthal and Mss Gokhale and Birenbaum); Department of Medicine, Harvard Medical School (Drs Campbell and Blumenthal); Center for Survey Research, University of Massachusetts, Boston (Dr Clarridge); Health Policy and Management Department, Epidemiology Department, and Institute for Genetic Medicine, Johns Hopkins University, Baltimore, Md (Dr Holtzman); and Department of Science and Technology Studies, Cornell University, Ithaca, NY (Dr Hilgartner).
Context The free and open sharing of information, data, and materials regarding
published research is vital to the replication of published results, the efficient
advancement of science, and the education of students. Yet in daily practice,
the ideal of free sharing is often breached.
Objective To understand the nature, extent, and consequences of data withholding
in academic genetics.
Design, Setting, and Participants Mailed survey (March-July 2000) of geneticists and other life scientists
in the 100 US universities that received the most funding from the National
Institutes of Health in 1998. Of a potential 3000 respondents, 2893 were eligible
and 1849 responded, yielding an overall response rate of 64%. We analyzed
a subsample of 1240 self-identified geneticists and made a limited number
of comparisons with 600 self-identified nongeneticists.
Main Outcome Measures Percentage of faculty who made requests for data that were denied; percentage
of respondents who denied requests; influences on and consequences of withholding
data; and changes over time in perceived willingness to share data.
Results Forty-seven percent of geneticists who asked other faculty for additional
information, data, or materials regarding published research reported that
at least 1 of their requests had been denied in the preceding 3 years. Ten
percent of all postpublication requests for additional information were denied.
Because they were denied access to data, 28% of geneticists reported that
they had been unable to confirm published research. Twelve percent said that
in the previous 3 years, they had denied another academician's request for
data concerning published results. Among geneticists who said they had intentionally
withheld data regarding their published work, 80% reported that it required
too much effort to produce the materials or information; 64%, that they were
protecting the ability of a graduate student, postdoctoral fellow, or junior
faculty member to publish; and 53%, that they were protecting their own ability
to publish. Thirty-five percent of geneticists said that sharing had decreased
during the last decade; 14%, that sharing had increased. Geneticists were
as likely as other life scientists to deny others' requests (odds ratio [OR],
1.39; 95% confidence interval [CI], 0.81-2.40) and to have their own requests
denied (OR, 0.97; 95% CI, 0.69-1.40). However, other life scientists were
less likely to report that withholding had a negative impact on their own
research as well as their field of research.
Conclusions Data withholding occurs in academic genetics and it affects essential
scientific activities such as the ability to confirm published results. Lack
of resources and issues of scientific priority may play an important role
in scientists' decisions to withhold data, materials, and information from
other academic geneticists.
Without the free exchange of published scientific information and resources,
researchers may unknowingly build on something less than the total accumulation
of scientific knowledge or work on problems already solved.1
However, a number of instances of data withholding (defining data to include
the full range of research results, techniques, and materials useful in future
investigations and withholding as the failure to share such published data)
have been reported.2-7
A 1994-1995 survey of academic life scientists found that 34% of respondents
were denied research results requested from a fellow university scientist
in the previous 3 years, and 8.9% said they had denied a request from another
university scientist for access to research results.8
Weinberg9 asserts that secrecy is more
common in genetics and particularly human genetics than in other areas. Reasons
may include the increased scientific competitiveness of the field and the
opportunities for commercial applications.10
Research has shown that scientists who reported conducting research on goals
similar to those of the Human Genome Project (HGP) were more likely to deny
requests for information, data, and materials than were other life scientists.8
Understanding the withholding of information, data, and materials may
be particularly important in genetics for a number of reasons. First, since
academic geneticists publish more articles in peer-reviewed journals, teach
more, and serve in more leadership roles in their university and discipline
than do their colleagues in other biomedical specialties, the sharing and
withholding practices of geneticists may have a disproportionate impact on
university policy, the behavior of junior faculty, and the training and socialization
of graduate students and postdoctoral fellows.11
Second, understanding the role of genetics in human disease is believed
to be important to the future of medicine.12
Clearly, the progress made in mapping and sequencing the human genome represents
a major step toward scientific breakthroughs in genetic-based diagnostics,
preventive technologies, and therapeutics. The rate of progress in realizing
these medical benefits may depend somewhat on the extent to which the results
of genetic investigations flow freely among scientists in the field.
There is scant empirical evidence regarding sharing and withholding
in academic genetics. For example, little is known about the extent to which
geneticists share and withhold information and how these behaviors have changed
over time. Nor do we know much about the reasons researchers withhold information,
data, or materials from other academicians and what impact this behavior has
on individual researchers or on the field of genetics as a whole. To address
these issues, we conducted a national study of data sharing and data withholding
in academic genetics, with a comparison group of other life sciences.
As in our previous work, the sample of 3000 life scientists was selected
in a multistep process.13,14 First,
using lists from the National Institutes of Health (NIH), we identified the
100 US educational institutions that received the most funding from the NIH
in 1998. Second, at each institution we selected all departments and programs
in genetics and human genetics and then randomly selected up to 3 additional
life science departments and programs from lists of clinical (medicine, pathology,
psychiatry, pediatrics, and surgery) and nonclinical (biochemistry, microbiology,
pharmacology, physiology, and anatomy) departments. These specific clinical
and nonclinical departments were selected for inclusion because they, among
all departments, received the largest number of NIH grants in 1998. Third,
using data from the Association of American Medical Colleges faculty roster
system, Peterson's Graduate Programs in the Biological Sciences,15 school and individual Web sites,
college bulletins, and direct contact with departments, we identified all
full-time faculty at the rank of assistant professor and higher in each selected
department and program. In addition, because of our special interest in genetics
generally and human genetics specifically, we identified all faculty members
who were principal investigators on at least 1 research grant from the HGP
administered by the National Human Genome Research Institute (NHGRI) and the
Department of Energy in the 5 years preceding the study (excluding those who
received grants only from the Ethical, Legal, and Social Implications of Human
Genetics Research program).
Finally, a stratified sample of 3000 faculty members was selected. The
sample included all 219 grantees of the HGP and all 1547 faculty members in
genetics or human genetics departments. The remainder of the sample (n = 1234)
was randomly selected so that half came from nonclinical departments (n =
617) and half from clinical departments (n = 617). To avoid including clinical
department faculty who were not actively engaged in research, we excluded
clinical faculty who had not published at least 1 article in the MEDLINE database
in the 3 years preceding the study.
The design of the survey instrument was informed by 2 focus group discussions,
20 semistructured interviews with knowledgeable biomedical researchers, discussions
with colleagues, and a review of the literature. The survey instrument was
pretested by using 9 cognitive interviews conducted by professional interviewers
at the Center for Survey Research of the University of Massachusetts in Boston.
The Center for Survey Research administered the survey by mail between
March and July 2000. Subjects were sent a letter, a fact sheet describing
the study, a survey instrument, and a postage-paid postcard. They were asked
to complete the survey and mail the postcard separately from the completed
survey to the center. This process enabled us to track nonrespondents via
the postcard while ensuring respondents' complete anonymity, since the survey
instrument had no unique identifying information. Nonrespondents were mailed
a letter encouraging their participation, mailed additional surveys, and then
contacted by telephone and encouraged to participate. Of the potential 3000
respondents, 4 were in the sample twice, 7 had died, and 96 were ineligible
because they were retired, out of the country, not located at the sampled
institution, or lacking faculty appointments. Of the remaining 2893 subjects,
1849 responded, yielding an overall response rate of 64%.
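The eligibility and response-rate arithmetic above can be checked with a short sketch. All figures are taken from the text; the variable names are illustrative only.

```python
# Figures reported in the text; variable names are illustrative.
sampled = 3000      # potential respondents
duplicates = 4      # appeared in the sample twice
deceased = 7
ineligible = 96     # retired, abroad, not at the institution, or no appointment
responded = 1849

# Eligible subjects are those left after the three exclusions.
eligible = sampled - duplicates - deceased - ineligible
response_rate = responded / eligible

print(eligible)                 # 2893
print(f"{response_rate:.1%}")   # 63.9%
```

The unrounded rate of 63.9% corresponds to the 64% reported.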
In addition, 256 nonrespondents were interviewed briefly by telephone
to determine how they differed from the respondents. Nonrespondents were significantly
more likely than respondents to be full professors and less likely to be geneticists.
They were also significantly more likely to receive a high number of requests
for information, data, and materials related to their published research than respondents.
Respondents identified themselves as geneticists by responding yes to
the following question: "Do you consider yourself a genetics researcher? By
genetics researcher we mean someone whose research involves any of the following:
(1) identification of genomes, genes, or gene products in any organism; (2)
study of the structure, function, or regulation of genes or genomes; (3) comparison
of genes and genomes between species or populations." Respondents who answered
no were considered other life scientists.
Geneticists were asked what their primary fields of genetic research
were. The response categories were behavioral; biochemical; bioinformatics;
cancer; common complex disorders (other than cancer); cytogenetics; developmental;
mapping and sequencing; mutagenesis; pharmacogenetics; population genetics,
evolution, and epidemiology; prenatal and perinatal; single gene (mendelian)
disorders; structure and function (including genotype-phenotype relations,
transgenesis); gene therapy; and other (respondents were asked to specify).
We asked geneticists what organism they worked with. The categories
were amphibians, bacteria, Drosophila, fungi, humans
(including materials of human origin), mammals (other than human), nematodes,
plants, viruses, yeast, zebra fish, and other.
The survey used multiple measures of data withholding. First we asked
geneticists how many times in the last 3 years they had asked other scientists
to provide information, data, or materials concerning published research.
We then asked those who had made such requests to estimate the number of times
their requests were denied. Respondents who indicated that at least 1 of their
requests was denied were considered to have had data withheld from them.
We also asked how many times in the last 3 years geneticists had received
requests from other academic scientists for information, data, or materials
concerning their published research. We then asked those who received such
requests to estimate the number that they denied. Respondents who reported
that they had denied another's request were considered to have engaged in
postpublication data withholding.
We asked geneticists whether, in the last 3 years, they had requested
any of the following from another academic scientist after a finding was published.
The follow-up questions asked about additional information about laboratory
techniques not included in the publication, pertinent findings that were not
included in the publication, phenotypic information not included in the publication,
genetic sequences not included in the publication, and biomaterials (probes,
cell lines, tissues, reagents, and organisms) mentioned in the publication.
For each of the follow-up questions, the response categories were "I made
no request," "I received all of what I requested on every request," and "One
or more of my requests was denied."
Respondents were asked the following: "On those occasions when you have
intentionally withheld information, data, or materials about your published
results from other academic scientists, how important was each of the following
as a motivating factor?" The follow-up questions concerned their need to protect
their ability to publish or that of a graduate student, postdoctoral fellow,
or junior faculty member; their need to honor the requirements of an industrial
research sponsor; their need to protect the commercial value of the results;
their need to preserve patient confidentiality; the effort required to actually
produce the materials or information; the financial cost of actually providing
the materials or information transfer; and the likelihood that the other person
would never reciprocate. The response categories were "very important," "moderately
important," "not very important," "not at all important," and "does not apply."
Responses were coded into dichotomous variables, with "very important" and
"moderately important" coded as 1 and "not very important" and "not at all
important" coded as 0.
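The dichotomous coding described above might be sketched as follows. The function name is hypothetical, and treating "does not apply" as missing is an assumption, since the text does not say how that category was handled.

```python
from typing import Optional

IMPORTANT = {"very important", "moderately important"}
NOT_IMPORTANT = {"not very important", "not at all important"}


def code_motivation(response: str) -> Optional[int]:
    """Collapse a 5-category response into a 0/1 indicator."""
    answer = response.strip().lower()
    if answer in IMPORTANT:
        return 1
    if answer in NOT_IMPORTANT:
        return 0
    if answer == "does not apply":
        # Assumption: excluded from the analysis rather than coded 0 or 1.
        return None
    raise ValueError(f"unexpected response category: {response!r}")


print(code_motivation("Moderately important"))  # 1
print(code_motivation("not at all important"))  # 0
```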
The survey had 2 batteries of questions regarding the potential effects
of withholding. The first asked whether, as a result of another academic scientist's
failure to share information, data, or materials, the researcher had ever
had a publication significantly delayed, been unable to confirm others' published
research, abandoned a promising line of research, stopped collaborating with
another academic scientist, complained to a funding agency, journal, or professional
association, refused to share with that person or group, or delayed sharing
with that person or group. The response categories were yes and no.
The second battery asked researchers how data withholding among academic
scientists affected the progress of science in their field, the level of communication
in their field, the education of students and postdoctoral fellows, the progress
of their research, the quality of their relationships with other academic
scientists, and their satisfaction with their professional career. The response
categories were "no effect," "detracts somewhat," and "detracts greatly."
We asked respondents to estimate how the overall willingness of academic
scientists in their area of research to share information, data, and materials
had changed in the last decade. The response categories were "They are much
more willing to share now," "They are somewhat more willing to share now,"
"Remained the same," "They are somewhat less willing to share now," and "They
are much less willing to share now."
Several measures were used as control variables when geneticists were
compared with nongeneticists on measures of withholding. The control variables
included sex; whether the respondent trained in the United States; whether
in the last 3 years the respondent had received research grants or contracts
from companies whose work was related to their area of scientific expertise;
whether their university research had resulted in any commercial activities,
including a patent application, a patent granted, a patent licensed, a product
under regulatory review, a product on the market, or a start-up company; the
number of peer-reviewed articles they had published in the last 3 years (low,
0-5; medium, 6-15; high, ≥16); and if their research involved living humans
as research subjects.
In addition, we created variables representing the volume of requests
that respondents received (low, 1-6; high, ≥7) and made to others (low,
1-6; high, ≥7) in the last 3 years. These variables were used only when
the likelihood of engaging in data withholding was examined.
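The cut points for the publication and request-volume variables can be written out as a small sketch. The function names are hypothetical, and returning no category for zero requests is an assumption based on the stated ranges beginning at 1.

```python
from typing import Optional


def publication_category(n_articles: int) -> str:
    """Peer-reviewed articles in the last 3 years: low 0-5, medium 6-15, high >=16."""
    if n_articles < 0:
        raise ValueError("article count cannot be negative")
    if n_articles <= 5:
        return "low"
    if n_articles <= 15:
        return "medium"
    return "high"


def request_volume(n_requests: int) -> Optional[str]:
    """Requests received or made in the last 3 years: low 1-6, high >=7."""
    if n_requests < 0:
        raise ValueError("request count cannot be negative")
    if n_requests == 0:
        # Assumption: respondents with no requests fall outside both categories.
        return None
    return "low" if n_requests <= 6 else "high"


print(publication_category(16))  # high
print(request_volume(6))         # low
```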
Because we were primarily interested in withholding in academic genetics,
the primary analytic group was the subsample of survey respondents who considered
themselves geneticists. The responses of these self-identified geneticists
were analyzed with standard statistical procedures to generate means for continuous
variables and percentages for categorical or nominal variables. All analyses
were weighted to adjust for differences caused by the likelihood of being
sampled and for differences in nonresponse rates within survey strata. Differences
in proportions were tested with logistic regression analyses. All analyses
were conducted by using SUDAAN (Research Triangle Institute, Research Triangle
Park, NC), a statistical package that correctly computes the SEs when determining
statistical significance for survey data derived from complex sampling methods.
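As a rough illustration of the weighting described above, the sketch below combines an inverse-probability sampling weight with a within-stratum nonresponse adjustment and uses the result to compute a weighted percentage. This is a generic textbook construction, not the procedure implemented in SUDAAN, and it does not attempt the design-based standard errors that package computes. The function names and the example inputs are hypothetical.

```python
from typing import Iterable, Tuple


def final_weight(n_population: int, n_sampled: int,
                 n_eligible: int, n_responded: int) -> float:
    """Weight for every respondent in one stratum (assumed construction)."""
    design_weight = n_population / n_sampled        # inverse sampling probability
    nonresponse_adjustment = n_eligible / n_responded
    return design_weight * nonresponse_adjustment


def weighted_percentage(records: Iterable[Tuple[float, int]]) -> float:
    """records: (weight, indicator) pairs, with indicator equal to 0 or 1."""
    rows = list(records)
    total_weight = sum(weight for weight, _ in rows)
    return 100 * sum(weight * flag for weight, flag in rows) / total_weight


# Hypothetical example: one respondent from a fully sampled stratum and
# one from a stratum in which a quarter of the faculty were sampled.
w_full = final_weight(n_population=100, n_sampled=100, n_eligible=50, n_responded=50)
w_part = final_weight(n_population=400, n_sampled=100, n_eligible=50, n_responded=50)
print(weighted_percentage([(w_full, 1), (w_part, 0)]))  # 20.0
```

In a stratum sampled with certainty the design weight is 1, so only the nonresponse adjustment moves the weight away from 1.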
When comparing geneticists with other life scientists, we used multivariate
logistic regression controlling for the effects of sex, whether the scientists
were trained in the United States, whether they used humans as research subjects,
the number of publications in the last 3 years, whether they had research
funding from industry, and whether they had engaged in commercial activities.
These variables were selected according to the results of our previous research
into the causes of data withholding in the life sciences8
and the unpublished results of the personal interviews and focus groups.
In addition, for multivariate analyses that examined the likelihood
of denying a request, we included variables representing the volume of requests
received in the last 3 years. For analyses that examined the likelihood of
having a request denied, we included a variable representing the volume of
requests made of others in the last 3 years.
Of the 1849 life scientists who responded to the survey, 1240 considered
themselves geneticists. Table 1
shows the characteristics of geneticists and nongeneticists among our respondents.
Of the geneticists, three quarters were male, about half were full professors
(49%), 27% were associate professors, and 23% were assistant professors. In
terms of professional activities, 35% had engaged in commercial activities,
31% published 5 or fewer articles in the last 3 years, and 20% published 16
or more. Seventy-seven percent trained in the United States, and 30% reported
that their research subjects included living humans.
Table 1 also shows that
geneticists and other life scientists were similar in terms of their sex and
academic rank. However, other life scientists were significantly more likely
than geneticists to have trained in the United States, less likely to have
research support from industry, and less likely to have engaged in commercial
activities. In addition, nongeneticists were significantly more likely to
conduct research involving living humans and also more likely to have a low
number of publications in the last 3 years.
Among geneticists, the most frequently mentioned research field was
structural and functional genetics (52%), followed by biochemical genetics
(45%), mapping and sequencing (28%), cancer genetics (28%), developmental
genetics (27%), mutagenesis (27%), single-gene disorders (16%), common complex
disorders (15%), population studies (11%), bioinformatics (10%), and gene
therapy (10%). The most frequently reported research organisms were nonhuman
mammals (59%), humans (58%), bacteria (46%), yeast (19%), viruses (17%), and Drosophila (8%). Data regarding the research field and
primary research organism of nongeneticists were not collected.
Eighty-four percent of geneticists had made at least 1 request in the
previous 3 years of another academic researcher for additional information,
data, or materials concerning published research. Ninety-two percent reported
receiving such a request. Among those who made a request of another academician,
47% reported that at least 1 request was denied. However, only 12% denied
a request they received from another academic researcher. Respondents estimated
that they had made an average of 8.8 requests for information, data, or materials
regarding published research in the previous 3 years, of which 10% were
denied.
For perspective on the prevalence and consequences of data withholding
in genetics, we compared the responses of geneticists with those of 600 other
life scientists who responded to the survey (9 respondents did not classify
themselves as either a geneticist or other life scientist). Statistically
controlling for the independent effects of sex, academic rank, having trained
in the United States, having industry research support, using humans as research
subjects, engaging in commercial activities, and the number of publications
in the last 3 years, we found that the odds of geneticists making a request
for information, data, and materials were significantly higher than for other
life scientists (odds ratio [OR], 4.28; 95% confidence interval [CI], 3.11-5.89)
(Table 2). However, geneticists
were no more likely than other life scientists to report that their requests
were denied (OR, 0.97; 95% CI, 0.69-1.40) after we controlled for the volume of requests made
in addition to the control variables mentioned above (Table 2).
The odds of geneticists having received a request for information, data,
or materials were significantly higher than for other life scientists (OR,
5.41; 95% CI, 3.73-7.87). However, after we controlled for the volume of requests received in
addition to the control variables mentioned above, geneticists
were no more likely than other life scientists to report denying such requests
(OR, 1.39; 95% CI, 0.81-2.40) (Table 2).
The 2 factors that were significantly associated with an increased likelihood
of denying others' requests were having received a high number of requests
in the last 3 years (OR, 1.78; 95% CI, 1.10-2.90) and having engaged in commercial
activities (OR, 1.72; 95% CI, 1.06-2.81).
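The odds ratios reported above can be read with a simple rule: an association is significant at the 5% level when the 95% CI excludes 1. The sketch below applies that rule to the reported values. It also back-calculates the standard error of the log odds ratio, which assumes a Wald interval that is symmetric on the log scale; the text does not state how the intervals were constructed, so that step is an assumption. The labels and function names are illustrative.

```python
import math


def ci_excludes_one(lower: float, upper: float) -> bool:
    """True when the 95% CI lies entirely above or entirely below 1."""
    return lower > 1.0 or upper < 1.0


def implied_log_se(lower: float, upper: float, z: float = 1.96) -> float:
    """SE of log(OR), assuming a Wald CI of exp(b +/- z * SE)."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


# (OR, lower, upper) as reported in the text.
reported = {
    "geneticist: made a request": (4.28, 3.11, 5.89),
    "geneticist: had a request denied": (0.97, 0.69, 1.40),
    "geneticist: received a request": (5.41, 3.73, 7.87),
    "geneticist: denied a request": (1.39, 0.81, 2.40),
    "high request volume: denied a request": (1.78, 1.10, 2.90),
    "commercial activities: denied a request": (1.72, 1.06, 2.81),
}

for label, (odds_ratio, lower, upper) in reported.items():
    verdict = "significant" if ci_excludes_one(lower, upper) else "not significant"
    se = implied_log_se(lower, upper)
    print(f"{label}: OR {odds_ratio:.2f}, log-scale SE {se:.2f}, {verdict}")
```

Under the Wald assumption the point estimate should sit near the geometric mean of the interval bounds, which gives a quick consistency check on each reported triple.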
Geneticists requesting biomaterials from another academician after a
publication were most likely to report having a request denied (35%) compared
with requesting sequence information (28%), pertinent findings (25%), phenotypic
information (22%), and additional information regarding laboratory techniques
not included in the publication (16%).
Figure 1 shows the reasons
that geneticists, who constituted 67% of the respondents, gave for intentionally
withholding information, data, or materials concerning their own published
research from other scientists. Investigators who reported having 1 of their
requests denied were not significantly more likely than those who had not
had a request denied to say they had themselves denied a request (13% vs 8%; P = .09).
Twenty-eight percent of all geneticists reported that they had been
unable to replicate published research as a direct result of another academic
scientist's unwillingness to share information, data, or materials. Twenty-eight
percent also reported that they had ended a collaboration as a result of withholding.
Other consequences included having a publication significantly delayed (24%),
abandoning a promising line of research (21%), delaying sharing with that
person or group (18%), and refusing to share with that person or group (13%).
Seventy-seven percent of geneticists felt that data withholding detracted
somewhat or greatly from the level of communication in science; 73%, that
data withholding slowed the rate of progress in their field of science; and
63%, that data withholding harmed the quality of their relationships with
peers. Geneticists also reported adverse effects of data withholding on their
own research (58%), the education of students and postdoctoral fellows (56%),
and their satisfaction with their careers (45%).
Other life scientists were significantly less likely than geneticists
to report that data withholding detracted from the progress of their research
(38% vs 58%, respectively; P<.001) and the overall
level of progress in their scientific field (56% vs 73%, respectively; P<.001).
Thirty-five percent of geneticists thought other academic scientists
were somewhat or much less willing to share information, data, and materials
compared with a decade ago; 51%, that willingness had remained unchanged;
14%, that it had increased.
We found that 95% of HGP-funded geneticists had received a postpublication
request for information, data, and materials compared with 92% of non–HGP-funded
geneticists (P = .20). Among those who received a
request, HGP-funded geneticists were no less likely to deny a request than
were non–HGP-funded geneticists (15% vs 12%; P
This study provides the first detailed, systematic, quantitative portrait
of the phenomenon of data withholding in genetics or any other field of academic
investigation. Our findings suggest that data withholding in genetics is not
widespread, given that 12% of geneticists reported denying requests from other
academicians for information, data, and materials. At the same time, the impact
of withholding appears to be much more widespread, given that almost half
of all geneticists who had made a request of another academic for information,
data, and materials related to published research had had that request denied.
Further, it is possible that withholding has increased in recent years because
more than one third (35%) of geneticists also believe that data withholding
is becoming more common in their field.
A critical question is whether withholding affects the daily work of
genetics investigators or the health of this field of investigation. A number
of respondents reported adverse effects on their ability to reproduce the
work of other investigators, the timeliness of their own publications, and
their ability to pursue chosen research directions. Further, large numbers
of geneticists see adverse effects on communication within their field, the
education of young scientists, and the rate of scientific progress. The adverse
effects on research progress at the individual and field level were more likely
to be reported by geneticists than by investigators who had experienced data
withholding in other life science fields.
Although genetics investigators are clearly concerned about the effects
of data withholding on their field, its significance for the well-being of
genetics in particular and the scientific enterprise generally remains to
be fully explored. The progress of the HGP and daily reports of advances in
genetics suggest the field remains vibrant. This and previous work indicating
that geneticists publish more than other investigators and are more productive
in other academic domains lend credence to this appearance of health and
dynamism.14 Data withholding may paradoxically
occur most commonly during extremely rapid progress, since scientists are
generating large numbers of new findings that stimulate much jockeying for
scientific priority. The commercial applications of genetics research, along
with increasing dependence on industry funding and the rise of commercial
norms in the academy, may be partially responsible as well for data withholding.
From a policy perspective, the question is not whether progress in genetics
continues, but whether it is as rapid as it could be if data sharing were
maximized. The NHGRI has taken a leadership position in encouraging openness
among its investigators by encouraging the rapid release and dissemination
of new sequence data by its funded investigators. A number of journals also
require as a condition of publication that data and materials be placed in
public depositories or otherwise made available to other scientists. Unfortunately,
our research suggests that these inducements may not have been sufficient to prevent
the adverse effects of data withholding on genetics as a field. Our findings
also do not suggest that data withholding is less common among HGP-funded
investigators, despite the NHGRI's laudable efforts to encourage sharing.
This result makes particularly relevant our findings on the types of
data that are most commonly withheld and the reasons scientists give for doing
so. First, scientists are most likely to encounter refusals when they approach
other academic investigators for access to biomaterials. Some of these denials
likely stem from the scarcity of precious materials or from human subjects
concerns. However, it may be that material transfer agreements have become
so complex and demanding that they inhibit sharing.16
If so, our findings suggest a pressing need to clarify and expedite the process
for sharing biomaterials.
Second, some of the reasons respondents gave for refusing requests may
be remediable through several potential interventions. These reasons include
the time and effort required to comply with requests and the actual costs
of doing so. In principle, these obstacles to sharing could be reduced by
providing investigators additional resources that are explicitly devoted to
disseminating the results of their research after publication. Traditionally,
federal sponsors of biomedical research seem to have assumed that investigators'
commitment to the norms of science would ensure that they would absorb the
costs of honoring those values, including costs of data-sharing activities.
Under the circumstances of modern research, this assumption may no longer
be reasonable. Additional funds may not sufficiently address all of the reasons
for data withholding, such as protecting one's commercial and academic priority.
We found that geneticists were no more likely than nongeneticists to
deny requests for access to published information, data, and materials from
other faculty.8 This finding was unlike that
in our previous research, probably because our previous analyses did not control
for the number of requests received, which significantly affects the likelihood
of denials for those receiving a high number of requests. However, as in our
previous research, having engaged in commercialization of university-based
research was significantly associated with increased likelihood of data withholding.8
This study has several limitations inherent in survey research. Because
we relied on self-reporting, our estimate of the percentage of faculty who
withheld results from others likely constitutes a lower bound estimate of
the proportion who actually participate in this behavior, since respondents
are often reluctant to admit engaging in behavior that may be perceived as
less than desirable.17,18 Further,
since the 256 nonrespondents were significantly more likely than respondents
to have received a high number of postpublication requests (results not shown)
and since receiving a high number of requests is a significant predictor of
denials (Table 2), nonrespondents
may have been more likely to deny a request than respondents, which would
have resulted in an underestimate of the percentage of geneticists who denied
others' requests. Another limitation is that our respondents may not represent
the universe of academic geneticists, since half reported that they were full
professors, suggesting that our sample may be tilted toward more senior faculty.
Similarly, since our sample included only research-intensive universities,
our results may not be applicable to faculty in institutions receiving less
extramural research support. Finally, this report examines only 1 form of
data withholding: refusal to share data associated with published research
results. Other forms of secrecy, such as significant delays in honoring requests,
delays in publication, refusals to publicly present research findings, and
not discussing research with others, may also affect the progress of science
and need to be factored into future research and policy formulation in genetics
and other fields of investigation.
Many questions about the prevalence and consequences of data withholding
in genetics and other disciplines remain unanswered. However, our findings
suggest that data withholding occurs in one of the most salient scientific
disciplines of our times, that current efforts to reduce it have not been
fully effective, and that additional measures to improve openness of communication
in genetics and the sharing of published information data and materials seem