Korenman SG, Berk R, Wenger NS, Lew V. Evaluation of the Research Norms of Scientists and Administrators Responsible for Academic Research Integrity. JAMA. 1998;279(1):41-47. doi:10.1001/jama.279.1.41
From the Departments of Medicine (Drs Korenman and Wenger), Sociology (Dr Berk), and Statistics (Drs Berk and Lew), University of California, Los Angeles. Dr Lew is now affiliated with the UCLA School of Public Policy.
Context.— The professional integrity of scientists is important to society as
a whole and particularly to disciplines such as medicine that depend heavily
on scientific advances for their progress.
Objective.— To characterize the professional norms of active scientists and compare
them with those of individuals with institutional responsibility for the conduct of research.
Design.— A mailed survey consisting of 12 scenarios in 4 domains of research
ethics. Respondents were asked whether each act was unethical and, if so, to rate
the degree to which they considered it unethical and to select responses and punishments
for the act.
Participants.— A total of 924 National Science Foundation research grantees in 1993
or 1994 in molecular or cellular biology and 140 representatives from the
researchers' institutions to the US Department of Health and Human Services
Office of Research Integrity.
Main Outcome Measures.— Percentage of respondents considering an act unethical and the mean
malfeasance rating on a scale of 1 to 10.
Results.— A total of 606 research grantees and 91 institutional representatives
responded to the survey (response rate of 69% of those who could be contacted).
Respondents reported a hierarchy of unethical research behaviors. The mean
malfeasance rating was unrelated to the characteristics of the investigator
performing the hypothetical act or to its consequences. Fabrication, falsification,
and plagiarism received malfeasance ratings higher than 8.6, and virtually
all respondents considered them unethical. Deliberately misleading statements about
a paper or failure to give proper attribution received ratings between 7 and
8. Sloppiness, oversights, conflicts of interest, and failure to share were
less serious still, receiving malfeasance ratings between 5 and 6. Institutional
representatives proposed more and different interventions and punishments
than the scientists.
Conclusions.— Surveyed scientists and institutional representatives had strong and
similar norms of professional behavior, but differed in their approaches to
an unethical act.
Confidence in scientific progress provides the basis for the public
support of research. In medicine we depend on research to understand human
biology, develop diagnostic and therapeutic techniques, generate rational
health care policies, and make intelligent personal health decisions.
The conduct of science depends on the intellectual integrity of individual
scientists,1 so public support of science can
be eroded by the negative publicity surrounding allegations of misconduct.
Of late the integrity of the scientific enterprise has been subjected to much
scrutiny.2-4 Stimulated in part by federal rules5-7
responding to these concerns, institutions have established policies regarding
the ethical conduct of research.8-10
Usually, these guidelines have been designed by administrators, senior scientists,
or both.5-10
Yet, how well these rules capture shared understandings of proper scientific
conduct is an empirical question. Indeed, the policies may reflect the needs
of research institutions more accurately than those of the scientific community.
Moreover, in many cases, the policies lack an explicit theoretical or empirical
foundation in the professional norms of scientists.11
For sociologists, norms have 2 critical elements.12
First, norms articulate obligatory actions and are, therefore, not opinions
or attitudes. Second, norms are shared by members of a particular group. These
2 elements applied to scientific research norms can be examined by allowing
members of the relevant groups to evaluate descriptions of research practices
within the framework of a randomized experiment. Such approaches are useful
when the information desired derives from complex multidimensional judgments.13 Examples include research on perceptions of what constitutes
sexual assault14 and of appropriate punishment
for criminal behavior.15 It is important to
stress, however, that research of this kind can only consider whether certain
actions are perceived to be obligatory. Then how widely the perceptions are
shared becomes a statistical issue.
This study was designed to assess the ethical beliefs surrounding research
practice of a selected group of scientists and their institutional representatives
(IRs), how they feel they should respond to unethical behavior, and the types
of punishments they would consider to be appropriate.
We defined the scientist population as National Science Foundation awardees:
those receiving funding from the Division of Molecular and Cellular
Biology of the Biology Directorate during the years 1993 or 1994, a total of 924 investigators.
The population of IRs was derived from the 1994 US Department of Health
and Human Services Office of Research Integrity list of representatives from
entities conducting federally supported research. That list of 517 names was
pared down to the 140 officials from the scientists' institutions. A survey
of both complete populations was conducted.
The content of the survey instrument was based on our experiences in
teaching research ethics, the literature, and the findings of 3 focus groups
devoted to identifying the norms of scientists and IRs.16
The focus groups consisted of 2 groups of scientists and 1 group of IRs, similar
to but not included in the study populations. The focus groups discussed professional
norms, ethical violations and their harms, factors contributing to violations,
and ways to improve scientific conduct. Perceptions of appropriate punishments
were also explored. Based on the focus group findings, the range of unethical
behaviors, responses to the behaviors, and possible punishments were identified.
Each respondent received a questionnaire containing 12 scenario cases
describing research practices. The practices reflected 4 domains of professional
behavior: (1) performance and reporting of research, (2) appropriation of
ideas of others, (3) conflicts of interest or commitment, and (4) collegiality
and sharing. Each scenario was constructed within a fractional factorial design13 in which each scenario consisted of sentences built
from randomly assigned phrases, each phrase supplying 1 level of a dimension.
The dimensions represented factors that theoretically might affect a respondent's
reaction to a scenario, while a level was a particular manifestation of a
dimension. Thus, if the dimension was the status of the investigator, laboratory
chief and assistant professor might be levels. Behavior of the scientist was
the core dimension, and acts were selected for each of the 4 domains to encompass
the range of values (ethical to maximally unethical) (Table 1). The other dimensions contained in each scenario were sex;
status of the scientist (tenured, prestigious head of laboratory, tenured
senior researcher, untenured junior researcher); the immediate harm that resulted
from the behavior; the larger consequences of the act; and whether this was
a first offense (first time, prior offense, no mention). Since there were
several dimensions and levels within each domain, there were a total of 8364
possible scenarios, of which each respondent received a random sample of 12.
The result was a design of sufficient power for estimation of main effects
and 2-way interactions, with guarantees that, on the average, all main effects
are independent. This report focuses on the main effects.
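As a rough illustration of how such vignettes can be assembled, the sketch below draws 1 level from each dimension at random to build a scenario. All dimension labels and phrasings are hypothetical stand-ins rather than the study's actual instrument, and simple random draws are shown where the study used a fractional factorial design to guarantee, on average, the independence of main effects.

```python
import random

# Illustrative dimensions and levels; the labels below are hypothetical
# stand-ins, not the study's actual instrument wording.
DIMENSIONS = {
    "behavior": [
        "fabricated a data point",
        "reported results despite an honest but serious mistake",
        "failed to share research materials",
    ],
    "sex": ["He", "She"],
    "status": [
        "a tenured, prestigious head of laboratory",
        "a tenured senior researcher",
        "an untenured junior researcher",
    ],
    "immediate_harm": ["a trainee's project was set back",
                       "no immediate harm resulted"],
    "consequences": ["the paper stimulated a grant proposal",
                     "the work drew little notice"],
    "offense_history": ["first time", "prior offense", "no mention"],
}

def build_scenario(rng):
    """Assemble one vignette by drawing 1 level from each dimension."""
    return {dim: rng.choice(levels) for dim, levels in DIMENSIONS.items()}

def sample_questionnaire(n_scenarios=12, seed=None):
    """Give a respondent a random sample of n_scenarios vignettes."""
    rng = random.Random(seed)
    return [build_scenario(rng) for _ in range(n_scenarios)]

for i, scenario in enumerate(sample_questionnaire(seed=1), start=1):
    print(i, scenario)
```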
For each scenario, respondents were asked whether they considered the
act unethical. If so, they estimated the severity of the unethical behavior
on a scale from 1 to 10. They then selected which, if any, actions they would
take in response to the behavior. Finally, if they considered the act unethical,
they were asked to indicate whether they thought punishment was warranted
and, if so, to select any of the options provided. A sample scenario is given
in Figure 1.
Respondents were also asked about their demographic characteristics,
academic position, and research experience. The instrument was pretested with
a group of 40 scientists and trainees at 1 institution. Responses were kept
completely confidential. An institutional review board approval waiver was
obtained for the study.
The scenario instrument was sent by priority mail with a postage-paid
reply envelope to the cohorts of scientists and IRs. A cover letter stressed
the importance of the research and the confidentiality of the responses. Follow-up
contacts were made by telephone and mail. Among the 924 scientists, 49 were
not available. This left 875 as the target population. Of the 140 IRs, 8 could
not be contacted, leaving 132 as the target population. If an IR had been
replaced, the replacement was targeted for the survey. Sixty-nine percent
of both the scientists (606) and the IRs (91) completed the survey. Nonresponders
were recontacted, and 63% of them provided their age, academic rank, and sex.
The responses were collected and analyzed statistically as previously
described.17-19 Each
factor was broken out from the scenarios and related to each act to estimate
that factor's specific role in the degree to which an act was considered
unethical. Given the design, simple means and proportions could be used to
obtain unbiased estimates of the impact of the various scenario dimensions
and levels on respondents' judgments about malfeasance, allowing us to compare,
for example, the mean malfeasance rating from fabrication vs an honest mistake
or the role of sex in determining the malfeasance rating. The fact that each
respondent evaluated multiple scenarios did not reduce the statistical power
by a meaningful amount.20
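The estimation logic can be made concrete with a minimal sketch (invented data; all column names are hypothetical): because scenario levels were randomized, grouping responses by a dimension's levels and taking simple means approximates that dimension's main effect without further modeling.

```python
import pandas as pd

# Hypothetical long-format data: one row per (respondent, scenario), holding
# the randomly assigned scenario levels and the 1-to-10 malfeasance rating.
# All names and values here are invented for illustration.
responses = pd.DataFrame({
    "respondent":      [1, 1, 2, 2, 3, 3],
    "act":             ["fabrication", "honest mistake", "fabrication",
                        "plagiarism", "honest mistake", "plagiarism"],
    "repeat_offender": [True, False, False, True, False, False],
    "rating":          [10, 4, 9, 9, 5, 8],
})

# Because each dimension's levels were assigned at random and, on average,
# independently of the other dimensions, simple group means serve as
# unbiased estimates of each dimension's main effect on the rating.
print(responses.groupby("act")["rating"].mean())
print(responses.groupby("repeat_offender")["rating"].mean())
```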
The scientists had a mean age of 40.5 years, and 436 (72%) were male.
One hundred nine (18%) were assistant professors, 152 (25%) associate professors,
230 (38%) professors, and 115 (19%) other (chair, administrator, dean). Ninety-nine
percent of the group had PhD degrees. The IRs had a mean age of 52.0 years;
69 (76%) were male, and 75 (82%) held PhD degrees. One hundred ninety-four
(32%) of the scientists and 86 (94%) of the IRs had institutional responsibility
for the performance of science, such as animal protection or radiation safety
committee service, and 97 scientists (16%) and 80 IRs (88%) had specific responsibility
for the ethical conduct of research, such as misconduct investigations or
policy development. Three hundred fifteen scientists (52%) described their
principal research activities as cellular or molecular biology, 212 (35%)
structural biology/biochemistry, 12 (2%) chemistry, and 67 (11%) microbiology/immunology.
One hundred seventy scientists (28%) and 12 IRs (13%) held patents or were
responsible for patents held by their institutions, and 84 (12%) of the entire
group received personal income from commercial sources.
Comparing the age, sex, and academic rank of responders and nonresponders,
we found no significant differences among the IRs. Nonresponder scientists
did not differ in rank or sex from the responders, but they were somewhat
older (P=.009). However, because respondent age and rank proved unrelated
to the outcomes measured (see below), this difference is unlikely to have
had a material effect.
Table 1 summarizes the main
results. The malfeasance rating was defined as the response to question 2
in Figure 1. The results are reported
by domain of behavior in descending order of scientists' malfeasance ratings.
We describe below the major results in each domain of scientific behavior.
The hyphenated numbers in parentheses indicate the specific scenario acts.
Fabrication (1-1) and falsification (1-2) were condemned almost universally
and received malfeasance ratings near 10 with low SDs. Somewhat fewer respondents
considered misleading behaviors (1-3, 1-4) to be unethical and gave them malfeasance
ratings of 6 to 7. Selectivity (1-5) and sloppiness (1-6) were considered
somewhat unethical by two thirds of the respondents, and their malfeasance
ratings were about 5. An honest mistake (1-7), which we expected would be considered
ethical, was considered unethical by one third of respondents, with mean
malfeasance ratings of 4.7 among scientists and 4.8 among the IRs. Respondent comments
revealed that some thought the investigator might have known the results
were in error when reporting them; these respondents gave very high malfeasance
ratings because they believed the investigator was lying.
Most respondents found behaviors that failed to give proper attribution
to the work or ideas of others unethical. Deliberate plagiarism was universally
condemned, whether it was derived from text (2-1) or a research proposal (2-2),
with mean malfeasance ratings of 8.2 or higher. Respondents were critical
of the use of material from a research proposal (2-4) no matter what the cause
of the failure of attribution, but in general, deliberate appropriation of
the ideas of others without attribution was found to be much more unethical
than an accidental failure to cite. One half of the respondents thought that
failure to cite one's own work was unethical.
Conflicts of interest were condemned most strongly when there was failure
to disclose a financial interest (3-3) or mandatory involvement of a trainee
(3-1). We probed attitudes toward conflicts of commitment by varying the number
of days per week dedicated to consulting while drawing a full academic salary
(3-2). If the number of days was unstated, more than half of the IRs and scientists
considered the behavior unethical. Specifying the number of days of consulting
demonstrated that 3 days a week of consulting was considered unacceptable
academic behavior, especially by the IRs (Table 2).
A significantly higher percentage of scientists than IRs considered
the acts in the collegiality and sharing domain to be unethical (P=.01).
Failure to share at all (4-1), sharing only long after publication (4-2),
and sharing only in return for authorship (4-3) were similarly disapproved
of. Sharing only materials that were in plentiful supply was considered unethical
by a minority of respondents, who gave the act low malfeasance ratings.
The scenarios contained dimensions, including the scientist's sex and
academic seniority, to determine whether the malfeasance rating of an act
was influenced by who performed it. It was not. There were also dimensions
describing immediate and long-term harmful consequences of the behavior to
determine whether the malfeasance rating was influenced by the consequences
of the act. For example, if an act stimulated a grant proposal or adversely
affected institutional reputation, the integrity of science itself, or collegiality
among investigators, we postulated that it would be viewed more seriously
than if there were a less consequential outcome. But adverse consequences
had no effect on the malfeasance rating.
The only dimension that influenced the mean malfeasance rating was whether
the investigator was a repeat offender (Table 3). A repeat offender received significantly higher malfeasance
ratings by both scientists (P=.001) and IRs (P=.04) compared with first-time offenders.
Respondent sex, age, academic rank, and scientific field were not associated
with a meaningful difference in malfeasance ratings. Other respondent factors
failing to influence the malfeasance ratings included responsibility for research
conduct, patents, commercial income, or whether the individual had been personally
affected by scientific misconduct. As noted above, a smaller percentage of
IRs than scientists thought acts in the sharing domain were unethical.
Respondents considering an act unethical would by and large communicate
that information, whether it be to the individuals themselves or to colleagues,
superiors, deans, journal editors, or funding agencies, depending on the infraction.
In general, the higher the malfeasance rating, the more such responses were
given. Few scientists or IRs felt it would be right to keep the information
to themselves, communicate with the scholarly or general media, or notify
a professional society. Figure 2
compares the percentage of scientists vs IRs indicating each of the most common
responses to the 33 behaviors (Table 1).
If the same percentage of scientists and IRs favored a response, then the
symbol would lie on the solid line, while if the IRs proposed responses more
frequently, a greater proportion of the symbols would be above the line. As
can be seen, the IRs proposed almost twice as many responses as the scientists.
The scientists and IRs also preferred different responses. While both groups
felt about equally strongly about communicating with the researcher, the scientists
were much more likely to inform colleagues, while the IRs were much more likely
to inform supervisors and deans. Neither group was eager to communicate with
funding agencies or journal editors.
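A short sketch of the kind of comparison plot described above, using invented percentages purely for illustration (the study's actual Figure 2 data are not reproduced here):

```python
import numpy as np
import matplotlib.pyplot as plt

# Invented percentages for illustration only: for each of the 33 acts,
# the share of scientists and of IRs proposing a given response.
rng = np.random.default_rng(0)
pct_scientists = rng.uniform(5, 60, size=33)
pct_irs = np.clip(1.8 * pct_scientists + rng.normal(0, 8, size=33), 0, 100)

fig, ax = plt.subplots()
ax.scatter(pct_scientists, pct_irs)
ax.plot([0, 100], [0, 100], color="black")  # identity line: equal percentages
ax.set_xlabel("% of scientists proposing the response")
ax.set_ylabel("% of IRs proposing the response")
ax.set_title("Points above the line: IRs propose the response more often")
plt.show()
```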
Of those considering an act unethical, most proposed punishments. The
IRs were more likely than scientists to propose punishments at each level
of malfeasance rating. The distribution of punishments is shown in Figure 3, which compares the percentage of
scientists and IRs proposing the 8 most commonly suggested sanctions for the
33 acts described in Table 1. The two groups preferred different punishments.
The IRs proposed a warning from a supervisor and a notation in the personnel
file more often than the scientists did. Scientists proposed a forced retraction
and a notice in a journal more frequently than IRs. Both groups commonly proposed
requiring an ethics course.
There are more than 1 million active scientists in the United States
whose activities contribute greatly to our prosperity and on whose integrity
we rely. Yet, to our knowledge, there have been no previous attempts to formally
delineate their norms of research behavior. Rather, studies have focused on
reports of unethical behaviors. In 1992 the American Association for the Advancement
of Science reported the results of an opinion poll completed by 31% of 1500
randomly selected members.3 The poll showed that
the scientific community felt that it should be self-regulating, with laboratory
directors and bench scientists playing the key roles in ethics education and
in detecting and reporting misconduct. The study revealed that the overwhelming
majority of instances of suspected misconduct did not result in an outcome
reflecting an admission or demonstration of actual misconduct.
Twenty-seven percent (549/2010) of graduate and postdoctoral students
at 1 institution responded to a survey of trainees' perceptions of research
ethics.21 One hundred twenty-nine (23%) had
received no training in research ethics, 195 (36%) had observed some kind
of scientific misconduct, and 83 (15%) would be willing to "select, omit or
fabricate data to win a grant or publish a paper." The authors concluded that
it was essential to improve students' knowledge of and attitudes toward ethical
research behavior. Both of these studies were hampered by low response rates.
A large study by Anderson and Louis22
on the subscription of graduate students to a list of scientific norms and
"counter norms" concluded that there are differences between fields of research,
that pressures on the students affect their perceptions of right and wrong,
and that international graduate students have less adherence to the classical
norms of science than do US-trained graduate students.22
They suggested that we need to teach classical scientific norms to our trainees
or accept an alteration in expected behavior.
In contrast to the above, our study was designed to evaluate the professional
norms of scientists and compare them with those of IRs. Scenarios of scientific
practice were used to elicit views on professional malfeasance. Respondents
were asked to respond in a framework in which personal implications of their
responses were deliberately omitted. The study also attempted to examine the
views of scientists and IRs about responses to an act and punishments for
malfeasance. These views were taken to be indicators of scientific norms.
This approach revealed a hierarchy of unethical acts by displaying a
range of percentages of respondents considering an act unethical superimposed
on a range of malfeasance ratings. The IRs and scientists gave indistinguishable
malfeasance ratings, suggesting that there are indeed professional norms of
scientists and that standards are high. The fact that the 2 groups of respondents
were selected by different criteria and that the IRs came from a broad range
of disciplines strengthens the perception that these results relate to the
underlying professional norms of scientists.
It was reassuring that the characteristics of the respondents, the characteristics
of the investigators in the scenarios, and the adverse consequences of the
behavior had no measurable influence on the malfeasance ratings. This lack
of bias supports the expectation that professional norms attach to the science,
not to the scientists. On the other hand, if the investigator was known to
be a repeat offender (Table 3),
higher malfeasance ratings were given.
Deliberate deviations from honesty were awarded the highest malfeasance
ratings. Of primary concern seemed to be acts that would undermine the binding
norms of the scientific enterprise. This would seem to support the premise
that when trust is compromised, so is science.
On the other hand, inadvertent errors, most conflicts of interest, and
failure to share were lesser violations than deliberate dishonesty. Exploiting
a graduate student and conflicts of commitment were the most serious conflicts
of interest. Severely limited sharing was given malfeasance ratings in the
5 range, but a majority of the IRs did not think some of those scenario acts
were unethical. "Communality," implying sharing, was one of the principles
Merton23 used to describe scientific norms,
and some of the premier journals require sharing. However, a number of respondents
commented that sharing in contemporary science can be extremely expensive,
reduce a laboratory's competitiveness, and actually delay scientific progress.
These results are consistent with the wide range of views regarding sharing
expressed recently in Science24
and raise the question as to whether sharing of resources remains a viable
norm of science at this time.
Although they had essentially identical beliefs about ethical behaviors,
IRs diverged substantially from the scientists in proposing more as well as
different responses and punishments (Figure
2 and Figure 3). The differences
between the IRs and scientists may be attributable to the different kinds
of worlds in which they compete. We believe from our focus group studies16 and personal experience that university administrators
view themselves as temporary guardians of their institutions regardless of
their professional backgrounds, and their main role is to protect, preserve,
and enhance the institution. Since research institutions compete in an arena
where "ownership of science" is the paramount indicator of success and "reputation"
is the primary medium of exchange, reputations are easily damaged by scientific
misconduct. Thus, sanctions aimed at punishment and deterrence make sense
to IRs. Damage to the social fabric of science may be a lesser consideration.
Scientists, on the other hand, prefer social constraints and peer pressure
to handle misbehavior, including communication with investigators and colleagues
and mandated exposure of incorrect results. For scientists, integrity of the
scientific community is essential, and that requires trust and cooperation.
In this light, it should not be surprising that sometimes there are tensions
between scientists and institutional administrators. And it should not be
surprising if such tensions lead to complaints by scientists about the integrity
of their institutions.16
This study may also illuminate the problems surrounding the new definition
of scientific misconduct proposed by the Commission on Research Integrity.25 Consistent with the views of IRs, broad definitions
and the use of legalistic approaches to allegations of misconduct were proposed
by the commission. They were met with suspicion by practicing scientists26-28 who, as noted above,
prefer a collegial approach, especially to lesser degrees of malfeasance.
Perhaps this represents a form of dissonance based on the different arenas
in which government, research institutions, and scientists find themselves,
and the different constituencies they must satisfy. If further investigation
demonstrates generalizability of the results of this study, perhaps models
could be developed whereby the most serious types of misconduct would be subject
to sanctions, and lesser offenses would be handled by the scientific community
or their institutions, as suggested by Guenin11
and to a degree in the commission's report.25
But no matter what sorts of interventions one might suggest, it is critical
not to lose sight of the fact that any intervention on behalf of scientific
integrity must not undermine the social structure of science. Better still,
these interventions should reinforce it.
What might be done to improve the ethics of scientific practice? The
value of ethics education was underscored (Figure 3). Furthermore, reinforcing the standards of scientists
by specific institutional actions to inhibit "survivalist" behavior (eg, in
promotion policies) might contribute to maintenance of the high professional
norms of scientists.
This study has several limitations. First, the study was limited to
a relatively homogeneous, funded group of basic scientists. That implies that
generalizations to other scientists and scientific fields would be risky.
However, the close agreement between the malfeasance ratings of the scientists
and IRs suggests that at least some of the shared understandings are generalizable,
because the IRs are not likely to have been in the same fields as the scientists.
Second, some of the comparisons in which no relationship was found
might be subject to a type 2 error, in that the numbers were insufficient
to identify small differences; however, we had ample power to identify the
large differences, which were of primary interest.
Third, a small percentage of the scenarios were ambiguous, as might
be expected from computer-generated phrase combinations of this kind. For
example, the honest but serious mistake (1-7) and unintended failure of attribution
to one's own work (2-12) were designed to be the most ethical extremes of
domains 1 and 2. We know that in 1-7 some respondents were concerned that
the investigator was reporting the honest mistake as true even after knowing
it was a mistake. Others may have considered 2-12 to indicate self-plagiarism.
However, ambiguity is as typical of real-life behaviors as it is of scenarios.
In fact, by limiting the scenarios to the essentials, we may have reduced
the uncertainty surrounding the acts.
Fourth, since the nonresponders were generally similar to the responders in age,
rank, and sex and we achieved response rates of 69%, we are hopeful that
nonresponse did not introduce significant bias into the study.