Kao LS, Aaron BC, Dellinger EP. Trials and Tribulations: Current Challenges in Conducting Clinical Trials. Arch Surg. 2003;138(1):59-62. doi:10.1001/archsurg.138.1.59
Randomized controlled trials are the gold standard for the evaluation of new therapies and surgical procedures and as such require strict attention to study design and statistical analysis. There are, however, multiple challenges in conducting a well-designed clinical trial. This article describes the difficulties encountered at a single institution participating in a multicenter drug study and reviews the challenges involved in developing a high-quality randomized controlled study.
"First, do no harm" has served as the guiding principle in medicine for centuries, in the everyday practice of medicine and in the realm of clinical investigation. Nonetheless, the goals of medicine extend beyond merely doing no harm. The challenge of balancing the potential benefits of a new drug against the risks of a poor outcome, either from adverse reactions or from ineffectiveness of the drug, is central to the design of clinical trials for the development of new drug therapies.
Before marketing and widespread use, every new drug or therapy undergoes rigorous evaluation through several phases of study. A phase I study is performed first to assess the pharmacokinetics, metabolism, and toxicity of the drug in humans. A phase II study evaluates the efficacy of the drug in a small number of patients. A phase III study compares the new drug or therapy with standard treatment.
Although the randomized controlled trial is considered the gold standard in clinical research, few surgical studies conform to this standard. Yet clinical decisions are made every day on the basis of retrospective case series and even anecdotal experience. Regardless of the rigor of the study design or statistical analysis, the physician ultimately must decide whether the data are clinically applicable; statistical significance is not always equivalent to clinical relevance. Nonetheless, clear hypotheses, proper study design, careful definition of the study population, and irreproachable statistical analyses are all essential elements of a well-conducted clinical trial. Recently, investigators at the University of Washington (Seattle) participated in a multicenter phase III clinical trial evaluating Cox-2 inhibitors for pain control after laparoscopic cholecystectomy. The University of Washington experience exemplifies several of the difficulties of conducting a clinical trial, not only in surgery but in every field.
A total of 66 patients underwent planned laparoscopic cholecystectomy at the University of Washington during the enrollment period, June 15 to November 30, 2001. Inclusion and exclusion criteria are presented in Table 1. Participants were prospectively randomized to receive, after laparoscopic cholecystectomy, either standard oral narcotics (hydrocodone/acetaminophen) or oral narcotics combined with a Cox-2 inhibitor. The study required participants to answer questions about pain severity in the recovery room for up to 4 hours postoperatively. In addition, they were asked to keep a daily pain diary for 1 week, to answer questions by telephone on postoperative days 1 and 2, and to answer questions in the clinic 1 week postoperatively. The sample size was calculated as 100 patients per arm to provide 80% power to detect a clinically significant difference in postoperative pain at a significance level of .05. With an estimated dropout rate of 15%, the total target sample size was 230 patients.
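The enrollment target above follows from simple arithmetic. The sketch below assumes the dropout allowance was applied by multiplying the evaluable sample by (1 + dropout rate), which reproduces the stated figures (200 × 1.15 = 230); the function name `inflate_for_dropout` and its `method` parameter are hypothetical, and the "divide" option shows a common, slightly more conservative convention for comparison.

```python
import math


def inflate_for_dropout(evaluable_total: int, dropout_rate: float,
                        method: str = "multiply") -> int:
    """Hypothetical helper: patients to enroll so that about
    `evaluable_total` remain after the expected dropout.

    method="multiply": evaluable_total * (1 + dropout_rate), which matches
                       the 230 stated in the text (an assumption about how
                       the figure was derived).
    method="divide":   evaluable_total / (1 - dropout_rate), a slightly
                       more conservative convention.
    """
    if not 0 <= dropout_rate < 1:
        raise ValueError("dropout_rate must be in [0, 1)")
    if method == "multiply":
        raw = evaluable_total * (1 + dropout_rate)
    elif method == "divide":
        raw = evaluable_total / (1 - dropout_rate)
    else:
        raise ValueError(f"unknown method: {method}")
    # Round away floating-point noise, then round up to whole patients.
    return math.ceil(round(raw, 9))


per_arm = 100  # from the power calculation described in the text
evaluable = per_arm * 2

print(inflate_for_dropout(evaluable, 0.15))            # 230
print(inflate_for_dropout(evaluable, 0.15, "divide"))  # 236
```

The two conventions differ by only a few patients here, but the gap widens as the expected dropout rate rises.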
Of the 66 patients who underwent laparoscopic cholecystectomy during the designated period, only 7 (11%) could be enrolled in the study. The reasons for exclusion are presented in Table 2 and included medical, psychosocial, and technical factors. Standard exclusion criteria were applied to potentially pregnant women, children and elderly patients, patients who could not understand the informed consent (eg, non-English speakers), and patients who might not consider recruitment voluntary (eg, prisoners). Patients with medical comorbidities, such as ischemic heart disease and abnormal liver function, were also excluded. Other patients were excluded because of weight, prior or ongoing narcotic use, active malignancy, contraindication to a laparoscopic procedure, or failure to consent. Last, patients were excluded because of potential adverse reactions to the study drugs. A few patients refused to participate because they feared complications listed as potential adverse reactions on the informed consent, such as death or liver failure from acetaminophen, a component of a commonly prescribed, marketed combination narcotic that is given to many ambulatory surgery patients in the United States and that patients in both arms of this study received.
Sufficient sample size is essential for achieving statistical power. The sample size is determined by the magnitude of the difference in outcome between treatment arms that is considered clinically significant and by the desired power and significance level. Power is the probability of detecting a true difference of the specified magnitude if one exists. The significance level, the threshold against which the P value is judged, is the accepted probability of finding such a difference by chance alone when there is no true difference between treatments. The magnitude of difference considered important, the power, and the significance level together determine the number of patients required. In the University of Washington study, the underlying disease, cholelithiasis, is common: more than 700 000 cholecystectomies are performed in the United States each year.1 Given this prevalence, accruing enough patients for a study of pain control after laparoscopic cholecystectomy would seem fairly straightforward. If enrollment in a study of a problem as common as symptomatic cholelithiasis fared so poorly, what difficulties can be expected in adequately studying less common disorders? Sadly, an enrollment rate of 11% for a clinical trial is similar to, or even better than, that of other large prospective randomized trials.
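The interplay among clinically important difference, power, and significance level can be made concrete with the standard normal-approximation formula for comparing two means with equal group sizes, n per arm = 2(z₁₋α/₂ + z₁₋β)²σ²/δ². This is a sketch under stated assumptions: a continuous outcome (such as a pain score) with a common standard deviation σ. The article does not report the δ and σ used for the Cox-2 trial, so the inputs below are hypothetical, chosen only to show that a standardized difference of 0.4 yields roughly 100 patients per arm.

```python
import math
from statistics import NormalDist


def n_per_arm(delta: float, sd: float, alpha: float = 0.05,
              power: float = 0.80) -> int:
    """Patients per arm for a two-sided comparison of two means.

    Normal approximation, equal group sizes, common known SD:
        n = 2 * (z_{1-alpha/2} + z_{power})**2 * sd**2 / delta**2
    """
    z = NormalDist()                    # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = .05
    z_beta = z.inv_cdf(power)           # about 0.84 for 80% power
    n = 2 * (z_alpha + z_beta) ** 2 * sd ** 2 / delta ** 2
    return math.ceil(n)                 # round up to whole patients


# Hypothetical inputs: a 1-point difference on a pain scale whose SD is 2.5
# (a standardized difference of 0.4).
print(n_per_arm(delta=1.0, sd=2.5))               # 99
# Demanding more power or a stricter significance level raises n.
print(n_per_arm(delta=1.0, sd=2.5, power=0.90))
print(n_per_arm(delta=1.0, sd=2.5, alpha=0.01))
```

Because n grows with the square of σ/δ, halving the difference considered clinically important roughly quadruples the required sample size.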
Despite the importance of statistical power, few surgical studies report how their sample size was determined. In an assessment of the methods of 40 randomized controlled trials of laparoscopic procedures, 12 of which studied laparoscopic cholecystectomy, only one quarter of the articles described an adequate prospective sample size calculation.2 Inadequate sample size calculation in randomized controlled trials is not limited to the surgical literature. In a review of clinical trials published in 4 anesthesiology journals, only 20% of trials published between 1991 and 1995 reported sample size calculations.3 This proportion rose to a still dismal 45% of trials in 2000. Without these calculations, the power of a study cannot be assessed, and a failure to demonstrate a difference between groups is difficult to interpret. If the power of a study was low, failure to find a difference between treatment groups does not mean that no difference exists; it means only that a difference was not observed when the chance of detecting a true difference was low. In addition, the traditional P value threshold of .05 is an arbitrary convention that does not separate true results from false ones; a P value simply expresses a probability. A difference reported with a P value of .10 would be expected to arise by chance alone only 10% of the time if no true difference existed, a degree of uncertainty that compares favorably with the average results of appendectomy performed by experienced surgeons.
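The warning about underpowered negative trials can be illustrated by running the same normal approximation in reverse: given a per-arm sample size and a standardized difference (true difference divided by the common SD), estimate the probability of detecting that difference. This is a minimal sketch assuming a two-sided, two-sample z test with equal groups; it ignores the negligible chance of a significant result in the wrong direction, and the sample sizes and effect size are hypothetical.

```python
import math
from statistics import NormalDist


def approx_power(n_per_arm: int, std_diff: float, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided, two-sample z test.

    std_diff is the true difference divided by the common SD.
    The small chance of rejecting in the wrong direction is ignored.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic when the true difference exists.
    noncentrality = std_diff * math.sqrt(n_per_arm / 2)
    return z.cdf(noncentrality - z_alpha)


# Hypothetical standardized difference of 0.4 at several trial sizes.
for n in (10, 25, 50, 100):
    print(n, round(approx_power(n, 0.4), 2))
```

Under these assumptions, a trial with 10 patients per arm has a power of only about 0.14, so a "negative" result would be expected most of the time even if the treatment truly works; at 100 per arm the power is about 0.81.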
Another potential difficulty in conducting a randomized controlled trial is failure to accrue the predetermined number of patients within a reasonable time. One way to increase enrollment is to conduct a multicenter study. If the Cox-2 trial described above were conducted at a single institution at this accrual rate, it would take 16 years to reach the target sample size of 230 patients! In a study of Cox-2 inhibitors for rheumatoid arthritis, 132 centers worldwide were needed to accrue 655 patients.4 This averages to 5 patients per center, not so different from the University of Washington experience. However, if 89% of our patients could not be enrolled in a study of this very common procedure, one must ask whether the results of the study are relevant to most patients who require laparoscopic cholecystectomy.
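The accrual figures in the preceding paragraph can be checked with rate arithmetic. This is a minimal sketch with hypothetical helper names. It assumes the single-site rate of 7 enrolled patients is spread over a 6-month window, which reproduces the estimate of about 16 years; counting the stated window (June 15 to November 30, 2001) as 5.5 months gives about 15 years instead, so the estimate depends on how the window is rounded.

```python
def years_to_accrue(target: int, enrolled: int, window_months: float) -> float:
    """Hypothetical helper: years needed to reach `target` patients
    at the accrual rate observed over `window_months`."""
    patients_per_month = enrolled / window_months
    return target / patients_per_month / 12


def patients_per_center(total_patients: int, centers: int) -> float:
    """Hypothetical helper: average enrollment per participating center."""
    return total_patients / centers


print(round(years_to_accrue(230, 7, 6.0), 1))   # 16.4
print(round(years_to_accrue(230, 7, 5.5), 1))   # 15.1
print(round(patients_per_center(655, 132), 1))  # 5.0
```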
Multicenter participation offers more rapid recruitment at the cost of potentially greater heterogeneity in the results, requiring complex multivariate statistical analysis. International cardiology trials such as PURSUIT (Platelet glycoprotein IIb/IIIa in Unstable angina: Receptor Suppression Using Integrilin [eptifibatide] Therapy)5 and OASIS (Organisation to Assess Strategies for Ischaemic Syndromes)6 have contributed significantly to the treatment of coronary artery disease. These trials accrue thousands of patients, who may differ in genetic and environmental factors that in turn contribute to differences in cardiovascular morbidity and mortality rates. In a review of international cardiology trials, for example, subgroup analysis of geographic differences in socioeconomic conditions revealed a strong correlation between gross national product and risk of death from cardiovascular disease.7 Such regional differences should be taken into account when interpreting the results of large trials. The risks of large international studies are failure to identify subgroup differences and generation of results that may not apply to all populations.
The study population is defined by the inclusion and exclusion criteria. If the criteria are too narrow, a sufficient number of patients may not be enrolled within a reasonable time, and the approval and adoption of effective new therapies may be delayed. Results derived from a very homogeneous subject pool are more likely to demonstrate a difference attributable to the experimental intervention, but they may be biased and applicable only to that homogeneous population. Because pregnant women and patients at the extremes of age or weight are routinely excluded, study results may or may not apply to such patients. Conversely, inclusion criteria that are too broad may yield a heterogeneous subject group, making statistically significant results harder to obtain.
The role of inclusion and exclusion criteria in the success and failure of clinical investigation is perhaps best illustrated by the critical care literature. For years, the main challenge in this field has been defining sepsis and septic shock, and the inability to define these entities clearly has complicated study design. An international group convened by the United Kingdom Medical Research Council reported on several issues related to clinical trials in sepsis.8 One of its main criticisms of prior studies was the failure to enroll appropriate patients, in part because of nonspecific criteria for sepsis. The absence of a single pathogen and the variation in disease severity make it difficult both to target a single mechanism for treatment and to isolate a homogeneous patient population for study. Indiscriminate inclusion of patients may dilute the apparent therapeutic benefit of the drug, resulting in a negative trial. On the other hand, more restrictive criteria may exclude patients who could benefit from the experimental therapy. Future directions in critical care research focus on both redefining sepsis and developing bedside tests that can quickly identify patients for inclusion in, or exclusion from, sepsis studies.
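Both failure modes discussed in the preceding paragraphs, overly broad criteria and indiscriminate inclusion, inflate the number of patients a trial needs, and the size of that penalty can be estimated with the standard normal-approximation sample size formula. This sketch is illustrative only and rests on two simplifying assumptions: broad criteria act by enlarging the outcome standard deviation, and nonspecific criteria enroll a fraction of patients who cannot benefit, diluting the average treatment effect in proportion to the fraction who can. All numbers are hypothetical.

```python
import math
from statistics import NormalDist


def n_per_arm(delta: float, sd: float, alpha: float = 0.05,
              power: float = 0.80) -> int:
    """Patients per arm for a two-sided comparison of two means
    (normal approximation, equal groups, common SD)."""
    z = NormalDist()
    total_z = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return math.ceil(2 * total_z ** 2 * sd ** 2 / delta ** 2)


effect_in_responders = 1.0  # hypothetical benefit in patients who can respond
baseline_sd = 2.5           # hypothetical SD in a well-defined population

# Broad criteria (assumption): a more heterogeneous group with a larger SD.
for sd in (baseline_sd, 1.5 * baseline_sd):
    print("sd", sd, "->", n_per_arm(effect_in_responders, sd))

# Nonspecific criteria (assumption): only a fraction of enrolled patients
# can benefit, so the average observed effect shrinks to fraction * effect.
for fraction in (1.0, 0.5):
    diluted = fraction * effect_in_responders
    print("fraction", fraction, "->", n_per_arm(diluted, baseline_sd))
```

Under these assumptions, a 50% larger SD raises the requirement from 99 to 221 patients per arm, and enrolling a population in which only half can respond raises it to 393, which is one way a trial of a genuinely effective drug can end up negative.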
Another factor that defines the study population is the study setting. Physicians engaged in laboratory or clinical research tend to cluster in academic settings, such as universities. Patients at a tertiary referral center tend to have more comorbidities or an atypical form of a common disease, which is often what prompts the referral. In a study from the Veterans Affairs National Surgical Quality Improvement Program, patient populations at teaching and nonteaching hospitals were compared. Patients at teaching hospitals were found to have a higher prevalence of risk factors, to undergo more complex operations, and to have longer operation times.9 The differences in the preoperative characteristics of the general surgical patients were statistically significant for functional status, steroid use, emergent status, anemia, hypoalbuminemia, and hyperbilirubinemia.9 Patient accrual in an academic setting may therefore be more difficult because patients are more likely to fail to meet inclusion criteria. In addition, because of the skewed population, a study may show worse outcomes and fail to demonstrate superior therapeutic benefit. Thus, the results may not be generalizable to the population at large.
Another essential component of enrolling patients in a clinical trial is informed consent. Although often perceived as an additional obstacle to completing a clinical trial, informed consent should not be trivialized, because it serves to protect patients. A complete informed consent describes the purpose of the trial, its potential risks and benefits, and alternative procedures or treatments. Nonetheless, informed consent may deter patients from enrolling, because the extensive descriptions of potential adverse effects mandated by institutional review boards create a perception of risk. For example, the Cox-2 trial required the informed consent to list all potential adverse reactions to all of the medications, including commonly used, marketed, and approved nonstudy medications given to both treatment arms, which patients commonly received without special consent if they did not participate in the study. One such reaction was death secondary to liver failure from acetaminophen. This perceived risk accounted for several of the refusals to participate in the study.
Obtaining informed consent for trials involving investigational drugs can be challenging, especially in the surgical setting. Investigators should be sensitive to the heightened anxiety associated with surgical procedures and may improve enrollment by approaching patients in a calm environment. The timing of consent may also be important. Discussing the trial with the patient in the preoperative period may add to already heightened anxiety about the procedure and may decrease the likelihood of participation. Investigators should use simple, clear language to describe the potential risks and benefits of the study so as not to confuse the patient.
Multiple factors affect a patient's decision to enroll in a clinical trial, including altruism and the perception of improved care. Factors that discourage enrollment include fear of adverse effects, preconceived bias regarding one of the treatment arms, and failure to understand the details of the trial. A study by Myles et al10 describes several factors that predicted enrollment in clinical trials in the immediate preoperative period. The authors, however, found no difference in enrollment rates whether randomization was performed before or after consent was obtained, and the timing of consent also did not affect enrollment rates. Age, interactions between a male patient and a male researcher, and speaking English at home predicted recruitment. Ultimately, the researcher's ability to interact with the patient and address his or her concerns plays a significant role in recruitment.
Performing clinical trials of investigational drugs and therapies can be fraught with challenges. Despite the difficulties of designing and conducting clinical trials, however, the potential benefits to society outweigh the efforts required. High-quality clinical trials require prospective calculation of sample size, careful definition of inclusion and exclusion criteria, and sound statistical analysis. Last but not least, clinical trials depend on the good-heartedness of patients and their willingness to serve as research subjects in an endeavor that may benefit future patients but is unlikely to benefit them personally.
Corresponding author: E. Patchen Dellinger, MD, Department of Surgery, University of Washington, Box 356410, 1959 NE Pacific St, Seattle, WA 98125.
Accepted for publication August 10, 2002.