Figure 1. Sample size of comparison groups and sample size differences between comparison groups (control and test groups) in randomized controlled trials.
Adetugbo K, Williams H. How Well Are Randomized Controlled Trials Reported in the Dermatology Literature? Arch Dermatol. 2000;136(3):381-385. doi:10.1001/archderm.136.3.381
Damiano Abeni, MD, MPH; Michael Bigby, MD; Paola Pasquini, MD, MPH; Moyses Szklo, MD, MPH, DrPH; Hywel Williams, MD
Objective
To assess the methodological quality of the design and reporting of randomized controlled trials published in one major dermatology specialty journal.
Design and Data Sources
In a survey of all published parallel group randomized controlled trials, we found 73 reports with allocation described as randomized from all issues of Clinical and Experimental Dermatology from its inception in 1976 through 1997.
Main Outcome Measures
Direct and indirect measures of the adequacy of randomization, trial sample size, baseline comparisons, and intention-to-treat analysis.
Results
Hand searching identified 73 randomized controlled trials, but only 31 of these were found by searching MEDLINE for the publication type clinical trials. Of the 73 randomized controlled trials, 68 contained sufficient information to include in the analysis. Only 1 study (1%) reported the method of random sequence generation, and only 5 studies (7%) reported adequate concealment of allocation. Among the 38 trials that used simple randomization, the sample sizes in the comparison groups were identical on 22 occasions, raising the possibility that simple randomization might not have been adequately generated or concealed. Most trials (88%) excluded some randomized participants from their analysis. The median sample size was 23 per trial. Only 1 trial reported sample size and statistical power considerations and had an a priori main hypothesis.
Conclusions
Hand searching is important for locating all relevant trials, and there is a need for higher methodological quality in the reporting of clinical trials in dermatology journals. The adoption of the CONSORT (Consolidated Standards of Reporting Trials) statement and checklist for the reporting of trials should enhance the validity of, and strengthen the evidence from, clinical trial reports.
WELL-DESIGNED clinical trials yield strong evidence for the effects of health care interventions. The randomized controlled trial (RCT) is the criterion standard for clinical trials and offers the most robust and rigorous way of determining the causal relationship between treatment and effect and of assessing the cost-effectiveness of a treatment.1-7 Random assignment greatly reduces the potential for bias in the allocation of interventions. It ensures that comparison groups are as similar as possible to each other in terms of both known and unknown possible predictors of treatment response,6 and RCTs have been consistently shown to give more reliable evidence of therapeutic effectiveness than nonrandomized studies.3 Even so, there are some limitations and skepticism regarding the use and value of randomized trials.7-10
Essential features of the RCT include (1) allocation of participants to intervention groups in a truly unpredictable, randomized sequence (including the concealment of the allocation schedule until the assignment is made); (2) blinding or masking of participants and/or observers as to what treatment an individual is receiving; (3) use of an intention-to-treat analysis, the principle of which is to include in the analysis all randomized participants in the group into which they were allocated, whether or not they received or completed the intervention, so as to preserve random allocation and thus the comparability of the study groups; and (4) use of a prospectively stated and explicitly defined trial hypothesis.11 All these features minimize bias in the selection of participants, allocation of intervention, and analysis of trial outcomes.
However, to maximize usefulness, the RCT should be accessible and assessable. Accessibility allows RCTs to be combined with similar trials in a systematic review or in a meta-analysis, often regarded as the highest level of evidence for the effects of health care.3,12 For such pooling, the quality of the design and execution of the RCT needs to be evaluated, especially as the quality of the research evidence from meta-analyses depends on the quality of the trials that form them.13 Complete reporting aids this evaluation. It has been shown that the low quality of randomization in RCTs is associated with bias and that poor evidence of randomization (such as an inadequate or unclear method of treatment allocation) significantly exaggerates estimates of treatment effects by 30% to 41%.14 Therefore, the evidence from such studies is that much less accurate. Consequently, the patients and the scientific community can derive maximum benefit from the various contributions to trials only if they are rigorously designed and executed and completely and unambiguously reported. This consideration has led to the formulation of a standard for the reporting of RCTs (Table 1), embodied in the CONSORT (Consolidated Standards of Reporting Trials) statement of 1996.15
The Cochrane Skin Group16 was recently formed as part of the evidence-based health care movement. It prepares, maintains, and disseminates systematic reviews of the effects of skin treatments and maintains a specialized register of skin trials.17 The aim of its work is to facilitate the development and practice of evidence-based dermatology. Important for this development are high-quality skin trials and the clear and complete reporting of their information,13 as all skin trials (published or not) may be potentially included in the systematic reviews and meta-analyses. As part of the group's work of assembling and assessing all the evidence for the effectiveness of skin treatments, we identified, by hand searching,18 all the clinical trial reports in Clinical and Experimental Dermatology, a major source of reports of skin trials. Earlier research suggested that the methodological quality of skin trials may be limited,19,20 and we wanted to see the overall quality of the reporting of these trials. We assessed the methodological and reporting qualities of each of the trial reports according to the dimensions of the main features of RCTs.
We hand searched18 all issues of Clinical and Experimental Dermatology from its inception in 1976 through 1997 (volumes 1-22) as part of the systematic effort of the Cochrane Collaboration to identify all published trials. Each journal article, review, letter, and meeting abstract was examined to determine if it was a randomized controlled trial or controlled clinical trial (CCT) publication type as defined by the National Library of Medicine.18,21 We then downloaded the MEDLINE record for each trial (RCT or CCT) identified by hand searching into a bibliographic database (ProCite, version 3.1; Research Information Systems, Carlsbad, Calif), and we recorded its publication type according to MEDLINE.
In this study, a clinical trial involved at least 1 test treatment and 1 control treatment and concurrent enrollment and follow-up of the treatment groups. A trial was classified as an RCT if there was an explicit statement that the comparison groups were established by random allocation,18,21 such as by a table of random numbers. Trials that allocated treatment using coin flips; odd, even, patient Social Security, or medical record numbers; days of the week; or other pseudorandom or quasi-random processes and other trials for which the authors did not explicitly state that random allocation was made were classified as CCTs.18,21
We assessed each of the RCT reports for the following: the reporting of the methods of random sequence generation, concealment of allocation schedule, adequacy of sample size, use of masking, and intention-to-treat analysis (or inclusion of all randomized participants in the trial analysis), and comparability of appropriate baseline measures for comparison groups.22
(It is desirable that the comparison groups in a randomized trial be as similar as possible in factors or characteristics that might influence the participants' response to the intervention. Stratified randomization is used to ensure that equal numbers of participants with a characteristic thought to affect prognosis or response to the intervention are allocated to each comparison group. For example, in a trial of women with psoriasis, it may be important to have similar numbers of smokers and nonsmokers in each comparison group, and stratified randomization could be used to achieve this balance. Stratified randomization is carried out either by performing separate randomization [often using random permuted blocks] for each stratum or by using minimization. Random permuted blocks [or block randomization] ensures that at any point in a trial roughly equal numbers of participants are allocated to the comparison groups. Permuted blocks are often used in combination with stratified randomization. With small trials, simple randomization often produces unbalanced groups. Minimization is a method of allocation used, particularly in small trials, to ensure that comparison groups are closely similar for several variables [that may be important prognostic factors]. It can be done with or without a component of randomization, and it is best performed centrally with a computer program to ensure allocation concealment.)
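The permuted-block scheme described above can be sketched in a few lines of Python. This is a hypothetical illustration of the technique, not code drawn from any of the trials reviewed; the function name and parameters are inventions for the example.

```python
import random

def permuted_block_sequence(n_blocks, block_size=4, arms=("A", "B"), seed=None):
    """Generate a treatment-allocation sequence using random permuted blocks.

    Each block contains an equal number of assignments to each arm, so the
    running group sizes never drift apart by more than half a block.
    """
    rng = random.Random(seed)
    per_arm = block_size // len(arms)
    sequence = []
    for _ in range(n_blocks):
        block = list(arms) * per_arm  # balanced block, e.g. A, B, A, B
        rng.shuffle(block)            # random order within the block
        sequence.extend(block)
    return sequence

# Example: 5 blocks of 4 yield 20 allocations, exactly 10 per arm.
seq = permuted_block_sequence(5, seed=42)
print(seq.count("A"), seq.count("B"))  # 10 10
```

Note that in practice the block size (and its variation) must itself be concealed from investigators, or the final assignments in each block become predictable.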
Acceptable methods of random sequence generation were the following: computerized random number generator, table of random numbers, and minimization. It is important for the reduction of bias that the allocation schedule is concealed up to the point of the allocation of treatment, and we accepted the following approaches to allocation concealment as adequate: central randomization; central dispensation of intervention at a pharmacy; numbered or coded containers; and sequentially numbered, opaque, and sealed envelopes.22 We recorded the trial size and the sample size of each comparison group. We looked for a prior definition of an acceptable difference in outcome measures or a hypothesis to be tested and the calculation of sample sizes around this definition. Also, we recorded whether all randomized participants were included in the analysis of the trial outcomes (intention-to-treat analysis). We looked for an account of the mechanism to ensure that neither trialist nor trial participant was able to identify the intervention being provided or that comparison treatments could not be distinguished by taste or appearance. Moreover, we also looked in the reports for baseline comparisons of appropriate variables for the intervention groups using measures of SD, range, raw data, or χ2 tests for nominally scaled variables.
The data were extracted onto forms, and thereafter, the assessments were entered into a worksheet (Excel; Microsoft Corp, Redmond, Wash) and analyzed using simple descriptive statistics.
We found, by hand searching, 96 reports of clinical trials. There were 73 RCTs with 5629 participants and 22 CCTs with 1412 participants. One report was an overview that combined several separate clinical trials. All the trial reports came from 88 full-length journal articles, 2 letters, and 6 meeting abstracts.
Records of 88 of these 96 studies were recoverable from MEDLINE: there were no records for 4 RCTs and 4 CCTs (1 letter and 3 abstracts in each category). Conversely, only 59 of the 96 reports identified by hand searching as CCTs or RCTs were recorded as clinical trials according to MEDLINE (31 RCTs, 13 CCTs, and 15 clinical trials) (Table 2).
Of the 73 RCTs, 68 were reviewed; the 3 abstracts and 1 letter did not contain enough information for assessment, and 1 other publication also was eliminated from this assessment because of inadequate information. The results of the quality assessment are based on 68 RCT reports and are summarized in Table 3.
All the studies randomly allocated trial participants and units to comparison groups, but only 1 of the studies reported a method for random sequence generation (use of a table of random numbers). In addition, 5 studies reported using stratified randomization or blocking. Of the 68 studies, 5 (7%) reported an adequate method of concealment of allocation.
Of 38 RCT reports of unblocked parallel studies in which the unit of randomization was the individual, 22 (58%) had identical numbers of participants in each comparison group (Figure 1). These 22 included reports of 6 trials with 3 or 4 comparison groups. In another 5 studies, the sizes of the comparison groups did not differ from those achievable through simple alternation. Thus, only 11 (29%) of the 38 presumably simply randomized trials reported a difference of greater than 1 individual in the sizes of the comparison groups.
Sample sizes were generally small, with a median of 23 randomized units per intervention group (interquartile range, 15-40 units) (Figure 1). Only 1 report included sample size and power considerations. Thirty-two studies (47%) were reported as double masked, while 26 (38%) did not report any masking (and were presumably not masked).
Only 4 (6%) of the 68 reports specifically mentioned that they did not exclude randomized participants from the analysis of the study result (intention to treat). In another 4, there were apparently no dropouts, as all were accounted for in the analysis of trial results. In 60 reports, the number of participants analyzed was less than those randomized. Thus, 60 (88%) of the 68 reports excluded (or appeared to have excluded) some randomized participants from the analysis of trial results.
Validity is the degree to which a result is likely to be free of bias, and so, true. Internal validity expresses the extent to which the observed effects are true for the study sample, while external validity (or generalizability) is the extent to which the observed results reflect what is true for the general population. The reporting of RCTs should allow an assessment of their internal and external validity, especially as these trials contribute strong primary evidence for the effects of health care.
There is now consensus on the formats for this reporting15 in the face of empirical evidence that poorly and incompletely reported trials23 are associated with greater bias and an exaggeration of treatment effects,14 and there are continuing efforts to improve the quality of reporting of RCTs.15,24
Our results showed that the reporting of RCTs in Clinical and Experimental Dermatology was generally incomplete and could be improved. Almost none of the trial reports described the methods of generating random sequences or of concealing (protecting) random allocation schedules. Sample sizes were generally small, making the trials prone to type II errors (falsely claiming no treatment effect when a clinically useful effect might still exist),20 and the studies did not report sample size and power calculations based on a main a priori hypothesis. Almost two fifths of the studies did not report masking and were presumably not adequately masked. Almost 90% excluded (or appeared to have excluded) some randomized participants from their analyses. These problems suggest that many of these studies may have exaggerated treatment effects.14 Our results agree with other surveys19,20,22,23,25,26 that described inadequate execution and/or reporting of RCTs.
The issues highlighted may represent incomplete reporting, methodological and design flaws, or both. In one research report,23 in which clarification from the authors was sought, the results suggested that inadequate reporting was a reflection of inadequate methods. Thus, deficient reporting of data in RCTs can be used as proxy for their defective design and execution.
The importance of the concealment or protection of the allocation schedule14,27 in RCTs is probably not as widely appreciated as that of generating a random sequence for allocation to comparison groups. Yet, it is an integral part of the randomization process, and the bias introduced from failure to conceal the allocation schedule can be more serious than that from defective random sequence generation.27 Therefore, for complete reporting of the randomization process, it is desirable to specify the steps taken to protect the allocation code.
The close similarity between the sizes of intervention groups within trials that used simple randomization (Figure 1) was greater than that attributable to chance alone. Based on the expected distribution of differences in sample size for a given total trial size, none of the 38 trials fell outside the boundary beyond which we would expect 50% of trials to fall.28 This observation suggests that in some of the randomized trials allocation to groups may not have been carried out strictly following a random, unpredictable sequence, as previously described and discussed,22 and it may be that trialists erroneously thought that a more uniform distribution of group sizes conferred greater credibility on their trials.
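The chance argument can be checked with a short simulation. The Python sketch below is illustrative only: it assumes a two-arm trial with coin-flip (simple) randomization, and the total of 46 participants is chosen to reflect the survey's median of 23 per group.

```python
import random

def prob_equal_groups(total_n, n_sim=100_000, seed=0):
    """Estimate, by simulation, the probability that simple (coin-flip)
    randomization of total_n participants yields two identical-size groups."""
    rng = random.Random(seed)
    equal = 0
    for _ in range(n_sim):
        group_a = sum(rng.random() < 0.5 for _ in range(total_n))
        if group_a * 2 == total_n:  # both groups the same size
            equal += 1
    return equal / n_sim

# For a trial of 46 participants, identical groups arise only about 12% of
# the time under simple randomization, yet 58% of the surveyed trials
# reported identical group sizes.
print(round(prob_equal_groups(46), 2))
```

The probability shrinks further as trials grow, which is why a run of identically sized groups across many trials is hard to reconcile with genuinely simple randomization.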
Moreover, the comparison group sample sizes were generally small, such that most trials had enough power (ie, the ability to detect a difference in treatment outcomes, if a real one exists) only to detect more than a doubling of relative treatment effects for categorical outcomes.29 The methodological flaw of small sample sizes and insufficient statistical power is not peculiar to skin trials. Reviewing 71 "negative" RCTs, Freiman et al30 reported that, because of small sample sizes, 67 could have missed a treatment difference of 25% and 50 could have missed a difference of 50%. Trialists do not seem responsive enough to the requirement that studies have sufficient power to detect, at a given level of statistical significance, a treatment effect of an a priori specified size. Formulating a clear trial hypothesis necessarily leads to the determination of the sample size needed to minimize both type I errors (falsely claiming a treatment effect when there is none) and type II errors.
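The scale of the problem is easy to make concrete with the standard normal-approximation formula for comparing two proportions. The Python sketch below is a generic power calculation, not taken from the paper; the 40%-vs-60% response rates are illustrative effect sizes chosen for the example.

```python
from math import ceil, sqrt
from statistics import NormalDist

def sample_size_two_proportions(p1, p2, alpha=0.05, power=0.80):
    """Per-group sample size for a two-arm trial comparing proportions,
    using the usual normal-approximation formula (illustrative sketch)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided type I error
    z_beta = NormalDist().inv_cdf(power)           # type II error / power
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p1 - p2) ** 2)

# Detecting an improvement from a 40% to a 60% response rate
# (alpha = .05 two-sided, power = .80) requires about 97 participants
# per group, far more than the median of 23 observed in this survey.
print(sample_size_two_proportions(0.40, 0.60))
```

Even this fairly large absolute difference demands roughly four times the median group size observed here; smaller, more clinically typical differences demand far more.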
The problems of quality of RCTs highlighted above have been equally documented for other specialties and journals19,20,22,23,25,26 and underscore the importance of standards of reporting across medical disciplines. Again, the quality of reporting of RCTs is taken as a proxy for the quality of the trials themselves. It may be argued that the general acceptance of a reporting standard may merely improve reporting without enhancing the design and execution of trials, the critical steps in obtaining high-quality evidence. This need not be so. The greater awareness of and adherence to a reporting standard (eg, CONSORT) can increase awareness of best RCT practice and thus promote better design and execution of trials. Consequently, some journals (eg, Lancet), to encourage good principles in the design and execution of RCTs, accept the protocols of trials and meta-analyses for assessment and for clinical and statistical review31 and, if they meet the basic criteria, guarantee the publication of the results of the completed trial or meta-analysis. This innovation promotes good clinical trials practice and reporting. To encourage the delivery of high-quality evidence, dermatology journals should consider putting this scheme in place and adopting the CONSORT statement.
In this study, hand searching improved the yield of trials by 60% compared with electronic searching, validating the efforts that the Cochrane Collaboration is expending on hand searching. It is the only sure way of finding all published trials and, consequently, making all the published evidence available for a systematic review or meta-analysis. One recommendation of the CONSORT statement is that reports of RCTs include some form of the word randomized in the title (only 1 of 68 did in this survey). The universal adoption of this style by dermatology journals will considerably improve the electronic searching yield of studies and obviate the need for prospective hand searching. Use of the approaches described herein will facilitate the emergence of high-quality evidence in systematic reviews and meta-analyses summarizing the effects of health care interventions.
Accepted for publication December 19, 1999.
This study was supported in part by the United Kingdom National Health Service Research and Development Programme, London, England.
A cooperative effort of the Clinical Epidemiology Unit of the Istituto Dermopatico dell'Immacolata–Istituto di Ricovero e Cura a Carattere Scientifico (IDI-IRCCS) and the Archives of Dermatology
Corresponding author: Kayode Adetugbo, PhD, 15 Dolphin Rd, Northolt, UB5 6UQ, Middlesex, England (e-mail: firstname.lastname@example.org).