Context Although it is widely recommended that clinical trials undergo some
type of quality review, the number and variety of quality assessment scales
that exist make it unclear how to achieve the best assessment.
Objective To determine whether the type of quality assessment scale used affects
the conclusions of meta-analytic studies.
Design and Setting Meta-analysis of 17 trials comparing low-molecular-weight heparin (LMWH)
with standard heparin for prevention of postoperative thrombosis using 25
different scales to identify high-quality trials. The association between
treatment effect and summary scores and the association with 3 key domains
(concealment of treatment allocation, blinding of outcome assessment, and
handling of withdrawals) were examined in regression models.
Main Outcome Measure Pooled relative risks of deep vein thrombosis with LMWH vs standard
heparin in high-quality vs low-quality trials as determined by 25 quality
scales.
Results Pooled relative risks from high-quality trials ranged from 0.63 (95%
confidence interval [CI], 0.44-0.90) to 0.90 (95% CI, 0.67-1.21) vs 0.52 (95%
CI, 0.24-1.09) to 1.13 (95% CI, 0.70-1.82) for low-quality trials. For 6 scales,
relative risks of high-quality trials were close to unity, indicating that
LMWH was not significantly superior to standard heparin, whereas low-quality
trials showed better protection with LMWH (P<.05).
Seven scales showed the opposite: high-quality trials showed an effect whereas
low-quality trials did not. For the remaining 12 scales, effect estimates
were similar in the 2 quality strata. In regression analysis, summary quality
scores were not significantly associated with treatment effects. There was
no significant association of treatment effects with allocation concealment
and handling of withdrawals. Open outcome assessment, however, influenced
effect size, with the effect of LMWH, on average, being exaggerated by 35%
(95% CI, 1%-57%; P=.046).
Conclusions Our data indicate that the use of summary scores to identify trials
of high quality is problematic. Relevant methodological aspects should be
assessed individually and their influence on effect sizes explored.
Although randomized controlled trials provide the best evidence of the
efficacy of medical interventions, they are not immune to bias. Studies relating
methodological features of trials to their results have shown that trial quality
influences effect sizes. For populations of trials examining treatments in
myocardial infarction,1 perinatal medicine,2 and various disease areas,3
it has consistently been shown that inadequate concealment of treatment allocation,
resulting, for example, from the use of open random-number tables, is associated
on average with larger treatment effects. One of these studies2
also found larger average effect sizes if trials were not double-blind. Analyses
of individual trials suggest that in some instances effect sizes are also
overestimated if some participants, for example, those not adhering to study
medications, were excluded from the analysis.4-6
Informal qualitative research has indicated that investigators sometimes undermine
the random allocation of study participants, for example, by opening assignment
envelopes or holding translucent envelopes up to a light bulb.7
In response to this situation, guidelines on the conduct and reporting of
clinical trials and scales to measure the quality of published trials have
been developed.8,9
The quality of trials is of obvious relevance to meta-analysis. If the
raw material used is flawed, then the conclusions of meta-analytic studies
will be equally invalid. Following the recommendations of the Cochrane Collaboration
and other experts in the field,10-12
many meta-analysts assess the quality of trials and exclude trials of low
methodological quality in sensitivity analyses. In a meta-analysis of trials
comparing low-molecular-weight heparin (LMWH) with standard heparin for thromboprophylaxis
in general surgery, Nurmohamed et al13 found
a significant reduction of 21% in the risk of deep vein thrombosis (DVT) with
LMWH (P=.012). However, when the analysis was limited
to trials with strong methods, as assessed by a scale consisting of 8 criteria,
no significant difference between the 2 heparins remained (relative risk [RR]
reduction, 9%; P=.38). The authors therefore concluded
that "there is at present no convincing evidence that in general surgery patients
LMWHs, compared with standard heparin, generate a clinically important improvement
in the benefit to risk ratio."13 In contrast,
another group of meta-analysts did not consider the quality of trials and
concluded that "LMWHs seem to have a higher benefit to risk ratio than unfractionated
heparin in preventing perioperative thrombosis."14
Although quality assessment is widely recommended, how best to assess and incorporate
the quality of clinical trials remains a matter of ongoing debate.15
This is reflected by the plethora of available instruments. In a search covering
the years up to 1993, Moher et al9 identified
25 different quality assessment scales. Most of these scoring systems lack
a focused theoretical basis and their objectives are unclear. The scales differ
considerably in terms of dimensions covered, size, and complexity, and the
weight assigned to the key domains most relevant to the control of bias (randomization,
blinding, and withdrawals)16 varies widely
(Table 1).
We repeated the meta-analysis of Nurmohamed et al13
using different scales and thus examined whether the type of scale used for
assessing the quality of trials affects the conclusions of meta-analytic studies.
Scales Used to Assess Trial Quality
We used the 25 scales described by Moher et al.9
When necessary, we adapted items that were developed for specific situations.
For example, 1 item in a scale developed to assess trials of corticosteroids
in alcoholic hepatitis28 considered the similarity
of the prognostic variables total bilirubin, prothrombin time, and hepatic
encephalopathy at baseline. We included this item, but considered the variables
that Nurmohamed et al13 identified as of prognostic
importance (for example, the sex of patients, the duration of operation, and
the presence of malignancies). If instructions for use of a quality assessment
instrument were unclear, we obtained additional published information or contacted
the authors. Ambiguities remained for some scales and we developed our own
a priori rules to deal with these situations.
All 17 general surgery trials comparing LMWH with standard heparin that
were included in the original meta-analysis were assessed with each of the
25 scales. To maintain comparability with the original meta-analysis,13 we included a single-center report40
from a multicenter study,41 which had erroneously
been included in addition to the main report.42
Information on authors, author affiliation, study centers, drugs, results,
conclusions, source, year of publication, references, funding, and acknowledgments
was concealed by a person unrelated to the study, using a black marker and
subsequent photocopying. A few items required unblinding of some of this information
(in most cases the results section). Reports were independently
assessed by 2 of the authors (P.J., A.W.). One assessor rated trials in the
opposite order, also reversing the sequence of scales. Interobserver reliability
was determined for each scale using the intraclass correlation coefficient.43 Disagreements were resolved by consensus.
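Which form of the intraclass correlation coefficient was used is not detailed here; as an illustrative sketch only, the following computes the two-way random-effects coefficient ICC(2,1) of Shrout and Fleiss43 for a hypothetical trials-by-raters matrix of scores:

```python
import numpy as np

def icc_2_1(x: np.ndarray) -> float:
    """Two-way random-effects intraclass correlation, ICC(2,1), of
    Shrout and Fleiss, for an (n trials x k raters) score matrix."""
    n, k = x.shape
    grand = x.mean()
    row_means = x.mean(axis=1)  # one mean per trial
    col_means = x.mean(axis=0)  # one mean per rater
    # Mean squares from the two-way ANOVA decomposition
    ms_rows = k * np.sum((row_means - grand) ** 2) / (n - 1)
    ms_cols = n * np.sum((col_means - grand) ** 2) / (k - 1)
    resid = x - row_means[:, None] - col_means[None, :] + grand
    ms_err = np.sum(resid ** 2) / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )

# Hypothetical scores (% of maximum) from 2 raters on 5 trials
scores = np.array([[62.0, 58.0], [45.0, 47.0], [80.0, 76.0],
                   [55.0, 60.0], [70.0, 68.0]])
print(f"ICC(2,1) = {icc_2_1(scores):.2f}")
```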
We repeated the original meta-analysis using each of the 25 quality
assessment scales, keeping all other aspects constant. The same fixed-effects
model was used for combining trials, with data being provided by the authors
of the original meta-analysis.13 Effect estimates
were weighted according to the inverse of their variance. The primary end
point was DVT; major bleeding was the secondary end point. We performed
stratified analyses dividing trials into high-quality and low-quality groups,
using the definitions given by the authors of the scales. If no definitions
were available from authors, we considered trials with scores above the median
as high quality. We also performed analyses using the median as the cutoff
point for all scales. To determine whether the restriction of scales to domains
most relevant to the control of bias (randomization, blinding, and withdrawals)16 affects results, we deleted all items that were unrelated
to these domains, recalculated scores, and repeated stratified analyses using
the median as the cutoff point.
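As a minimal sketch of the pooling step, assuming per-trial relative risks and standard errors on the log scale (the values below are hypothetical, not data from the 17 trials):

```python
import numpy as np

def pool_fixed(rr: np.ndarray, se_log_rr: np.ndarray):
    """Inverse-variance fixed-effects pooling of relative risks,
    performed on the log scale."""
    log_rr = np.log(rr)
    w = 1.0 / se_log_rr ** 2               # weight = inverse of variance
    pooled = np.sum(w * log_rr) / np.sum(w)
    se = 1.0 / np.sqrt(np.sum(w))          # SE of the pooled log RR
    lo, hi = np.exp(pooled + np.array([-1.96, 1.96]) * se)
    return np.exp(pooled), lo, hi

# Hypothetical per-trial relative risks and standard errors of log RR
rr = np.array([0.55, 0.80, 1.05, 0.70])
se = np.array([0.30, 0.25, 0.40, 0.20])
est, lo, hi = pool_fixed(rr, se)
print(f"Pooled RR = {est:.2f} (95% CI, {lo:.2f}-{hi:.2f})")
```

The stratified analyses then apply the same pooling separately to the trials scored as high quality and as low quality.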
Meta-regression analyses were performed to examine the association of
global quality scores with estimated effects on the risk of DVT. The random-effects
regression model, described in detail elsewhere,44
relates the treatment effect to study quality, assuming a normal distribution
for the residual errors with both a within-study and an additive between-studies
component of variance. The between-studies variance was estimated by an iterative
procedure, using an estimate that is based on a restricted maximum likelihood
method.
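In symbols (our notation; a simplified statement of this model), with y_i the observed log relative risk of trial i, x_i its quality score, and v_i its within-study variance:

```latex
y_i = \alpha + \beta x_i + \varepsilon_i,
\qquad \varepsilon_i \sim N\!\left(0,\; v_i + \tau^2\right)
```

where the between-studies variance τ² is the quantity estimated by the iterative restricted maximum likelihood procedure.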
We standardized scores by subtracting the median from individual values
and dividing the result by the interquartile range. All scores thus had a standardized
distribution with a median of 0 and an interquartile range of 1, making regression
coefficients comparable across scales. For each scale we calculated the expected
RR for hypothetical trials with the highest and lowest possible score. As
a measure of overall agreement between the 25 scales, we calculated the intraclass
correlation coefficient for the standardized scores.43
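A minimal sketch of this standardization, using hypothetical summary scores:

```python
import numpy as np

def standardize(scores: np.ndarray) -> np.ndarray:
    """Center at the median and scale by the interquartile range (IQR),
    giving every scale a median of 0 and an IQR of 1."""
    q1, med, q3 = np.percentile(scores, [25, 50, 75])
    return (scores - med) / (q3 - q1)

raw = np.array([12.0, 15.0, 9.0, 20.0, 17.0, 11.0])  # hypothetical scores
print(standardize(raw))
```

The expected RR for a hypothetical trial with standardized score x is then exp(α + βx) from the fitted regression.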
In addition, we examined the separate influence of the 3 key domains that have
been shown to be associated empirically with bias: concealment of treatment
allocation,1-3
blinding of outcome assessments,2,45
and handling of dropouts and withdrawals in the analysis.4-6
Finally, we performed sensitivity analyses using a random-effects model46 for combining trials, inspected funnel plots, and
tested for the presence of publication bias.47
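As a sketch of the random-effects step under the DerSimonian-Laird46 approach (our simplified implementation, not the authors' software; inputs are per-trial log RRs and their variances):

```python
import numpy as np

def pool_dersimonian_laird(log_rr: np.ndarray, var: np.ndarray):
    """Random-effects pooling with the DerSimonian-Laird moment
    estimator of the between-trial variance tau^2."""
    w = 1.0 / var                           # fixed-effects weights
    fixed = np.sum(w * log_rr) / np.sum(w)
    q = np.sum(w * (log_rr - fixed) ** 2)   # Cochran's Q statistic
    k = len(log_rr)
    tau2 = max(0.0, (q - (k - 1)) / (w.sum() - (w ** 2).sum() / w.sum()))
    w_star = 1.0 / (var + tau2)             # random-effects weights
    pooled = np.sum(w_star * log_rr) / np.sum(w_star)
    se = 1.0 / np.sqrt(np.sum(w_star))
    return np.exp(pooled), np.exp(pooled - 1.96 * se), np.exp(pooled + 1.96 * se)
```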
We used Meta-Analyst software (Joseph Lau, Boston, Mass) for fixed-effects
meta-analysis and the program Metareg48 in
Stata (Stata Corporation, College Station, Tex) for meta-regression analysis.
Results are given as RRs with 95% confidence intervals (CIs). All P values are 2-sided.
Interrater reliability was excellent for most scales. Intraclass correlation
coefficients were above 0.9 for 12 scales (48%), 0.8 to 0.9 for 10 scales
(40%), and less than 0.8 for 3 scales (12%). The median quality of the 17
trials as assessed by the 25 scales ranged from 38.5% to 82.9% of the maximum
score (Table 2). The authors of
16 scales defined a threshold for high quality, with the median threshold
corresponding to 60% of the maximum score. Agreement for standardized scores
between the 25 scales was substantial (intraclass correlation coefficient,
0.72 [95% CI, 0.59-0.86]).
Analyses Stratified by Trial Quality
For all trials combined, the RR of DVT comparing LMWH with standard
heparin was 0.79 (95% CI, 0.65-0.95) and thus identical to the results of
the original analysis.13 Figure 1 shows the results of analyses stratified by quality using
the 25 scales. Pooled RRs ranged from 0.63 to 0.90 for high-quality trials,
and from 0.52 to 1.13 for low-quality trials. For 6 scales, pooled RRs of
high-quality trials were greater than 0.79, with CIs overlapping 1, indicating
that LMWH was not significantly superior to standard heparin, whereas low-quality
trials assessed by these scales showed significantly better protection with LMWH
(P<.05). Seven scales showed the opposite: high-quality
trials indicated that LMWH was beneficial (P<.05)
with RRs of less than 0.79, whereas low-quality trials showed no significant
difference. For the remaining 12 scales, pooled results from low-quality and
high-quality strata indicated similar effects. Results were not materially
altered when using the median score as the cutoff point for high-quality trials
throughout. In meta-regression no significant difference in effect estimates
between high-quality and low-quality trials was evident for any of the scales
used (P>.10). Significant differences in the risk
of major bleeding were not observed between the 2 heparins overall or when
stratified by quality.
Summary Quality Scores and Effect Sizes
Meta-regression analysis confirmed the differences between scales that
were observed in the stratified analysis. The coefficients (per point increase
of standardized scores) ranged from −0.177 to 0.169, demonstrating that
depending on the scale used, the effect size either increased or decreased
with increasing trial quality. Figure 2
illustrates the relationship between RRs and quality scores for 3 scales.17,29,31 None of the 25 scales
yielded a statistically significant association between summary scores and
effect sizes. For hypothetical trials of maximum quality the RR of DVT comparing
LMWH with standard heparin ranged from 0.57 to 0.95, whereas for hypothetical
trials of minimum quality RRs ranged from 0.51 to 1.32. Heterogeneity between
scales was reduced little when restricting scales to the domains most relevant
to the control of bias (randomization, blinding, and withdrawals): RRs for
DVT ranged from 0.63 to 0.98 for high-quality trials and from 0.68 to 0.89
for low-quality trials.
Key Domains and Effect Sizes
The association between methodological key domains and estimates of
treatment effects on the risk of DVT is explored in Table 3. There was no significant association with allocation concealment
and handling of dropouts and withdrawals. Trials with open assessment of the
outcome, however, overestimated treatment effects by 35% (95% CI, 1%-57%; P=.046). This association remained when all 3 key domains
were included in a multivariate analysis (P=.030).
Meta-analysis of the 6 trials with open assessment of DVT suggested that LMWH
was superior to standard heparin with an RR of 0.59 (95% CI, 0.42-0.84; P=.004). Conversely, the 11 trials with blinded outcome
assessment showed no significant difference (RR, 0.89; 95% CI, 0.71-1.11; P=.29).
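The 35% estimate comes from the regression model, but a rough check against the stratified estimates just given points the same way:

```latex
\frac{\mathrm{RR}_{\text{open}}}{\mathrm{RR}_{\text{blinded}}}
  = \frac{0.59}{0.89} \approx 0.66
```

that is, trials with open outcome assessment yielded a pooled RR about 34% lower (further from unity) than trials with blinded assessment, consistent with the model-based 35% (95% CI, 1%-57%).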
Meta-analysis is widely used to summarize the evidence on the benefits
and risks of medical interventions. However, the findings of several meta-analyses
of small trials have subsequently been contradicted by large controlled trials.47,49-51 The
fallibility of meta-analysis is not surprising, considering the various biases
that may be introduced by the process of locating and selecting studies, including
publication bias,52 language bias,53 and citation bias.54
Low methodological quality of component studies is another potential source
of systematic error. The critical appraisal of trial quality is therefore
widely recommended and a large number of different instruments are currently
in use. In a hand search of 5 general medicine journals covering 1993 to 1997
(Annals of Internal Medicine, BMJ, JAMA, Lancet, and New England Journal of Medicine) we identified 37 meta-analyses using
26 different instruments to assess trial quality.
Our study shows that the type of scale used to assess trial quality
can dramatically influence the interpretation of meta-analytic studies. Using
25 different scales, we reanalyzed a meta-analysis which, based on trials
considered to be of high methodological quality, found little difference between
LMWH and standard heparin in the prevention of postoperative thrombosis. Whereas
for some scales these findings were confirmed, the use of others would have
led to opposite conclusions, indicating that the beneficial effect of LMWH
was particularly robust for trials deemed to be of high quality. Similarly,
in meta-regression analysis effect size was negatively associated with some
quality scores, but positively associated with others. Accordingly, RRs estimated
for hypothetical trials of maximum or minimum quality varied widely between
scales.
These discrepant results are not surprising when considering the heterogeneous
nature of the instruments.15 Many scales include
items that are more closely related to reporting quality, ethical issues,
or to the interpretation of results rather than to the internal validity of
trials. For example, some scales assessed whether the rationale for conducting
the trial was clearly stated, whether the trialists' conclusions were compatible
with the results obtained, or whether the report stated that participants
provided written informed consent. Important differences also exist between
scales that focus on internal validity. For example, the scale developed by
Jadad et al,29 which has been widely advocated,3,9,15 gives more weight to
the quality of reporting than to actual methodological quality. A statement
on withdrawals and dropouts earns the point allocated to this domain,
regardless of whether the data were analyzed according to the intention-to-treat
principle. The instrument addresses randomization but does not assess allocation
concealment. The use of an open random-number table would thus be considered
equivalent to concealed randomization using a telephone or computer system
and earn the maximum points allotted for randomization. Conversely, the scale
developed by Chalmers et al20 allocates 0 points
for unconcealed but the maximum of 3 points for concealed randomization. The
authors of the different scales clearly had different perceptions of trial
quality, but definitions were rarely given, and the ability of the scales
to measure what they are supposed to measure remains unclear.
Our study was based on a single meta-analysis and, strictly speaking,
the results are only applicable to the 17 trials examined. It is unlikely,
however, that agreement across scales would be better in other situations.
Interestingly, in a recent review of treatment effects from trials deemed
to be of high or low quality, Kunz and Oxman55
found that in some meta-analyses there were no differences whereas in other
meta-analyses high-quality trials showed either larger or smaller effects.
In 1 analysis evaluating the effect of antiestrogen treatment in male infertility,
the results were reversed, with high-quality studies showing adverse effects
on pregnancy rates.56 Different scales had been
used for assessing quality and, in light of our study, it is possible that
the choice of the scale contributed to the discrepant associations observed
in these meta-analyses.
In our sample of trials we found that blinding of outcome assessment
was the only factor significantly associated with effect size, with RRs on
average being exaggerated by 35% if outcome assessment was open. When restricting
the analysis to 11 trials with blinded outcome assessment, no significant
difference between the 2 heparins was evident, indicating that in general surgery
patients, the 2 heparins may be equally effective. This was recently confirmed
in an updated meta-analysis that included 25 double-blind trials.57 The combined odds ratio for DVT was 0.99 (95% CI,
0.83-1.18). It is now generally agreed that the advantages of LMWH (reduced
risk of heparin-induced thrombocytopenia58
and convenience of once daily dosing) must be balanced against the greater
cost of LMWH.59
The importance of blinding could have been anticipated considering that
the interpretation of the test (fibrinogen leg scanning) used to detect DVT
can be subjective.60 In other situations, blinding
of outcome assessment may be irrelevant, such as when examining the effect
of an intervention on overall mortality. In contrast to studies including
large numbers of trials,1-3
we did not find a significant association of concealment of treatment allocation
with effect estimates. Our meta-analysis could have been too small to show
this effect. Alternatively, concealment of treatment allocation may not have
been relevant in the context of our study. The importance of allocation concealment
may to some extent depend on whether strong beliefs exist among investigators
regarding the benefits or risks of assigned treatments or whether equipoise
of treatments is accepted by all investigators involved.7
Strong beliefs are probably more prevalent in trials comparing an intervention
with placebo than in trials comparing 2 similar, active interventions. Other
aspects may need to be considered in specific situations, such as whether
the trial was terminated prematurely, the tests used for measuring end points,
or the type of statistical model used. The importance of individual quality
domains and, possibly, the direction of potential biases associated with these
domains will thus vary according to the context, and the mechanistic application
of scales with fixed weights allocated to a standard set of items may dilute,
or entirely miss, associations.61 Indeed, in
our sample of 17 trials none of the 25 composite scales was significantly
associated with treatment effect. Many meta-analysts probably would have dismissed
trial quality as a source of bias based on the analysis of summary scores
and concluded that there was robust evidence favoring LMWH.
Although improved reporting practices should facilitate the assessment
of methodological quality in the future, incomplete reporting remains an
important obstacle to assessing trial quality.8
Because small single-center studies may be more likely to be of inadequate
quality and more likely to be reported inadequately than large multicenter
studies, the sample size and number of study centers may sometimes be useful
proxy variables for study quality. Analyzing the effect of sample size will
also shed light on the possible presence of publication bias.47
One should not forget, however, that associations between domains of trial
quality and treatment effects are subject to the potential biases of observational
studies. Confounding could exist between measures of trial quality and other
characteristics of trials, such as the setting, the characteristics of the
participants, or the treatments.
The use of quality scores as weights when pooling studies has also been
advocated, for example, by multiplying scores by the precision of effect estimates.3 Such procedures will affect both the combined effect
estimate and its CI. As could be expected, we obtained different results depending
on the scale used as the weighting factor when performing such analyses (data
not shown). As pointed out previously by Detsky et al,24
the incorporation of quality scores as weights lacks statistical or empirical
justification.
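For concreteness, a sketch of this kind of quality weighting, with hypothetical data; it is shown to illustrate the procedure under discussion, not to endorse it. Note how the pooled estimate shifts when a different (hypothetical) scale supplies the weights:

```python
import numpy as np

def pool_quality_weighted(log_rr: np.ndarray, var: np.ndarray,
                          quality: np.ndarray) -> float:
    """Pool log relative risks using weight = quality score x precision
    (1/variance). Illustrates the criticized procedure: the weights, and
    hence the combined estimate, change with the scale used."""
    w = quality / var
    pooled = np.sum(w * log_rr) / np.sum(w)
    return float(np.exp(pooled))

# Hypothetical trials: identical effects and variances, two different scales
log_rr = np.log(np.array([0.55, 0.80, 1.05, 0.70]))
var = np.array([0.09, 0.0625, 0.16, 0.04])
scale_a = np.array([3.0, 1.0, 2.0, 5.0])   # scores from one scale
scale_b = np.array([1.0, 4.0, 5.0, 2.0])   # scores from another scale
print(pool_quality_weighted(log_rr, var, scale_a))
print(pool_quality_weighted(log_rr, var, scale_b))
```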
The assessment of the methodological quality of randomized trials and
the conduct of sensitivity analyses should be considered routine procedures
in meta-analysis. Although composite quality scales may provide a useful overall
assessment when comparing populations of trials, for example, trials published
in different languages or disciplines, such scales should not generally be
used to identify trials of apparent low quality or high quality in a given
meta-analysis.61 Rather, the relevant methodological
aspects should be identified, ideally a priori, and assessed individually.
This should always include the key domains of concealment of treatment allocation,
blinding of outcome assessment or double blinding, and handling of withdrawals
and dropouts. Finally, the lack of well-performed and adequately sized trials
cannot be remedied by statistical analyses of small trials of questionable
quality.
References
1. Chalmers TC, Celano P, Sacks HS, Smith H Jr. Bias in treatment assignment in controlled clinical trials. N Engl J Med. 1983;309:1358-1361.
2. Schulz KF, Chalmers I, Hayes RJ, Altman DG. Empirical evidence of bias. JAMA. 1995;273:408-412.
3. Moher D, Pham B, Jones A, et al. Does quality of reports of randomised trials affect estimates of intervention efficacy reported in meta-analyses? Lancet. 1998;352:609-613.
4. Sackett DL, Gent M. Controversy in counting and attributing events in clinical trials. N Engl J Med. 1979;301:1410-1412.
5. May GS, DeMets DL, Friedman LM, Furberg C, Passamani E. The randomized clinical trial: bias in analysis. Circulation. 1981;64:669-673.
6. Peduzzi P, Wittes J, Detre K, Holford T. Analysis as-randomized and the problem of non-adherence. Stat Med. 1993;12:1185-1195.
7. Schulz KF. Subverting randomization in controlled trials. JAMA. 1995;274:1456-1458.
8. Begg C, Cho M, Eastwood S, et al. Improving the quality of reporting of randomized controlled trials: the CONSORT statement. JAMA. 1996;276:637-639.
9. Moher D, Jadad AR, Nichol G, Penman M, Tugwell P, Walsh S. Assessing the quality of randomized controlled trials. Control Clin Trials. 1995;16:62-73.
10. Mulrow CD, Oxman AD. Cochrane Collaboration handbook [Cochrane Review on CD-ROM]. Oxford, England: Cochrane Library, Update Software; 1998; issue 4.
11. Cook DJ, Sackett DL, Spitzer WO. Methodologic guidelines for systematic reviews of randomized control trials in health care from the Potsdam consultation on meta-analysis. J Clin Epidemiol. 1995;48:167-171.
12. Pogue J, Yusuf S. Overcoming the limitations of current meta-analysis of randomised controlled trials. Lancet. 1998;351:47-52.
13. Nurmohamed MT, Rosendaal FR, Buller HR, et al. Low-molecular-weight heparin versus standard heparin in general and orthopaedic surgery: a meta-analysis. Lancet. 1992;340:152-156.
14. Leizorovicz A, Haugh MC, Chapuis FR, Samama MM, Boissel JP. Low molecular weight heparin in prevention of perioperative thrombosis. BMJ. 1992;305:913-920.
15. Moher D, Jadad AR, Tugwell P. Assessing the quality of randomized controlled trials. Int J Technol Assess Health Care. 1996;12:195-208.
16. Chalmers I. Applying overviews and meta-analyses at the bedside. J Clin Epidemiol. 1995;48:67-70.
17. Andrew E. Method for assessment of the reporting standard of clinical trials with roentgen contrast media. Acta Radiol Diagn (Stockh). 1984;25:55-58.
18. Beckerman H, de Bie RA, Bouter LM, de Cuyper HJ, Oostendorp RAB. The efficacy of laser therapy for musculoskeletal and skin disorders. Phys Ther. 1992;72:483-491.
19. Brown SA. Measurement of quality of primary studies for meta-analysis. Nurs Res. 1991;40:352-355.
20. Chalmers I, Adams M, Dickersin K, et al. A cohort study of summary reports of controlled trials. JAMA. 1990;263:1401-1405.
21. Chalmers TC, Smith H Jr, Blackburn B, et al. A method for assessing the quality of a randomized control trial. Control Clin Trials. 1981;2:31-49.
22. Cho MK, Bero LA. Instruments for assessing the quality of drug studies published in the medical literature. JAMA. 1994;272:101-104.
23. Colditz GA, Miller JN, Mosteller F. How study design affects outcomes in comparisons of therapy, I: medical. Stat Med. 1989;8:441-454.
24. Detsky AS, Naylor CD, O'Rourke K, McGeer AJ, L'Abbé KA. Incorporating variations in the quality of individual randomized trials into meta-analysis. J Clin Epidemiol. 1992;45:255-265.
25. Evans M, Pollock AV. A score system for evaluating random control clinical trials of prophylaxis of abdominal surgical wound infection. Br J Surg. 1985;72:256-260.
26. Goodman SN, Berlin JA, Fletcher SW, Fletcher RH. Manuscript quality before and after peer review and editing at Annals of Internal Medicine. Ann Intern Med. 1994;121:11-21.
27. Gøtzsche PC. Methodology and overt and hidden bias in reports of 196 double-blind trials of nonsteroidal antiinflammatory drugs in rheumatoid arthritis. Control Clin Trials. 1989;10:31-56.
28. Imperiale TF, McCullough AJ. Do corticosteroids reduce mortality from alcoholic hepatitis? Ann Intern Med. 1990;113:299-307.
29. Jadad AR, Moore RA, Carroll D, et al. Assessing the quality of reports of randomized clinical trials: is blinding necessary? Control Clin Trials. 1996;17:1-12.
30. Kleijnen J, Knipschild P, ter Riet G. Clinical trials of homoeopathy. BMJ. 1991;302:316-323.
31. Koes BW, Assendelft WJ, van der Heijden GJ, Bouter LM, Knipschild PG. Spinal manipulation and mobilisation for back and neck pain: a blinded review. BMJ. 1991;303:1298-1303.
32. Levine J. Trial assessment procedure scale (TAPS). In: Spilker B, ed. Guide to Clinical Trials. New York, NY: Raven Press; 1991:780-786.
33. Linde K, Clausius N, Ramirez G, et al. Are the clinical effects of homoeopathy placebo effects? Lancet. 1997;350:834-843.
34. Onghena P, Van Houdenhove B. Antidepressant-induced analgesia in chronic non-malignant pain. Pain. 1992;49:205-219.
35. Poynard T. Evaluation de la qualité méthodologique des essais thérapeutiques randomisés [Evaluation of the methodological quality of randomized therapeutic trials]. Presse Med. 1988;17:315-318.
36. Reisch JS, Tyson JE, Mize SG. Aid to the evaluation of therapeutic studies. Pediatrics. 1989;84:815-827.
37. Smith K, Cook D, Guyatt GH, Madhavan J, Oxman AD. Respiratory muscle training in chronic airflow limitation: a meta-analysis. Am Rev Respir Dis. 1992;145:533-539.
38. Spitzer WO, Lawrence V, Dales R, et al. Links between passive smoking and disease. Clin Invest Med. 1990;13:17-42.
39. ter Riet G, Kleijnen J, Knipschild P. Acupuncture and chronic pain: a criteria-based meta-analysis. J Clin Epidemiol. 1990;43:1191-1199.
40. Verardi S, Cortese F, Baroni B, Boffo V, Casciani CU, Palazzini E. Deep vein thrombosis prevention in surgical patients. Curr Ther Res. 1989;46:366-372.
41. Verardi S, Casciani CU, Nicora E, et al. A multicentre study on LMW-heparin effectiveness in preventing postsurgical thrombosis. Int Angiol. 1988;7:19-24.
42. Leizorovicz A, Haugh M, Boissel JP. Meta-analysis and multiple publication of clinical trial reports. Lancet. 1992;340:1102-1103.
43. Shrout PE, Fleiss JL. Intraclass correlations: uses in assessing rater reliability. Psychol Bull. 1979;86:420-428.
44. Thompson SG, Sharp S. Explaining heterogeneity in meta-analysis. Stat Med. In press.
45. Noseworthy JH, Ebers GC, Vandervoort MK, Farquhar RE, Yetisir E, Roberts R. The impact of blinding on the results of a randomized, placebo-controlled multiple sclerosis clinical trial. Neurology. 1994;44:16-20.
46. DerSimonian R, Laird N. Meta-analysis in clinical trials. Control Clin Trials. 1986;7:177-188.
47. Egger M, Smith GD, Schneider M, Minder C. Bias in meta-analysis detected by a simple, graphical test. BMJ. 1997;315:629-634.
48. Sharp S. sbe23: meta-analysis regression. Stata Technical Bulletin. 1998;42:16-22.
49. Villar J, Carroli G, Belizàn JM. Predictive ability of meta-analyses of randomised controlled trials. Lancet. 1995;345:772-776.
50. Cappelleri JC, Ioannidis JPA, Schmid CH, et al. Large trials vs meta-analysis of smaller trials: how do their results compare? JAMA. 1996;276:1332-1338.
51. Le Lorier J, Grégoire G, Benhaddad A, Lapierre J, Derderian F. Discrepancies between meta-analyses and subsequent large randomized, controlled trials. N Engl J Med. 1997;337:536-542.
52. Easterbrook PJ, Berlin JA, Gopalan R, Matthews DR. Publication bias in clinical research. Lancet. 1991;337:867-872.
53. Egger M, Zellweger-Zähner T, Schneider M, Junker C, Lengeler C, Antes G. Language bias in randomised controlled trials published in English and German. Lancet. 1997;350:326-329.
54. Gøtzsche PC. Reference bias in reports of drug trials. BMJ. 1987;295:654-656.
55. Kunz R, Oxman AD. The unpredictability paradox: review of empirical comparisons of randomised and non-randomised clinical trials. BMJ. 1998;317:1185-1190.
56. Khan KS, Daya S, Jadad A. The importance of quality of primary studies in producing unbiased systematic reviews. Arch Intern Med. 1996;156:661-666.
57. Koch A, Bouges S, Ziegler S, Dinkel H, Daures JP, Victor N. Low molecular weight heparin and unfractionated heparin in thrombosis prophylaxis after major surgical intervention: update of previous meta-analyses. Br J Surg. 1997;84:750-759.
58. Warkentin TE, Levine MN, Hirsh J, et al. Heparin-induced thrombocytopenia in patients treated with low-molecular-weight heparin or unfractionated heparin. N Engl J Med. 1995;332:1330-1335.
59. Weitz JI. Low-molecular-weight heparins. N Engl J Med. 1997;337:688-698.
60. Lensing AW, Hirsh J. 125I-fibrinogen leg scanning. Thromb Haemost. 1993;69:2-7.
61. Greenland S. Quality scores are useless and potentially misleading [reply to re: a critical look at some popular analytic methods]. Am J Epidemiol. 1994;140:300-302.