lnOR indicates natural log of the odds ratio; SE, standard error.
OR indicates odds ratio. Dotted line indicates the expected type I error rate (ie, 10%).
Peters JL, Sutton AJ, Jones DR, Abrams KR, Rushton L. Comparison of Two Methods to Detect Publication Bias in Meta-analysis. JAMA. 2006;295(6):676-680. doi:10.1001/jama.295.6.676
Author Affiliations: Centre for Biostatistics and Genetic Epidemiology, Department of Health Sciences, University of Leicester, Leicester, England (Drs Sutton, Jones, and Abrams, and Ms Peters); MRC Institute for Environment and Health, Leicester, England (Dr Rushton). Dr Rushton is now with the Department of Epidemiology and Public Health, Imperial College London, London, England.
Context Egger's regression test is often used to help detect publication bias in meta-analyses. However, the performance of this test and of the usual funnel plot has been challenged, particularly when the summary estimate is the natural log of the odds ratio (lnOR).
Objective To compare the performance of Egger's regression test with a regression test based on sample size (a modification of Macaskill's test) with lnOR as the summary estimate.
Design Simulation of meta-analyses under a number of scenarios in the presence and absence of publication bias and between-study heterogeneity.
Main Outcome Measures Type I error rates (the proportion of false-positive results) for each regression test and their power to detect publication bias when it is present (the proportion of true-positive results).
Results Type I error rates for Egger's regression test are higher than those for the alternative regression test. The alternative regression test has the appropriate type I error rates regardless of the size of the underlying OR, the number of primary studies in the meta-analysis, and the level of between-study heterogeneity. The alternative regression test has comparable power to Egger's regression test to detect publication bias under conditions of low between-study heterogeneity.
Conclusion Because of its appropriate type I error rates and the reduced correlation between the lnOR and its variance, the alternative regression test can be used in place of Egger's regression test when the summary estimates are lnORs.
Systematic reviews and meta-analyses are commonly used to identify and evaluate evidence about interventions or exposures in human health. Even when conducted thoroughly, they can be subject to publication bias, whereby studies are less likely to be published, and hence less likely to be included in a systematic review or meta-analysis, because of the size and/or statistical significance of their effect estimate.1 If publication bias occurs, a systematic review or meta-analysis of the published literature may be misleading.
Of the methods available to researchers for the detection of publication bias, one of the simplest is the funnel plot.2 This is a scatterplot of the effect estimate from each study in the meta-analysis against a measure of its precision, usually 1/SE (Figure 1A). In the absence of bias, the plot should resemble a “funnel shape,” because smaller, less precise studies are more subject to random variation than larger studies when estimating an effect. In the presence of publication bias, some smaller studies reporting negative results will be missing, leading to an asymmetrical funnel plot. Of course, publication bias is not the only possible explanation for observed (or tested) funnel plot asymmetry.3 Between-study heterogeneity and small-study effects (the tendency for smaller studies to show greater effects than larger studies) are also possible explanations.3 In addition, when the study summary estimates are odds ratios (ORs), the natural log of the OR (lnOR) is correlated with its SE, because the variance of the lnOR is itself a function of the lnOR.4 This correlation is stronger the further the estimated OR is from unity.
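To illustrate the correlation numerically, the sketch below computes the lnOR and its usual large-sample SE, the square root of 1/a + 1/b + 1/c + 1/d, from the expected cell counts of a study with 150 subjects per arm and a control group event rate of 0.5. These study characteristics and the function names are assumptions chosen for illustration. With sample size held fixed, the SE grows as the true OR moves away from 1, so the 2 axes of a precision-based funnel plot are not independent (the size and direction of the change depend on the control group event rate).

```python
import math


def ln_or_and_se(a, b, c, d):
    """lnOR and its large-sample (Woolf) SE from a 2x2 table.

    a, b: events and non-events in the treated/exposed group
    c, d: events and non-events in the control group
    """
    ln_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return ln_or, se


def expected_table(n_per_arm, p_control, odds_ratio):
    """Expected (non-integer) cell counts for a given control rate and true OR."""
    odds_treated = odds_ratio * p_control / (1 - p_control)
    p_treated = odds_treated / (1 + odds_treated)
    return (n_per_arm * p_treated, n_per_arm * (1 - p_treated),
            n_per_arm * p_control, n_per_arm * (1 - p_control))


# Same sample size and control rate in every row; only the true OR changes.
for true_or in (1, 1.5, 3, 5):
    ln_or, se = ln_or_and_se(*expected_table(150, 0.5, true_or))
    print(f"OR = {true_or}: lnOR = {ln_or:.3f}, SE = {se:.3f}")
```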
Thus, some of the asymmetry observed in a funnel plot may be due to this correlation rather than to publication bias. The effect of this correlation can be avoided by plotting effect size estimates against sample size rather than precision. The meta-analysis plotted in Figure 1A uses data simulated from a model with no publication bias. However, it appears that some small negative studies could be missing from the bottom left-hand corner, which could be interpreted as indicating publication bias. When these data are plotted against sample size (Figure 1B), the funnel plot looks more symmetrical. Although Figure 1A and Figure 1B are not remarkably different, since only the y-axis has changed, the impact on Egger's regression test can be quite striking, especially if the underlying OR is far from the null value of 1.
Statistical tests have been developed to provide more formal assessments of publication bias than visual inspection of funnel plots. Egger's regression test5 is widely used (eg, as of January 11, 2006, the Web of Knowledge6 included 819 articles citing the original article), is implemented in a number of software packages,7-10 and has become a standard procedure (eg, of 43 meta-analyses published in JAMA since 1997 in which an assessment of publication bias was made, 13 reported using Egger's regression test). Because it is based directly on the funnel plot, regressing the standardized effect estimate (effect/SE) on a measure of precision (1/SE), Egger's regression test is also subject to the effects of this correlation when ORs are used.
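As a concrete reading of this description, the minimal sketch below fits Egger's regression by ordinary least squares, regressing lnOR/SE on 1/SE and testing whether the intercept is 0 with a t test on k - 2 degrees of freedom (k studies). It is an illustration using only the Python standard library, not the implementation in any of the cited packages; the function names are hypothetical, and the helper t_two_sided_p evaluates the exact Student t distribution for integer degrees of freedom.

```python
import math


def t_two_sided_p(t, df):
    """Two-sided P value for Student's t with integer df (closed-form series)."""
    theta = math.atan(abs(t) / math.sqrt(df))
    s, c2 = math.sin(theta), math.cos(theta) ** 2
    if df % 2 == 0:
        term = total = 1.0
        for j in range(df // 2 - 1):
            term *= c2 * (2 * j + 1) / (2 * j + 2)
            total += term
        prob_inside = s * total
    else:
        prob_inside = theta
        if df > 1:
            term = total = math.cos(theta)
            for j in range(1, (df - 1) // 2):
                term *= c2 * (2 * j) / (2 * j + 1)
                total += term
            prob_inside += s * total
        prob_inside *= 2 / math.pi
    return max(0.0, 1.0 - prob_inside)


def ols_intercept_test(x, y):
    """OLS fit y = b0 + b1*x; returns (b0, b1, SE(b0), two-sided P for b0 = 0)."""
    k = len(x)
    mx, my = sum(x) / k, sum(y) / k
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    se_b0 = math.sqrt(rss / (k - 2) * (1 / k + mx ** 2 / sxx))
    return b0, b1, se_b0, t_two_sided_p(b0 / se_b0, k - 2)


def egger_test(ln_or, se):
    """Egger's test: regress the standardized effect (lnOR/SE) on precision (1/SE)."""
    y = [b / s for b, s in zip(ln_or, se)]
    x = [1 / s for s in se]
    return ols_intercept_test(x, y)


# Hypothetical lnORs and SEs for 6 studies, for illustration only.
intercept, slope, se_intercept, p = egger_test(
    [0.9, 0.7, 0.4, 0.5, 0.3, 0.35], [0.50, 0.42, 0.30, 0.25, 0.15, 0.10])
```

A nonzero intercept indicates that small (imprecise) studies sit systematically to one side of the line through the origin, which is the funnel plot asymmetry the test is designed to detect.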
In fact, Egger's regression test has been challenged because of its high type I error rates (the proportion of false-positive results) when ORs are used,3,11,12 a probable symptom of this correlation. As almost one third of the JAMA articles reviewed above used Egger's regression test when the summary estimates were ORs, this needs investigation. Using simulation analyses, we confirm that Egger's regression test is indeed inappropriate for ORs, particularly when the ORs are large and there is considerable between-study heterogeneity.3,4,11 We describe a simple alternative (a modified version of Macaskill's test,4 which is little used in practice) for detecting funnel plot asymmetry that avoids this correlation.
We assessed the performance of 8 regression tests for funnel plot asymmetry, including Egger's regression test, using simulation methods. The tests13 differ in terms of the independent variable used, the weighting used, and whether random effects were included. In this article we compare the performance of Egger's regression test and the test found to have the most desirable properties compared with the remaining regression tests (results for all tests can be found in Peters et al13). Other modified tests are also being developed.14
Characteristics of the simulated meta-analyses were based on a systematic review of meta-analyses of animal experiments,15 but the findings can be applied generally. Meta-analyses of 6, 16, 30, and 90 primary studies with underlying ORs of 1, 1.2, 1.5, 3, and 5 were simulated. The control group event rate was allowed to vary across primary studies: it was sampled from a uniform distribution with lower and upper limits of 0.3 and 0.7, respectively, representing a fairly common event in the control group. The treatment group event rate was calculated from this rate and the assumed underlying OR. The number of subjects in the control group of each study was the exponential of a value drawn from a normal distribution with a mean of 5 and a variance of 0.3 (ie, a log-normal distribution). The ratio of control to treated/exposed subjects was 1, and the median total sample size was around 300 in each simulated meta-analysis. Fixed- and random-effects models were used to simulate the meta-analyses. Since between-study heterogeneity is often found in meta-analyses,16,17 an understanding of the performance of tests for publication bias in such situations is essential in practice. The between-study heterogeneity parameter was defined as a percentage of the average within-study variance: the average within-study variance was calculated from the fixed-effects model, and between-study heterogeneity was then set to 20%, 150%, and 500% of this value. These scenarios range from modest to considerable between-study heterogeneity and correspond to values of I², the percentage of total variation across studies that is due to heterogeneity rather than chance,18 of 16.7%, 60%, and 83.3%, respectively.
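The design above can be sketched as a small simulator. This is a hypothetical Python illustration, not the authors' Stata code: the function names are ours, heterogeneity is added by drawing each study's true lnOR from a normal distribution whose variance (tau^2) is het_pct times the average within-study variance, and that average is computed from each simulated study's large-sample lnOR variance at the true OR, which is only our reading of "calculated from the fixed-effects model." The helper i_squared reproduces the I² values quoted above, since I² = tau^2/(tau^2 + within-study variance).

```python
import math
import random


def i_squared(het_pct):
    """I^2 implied by tau^2 = het_pct * (average within-study variance)."""
    return het_pct / (1.0 + het_pct)


def treated_risk(p_control, log_or):
    """Treatment group event rate implied by the control rate and the lnOR."""
    odds = math.exp(log_or) * p_control / (1.0 - p_control)
    return odds / (1.0 + odds)


def expected_var(n, p_c, p_t):
    """Large-sample variance of the lnOR with n subjects per arm."""
    return 1 / (n * p_t) + 1 / (n * (1 - p_t)) + 1 / (n * p_c) + 1 / (n * (1 - p_c))


def simulate_meta_analysis(k, true_or, het_pct=0.0, seed=None):
    """Return k tables (a, b, c, d): treated events/non-events, control events/non-events."""
    rng = random.Random(seed)
    mu = math.log(true_or)
    designs = []
    for _ in range(k):
        # Controls per arm ~ exp(Normal(mean 5, variance 0.3)); 1:1 allocation.
        n = max(2, round(rng.lognormvariate(5.0, math.sqrt(0.3))))
        p_c = rng.uniform(0.3, 0.7)
        designs.append((n, p_c))
    # ASSUMPTION: average within-study variance evaluated at the true OR.
    avg_var = sum(expected_var(n, p_c, treated_risk(p_c, mu)) for n, p_c in designs) / k
    tau = math.sqrt(het_pct * avg_var)
    studies = []
    for n, p_c in designs:
        log_or_i = rng.gauss(mu, tau)          # tau = 0 gives a fixed-effect model
        p_t = treated_risk(p_c, log_or_i)
        a = sum(rng.random() < p_t for _ in range(n))   # binomial draw, treated arm
        c = sum(rng.random() < p_c for _ in range(n))   # binomial draw, control arm
        studies.append((a, n - a, c, n - c))
    return studies


meta = simulate_meta_analysis(16, 1.5, het_pct=1.5, seed=1)
```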
Performance of the regression tests was assessed in the absence and presence of induced funnel plot asymmetry. Asymmetry was induced in 2 ways. First, it was induced on the basis of the P value associated with a study's effect size4,19,20 (the larger the P value, the more likely the study was to be excluded from the meta-analysis). Since a study estimate is more likely to be statistically significant when the underlying OR is large, little publication bias is actually induced in this way for the larger underlying ORs. Therefore, publication bias was also induced on the basis of study effect size.21 Studies with the most extreme negative effect sizes were excluded from the meta-analysis. Results are based on 1000 replications. The maximum SE of the estimated type I error rates and power in the simulations is 1.7%. All simulations and analyses were carried out in Stata7 (version 8.2). For ease of presentation, only results for underlying ORs of 1, 1.5, and 5 are given in the Figures (findings for underlying ORs of 1.2 and 3 follow the same general trend13).
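The 2 selection mechanisms can be sketched as follows. The exact selection functions are not given here, so the publication probabilities in publication_prob (1.0, 0.75, and 0.25 across one-sided P value bands) are purely hypothetical, as are all function names and the 0.5 continuity correction; the sketch only preserves the stated rules that a larger P value makes exclusion more likely and that effect size selection removes the most negative estimates.

```python
import math
import random
from statistics import NormalDist


def ln_or_se(a, b, c, d):
    """lnOR and SE from a 2x2 table; adds 0.5 to every cell if any cell is 0 (assumption)."""
    if 0 in (a, b, c, d):
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    return math.log(a * d / (b * c)), math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)


def publication_prob(p_one_sided):
    """HYPOTHETICAL weights: the larger the P value, the lower the chance of publication."""
    if p_one_sided < 0.05:
        return 1.0
    if p_one_sided < 0.5:
        return 0.75
    return 0.25


def select_by_p_value(studies, rng):
    """Keep each study with a probability that falls as its one-sided P value rises."""
    kept = []
    for table in studies:
        ln_or, se = ln_or_se(*table)
        p = 1.0 - NormalDist().cdf(ln_or / se)   # one-sided P in favour of a positive lnOR
        if rng.random() < publication_prob(p):
            kept.append(table)
    return kept


def select_by_effect_size(studies, n_drop):
    """Drop the n_drop studies with the most negative lnOR, keeping the rest in order."""
    order = sorted(range(len(studies)), key=lambda i: ln_or_se(*studies[i])[0])
    dropped = set(order[:n_drop])
    return [t for i, t in enumerate(studies) if i not in dropped]


tables = [(60, 40, 50, 50), (40, 60, 50, 50), (55, 45, 50, 50), (30, 70, 50, 50)]
published = select_by_p_value(tables, random.Random(0))
```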
An ideal test has the desired type I error rate (eg, 10% when statistical significance is specified from a 2-tailed test at P<.10, as is advocated for these tests5) and good power to detect asymmetry when it exists. In Figure 2, Egger's regression test exceeds the appropriate type I error rate of 10% for large underlying ORs. As the amount of between-study heterogeneity and the number of primary studies increase, the type I error rates also increase, even for moderate ORs (ie, OR = 1.5). We also observed an imbalance between the 2 tail probability areas for Egger's 2-tailed test,13 as previously demonstrated.4,11
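The quantities summarized in the Figures are proportions over replications. A minimal sketch with hypothetical helper names computes the rejection rate at the 10% level, its binomial Monte Carlo SE, and the split of rejections between the 2 tails that underlies the imbalance noted above. Applied to P values from meta-analyses simulated without bias, the rejection rate estimates the type I error rate; with induced asymmetry, it estimates power.

```python
import math


def rejection_rate(p_values, alpha=0.10):
    """Empirical rejection rate (type I error or power) and its Monte Carlo SE."""
    n = len(p_values)
    rate = sum(p < alpha for p in p_values) / n
    return rate, math.sqrt(rate * (1 - rate) / n)


def tail_rates(results, alpha=0.10):
    """Split rejections by the sign of the tested coefficient.

    results: (coefficient, P value) pairs, one per simulated meta-analysis.
    For a 2-tailed test at alpha = 10%, each tail should hold about 5% under H0.
    """
    n = len(results)
    lower = sum(p < alpha and coef < 0 for coef, p in results) / n
    upper = sum(p < alpha and coef > 0 for coef, p in results) / n
    return lower, upper
```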
In the presence of funnel plot asymmetry, Egger's regression test appears reasonably powerful to detect this asymmetry (Figure 3), especially as the underlying OR and number of studies in the meta-analysis increase.
However, in assessing practical use of the test, power must be considered in light of the type I error rates (so that false-positive results are not mistaken for true-positive results). This trade-off between power and type I error rates is similar to that between the sensitivity and specificity of a diagnostic test. Our findings and those of others3,4,11 lead us to have serious concerns over the practical use of Egger's test to identify funnel plot asymmetry for lnORs.
Of the 7 further regression models assessed, one model stands out in that its performance is superior to all the others, including Egger's regression test.13 This model and the simulated results from it are now discussed.
In preference to Egger's regression test, we recommend a simple weighted linear regression with lnOR as the dependent variable and the inverse of the total sample size as the independent variable. This is a minor modification of Macaskill's test,4 which uses the total sample size itself as the independent variable. Our results indicate that using the inverse of the total sample size gives more balanced type I error rates in the 2 tail probability areas than using untransformed sample size.13 Use of sample size reduces the correlation between the lnOR and its SE.4,13 It also avoids violating an assumption of regression models, namely that the independent variable is measured without error. Egger's regression test violates this assumption because its independent variable is derived from the estimated SE, which is subject to random error, so that the test is affected by regression dilution bias.22
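For comparison, the 2 regression models can be written side by side, with N_i the total sample size of study i and w_i the study weight discussed in the next paragraph. The right-hand form of the Egger model is an algebraically equivalent restatement (multiply through by SE_i), showing that Egger's test is a test of the slope on SE_i in a regression weighted by 1/SE_i^2; this restatement is ours rather than the authors'.

```latex
% Egger's regression test (ordinary least squares; test H_0: \beta_0 = 0)
\frac{\ln OR_i}{SE_i} = \beta_0 + \beta_1 \frac{1}{SE_i} + \varepsilon_i
\quad\Longleftrightarrow\quad
\ln OR_i = \beta_1 + \beta_0\, SE_i + SE_i\,\varepsilon_i

% Alternative regression test (weighted least squares with weights w_i; test H_0: \beta = 0)
\ln OR_i = \alpha + \beta \frac{1}{N_i} + e_i
```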
The weighting given to each study by the alternative regression test is based on the assumption that the null hypothesis is true, ie, the underlying OR = 1. Choice of this weighting helps to reduce the correlation between the lnOR and the weight given to each study when the standard inverse variance weighting is used. Thus, appropriate type I error rates and balance in the tail probabilities are achieved.
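A minimal sketch of the alternative test follows. It assumes, from our reading of Peters et al13 (which should be checked against that source), that the weight for study i is w_i = S_i F_i / N_i, where S_i and F_i are the total numbers of events and non-events across both arms. This is proportional to the inverse variance of the lnOR when OR = 1 and the arms are of equal size: with a common event rate p_i = S_i/N_i, Var(lnOR_i) is approximately 4/(N_i p_i(1 - p_i)) = 4N_i/(S_i F_i), and the constant 4 affects neither the weighted least squares estimates nor the t test. Function names and the 0.5 continuity correction are hypothetical.

```python
import math


def t_two_sided_p(t, df):
    """Two-sided P value for Student's t with integer df (closed-form series)."""
    theta = math.atan(abs(t) / math.sqrt(df))
    s, c2 = math.sin(theta), math.cos(theta) ** 2
    if df % 2 == 0:
        term = total = 1.0
        for j in range(df // 2 - 1):
            term *= c2 * (2 * j + 1) / (2 * j + 2)
            total += term
        prob_inside = s * total
    else:
        prob_inside = theta
        if df > 1:
            term = total = math.cos(theta)
            for j in range(1, (df - 1) // 2):
                term *= c2 * (2 * j) / (2 * j + 1)
                total += term
            prob_inside += s * total
        prob_inside *= 2 / math.pi
    return max(0.0, 1.0 - prob_inside)


def wls_slope_test(x, y, w):
    """Weighted least squares y = a + b*x; returns (b, SE(b), two-sided P for b = 0)."""
    k = len(x)
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    a = my - b * mx
    rss = sum(wi * (yi - a - b * xi) ** 2 for wi, xi, yi in zip(w, x, y))
    se_b = math.sqrt(rss / (k - 2) / sxx)  # residual variance estimated from the data
    return b, se_b, t_two_sided_p(b / se_b, k - 2)


def alternative_test(tables):
    """tables: (a, b, c, d) = treated events, treated non-events, control events, control non-events."""
    x, y, w = [], [], []
    for a, b, c, d in tables:
        if 0 in (a, b, c, d):  # ASSUMPTION: 0.5 continuity correction for zero cells
            a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
        n = a + b + c + d
        events, non_events = a + c, b + d
        y.append(math.log(a * d / (b * c)))
        x.append(1 / n)                    # inverse of the total sample size
        w.append(events * non_events / n)  # ASSUMED weight, proportional to 1/Var(lnOR) under OR = 1
    return wls_slope_test(x, y, w)
```

Because the weights use pooled event counts rather than each study's own lnOR variance, a study's weight does not move with its estimated lnOR, which is the mechanism the text credits for the balanced tail probabilities.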
Further explanation of the implications of this choice of weighting can be found in Macaskill et al,4 and details of the weighting given to each study are given in Peters et al.13 Figure 2 shows that the type I error rates for this alternative regression test are approximately 10%, as expected, regardless of the size of the underlying OR, the number of studies in the meta-analysis, and the amount of between-study heterogeneity, unlike those for Egger's regression test.
When there is little between-study heterogeneity, the alternative regression test and Egger's regression test appear to have moderate power to detect asymmetry when it is induced on the basis of P value (Figure 3) and high power when asymmetry is induced on the basis of the magnitude of the effect (data not shown).
When there is considerable heterogeneity (Figure 3), Egger's regression test is more powerful than the alternative regression test; however, as discussed above, it is difficult to disentangle the high type I error rates of Egger's regression test from its power.
Neither Egger's regression test nor the alternative regression test is particularly powerful in all scenarios. However, what is needed is a test that performs well in all situations, even if it is not optimal. Thus, although the alternative regression test is no more powerful than Egger's regression test, we recommend that the alternative be routinely used rather than Egger's regression test because it reduces the correlation between lnOR and its SE4 through the choice of weighting and has appropriate type I error rates. The alternative regression test can easily be run in any software package allowing weighted linear regression. (Details on implementing this test in Stata7 are available from the authors.) In fact, applying this test to the meta-analysis illustrated in Figure 1 gives a nonsignificant result (P = .18), as one would expect since the data were simulated with no publication bias, whereas Egger's regression test yields P = .07, which is significant at the 10% level used for these tests.
The alternative regression test is analogous to a funnel plot based on sample size. Thus, although contrary to the recommendations of Sterne and Egger23 for choice of funnel plot axis, we advocate use of sample size13 for lnORs.
We have also assessed use of the permutation test to obtain the P value for each test. The permutation test has been advocated for use in meta-regression to deal with inflated type I error rates.24 Preliminary findings do not necessarily suggest better performance of tests based on P values from the permutation test compared with the usual t test.13 Extensions of these regression tests, and their performance, when some of the between-study heterogeneity can be explained by a study-level covariate are the subject of ongoing work.
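The permutation scheme is not detailed here. One common scheme for meta-regression, assumed in the minimal sketch below, shuffles the covariate (for the alternative test, 1/N_i) across studies while keeping each study's lnOR and weight together, refits the weighted regression, and reports the proportion of permuted |t| statistics at least as large as the observed one (counting the observed arrangement). Function names are hypothetical.

```python
import math
import random


def wls_t(x, y, w):
    """t statistic for the slope of a weighted least squares fit y = a + b*x."""
    k = len(x)
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    b = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y)) / sxx
    a = my - b * mx
    rss = sum(wi * (yi - a - b * xi) ** 2 for wi, xi, yi in zip(w, x, y))
    if rss == 0:
        return math.copysign(math.inf, b)  # perfect fit: treat as infinitely extreme
    return b / math.sqrt(rss / (k - 2) / sxx)


def permutation_p(x, y, w, n_perm=999, seed=0):
    """Permutation P value for the slope: shuffle x, keep each (y, w) pair together."""
    rng = random.Random(seed)
    t_obs = abs(wls_t(x, y, w))
    x_perm = list(x)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(x_perm)
        if abs(wls_t(x_perm, y, w)) >= t_obs:
            extreme += 1
    return (extreme + 1) / (n_perm + 1)
```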
Our results, like those of some others,3,4 concern only the synthesis of ORs. Findings of an investigation of Egger's regression test using relative risks (RRs) suggest a similar result: excessive type I error rates.11 Although more work is needed on the performance of both tests when the summary estimate is not the OR, it is likely that other summary estimates for binary outcomes (eg, RRs and risk differences) will be subject to effects similar to the correlation described above for the OR, thus suggesting that Egger's regression test may not be appropriate. We did not consider meta-analyses of rare events. Evidence suggests that type I error rates for Egger's regression test are particularly high in these situations,3,11 but the performance of the alternative regression test in this setting needs exploring.
Simply testing for the presence of asymmetry does not help obtain an unbiased estimate from the meta-analysis, particularly as there is overreliance on these tests (eg, a nonsignificant P value being taken as evidence that publication bias is not an issue). Our review of 43 meta-analyses published in JAMA since 1997 found that a number of approaches are taken when publication bias is suspected (as it was in 11 of the 43 meta-analyses). These include acknowledging possible publication bias but giving no detail on the extent or impact of such bias; discussing the possible implications of suspected publication bias and advising caution in the interpretation of the pooled estimate; and attributing inconsistent findings to the possible existence of publication bias. Other possible approaches include the trim and fill method25 and the best evidence synthesis approach.26,27 None of these approaches is adequate. While better methods of detecting and dealing with publication bias are being developed, we recommend that authors draw their conclusions cautiously, keeping in mind the possibility of sensitivity to publication and related biases.
Corresponding Author: Jaime Peters, MSc, Centre for Biostatistics and Genetic Epidemiology, Department of Health Sciences, University of Leicester, 22-28 Princess Rd W, Leicester, LE1 6TP, England.
Author Contributions: Ms Peters had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Study concept and design: Peters, Sutton, Jones, Abrams, Rushton.
Analysis and interpretation of data: Peters, Sutton, Jones, Abrams, Rushton.
Drafting of the manuscript: Peters, Sutton, Jones, Abrams, Rushton.
Critical revision of the manuscript for important intellectual content: Peters, Sutton, Jones, Abrams, Rushton.
Statistical analysis: Peters, Sutton, Jones, Abrams, Rushton.
Obtained funding: Peters, Sutton, Jones, Abrams, Rushton.
Administrative, technical, or material support: Peters, Sutton, Jones, Abrams, Rushton.
Study supervision: Sutton.
Financial Disclosures: None reported.
Funding/Support: Ms Peters is funded through a UK Department of Health Evidence Synthesis Award.
Role of the Sponsor: The funding source had no role in any aspect of the study.
Acknowledgment: We are pleased to thank Petra Macaskill, PhD (School of Public Health, Sydney, Australia), for her comments on an earlier draft and suggestions for its improvement. Dr Macaskill did not receive any compensation.