Assessing Uncontrolled Confounding in Associations of Being Overweight With All-Cause Mortality

This study investigates potential uncontrolled confounding in meta-analyses of the association of being overweight with all-cause mortality.


The percentage of meaningfully strong effect sizes
Random-effects meta-analyses have reported the percentage of studies with meaningfully strong effect sizes to characterize evidence strength across numerous studies with effect sizes or associations that may differ. [9][10][11] That is, by conducting a random-effects meta-analysis, the meta-analyst acknowledges the possibility that studies' effects differ (e.g., due to differences in their populations) by assuming that these effect sizes come from a distribution that might be highly concentrated around the meta-analysis mean (i.e., low heterogeneity) or alternatively could be more spread out (i.e., high heterogeneity). The meta-analytic pooled estimate represents the mean of this distribution. As a supplement to the meta-analytic estimate, the percentage of meaningfully strong effect sizes helps assess, in the potentially heterogeneous distribution of effect sizes, how often those effect sizes are meaningfully strong. If this percentage is large (e.g., 80%), this would suggest meaningfully strong associations in most studies, albeit prior to considering potential bias due to uncontrolled confounding. 12,13 Then, as a sensitivity analysis to consider confounding, one can ask: "How strong would the potential influence of uncontrolled confounder(s) have to be to reduce this percentage of meaningfully strong effect sizes to below a certain threshold?"
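As a rough illustration of this metric and of the sensitivity question, the following Python sketch computes the percentage of effect sizes above a threshold under two simplifying assumptions that are ours rather than the source's: true effects are normally distributed on the log scale, and a single multiplicative bias factor applies equally to every study. The function name, its arguments, and the numerical inputs are hypothetical and are not taken from either meta-analysis.

```python
from math import log
from statistics import NormalDist


def pct_meaningfully_strong(mu, tau, q, bias_factor=1.0):
    """Estimated percentage of true effects above the threshold q (ratio scale).

    mu, tau     : pooled mean and heterogeneity SD on the log scale.
    q           : threshold for a "meaningfully strong" effect, e.g. 1.1.
    bias_factor : hypothetical multiplicative bias (>= 1) assumed to be
                  shared by all studies (homogeneous bias).
    """
    if tau <= 0:
        raise ValueError("tau must be positive for this parametric sketch")
    # Shifting the threshold upward by log(bias_factor) is equivalent to
    # shifting every study's effect downward by the same amount.
    z = (log(q) + log(bias_factor) - mu) / tau
    return 100.0 * (1.0 - NormalDist().cdf(z))


# Hypothetical inputs: pooled HR of 1.2 and heterogeneity SD of 0.15.
for b in (1.0, 1.1, 1.2, 1.3):
    print(b, round(pct_meaningfully_strong(log(1.2), 0.15, 1.1, b), 1))
```

Increasing `bias_factor` until the returned percentage drops below a chosen cutoff gives an approximate answer to the question posed above, under the stated assumptions.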

Confounding control in meta-analyzed studies
In Flegal et al.'s 15 meta-analysis, approximately half of the studies adjusted for age, sex, and smoking; Flegal et al. 15 reported similar results when analyzing only studies that did control for these variables. GBMC 14 restricted their analysis to individual participants who were never-smokers without specific chronic diseases, controlled within each study for age and sex, and omitted the first 5 years of follow-up (when these data were available). Omitting the first 5 years of follow-up could, in principle, reduce confounding by underlying health conditions, but this method has substantial limitations, essentially because early mortality may be a weak surrogate for the presence of underlying health conditions. 16,17 In both meta-analyses, most (or all) studies did not adjust for probable confounders such as socioeconomic status, physical activity, dietary quality, and baseline body mass index (BMI).

Methods for re-analysis and primary sensitivity analyses
We conducted all data analyses using R statistical software version 4.0.2 (R Project for Statistical Computing). We conducted analyses from December 2021 to January 2022 using data sets provided by the meta-analysts at our request. All P values are 2-tailed. To conduct the sensitivity analyses, we first obtained point estimates and confidence intervals by fitting a standard random-effects meta-analysis by restricted maximum likelihood and with standard errors estimated with the Knapp-Hartung adjustment. 18 Throughout, we treated hazard ratios as approximately equal to risk ratios because the outcome was rare.
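For readers unfamiliar with this estimation approach, the following Python sketch shows one way to obtain a restricted maximum likelihood estimate of the heterogeneity variance together with a Knapp-Hartung standard error. It is an illustration only and is not the authors' R code: the function name and arguments are hypothetical, the fixed-point iteration is just one of several ways to compute the estimate, and the t critical value must be supplied by the caller because the Python standard library has no t quantile function.

```python
from math import sqrt


def reml_knapp_hartung(y, v, t_crit, tol=1e-10, max_iter=1000):
    """Random-effects pooling of log hazard ratios y with sampling variances v.

    t_crit is the caller-supplied 0.975 quantile of the t distribution with
    len(y) - 1 degrees of freedom. Returns (mu, tau2, se_kh, ci_low, ci_high),
    all on the log scale.
    """
    k = len(y)
    if k < 2 or len(v) != k:
        raise ValueError("need at least two estimates with matching variances")
    tau2 = 0.0
    for _ in range(max_iter):
        w = [1.0 / (vi + tau2) for vi in v]
        mu = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
        # Fixed-point update for the REML estimate of tau^2, truncated at 0.
        num = sum(wi ** 2 * ((yi - mu) ** 2 - vi) for wi, yi, vi in zip(w, y, v))
        new_tau2 = max(0.0, num / sum(wi ** 2 for wi in w) + 1.0 / sum(w))
        converged = abs(new_tau2 - tau2) < tol
        tau2 = new_tau2
        if converged:
            break
    # Recompute the weights and pooled mean at the final tau^2.
    w = [1.0 / (vi + tau2) for vi in v]
    mu = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    # Knapp-Hartung standard error: weighted residual variance over (k - 1).
    se_kh = sqrt(sum(wi * (yi - mu) ** 2 for wi, yi in zip(w, y)) / ((k - 1) * sum(w)))
    return mu, tau2, se_kh, mu - t_crit * se_kh, mu + t_crit * se_kh


# Hypothetical log hazard ratios and variances; 4.303 is t(0.975, df = 2).
print(reml_knapp_hartung([0.0, 0.2, 0.4], [0.01, 0.01, 0.01], t_crit=4.303))
```

The returned interval is on the log scale; exponentiating its endpoints gives limits on the hazard ratio scale.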
In both meta-analyses, some papers, consortia, or cohorts contributed multiple point estimates. In other cases, multiple cohorts were pre-aggregated into a single estimate (see Supplements of references [14][15] ). Throughout the main text and this Supplement, we use "studies" to refer to the meta-analyzed point estimates. This terminology differs from that used in the meta-analyses themselves, such that we report different numbers of "studies" (140 for Flegal et al. 15 and 186 for GBMC 14 ) than were reported in the original meta-analyses (97 and 189, respectively). Because the original analyses did not seem to account for clustering of estimates within papers, consortia, or cohorts, we similarly analyzed both datasets using a simple random-effects model that assumed independent estimates. (GBMC's 14 analysis accounted for cohorts' contributing multiple outcomes that represented different BMI contrasts, but this is distinct from clustering of estimates for a single BMI contrast within, for example, a consortium.) However, note that a best-practice meta-analysis would account for the clustering via, for example, robust estimation 19 or multilevel modeling, or a combination. 20
GBMC 14 conducted several analyses, for example by defining BMI categories at different levels of granularity. For comparability to Flegal et al.'s 15 meta-analysis using standard BMI categories, our re-analysis of GBMC's data used the standard BMI range for being overweight (i.e., 25 ≤ BMI < 30). Additionally, GBMC's 14 analyses considered dose-response across BMI categories, and so used multivariate meta-analysis and floating variance estimates to account for the multiple BMI categories contributed by each cohort. Again for comparability to Flegal et al.'s 15 meta-analysis and because we focused on only one contrast in BMI categories (i.e., being overweight vs. being normal weight), we used standard univariate meta-analysis methods and inference rather than multivariate meta-analysis. Because of these methodological differences, our point estimate and confidence interval for GBMC 14 differed, albeit negligibly, from their reported HR = 1.11 (95% CI: [1.10, 1.11]) for their analysis that used standard BMI categories.
We calculated E-values for the point estimate and confidence interval using methods and software that have been described elsewhere. 21,22 We estimated the percentage of meaningfully strong effect sizes (as defined in the main text) using nonparametric methods when considering bias of homogeneous strength across studies. 13 As described in the main text, we considered effect sizes to be meaningfully strong when they were greater than HR = 1.1 (for estimates in the apparently detrimental direction) or when they were less than HR = 0.9 (for estimates in the apparently protective direction), such that being overweight confers at least a 10% increased or decreased hazard of mortality. These choices are of course somewhat arbitrary, but analyses with other thresholds yielded similar conclusions about sensitivity to uncontrolled confounding. Suggestions for how to choose such thresholds have been discussed elsewhere. 21,23 Our sensitivity analyses considered the strength of uncontrolled confounding associations that would be required to reduce the percentage of meaningfully strong effect sizes to less than 15%, a criterion we chose based on previous recommendations. 21
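To make the E-value calculation concrete, the following Python sketch applies the standard closed-form E-value formula for a ratio estimate, E = RR + sqrt(RR × (RR − 1)), after inverting estimates below 1. It is a minimal illustration with hypothetical function names, not the software cited above. The example inputs are the hazard ratio and confidence limits that GBMC reported for standard BMI categories, treated as approximate risk ratios as described earlier.

```python
from math import sqrt


def e_value(rr):
    """E-value for a risk ratio (or an approximately equivalent hazard ratio)."""
    if rr <= 0:
        raise ValueError("ratio estimates must be positive")
    if rr < 1.0:
        rr = 1.0 / rr  # apparently protective estimates are inverted first
    return rr + sqrt(rr * (rr - 1.0))


def e_value_ci(rr, lo, hi):
    """E-value for the confidence limit closer to the null.

    Returns 1 when the interval already includes the null value of 1.
    """
    if lo <= 1.0 <= hi:
        return 1.0
    return e_value(lo if rr > 1.0 else hi)


# GBMC's reported estimate for being overweight: HR = 1.11 (95% CI 1.10, 1.11).
print(round(e_value(1.11), 2), round(e_value_ci(1.11, 1.10, 1.11), 2))
```

An E-value is the minimum strength of association, on the risk ratio scale, that uncontrolled confounders would need to have with both the exposure and the outcome to fully explain away the estimate.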

Reproducibility
All R code required to reproduce these results is publicly available (https://osf.io/b3ux8/). Data from the two meta-analyses cannot be made public at the authors' request, but they are available upon request to individuals who have secured permission from the original authors.