Figure. Correlation between the summary relative risk in each meta-analysis (random effects) and the relative risk in the largest study. Five outliers with relative risk greater than 3 in the meta-analysis and/or the largest study are not shown. Axes are in logarithmic scale.
Tzoulaki I, Siontis KC, Evangelou E, Ioannidis JPA. Bias in associations of emerging biomarkers with cardiovascular disease. JAMA Intern Med. Published online March 25, 2013. doi:10.1001/jamainternmed.2013.3018.
Table. List of All Biomarkers Examined
eFigure. Flowchart for Eligible Studies
Tzoulaki I, Siontis KC, Evangelou E, Ioannidis JPA. Bias in Associations of Emerging Biomarkers With Cardiovascular Disease. JAMA Intern Med. 2013;173(8):664–671. doi:10.1001/jamainternmed.2013.3018
Author Affiliations: Department of Hygiene and Epidemiology, University of Ioannina School of Medicine, Ioannina, Greece (Drs Tzoulaki, Siontis, and Evangelou); Department of Epidemiology and Biostatistics, Imperial College London, London, United Kingdom (Dr Tzoulaki); Mayo School of Graduate Medical Education, College of Medicine, Mayo Clinic, Rochester, Minnesota (Dr Siontis); and Stanford Prevention Research Center, Departments of Medicine and Health Research and Policy, Stanford University School of Medicine, and Department of Statistics, Stanford University School of Humanities and Sciences, Stanford, California (Dr Ioannidis).
Importance Numerous cardiovascular biomarkers are proposed as potential predictors of cardiovascular risk.
Objective To evaluate whether there is evidence for biases favoring statistically significant results and inflating associations in this literature.
Design and Setting PubMed search for meta-analyses of cardiovascular biomarkers that are not part of the Framingham Risk Score.
Main Outcome Measures We estimated summary effects and between-study heterogeneity (considered “very large” for I2 > 75%). We evaluated whether large studies had significantly more conservative results than smaller studies (small-study effects) and whether there were too many studies with statistically significant results compared with what would be expected on the basis of the findings of the largest study in each meta-analysis.
Results Of 56 eligible meta-analyses, 49 had statistically significant results. Very large heterogeneity and small-study effects were seen in 9 and 13 meta-analyses, respectively. In 29 meta-analyses (52%), there was a significant excess of studies with statistically significant results. Only 13 of the statistically significant meta-analyses had more than 1000 cases and no hints of large heterogeneity, small-study effects, or excess significance. These included the associations of glomerular filtration rate and albumin to creatinine ratio in general and high-risk populations with cardiovascular disease mortality and of non–high-density lipoprotein cholesterol, serum albumin, Chlamydia pneumoniae IgG, glycosylated hemoglobin, nonfasting insulin, apolipoprotein B/AI ratio, erythrocyte sedimentation rate, and lipoprotein-associated phospholipase mass or activity with coronary heart disease.
Conclusions and Relevance Selective reporting biases may be common in the evidence on emerging cardiovascular biomarkers. Most of the proposed associations of these biomarkers may be inflated.
Numerous cardiovascular biomarkers are proposed as potential predictors of cardiovascular risk.1-3 Despite intensive research efforts, to date, few emerging cardiovascular biomarkers have shown clear improvements in predictive discrimination, reclassification, and/or calibration, and their clinical utility remains equivocal.4-6 Methodologic limitations in this field, including poor reporting, lack of validation, and unjustified claims of improved prediction over well-established risk scores, cast doubts on the validity and magnitude of reported effect sizes.4,7-10
Numerous meta-analyses have been performed to date aiming to summarize the predictive value of individual cardiovascular biomarkers. However, to our knowledge, there has been no recent effort to summarize the evidence from these meta-analyses on their associated limitations such as publication, selective analysis, and outcome reporting biases. Here we assembled for the first time a comprehensive systematic sample of meta-analyses that examine associations of emerging biomarkers with cardiovascular outcomes. We sought to evaluate whether there is evidence for biases in this literature favoring statistically significant and/or inflated results and how many and which of these cardiovascular biomarker effects have no hints of bias.
We assembled meta-analyses that examined any emerging biomarker, defined as any biological parameter11 other than those included in the Framingham Risk Score,12 in relation to cardiovascular disease (CVD), coronary heart disease (CHD), or cardiovascular mortality. Meta-analyses were selected as a tool to provide us with systematic summary evidence on each examined biomarker. All types of biomarkers were eligible (blood, urine, tissue, imaging, or physical measurement). We excluded meta-analyses of single common genetic variants because these markers have limited prognostic ability when examined in isolation, but multigene scores were eligible.
We used 3 different approaches to collect a comprehensive sample of biomarker meta-analyses indexed in MEDLINE (with no year restriction and last update as of January 30, 2012). First, we used the algorithm “(‘Biological marker’[MeSH Terms]) AND (cardiovascular OR coronary [Title/Abstract])” limited to meta-analyses, English language, and human studies. Second, we performed targeted MEDLINE searches for meta-analyses of 71 additional specific emerging biomarkers included in recent comprehensive reviews13-15 using the same algorithm, but instead of applying the generic “Biological marker” term, we tracked the name of each biomarker (see eTable for full list of biomarkers searched; http://www.jamainternalmed.com). We perused the title and abstract of each of these citations, and potentially eligible articles were then retrieved in full text. Finally, we identified meta-analyses of individual participant data published by major consortia operating in the field (Emerging Risk Factor Collaboration, Fibrinogen Studies Collaboration, Ankle Brachial Index Collaboration, Homocysteine Studies Collaboration, and Chronic Kidney Disease Prognosis Consortium).
We included studies regardless of the baseline characteristics (clinical setting) of the examined populations. Whenever an article presented separate meta-analyses on more than 1 eligible biomarker, outcome, or type of clinical setting, these were kept separate. Meta-analyses were eligible regardless of whether the included studies used adjustment for some covariates or score (eg, the Framingham Risk Score) or unadjusted analyses. We excluded meta-analyses of randomized controlled trials assessing the change of a biomarker in relation to an outcome. When more than 1 meta-analysis examining the same biomarker and the same outcome in the same clinical setting was identified, only the most recent one with eligible data was kept.
Data extraction was performed independently by 2 investigators (K.C.S. and I.T.) and, in the case of discrepancies, consensus was reached. From each eligible meta-analysis, we recorded the first author, journal, year of publication, and number of studies in the meta-analysis, and we noted the biomarker, risk factors or score used for covariate adjustment and the outcome examined. The study-specific relative risk (RR) estimates (risk ratio, odds ratio, hazard ratio, or incident risk ratio, as reported by the meta-analysis authors) along with the corresponding 95% CIs and the number of cases in each study were extracted for each biomarker and outcome. The number of control participants in addition to the number of cases for each study were extracted when needed for the power calculation (see the Statistical Analysis subsection). When data on number of cases (or number of cases and controls when needed) were missing, meta-analyses were excluded and substituted with a previously published meta-analysis on the same biomarker, whenever available. Meta-analyses of cardiovascular outcomes typically assume that study-specific RRs (odds ratio, hazard ratio, and risk ratio) are similar and combine different RRs under this assumption. This assumption is a fair approximation whenever the disease incidence is low or modest. We, therefore, extracted the metric of the largest study (study with smallest variance) and assumed that the remaining RRs correspond to the same metric. Whenever data were provided with different adjustments, we preferred estimates that adjusted for the Framingham Risk Score or the model with Framingham Risk Score variables; if neither of these options was available, we preferred the model with the larger number of adjusting factors. When subgroups were presented, we extracted the data for each subgroup separately unless the study combined the results across all subgroups.
For each meta-analysis, we estimated the summary effect size and its 95% CI using random effects models16 and calculated the I2 metric for heterogeneity. The I2 metric ranges from 0% to 100% and is the ratio of between-study variance to the sum of the within- and between-study variances. Values exceeding 50% or 75% are considered to represent “large” or “very large” heterogeneity, respectively. The 95% CIs of I2 estimates can be wide when there are few studies.17 Furthermore, we used the regression asymmetry test proposed by Egger et al18 and examined by Sterne et al.19 P < .10 with more conservative effect in larger studies was considered evidence for small-study effects, that is, that the results of small studies differed from those of larger studies. Various biases or genuine heterogeneity may cause small-study effects.19
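The pooling step described above can be sketched as follows. This is a minimal illustration, not the authors' Stata code: it assumes the DerSimonian-Laird estimator (a common random effects choice; the text does not name the estimator) and the usual Q-based computation of I2. The function name and input format (log relative risks with their standard errors) are hypothetical.

```python
import math

def random_effects_summary(log_rrs, ses):
    """Pool study log relative risks with DerSimonian-Laird random effects (assumed estimator)."""
    k = len(log_rrs)
    w = [1.0 / se ** 2 for se in ses]                 # inverse-variance weights
    sw = sum(w)
    fixed = sum(wi * y for wi, y in zip(w, log_rrs)) / sw
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, log_rrs))  # Cochran Q
    df = k - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0   # between-study variance
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0  # heterogeneity, in percent
    w_re = [1.0 / (se ** 2 + tau2) for se in ses]     # random effects weights
    pooled = sum(wi * y for wi, y in zip(w_re, log_rrs)) / sum(w_re)
    se_pooled = math.sqrt(1.0 / sum(w_re))
    return {
        "rr": math.exp(pooled),
        "ci_low": math.exp(pooled - 1.96 * se_pooled),
        "ci_high": math.exp(pooled + 1.96 * se_pooled),
        "tau2": tau2,
        "i2": i2,
    }

# Hypothetical inputs: two equally precise studies with very different estimates.
summary = random_effects_summary([0.0, 1.0], [0.1, 0.1])
```

Under the thresholds in the text, an `i2` above 50 would be labeled large and above 75 very large heterogeneity.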
We applied the excess significance test to evaluate whether the observed number of studies (O) with statistically significant results (“positive” studies) in each meta-analysis is larger than their expected number (E).20-22 We also summed the O and E across all meta-analyses. For each meta-analysis, E is the sum of the power estimates of the studies it includes. The estimated power depends on the plausible effect size. The true effect size for any meta-analysis is unknown. Herein, we assumed that the most plausible effect is given by the largest study. Different equations were used to estimate the power when the largest study reported a hazard ratio23 or odds ratio.24 Excess significance for single meta-analyses was claimed at P < .10 (1-sided P < .05 with O > E as previously proposed22).
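The computation of O and E for one meta-analysis might look like the sketch below. It is an approximation under stated assumptions: power is computed with a generic normal approximation on the log RR scale from each study's standard error, which is not the hazard ratio and odds ratio power equations cited above, and the largest study is taken as the one with the smallest standard error. All function names are hypothetical.

```python
import math

Z_975 = 1.959964  # two-sided 5% critical value of the standard normal

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def power_two_sided(log_rr_true, se):
    """Approximate power of one study to reach P < .05 if the true effect is log_rr_true."""
    shift = abs(log_rr_true) / se
    return norm_cdf(shift - Z_975) + norm_cdf(-shift - Z_975)

def observed_and_expected(log_rrs, ses):
    """Return (O, E): observed and expected numbers of nominally significant studies."""
    largest = min(range(len(ses)), key=lambda i: ses[i])  # smallest variance
    plausible = log_rrs[largest]                          # plausible effect = largest study
    expected = sum(power_two_sided(plausible, se) for se in ses)
    observed = sum(1 for y, se in zip(log_rrs, ses) if abs(y) / se > Z_975)
    return observed, expected

# Hypothetical meta-analysis: a precise null study and two small "positive" studies.
o, e = observed_and_expected([0.0, 0.5, 0.6], [0.05, 0.2, 0.25])
```

When the largest study shows no effect, each study's power reduces to the type I error rate, so E is close to 5% of the number of studies; an O well above that is the pattern the test is designed to flag.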
Predefined subgroup analyses applied the excess significance test in subgroups of meta-analyses with or without large between-study heterogeneity, small-study effects, individual-level data, or nominally significant summary effects and by primary (general populations) vs secondary (high risk or populations with CVD) prevention and biomarker (biological fluid measurement or physical measurement/imaging).
Stata, version 10.1 (StataCorp), was used for statistical analyses. P values were 2-tailed.
Overall, 582 articles were searched and 35 articles corresponding to 56 meta-analyses were deemed eligible25-59 (eFigure). The 56 meta-analyses examined 42 unique biomarkers; the median (range) number of studies per meta-analysis was 12 (3-68) and the median (range) number of events was 2459 (34-12 785). Meta-analyses pertained to a range of biomarkers and populations examined; most data corresponded to primary prevention (general populations, 42 meta-analyses) and biomarkers measured in biological fluids (37 meta-analyses). The outcome examined was CHD in 28 meta-analyses, CVD in 21, and CVD mortality or cardiac death in 7 (Table 1). Fourteen meta-analyses used individual participant data and 42 analyzed published literature. All but 4 meta-analyses reported RR estimates adjusted for a variety of other cardiovascular risk factors (Table 1).
Overall, 49 (88%) of the eligible meta-analyses reported a nominally statistically significant summary result (Table 1). The largest study had statistically significant results in 41 meta-analyses. The Figure shows the estimates of the largest studies against the random effects meta-analysis estimates. The largest study's result was more conservative compared with the summary result in 44 meta-analyses (79%), and most of the largest studies suggested effects of small magnitude (Table 1). Only myocardial metabolic imaging had the reported RR estimate adjusted for cardiovascular risk factors exceeding 3.00 both in the meta-analysis and in the largest study. Brain natriuretic peptide in high-risk populations and coronary artery calcium also reported large effect sizes (RR > 3) in unadjusted analyses. Nonetheless, studies with adjusted analyses have also shown relatively high estimates for these biomarkers.42,43
Twenty-six meta-analyses (46%) had large heterogeneity (I2 > 50%) and 9 (16%) had very large heterogeneity (I2 > 75%). Evidence for significant small-study effects was noted in 13 meta-analyses (23%). Meta-analyses of fibrinogen, selenium, and apolipoprotein B with CHD and brain natriuretic peptide, aortic pulse wave velocity, and cystatin C with CVD showed very large heterogeneity and evidence of small-study effects. The meta-analyses of coronary artery calcium and cardiorespiratory fitness with CVD and troponin with 30-day cardiac death in patients with acute coronary syndrome showed large heterogeneity (I2 = 50%-75%) and evidence of small-study effects (Table 1).
In 29 meta-analyses (52%) there was a significant excess of observed “positive” studies compared with those expected (Table 2). Table 3 shows aggregate data from all the meta-analyses and according to different subgroups. Among 919 studies included in 56 meta-analyses, 472 (51%) had nominally statistically significant results, while the expected number was 317. The difference between the observed and expected was significant (P < .001). The excess of significant findings was documented across all the examined subgroups (Table 3).
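The aggregate comparison can be checked with a short calculation. The sketch below assumes the chi-square form of the excess significance test, A = (O − E)²/E + (O − E)²/(n − E) with 1 degree of freedom; the text does not state whether this or a binomial version was used, and the function name is hypothetical. The inputs are the totals reported above.

```python
import math

def excess_significance_chi2(observed, expected, n_studies):
    """Chi-square statistic (1 df) and P value comparing observed vs expected positive studies."""
    diff = observed - expected
    stat = diff ** 2 / expected + diff ** 2 / (n_studies - expected)
    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Totals from the text: 472 observed vs 317 expected among 919 studies.
stat, p_value = excess_significance_chi2(472, 317, 919)
```

With these totals the statistic is roughly 116, far beyond the 1-df critical value for P = .001 (about 10.8), which agrees with the reported P < .001.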
Of the 56 meta-analyses, 18 (32%) had nominally statistically significant summary associations per random effects calculations and had no evidence of small-study effects (P ≥ .10), not very large heterogeneity (I2 ≤ 75%), and no evidence for excess significance.
Overall, 13 of the 18 associations (72%) were based on cumulative evidence of more than 1000 cardiovascular events. This included 9 biomarkers with associations with CHD in general populations (non–high-density lipoprotein cholesterol, serum albumin, Chlamydia pneumoniae IgG titers, apolipoprotein B/AI ratio, glycosylated hemoglobin, lipoprotein-associated phospholipase mass and activity, erythrocyte sedimentation rate, and nonfasting insulin) and 2 biomarkers (estimated glomerular filtration rate [eGFR] and albumin to creatinine ratio) with associations with CVD mortality in high-risk and general populations. Across these 13 associations, the RR per 1-SD increase (assuming the top vs bottom tertile comparison corresponds to approximately 2 SD) had a median (range) of 1.2 (1.1-1.5), excluding the eGFR and albumin to creatinine meta-analyses. The latter meta-analyses in general and in high-risk populations reported RR using linear splines with knots at different levels of biomarkers. The RR was greater than 2 only in extreme comparisons—for example, the meta-analysis between eGFR and CVD mortality in general populations reported an RR of 2.66 comparing eGFR of 15 mL/min/1.73 m2 (0.4% of the population) vs the reference category (95 mL/min/1.73 m2); the RR in less extreme categories was more conservative (RR = 1.40 for 60 vs 95 mL/min/1.73 m2).
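The rescaling to a per-SD relative risk follows directly from the stated assumption that a top vs bottom tertile comparison spans about 2 SD: on the log scale the effect is divided by 2, so the RR is raised to the power 1/2. A minimal sketch, with a hypothetical function name and a hypothetical input value:

```python
def rr_per_sd(rr_top_vs_bottom, sd_gap=2.0):
    """Convert an RR for a top vs bottom tertile contrast to an RR per 1-SD increase.

    sd_gap is the assumed distance between the compared groups in SD units
    (about 2 for tertiles, per the assumption stated in the text).
    """
    return rr_top_vs_bottom ** (1.0 / sd_gap)

# Hypothetical example: a tertile RR of 1.44 corresponds to about 1.2 per SD.
example = rr_per_sd(1.44)
```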
A systematic evaluation of 56 meta-analyses of emerging cardiovascular biomarkers suggests that many results are prone to biases. We found strong evidence to suggest the effect of biomarkers is exaggerated because the largest studies—which one would expect to produce the most stable estimates—consistently showed smaller effects. In most meta-analyses, too many single studies had reported “positive” results compared with what would be expected on the basis of the results of the largest studies. This suggests that small studies with “negative” results remain unpublished or that their results are distorted during analysis and reporting to seem more prominent.
Bias from preferential reporting of positive findings or “positive analyses” has been postulated as a major problem in clinical investigation and biomarker research in particular.60-62 Perhaps only a fraction of potentially available data are eventually published in this field. For example, the number of studies in the meta-analyses of triglycerides, body mass index, C-reactive protein, homocysteine, and apolipoprotein B, which all examined general populations and CHD, varies extensively: 68, 39, 31, 26, and 21 studies, respectively. These biomarkers are relatively inexpensive and easy to measure routinely; thus, they should be available in most epidemiologic data sets with cardiovascular outcomes. Differences in the number of published studies may result in part from prioritization for publication of positive results. In addition, some promising biomarker results may simply reflect false-positives owing to multiple testing of many biomarkers.63,64 Selective analysis reporting bias emerges when there are many analyses that can be performed and only some of them, the ones with the “best” results, are presented.65,66 Scientists may have examined a wide range of biomarkers in the same population, different outcomes, different cut-off points, and different covariate adjustments for a single biomarker but then report only 1 or a few of these analyses.
In our evaluation, the excess of significant findings was seen across all prespecified subgroups. A cluster of meta-analyses, including those of fibrinogen, selenium, and apolipoprotein B with CHD and brain natriuretic peptide and aortic pulse wave velocity with CVD, had very large heterogeneity and small-study effects. Of note, the Egger test is particularly difficult to interpret in the presence of prominent between-study heterogeneity.19 Genuine heterogeneity of the strength of association in diverse settings and populations may be confused for selective reporting bias. However, in all of these cases there was also a marked excess of studies with positive results. Therefore, the overall picture is more consistent with bias and suggests that the claimed effect sizes are inflated. The meta-analyses themselves sometimes addressed these issues on individual biomarkers, calculated heterogeneity, assessed presence of publication bias, and presented subgroup and meta-regression analyses to identify differences between studies.
Hints of bias do not exclude the possibility that these biomarkers have some association with cardiovascular outcomes. It is difficult to differentiate whether the underlying effect is small or null and whether genuine heterogeneity exists. There are also several associations in the literature of cardiovascular biomarkers that did not show any evidence of biases. These included the association of eGFR and albumin to creatinine ratio with CVD mortality in general and high-risk populations as well as the association of non–high-density lipoprotein cholesterol, serum albumin, Chlamydia pneumoniae IgG, glycosylated hemoglobin, nonfasting insulin, erythrocyte sedimentation rate, apolipoprotein B/AI ratio, and lipoprotein-associated phospholipase with CHD. Most of these associations were weak in terms of the magnitude of the RR. Therefore, their ability to improve cardiovascular risk assessment might be limited when used in isolation.
Several caveats should be considered in interpreting our findings. First, both asymmetry and excess significance tests offer hints of bias, not proof thereof. Second, the exact estimation of excess significance is influenced by the choice of plausible effect size and the calculation of power. We selected the largest study's effect as the plausible effect, but these studies may have inherent biases themselves. Finally, we performed power calculations assuming that all studies within each meta-analysis reported the same RR metric. This is a common assumption in meta-analyses of CVD outcomes, which typically consider the different reported metrics to be similar assuming CVD incidence is less than 10%.
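To illustrate why the metrics are treated as interchangeable at low incidence, the sketch below converts an odds ratio to the corresponding risk ratio using the standard identity RR = OR / (1 − p0 + p0 × OR), where p0 is the risk in the reference group. The function name and the example values are hypothetical and are not taken from any included study.

```python
def odds_ratio_to_risk_ratio(odds_ratio, baseline_risk):
    """Risk ratio implied by an odds ratio when the reference-group risk is baseline_risk."""
    return odds_ratio / (1.0 - baseline_risk + baseline_risk * odds_ratio)

# Hypothetical example: an odds ratio of 1.5 at two baseline risks.
rr_at_2_percent = odds_ratio_to_risk_ratio(1.5, 0.02)
rr_at_10_percent = odds_ratio_to_risk_ratio(1.5, 0.10)
```

At a 10% baseline risk an odds ratio of 1.5 corresponds to a risk ratio of about 1.43, and the gap narrows further as incidence falls, which is the sense in which the approximation is fair for low or modest incidence.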
With acknowledgment of these caveats, our evaluation maps the status of the evidence on the associations of popular cardiovascular biomarkers with CVD outcomes. Similar conclusions have been drawn from investigations in other fields, such as Alzheimer disease and mental health conditions.21,22 Cardiovascular biomarkers may be as prone to bias as biomarkers in other fields. Single biomarkers with large effect sizes are probably rare. Thus, the much-awaited improvement in CVD risk prediction will require evaluation of composites of numerous biomarkers in large-scale consortia using standardized approaches to minimize biases.4
Accepted for Publication: November 5, 2012.
Published Online: March 25, 2013. doi:10.1001/jamainternmed.2013.3018
Correspondence: John P. A. Ioannidis, MD, DSc, Stanford Prevention Research Center, 1265 Welch Rd, Medical School Office Building, Room X306, Stanford, CA 94350 (firstname.lastname@example.org).
Author Contributions: Dr Ioannidis had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis. Study concept and design: Tzoulaki, Siontis, and Ioannidis. Acquisition of data: Tzoulaki, Siontis, and Evangelou. Analysis and interpretation of data: All authors. Drafting of the manuscript: Tzoulaki, Siontis, and Ioannidis. Critical revision of the manuscript for important intellectual content: Siontis and Evangelou. Statistical analysis: All authors. Administrative, technical, and material support: Siontis. Study supervision: Ioannidis.
Conflict of Interest Disclosures: None reported.