Association of Food and Nonalcoholic Beverage Marketing With Children and Adolescents’ Eating Behaviors and Health

This systematic review and meta-analysis quantifies the association of food and nonalcoholic beverage marketing with behavioral and health outcomes in children and adolescents.


Grading the certainty of evidence
The GRADE approach (Grading of Recommendations, Assessment, Development and Evaluation) was applied to assess the certainty of the available evidence for each outcome (GRADEpro software, https://gradepro.org/). GRADE focuses on the internal validity of bodies of evidence and is widely used in guideline development. Certainty of evidence can be graded as very low, low, moderate, or high. Evidence from observational studies starts as low certainty, while evidence from randomised controlled trials starts as high certainty. We considered five criteria for lowering the level of confidence: risk of bias, indirectness, imprecision, inconsistency, and likelihood of publication bias. The level of confidence could also be raised by three criteria: a large effect, a dose-response gradient, and situations where all plausible confounding would reduce a demonstrated effect or suggest a spurious effect when results show no effect (dose-response and plausible confounding were only considered where the evidence had not been downgraded on any domain); however, no upgrading occurred.
Where pooled analyses were not possible, for those outcomes we applied the constructs of GRADE in accordance with recommendations for rating the certainty of evidence in the absence of a single estimate of effect i.e., where data have been summarised narratively. 1 Where pooled analyses were undertaken for an outcome, we present the outcomes of the GRADE assessment in two formats: i) an assessment based on the single estimate of effect only and ii) this assessment and the non-pooled studies as a combined narrative summary. Results are presented in summary of findings tables, and we provide rationales for judgements in footnotes beneath the Evidence profile table.

Combining p values and vote counting by direction of effect
Combining p values can be used where studies report no, or minimal, information beyond p values and direction of effect. This approach answers the question 'is there evidence of an effect in at least one study?' but provides no information on the magnitude of effects. 2 For the product requests analysis, one effect had a significant p value where the effect favoured the control; the analysis was conducted with and without this study included.
Because a limited number of p values were available, vote counting based on direction of effect was employed for three outcomes in this review: purchasing/sales, body weight, and dental outcomes. To conduct this synthesis, we first explored multiplicity within the data (where multiple effect measures of the same outcome domain were reported). As recommended by Cochrane for the vote-counting-by-direction-of-effect synthesis method, we selected one effect measure per study per outcome to include in the synthesis. The selection was based on decision rules: (i) first identify the most relevant effect measure in relation to the aims of the review, then (ii) if effect measures are equally relevant, select the first reported effect. Only n=3 studies provided more than one effect measure for a single outcome, and in each case both effects were equally relevant to the review aims, so the first reported effect in each study was selected for synthesis.
Five categories of effect direction were used in the review:
i. Clear effect of public health harm, where the effect estimate favours the intervention and the 95% CI excludes the null;
ii. Unclear effect of potential public health harm, where the effect estimate favours the intervention but the 95% CI includes the null and is wide;
iii. No difference in effect, where the 95% CI crosses the null but is narrow;
iv. Unclear effect of potential public health benefit, where the effect estimate favours the control but the 95% CI includes the null and is wide; and
v. Clear effect of public health benefit, where the effect estimate favours the control and the 95% CI excludes the null.
If 95% CIs were not reported, the p value was used to determine whether the direction of effect was clear or unclear (or whether no difference existed, taken as p>.05), but never to determine the direction of effect itself. If no effect estimate or p value was reported, the effect was classified as 'unclear'. Author reports of effect direction (e.g., a statement that one value was significantly greater than the other) and/or statistical significance (e.g., a statement that no significant difference had been identified where no p value was reported) were used to guide decisions.
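The classification rules above can be sketched as a small decision function. This is an illustrative reconstruction, not the review authors' code: in particular, the cut-off for calling a confidence interval 'wide' (`wide_ci_threshold`) is a hypothetical parameter, since the review judged CI width qualitatively.

```python
from typing import Optional

def classify_effect(direction: str,
                    ci_lower: Optional[float],
                    ci_upper: Optional[float],
                    null: float = 0.0,
                    wide_ci_threshold: float = 1.0) -> str:
    """Assign one effect to the five direction-of-effect categories.

    `direction` is 'harm' (estimate favours the marketing exposure) or
    'benefit' (estimate favours the control). `wide_ci_threshold` is an
    illustrative assumption; the review judged CI width qualitatively.
    """
    if ci_lower is None or ci_upper is None:
        # No CI (and no usable p value): effect is classified as unclear
        return "unclear"
    excludes_null = ci_lower > null or ci_upper < null
    wide = (ci_upper - ci_lower) > wide_ci_threshold
    if excludes_null:
        return f"clear {direction}"       # categories i and v
    if wide:
        return f"unclear potential {direction}"  # categories ii and iv
    return "no difference"                # category iii

print(classify_effect("harm", 0.10, 0.45))  # → clear harm
```

In practice the `wide` judgement would be replaced by the reviewers' case-by-case assessment; the function only encodes the stated decision logic.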
We applied the binomial probability test to (i) the number of clear effects of public health harm and (ii) the number of unclear effects of potential public health harm, each compared to the number of effects clearly favouring the control, potentially favouring the control, or showing no effect. Analyses were conducted using the 'prop.test' function in R. The null hypothesis tested is that p=.5 (i.e., that effects of public health harm and other effects are equally probable). The two-sided p value, based on Pearson's chi-squared statistic, therefore reflects the difference in proportions between undesirable effects (clear or potential public health harm) and desirable effects (clear or potential public health benefit) or no effect. A significant p value can represent either a significantly smaller or a significantly larger proportion of undesirable effects compared with effects in the other categories; a non-significant p value indicates no significant difference in the proportions. Narrower CIs reflect more precise estimates of the proportion of interventions with desirable effects, driven by a larger number of studies in the analysis. We also provide a combined harvest plot for the three outcomes synthesised in this way. The harvest plot is an effective, clear, and transparent way to portray evidence from a heterogeneous evidence base, especially where primary studies are not well suited to statistical pooling. 3,4
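To make the null hypothesis concrete, the sketch below implements an exact two-sided binomial test of p=.5 using only the standard library. Note this is not the review's method: R's `prop.test` uses a chi-squared approximation, so its p values will differ slightly from the exact test shown here; the example counts (9 of 11 effects indicating harm) are hypothetical.

```python
from math import comb

def binom_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial test of H0: success probability = p.

    Sums the probabilities of all outcomes no more likely than the
    observed count k (the 'small p-value' method).
    """
    probs = [comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1)]
    p_obs = probs[k]
    # Include every outcome at most as probable as the observed one
    return min(1.0, sum(q for q in probs if q <= p_obs + 1e-12))

# Hypothetical example: 9 of 11 effects indicate public health harm
print(round(binom_two_sided_p(9, 11), 3))  # → 0.065
```

With p=.5 the test simply asks whether the split between harmful and non-harmful effects is more lopsided than chance would produce.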

Meta-analysis method
Meta-analyses were conducted for three outcomes: diet, food choice and food preference.
For the diet and food preference outcomes, effect measures were continuous, so we computed the standardised mean difference [SMD = (mean exposure − mean control) / pooled SD] and its standard error. The SMD was interpreted as follows: 0.2 indicates a small effect, 0.5 a moderate effect, and 0.8 a large effect. 5 In the current analysis, a positive SMD indicates greater consumption/preference after exposure to food marketing relative to a control condition. If the design was within subjects, the standard error formula was adjusted by the correlation between measurements, in line with Cochrane recommendations. 6 As this correlation was not readily available, we used r = .59, in line with previous research. 7 Where a standard error was reported in place of a standard deviation, we converted it using the formula SD = SE × √N.
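A minimal sketch of these computations, assuming the standard SMD standard error formula for parallel groups and the Cochrane Handbook correlation adjustment for within-subjects (cross-over) designs; the exact formula used by the review is not reproduced in the text, so this is an illustration rather than their implementation.

```python
from math import sqrt

def smd_parallel(m_exp, m_ctl, sd_pooled, n_exp, n_ctl):
    """SMD and its SE for a parallel-group (between-subjects) study."""
    d = (m_exp - m_ctl) / sd_pooled
    se = sqrt((n_exp + n_ctl) / (n_exp * n_ctl) + d**2 / (2 * (n_exp + n_ctl)))
    return d, se

def smd_within(m_exp, m_ctl, sd_pooled, n, r=0.59):
    """SMD for a within-subjects design, with the SE adjusted by the
    correlation r between paired measurements (Cochrane Handbook form:
    SE = sqrt(1/n + d^2/(2n)) * sqrt(2*(1 - r))). r = 0.59 is the value
    imputed in the review when the correlation was unavailable."""
    d = (m_exp - m_ctl) / sd_pooled
    se = sqrt(1 / n + d**2 / (2 * n)) * sqrt(2 * (1 - r))
    return d, se

def se_to_sd(se, n):
    """Convert a reported standard error to a standard deviation."""
    return se * sqrt(n)
```

With the imputed r = .59, the adjustment factor sqrt(2(1 − r)) ≈ 0.91 slightly shrinks the within-subjects standard errors relative to treating the arms as independent.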
Some studies reported binary outcomes (e.g., child consumes cariogenic foods: yes or no) but otherwise fit our inclusion criteria. To include these studies, we computed the odds ratio and 95% confidence interval and then converted the odds ratio to a standardised mean difference using the formula SMD = logOR / 1.81 from Chinn (2000), 8 with the variance of the SMD calculated as variance(logOR) × (3/π²).
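The Chinn (2000) conversion follows from the logistic distribution, whose standard deviation is π/√3 ≈ 1.81. A direct sketch of the two formulas as stated:

```python
from math import log, pi, sqrt

def or_to_smd(odds_ratio: float, var_log_or: float):
    """Convert an odds ratio to a standardised mean difference using
    Chinn (2000): SMD = ln(OR) * sqrt(3)/pi, i.e. ln(OR)/1.81, with
    variance(SMD) = variance(ln OR) * 3/pi^2."""
    smd = log(odds_ratio) * sqrt(3) / pi       # divide by pi/sqrt(3) = 1.8138
    var_smd = var_log_or * 3 / pi**2
    return smd, var_smd
```

For example, an OR of 2.0 converts to an SMD of roughly 0.38, consistent with the small-to-moderate effects reported in the review.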
For the diet outcome, several studies provided quartile estimates rather than standard deviations; to convert these, the formula (Q3 − Q1)/1.35 was used, in line with Cochrane recommendations. 6 For the preference outcome, where studies provided percentages in each group/condition, we calculated the raw numbers, and if the percentages were not granular enough, we rounded the raw numbers to the nearest whole integer. If a study generated more than one effect size (e.g., two relevant experimental conditions, or two relevant effect measures for the same outcome), we took an average effect size. However, in the case of Pettigrew et al. (2013) we computed an effect size separately for TV and digital marketing exposure by dividing the N of the control group by 2 to allow for discrete subgroups. 9 Neyens et al. (2017) included both a TV and a digital advertisement experimental condition (each compared to control); however, as the outcome was binary, we were unable to adjust the control group. The standard errors for this study will therefore be narrower, but removal of either effect size did not substantially influence the overall effect (as evidenced by the leave-one-out analysis reported in the results section). One study (Toomey et al. 2013) had a 0 cell; therefore, in line with recommendations, 9 we added 0.5 to all cells in the 2x2 table. Caution is advised as this often leads to overestimation of the variance (as can be seen in the forest plot), which reduces the weight of the study. Several studies (n=5) used a cross-over design with a binary outcome (e.g., exposure to both experimental and control stimuli). In line with previous studies, these cross-over trials were poorly reported (e.g., condition totals were reported only in aggregate, not separated into A/B design (where intervention arm A is undertaken first by participants) and B/A design (where intervention arm B is undertaken first)). 10 Therefore, cross-over odds ratios could not be calculated. 11 To obtain an effect size we treated these studies as parallel trials to compute the log odds ratio and standard error. However, we did not include them in our main analyses (including subgroups), in line with Cochrane recommendations, so results are reported separately.
For the choice outcome, because effect measures provided binary choice data, log odds ratios were calculated and used in the meta-analysis. The standard error of the log odds ratio was calculated using the formula SE = √(1/a + 1/b + 1/c + 1/d), where a, b, c, and d are the four cells of the 2x2 table. If percentage choice rather than raw scores was reported, we calculated the raw scores; if the percentage was not granular enough and did not lead to a whole integer, we rounded as appropriate. Cross-over trials were not reported in enough detail (separately for A/B and B/A designs, as described above), so we were unable to calculate the appropriate precision. 11 Therefore, we treated these trials as parallel designs but did not include them in the main analyses, instead running a supplementary analysis on these studies (see results). Log odds ratios were converted to odds ratios using the exponential function for interpretation.
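The log odds ratio, its standard SE, and the 0.5 continuity correction for zero cells (as applied to Toomey et al. 2013) can be sketched together; cell labels a-d are illustrative.

```python
from math import log, sqrt

def log_or_with_se(a: float, b: float, c: float, d: float):
    """Log odds ratio and its SE from a 2x2 table
    (a, b = chose target food / other in the exposure arm;
     c, d = chose target food / other in the control arm).
    When any cell is zero, 0.5 is added to every cell, as done in the
    review; this inflates the variance and down-weights the study.
    """
    if 0 in (a, b, c, d):
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    log_or = log((a * d) / (b * c))
    se = sqrt(1/a + 1/b + 1/c + 1/d)  # SE of the log odds ratio
    return log_or, se
```

Exponentiating `log_or` (and the CI bounds `log_or ± 1.96*se`) recovers the odds ratio scale used for interpretation.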

P curve analysis method
P curve examines the distribution of p values < .05 (based on Z scores). 12 If there were no effect of marketing on the outcome, the distribution of p values would be uniform (no curve). If there is a true effect, p values should be more frequent at p<.01 than at p~.05 (right skew). If there is evidence of selective reporting (or p-hacking), there will instead be a greater frequency of p values around p=.05 (left skew). The continuous test is reported. This computes a pp value for each significant p value from the individual effect sizes in the meta-analysis. The pp values are converted to Z scores, the sum of the Z scores is divided by the square root of the number of tests, and the resulting Z score and its corresponding p value constitute the test for evidential value. We also report visual descriptions of the p curves. 13

GOSH performs separate meta-analyses on subsets of the available data. In smaller meta-analyses this is usually all possible subsets; here we limited it to 100,000 subsets, meaning the meta-analysis was run 100,000 times using different combinations of included studies. From this we can examine the average effect size and average I² statistic across the 100,000 models, and plot them to identify any influential studies.
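The combination step of the continuous test is a Stouffer-style aggregation, which can be sketched with the standard library alone. This is only the aggregation logic, not the full p-curve procedure (which also computes the pp values themselves from each study's test statistic); the input pp values here are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def pcurve_continuous_test(pp_values):
    """Stouffer-style combination of pp values for the p-curve
    continuous test: each pp value is converted to a Z score, the
    Z scores are summed and divided by sqrt(k), and the result is
    evaluated against the standard normal. Under the null of no
    effect, pp values are uniform and Z is centred on zero; a
    strongly negative Z indicates right skew (evidential value)."""
    nd = NormalDist()
    z_scores = [nd.inv_cdf(pp) for pp in pp_values]
    z = sum(z_scores) / sqrt(len(z_scores))
    return z, nd.cdf(z)  # (combined Z, one-sided p for right skew)
```

For instance, four hypothetical pp values of .05 each yield a combined Z around -3.3, which would be read as clear right skew.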

Sensitivity analyses
Leave-one-out analyses demonstrated little variability in the effect when individual studies were removed (SMDs ranged from 0.22 to 0.26); all models were statistically significant (ps<.001). Trim and fill analyses did not identify any hypothetical studies to impute, and no studies had a DFBETA value >1. Egger's regression test was also not significant (Z=0.13, p=.89).

GOSH analysis
The average I² across the 100,000 models was 70.85%, and the average effect size was SMD=0.246. All pooled effect sizes were positive, as can be seen from the histogram (eFigure 1).

Subgroup analyses by risk of bias
There was no statistical evidence that study bias significantly moderated effect sizes in RCTs (χ²(1)=0.19, p=.66), and no statistical evidence of an association between bias scores and effect sizes in NRS (B=-.02, p=.69).

Subgroup analyses by study design
The effect size for RCTs (N=31; eFigure 2) was SMD=0.20.

P-curve analysis plot
The p-curve demonstrated clear right skew (see eFigure 9), which is indicative of evidential value (and a lack of selective reporting).

Sensitivity analyses
Leave-one-out analyses demonstrated little variability in the effect when individual studies were removed (ORs ranged from 1.67 to 1.95); all models were statistically significant (ps<.001). Trim and fill analyses did not identify any hypothetical studies to impute, and no studies had a DFBETA value >1. Egger's regression test was not significant (Z=1.54, p=.124).

GOSH analysis
The average I² across the 100,000 models was 74.06%, and the average effect size was OR=1.702. A small number of pooled effect sizes were negative, as can be seen from the histogram (eFigure 10). These were likely driven by the inclusion of one effect that was in the opposite direction from the others.

Subgroup analyses by risk of bias
There was no statistical evidence that study bias significantly moderated effect sizes in RCTs (χ²(1)=2.07, p=.15).

P-curve analysis plot
The p-curve demonstrated clear right skew (see eFigure 27), which is indicative of evidential value (and a lack of selective reporting).
eFigure 27: Distribution of significant p-values from the p-curve test on food preference data.

Explanations
1 Moderation analyses found no evidence that study bias significantly moderated the effect sizes of RCTs; the non-pooled RCT also had "some concerns" (like the majority), so it would be unlikely to affect this overall outcome.

foods and non-alcoholic beverages.
This assessment is based on a single observational study of moderate quality. The sample size was large (>12,000 participants) and effects were consistent (OR 1.5-3.2) across three marketing formats, although below the typical threshold for a 'large effect'. The data speak directly to the research question, but in the absence of further studies to corroborate this effect, certainty remains very low.
Observational: Overall, the observational evidence is very uncertain about the effect of FNAB marketing on dental health outcomes. This outcome was assessed from just two studies, of which one found a significant effect across two measures and the other found no effect. The level of certainty is affected by serious risk of bias, inconsistency, and indirectness of the intervention in providing direct evidence for the research question.