Key Points
Question
What is the effect of a multicomponent workplace wellness program on health and economic outcomes?
Findings
In this cluster randomized trial involving 32 974 employees at a large US warehouse retail company, worksites with the wellness program had an 8.3-percentage point higher rate of employees who reported engaging in regular exercise and a 13.6-percentage point higher rate of employees who reported actively managing their weight, but there were no significant differences in other self-reported health and behaviors; clinical markers of health; health care spending or utilization; or absenteeism, tenure, or job performance after 18 months.
Meaning
Employees exposed to a workplace wellness program reported significantly greater rates of some positive health behaviors compared with those who were not exposed, but there were no significant effects on clinical measures of health, health care spending and utilization, or employment outcomes after 18 months.
Importance
Employers have increasingly invested in workplace wellness programs to improve employee health and decrease health care costs. However, there is little experimental evidence on the effects of these programs.
Objective
To evaluate a multicomponent workplace wellness program resembling programs offered by US employers.
Design, Setting, and Participants
This clustered randomized trial was implemented at 160 worksites from January 2015 through June 2016. Administrative claims and employment data were gathered continuously through June 30, 2016; data from surveys and biometrics were collected from July 1, 2016, through August 31, 2016.
Interventions
There were 20 randomly selected treatment worksites (4037 employees) and 140 randomly selected control worksites (28 937 employees, including 20 primary control worksites [4106 employees]). Control worksites received no wellness programming. The program comprised 8 modules focused on nutrition, physical activity, stress reduction, and related topics implemented by registered dietitians at the treatment worksites.
Main Outcomes and Measures
Four outcome domains were assessed. Self-reported health and behaviors via surveys (29 outcomes) and clinical measures of health via screenings (10 outcomes) were compared among 20 intervention and 20 primary control sites; health care spending and utilization (38 outcomes) and employment outcomes (3 outcomes) from administrative data were compared among 20 intervention and 140 control sites.
Results
Among 32 974 employees (mean [SD] age, 38.6 [15.2] years; 15 272 [45.9%] women), the mean participation rate in surveys and screenings at intervention sites was 36.2% to 44.6% (n = 4037 employees) and at primary control sites was 34.4% to 43.0% (n = 4106 employees) (mean of 1.3 program modules completed). After 18 months, the rates for 2 self-reported outcomes were higher in the intervention group than in the control group: for engaging in regular exercise (69.8% vs 61.9%; adjusted difference, 8.3 percentage points [95% CI, 3.9-12.8]; adjusted P = .03) and for actively managing weight (69.2% vs 54.7%; adjusted difference, 13.6 percentage points [95% CI, 7.1-20.2]; adjusted P = .02). The program had no significant effects on other prespecified outcomes: 27 self-reported health outcomes and behaviors (including self-reported health, sleep quality, and food choices), 10 clinical markers of health (including cholesterol, blood pressure, and body mass index), 38 medical and pharmaceutical spending and utilization measures, and 3 employment outcomes (absenteeism, job tenure, and job performance).
Conclusions and Relevance
Among employees of a large US warehouse retail company, a workplace wellness program resulted in significantly greater rates of some positive self-reported health behaviors among those exposed compared with employees who were not exposed, but there were no significant differences in clinical measures of health, health care spending and utilization, and employment outcomes after 18 months. Although limited by incomplete data on some outcomes, these findings may temper expectations about the financial return on investment that wellness programs can deliver in the short term.
Trial Registration
ClinicalTrials.gov Identifier: NCT03167658
Workplace wellness programs have become increasingly popular as employers have aimed to lower health care costs and improve employee health and productivity. In 2018, 82% of large firms and 53% of small employers in the United States offered a wellness program, amounting to an $8 billion industry.1,2 This growth has been aided by public investments such as the Affordable Care Act, which included funds to promote the development of workplace wellness programs.
Workplace wellness programs tend to focus on modifiable risk factors of disease, such as nutrition, physical activity, and smoking cessation. Despite widespread adoption, causal evidence of such programs’ effects on health and economic outcomes has been limited. Meta-analyses have produced varying estimates of benefits relative to costs.3-5 Observational studies have often been limited by a lack of valid control groups, selection bias, and small samples.6-8 Experimental studies of comprehensive wellness programs have been scarce and have produced mixed results, with most of the more rigorous studies now dated.9,10 Other experimental studies have focused on certain components of wellness, such as smoking cessation and weight loss, using an intervention of limited duration.11-14 A recent rigorous randomized study used individual-level rather than workplace-wide randomization, making it difficult to assess the effects of the tools used by many programs aiming to improve workplace culture or harness peer effects.15
Using a design that randomized the implementation of wellness programming at the worksite level, this study evaluated the effect of a multiyear workplace wellness program on health and economic outcomes over 18 months in a middle- and lower-income employee population at locations across the eastern United States.
The research protocol was reviewed and approved by the institutional review boards at the Harvard T.H. Chan School of Public Health and Harvard Medical School. Written informed consent was obtained from all participants prior to primary data collection. All statistical analyses were prespecified in advance of making any treatment-control outcome comparisons and were publicly archived at clinicaltrials.gov and the American Economic Association Randomized Clinical Trials Registry. The protocol and analysis plan are available in Supplement 1.
A comprehensive workplace wellness program was implemented at a large warehouse retail company, BJ’s Wholesale Club, which employs approximately 26 000 workers across 201 worksites along the eastern United States (eFigure 1 in Supplement 2). The wellness program comprised 8 modules implemented over 18 months, from January 2015 through June 2016 (eTable 1 in Supplement 2). Each module lasted 4 to 8 weeks and focused on key elements of health and wellness, including nutrition, physical activity, stress reduction, and prevention. Programming content was delivered by registered dietitians assigned to the treatment worksites, using both individual and team-based activities and challenges. Modules included modest incentives for participation, most commonly a $25 BJ’s gift card for completing a particular module. Total potential incentives across the program averaged about $250 (details about the modules and incentives are provided in eMethods 1 in Supplement 2). The intervention was designed and implemented by an established wellness vendor, Wellness Workdays.
The wellness program was implemented in a randomly selected subset of worksites through simple randomization using a computer-generated random number (Figure). Worksites, rather than individuals, were randomized because wellness programs often use team-based interventions and aim to change workplace culture and environment.
Forty-one worksites were excluded because they were geographically remote or had substantially different insurance coverage, leaving 160 sites in the sample (mean of 108 employees per site). We randomly selected 20 treatment sites in which the program was available to all employees, with the remaining sites serving as controls in which there was no wellness program. Survey and clinical data were collected at the 20 treatment sites and at 20 randomly selected primary control sites. The remaining 120 sites served as secondary controls. Administrative data were collected from all 160 worksites (eFigure 2 in Supplement 2).
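The site selection described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the site identifiers 1 to 160, the seed, and the function name `assign_worksites` are hypothetical, and a single shuffle stands in for the computer-generated random numbers the study used.

```python
import random

def assign_worksites(site_ids, n_treatment=20, n_primary_control=20, seed=0):
    """Randomly split worksites into treatment, primary control, and secondary control."""
    rng = random.Random(seed)  # hypothetical seed, used only for reproducibility
    shuffled = list(site_ids)
    rng.shuffle(shuffled)
    cut = n_treatment + n_primary_control
    return {
        "treatment": shuffled[:n_treatment],
        "primary_control": shuffled[n_treatment:cut],
        "secondary_control": shuffled[cut:],
    }

# 160 eligible worksites, labeled 1..160 for illustration
arms = assign_worksites(range(1, 161))
print({arm: len(sites) for arm, sites in arms.items()})
```

Because every site is equally likely to land in each position of the shuffled list, slicing the list is equivalent to drawing the treatment sites first and the primary control sites second.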
Individuals were assigned to treatment or control status based on their worksite at the time of randomization or initial employment, as subsequent movement between worksites could, in principle, be influenced by the wellness program. Individuals in treatment worksites were eligible, but not required, to participate in the program and could exit the program at any time.16 All individuals employed in treatment and primary control sites at the 18-month mark were free, but not required, to complete the survey and clinical screening.
Prespecified outcomes were collected across 4 domains, of which 2 were gathered in person and 2 derived from administrative data (eTables 2-3 in Supplement 2). Two in-person primary data domains were collected in the 20 treatment and 20 primary control sites at the end of the study period. Self-reported health and behaviors data were collected via personal health assessment surveys and included measures such as exercise, diet, smoking, and alcohol use.17 Clinical measures of health data were obtained from clinical biometric screenings by registered nurses and included blood pressure, body mass index, blood glucose levels, and cholesterol levels. No imputation was done for any unanswered survey items or unmeasured biometrics. We assessed potential selection into in-person data collection by comparing baseline characteristics of employees who participated in surveys and biometrics to those who did not.
Administrative data, gathered for all 160 treatment and control worksites, comprised employment records and health insurance claims collected continuously over the study period. Information on health care spending and utilization was gathered for the subset of workers enrolled in employer-sponsored insurance plans through Cigna, the third-party administrator for this self-insured firm. About half of stably employed workers (defined below) and a third of workers at any time were enrolled in Cigna; there were no missing data for these workers. Employment outcomes were gathered from employment records and included absenteeism and tenure (data available for all employees), as well as available work performance evaluations for the 73% of employees during the study period who had an evaluation.
After randomization of the worksites (with the number of treatment sites limited by the study budget), we conducted initial power calculations before implementing this randomized clinical trial or collecting outcome data. Power calculations were made using data from secondary data sources, including the National Health and Nutrition Examination Survey, the Behavioral Risk Factor Surveillance System, the Medical Expenditure Panel Survey, and commercial insurance claims, to generate benchmark means and standard deviations, using standard assumptions about intracluster correlation and power. Details on these power calculations and estimated detectable differences are provided in eMethods 2 in Supplement 2.
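A common textbook way to account for intracluster correlation in calculations of this kind is the design effect, 1 + (m − 1)ρ, which inflates the variance relative to individual-level randomization. The sketch below is not the authors' calculation (that is described in eMethods 2 in Supplement 2). The intracluster correlation of 0.01 is a purely hypothetical value, and the formula assumes equal cluster sizes; only the mean of 108 employees per site and the 4037 treatment-site employees come from the text.

```python
def design_effect(mean_cluster_size, icc):
    """Variance inflation from cluster randomization, assuming equal cluster sizes."""
    return 1 + (mean_cluster_size - 1) * icc

def effective_sample_size(n_individuals, mean_cluster_size, icc):
    """Number of independent observations that n clustered individuals are worth."""
    return n_individuals / design_effect(mean_cluster_size, icc)

# 108 is the mean worksite size reported in the text; 0.01 is a hypothetical ICC.
deff = design_effect(mean_cluster_size=108, icc=0.01)
n_eff = effective_sample_size(n_individuals=4037, mean_cluster_size=108, icc=0.01)
print(f"design effect = {deff:.2f}, effective n = {n_eff:.0f}")
```

Even a small intracluster correlation roughly halves the effective sample size at this cluster size, which is why the number of randomized worksites, not only the number of employees, constrains power.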
In our primary analyses, we estimated the effect of working at a treatment worksite on outcomes, regardless of participation in the wellness program.18 For administrative outcomes, we compared all employees at treatment sites with all employees at control sites (an intention-to-treat design); for in-person primary data outcomes, we analyzed employees who were available at the 18-month mark (analogous to intention-to-treat, assessed in the available population).
We used an individual-level linear model with an indicator for employment in a treatment site as the key independent variable. This captured the effect of offering the opportunity to participate in the wellness program. As with most retailers, there was natural employee turnover during the study period. We included all individuals employed at any point during the study period. While administrative outcomes were available for all individuals, in-person primary data were gathered from participants in screenings and biometric collection at the 18-month mark. The study sample also included those who worked full time and those who worked only part time. The model weighted each individual by exposure to the program, as measured by the work schedule and share of the treatment period the individual was employed, described in eMethods 2 in Supplement 2.
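With a single treatment indicator and no other covariates, the weighted least-squares coefficient on that indicator equals the difference in weighted means between arms. The sketch below shows that simplified case only. The toy outcomes and exposure weights are invented for illustration, and the function names are hypothetical; the study's actual model also included covariates, balance weights, and standard errors clustered by worksite.

```python
def weighted_mean(values, weights):
    """Mean of values, with each value counted in proportion to its weight."""
    total_weight = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total_weight

def itt_estimate(outcomes, treated, weights):
    """Exposure-weighted difference in mean outcome, treatment minus control sites."""
    arms = {1: ([], []), 0: ([], [])}
    for y, d, w in zip(outcomes, treated, weights):
        arms[d][0].append(y)
        arms[d][1].append(w)
    return weighted_mean(*arms[1]) - weighted_mean(*arms[0])

# Hypothetical toy data: outcome (1 = reports regular exercise),
# worksite arm (1 = treatment site), and exposure weight in (0, 1].
outcomes = [1, 1, 0, 1, 0, 0]
treated = [1, 1, 1, 0, 0, 0]
weights = [1.0, 0.5, 0.5, 1.0, 1.0, 0.5]
effect = itt_estimate(outcomes, treated, weights)
print(effect)
```

Every employee at a treatment site counts as treated here regardless of whether that person completed any module, which is what makes the estimate an effect of offering the program.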
To improve the precision of estimates and balance between treatment and control groups, we controlled for age, sex, age-sex interactions, race, and initial employment characteristics (not plausibly affected by the program) and also included minimum variance weights constructed to make the distribution of age, sex, and race representative of the entire study population, a method that performs better than a model-based approach that fits a propensity score.19-21 Data on race/ethnicity, used to describe the population and compare demographics between study groups, were gathered from voluntary survey responses of study participants to multiple-choice options presented by the investigators.
Because multiple measures within an outcome domain may reflect the same fundamental outcome, we prespecified standardized treatment effects across categories of clinical measures of health, self-reported health behaviors, and mental health and well-being. The standardized treatment effect is a summary measure of closely related outcomes and denotes the mean change across all of the components in the domain, measured in units of standard deviations (that is, the size of the estimated effect for an outcome relative to standard deviation of that outcome, averaged across all of the outcomes in the domain).
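In symbols, the verbal definition above corresponds to the expression below, where K is the number of outcomes in the domain, \hat{\beta}_k is the estimated treatment effect on outcome k, and \sigma_k is the standard deviation of outcome k. The notation is introduced here for illustration; the text does not state which group's standard deviation is used, so that choice is left unspecified.

```latex
\[
\text{standardized treatment effect}
  = \frac{1}{K} \sum_{k=1}^{K} \frac{\hat{\beta}_k}{\sigma_k}
\]
```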
Because of the potential for type I error due to multiple comparisons, as prespecified, we also adjusted for multiple inference within outcome domains and reported both standard, per-comparison P values and adjusted, “family-wise” P values that accounted for multiple inference, using a conservative approach of grouping together a wide range of outcomes following the Westfall and Young method with 1000 bootstrap replications.22 Standard errors were clustered by worksite.
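To illustrate how per-comparison and family-wise P values can diverge, the sketch below applies the Holm step-down adjustment. This is a simpler stand-in and not the Westfall and Young resampling method used in the study, which additionally accounts for correlation among outcomes. The input P values and the function name are hypothetical.

```python
def holm_adjust(p_values):
    """Holm step-down family-wise adjusted P values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The smallest P value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[i])
        # Adjusted values must not decrease as the raw P values increase.
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted

# Hypothetical per-comparison P values for 3 outcomes in one domain
raw = [0.01, 0.04, 0.03]
adj = holm_adjust(raw)
print(adj)
```

An outcome that is significant at the per-comparison level can lose significance once the whole family is considered, and the larger the family, the stronger this effect.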
In addition to the effect of working in a treatment site, the effect of actually participating in the program may be of interest. Since participation was voluntary (and thus potentially related to health or health behaviors), a simple comparison of participants to nonparticipants risks producing biased estimates. We used a standard 2-stage least squares instrumental variables approach to estimate the local average treatment effect of program participation, with randomization into treatment as the instrument for participation. Our primary definition of participation was the completion of at least 1 program module, but we also tested robustness to other definitions (completion of at least 3 modules and number of modules completed).
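With a single binary instrument and no covariates, 2-stage least squares reduces to the Wald ratio: the effect of assignment on the outcome divided by the effect of assignment on participation. The sketch below shows that special case on invented toy data. The variable and function names are hypothetical, and the study's models also included covariates and weights.

```python
def mean(values):
    return sum(values) / len(values)

def wald_late(outcome, participated, assigned):
    """Wald estimator: effect of assignment on outcome / effect on participation."""
    y1 = [y for y, z in zip(outcome, assigned) if z == 1]
    y0 = [y for y, z in zip(outcome, assigned) if z == 0]
    d1 = [d for d, z in zip(participated, assigned) if z == 1]
    d0 = [d for d, z in zip(participated, assigned) if z == 0]
    reduced_form = mean(y1) - mean(y0)  # intention-to-treat effect on the outcome
    first_stage = mean(d1) - mean(d0)   # effect of assignment on participation
    if first_stage == 0:
        raise ValueError("instrument does not shift participation")
    return reduced_form / first_stage

# Hypothetical toy data: 4 employees at treatment sites, 4 at control sites
assigned = [1, 1, 1, 1, 0, 0, 0, 0]
participated = [1, 1, 0, 0, 0, 0, 0, 0]
outcome = [1, 1, 1, 0, 1, 0, 0, 0]
late = wald_late(outcome, participated, assigned)
print(late)
```

Because only some employees at treatment sites participate, the ratio scales the intention-to-treat effect up by the inverse of the participation difference, so the estimate applies to those whose participation was induced by assignment.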
We assessed the heterogeneity of program effects by age and sex among prespecified and key outcomes by testing for differences in the coefficient of interest using an interaction term between treatment status and the demographic characteristic of interest. We also conducted a number of secondary analyses. First, we estimated program effect on a prespecified cohort of stably employed workers who were employed for at least 13 consecutive weeks prior to the intervention. Second, we evaluated aggregate employment and claims outcomes at the worksite level. Third, we analyzed key outcomes using only the exposure weights. Fourth, we estimated logistic models for binary outcomes.
To assess endogenous selection into program participation, we compared the baseline characteristics of program participants to those of nonparticipants in treatment sites. To assess endogenous selection into participation in primary data collection, we compared baseline characteristics of workers who elected to provide clinical data or complete the health risk assessment to those of workers who did not, separately within the treatment group and the control group. This enabled us to assess any potential differential selection into primary data collection. Additionally, to examine differences in findings between our randomized trial approach and a standard observational design (and thereby any bias that confounding factors would have introduced into naive observational estimates), we generated estimates of program effects using ordinary least squares to compare program participants with nonparticipants (rather than using the variation generated by randomization).
Two-tailed tests were used, with a significance level of P = .05. Detailed methods are available in eMethods 3 in Supplement 2.
Population and Participation
The study population included 4037 individuals at the 20 treatment worksites, 4106 at the 20 primary control worksites, and 24 831 at the 120 secondary control worksites. Their demographic and employment characteristics, with balance weights, are shown in Table 1.
About 20% of the population was black and 18% Hispanic. Full-time workers comprised about 60% of the study population. Mean earnings for full-time salaried workers were slightly under $50 000 per year, and full-time hourly workers earned about half that amount. Population characteristics without balance weights are shown in eTable 4 in Supplement 2.
Program participation increased from 12.2% in the first module to, on average, 30.6% in the subsequent modules (eTable 5 in Supplement 2). Overall, 35.2% of individuals ever employed in treatment sites completed at least 1 module and 21.4% completed at least 3 modules (mean of 1.3 modules). Among those who completed at least 1 module, 60.9% completed at least 3 modules, with a mean of 3.7 modules completed. Participation in the personal health assessment survey and biometric screening at the 18-month mark (June 2016) was 25.8% and 25.5%, respectively, among individuals ever employed in the 20 treatment or 20 primary control worksites during the study period. Among individuals employed in June 2016, mean participation in surveys and screenings was 42.4% and 42.8%, respectively, across these 40 worksites.
Tables show effects of working at a treatment worksite and of participating in the wellness program on main outcomes (Table 2, Table 3, Table 4, Table 5). Full results across domains and for alternative populations are shown in eTables 6-10 in Supplement 2.
Self-reported Health and Behaviors
Effects on self-reported health and behaviors are shown in Table 2. The number of individuals providing these outcomes ranged between 1722 and 2022 (35.3% to 41.4% of individuals employed in June 2016). Randomization into a treatment worksite led to a higher proportion who reported engaging in regular exercise by 8.3 percentage points (95% CI, 3.9-12.8; unadjusted P < .001, adjusted P = .03) (treatment group, 69.8% vs control group, 61.9%), and who reported actively managing their weight by 13.6 percentage points (95% CI, 7.0 to 20.2; unadjusted P < .001, adjusted P = .02) (treatment group, 69.2% vs control group, 54.7%).
For some outcomes, such as smoking and alcohol use, randomization into treatment had a statistically significant effect by traditional P values, but statistical significance was not robust to multiple inference adjustment. For rates of smoking, the unadjusted treatment group mean was 17.3% and control group was 24.6% (adjusted difference, −6.9 percentage points [95% CI, −12.9 to −0.9 percentage points; unadjusted P = .03, adjusted P = .52]). For number of alcoholic drinks per week, the unadjusted treatment group mean was 4.0 and control group was 4.6 drinks (adjusted difference, −0.6 drinks [95% CI, −1.1 to 0.0 drinks; unadjusted P = .04, adjusted P = .69]). Other outcomes in this domain were not significantly affected by randomization into treatment (all P values >.05) (Table 2).
In the standardized treatment effect, health behaviors were 0.07 SD better (95% CI, 0.02 to 0.10; P = .001) in the treatment sites. There was no detectable effect on the standardized treatment effect for mental health and well-being (0.001 SD [95% CI, −0.05 to 0.05; unadjusted P = .97]). (As a single index, standardized treatment effects do not have adjusted P values.)
Clinical Measures of Health
Results for clinical measures of health are shown in Table 3. The number of individuals providing these outcomes ranged between 2082 and 2139 (42.6% to 43.8% of individuals employed in June 2016). High cholesterol levels (30.3% in the treatment group vs 29.3% in the control group), hypertension (26.5% in the treatment group vs 23.1% in the control group), and obesity (43.5% in the treatment group vs 43.0% in the control group) did not differ between groups. Randomization into a treatment worksite did not have a detectable effect on any clinical measures of health (all P values >.05) or their standardized treatment effect (−0.03 SD [95% CI, −0.09 to 0.03; unadjusted P = .37]).
Health Care Spending and Utilization
Results for health care spending and utilization are shown in Table 4. The sample size was 7631 or 23.2% of all employees during the study period, with no missing data among these individuals with employer-sponsored insurance. Medical spending averaged $3583 per employee per year in the treatment group vs $3953 in the control group. Pharmaceutical spending was a mean of $1412 per employee per year in the treatment group vs $1215 in the control group. Medical cost-sharing averaged $780 per year in the treatment group and $778 in the control group. Pharmaceutical cost-sharing was a mean of $102 per year in the treatment group and $94 in the control group. Randomization into a treatment worksite did not have a detectable effect on health care spending or utilization (all P values >.05).
Table 5 shows results for absenteeism, work performance, and job tenure, derived from the full sample of 32 974 employees (for absenteeism and tenure, where there were no missing data) and 24 054 for work performance (73% of employees had a performance review). Workers were absent (sick or personal time) for a mean of 2.5% of scheduled hours in the treatment group vs 2.6% in the control group. Employees scored better than 3 out of 5 on their job performance review 60.6% of the time in the treatment group vs 60.5% in the control group. Workers were employed for a mean of 305.9 total days during the study period in the treatment group vs 308.8 in the control group. Randomization into treatment had no effect on absenteeism, work performance, or tenure (all P values >.05) (Table 5).
Local Average Treatment Effects
For self-reported health and behaviors, participation in the wellness program (defined by participation in at least 1 module) led to a higher share who reported regular exercise (10.6-percentage-point difference; 95% CI, 5.3-16.0; unadjusted P < .001, adjusted P = .03) and a higher share who reported actively managing weight (17.2-percentage-point difference; 95% CI, 9.1-25.4; unadjusted P < .001, adjusted P = .01) compared with the control group. No other outcome in this domain was significantly affected by program participation. The standardized treatment effect showed that health behaviors were 0.09 SD better (95% CI, 0.03-0.13; unadjusted P = .001) for wellness program participants (Table 2).
Participation in the program led to no detectable effects on clinical measures of health, with a standardized treatment effect of −0.04 SD (95% CI, −0.12 to 0.04; unadjusted P = .36) (Table 3). There were also no detectable effects on health care spending or utilization or employment outcomes (all P values >.05) (Tables 4 and 5).
Program effects among prespecified and key outcomes were not significantly different between men and women (P for interaction >.05; eTable 11A in Supplement 2). However, the increase in regular exercise was driven by workers aged 40 years or older (P for interaction = .01; eTable 11B in Supplement 2).
Secondary and Sensitivity Analyses
When alternative definitions of participation were used, the effect of participation was numerically greater among participants who completed at least 3 modules than those who completed at least 1 module, although most estimates were not statistically significant (eTable 12 in Supplement 2).
Estimates using the stably employed subsample were similar to those from the full sample (eTables 5-9 in Supplement 2). Analyses of spending, utilization, and employment outcomes at the worksite level yielded results similar to those obtained at the individual level (eTables 7-9 in Supplement 2). Estimates of program effect using only exposure weights were similar to the main findings (eTable 1 in Supplement 2). For binary outcomes, estimates using logistic regressions were similar to those using linear models (eTable 14 in Supplement 2).
Selection Into Program Participation
Comparisons of preintervention characteristics between participants and nonparticipants in the treatment group provided evidence of potential selection effects. Participants were significantly more likely to be female, nonwhite, and full-time salaried workers in sales, although neither mean health care spending nor the probability of having any spending during the year before the program was significantly different between participants and nonparticipants (eTable 15 in Supplement 2). There was no evidence of differential selection into completion of surveys or biometrics between treatment and control groups on baseline covariates (eTable 16 in Supplement 2).
Moreover, an observational approach comparing workers who elected to participate with nonparticipants would have incorrectly suggested that the program had larger effects on some outcomes than the effects found using the controlled design, underscoring the importance of randomization to obtain unbiased estimates (eTable 17 in Supplement 2).
This randomized clinical trial of a multiyear, multicomponent workplace wellness program implemented in a middle- and lower-income population found that individuals in workplaces where the program was offered reported better health behaviors, including regular exercise and active weight management, but the program did not generate differences in clinical measures of health, health care spending or utilization, or employment outcomes after 18 months.
That the program affected self-reported health behaviors, but not health or economic outcomes, may be interpreted in several ways. Given that workplace wellness programs focus on changing behavior and that behavior change may precede improvements in other outcomes, these findings could be consistent with future improvements in health or reductions in spending. On the other hand, behavior change is likely easier to achieve than improvements in clinical or employment outcomes. Thus, there may remain no detectable effects on those outcomes, which would have implications for the return on investment in wellness programs.
The finding of no significant effects on clinical measures of health, health care spending, or employment outcomes is consistent with a recent trial of a wellness program implemented at the University of Illinois, which evaluated similar outcomes after 1 year.15 However, our study found a sizeable and robust improvement in some self-reported health behaviors. Moreover, we found that participants did not have lower preintervention spending than nonparticipants, although there was selection on other dimensions. Unlike the Illinois study, this intervention was implemented at the worksite level (rather than varying across individuals within the same worksite), perhaps better facilitating changes in workplace culture and providing greater social supports for behavior change. This intervention was also fielded in a different population, set of geographies, and employment setting, making it difficult to isolate the causes of any differences in findings.
These findings stand in contrast with much of the prior literature on workplace wellness programs, which tended to find positive and often large returns on investment through, for example, reductions in absenteeism and health care spending.3-9,23,24 Given that most prior studies were based on observational designs with methodological shortcomings such as potential selection bias, results based on random assignment of the intervention are likely more reliable.
This study has several limitations. First, although this population was diverse, results may not generalize to other workplace settings or populations. Second, the ability to detect treatment effects was limited by statistical power, despite prespecified strategies to maximize power. This challenge was compounded by our very conservative approach to multiple-inference adjustment, which grouped a wide array of outcomes (rather than narrowly construing related outcomes). Power was further limited by employee turnover that restricted the workers present to participate in end-of-study primary data collection, although the mean duration of employment was similar among the 3 groups of the trial (Figure), suggesting that entry into and exit from the sample were due to natural exogenous employment turnover, not the wellness program.
Third, not all employees contributed data for every outcome. Survey and biometric data were available only for individuals employed at the 18-month mark who chose to participate in primary data collection. However, there was no evidence of differential selection into completing the survey and screening. Claims data were available only for employees with Cigna coverage, although no data were missing in this sample. All individuals contributed employment outcomes, except performance reviews, which represented 74% and 73% of employees in the treatment and control groups, respectively. Overall, all available data on employees were analyzed; rates of missing data were similar between groups and may thus have affected the precision of estimates but do not seem to have adversely affected the validity of the findings.
Fourth, this study could neither disentangle the effects of particular elements of the wellness program nor assess the effects of a differently configured wellness program. Rather, it evaluated the program as a package, with implementation that varied only idiosyncratically in small ways across worksites. Such design features are in fact common in most wellness programs.3-6
Among employees of a large US warehouse retail company, a workplace wellness program resulted in significantly greater rates of some positive self-reported health behaviors among those exposed compared with employees who were not exposed, but there were no significant differences in clinical measures of health, health care spending and utilization, and employment outcomes after 18 months. Although limited by incomplete data on some outcomes, these findings may temper expectations about the financial return on investment that wellness programs can deliver in the short term.
Corresponding Author: Zirui Song, MD, PhD, Department of Health Care Policy, Harvard Medical School, 180A Longwood Ave, Boston, MA 02115 (song@hcp.med.harvard.edu).
Accepted for Publication: March 6, 2019.
Correction: This article was corrected on April 16, 2019, for data errors in the Abstract and Figure and for omissions to the Additional Contributions section.
Author Contributions: Drs Song and Baicker had full access to all the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis.
Concept and design, acquisition, analysis, or interpretation of data, drafting of the manuscript, critical revision of the manuscript for important intellectual content, statistical analysis, obtained funding, administrative, technical, or material support, and supervision: Both authors.
Conflict of Interest Disclosures: Dr Song reported no disclosures. Dr Baicker reported receiving personal fees from Eli Lilly outside the submitted work and reported serving on the board of directors of Eli Lilly.
Funding/Support: This work was supported by the National Institute on Aging (R01 AG050329; P30 AG012810 through the National Bureau of Economic Research), Robert Wood Johnson Foundation (grant 72611), and Abdul Latif Jameel Poverty Action Lab North America. BJ’s Wholesale Club provided in-kind logistical and personnel support for the fielding of the wellness program.
Role of the Funder/Sponsor: The funders had no role in the design or conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; or decision to submit the manuscript for publication.
Additional Contributions: We thank José Zubizarreta, PhD, Harvard Medical School, for his statistical guidance and contributions to the sample weights, without financial compensation. We thank Sherri Rose, PhD, Harvard Medical School, for her statistical guidance on randomization, without financial compensation. We thank David Molitor, PhD, and Julian Reif, PhD, University of Illinois at Urbana-Champaign, for guidance on the statistical software for multiple inference adjustment they created in the University of Illinois wellness study, which was used in this study, without financial compensation. We thank Ozlem Blakeley, MA, an employee of Harvard Medical School, and Kathryn Clark, BA, BS, and Bethany Maylone, MEd, employees of the Harvard T.H. Chan School of Public Health, for research assistance and project management. We also thank Josephine Fisher, BA, Jack Huang, AB, Harlan Pittell, BS, and Artemis (Yuanxiaoyue) Yang, BA, employees of the Harvard T.H. Chan School of Public Health, for research assistance. We thank Luke Sonnet, BS, University of California, Los Angeles, for replicating the study results through the Abdul Latif Jameel Poverty Action Lab’s Research Transparency and Reproducibility Initiative, without financial compensation. We thank the study partners, BJ’s Wholesale Club, and Wellness Workdays for collaboration and assistance in the design and fielding of the workplace wellness program. We thank seminar participants at the 7th Conference of the American Society of Health Economists, the Department of Nutrition at the Harvard T.H. Chan School of Public Health, and the Department of Health Care Policy at Harvard Medical School for comments and suggestions, without financial compensation.
Data Sharing Statement: See Supplement 3.
8. Pelletier KR. A review and analysis of the clinical and cost-effectiveness studies of comprehensive health promotion and disease management programs at the worksite: update VIII 2008 to 2010. J Occup Environ Med. 2011;53(11):1310-1331. doi:10.1097/JOM.0b013e3182337748
9. Fries JF, Harrington H, Edwards R, Kent LA, Richardson N. Randomized controlled trial of cost reductions from a health education program: the California Public Employees’ Retirement System (PERS) study. Am J Health Promot. 1994;8(3):216-223. doi:10.4278/0890-1171-8.3.216
14. Cahill K, Hartmann-Boyce J, Perera R. Incentives for smoking cessation. Cochrane Database Syst Rev. 2015;(5):CD004307.
15. Jones D, Molitor D, Reif J. What do workplace wellness programs do? evidence from the Illinois Workplace Wellness Study. NBER Working Paper Series. 2018; 24229.
17. Ware JE, Kosinski M, Dewey JE, Gandek B. How to score and interpret single-item health status measures: a manual for users of the SF-8 Health Survey. QualityMetric Inc. 2001;15(10):5.
22. Westfall PH, Young SS. Resampling-Based Multiple Testing: Examples and Methods for P Value Adjustment. New York, NY: Wiley & Sons; 1993.