Association Between Switching to a High-Deductible Health Plan and Major Cardiovascular Outcomes

Key Points
Question: Are high-deductible health plans associated with an increased risk of major cardiovascular events?
Finding: This cohort study included 156 962 individuals with cardiovascular disease risk factors who experienced mandated enrollment in health insurance plans with high deductibles but relatively low medication costs, a common value-based feature. Members with high-deductible health plans did not have detectable increases in major adverse cardiovascular events compared with 1 467 758 members with low-deductible health plans.
Meaning: Among patients with cardiovascular disease risk factors in this study, enrollment in typical high-deductible health plans was not associated with increased risk of major adverse cardiovascular events during 4 follow-up years.


A. Study Group Construction and Individual-level Deductible Imputation Algorithm
To determine employer deductible levels, we used a benefits type variable that we had for most smaller employers (with approximately 100 or fewer employees). For larger employers, we took advantage of the fact that health insurance claims data are the most accurate source for assessing out-of-pocket obligations among patients who utilize health services. Our claims data contained an in-network/out-of-network individual deductible payment field. For patients who use expensive or frequent services, the sum of their yearly deductible payments adds up to clearly identifiable exact amounts such as $500.00, $1000.00, $2000.00, etc. When even several members have these same amounts, it provides strong evidence that the employer offered such an annual deductible level. It is also possible to detect employers that offer choices of deductible levels when multiple employees have deductibles at two or more levels, such as 20 employees with an exact annual amount of $1000.00 and 12 employees with $500.00. For employer accounts with at least 10 enrollees, we therefore summed each member's in-network (individual-level) deductible payments and number of claims over the enrollment year and assessed other key characteristics such as percentage with Health Savings Accounts. We randomly selected half of the employer account data set that contained both our calculated employer characteristics (independent variables, below) and actual annual deductible levels from the benefits table (dependent variable, after categorization; below). 
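The detection step described above can be illustrated with a minimal Python sketch. It is not the authors' code: the record layout, the toy claims, and the `step` of $50 used to define a "round" annual total are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from decimal import Decimal

# Hypothetical claim records: (employer_id, member_id, in-network deductible paid).
claims = [
    ("E1", "m1", "400.00"), ("E1", "m1", "600.00"),   # m1 sums to 1000.00
    ("E1", "m2", "1000.00"),                          # m2 sums to 1000.00
    ("E1", "m3", "212.35"), ("E1", "m3", "287.65"),   # m3 sums to 500.00
    ("E1", "m4", "73.10"),                            # m4 never reaches a round total
]

def annual_totals(records):
    """Sum deductible payments per (employer, member) over the enrollment year."""
    totals = defaultdict(Decimal)
    for employer, member, paid in records:
        totals[(employer, member)] += Decimal(paid)
    return totals

def round_amount_counts(totals, step=Decimal("50")):
    """Count members per employer whose annual total is a positive multiple of step.

    The step size is an assumption; the text only gives examples such as
    $500.00, $1000.00, and $2000.00.
    """
    counts = defaultdict(Counter)
    for (employer, _member), total in totals.items():
        if total > 0 and total % step == 0:
            counts[employer][total] += 1
    return counts

counts = round_amount_counts(annual_totals(claims))
print(counts["E1"].most_common())
```

In this toy account, two members land on exactly $1000.00 and one on $500.00, the kind of pattern the text treats as evidence of the deductible levels an employer offered.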
We then used a multinomial logistic model that predicted the 4-level outcome of individual-level deductible (≤$500, $500-$999, $1000-$2499, or ≥$2500; again, the dependent variable) based on multiple aggregate employer characteristics (independent variables), such as the percentage of enrollees with Health Savings Accounts and Health Reimbursement Arrangements; the 75th percentile of deductible payments per employer; the percentage of employees reaching exact deductible levels, or with deductible payments but not reaching an exact deductible level; the employer account size; and the percentage of enrollees per account with summed whole-dollar annual deductible amounts (from claims data) of $0 to <$100, ≥$100 to ≤$500, >$500 to <$1000, ≥$1000 to <$2500, and ≥$2500.
The statistical model was as follows:

$$\operatorname{logit}\bigl(\Pr(Y_i)\bigr) = \beta_0 + \sum_{k} \beta_k X_{ki}$$

Where:
$Y_i$ = dependent variable (4-level deductible category)
$X_{ki}$ = $k$th characteristic for the $i$th employer
$\beta_0$ = intercept
$\beta_k$ = coefficient for the $k$th characteristic

The SAS code we used to implement this model was:

    proc logistic data=csn_impute_PLUS_to_be_imputed descending;
      class d_wusd1perc_0_100_cat d_wusd1perc_100_500_cat d_wusd1perc_500_1000_cat d_wusd1perc_1000_2500_cat d_wusd1perc_ge2500_cat
            d_wusd2perc_0_100_cat d_wusd2perc_100_500_cat d_wusd2perc_500_1000_cat d_wusd2perc_1000_2500_cat d_wusd2perc_ge2500_cat
            d_wusd3perc_0_100_cat d_wusd3perc_100_500_cat d_wusd3perc_500_1000_cat d_wusd3perc_1000_2500_cat d_wusd3perc_ge2500_cat
            d_wusd4perc_0_100_cat d_wusd4perc_100_500_cat d_wusd4perc_500_1000_cat d_wusd4perc_1000_2500_cat d_wusd4perc_ge2500_cat;
      model real_dduct_cat = pyr sampletot hsa_cnt_over_total cdhp_cnt_over_total perc_grp2 perc_grp3 perc_grp4 perc_grp5
            d_wusd1perc_0_100_cat d_wusd1perc_100_500_cat d_wusd1perc_500_1000_cat d_wusd1perc_1000_2500_cat d_wusd1perc_ge2500_cat
            d_wusd2perc_0_100_cat d_wusd2perc_100_500_cat d_wusd2perc_500_1000_cat d_wusd2perc_1000_2500_cat d_wusd2perc_ge2500_cat
            d_wusd3perc_0_100_cat d_wusd3perc_100_500_cat d_wusd3perc_500_1000_cat d_wusd3perc_1000_2500_cat d_wusd3perc_ge2500_cat

• csn_impute_PLUS_to_be_imputed = name of dataset that contains, at the employer account and benefit year level, accounts with missing deductible levels as well as a random half of the accounts that have actual deductible levels. The other random half is also present in the dataset but with actual deductible levels "hidden" so that they can later be used to validate the predictive algorithm.
• real_dduct_cat = dependent variable; category of actual deductible level from the gold standard source (≤$500, $500-$999, $1000-$2499, ≥$2500).
• pyr = benefit year of account's information and tied to the calendar year. An employer could have multiple benefit years represented in separate records per account-benefit year.
• sampletot = total enrollees per account during the benefit year.
• hsa_cnt_over_total = percent of members per account listed as having a health savings account.
• cdhp_cnt_over_total = percent of members per account listed as having a health savings account or health reimbursement arrangement.
• perc_grp1 = percentage of enrollees per employer-year who have claims but $0 deductible amounts for all annual claims.
• perc_grp2 = percentage of enrollees per employer-year who have reached their annual deductible, evidenced by the sum of their deductible payments ending in $*0.00. Members must have at least one month after the month of the $*0.00 summation where the deductible field is blank, and all subsequent months must have blank deductible fields, indicating that the member reached his or her annual deductible amount.
• perc_grp3 = percentage of enrollees per employer-year who have an annual deductible amount that does not end in $*0.00.
• perc_grp4 = percentage of enrollees per employer-year who have enrollment during the benefit year where all months show no evidence of utilization (no health insurance claims).
• perc_grp5 = percentage of enrollees per employer-year who might have reached their deductible, as evidenced by having the last month of enrollment of the benefit year with a summed annual deductible amount that ends in $*0.00.
• d_wusd1perc_0_100_cat, d_wusd1perc_100_500_cat, d_wusd1perc_500_1000_cat, d_wusd1perc_1000_2500_cat, d_wusd1perc_ge2500_cat = category of percentage of enrollees with an employer's most common whole number annual individual deductible payment total (e.g., dollar amount ending in 0.00) per employee that is $0 to <$100, ≥$100 to ≤$500, >$500 to <$1000, ≥$1000 to <$2500, and ≥$2500, respectively.
• d_wusd2perc_0_100_cat, d_wusd2perc_100_500_cat, d_wusd2perc_500_1000_cat, d_wusd2perc_1000_2500_cat, d_wusd2perc_ge2500_cat = category of percentage of enrollees with an employer's second most common whole number annual individual deductible payment total (e.g., dollar amount ending in 0.00) per employee that is $0 to <$100, ≥$100 to ≤$500, >$500 to <$1000, ≥$1000 to <$2500, and ≥$2500, respectively.
• d_wusd3perc_0_100_cat, d_wusd3perc_100_500_cat, d_wusd3perc_500_1000_cat, d_wusd3perc_1000_2500_cat, d_wusd3perc_ge2500_cat = category of percentage of enrollees with an employer's third most common whole number annual individual deductible payment total (e.g., dollar amount ending in 0.00) per employee that is $0 to <$100, ≥$100 to ≤$500, >$500 to <$1000, ≥$1000 to <$2500, and ≥$2500, respectively.
• d_wusd4perc_0_100_cat, d_wusd4perc_100_500_cat, d_wusd4perc_500_1000_cat, d_wusd4perc_1000_2500_cat, d_wusd4perc_ge2500_cat = category of percentage of enrollees with an employer's fourth most common whole number annual individual deductible payment total (e.g., dollar amount ending in 0.00) per employee that is $0 to <$100, ≥$100 to ≤$500, >$500 to <$1000, ≥$1000 to <$2500, and ≥$2500, respectively.
This predictive model output the probability that employers had deductibles in each of the four categories (summing to 1.0), and we assigned the employer to the level that had the highest probability. We overwrote this assignment with the most common whole number deductible amount per year if it was not zero, and with the second most common whole number deductible amount if the most common amount was zero and at least 10 members had the value of the second most common whole number deductible amount. If an employer had members with both enrollment and evidence of utilization, but never had any amounts in the deductible field, we assigned that employer to the <$500 deductible level. If an employer had only members that reached a whole number annual deductible amount such as $1000.00 or $2000.00, we assigned the most common deductible amount as the employer's deductible if that amount was greater than or equal to $1000 and to the 95th percentile value if that number was less than $1000. If at least 99% of employees had Health Savings Accounts or Health Reimbursement Arrangements, we also overwrote any previous assignment to classify the employer as a high-deductible employer. We assigned employers to have a choice between deductible levels of $1000 to $2499 and ≥$2500 when both were common and one accounted for at least 85% of $1000-$2499 or ≥$2500 deductible levels reached per employer. If we detected employers that had sufficient enrollees with whole number deductible levels both above and below $1000 (e.g., $250.00 and $1500.00), we assigned the employers' category as "choice," applying a similar 85% rule. Finally, for any employer that had gold standard deductible level information in our benefits file, we overwrote any previous imputed deductible level with the gold standard value.
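As a rough illustration of the first steps of this assignment logic, the Python sketch below picks the most probable category and then applies only the first two override rules (most common and second most common whole number amounts). The function names, category labels, and inputs are hypothetical, and the later rules (Health Savings Account share, "choice" employers, gold standard overwrite) are deliberately left out.

```python
CATEGORIES = ["<=500", "500-999", "1000-2499", ">=2500"]

def categorize(amount):
    """Map a dollar deductible to the four-level category used in the text."""
    if amount <= 500:
        return CATEGORIES[0]
    if amount < 1000:
        return CATEGORIES[1]
    if amount < 2500:
        return CATEGORIES[2]
    return CATEGORIES[3]

def assign_level(probs, most_common=0, second_common=0, second_count=0):
    """Assign an employer-year to a deductible category (simplified).

    probs         : four predicted probabilities, one per category
    most_common   : most common whole number annual deductible total (0 if none)
    second_common : second most common whole number total
    second_count  : number of members observed at second_common
    """
    # Step 1: start from the category with the highest predicted probability.
    best = max(range(len(CATEGORIES)), key=lambda i: probs[i])
    level = CATEGORIES[best]
    # Step 2: override with the most common whole number amount if non-zero.
    if most_common != 0:
        level = categorize(most_common)
    # Step 3: otherwise fall back to the second most common amount,
    # but only when at least 10 members share it.
    elif second_count >= 10:
        level = categorize(second_common)
    return level

print(assign_level([0.1, 0.2, 0.6, 0.1]))
print(assign_level([0.7, 0.1, 0.1, 0.1], most_common=2500))
print(assign_level([0.7, 0.1, 0.1, 0.1], second_common=1500, second_count=12))
```

The ordering matters: claims-based evidence of an exact deductible amount takes precedence over the model's prediction, which is used on its own only when no such evidence exists.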
Our file that contains actual deductible amounts per employer covers the "small employer" segment of the insurer's business, a segment that generally includes employers with fewer than 100 or so enrollees. However, it does include a modest number of employers with more than 100 enrollees, even up to approximately 1000 enrollees. The histograms below, where the x-axis represents employer size and the y-axis shows the percentage of employers that are that size, demonstrate the distribution of employer sizes. The second plot "magnifies" the y-axis to demonstrate the smaller number of large employers.
To demonstrate the robustness of our imputation algorithm, and its predictive value as employer size increases (given that we do not have benefits information on most large employers), we took advantage of the fact that although this file mostly covers employers with 100 enrollees or fewer, there is some overlap with larger employers (i.e., those with ~100 to 1000 enrollees). A random half of our imputation sample had the actual deductible levels of employers of all sizes "hidden" from the imputation. Thus, this random half included a modest number of employers with 75 to 1000 enrollees. We tested the sensitivity and specificity of the imputation in this overlap zone, categorizing employer sizes as 75-100, 101-400, 401-700, and 701-1000 enrollees (Exhibit 1). At employers with 75-100 enrollees, we found sensitivity of 95.4% and specificity of 98.3% (Exhibit 1a). Sensitivity and specificity increased across employer size to 100%, and Exhibits 1b-1d display these for employers of sizes 101-400, 401-700, and 701-1000.
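For readers less familiar with these metrics, the following Python sketch shows how sensitivity and specificity could be computed from a validation sample in which actual deductible levels were hidden from the imputation. It assumes a binary classification (high-deductible versus not), which the text does not state explicitly, and the data are invented.

```python
def sensitivity_specificity(actual, imputed):
    """Return (sensitivity, specificity) for paired boolean classifications.

    actual  : True where the gold standard says the employer is high-deductible
    imputed : True where the imputation algorithm says high-deductible
    """
    pairs = list(zip(actual, imputed))
    tp = sum(1 for a, p in pairs if a and p)          # true positives
    fn = sum(1 for a, p in pairs if a and not p)      # false negatives
    tn = sum(1 for a, p in pairs if not a and not p)  # true negatives
    fp = sum(1 for a, p in pairs if not a and p)      # false positives
    return tp / (tp + fn), tn / (tn + fp)

# Invented validation sample of 10 employer-years (not study data).
actual = [True, True, True, True, False, False, False, False, False, False]
imputed = [True, True, True, False, False, False, False, False, False, True]

sens, spec = sensitivity_specificity(actual, imputed)
print(f"sensitivity={sens:.3f}, specificity={spec:.3f}")
```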
We used an employer ID and an algorithm that determined linked employer subaccounts to identify an employer's subaccounts per benefit year, and removed benefit years when employers offered both low and high deductible levels.

Rationale for High-Deductible Cutoffs:
When Health Savings Account-eligible high-deductible health plans came to market in 2005-2006, the Internal Revenue Service set the minimum deductible level for qualifying high-deductible health plans at $1050 (which could be adjusted upward for inflation annually). The range of this minimum deductible during our study period was $1050-$1250. For these reasons, we defined high-deductible health plans as annual individual deductibles of at least $1000 (otherwise some health savings account plans would be excluded). In addition, choosing this cutoff (as opposed to, e.g., $2000) improves the sensitivity and specificity of the imputation because this is a common deductible level and more enrollees per employer meet this threshold. This cutoff is also a "real-world" deductible minimum that allows the most generalizable results. It should also be noted that $1000 was the minimum annual deductible level we included and not the mean deductible level. We cannot precisely calculate the mean deductible level of the high-deductible health plan group, but we estimate, using the most common non-zero deductible levels per employer account, an approximate mean deductible of $1900. We defined traditional plans as having deductible levels of ≤$500 after determining that a threshold of ≤$250 would lead to an inadequate sample size for the control group. Again, the mean deductible level of the control group members would be lower than $500.
Our high-deductible health plan group comprised the enrollment years of employers that had a year-on-year transition from low- to high-deductible coverage (i.e., from $500 or less to $1000 or more). Some members had multiple eligible index dates (e.g., multiple low-to-low deductible years or both low-to-low and low-to-high deductible years). In the cases of members with both low-to-low and low-to-high deductible years, we randomly assigned enrollees to the high-deductible health plan or control pool. Then, for members assigned to the control pool who had multiple low-to-low deductible spans, we randomly selected one of their potential index dates (and their corresponding before-after enrollment years).
There were 1,799,404 members with 3,908,743 unique potential patient index date combinations within the overall cohort. Among them, 1,600,531 members had only low-to-low deductible years, 117,217 members had only low-to-high deductible enrollment, and 81,656 members had both low-to-low and low-to-high deductible enrollment. Among those 81,656 members, we first randomly assigned members with a statistical "coin flip" to a study group, resulting in 40,962 members for whom we randomly assigned their low-to-high deductible plan switch date as their index date. A very small number of these members had more than one low-to-high deductible plan switch date, so we also randomly chose one of those. For the members remaining from the initial 81,656 sample, we randomly assigned one of their low-to-low deductible plan switches as their index date. Next, among the 1,600,531 members with multiple potential low-to-low deductible index dates, we randomly selected one of the index dates. The final cohort included 158,179 members who underwent a switch from a low-deductible health plan to a high-deductible health plan and 1,641,225 members who remained in a low-deductible health plan throughout the baseline and follow-up periods.
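The cohort counts in this paragraph reconcile arithmetically, as the short Python check below shows. The variable names are ours; the numbers are taken directly from the text.

```python
# Counts reported in the text.
only_low_to_low = 1_600_531     # members with only low-to-low deductible years
only_low_to_high = 117_217      # members with only low-to-high enrollment
both = 81_656                   # members with both kinds of years
both_assigned_hdhp = 40_962     # of those, randomly assigned to the HDHP group

# Overall cohort: the three mutually exclusive groups.
total = only_low_to_low + only_low_to_high + both

# High-deductible group: low-to-high only, plus the coin-flip assignees.
hdhp_group = only_low_to_high + both_assigned_hdhp

# Control group: low-to-low only, plus the remaining "both" members.
control_group = only_low_to_low + (both - both_assigned_hdhp)

print(total, hdhp_group, control_group)  # 1799404 158179 1641225
```

The coin flip placed 40,962 of 81,656 members (about 50.2%) in the high-deductible group, consistent with equal-probability assignment.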
Population. Our study population included people with risk factors for cardiovascular disease at baseline so we could observe the occurrence of major adverse cardiovascular events at baseline and over 4 follow-up years. These individuals were commercially insured members in a large national health insurance database enrolled between 1/1/2003-12/31/2014. This database includes approximately 48 million members with commercial insurance along with their enrollment information and all medical, pharmacy, and hospitalization claims. We included only members with employer-sponsored insurance.
We considered an insurance plan to have a low deductible if the annual deductible was ≤$500, and we considered a plan to have a high deductible if the annual amount was ≥$1000. For smaller employers, we determined the deductible amount from a benefits table obtained from the health insurer. This benefits table mostly included employers with fewer than 100 employees and dependents, but also included a modest number of larger employers (see above). For employers not represented in the insurance benefits table (mostly large employers), we imputed deductible amounts from actual out-of-pocket spending by individuals who utilized health services, using an algorithm with a sensitivity of 98-100% and specificity of 95-100% depending on employer size (eTable 1).
The individuals in this study were not able to choose a low-deductible level versus a high-deductible level, because each employer provided only one level of deductible each year. Some employers offered a low-deductible plan throughout the study, and others that had offered a low-deductible plan during an earlier time switched to a high-deductible plan for the remainder of the study.
We defined the index date for employers who switched to high-deductible plans as the beginning of the month when the switch occurred. We defined the index date for employers who did not switch plans as the beginning of the month when their yearly account was renewed. If an employer had multiple potential index dates, e.g., 5 continuous years with a low-deductible plan or 4 years in a low-deductible plan followed by 1 year in a high-deductible plan, we randomly selected 1 index date. People entered the study at different times because their employers had different index dates. For all individuals in the study, the beginning of study time (time zero) was 12 months before their employers' index date and we defined this 12-month period as the baseline year (eFigure 1). The employer's index date was the beginning of the follow-up period. For each individual, we measured months from the beginning of the baseline period to the first outcome in the baseline period, and we measured months from the index date to the first outcome in the follow-up period.
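The timing definitions above can be made concrete with a small Python sketch. The dates are invented, and counting whole calendar months is an assumption; the text does not specify how partial months were handled.

```python
from datetime import date

def add_months(first_of_month, n):
    """Shift a first-of-month date by n months (n may be negative)."""
    k = first_of_month.year * 12 + (first_of_month.month - 1) + n
    return date(k // 12, k % 12 + 1, 1)

def months_between(start, end):
    """Whole calendar months from start to end (assumed counting rule)."""
    return (end.year - start.year) * 12 + (end.month - start.month)

# Hypothetical employer index date: first day of the switch or renewal month.
index_date = date(2008, 7, 1)

# Time zero is 12 months before the index date (start of the baseline year).
time_zero = add_months(index_date, -12)

# Hypothetical first events in each period for one member.
baseline_event = date(2007, 11, 15)
followup_event = date(2010, 2, 3)

baseline_months = months_between(time_zero, baseline_event)
followup_months = months_between(index_date, followup_event)
print(time_zero, baseline_months, followup_months)  # 2007-07-01 4 19
```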
Individuals were eligible for the study if their employers were present in the database for at least 1 year before and 1 month after the index date. We then restricted to people aged 40-64 because cardiovascular complications are rare before age 40. We further restricted to people who were continuously enrolled for at least 1 year before and at least 1 month after the index date. Among individuals eligible for both the high-deductible and control groups, we randomly assigned individuals to one or the other. If they then also had multiple potential index dates, we randomly selected one index date.
Our pre-match sample included 158,179 individuals with cardiovascular risk factors whose employers switched to high-deductible plans and 1,641,225 with the same risk factors whose employers kept low-deductible plans (Table 1).
Both high- and low-deductible plans generally cover medications, secondary preventive screening, and preventive primary care visits at low or no out-of-pocket cost. However, high-deductible health plan members on average must pay substantially higher amounts than low-deductible members for specialist care, diagnostic tests, and surgical procedures. 1

B. Details and Justification of Study Design
Our study design and matching sought to approximate fully blocked randomization, i.e., stratification by key population characteristics then randomization within those strata. eFigure 1 demonstrates our before-after comparison between matched groups study design. In constructing our design and matching approach, we were influenced by the work of Thomas Cook and colleagues 2,3 and Gary King and colleagues 4-6 regarding optimal matching approaches for before-after with control group studies. Four key elements of our study design attempt to reduce selection bias: (1) studying only employers that mandate a given deductible level, (2) balancing key employer characteristics given that employers self-select into insurance types, (3) balancing key person-level characteristics given that these closely influence outcome measures, and (4) attempting to balance unmeasured characteristics using methods validated by Thomas Cook and colleagues that we describe below. 2,3
(1) The primary design feature we use to reduce bias is to isolate our analyses to employers that mandate a given deductible level so that enrollees at those employers experience an exogenous assignment to deductible level. To the extent that this exogenous assignment has no influence on the likelihood of outcomes except through the intervention, our design can approximate a person-level randomization of patients to study groups.
(2) However, we recognize the importance of employer-level factors given that employers "self-select" into insurance types. If such selection ultimately correlates with unmeasured patient-level factors that influence our outcome measures, an effect that we believe is likely to be minimal, reliance on the employer-mandated design feature alone could lead to biased results. Thus, we place a high priority in our matching approach on enforcing balance between similar employer types. We accomplish this by matching on employer propensity score tertile of propensity to switch to high-deductible health plans. We created tertiles of propensity score (rather than, for example, quintiles or a binary measure) on the general principle that we wanted enough categories to ensure that employers would match to similar employers, but not so many categories that lack of common support would excessively reduce sample size. Our approach of creating employer and member propensity tertiles (below) led to good balance of measured employer and member characteristics. To construct the employer propensity score, we included characteristics that predict high-deductible health plan switches, such as pre-switch spending levels and trends as well as employee demographics (see Section I.C).
(3) Within employer tertiles of propensity to switch, we required balance of key member-level characteristics. Our exact match therefore includes member propensity score tertile, predicting the person-level characteristics correlated with being switched to high-deductible health plans. As with the employer propensity score, we created tertiles of member propensity to have enough categories to ensure that members would match to similar members, but not so many categories that lack of common support would excessively reduce sample size. In summary, our approach attempts to balance similar types of people with cardiovascular risk factors from similar types of employers.
(4) However, the above 3 approaches could still leave imbalance on unmeasured characteristics that influence outcomes. To attempt to balance unmeasured characteristics in controlled before-after designs that may include selection bias, we follow the recommendations of Thomas Cook and colleagues. 2,3 These investigators have conducted "within-study comparison studies" in which institutions (with nested study subjects) are randomized to (1) being subject to a natural experiment where institutions can self-select into the intervention group ("group 1a") or self-select into the control group ("group 1b") or (2) being subject to a second randomization into an intervention group ("group 2a") and a control group ("group 2b"). The "group 2" randomized controlled trial provides the gold standard effect estimate against which multiple matching approaches and subsequent effect estimates in the "group 1" natural experiment arm can be compared. This line of work has emphasized the importance of matching on the "pretest" measure, i.e., the rate and/or timing of outcome measures before the intervention date. Overall, the investigators have concluded, across 3 within-study comparison studies, that matching on both the baseline outcome and available covariates consistently led to effect estimates that closely approximated the "group 2" gold standard randomized controlled trial results. 2,3

C. Coarsened Exact Matching Description, Stata Coding, and Component Variables
We conducted an observational, longitudinal survival analysis study by comparing matched groups of individuals. The intervention group consisted of individuals who were in low-deductible insurance plans for 1 year and then were switched to high-deductible plans for an additional 1 month to 4 years. The control group consisted of matched individuals who remained in low-deductible plans throughout the study (eFigure 1). We matched based on the year of the index date; the propensity of the employer to mandate high-deductible insurance (component variables described below) and the propensity of individuals to work for such employers (component variables described below); 7,8 diabetes, cardiovascular disease, hypertension, or hyperlipidemia detected during or before the baseline year (yes/no); baseline occurrence of stroke, myocardial infarction, or amputation (yes/no); and follow-up duration category (years of follow up). Our reason for including baseline indicators of outcome measures (e.g., occurrence of stroke, myocardial infarction, or amputation) was to balance the future population-level risk of these events during the follow-up period.
We used coarsened exact matching 4,5,9 to match the study groups. Coarsened exact matching is a straightforward approach to balancing certain characteristics of study groups and is only a slight modification of exact matching. In exact matching, investigators determine population characteristics that they believe should be balanced between study groups. The exact matching process chooses control group members who have the exact same characteristics as intervention group members (for example, the same gender, age, and income). "Coarsening" denotes that investigators have leeway, based on their understanding of the clinical or research situation, to categorize a given matching variable into the number or type of categories that are most clinically meaningful or relevant to the research question (e.g., categorize age by 5 year versus 10 year intervals or by other meaningful strata such as 20-39, 40-49, 50-64, etc). Coarsened exact matching is then simply an exact match on the selected variables, using their investigator-defined categories.
Because the percentages of subjects with the exact same characteristics will differ between the intervention and control groups, coarsened exact matching software packages generate simple weights so that control members in the stratum represent the same percentage of the control group as intervention members in the stratum represent of the intervention group. For example, if there are 10 intervention group members who are female, high income, and residing in predominantly black neighborhoods, and they represent 2% of the intervention group, and there are 50 control group members with these same characteristics yet they represent 4% of the control group population, the exact matching algorithm would assign the control group members in this bin weights of 0.5 (2%/4%). This effectively "shrinks" this overrepresented control group bin from 50 to 25 members.
The following is an example from our data showing a single very small stratum where the Stata coarsened exact matching algorithm upweighted the 3 controls relative to the 2 high-deductible members because the particular stratum of characteristics is underrepresented in the control pool:
The authors of the approach describe the matching steps in the Stata algorithm as follows: "1. Begin with the covariates X and make a copy, which we denote X*; 2. Coarsen X* according to user-defined cutpoints, or [the coarsened exact matching software's] automatic binning algorithm; 3. Create one stratum per unique observation of X* and place each observation in a stratum; 4. Assign these strata to the original data, X and drop any observation whose stratum does not contain at least one treated and one control unit." 10 Coarsened exact matching tries to mimic stratification by population characteristics and then randomization within the defined strata (i.e., fully blocked randomization).
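The weighting idea can be sketched in a few lines of Python. This is an illustration of the share-ratio logic described above, not the Stata package's implementation. The group totals of 500 and 1250 are derived from the example's percentages (10 members being 2% and 50 members being 4% of their respective groups); the second stratum is hypothetical.

```python
def cem_control_weight(treated_in_stratum, controls_in_stratum,
                       treated_total, controls_total):
    """Weight applied to each control in a stratum.

    The weight is the stratum's share of the intervention group divided by
    its share of the control group. Strata lacking either a treated or a
    control unit are unmatched and receive weight 0.
    """
    if treated_in_stratum == 0 or controls_in_stratum == 0:
        return 0.0
    treated_share = treated_in_stratum / treated_total
    control_share = controls_in_stratum / controls_total
    return treated_share / control_share

# Worked example from the text: 10 treated (2% of 500), 50 controls (4% of 1250).
w = cem_control_weight(10, 50, 500, 1250)
effective_controls = 50 * w
print(w, effective_controls)  # weight 0.5, so 50 controls count as 25

# Hypothetical underrepresented stratum: controls are upweighted (weight > 1).
w_up = cem_control_weight(2, 3, 200, 600)
print(w_up)
```

Intervention group members keep a weight of 1, so after weighting each stratum contributes the same proportion to both study groups.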
Proponents cite numerous advantages of coarsened exact matching over other approaches such as propensity score matching, matching based on Mahalanobis distance, nearest neighbor matching, and optimal matching. These advantages include better balance, lower root mean squared error, less model dependence, being robust to measurement error, ease of implementation, and compatibility with any subsequent statistical estimation model. 4-6 Coarsened exact matching typically can only include a limited number of stratifications, even in large samples. The great challenge of matching is determining which combination of variables, out of a nearly infinite set, should be included in the match. King's work recommends an a priori, "clinically informed" approach to matching as would be used in a blocked randomization. Section I.B above includes rationale for our general matching approach, and below we include the Stata coarsened exact match code and details of the component matching variables:

Stata coarsened exact match code:

    cem r_pscore3cat1 r_pscore3cat2 m_pscore3cat1 m_pscore3cat2 b_flag_149 b_flag_150 b_flag_151 index_yr dm_ind cvd_ind hyper_ind follow_duration_cat, treatment(hdhp)
    saveold all_suv_1

(1) r_pscore3cat1 r_pscore3cat2 = Indicators of being in a tertile category of employer propensity to mandate a switch from low- to high-deductible health insurance (classified into tertiles). This score is created from a logistic regression with the dependent variable being whether the employer mandated a switch to high-deductible insurance or mandated continuation in a low-deductible plan (0/1) and a set of predictor variables explained below.
The SAS code for this regression model is as follows:

    proc logistic data=empsty1 descending;
      model hdhp = empsize2-empsize3 agep2-agep6 pfemale racep2-racep5 edup2-edup4 povp2-povp4 stdcst_int stdcst_trd emp_acg median_copay regp2-regp4 indexmon;
      output out=propensty prob=pscore_sty;
      ods output ParameterEstimates=param;
      ods output oddsratios=OR;
    run;

The independent variables are constructed during the baseline period (please see manuscript for definitions of categorical variables) and include:
• empsize2-empsize3 = employer size, measured as number of subscribers and dependents per employer (1-99 is the reference, 100-999, 1000+);
• agep2-agep6 = proportion of enrollees in age category (proportion < age 20 is the reference, 20-29, 30-39, 40-49, 50-64, 65 and above);
• pfemale = proportion of women;
• racep2-racep5 = proportion of enrollees in race/ethnicity category (proportion from predominantly white neighborhood is the reference);
• edup2-edup4 = proportion of enrollees in neighborhood education category (proportion from the highest education level is the reference);
• povp2-povp4 = proportion of enrollees in neighborhood poverty category (proportion from lowest poverty level is the reference);
• stdcst_int = employer total standardized cost level at the beginning of baseline;
• stdcst_trd = employer total standardized cost trend during the baseline period;
• emp_acg = mean enrollee ACG morbidity score among patients with full baseline year enrollment;
• median_copay = median employer baseline copay for outpatient primary care visits;
• regp2-regp4 = proportion of enrollees in four U.S. regions (proportion from the West is the reference, South, Midwest, and Northeast);
• indexmon = calendar index month ("anniversary month") when high-deductible group employers mandated a switch from low-deductible to high-deductible plans; or the corresponding assigned anniversary month for employers that remained in low-deductible plans.
(2) m_pscore3cat1 m_pscore3cat2 = Indicators of being in a tertile category of member propensity to undergo a mandated switch from low- to high-deductible health insurance. Similar to the employer propensity score, this is created from a logistic regression with the dependent variable being whether the member experienced an employer-mandated switch to high-deductible insurance or an employer-mandated continuation in low-deductible insurance (0/1). The predictor variables are listed and explained below. The SAS code for this regression model is as follows:

proc logistic data=psmdat descending;
  model hdhp= emptot indexmon regc2-regc4 age_c4049 age_c5059 boopcat mem_std_cst1 mem_std_cst2 mem_std_cst3 emp_cost_r1 emp_cost_r2 emp_cost_r3;
  ods output ParameterEstimates=param;
  ods output oddsratios=OR;
  output out=propen_all_niddk prob=pscore;
run;

We constructed the independent variables, described below, during the baseline period (please see manuscript for definitions of categorical variables):
• emptot = continuous employer size, measured as number of subscribers and dependents per employer;
• indexmon = calendar index month ("anniversary month") when high-deductible group employers mandated a switch from low-deductible to high-deductible plans; or the corresponding assigned anniversary month for employers that remained in low-deductible plans;
• regc2-regc4 = region of residence (West is the reference, South, Midwest, and Northeast);
• age_c4049 = age in years at calendar index month between 40 and 49;
• age_c5059 = age in years at calendar index month between 50 and 59;
• boopcat = category of enrollee baseline out-of-pocket spending (<$500, $500-$999, $1000-$2499, $2500+);
• mem_std_cst1 = indicator for enrollee's total baseline standardized cost being in the first quartile of spending;
• mem_std_cst2 = indicator for enrollee's total baseline standardized cost being in the second quartile of spending;
• mem_std_cst3 = indicator for enrollee's total baseline standardized cost being in the third quartile of spending;
• emp_cost_r1 = enrollee insured through an employer with the lowest category (out of 4 categories) of ratio of total out-of-pocket costs to total standardized costs, an indicator of health plan generosity;
• emp_cost_r2 = enrollee insured through an employer with the second lowest category (out of 4 categories) of ratio of total out-of-pocket costs to total standardized costs, an indicator of health plan generosity;
• emp_cost_r3 = enrollee insured through an employer with the third lowest category (out of 4 categories) of ratio of total out-of-pocket costs to total standardized costs, an indicator of health plan generosity.
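
As an illustration of how a fitted logistic model yields a propensity score and how scores can be cut into tertile indicators, the following is a minimal Python sketch and not the SAS code used in the study. The function names are hypothetical, the cut points come from Python's default quantile method, and treating the highest tertile as the reference category is an assumption; the appendix does not state which tertile was the reference.

```python
import math
import statistics


def propensity(intercept, coefs, x):
    """Predicted probability from a fitted logistic model (inverse logit)."""
    linear = intercept + sum(b * v for b, v in zip(coefs, x))
    return 1.0 / (1.0 + math.exp(-linear))


def tertile_indicators(scores):
    """Return two 0/1 indicator lists for the lowest and middle tertiles.

    Assumption: the highest tertile is the reference category.
    """
    cut1, cut2 = statistics.quantiles(scores, n=3)  # two tertile cut points
    cat1 = [1 if s <= cut1 else 0 for s in scores]
    cat2 = [1 if cut1 < s <= cut2 else 0 for s in scores]
    return cat1, cat2


# Hypothetical scores for six members
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
cat1, cat2 = tertile_indicators(scores)
```

With these six hypothetical scores, two members fall in each tertile, so each indicator list contains two ones.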

D. Outcome Measures
Our primary outcome was a composite patient-level measure of the time in months until an individual experienced a myocardial infarction or stroke (whichever came first in the relevant period). We measured time relative to the beginning of the baseline period for baseline measures and relative to the beginning of the follow-up period for follow-up measures. We constructed our secondary measures of disaggregated myocardial infarction, stroke, and extremity amputation in the same manner. Because repeat events can occur, a given patient could have an event in the baseline period, the follow-up period, both, or neither (counted at most once per person per period). For the composite measure that added death, a given person could have an event only during the follow-up period because we required members to be alive during the entire baseline period.
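
The construction of the time-to-event measure can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the study code: the function names are hypothetical, months are counted as whole calendar months elapsed, and a person with no event is censored at the end of the period (the appendix does not describe censoring rules here).

```python
from datetime import date


def months_between(start, end):
    """Whole calendar months elapsed from start to end (assumed convention)."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:
        months -= 1
    return months


def time_to_first_event(period_start, period_end, event_dates):
    """Return (months, event_flag) for one person in one study period.

    Only the first qualifying event in the period counts, so each person
    contributes at most one event per period.
    """
    in_period = [d for d in event_dates if period_start <= d <= period_end]
    if in_period:
        return months_between(period_start, min(in_period)), 1
    # Assumption: no event means censoring at the end of the period.
    return months_between(period_start, period_end), 0


# Hypothetical follow-up period with two events; only the first is used.
start, end = date(2010, 1, 1), date(2010, 12, 31)
with_event = time_to_first_event(start, end, [date(2010, 9, 1), date(2010, 5, 15)])
without_event = time_to_first_event(start, end, [])
```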
We used validated health insurance claims-based algorithms to define myocardial infarction 11 and stroke. 12 We based our extremity amputation measure on billing procedure codes that are reliable in indicating the occurrence of a reimbursed surgery (eTable 4). We defined myocardial infarction based on detecting a hospitalization of 3-180 days that had International Classification of Diseases, 9th revision (ICD-9-CM) discharge diagnoses in the first or second position as listed in eTable 4. 11 This algorithm has a positive predictive value of 94.1% (95% confidence interval, 93.0%-95.2%). To define stroke, we also used a hospital-based algorithm and ICD-9-CM discharge diagnoses (eTable 4) in the first diagnosis position. 12 This algorithm has a positive predictive value of 92.6% (95% confidence interval, 88.8%-96.4%).
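
The hospitalization-based rules above can be expressed as a small filter. The sketch below is a hedged Python illustration: the function name is hypothetical, the diagnosis codes are placeholders rather than the codes listed in eTable 4, and length of stay is assumed to be the discharge date minus the admission date in days.

```python
from datetime import date


def qualifying_stay(admit, discharge, diagnoses, codes, max_position,
                    min_days=None, max_days=None):
    """Flag a hospitalization with a qualifying discharge diagnosis.

    diagnoses is the ordered list of discharge diagnosis codes;
    max_position is how many leading positions are searched.
    """
    stay_days = (discharge - admit).days  # assumed definition of length of stay
    if min_days is not None and stay_days < min_days:
        return False
    if max_days is not None and stay_days > max_days:
        return False
    return any(code in codes for code in diagnoses[:max_position])


# Placeholder code sets; the actual ICD-9-CM codes are in eTable 4.
MI_CODES = {"MI-A", "MI-B"}
STROKE_CODES = {"STROKE-A"}

# Myocardial infarction: stay of 3-180 days, code in position 1 or 2.
mi_flag = qualifying_stay(date(2011, 3, 1), date(2011, 3, 5),
                          ["OTHER", "MI-A"], MI_CODES, max_position=2,
                          min_days=3, max_days=180)

# Stroke: code in the first position only.
stroke_flag = qualifying_stay(date(2011, 3, 1), date(2011, 3, 5),
                              ["OTHER", "STROKE-A"], STROKE_CODES, max_position=1)
```

In this example the first stay is flagged as a myocardial infarction, while the second is not flagged as a stroke because the stroke code appears only in the second position.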

E. Covariates and Subgroups of Interest
We used version 11.1 of the Johns Hopkins ACG® System 13,14 to calculate members' baseline period morbidity score. The algorithm uses age, gender, and ICD-9-CM codes to calculate a morbidity score; the average of the reference population is 1.0. 14 Researchers have validated the index against premature mortality. 13 To derive proxy demographic measures, the data vendor linked members' most recent residential street addresses to their 2010 US Census tract. 15 Census-based measures of socioeconomic status have been validated 16,17 and used in multiple studies to examine the impact of policy changes on disadvantaged populations. 18-20 Using 2008-2012 American Community Survey 21 census tract-level data and validated cut-points, 16,17 we created categories that defined residence in neighborhoods with below-poverty levels of <5%, 5%-9.9%, 10%-19.9%, and ≥20%. Similarly, we defined categories of residence in neighborhoods with below-high-school education levels of <15%, 15%-24.9%, 25%-39.9%, and ≥40%. 16,17 We classified members as from predominantly white, black, or Hispanic neighborhoods if they lived in a census tract with at least 75% of members of the respective race/ethnicity. We then applied a superseding ethnicity assignment using flags created by the E-Tech system (Ethnic Technologies), which analyzes full names and geographic locations of individuals. 22 We classified remaining members as from mixed race/ethnicity neighborhoods. This validated approach of combining surname analysis and census data has positive and negative predictive values of approximately 80 and 90 percent, respectively. 23 Other covariates included age category (40-49 and 50-59, and 50-64 years); sex; US region (West, Midwest, South, Northeast); employer size used as either a continuous variable or with categories of 0-99, 100-999, or >1000 individuals; and calendar month of the index date.
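
The neighborhood cut-points above translate directly into categorization rules. The following Python sketch is illustrative only: the function names and category labels are hypothetical, inputs are assumed to be census tract percentages on a 0-100 scale, and the superseding E-Tech assignment is represented by an optional flag whose exact logic is an assumption.

```python
def poverty_category(pct_below_poverty):
    """Neighborhood poverty category from the tract-level percentage."""
    if pct_below_poverty < 5:
        return "<5%"
    if pct_below_poverty < 10:
        return "5%-9.9%"
    if pct_below_poverty < 20:
        return "10%-19.9%"
    return ">=20%"


def education_category(pct_below_high_school):
    """Neighborhood education category from the tract-level percentage."""
    if pct_below_high_school < 15:
        return "<15%"
    if pct_below_high_school < 25:
        return "15%-24.9%"
    if pct_below_high_school < 40:
        return "25%-39.9%"
    return ">=40%"


def neighborhood_race(pct_white, pct_black, pct_hispanic, surname_flag=None):
    """Race/ethnicity category using the 75% tract threshold.

    surname_flag stands in for the superseding name-based assignment;
    how that flag is applied here is an assumption.
    """
    if surname_flag is not None:
        return surname_flag
    shares = {"white": pct_white, "black": pct_black, "hispanic": pct_hispanic}
    for group, pct in shares.items():
        if pct >= 75:
            return group
    return "mixed"
```

For example, a tract in which 9.9% of residents live below poverty falls in the second poverty category, and a tract with no group at or above 75% is classified as mixed.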
We defined two subgroups of interest for stratified analyses (diabetes and other cardiovascular disease risk factors) using ACG software. 13,14 The ACG algorithm uses ICD-9 and medication billing codes to identify patients with chronic conditions. We first flagged enrollees with diabetes, and among those not in this group, we defined an "other cardiovascular disease risk factor" cohort based on ACG-flagged patients with established cardiovascular disease, hypertension, or hyperlipidemia.
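
The subgroup definitions are hierarchical and mutually exclusive, which a short Python sketch makes explicit. The function name, argument names, and group labels are hypothetical; the boolean inputs stand in for condition flags produced by the ACG software.

```python
def assign_subgroup(has_diabetes, has_cvd, has_hypertension, has_hyperlipidemia):
    """Assign a member to at most one subgroup, checking diabetes first."""
    if has_diabetes:
        return "diabetes"
    if has_cvd or has_hypertension or has_hyperlipidemia:
        return "other_cv_risk"
    return None  # member belongs to neither subgroup


# A member with both diabetes and hypertension is placed in the diabetes cohort.
both = assign_subgroup(True, False, True, False)
hypertension_only = assign_subgroup(False, False, True, False)
```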

F. Statistical Analyses
We analyzed time to the primary and disaggregated secondary outcomes in Cox proportional hazards regression models adjusted for sex, employer size, race/ethnicity, education, poverty, and US region. The statistical model for estimating adjusted hazard ratios for our measures, at time t for an individual, was:

$$h(t, X) = h_0(t)\,\exp\left(\beta_0 + \sum_{k} \beta_k X_k\right)$$

where
h(t, X) = hazard function at time t with covariate vector X
h_0(t) = the baseline hazard function at time t
β0 = intercept
βk = coefficient for the k-th characteristic for the patient

We implemented this model in Stata version 15.
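
To show how hazard ratios follow from this model form, the sketch below evaluates the hazard for two covariate vectors and takes their ratio. It is a minimal Python illustration with hypothetical function names and made-up coefficient values, not the Stata implementation used in the study. Both the baseline hazard and the intercept cancel in the ratio, so the hazard ratio depends only on the coefficients and on the difference in covariates.

```python
import math


def hazard(baseline_hazard, beta0, betas, x):
    """Hazard at a given time: baseline hazard times exp(linear predictor)."""
    linear = beta0 + sum(b * v for b, v in zip(betas, x))
    return baseline_hazard * math.exp(linear)


def hazard_ratio(betas, x_a, x_b):
    """Hazard ratio comparing covariate vector x_a with x_b."""
    return math.exp(sum(b * (a - c) for b, a, c in zip(betas, x_a, x_b)))


# Made-up coefficients for two covariates; only the first covariate differs.
betas = [math.log(2.0), 0.3]
ratio = hazard_ratio(betas, [1, 1], [0, 1])
direct = hazard(0.01, 0.5, betas, [1, 1]) / hazard(0.01, 0.5, betas, [0, 1])
```

Here the ratio equals the exponentiated coefficient of the covariate that differs, which is 2 for these made-up values.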
We analyzed time to event in the baseline and the follow-up periods in a single model, and the key term of interest was an interaction between study period (baseline or follow-up) and study group (high-deductible or control). This term generated an adjusted hazard ratio (aHR) of the high-deductible group compared with the control group at follow-up versus baseline.

Abbreviations: HDHP, high-deductible health plan.
1 Values derived using generalized estimating equations with a zero-inflated negative binomial distribution and adjusted for sex, employer size, race/ethnicity, education, poverty, and US region.
2 Among the entire cohort.
3 Among the diabetes cohort.
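
The period-by-group interaction described under Statistical Analyses can be written out explicitly. The following is a sketch under assumed notation, not the authors' published specification: G (1 = high-deductible group, 0 = control), P (1 = follow-up, 0 = baseline), and Z (the remaining adjustment covariates) are labels introduced here for illustration.

```latex
h(t \mid G, P, Z) = h_0(t)\,
  \exp\left(\beta_G G + \beta_P P + \beta_{GP}\,(G \times P) + \gamma^{\top} Z\right)

\mathrm{aHR} = \exp(\beta_{GP})
  = \frac{h(t \mid G=1, P=1, Z)\,/\,h(t \mid G=0, P=1, Z)}
         {h(t \mid G=1, P=0, Z)\,/\,h(t \mid G=0, P=0, Z)}
```

Under this form, the aHR is a ratio of hazard ratios: the between-group hazard ratio during follow-up divided by the between-group hazard ratio during baseline, holding Z fixed. A value near 1 indicates that the group difference did not change from baseline to follow-up.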