Remission and Relapse of Dyslipidemia After Vertical Sleeve Gastrectomy vs Roux-en-Y Gastric Bypass in a Racially and Ethnically Diverse Population

Key Points
Question: Do dyslipidemia outcomes up to 4 years after surgery differ between patients undergoing vertical sleeve gastrectomy (VSG) and Roux-en-Y gastric bypass (RYGB)?
Findings: This comparative effectiveness study included 8265 racially and ethnically diverse patients who underwent VSG or RYGB and used a local instrumental variable approach to adjust for confounding in the choice of operation. After 4 years, dyslipidemia remission was higher for patients who underwent RYGB (59%) than for those who underwent VSG (52%), and relapse rates were high and similar for RYGB and VSG during that time (21% vs 24%).
Meaning: These findings suggest that patients should be monitored closely throughout their postoperative course to maximize the benefits of these operations for treatment of dyslipidemia.

For the IV analysis, we excluded patients whose physicians had low caseload counts, the counts on which the IV was calculated (n = 352). The final exclusions came from exploring instrumental variables (IVs) and identifying the segment of the real-world data in which a pseudo-experiment can be exploited through the use of IVs (eTable 1). How we arrived at these numbers is described below.

Analytical Methods
Our primary analysis used a local IV approach to estimate person-centered treatment (PeT) effects, along with other average treatment effects. Instrumental variables mimic a pseudorandomization approach to establishing causal effects in healthcare observational research 1 and can address both observed and unobserved confounding. Unlike traditional IV approaches, which estimate an effect for the marginal patient induced by the instrument to select a different treatment, local IV approaches use a continuous IV to estimate the effect at every margin in the patient population. 2 In this way, one can address confounding by indication and also examine treatment effect heterogeneity. We report PeT effects (details below), 3,4,5 which are individualized treatment effects for each person in our sample and can be aggregated to obtain population average treatment effects as well as subgroup-specific average effects. A clinically intuitive description of these methods has been published recently. 5 In addition, we conducted a standard IV analysis using a two-stage residual inclusion approach and a standard inverse probability weighted propensity score analysis as sensitivity analyses, to provide comparability with prior work comparing the effectiveness of SG vs RYGB for health outcomes. We now describe these methods in detail.
© 2022 Coleman KJ et al. JAMA Network Open.

Local Instrumental Variable Method and Person-Centered Treatment (PeT) Effects
Instrumental variable selection and testing. The LIV methods are designed to mimic the conditions of random assignment of treatments occurring in real-world clinical settings. An IV must be correlated with the exposure (choice of bariatric operation) but not associated with the outcome (dyslipidemia remission/relapse) except through its correlation with the exposure. The IV chosen for the current study was based on the stakeholder engagement process mentioned earlier, 19 and was the rate of use of RYGB by each physician during the 12 months prior to each patient's bariatric surgery (RYGB PP) after conditioning on 3-digit zip code and year of surgery.
The use of this IV implies that patients within the same 3-digit zip code in the same calendar year were (pseudo) randomized to receive different operations because they had different surgeons with different preferences for RYGB vs SG (surgeons traditionally do not practice outside of a service region because of the need for access to assigned operating room time). If such pseudo-randomization occurs in practice, then the IV should be independent of all confounders, although we can test this assumption only with the observed covariates.
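A minimal sketch of how a physician-preference instrument like RYGB PP could be computed follows, in Python. It assumes a hypothetical `Operation` record per surgery, approximates "12 months" as a 365-day window ending the day before the index surgery, and exposes a hypothetical `min_cases` threshold standing in for the low-caseload exclusion (the study's actual cutoff is not stated here). The conditioning on 3-digit zip code and year of surgery is not shown.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass(frozen=True)
class Operation:
    """Hypothetical record of one bariatric operation."""
    surgeon_id: str
    surgery_date: date
    is_rygb: bool  # True for RYGB, False for SG


def rygb_preference(ops: list, surgeon_id: str, index_date: date,
                    window_days: int = 365, min_cases: int = 1) -> Optional[float]:
    """Share of a surgeon's operations in the window before index_date that were RYGB.

    Returns None when the surgeon has fewer than min_cases prior operations
    (a hypothetical stand-in for the low-caseload exclusion).
    """
    start = index_date - timedelta(days=window_days)
    prior = [op for op in ops
             if op.surgeon_id == surgeon_id and start <= op.surgery_date < index_date]
    if len(prior) < min_cases:
        return None
    return sum(op.is_rygb for op in prior) / len(prior)


ops = [
    Operation("A", date(2015, 1, 10), True),
    Operation("A", date(2015, 3, 1), False),
    Operation("A", date(2015, 6, 1), True),
    Operation("A", date(2013, 1, 1), True),   # outside the 12-month window
    Operation("B", date(2015, 2, 1), False),
]
print(rygb_preference(ops, "A", date(2015, 12, 1)))  # 2 of 3 prior operations were RYGB
```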
We observed a substantial reduction of observed covariate imbalance across most levels of the RYGB PP IV, especially within the central 80% of its distribution (eFigure 1). For this figure, each covariate was normalized as a z-score by demeaning it and dividing by its standard deviation, and a lowess smoother was used to examine the level of each z-score across the IV. This gives us confidence that the approach will also help reduce imbalance in unobserved confounders, especially among those patients who are going to see … We found that, compared with patients treated by surgeons within the central 80% of the RYGB PP distribution, patients treated by surgeons outside it had slightly higher weight loss before surgery, were more likely to be non-Hispanic White, were less likely to have sleep apnea or severe anxiety, and were more likely to have gastric duodenitis, chronic kidney disease (CKD), or gastroesophageal reflux disease (GERD) (eTable 2).
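The balance diagnostic behind eFigure 1 can be sketched as follows: convert a covariate to z-scores, then smooth those z-scores against the IV; a flat curve near zero suggests the covariate does not drift with the instrument. This is an illustrative Python sketch, not the study's code. `local_linear_smooth` is a hypothetical, simplified lowess (tricube-weighted local linear fits without the robustness iterations of full lowess), and the span `frac=0.5` is an assumed value.

```python
import math
import statistics


def zscores(values):
    """Demean a covariate and divide by its sample standard deviation."""
    mu = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mu) / sd for v in values]


def local_linear_smooth(x, y, frac=0.5):
    """Simplified lowess: tricube-weighted local linear fit at each x value.

    Uses the ceil(frac * n) nearest points as the neighbourhood and performs
    no robustness reweighting.
    """
    n = len(x)
    k = min(n, max(2, math.ceil(frac * n)))
    fitted = []
    for x0 in x:
        # Bandwidth: distance from x0 to its k-th nearest neighbour (itself included).
        h = sorted(abs(xi - x0) for xi in x)[k - 1] or 1e-12
        w = [(1 - min(abs(xi - x0) / h, 1.0) ** 3) ** 3 for xi in x]
        sw = sum(w)  # at least 1, since x0 has weight 1
        xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
        ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(ybar + slope * (x0 - xbar))
    return fitted
```

In use, one would call `local_linear_smooth(iv, zscores(covariate))` for each covariate and plot the fitted values against the IV.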
To maintain the highest validity of the RYGB PP IV, we restricted our analytical sample to patients treated by surgeons in the central 80% of the RYGB PP distribution (n = 2,080 excluded). Similar methods for selecting a valid IV that best mimics the pseudo-randomization of treatments have been used in the literature. 6,7,8,9 An additional 38 patients were excluded because they had no match who received the opposite treatment with the same IV-based propensity score to receive RYGB. The final sample consisted of 8,265 patients who had SG (n = 5,412) or RYGB (n = 2,853). At the time of surgery, compared with SG patients, RYGB patients were more likely to be male or Hispanic, had a higher Elixhauser comorbidity score, had higher rates of GERD, T2DM, and chronic kidney disease, and had lower rates of gastric duodenitis. … shows the imbalance of these z-scores across treatment groups, while eFigure 2b shows the imbalance of the same z-scores across IV levels, both conditioned on total volume of prescriptions, year, and 3-digit zip code fixed effects. The balance across the IV was improved for all covariates.
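The two sample restrictions described here can be sketched in Python under stated assumptions: "central 80%" is taken to mean the 10th to 90th percentiles of the patient-level IV values, and "same IV-based propensity score" is approximated by equal-width propensity bins (the hypothetical `n_bins`). Patients whose bin contains only one treatment arm are dropped. The dictionary keys are hypothetical.

```python
import statistics


def central_80_bounds(iv_values):
    """10th and 90th percentiles of the instrument (the central 80% band)."""
    deciles = statistics.quantiles(iv_values, n=10, method="inclusive")
    return deciles[0], deciles[-1]


def restrict_sample(patients, n_bins=20):
    """Trim to the central 80% of the IV, then drop patients without common support.

    patients: list of dicts with keys "iv", "ps" (IV-based propensity of RYGB)
    and "treated" (True for RYGB). n_bins is a hypothetical matching granularity.
    """
    lo, hi = central_80_bounds([p["iv"] for p in patients])
    kept = [p for p in patients if lo <= p["iv"] <= hi]

    def bin_of(ps):
        return min(int(ps * n_bins), n_bins - 1)

    arms = {}
    for p in kept:
        arms.setdefault(bin_of(p["ps"]), set()).add(p["treated"])
    # Keep a patient only if someone with the opposite treatment shares the bin.
    return [p for p in kept if len(arms[bin_of(p["ps"])]) == 2]
```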
We also compared the standardized mean difference for each covariate between treatment groups with the standardized mean difference between patients above and below the median of the IV. eFigure 3 illustrates the reduction in the standardized mean difference attained by the IV.
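A small Python sketch of the comparison in eFigure 3 follows: the standardized mean difference of one covariate between treatment groups, next to the same quantity between patients above and below the IV median. The pooled-variance formula (mean difference divided by the square root of the average of the two group variances) is a common convention assumed here; the function names are hypothetical.

```python
import statistics


def smd(group_a, group_b):
    """Standardized mean difference with the pooled-variance denominator."""
    ma, mb = statistics.fmean(group_a), statistics.fmean(group_b)
    va, vb = statistics.variance(group_a), statistics.variance(group_b)
    return (ma - mb) / ((va + vb) / 2) ** 0.5


def smd_by_treatment_and_iv(covariate, treated, iv):
    """Return (SMD across treatment groups, SMD across the IV median split)."""
    by_treatment = smd([c for c, t in zip(covariate, treated) if t],
                       [c for c, t in zip(covariate, treated) if not t])
    med = statistics.median(iv)
    by_iv = smd([c for c, z in zip(covariate, iv) if z > med],
                [c for c, z in zip(covariate, iv) if z <= med])
    return by_treatment, by_iv
```

A smaller absolute value in the second element than in the first is the pattern eFigure 3 describes.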
These explorations of balance improvement in baseline risk factors across the IV variable (without any additional adjustments) suggest that the IV may be acting as a pseudorandomization mechanism that not only reduces imbalance in observed risk factors without any adjustment, but might also reduce imbalance in unobserved risk factors, thereby increasing the accuracy with which we can estimate a causal treatment effect.

Person-centered treatment effects for SG vs RYGB.
To understand our choice of the local IV estimator (which we describe below), it is important to understand the context of selection bias or confounding by indication in the presence of treatment effect heterogeneity.
There are three key concepts to understand in this situation: 1) "essential heterogeneity", 2) marginal treatment effects, and 3) person-centered treatment effects.
"Essential heterogeneity": Essential heterogeneity is a phenomenon that demands the use of methods such as local instrumental variable approaches to produce estimates of interpretable treatment effect parameters. Unless evidence shows otherwise, it is often difficult to discount the idea that there is substantial variability in the case mix of patients receiving RYGB vs SG and that the incremental effectiveness of RYGB over SG may be heterogeneous. A further challenge is that the selection of patients for a particular type of surgery may be driven by risk factors that modify the effectiveness of RYGB over SG and vice versa. Many of these factors, such as the patients' preadmission health status, may remain unmeasured in the data at hand. Taken together, when treatment effects of RYGB over SG are heterogeneous over unobserved confounders, the phenomenon is termed 'essential heterogeneity'. In this situation, most traditional methods for comparative effectiveness have limitations. Methods that rely on selection on observables (regression methods, propensity score-based methods) do not address biases due to unobserved confounders.
Traditional IV methods, which aim to address both observed and unobserved confounders, estimate a local average treatment effect parameter that is often neither interpretable nor clinically relevant. In particular, the resulting estimate applies only to the ill-defined subgroup of patients who would have switched treatment modality in response to a change in the level of the instrument, i.e., compliers. 10,11,12,13,14,15 Who the compliers are with respect to physician preference for a surgery is difficult to determine, and whether the effect of RYGB over SG for these patients would apply to other patients is difficult to say.
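To make the complier interpretation concrete, the classic Wald estimator with a binary instrument is sketched in Python. Dichotomizing physician preference (for example, high vs low RYGB use) is an illustrative assumption; the study itself uses a continuous instrument. The estimate is the instrument's effect on the outcome divided by its effect on receipt of RYGB, and it describes only patients whose operation would change with the instrument.

```python
import statistics


def wald_late(y, d, z):
    """Wald IV estimate with a binary instrument z (1 = high-RYGB-preference surgeon).

    y: outcome (e.g., dyslipidemia remission, 0/1); d: received RYGB (0/1).
    The ratio identifies the average effect only among compliers.
    """
    y1 = statistics.fmean([yi for yi, zi in zip(y, z) if zi])
    y0 = statistics.fmean([yi for yi, zi in zip(y, z) if not zi])
    d1 = statistics.fmean([di for di, zi in zip(d, z) if zi])
    d0 = statistics.fmean([di for di, zi in zip(d, z) if not zi])
    return (y1 - y0) / (d1 - d0)
```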
We address these concerns with a recently developed econometric method, the local instrumental variable (LIV) method, which uses an IV to address selection biases in observational studies and to establish marginal treatment effects that can be aggregated to form PeT effects.
PeT effects represent an average treatment effect for each person in the data, conditioning on their levels of risk factors, and accounting for their individualized distribution of unobserved heterogeneity. Consequently, such individualized effects can help study a variety of distributional questions on effectiveness, such as examining the benefits and harms of RYGB versus SG and identifying subgroups that are most likely to benefit from such care.
"Marginal treatment effects (MTEs)": MTEs are the treatment effects for those individuals for whom the influence of the observed characteristics (say, baseline BMI and the surgery preference of the physician) balances the influence of the unobserved confounders (medical history) on the decision to use RYGB, such that the physician is indifferent between RYGB and SG. To estimate an MTE, LIV methods compare the outcomes of two groups of similar patients (say, BMI 40 kg/m²), where one group sees a surgeon whose historical rate of RYGB preference is d and the other group sees another surgeon with preference d + ε, with ε representing a slight increase in the preference for RYGB. These two groups of patients should be identical with respect to the distribution of their risk factors (observed and unobserved), provided physician preference is independent of all risk factors affecting outcomes. This independence assumption will hold if physician preference is a valid IV. Therefore, any difference in average outcomes between these two groups is driven only by the difference in the receipt of RYGB or SG, for this margin of patients, for whom the clinicians were indifferent between RYGB and SG but were nudged to select RYGB by the small perturbation of the IV, i.e., physician preference.
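The d vs d + ε comparison can be written as a finite difference: the change in mean outcome divided by the change ε in the index, which approximates the derivative of E[Y | P = p] with respect to p. The Python sketch assumes a hypothetical smooth fitted curve, `fitted_remission`, for one covariate stratum; in the models used in this study the index is the propensity score rather than the raw physician preference.

```python
def mte_at(mean_outcome, d, eps=1e-6):
    """Local IV estimate of the MTE at index level d.

    Compares mean outcomes at d + eps and at d and scales by eps: a forward
    finite-difference approximation to dE[Y | P = p]/dp at p = d.
    """
    return (mean_outcome(d + eps) - mean_outcome(d)) / eps


def fitted_remission(p):
    # Hypothetical fitted curve for E[remission | P = p] within one covariate stratum.
    return 0.40 + 0.15 * p - 0.05 * p ** 2


# Analytic derivative is 0.15 - 0.1 * p, so the MTE at p = 0.3 is about 0.12.
print(mte_at(fitted_remission, 0.3))
```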
For this margin of patients, we can quantify a normalized level of unobserved confounders that was sufficient to balance their observed confounders at the considered level of physician preference (d). 2,3 Here, normalized means a scalar score that represents a balancing score for SG. The difference in average outcomes between the two groups of similar patients (e.g., BMI 40 kg/m²) represents the MTE for those patients at that particular normalized level of unobserved confounders.
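Aggregating an MTE schedule into a PeT effect can be sketched as averaging the MTE over the range of normalized unobserved confounders consistent with a patient's observed operation. The Python below assumes the common convention that RYGB is chosen when the normalized unobserved level U_D falls below the patient's propensity score p, so RYGB patients average over [0, p] and SG patients over [p, 1]; the linear `mte_schedule` is hypothetical.

```python
def pet_effect(mte, propensity, treated, n_grid=10_000):
    """Person-centered treatment effect: mean MTE over the admissible U_D range.

    Assumed convention: RYGB is chosen when U_D < propensity, so treated
    patients have U_D in [0, p] and untreated patients have U_D in [p, 1].
    """
    lo, hi = (0.0, propensity) if treated else (propensity, 1.0)
    width = (hi - lo) / n_grid
    # Midpoint rule over the admissible interval of U_D.
    return sum(mte(lo + (k + 0.5) * width) for k in range(n_grid)) / n_grid


def mte_schedule(u):
    # Hypothetical MTE schedule for one covariate profile (e.g., BMI 40 kg/m²).
    return 0.15 - 0.1 * u
```

Under this hypothetical schedule, an RYGB patient with p = 0.4 has a PeT effect of about 0.13 and an SG patient with the same p about 0.08, illustrating how patients with identical observed covariates can carry different individualized effects.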
Similarly, for another dyad of physician preferences, d' and d' + ε, one can estimate another MTE at another normalized level of unobserved confounders. In this way, a full schedule of MTEs can be estimated that vary over the unobserved confounder levels (e.g., medical history) given the level of the observed confounders (e.g., BMI), as long as the physician preference (e.g., the IV) …

$$\Pr(Y_i = 1 \mid PS_i, XL_i, XC_i) = c_0 + c_1 PS_i + c_2 XL_i + c_3 PS_i \cdot XL_i + c_4 XC_i + g(PS_i)$$

The partial derivative of the predicted probability in each of the models with respect to the propensity score was used as an estimator of the marginal treatment effects in each model. 2 These effects were then aggregated to calculate the PeT effect for each individual in our sample at any specific time since surgery. 3 We explored these comparative effects at 90-day intervals after each type of surgery.

…

$$\Pr(Y_i = 1 \mid D_i, X_i, C_i, R_i) = (b_0 + b_1 D_i + b_2 X_i + b_4 C_i + b_5 R_i)$$

Here, R_i acts as a proxy for unobserved confounders, thereby allowing a traditional regression- …

The second outcomes model follows the approach of the first model. For systolic and diastolic blood pressure, we used linear models. The fundamental challenge of this method is that it is not guaranteed to produce an estimate of the true ATE, because the theory of the residual inclusion approach is set up only for a continuous treatment; when applied to a binary treatment, it produces an approximation of the ATE. 19 Instead of using raw residuals, we used generalized residuals to obtain the best approximation. 19
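For the two-stage residual inclusion approach, the generalized residual replaces the raw residual, D_i minus the fitted probability. The sketch assumes a probit first stage (the first-stage link is not specified in this section), for which the generalized residual is φ(xb)(D − Φ(xb)) / [Φ(xb)(1 − Φ(xb))], with `index` the fitted linear predictor. It uses only Python's standard-library `statistics.NormalDist`; the function name is hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def probit_generalized_residual(d, index):
    """Generalized residual from an (assumed) probit first stage.

    d: observed treatment (1 = RYGB, 0 = SG); index: fitted linear predictor.
    Equals pdf/cdf for d = 1 and -pdf/(1 - cdf) for d = 0.
    """
    cdf = _STD_NORMAL.cdf(index)
    pdf = _STD_NORMAL.pdf(index)
    return pdf * (d - cdf) / (cdf * (1.0 - cdf))
```

The resulting residual would then enter the second-stage outcome model as the R_i term.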

Standard Propensity Score Methods with Inverse Probability Weighting
Finally, we also explored methods that account only for observed confounding and ignore unobserved confounding. Among the variety of methods available for such adjustment, we chose the inverse probability weighted propensity score estimator because of its robust properties and its use in the prior literature on bariatric surgery.
For this estimator, we fit a new first-stage propensity score model. It differs from the first-stage IV model in two ways: 1) the propensity model does not adjust for the IV, but only for X and C; and 2) the model is saturated, in that all potential interactions and second-order polynomials of the covariates are included to generate the propensity score, which was used as a balancing score. Overfitting is not a concern in this stage. Let this balancing propensity score be denoted BPS_i. eFigure 5 illustrates the reduction in the standardized mean difference in each of the covariates when weighted by an inverse probability weight using BPS_i. It is important to note that this reduction results from direct adjustment of the distribution of covariates through the propensity score, whereas in eFigure 3 the reduction is natural, without any explicit adjustment.
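The balance check behind eFigure 5 can be sketched as computing inverse probability weights from BPS_i (1/BPS_i for RYGB, 1/(1 − BPS_i) for SG) and recomputing each covariate's standardized mean difference with those weights. This Python sketch assumes weighted variances in the denominator (one of several conventions); the function names are hypothetical.

```python
import math


def ipw_weights(bps, treated):
    """Inverse probability weights: 1/BPS for RYGB patients, 1/(1 - BPS) for SG patients."""
    return [1.0 / p if t else 1.0 / (1.0 - p) for p, t in zip(bps, treated)]


def _weighted_mean_var(x, w):
    sw = sum(w)
    m = sum(wi * xi for wi, xi in zip(w, x)) / sw
    v = sum(wi * (xi - m) ** 2 for wi, xi in zip(w, x)) / sw
    return m, v


def weighted_smd(x, treated, w):
    """Standardized mean difference of covariate x between arms under weights w."""
    m1, v1 = _weighted_mean_var([xi for xi, t in zip(x, treated) if t],
                                [wi for wi, t in zip(w, treated) if t])
    m0, v0 = _weighted_mean_var([xi for xi, t in zip(x, treated) if not t],
                                [wi for wi, t in zip(w, treated) if not t])
    return (m1 - m0) / math.sqrt((v1 + v0) / 2)
```

Passing weights of 1.0 reproduces the unweighted SMD, so the same function gives both sides of the before/after comparison.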
For the second stage, these scores are used as follows to obtain an estimate for the average treatment effect, under no unobserved confounding:
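One common form of an inverse-probability-weighted ATE estimator is the normalized (Hájek) version, which takes a weighted mean outcome within each arm and differences the two. The Python below is a hedged sketch of that form, not necessarily the study's exact specification; like any propensity score estimator, it is valid only under no unobserved confounding and a correctly specified BPS_i. The toy data are hypothetical.

```python
def ipw_ate(y, treated, bps):
    """Normalized (Hajek) IPW estimate of the ATE, assuming no unobserved confounding."""
    w1 = [1.0 / p if t else 0.0 for p, t in zip(bps, treated)]
    w0 = [0.0 if t else 1.0 / (1.0 - p) for p, t in zip(bps, treated)]
    mu1 = sum(wi * yi for wi, yi in zip(w1, y)) / sum(w1)  # weighted mean, RYGB arm
    mu0 = sum(wi * yi for wi, yi in zip(w0, y)) / sum(w0)  # weighted mean, SG arm
    return mu1 - mu0


# Toy data (hypothetical): covariate x raises both the chance of RYGB and the outcome.
x = [1] * 10 + [0] * 10
treated = [True] * 8 + [False] * 2 + [True] * 2 + [False] * 8
bps = [0.8] * 10 + [0.2] * 10
y = [0.3 * t + 0.4 * xi for t, xi in zip(treated, x)]  # built-in RYGB effect = 0.3
```

On this toy dataset the naive difference in means is 0.54, while the weighted estimate recovers the built-in effect of 0.3.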