Association Between Preoperative Obstructive Sleep Apnea and Preoperative Positive Airway Pressure With Postoperative Intensive Care Unit Delirium

Key Points
Question: Is there an association between obstructive sleep apnea and delirium after major surgery?
Findings: In this cohort study of 7792 patients admitted to the intensive care unit after surgery, 26% had obstructive sleep apnea, and delirium occurred in 47%. After risk adjustment, there was no significant association between obstructive sleep apnea and postoperative delirium.
Meaning: This study found no association between obstructive sleep apnea and delirium in patients admitted postoperatively to the intensive care unit.


eAppendix 3. Exploration of ICU Admission and Missing Delirium Status as Sources of Bias
We found that among all patients admitted to the ICU postoperatively, a substantial fraction were never assessed (zero documented CAM-ICU). To reduce concerns about bias caused by missing data, we investigated the patterns of missing assessments. First, we found that although CAM-ICUs were performed during the specified time period (8/2012-8/2016), they were not routinely documented as discrete data until 11/2012 (18% pre and 70% post). We therefore excluded cases before that time. We had initially planned to include data through 2018; however, the informatics infrastructure unexpectedly changed after our initial data pull in 2016 and removed the ability to feasibly query data. Second, we found that the rate of performing CAM-ICU varied dramatically by ICU. Patients in the neurosurgical ICU had CAM-ICUs performed in a low fraction of cases (1.8% with year to year variation); other ICUs had less extreme fractions (cardiac care 66%, bone marrow transplant 55%). The remaining ICUs (cardiothoracic surgery, surgical-burn, medical) had high fractions of assessment (84-90%). We also found that many patients in the non-surgical ICUs had relatively minor surgical procedures (tracheostomy, gastric tube placement, wound re-debridement, dressing change under anesthesia). The attribution of delirium in patients with substantial prior ICU stays as "postoperative" would be ambiguous; however, we did not wish to exclude all patients with prior ICU stays as it was not uncommon to have brief pre-admissions to the ICU before major surgery. We therefore excluded patients with >= 6 days of prior ICU stay, as this would eliminate the majority of these minor procedures in severely ill patients. Patients with an index surgery without a prior ICU stay remain in the dataset even if they re-visit the OR in the subsequent 7 days. A version of Figure 1 without these steps is provided in eFigure 2.
These filters are a deviation from the analysis plan but primarily refine the included population rather than change the analysis. That is, because almost all excluded patients would have been excluded later (by virtue of lacking a delirium assessment), the actual analytic population is little changed, but the results can be stated with more confidence that selection bias in missing assessments did not play a major role. After applying these restrictions, to explore the possibility of OSA changing ICU admission rates or CAM-ICU performance, we then undertook a propensity analysis of CAM-ICU documentation in 2 steps. First, we fit a propensity model of ICU admission (using decision trees and gradient boosted decision trees) using the entire surgical cohort. Then, among patients admitted to the ICU, we fit a similar model with the response being an indicator variable for having any CAM-ICU assessments.
The ICU admission model had relatively good predictive performance (AUROC 0.941). A histogram of the fitted values of the ICU-admission model (eFigure 3) shows that while a substantial fraction of ICU patients have a high propensity for admission, about half lie in a region overlapping with non-admitted patients. Model diagnostics showed that surgical procedure (CCS category), emergency surgery, and ASA-PS explained the majority of the fitted propensity (R^2 = 82%). Examination of the fitted model showed that OSA status was never used as a decision tree node. However, because of its correlation with other variables, OSA diagnosis was correlated with fitted propensity for admission (regression coefficient 0.37, SE 0.04 on log-odds scale) but explained only 0.7% of the variation in propensity. In the sensitivity analyses below, the propensity model above is used to select unadmitted patients to serve as additional controls.

eAppendix 4. Propensity Score Generation and Primary Analysis
Within each imputation round, propensity scores were calculated, 4 and estimates of the average treatment effect (ATE) or average treatment effect on the treated (ATT) were computed (see below). Frequentist estimates were combined using Rubin's rules, and Bayesian posterior samples of treatment effects were concatenated into a single estimate. 5,6 Propensity scores for the primary and secondary exposures were calculated using gradient boosted decision trees (xgboost library) within each imputation round, with all tuning parameters selected by 5-fold cross validation. All preoperative variables excluding OSA diagnosis, STOP-BANG total and questionnaire values, and neck measurement were included in the propensity model. Surgery was included as both the service category and single-level AHRQ CCS. We also experimented with propensity scores calculated by BART, and found them to be minimally different but much more time-consuming to calculate. Others have shown a close relationship between gradient boosted decision trees and BART point estimates. 7 The primary analysis uses these scores as an additional adjusting variable, making the procedure "doubly robust" in that it can produce low-bias estimates if either the outcome model or exposure model is well-specified. 8 Although the practical performance of doubly robust Bayesian regression which includes propensity scores is well known, the theoretical rationale is usually not clear. 9 In the case of BART, which performs well without modification for causal inference in many circumstances, 10 it can be viewed as a re-parameterization which more easily captures selection bias based on perceived risks. 8,11 Propagation of uncertainty in propensity was not used as it has not been shown to improve performance. 9 Cross-validation using an 80% subsample was used to select hyperparameters. This strategy empirically has low bias and total error for causal effects compared to multiple comparators. 12 A histogram of propensity to OSA is shown in eFigure 5.
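As a minimal sketch of the pooling step for frequentist estimates, Rubin's rules average the per-imputation point estimates and combine the within-imputation and between-imputation variances. The function name `pool_rubin` and the example numbers are hypothetical; they are not taken from the study's code or results.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool per-imputation estimates and their squared standard errors."""
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need matching lists from at least two imputations")
    pooled = mean(estimates)         # pooled point estimate
    within = mean(variances)         # average within-imputation variance
    between = variance(estimates)    # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return pooled, math.sqrt(total)


# Hypothetical log-odds estimates and squared standard errors from 3 imputations
estimate, std_error = pool_rubin([0.10, 0.14, 0.12], [0.0025, 0.0025, 0.0025])
```

The pooled standard error is never smaller than the average within-imputation standard error, because disagreement between imputed datasets is added as extra variance.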
BART is a Bayesian procedure and therefore produces credible intervals instead of confidence intervals. The credible interval is the region of the posterior with the highest density containing 99% of the probability for the calculated quantity. Because it is non-parametric, the prior is over the functional form of fitted surfaces in the form of splitting probabilities for decision trees, and the "prior" for treatment effects is only indirectly induced. The meaning of the BART prior has been discussed extensively elsewhere. 10,11 While there are guarantees of concordance between credible intervals and confidence intervals only in some specialized cases, others have observed that BART estimated treatment effects have good frequentist properties as well. 12
The association of PAP with delirium was analyzed within OSA+ patients only. Patients without OSA (not eligible for PAP therapy) remained excluded even if they used BPAP or CPAP for respiratory failure. The "PAP effect" therefore compares "OSA+, PAP-" to "OSA+, PAP+"; because the PAP+ patients are a subset of OSA+, no interaction term is required. Because the OSA-restricted sample size is small and because PAP adherence was so frequently missing, we also experimented with adding a "missing" factor level for adherence, with a corresponding term in the model, and with including OSA- patients in the model. "Non-adherent" remained the reference level, and reported effects are for the "adherent vs non-adherent" contrast. Inclusion of these observations with unknown exposure level does not directly contribute to the estimate; however, they have an indirect effect of stabilizing the estimated effects of covariates.
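The highest-density credible interval described for BART above can be approximated from posterior draws as the narrowest window that covers the requested share of the sorted draws. The sketch below assumes a unimodal posterior; the function name and the simulated draws are illustrative only and do not reproduce the interval routine of any BART package.

```python
import math
import random


def highest_density_interval(samples, mass=0.99):
    """Narrowest interval containing `mass` of the posterior draws.

    Assumes a unimodal posterior, so the highest-density region is one interval.
    """
    draws = sorted(samples)
    n = len(draws)
    if n == 0 or not 0 < mass <= 1:
        raise ValueError("need at least one draw and 0 < mass <= 1")
    k = math.ceil(mass * n)  # number of draws the interval must cover
    # Slide a window of k consecutive sorted draws and keep the narrowest one.
    start = min(range(n - k + 1), key=lambda i: draws[i + k - 1] - draws[i])
    return draws[start], draws[start + k - 1]


# Hypothetical posterior draws of a treatment effect
rng = random.Random(0)
posterior = [rng.gauss(0.0, 1.0) for _ in range(5000)]
lower, upper = highest_density_interval(posterior, mass=0.99)
```

For a symmetric posterior this interval is close to the equal-tailed interval; for a skewed posterior it shifts toward the mode and is shorter.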
During the preparation of the manuscript, Hahn and colleagues released the bcf package implementing their methods. However, bcf did not accommodate binary outcomes, so the presented results use a customized BART 1.7 validated on the examples of Hill (data not shown). 10 The optmatch package performed matching; mice performed imputation, and the margins package estimated marginal effects for generalized linear models.

eAppendix 5. Sensitivity Analyses and Alternative Analytic Approaches
We validated these calculations by comparison to other methods. These included propensity score matching with 1:1 optimal matching 13 using BART or logistic regression to generate scores instead of gradient boosting, adjustment for covariates in classic BART, classic BART plus propensity scores as a covariate, and logistic regression. We conducted several sensitivity analyses:
1) We performed imputation and estimated an alternative propensity score using the entire perioperative dataset with an indicator variable for non-ICU admission.
2) We restricted the analysis to only patients with pre-specified surgical service (an option field serving as a marker of a complete anesthesia assessment) or only patients whose evaluation was performed in preoperative clinic.
3) We added all patients otherwise meeting inclusion criteria but never CAM-ICU assessed, coding them in turn as all "negative," as all "positive," and with a random draw from fitted values from a model excluding OSA status.
4) We added patients otherwise meeting inclusion criteria but not admitted to the ICU with a "negative" value for delirium. These patients were selected using a nonparametric model of ICU admission (see eAppendix 3) and 1:1 matched to admitted patients with a similar propensity for admission (20 strata), the same OSA status, and the same CCS level-2 primary procedure code.
5) We excluded components of STOP-BANG (hypertension, gender, BMI, age) as covariates.
6) We imputed missing data without including outcomes as a covariable.
7) We defined the exposure as OSA diagnosis only (not using STOP-BANG).
8) We added potential mediators of an OSA-delirium relationship as covariates (intra- and postoperative benzodiazepine use, total intraoperative opioid use in oral morphine equivalents, postoperative ventilation, postoperative positive airway pressure use).
9) We evaluated a regression of STOP-BANG score versus postoperative delirium only among patients without an OSA diagnosis.
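Sensitivity analysis 3 can be pictured as recomputing the comparison under three ways of filling in the never-assessed patients. The sketch below is a simplification under stated assumptions: a crude risk difference stands in for the adjusted models used in the study, and the function names, the toy records, and the fitted probabilities are all hypothetical.

```python
import random


def risk_difference(rows):
    """Delirium risk in OSA+ minus risk in OSA-; rows are (osa, delirium) pairs."""
    def risk(group):
        outcomes = [delirium for osa, delirium in rows if osa == group]
        return sum(outcomes) / len(outcomes)
    return risk(1) - risk(0)


def fill_unassessed(assessed, unassessed, strategy, rng=None):
    """Append never-assessed patients with an assigned delirium status.

    assessed:   list of (osa, delirium) with delirium in {0, 1}
    unassessed: list of (osa, fitted_probability), where the probability is
                assumed to come from a model that excludes OSA status
    strategy:   "negative", "positive", or "draw"
    """
    rng = rng or random.Random(0)
    filled = list(assessed)
    for osa, prob in unassessed:
        if strategy == "negative":
            status = 0
        elif strategy == "positive":
            status = 1
        elif strategy == "draw":
            status = 1 if rng.random() < prob else 0
        else:
            raise ValueError(f"unknown strategy: {strategy}")
        filled.append((osa, status))
    return filled


# Toy records: 4 assessed patients and 3 never-assessed patients
assessed = [(1, 1), (1, 0), (0, 1), (0, 0)]
unassessed = [(1, 0.3), (0, 0.3), (0, 0.3)]
rd_negative = risk_difference(fill_unassessed(assessed, unassessed, "negative"))
rd_positive = risk_difference(fill_unassessed(assessed, unassessed, "positive"))
rd_draw = risk_difference(fill_unassessed(assessed, unassessed, "draw"))
```

The all-negative and all-positive fills are extreme scenarios, while the random draw gives a scenario in which missingness carries no information about OSA.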
Several of the covariates are components of STOP-BANG; eAppendix 7 contains the effective contrasts being drawn when conditioning on BMI, age, sex, and history of hypertension. The effect of STOP-BANG was evaluated by comparison of fitted values counterfactually setting it to "0".

eAppendix 6. Hyperparameters, Tuning, and Variable Importance
Duration of Markov chain Monte Carlo sampling was increased until Geweke diagnostics and autocorrelation-based effective sample sizes were found to be adequate. During tuning of OSA status prediction we used 20,000 posterior draws after a burn-in of 5,000 draws on a singly imputed dataset with an 80:20 cross validation split. During final propensity score calculation, we used 15,000 draws after a burn-in of 5,000 draws for each of 30 imputed datasets. During PAP prediction tuning we used 50,000 posterior draws after a burn-in of 10,000. During BCF and BART prediction of delirium we used 30,000 posterior draws after a burn-in of 5,000 in each of 30 imputed datasets.
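To illustrate what an autocorrelation-based effective sample size measures, the sketch below divides the chain length by one plus twice the sum of the lag autocorrelations. Stopping the sum at the first non-positive autocorrelation is a simple truncation rule chosen for this example, not necessarily the rule used by the study's software, and the simulated chains are hypothetical. The Geweke diagnostic is not shown.

```python
import random
from statistics import mean


def effective_sample_size(chain):
    """Autocorrelation-based effective sample size of a single MCMC chain."""
    n = len(chain)
    mu = mean(chain)
    centered = [x - mu for x in chain]
    var = sum(c * c for c in centered) / n
    if var == 0:
        raise ValueError("chain is constant")
    rho_sum = 0.0
    for lag in range(1, n):
        acov = sum(centered[i] * centered[i + lag] for i in range(n - lag)) / n
        rho = acov / var
        if rho <= 0:  # truncate at the first non-positive autocorrelation
            break
        rho_sum += rho
    return n / (1 + 2 * rho_sum)


def ar1_chain(n, phi, seed):
    """Simulate a hypothetical first-order autoregressive chain for demonstration."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out


ess_independent = effective_sample_size(ar1_chain(2000, 0.0, seed=1))
ess_sticky = effective_sample_size(ar1_chain(2000, 0.9, seed=1))
```

A strongly autocorrelated chain yields far fewer effective draws than its nominal length, which is why the number of posterior draws is increased until the effective size is adequate.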
For the prediction of OSA status, cross-validation of AUROC selected the following BART parameters: k = 2.76, n_tree = 248. For the prediction of PAP adherence, k = 2.675, n_tree = 59. A sparse Dirichlet prior made no meaningful improvement in AUROC (difference of 0.002) and substantially increased computing requirements. For predicting delirium with only the OSA propensity score, the optimal k was 3.1 and n_tree was 15. For predicting delirium with BCF, k = 2.15, n_tree = 231, and ntree_treated = 41. For pure BART predicting delirium, k = 2.5 and n_tree = 162.
We kept the power and base parameters of the split probability at their defaults (2, 0.95). We observed minimal dependence of the effect estimates on these hyperparameters. For example, using default parameters (k=2, ntree=200 or ntree=50) did not change the point estimate for the effect of OSA by more than 0.01 but did widen credible intervals by up to 0.03.
Predictor relevance was assessed using loss in predictive accuracy when permuting the predictor, coverage probabilities, improvement in splitting, and mean absolute Shapley value. 14 For the matched analysis, we used 1:1 optimal matching with a caliper of 1 on the logistic scale. In order to successfully match all exposed, in the analysis of PAP adherence the caliper was increased to 1.5.
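The role of the caliper in 1:1 matching can be shown on a toy example. The study used the optmatch package; the sketch below is not that implementation. It finds the minimum-total-distance matching by exhaustive search, which is only feasible for a handful of patients, and it assumes the caliper is applied to the absolute difference in logit propensity. All names and values are hypothetical.

```python
import math
from itertools import permutations


def logit(p):
    return math.log(p / (1 - p))


def optimal_caliper_match(exposed_ps, control_ps, caliper=1.0):
    """Exhaustive 1:1 matching minimizing total logit-scale distance.

    Returns (exposed_index, control_index) pairs, or None when no assignment
    keeps every pair within the caliper. Only practical for tiny inputs.
    """
    best_cost, best_pairs = math.inf, None
    for chosen in permutations(range(len(control_ps)), len(exposed_ps)):
        gaps = [abs(logit(exposed_ps[i]) - logit(control_ps[j]))
                for i, j in enumerate(chosen)]
        if all(gap <= caliper for gap in gaps) and sum(gaps) < best_cost:
            best_cost, best_pairs = sum(gaps), list(enumerate(chosen))
    return best_pairs


# Hypothetical propensity scores for 2 exposed and 3 unexposed patients
pairs = optimal_caliper_match([0.5, 0.6], [0.55, 0.5, 0.9], caliper=1.0)

# One exposed patient with no control inside the caliper until it is widened
unmatched = optimal_caliper_match([0.5], [0.9], caliper=1.0)
widened = optimal_caliper_match([0.5], [0.9], caliper=2.5)
```

The last two calls mirror the trade-off described above: a tighter caliper can leave exposed patients unmatched, and widening it matches everyone at the cost of less similar pairs.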
Propensity-adjusted sample comparisons and standardized differences were computed using inverse propensity weights.
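A weighted standardized difference can be sketched as follows. The example assumes ATE-type weights (1/e for exposed and 1/(1 - e) for unexposed patients, where e is the propensity score) and a pooled standard deviation built from the weighted group variances; the study's exact weighting and variance conventions are not stated here, so both choices, along with the function names and data, are assumptions for illustration.

```python
import math


def weighted_mean_var(values, weights):
    """Weighted mean and weighted (population-style) variance."""
    total = sum(weights)
    mu = sum(w * v for v, w in zip(values, weights)) / total
    var = sum(w * (v - mu) ** 2 for v, w in zip(values, weights)) / total
    return mu, var


def weighted_std_diff(x, exposed, propensity):
    """Standardized difference of covariate x after inverse propensity weighting."""
    groups = {1: ([], []), 0: ([], [])}  # exposure -> (values, weights)
    for xi, ti, ei in zip(x, exposed, propensity):
        weight = 1 / ei if ti == 1 else 1 / (1 - ei)
        groups[ti][0].append(xi)
        groups[ti][1].append(weight)
    m1, v1 = weighted_mean_var(*groups[1])
    m0, v0 = weighted_mean_var(*groups[0])
    return (m1 - m0) / math.sqrt((v1 + v0) / 2)


# Hypothetical binary covariate that drives exposure (propensity 0.25 or 0.75)
x = [0, 1, 1, 1, 0, 0, 0, 1]
exposed = [1, 1, 1, 1, 0, 0, 0, 0]
propensity = [0.25, 0.75, 0.75, 0.75, 0.25, 0.25, 0.25, 0.75]
balanced = weighted_std_diff(x, exposed, propensity)

# With a constant propensity the weights cancel and the raw difference remains
unadjusted = weighted_std_diff([1, 3, 0, 2], [1, 1, 0, 0], [0.5, 0.5, 0.5, 0.5])
```

In the first example the covariate is imbalanced in the raw data (mean 0.75 vs 0.25), and weighting by the true propensity brings both weighted means to 0.5.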

eAppendix 7. STOP-BANG Contrasts
Adjusting for covariates always explains away some of the exposure, but in the case where the exposure is a direct function of some covariates, the implicit contrast being drawn by the observed effect of the residual exposure is somewhat complicated.
Take the high-risk STOP-BANG screens as the clearest example. The goal is to compare outcome rates between patients with high-risk screens and those without. Elements of the screen (PBAG=BMI, gender, hypertension, age) also have likely direct effects on the outcomes, and so we adjust for them as if they were confounders. If we then say STOP-BANG > 4 is the threshold, the net contrasts being drawn are then within strata of PBAG using the number of other factors (SNOT = snoring, tiredness, observed, neck circumference)