Analysis of Benefit of Intensive Care Unit Transfer for Deteriorating Ward Patients

Key Points
Question: Which deteriorating ward patients benefit from intensive care unit (ICU) transfer?
Findings: This analysis of a cohort study of 4596 deteriorating ward patients used an instrumental variable approach to estimate person-centered effects of ICU transfer on mortality. The reduction in the risk of 28-day mortality with ICU transfer was greater among patients who were older than 75 years and had greater illness severity (National Early Warning Scores >6).
Meaning: The instrumental variable approach in this study found that benefits of ICU transfer may increase with age and baseline physiology score.


Overview
Near-far matching is a matched-pair instrumental variable (IV) study design that aims to reduce weak instrument bias by ensuring that, within the matched pairs, units are 'far' apart on the instrument but 'near' according to other baseline covariates 1-3. Hence there are I matched pairs, with the units in each pair indexed by j = 1, 2, which are 'near' according to observed covariates (x), i.e. xi1 ≈ xi2, but 'far' according to the instrument (Z), implying that (Zi1 - Zi2) is large for each matched pair i = 1,…,I.

Application of the near-far matching algorithm to strengthen the IV in the (SPOT)light study, while balancing baseline prognostic variables
We use the same matched data as a previous paper that assessed the overall effectiveness of ICU transfer using the (SPOT)light data. For full details of the matching algorithm, interested readers are referred to that paper. The match was based on a Mahalanobis distance metric which is robust to low-incidence binary variables and variables with highly skewed distributions 5. For two covariates, the current level of care and the recommended level of care, a small fraction of data were missing. Instead of imputing these missing values based on a model, a method recommended by Rosenbaum (2010) 5 was used. Missing values were imputed using the mean for that covariate, and a separate indicator for whether the value was missing was created. The imputed values were included in the distance calculation. The indicators for missing data were subsequently included in the match to ensure that the rate of missingness was balanced across comparison groups.
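The mean-imputation-plus-indicator step can be sketched as follows. This is a minimal illustration rather than the code used in the study; the function name and the example values are hypothetical.

```python
from statistics import mean

def impute_with_indicator(values):
    """Mean-impute missing entries (None) and flag which entries were missing."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)  # mean of the observed values for this covariate
    imputed = [fill if v is None else v for v in values]
    flags = [1 if v is None else 0 for v in values]  # 1 = originally missing
    return imputed, flags

# Hypothetical care-level codes with two missing entries
care_level = [0, 1, None, 2, 1, None]
imputed, flags = impute_with_indicator(care_level)
```

The imputed column would enter the distance calculation, while the indicator column would be balanced in the match so that missingness rates are similar across comparison groups.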
Next, to obtain the near-far match, Keele et al (2019) matched with a reverse caliper 1 so that only those matches where the difference in the instrumental variable (Zi1 - Zi2) exceeded a threshold, Λ, were acceptable. The larger the required difference in the instrumental variable, the more difficult it may be to find units that are similar according to observed covariates. Hence, to balance covariates while strengthening the instrument, it may be necessary to remove some observations. Here, the matching algorithm used optimal subset matching 6, which seeks to find the largest set of matched pairs such that the average matched distance within a pair does not exceed a particular distance threshold.
This distance threshold can also be viewed as a penalty, describing the cost of excluding a treated individual from the match. In general, when the threshold is sufficiently large, the whole sample meets the average distance criterion, so that the match does not exclude anyone (at the cost of greater covariate imbalance); as the threshold is decreased, more units are excluded so that only the closest pairs are retained in the match.
Keele et al (2019) undertook a grid search which iterated over values of Λ and the distance threshold to produce those matches judged to lead to sufficient balance and instrument strength. In this study, Λ = 1.5 times the standard deviation of the instrumental variable (number of beds available) and a threshold of 1000 for the average matched distance were judged most appropriate. Choosing Λ = 1.5 resulted in the exclusion of those matched pairs in which the matched patients differed by fewer than three available ICU beds at the time of assessment. Hence, patients were excluded from the matched data if it was not possible to find a corresponding patient with similar covariates but a difference of at least three ICU beds available at assessment. The net result was that the sample size was reduced from 13,011 (unmatched) to 9,192 patients (4,596 matched pairs), but the characteristics of the patients included in the matching were similar to those excluded (eTable 2).
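To make the reverse caliper concrete, the sketch below pairs units greedily. It is a deliberate simplification and not the algorithm used in the study: it uses a standardized Euclidean distance in place of the Mahalanobis distance, a per-pair distance cap in place of the average-distance criterion, and greedy rather than optimal subset matching. All names and data are hypothetical.

```python
from itertools import combinations
from statistics import pstdev

def near_far_pairs(x, z, lam=1.5, max_dist=1.0):
    """Greedy near-far pairing.

    x: list of covariate tuples; z: list of instrument values.
    A pair is eligible only if the units differ on the instrument by at
    least lam * SD(z) (reverse caliper) and are within max_dist on x.
    """
    sd_z = pstdev(z)
    k = len(x[0])
    # Per-covariate SD for scaling; fall back to 1.0 for constant columns
    sd_x = [pstdev([row[c] for row in x]) or 1.0 for c in range(k)]

    def dist(i, j):
        return sum(((x[i][c] - x[j][c]) / sd_x[c]) ** 2 for c in range(k)) ** 0.5

    candidates = []
    for i, j in combinations(range(len(x)), 2):
        d = dist(i, j)
        if abs(z[i] - z[j]) >= lam * sd_z and d <= max_dist:
            candidates.append((d, i, j))

    used, pairs = set(), []
    for _, i, j in sorted(candidates):  # closest eligible pairs first
        if i not in used and j not in used:
            pairs.append((i, j))
            used.update((i, j))
    return pairs

# Hypothetical (age, NEWS) covariates and number of beds available
x = [(50, 5), (51, 5), (70, 9), (71, 9), (30, 2)]
z = [0, 6, 1, 7, 3]
pairs = near_far_pairs(x, z)
```

In this toy example the fifth unit has no partner that is both similar on covariates and far on the instrument, so it is left unmatched, which mirrors how strengthening the instrument can shrink the matched sample.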

Performance of the matching algorithm
Following the near-far matching, the balance of the matched data according to the level of the instrument was assessed for each baseline covariate. First, the standardized mean differences were reported across two groups defined as the subsamples of patients who were admitted when there were 'many' versus 'few' beds available. Here the results show that the standardized differences were relatively low (all less than 10) (see main text, Table 1). Second, the levels of the covariates (rescaled by their standard deviations) were compared according to the levels of the IV, the number of beds available. If the covariates influencing outcomes tend not to vary across levels of the IV (as is the case here; see eFigure 1), this increases confidence that the IV only influences outcomes through its influence on the likelihood of ICU transfer. Overall, the near-far matching algorithm provides good covariate balance across the levels of the IV.
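A standardized difference of this kind can be computed as in the sketch below. It assumes the common convention of dividing the difference in means by the pooled standard deviation and expressing the result as a percentage; the source does not state which convention was used, and the ages shown are hypothetical.

```python
from statistics import mean, variance

def standardized_difference(a, b):
    """Difference in means as a percentage of the pooled standard deviation."""
    pooled_sd = ((variance(a) + variance(b)) / 2) ** 0.5
    return 100 * (mean(a) - mean(b)) / pooled_sd

# Hypothetical ages for patients assessed with 'many' versus 'few' beds available
age_many = [68, 72, 75, 80, 65]
age_few = [67, 73, 74, 79, 66]
smd_age = standardized_difference(age_many, age_few)
```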
As intended, the near-far matching algorithm also increased the strength of the instrument, assessed using the Cragg-Donald Wald F-statistic for weak instruments and by comparing the proportion of patients transferred to ICU when there were 'many' versus 'few' beds available. In the near-far matched data, the proportion of patients transferred to ICU was 10.3 percentage points higher when 'many' beds were available (43.4%) than when 'few' beds were available (33.1%) and the Cragg-Donald Wald F-statistic for weak instruments was 70.962. For comparison, when the instrument is not strengthened, that is when a 'near' matching algorithm was used instead, the difference in the proportion of patients transferred to ICU was only 7.1 percentage points (33.9% versus 41%) and the Cragg-Donald Wald F-statistic was 63.247.
According to these results the near-far matching algorithm balanced the observed covariates, that is: age, gender, NEWS, SOFA and ICNARC physiology scores, CCMDS level at assessment, and timing (out of hours, winter, and weekend or not), while increasing the imbalance for the IV as desired (STROBE checklist).
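The two instrument-strength summaries used above can be illustrated with a small sketch. It assumes a binary 'many' versus 'few' beds indicator, no covariates, and hypothetical group sizes of 1,000 each, so the F-statistic it produces is not comparable with the values reported in the text. With a single instrument and a single endogenous treatment, the Cragg-Donald statistic coincides with the first-stage F-statistic, which is what the function computes.

```python
def first_stage(d, z):
    """OLS of treatment d on a single instrument z.

    Returns the slope (difference in transfer proportions when z is binary)
    and the F-statistic for the hypothesis that the slope is zero.
    """
    n = len(d)
    mz, md = sum(z) / n, sum(d) / n
    sxx = sum((zi - mz) ** 2 for zi in z)
    sxy = sum((zi - mz) * (di - md) for zi, di in zip(z, d))
    beta = sxy / sxx
    alpha = md - beta * mz
    rss = sum((di - alpha - beta * zi) ** 2 for zi, di in zip(z, d))
    sigma2 = rss / (n - 2)  # residual variance
    return beta, beta ** 2 * sxx / sigma2

# Hypothetical sample: 43.4% transferred with 'many' beds, 33.1% with 'few'
z = [1] * 1000 + [0] * 1000
d = [1] * 434 + [0] * 566 + [1] * 331 + [0] * 669
beta, f_stat = first_stage(d, z)
```

Here `beta` reproduces the 10.3 percentage point difference in transfer proportions, and `f_stat` grows with both that difference and the sample size.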
We now describe the intuition behind the particular IV approach taken to report the effectiveness of ICU transfer for deteriorating ward patients.

eAppendix 2: The intuitive ideas behind essential heterogeneity, marginal treatment effects (MTEs) and person-centered treatment (PeT) effects

Essential heterogeneity
Studies examining the impact of ICU care often distil comparisons down to a single number that represents the average incremental benefit or harm 7,8 . This approach ignores evidence that there is substantial variability in the case-mix of patients admitted to ICU, and the effectiveness of ICU care may be heterogeneous. A further challenge is that the selection of patients for ICU transfer is according to risk factors that modify the effectiveness of ICU care, and many of these factors, such as the patient's frailty or pre-admission health status may remain unmeasured. In particular, clinicians may select those patients for ICU transfer according to their anticipated gain in health outcome, but their health status at assessment is not measured in the data recorded.
These issues limit the usefulness of traditional approaches that report the effectiveness of ICU care overall, or for a limited range of measured patient subgroups. Together, heterogeneous effects of ICU transfer and selection into ICU based on anticipated gains are termed 'essential heterogeneity'.
In the presence of this 'essential heterogeneity', previous methodological research has shown that traditional IV regressions tend to estimate a local average treatment effect parameter that is often not interpretable or of clinical relevance. In particular, the resultant estimate only applies to the ill-defined subgroup who would have switched treatment modality according to a change in the level of the instrument 8-15.
We address these concerns with a recently developed econometric methodology that uses an instrumental variable (IV) to address selection biases in observational studies and establish person-centered treatment (PeT) effects 16-18. PeT effects estimate an average treatment effect for each person in the data, conditioning on their observed characteristics and the level of the IV, and crucially accounting for their individualized distribution of unobserved heterogeneity (see next section). Consequently, such individualized effects can help answer distributional questions on effectiveness, such as examining the benefits and harms of ICU care versus care in a general ward, and identifying subgroups that are most likely to benefit from such care.

Marginal treatment effects (MTEs)
To provide the intuition behind these concepts, we pose a clinical question: is transfer to ICU effective for deteriorating ward patients? Some clinicians believe that the effectiveness of ICU care differs according to patients' underlying health condition and age, and select patients for ICU accordingly. One challenge for the analysis is that not all of the required variables are measured.
Suppose the available dataset contains for each patient: age, mortality, and a valid instrumental variable, the number of ICU beds available at assessment (NBA). The fewer the NBA, the less likely the patient is to be transferred to ICU.
A local instrumental variable (LIV) approach can be used to overcome the problem of essential heterogeneity when a multivalued instrument, such as NBA, is available. LIV methods are used to estimate the marginal treatment effects (MTEs) parameters. MTEs are the treatment effects for those individuals for whom the influence of the observed characteristics (age and NBA in the stylized example), balance with those of the unobserved confounders (medical history) on the decision to transfer the patient, such that the clinician is indifferent to the decision as to whether or not to transfer the patient to ICU (see Figure 1 in main text).
To estimate an MTE, the LIV approach compares the outcomes of two groups of similar patients (say aged 50), where one group is faced with a constraint of d beds available and the other a slightly weaker constraint d+ε, with ε representing a slight increase in beds available. These two groups of patients should be identical with respect to the distribution of their risk factors (observed and unobserved) provided NBA is independent of all risk factors affecting outcomes. Hence the individual's propensity for transfer is identical, beyond the slight difference in the constraint according to the number of beds. By definition, this independence assumption will hold if NBA is a valid instrumental variable (assumption 2 below). The decision as to whether to transfer these similar patients to ICU is only according to the NBA, which does not directly influence outcomes (assumption 2 below). Therefore any difference in average outcomes between these two groups is only driven by the receipt of ICU care or not, for this margin of patients where the clinicians were indifferent between transfer and not, but were nudged to transfer by the small perturbation of the instrumental variable, i.e. NBA.
For this margin of patients, we can quantify a normalized level of unobserved confounders that was sufficient to balance their observed confounders at the considered level of NBA (d). Here, normalized means a scalar score that represents a balancing score for unobserved risk factors, irrespective of their empirical distributions. One can think of the normalized level of unobserved confounders as the propensity not to transfer the patient based on unobserved confounders. If the observed and unobserved risk factors do not balance, then the small perturbation induced by the nudge would have been inadequate to affect treatment selection. The obvious but crucial consequence is that we can use this insight to quantify the effect of the unmeasured confounders on treatment selection.
By definition for marginal patients, the propensity to transfer the patient to ICU equals the propensity not to. The difference in average outcomes between the two groups of similar patients (e.g. aged 50) represents the marginal treatment effect (MTE) for those patients at that particular normalized level of unobserved confounders.
Similarly, for another dyad of NBA, d' and d'+ε, one can estimate another MTE at another normalized level of unobserved confounder. In this way, a full schedule of MTEs can be estimated that vary over the unobserved confounder levels (i.e. past medical history here) given the level of the observed confounders (i.e. age here). MTEs can be calculated by considering different values of the observed covariates, which will imply different values of the normalized unobserved confounders. Once MTEs are estimated over the range of observed and normalized unobserved confounder levels, they can then be aggregated to form meaningful treatment effect parameters such as the ATE, CATEs, ATT and ATC.
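As a hedged illustration of this aggregation step, write MTE(x, v) for the marginal treatment effect at observed covariates x and normalized unobserved confounder level v. This notation is assumed here for illustration, with v scaled to lie between 0 and 1. The conditional and overall average treatment effects then follow by averaging over v with equal weight, and subsequently over the distribution of the observed covariates:

```latex
\mathrm{CATE}(x) = \int_0^1 \mathrm{MTE}(x, v)\, dv,
\qquad
\mathrm{ATE} = E_X\!\left[\mathrm{CATE}(X)\right]
```

The ATT and ATC draw on the same MTE schedule but weight the values of v unequally, reflecting which levels of the unobserved confounder are more common among treated and untreated patients respectively.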

Person-centered treatment (PeT) effects
The MTEs can also be aggregated to study heterogeneity in effects using person-centered treatment (PeT) effects. PeT effects are obtained by averaging the MTEs over only the normalized levels of the unobserved confounder (e.g. medical history) that conforms with the observed decision whether or not to transfer the patient to ICU. Intuitively, if based on a patient's observed information (age), it is unlikely that they would be transferred to ICU, but we observe that they are in fact transferred, this conveys useful information about their unobserved confounders (e.g. medical history) (See eFigure 2 for further details). Accordingly, when averaging the MTEs to estimate an effect for this patient conditional on their observed covariates, we would not consider MTEs that imply values of the unobserved confounders that are incompatible with the observed transfer decision. By taking account of the individual's context in this manner, the PeT effect is more personalized than CATEs which average across all of the MTEs conditional on the observed covariates.
We next consider the formal models on which the estimation of PeT effects is based.

eAppendix 3: Formal models behind essential heterogeneity, marginal treatment effects and person-centered treatment (PeT) effects

Structural models
We start by formally developing structural models of outcomes and treatment choice 11,12. There are two treatment states (D = 0 or 1): transfer to ICU (the treated state), denoted by D = 1, and continued care in a general ward (the untreated state), denoted by D = 0. The corresponding potential individual outcomes (YD) in these two states are denoted by Y1 and Y0 and can be defined as:
$$Y_1 = \mu_1(X_O, X_U, \zeta), \qquad Y_0 = \mu_0(X_O, X_U, \zeta)$$
where $X_O$ is a vector of observed random variables, $X_U$ is a vector of unobserved random variables which are also believed to influence treatment selection (they are the unobserved confounders), and $\zeta$ is an unobserved random variable that captures all the remaining unobserved random variables which influence outcomes but not treatment selection. We assume that observed covariates are exogenous (assumption 1a and 1b), implying that endogeneity only arises through the decision of whether or not to transfer the patient to ICU (D):

Assumption 1: (a) [...]

We assume the existence of an instrumental variable, Z, that influences whether the patient is transferred to ICU (assumption 2a), but is independent of the unobserved confounders (assumption 2b): [...]

To estimate the probability of being in treatment state 1 consistently requires that the instrument is conditionally independent of the unobserved covariates influencing the transfer decision (assumption 3):

Assumption 3: [...]

Marginal Treatment Effects
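An MTE is perhaps the most nuanced estimable effect. It identifies an effect for an individual whose observed characteristics and unobserved confounders balance, so that they are at the margin of the transfer decision. It can be estimated using the LIV estimand:
$$\mathrm{MTE}(x_o, v) = E(Y_1 - Y_0 \mid X_O = x_o, V = v) = \left.\frac{\partial E(Y \mid X_O = x_o, P = p)}{\partial p}\right|_{p = v}$$
where Y1 and Y0 are the outcomes in states 1 and 0, Y = D*Y1 + (1 - D)*Y0 is the observed outcome and p is the propensity score.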

Person Centered Treatment Effects
For a particular individual we will not observe v, meaning we cannot estimate their treatment effect $E(Y_1 - Y_0 \mid x_o, v)$, and they may not be marginal (i.e. $v \neq P(x_o, z)$), making the MTE inappropriate.
However, their actual treatment assignment allows us to infer that $v < P(x_o, z)$ if we observe they were transferred to ICU and $v \geq P(x_o, z)$ if they remained in a general ward. This insight allowed Basu to define PeT effects, which average the MTEs only over the values of v that are consistent with the observed transfer decision.
To implement the PeT method on the matched sample, we first estimate the probability of being in the treated group (D = 1) conditional on the covariates and the IV (number of ICU beds available) and predict the propensity score for each individual in the matched sample, $\hat{P}(x_o, z)$. We require that this propensity score has mass at every value (rounded to 0.01) for both levels of the exposure, so observations at values of the rounded propensity score that do not meet this criterion are dropped.
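The common-support requirement on the rounded propensity score can be sketched as follows. The function name and the example values are hypothetical, and this is one plausible reading of the rule rather than the study's code.

```python
from collections import defaultdict

def common_support_mask(pscores, treated):
    """Flag observations whose propensity score, rounded to 0.01,
    is observed in both the treated (1) and untreated (0) groups."""
    bins = [round(p * 100) for p in pscores]  # integer bins avoid float keys
    groups_seen = defaultdict(set)
    for b, d in zip(bins, treated):
        groups_seen[b].add(d)
    return [groups_seen[b] == {0, 1} for b in bins]

# Hypothetical predicted propensity scores and observed transfer decisions
pscores = [0.101, 0.104, 0.25, 0.252, 0.9]
treated = [1, 0, 1, 1, 0]
keep = common_support_mask(pscores, treated)
```

Only the first two observations share a rounded score (0.10) across both groups, so the remaining three would be dropped.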
Next we determine an appropriate model for the outcome equation, g(.). Since our outcome of interest, mortality, is binary we use a probit model for this equation also. We include the baseline covariates, hospital fixed effects, the propensity score and also include interactions between the covariates and the propensity score as suggested in Basu (2015) 17 . The hospital fixed effects are not interacted with the propensity score to preserve degrees of freedom.
After specifying this equation, which is the second stage of an LIV estimand (Eq. 5), we obtain marginal treatment effects using numerical integration. To do so, we compute the marginal treatment effect, $\partial \hat{g}(\cdot)/\partial \hat{p}$, and then evaluate it 1,000 times replacing $\hat{p}$ by a random draw, u, from a [...] giving the average treatment effect on the untreated (ATUT) and also by age category, NEWS score, NEWS risk category, ICNARC score, SOFA score, and age category combined with each of the physiology measures.
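The idea of averaging MTEs over values of the unobserved confounder that are consistent with the observed transfer decision can be sketched as a simple Monte Carlo integration. The sketch assumes uniform draws restricted to the consistent region (below the propensity score for transferred patients, above it otherwise); the MTE schedule `toy_mte` is entirely hypothetical and stands in for the derivative of a fitted outcome model.

```python
import random

def pet_effect(mte, p_hat, d, n_draws=1000, rng=None):
    """Average an MTE schedule over draws of u consistent with decision d.

    mte: function mapping u in (0, 1) to a treatment effect
    p_hat: predicted propensity score for this individual
    d: observed treatment (1 = transferred to ICU, 0 = remained on ward)
    """
    rng = rng or random.Random(0)
    low, high = (0.0, p_hat) if d == 1 else (p_hat, 1.0)
    draws = [rng.uniform(low, high) for _ in range(n_draws)]
    return sum(mte(u) for u in draws) / n_draws

def toy_mte(u):
    # Hypothetical schedule: mortality benefit shrinks as u increases
    return -0.20 + 0.30 * u

pet_transferred = pet_effect(toy_mte, p_hat=0.4, d=1)
pet_not_transferred = pet_effect(toy_mte, p_hat=0.4, d=0)
```

Two patients with the same observed covariates and propensity score therefore receive different PeT estimates depending on whether they were transferred, because the observed decision narrows the plausible range of their unobserved confounders.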
To obtain standard errors for the PeT effect estimates, we use a bootstrap approach, where we resample with replacement 1,000 times and for each bootstrap sample we repeat the entire process outlined above including estimation of the propensity score and the outcome models. The standard deviation of the individuals' (or sub groups') bootstrapped estimates represents the standard error of the PeT estimate.
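The bootstrap logic can be sketched generically as below. In the study each replicate repeats the full pipeline (propensity score, outcome model and PeT calculation); here a simple mean of hypothetical binary outcomes stands in for that pipeline, and the function and variable names are illustrative.

```python
import random
from statistics import mean, stdev

def bootstrap_se(data, estimator, n_boot=1000, seed=0):
    """Standard error as the SD of an estimator across bootstrap resamples."""
    rng = random.Random(seed)
    n = len(data)
    estimates = []
    for _ in range(n_boot):
        # Resample n observations with replacement and re-estimate
        sample = [data[rng.randrange(n)] for _ in range(n)]
        estimates.append(estimator(sample))
    return stdev(estimates)

# Hypothetical 28-day mortality indicators for 100 patients
outcomes = [1] * 30 + [0] * 70
se_mortality = bootstrap_se(outcomes, mean)
```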
We applied a logistic regression analysis to explore which subgroups of patients were predicted to have an estimated reduction in the absolute risk of 28-day mortality following ICU transfer of greater (or less) than 10%, the magnitude of clinical benefit upon which the initial power calculation for the (SPOT)light study was based. Of the 9,068 patients, 3,472 (38.5%) had PeT estimates exceeding this threshold (PeT < -10%). The model included the following predictors: age, age squared, gender, diagnosis of sepsis, peri-arrest, NEWS, SOFA and ICNARC physiology scores, CCMDS level at assessment and recommended level, and timing (out of hours, winter, and weekend or not). Since PeT estimates for some patients are more precisely estimated than for others, the logistic regression was also estimated after weighting each individual's data by the inverse of the standard error of their PeT estimate. The results are presented in eTable 6. The patients predicted to benefit more from ICU transfer were those who were older, admitted during winter, during the week and within usual office hours, and with higher physiology scores indicating greater severity.
Model 4 removed the physiology scores completely; Model 5 included the physiology scores but did not include squared terms. Model 6 included interactions between the physiology scores but did not include squared terms, while Model 7 included physiology scores, squared terms and interactions.
As shown in eTable 10, the overall ATE is quite robust to these changes. eTable 11 reports similar sensitivity analyses for the conditional average treatment effects by subgroups for 90-day mortality.
ICU, intensive care unit; SD, standard deviation; CCMDS, Critical Care Minimum Dataset; ICNARC, Intensive Care National Research and Audit Centre; SOFA, Sequential Organ Failure Assessment; NEWS, National Early Warning Score. The NEWS score ranges from 0 (least severe) to 20 (most severe), the SOFA score from 0 (least severe) to 14 (most severe), and the ICNARC physiology score from 0 (least severe) to 100 (most severe).
