Assessment of Risk of Harm Associated With Intensive Blood Pressure Management Among Patients With Hypertension Who Smoke

This secondary analysis of the Systolic Blood Pressure Intervention Trial uses a random forest–based analysis to assess whether clinically important heterogeneity exists in the risk of harm associated with intensive blood pressure treatment among adults with hypertension who smoke.


Overview
This paper applies recent advances in machine learning for causal inference to conduct a post hoc analysis of a randomized controlled trial (RCT). The trial we focus on, the Systolic Blood Pressure Intervention Trial (SPRINT), demonstrated that treating non-diabetic adults to a lower systolic blood pressure target (<120 mm Hg) provides greater benefit than a more modest target (<140 mm Hg) (1). However, we hypothesized that the positive average treatment effect may mask clinically and policy-relevant heterogeneity.
Causally interpreting post hoc analyses of RCTs is challenging because investigators may test a large number of hypotheses but report only those with significant treatment effects. On the other hand, the small set of hypotheses pre-specified by investigators ex ante may leave clinically useful relationships between interventions, outcomes, and subgroups undiscovered.
Recognizing the limitations of conventional approaches to subgroup analyses, and the fact that many clinical trials are underpowered to detect meaningful treatment variation, a number of newer approaches to identifying heterogeneous treatment effects (HTEs) have been proposed (2). These include a class of more data-driven predictive risk modeling tools, such as classification and regression trees, which are typically most appropriate for early exploratory analyses.
The post hoc analysis method we employ, called causal forest, extends classical recursive partitioning methods (e.g., random forests) to identify causally relevant subgroups defined by interactions of many variables, a combinatorial task for which human intuition and expertise are poorly suited. The initial, and conceptually important, step is to randomly split the data into two independent halves, using the first partition for hypothesis generation and tree construction (training data) and preserving the remainder of the data for statistically valid inference (testing data). The method first identifies subgroups with similar treatment effects in the training data, then tests the most promising HTE hypotheses on the testing data to mitigate multiple testing concerns.
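As a concrete illustration of this first step, the honest sample split can be sketched in a few lines of Python. This is a hypothetical helper, not the study code; the constrained split-selection procedure described below adds further conditions on top of this basic random halving.

```python
import random

def honest_split(participants, seed=42):
    """Illustrative 'honest' sample split: one half for hypothesis
    generation (training), the other reserved for statistically valid
    inference (testing). Hypothetical helper, not the authors' code."""
    rng = random.Random(seed)
    ids = list(participants)
    rng.shuffle(ids)
    mid = (len(ids) + 1) // 2  # odd n: training half gets the extra participant
    return ids[:mid], ids[mid:]

# With SPRINT's n=9,361, the two halves have 4,681 and 4,680 participants
train_ids, test_ids = honest_split(range(9361))
print(len(train_ids), len(test_ids))  # 4681 4680
```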

Partitioning the data
The SPRINT data (n=9,361) were randomly divided into two equal subsets: a training set for machine learning–based hypothesis generation and a testing set for statistical inference–based hypothesis testing. To ensure the training data were representative of the whole data set, we constrained the split in two ways: the average treatment effect in the training data had to be within 1% of the originally reported overall hazard ratio for the primary outcome, and covariate distributions had to be balanced, both across the training and testing data and between treated and control groups within each partition, as assessed by entropy weight minimization. Specifically, to select an optimal split, one thousand random divisions of the full data were analyzed. For each split, a Cox proportional hazards regression was used to calculate the hazard ratio in the training data; splits with a training-data hazard ratio within ±1% of the originally reported hazard ratio (0.75) were evaluated further. For these splits, entropy weights were calculated to estimate covariate balance between the training and testing data, as well as the covariate distributions across treatment and control groups within the training data and the testing data, respectively. For each split, the variance was calculated for three weight vectors: the first compares covariate balance between the candidate training and testing data, the second compares covariate balance between the treatment groups of the candidate training and testing data, and the third compares covariate balance between the control groups. Each split was assigned three ranks, one per weight-vector variance; a composite rank was calculated for each data split by combining these three ranks, and the optimal split was the one with the lowest composite rank.
The final training data had 4,681 participants and the final testing data had 4,680 participants.
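The composite-ranking step above can be sketched as follows. This is a simplified illustration: the three variance scores per candidate split are placeholders for the entropy-weight variances described in the text, the Cox hazard-ratio filter is assumed to have already been applied, and the composite is taken as the sum of the three per-criterion ranks.

```python
import random

def composite_rank_selection(scores):
    """Pick the candidate split minimizing a composite of three balance
    ranks. `scores` is a list of (v1, v2, v3) tuples: for each candidate
    split, the variances of the three entropy-weight vectors (placeholder
    values here; the paper derives them from entropy balancing)."""
    n = len(scores)
    composite = [0] * n
    for k in range(3):  # rank candidates separately on each variance
        order = sorted(range(n), key=lambda i: scores[i][k])
        for rank, i in enumerate(order):
            composite[i] += rank
    # optimal split = lowest composite rank
    return min(range(n), key=lambda i: composite[i])

# 1,000 candidate splits with simulated balance variances
rng = random.Random(0)
candidates = [(rng.random(), rng.random(), rng.random()) for _ in range(1000)]
best = composite_rank_selection(candidates)
```

A candidate that is best on all three criteria will always win; otherwise the sum-of-ranks composite trades the three balance criteria off against one another.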

Identification of subgroups using the training data
To identify subgroups, we constructed an ensemble of causal trees (3), a type of decision tree.
Decision trees are especially well suited for identifying subgroups because they produce a partition of the sample in which subgroups share similar predictions or classifications, without being limited by model specification assumptions (as compared with several other approaches, e.g., (4) and (5)). In each causal tree, half the sample is randomly selected and its covariate space is sequentially partitioned into subspaces. Each split minimizes the mean squared error of the estimated average treatment effect within each subspace. Because the structure of a single tree depends on the training data, different training data may yield vastly different trees. To account for this high variance, an ensemble of trees (a "forest") is often used. In this study, we constructed a forest of 1,000 trees.
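The intuition behind a single causal-tree split can be sketched with a toy example. This is a deliberate simplification: the actual causal tree (3) minimizes an MSE-based objective over estimated effects, whereas this sketch simply scans thresholds on one covariate and keeps the split whose two child subgroups differ most in estimated treatment effect. All names and data below are illustrative.

```python
def estimated_effect(rows):
    """Difference in mean outcomes between treated and control rows.
    Each row is (covariate, treated, outcome)."""
    t = [y for _, w, y in rows if w == 1]
    c = [y for _, w, y in rows if w == 0]
    if not t or not c:
        return None  # effect not estimable without both arms
    return sum(t) / len(t) - sum(c) / len(c)

def best_causal_split(rows):
    """Toy stand-in for one causal-tree split: scan thresholds on a single
    covariate and keep the one maximizing the between-child difference in
    estimated treatment effect."""
    best = None
    for threshold in sorted({x for x, _, _ in rows})[:-1]:
        left = [r for r in rows if r[0] <= threshold]
        right = [r for r in rows if r[0] > threshold]
        tl, tr = estimated_effect(left), estimated_effect(right)
        if tl is None or tr is None:
            continue
        gain = abs(tl - tr)
        if best is None or gain > best[1]:
            best = (threshold, gain, tl, tr)
    return best

# Toy data: (covariate, treated?, outcome); treatment shifts the outcome
# substantially only when the covariate equals 1
data = [
    (0, 1, 1.0), (0, 0, 1.0), (0, 1, 1.1), (0, 0, 0.9),
    (1, 1, 3.0), (1, 0, 1.0), (1, 1, 3.2), (1, 0, 0.8),
]
threshold, gain, effect_left, effect_right = best_causal_split(data)
```

On this toy data the split lands at the covariate boundary, separating a subgroup with a small estimated effect (≈0.1) from one with a large estimated effect (≈2.2); a real causal forest repeats such splits recursively across many covariates and averages over 1,000 trees.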
Trees with an overall treatment effect within 1 median absolute deviation of the ensemble average were retained. Within these trees, a leaf defined a candidate subgroup if its estimated hazard ratio was greater than 1, indicating that patients in the leaf may have had higher mortality due to treatment. Six percent of all leaves met this criterion and were considered high-priority subgroup hypotheses to investigate in the testing data.

Estimating HTE using the testing data
For these subgroup hypotheses, a Cox proportional hazards regression, stratified by clinic site, was used to estimate the hazard ratio for the primary outcome and its significance. Following standard protocols for detection of HTEs, the Cox models contained terms for study-group assignment, a subgroup indicator variable, and their interaction. To account for multiple hypothesis testing, we randomly permuted the subgroup assignment in the testing data 1,000 times. For each permutation, the Cox model was refit with treatment, subgroup, and their interaction as independent covariates, stratified by clinic site, as in the original testing data. The false discovery rate (FDR) was estimated as the proportion of permuted interaction coefficients greater than the true interaction coefficient. A subgroup was considered adversely affected only if (i) the hazard ratio for the interaction between treatment and subgroup was greater than 1 and significant (p < 0.05 and FDR < 0.05) and (ii) the hazard ratio for the subgroup was greater than 1 and significant (p < 0.05).
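The permutation-based FDR estimate can be sketched as follows. To keep the example self-contained, the hypothetical `interaction_stat` (a crude difference-in-means interaction proxy on simulated data) stands in for refitting the site-stratified Cox interaction model; the permutation logic itself mirrors the text.

```python
import random

def permutation_fdr(observed_coef, coef_fn, subgroup, n_perm=1000, seed=0):
    """Estimate the FDR for an interaction coefficient by permuting the
    subgroup labels: the FDR is the fraction of permuted coefficients
    exceeding the observed one. `coef_fn` stands in for refitting the
    stratified Cox interaction model on each permuted assignment."""
    rng = random.Random(seed)
    labels = list(subgroup)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if coef_fn(labels) > observed_coef:
            exceed += 1
    return exceed / n_perm

# Simulated cohort: treatment worsens the outcome only inside the subgroup
rng = random.Random(1)
treat = [rng.randint(0, 1) for _ in range(200)]
true_sub = [rng.randint(0, 1) for _ in range(200)]
outcome = [w * s + 0.1 * rng.random() for w, s in zip(treat, true_sub)]

def interaction_stat(sub):
    """Crude interaction proxy: mean outcome among treated subgroup members
    minus mean outcome among treated non-members."""
    a = [y for y, w, s in zip(outcome, treat, sub) if w and s]
    b = [y for y, w, s in zip(outcome, treat, sub) if w and not s]
    return (sum(a) / len(a)) - (sum(b) / len(b)) if a and b else 0.0

fdr = permutation_fdr(interaction_stat(true_sub), interaction_stat, true_sub)
```

Because the simulated harm is concentrated in the true subgroup, shuffled labels rarely produce an interaction statistic as large as the observed one, so the estimated FDR is small, matching the (i)/(ii) decision rule's FDR < 0.05 requirement.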