Comparison of Long-term Outcomes of Valve-Sparing and Transannular Patch Procedures for Correction of Tetralogy of Fallot

This cohort study compares the long-term outcomes associated with valve-sparing procedures vs transannular patches for correction of tetralogy of Fallot.


eMethods Multiple imputation model
We used PROC MI in SAS v9.4 with a fully conditional specification model to impute missing data. This process creates K datasets with plausible values for missing variables. We created K=50 datasets, which is deliberately higher than the 20 to 30 datasets generally recommended to produce valid estimates in subsequent analyses 1,2. A higher number of datasets is now computationally feasible and is likely to better characterize the variability introduced by the imputation process 3. We chose the fully conditional specification approach because it does not rely on an assumption of multivariate normality, and most of the variables included in the model were categorical or binary 4. It uses an iterative approach, estimating a conditional distribution variable by variable from the observed cases 5. The model then randomly imputes a plausible value in each of the K datasets according to the association with other available variables. The fully conditional specification method can be used with monotone missing patterns, or with arbitrary missing data patterns as in our case 6.
We included in the model all variables likely to be used in subsequent analyses, patient characteristics associated with the observed values of the variable with missing data, and variables associated with the probability of that variable being missing 6,7. We included information on the type of corrective surgery, the age at correction, the preoperative and postoperative severity of pulmonary stenosis, the postoperative severity of pulmonary regurgitation, a history of previous palliative procedure, the surgical era and the presence of a genetic syndrome because they were the only variables meeting these criteria. Finally, to validate the imputation process, we visually compared the distribution of the genetic condition categories between the imputed and original datasets.
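As an illustration of how estimates from the K imputed datasets are commonly combined, the sketch below pools one estimate per dataset with Rubin's rules. The pooling method is not stated in this supplement, so the use of Rubin's rules, the function name pool_rubin and the numbers in the usage lines are assumptions for illustration only. Python is used for readability; this is not the SAS code used in the study.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool per-dataset estimates with Rubin's rules (assumed pooling method).

    estimates: one point estimate per imputed dataset
    variances: the squared standard error of each estimate
    Returns (pooled estimate, total variance).
    """
    k = len(estimates)
    pooled = mean(estimates)
    within = mean(variances)       # average within-imputation variance
    between = variance(estimates)  # between-imputation variance (k - 1 denominator)
    total = within + (1 + 1 / k) * between
    return pooled, total


# Hypothetical values for K = 3 datasets, not results from the study.
pooled_estimate, total_variance = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

The between-imputation term is the part of the variance introduced by the imputation process; its contribution to the total is scaled by (1 + 1/K), so the correction shrinks as K grows.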

Validation of Missing Completely at Random (MCAR) assumption
The use of multiple imputation relies on an assumption that variables are missing completely at random (MCAR), which means that no association is present between observed and missing data. Since categorical variables were imputed, we opted for a multiple correspondence analysis approach to explore the presence of patterns between missing and observed data, where missing values were treated as a new category for the analysis. No relevant associations were found between observed and missing data, which suggests that the MCAR assumption was respected.
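The recoding step described above, in which a missing value becomes a category of its own, can be sketched as follows. This shows only the recoding and a two-way count of the kind that feeds a correspondence analysis; the multiple correspondence analysis itself is not reproduced. The variable names, category labels and records are hypothetical.

```python
from collections import Counter

MISSING = "Missing"  # label for the extra category (hypothetical name)


def recode_missing(records):
    """Replace None by an explicit 'Missing' category in every record."""
    return [
        {name: (MISSING if value is None else value) for name, value in record.items()}
        for record in records
    ]


def cross_tabulate(records, var_a, var_b):
    """Count records for each pair of categories of two variables."""
    return Counter((record[var_a], record[var_b]) for record in records)


# Hypothetical toy records, not study data.
records = [
    {"syndrome": "absent", "era": "1980-1999"},
    {"syndrome": None, "era": "1980-1999"},
    {"syndrome": "present", "era": "2000-2015"},
    {"syndrome": None, "era": "2000-2015"},
]
recoded = recode_missing(records)
table = cross_tabulate(recoded, "syndrome", "era")
```

If the "Missing" category is spread across the categories of the other variables in roughly the same proportions as the observed categories, no pattern between missing and observed data is apparent.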

Propensity score model
We compared two propensity score computation methods: the first was the standard binomial model including surgical era as a categorical variable, and the second used a spline function with year of surgery as a continuous variable.
In both cases, the propensity score represents the probability of undergoing a valve-sparing (VS) procedure based on perioperative variables. We used a binomial regression model. The "outcome" was the type of correction: (1) valve-sparing procedure or (0) transannular patch (TAP). The variables were included in the model using an ascending hierarchical approach, i.e., they were added one by one to the final multivariable propensity score model in decreasing order of their effect size in a univariable analysis. Variables were kept in the model if there was a clinically or statistically significant association with the probability of undergoing a VS correction compared to TAP. The results from the standard binomial model are presented as odds ratios (OR) with their 95% confidence intervals (CI) in supplemental Table S1, below.
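Assuming the binomial model uses a logit link (which is consistent with results being reported as odds ratios), a fitted model converts a patient's covariates into a propensity score as sketched below. The intercept, covariate names and odds ratios are hypothetical and are not the values of Table S1.

```python
import math


def propensity_score(intercept, odds_ratios, covariates):
    """Probability of a VS procedure (coded 1) versus TAP (coded 0).

    Assumes a logistic model: log-odds = intercept + sum(log(OR) * x).
    odds_ratios: odds ratio per covariate name
    covariates: covariate value per name (0/1 for binary indicators)
    """
    log_odds = intercept
    for name, value in covariates.items():
        log_odds += math.log(odds_ratios[name]) * value
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical model and patient, for illustration only.
hypothetical_odds_ratios = {"recent_era": 2.0, "previous_palliation": 0.5}
patient = {"recent_era": 1, "previous_palliation": 0}
score = propensity_score(0.0, hypothetical_odds_ratios, patient)
```

With an intercept of 0 the baseline odds are 1, so a single odds ratio of 2.0 gives odds of 2 to 1, i.e. a score of 2/3.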
The second propensity score method used a cubic spline function to characterize the influence of surgical era. For this method, the surgical year was included as a continuous variable. Cubic spline functions are used to find inflexion points (knots) in the non-linear risk attributable to an independent variable without subjectively assuming categories. We tested multiple approaches: (1) knots manually placed at 5-year intervals, (2) knots manually placed at 10-year intervals and (3) the default function, which automatically assigns three equally spaced knots to define inflexion points. The latter was used for the spline model because it yielded a lower post-matching R² and better balancing of preoperative characteristics between groups. The three knots were placed at 1989, 1998 and 2007. The results from the binomial model including the spline function are presented as odds ratios (OR) with their 95% confidence intervals (CI) in supplemental

Mitigation of the surgical era bias
In this study, surgical groups were not evenly distributed across eras. Subjects born after the year 2000 had a 1.4-fold increased likelihood of undergoing VS rather than TAP compared to those born in the 1980 and 1990 decades (supplemental Table S1). Concurrently, outcomes have improved significantly in recent eras compared to earlier ones, which is likely partly attributable to improvements in surgical techniques, cardiopulmonary bypass and cardiovascular critical care. This leads to a potential bias when comparing surgical groups. The era bias is not thoroughly described in the current literature, probably because long-term multi-decade observational studies are scarce 8,9, but we suggest treating it as a confounding variable in analyses.
The surgical era was included in the computation of the propensity score and was also adjusted for in all models by including a covariable based on the subjects' year of birth. In preliminary analyses, we used a categorical variable because we strongly suspected that the risk attributable to surgical era was non-linear. We initially divided surgical era into 5-year intervals but ended up merging adjacent categories because risks were similar for both the propensity score and the outcomes. Surgical era is thus expressed as decades of year of birth.
We also tested spline functions to include year of surgery as a continuous variable to correct for era, as described in the "propensity score" section. However, it did not change final results and provided worse adjustment of preoperative baseline characteristics after propensity score matching compared to the use of categories.
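To make the spline approach concrete, the sketch below builds a cubic spline basis for year of surgery with the three knots reported in the propensity score section (1989, 1998 and 2007). It uses a truncated power basis, which is an assumption: the basis actually used by the software is not stated in this supplement, although with the same knots it describes the same family of piecewise cubic curves. The origin year used for centering and the function names are also assumptions.

```python
KNOTS = (1989, 1998, 2007)  # knot years reported in the text
ORIGIN = 1980               # assumed centering year, for numerical convenience only


def truncated_cubic(value, knot):
    """Return (value - knot)^3 above the knot and 0 at or below it."""
    distance = value - knot
    return distance ** 3 if distance > 0 else 0.0


def cubic_spline_basis(year, knots=KNOTS, origin=ORIGIN):
    """Truncated power basis: x, x^2, x^3, then one cubic term per knot."""
    x = year - origin
    basis = [x, x ** 2, x ** 3]
    basis.extend(truncated_cubic(year, knot) for knot in knots)
    return basis


early = cubic_spline_basis(1985)  # before the first knot: all knot terms are 0
later = cubic_spline_basis(2000)  # past the first two knots
```

Each basis column receives its own coefficient in the regression model, so the fitted effect of year can change shape at each knot while remaining smooth. With categories, by contrast, the effect is constant within an era and jumps between eras.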
In the final analyses, we used only two eras (1980-1999 and 2000-2015) because their effect sizes for every outcome were comparable to those obtained with the four-category variable. The effect sizes (hazard ratios and mean ratios) of surgical era in the final survival and intervention models are presented in supplemental Table S3. Subjects born after 2000 had a 3-fold lower mortality risk compared to subjects born between 1980 and 1999, and a 1.8-fold higher risk of cardiovascular reintervention, suggesting that re-operations are more frequently performed in recent surgical eras.

Furthermore, stratified models for surgical era were used to explore the potential of a modifying effect of era on the associations between surgical groups and outcomes. Results from these sensitivity analyses are presented in supplemental Tables S4 and S5.
In the stratified survival analysis, there was significant heterogeneity in the effect of surgical groups between eras. Patients undergoing VS had a lower risk of mortality, independently of surgical era, but that trend was less marked and not statistically significant for subjects born after the year 2000 (supplemental Table S4). There were only 3 deaths in subjects born after the year 2000, which prevents us from concluding that there was a significant difference in survival between VS and TAP in the more recent decades. We can conclude that the observed difference in mortality between the VS and TAP groups is mainly attributable to their occurrence during earlier surgical eras (1980-1999).
In the stratified intervention analysis, we did not detect significant heterogeneity in the effect of surgical groups between eras (supplemental Table S5). Mean ratios were similar and led to the same conclusions when compared to the final model presented in the manuscript.

Post-hoc sensitivity analyses: comparison of valve-sparing and transannular patch techniques
As described in the manuscript, patients in the transannular patch (TAP) group proportionally have a more severe cardiovascular condition than patients in the valve-sparing (VS) group, which could bias conclusions when comparing study groups. The use of propensity scores to adjust analyses to reduce the potential of bias by indication was popularized by Austin, Rosenbaum and Rubin, but the best approach is still being debated 9-14. Some authors have suggested that direct adjustment by confounding variables produced similar results compared with the use of propensity score adjustment 13,15,16. It has been suggested that reproducing analyses using various methods and interpreting the results from each method would lead to more robust conclusions 17,18.
As explained earlier, we compared two approaches to compute propensity scores: surgical era treated as a categorical variable or as a continuous variable with spline functions. We also chose to use four different post-hoc adjustment strategies to compare study groups using propensity score adjustment methods: propensity score matching with and without subsequent covariable adjustment, inverse-probability weighting and the inclusion of the propensity score as a covariable. This results in 8 different models (including the final model) depending on the combination of adjustment strategy and propensity score method. We also tested direct covariable adjustment without the use of propensity score as mentioned above. The description of each model is available in Table S6. The results from each model are presented in Tables S11 to S14.
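The count of 8 models follows from crossing the two propensity score methods with the four adjustment strategies, as the short enumeration below shows. The labels paraphrase the text and are not the model names used in Table S6.

```python
from itertools import product

# Labels paraphrase the text; they are not the names used in Table S6.
PROPENSITY_METHODS = (
    "surgical era as categorical variable",
    "year of surgery with spline function",
)
ADJUSTMENT_STRATEGIES = (
    "matching without covariable adjustment",
    "matching with covariable adjustment",
    "inverse-probability weighting",
    "propensity score as covariable",
)

# Every combination of propensity score method and adjustment strategy.
propensity_models = list(product(PROPENSITY_METHODS, ADJUSTMENT_STRATEGIES))

# Direct covariable adjustment uses no propensity score and is counted separately.
all_models = propensity_models + [(None, "direct covariable adjustment")]

for method, strategy in all_models:
    print(method or "no propensity score", "|", strategy)
```

The crossing gives 2 x 4 = 8 propensity-based models; the direct covariable adjustment is an additional comparison outside that grid.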

Detailed estimates of outcomes according to the final model
We present the detailed computed estimates for each surgical group and outcome from the final model presented in the article. Results are visually represented in Figure 3

Post-hoc sensitivity analyses: comparison of valve-sparing and transannular patch techniques
The propensity score method was chosen according to which one yielded the best balancing of preoperative baseline characteristics between the two surgical groups (TAP and VS). The use of spline functions was inferior to categorizing surgical era into decades for balancing era in the matched cohort (higher SMD and post-matching R²). However, the impact on final results is marginal, and the results are comparable to those presented in the article. Overall, each method yielded similar results, which increases the robustness of our conclusions.
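For reference, the standardized mean difference (SMD) used to judge balance can be computed for a binary baseline characteristic as sketched below. This is the usual formula for proportions; the supplement does not state the exact formula or threshold applied, so both the formula choice and the example proportions are assumptions.

```python
import math


def smd_binary(p_treated, p_control):
    """Standardized mean difference for a binary covariate.

    p_treated, p_control: proportion with the characteristic in each group
    (for example VS and TAP). Uses the average of the two group variances.
    """
    pooled_variance = (
        p_treated * (1 - p_treated) + p_control * (1 - p_control)
    ) / 2
    difference = p_treated - p_control
    if pooled_variance == 0:
        # Both proportions are exactly 0 or 1: no spread to standardize by.
        return 0.0 if difference == 0 else math.copysign(math.inf, difference)
    return difference / math.sqrt(pooled_variance)


# Hypothetical proportions, not study data.
imbalanced = smd_binary(0.6, 0.4)
balanced = smd_binary(0.5, 0.5)
```

An SMD closer to 0 after matching indicates better balance of that characteristic between groups, which is the sense in which a higher SMD for era counted against the spline model.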
The effect sizes of the VS group compared to TAP, with their associated 95% confidence intervals and p-values from the post-hoc analyses, are presented in Tables S11 to S14 for survival, interventions, pulmonary valve replacements (PVR) and unplanned hospitalizations, respectively. We also reported the percentage of variation between the post-hoc models' effect sizes and the final models' effect sizes (e.g., Variation %