Comparative Effectiveness of Adalimumab vs Tofacitinib in Patients With Rheumatoid Arthritis in Australia

This comparative effectiveness reseach study assesses the effecetiveness of adalimumab vs tofacitinib for treatment of rheumatoid arthiritis in the clinical setting.

This supplemental material has been provided by the authors to give readers additional information about their work.

eMethods 1: Protocol for the Target Trial to be Emulated
The aim of this study was to estimate the average treatment effect (ATE) of tofacitinib (TOF) compared to adalimumab (ADA) at 3 and 9 months after initiating treatment in patients with rheumatoid arthritis (RA) who are new users of a biologic or targeted synthetic disease modifying anti-rheumatic drug (b/tsDMARD). The ATE was defined as the difference in mean disease activity score, Disease Activity Score-28 with C-reactive protein (DAS28CRP), a composite disease activity score which includes swollen and tender joint counts, a systemic inflammatory marker and a patient-reported outcome. This objective is equivalent to aiming to estimate the intention-to-treat effect in a randomised controlled trial. eTable 1 lists the details of the pre-specified protocol for the target trial which was developed prior to analysis [1].

eTable 1. Design features of pre-specified protocol for the target trial of ADA versus TOF to be emulated
Design feature Detail Eligibility criteria Adult patients diagnosed with RA whose first recorded visit occurred between 1 April 2015 and 1 January 2021 with no prior recorded b/tsDMARD and at least 6 months from their first-recorded visit until Time Zero and at least 6 months of treatment with a conventional synthetic DMARD (csDMARD) immediately prior to Time Zero.

Treatment strategies
Initiate treatment with either ADA (2.9 mg daily) or TOF (10 mg daily). Continue treatment during follow-up unless the patient experiences an adverse event or contra-indication.

Assignment procedures
Participants are randomised to either treatment strategy and are unblinded as to the treatment they receive i.e. a pragmatic trial Follow-up period Starts at randomisation (baseline), with follow-up visits at 3 months and 9 months as per the government reimbursement schedule.

Outcomes
Difference in DAS28CRP at 3 months and 9 months.

Causal contrasts of interest
Intention-to-treat effects Analysis plan Intention-to-treat effect estimated by comparing mean DAS28CRP at follow-up in patients assigned to treatment with ADA or TOF, with adjustment for baseline DAS28CRP.
Time zero Baseline/Time Zero defined as the date at which patients are prescribed ADA or TOF as their first-recorded b/tsDMARD biologic medication. This definition ensures eligibility criteria can only be met at a single time.
to account for baseline differences between treatment groups and should not be interpreted as an analysis of treatment persistence.
Of the 842 included patients who initiated treatment, 56 (7%) who initiated ADA and 14 (2%) who initiated TOF were lost to follow-up after the 3 month timepoint. There were no patients who were lost to follow-up due to mortality.

Baseline covariates
Variable type and overall completeness for the baseline covariates used in this analysis are described in eTable 2. While data on additional recorded comorbidities at baseline were also collected, including previous cardiac disease, cancer, diabetes, tuberculosis, lung disease, thromboembolism and clotting disorder, these comorbidities were not used in the analysis due to sparseness. Dummy variables were generated during data preparation to represent all categorical and ordered categorical variables, as these were required in later steps of the analysis.

eTable 2. Description of baseline covariates in the OPAL dataset
The recording of the presence of a comorbidity in the EMR requires an active step by the rheumatologist. Consequently, the variable on prior recorded renal disease is best interpreted as "known recorded prior renal disease" or "no recorded prior renal disease or prior renal disease unknown". The nature of how comorbidities are recorded may account for the sparseness of the other comorbidities that were considered as possible relevant prior comorbidities. showing putative relationships between measured and unmeasured covariates, and the exposure and outcome. Each arrow in the DAG represents a relationship that is assumed to be causal for the purpose of the analysis. These putative relationships were discussed prior to the analysis.

eTable 3. Baseline features of included and excluded patients
The intention of a DAG is not to depict these relationships with absolute certainty, as this cannot be known, but rather to give transparency about the a priori assumptions underpinning the causal relationships between exposure and outcomes, and possible confounders, and to inform selection of variables to be included or excluded from propensity score models [3].
For example, a clinician's experience may be influenced by their training (which is unmeasured) and may also be related to their gender as a higher proportion of rheumatologists with over 30 years of experience are male. Experience is assumed to directly influence disease activity (the outcome) and the treatment prescribed (exposure). However, indirect influences on treatment and disease activity are also assumed for the clinician's experience e.g. via the presence of a nurse at their practice, the patient's previous disease activity and previously prescribed immunosuppressant medication.
DAGs were generated and analysed using the dagitty package [4], and were visualised using the ggdag package [5].
The DAG can be used to identify colliders, which are variables that have exposure and outcome as a common cause, as well as confounders, which are a common cause of exposure and outcome. The propensity score model should control for the minimum set of confounders in order to reduce bias, but not control for any colliders as this would induce bias.
Possible colliders were identified in eFigure 3 using the ggdag_collider function. eFigure 4 shows the minimum adjustment set for the measured covariates identified using ggdag_adjustment_set. Note that this set also accounts for the effects of possible collider variables. The minimum adjustment set is based on the arrows in eFigure 2 and is intended to identify the smallest number of potential confounding variables to adjust for in the propensity score model while avoiding collider bias. Note that disease duration, which is a common cause of both the outcome (directly and via indirect pathways) and the exposure (indirect pathways via previous immunosuppression or previous disease activity), does not need to be adjusted for in the minimal set, but adjusting for it will not introduce collider bias. is an indicator for whether any of 1 or 2 are observed i.e. whether the patient has any DAS28CRP components at follow-up (n=699 have = 1) • is the treatment indicator, = 1 for TOF (n=273), = 0 for ADA (n=569) • represents fully observed variables at time 0, including: -Patient demographics (age, gender, state, regional location, disease duration) - Year of treatment start -Prior/concomitant medication -Prior/concomitant comorbidities -Practitioner gender, experience, tendency to prescribe biologics -Availability of nurse at practice, regional location of practice -An indicator for each individual practitioner's overall tendency to record the patient global based on their entire EMR The sample comprises any individuals with an observed 0 , 0 , or 2 (n=842).
As percentages of missing data for variables in were below 1%, missing data were replaced by the most frequent category.
1. ( , , 0 , 1 , 2 ) is missing at random (MAR) i.e. any missing data for any DAS28CRP components at any timepoint are missing at random , = 1) = ( = 1| , 0 , = 1) i.e. the probability of treatment assignment depends on the baseline covariates and is not affected by potential outcomes at 3 months or at 9 months 3. Stable unit treatment value i.e. one patient's outcome is not affected by other patients' exposure to treatment 4. Positivity i.e. the conditional probability of patients receiving TOF given and 0 does not equal zero or one for any value of or 0 5. Consistency i.e. the potential outcome of a patient when assigned to that patient's observed exposure is the same as that patient's observed outcome

Methodology
This methodology uses random forest multiple imputation (RF-MI) to impute missing DAS28CRP [6], and stable balancing weights (SBW) to simultaneously account for treatment assignment ( = 1 for TOF, = 0 for ADA) and whether patients have outcomes data at followup ( = 1 for at least one DAS28CRP component observed at 3 or at 9 months, = 0 for no DAS28CRP components at either follow-up timepoint) [7].
Balancing is a different approach for generating weights compared to direct modelling approaches, such as propensity score models. Both approaches aim to achieve balanced distributions of covariates between different groups of a sample e.g. treatment groups. Whereas modelling approaches maximise the fit of a model for treatment assignment to then derive weights and imposes balance conditions on the covariates used to estimate the propensity score, a balancing approach directly optimises certain features of the weights and implicitly models the propensity score [8]. When the correct treatment assignment model is unknown, the balancing approach can result in better covariate balance with weights that are minimally dispersed and stable, meaning that they have minimum variance. This may reduce the standard error of the estimator.
The weights need to ensure that the individuals in the first two groups have the same marginal means of ( , 0 ) as the marginal means of ( , 0 ) in the entire sample i.e. all three groups (n=842).
After this methodology was used to generate the point estimates for ATE, the whole procedure was bootstrapped using = 1000 bootstrap samples in order to generate a 95% confidence interval for the estimates using the MI boot percentile method [9].

Steps
To generate the point estimates for the ATE at 3 months and at 9 months: 1. RF-MI to generate = 10 imputed datasets, followed by calculation of DAS28CRP from imputed DAS28CRP components 2. SBW to account for selection by ( , ) = (0,1) i.e. to generate weights for patients on ADA to account for non-randomised treatment assignment and for whether outcomes were observed at follow-up eMethods 5: Multiple Imputation RF-MI was used to impute missing DAS28CRP components as there were many non-normal variables in the dataset which were thought to have complex, non-linear relationships. The natural order of ordinal categorical variables was maintained. Since the formula for DAS28CRP, below, uses log-transformed CRP and square root-transformed joint counts, the transformed versions of these variables were used in the imputation algorithm. 28 = (0.56 * √ 28) + (0.28 * √ 28) + (0.36 * log( + 1)) + (0.014 * ) + 0.96

Missing data patterns
The combination of patients with missing data at any of the timepoints is described in eTable 5, with values of 1 indicating that at least one component of DAS28CRP was recorded at that timepoint. There are 734 patients with any DAS28CRP components at 0 months (87%), regardless of missing data at follow-up. There are (108) patients with some DAS28CRP components recorded at 3 or at 9 months or at both follow-up timepoints, but who do not have any DAS28CRP components recorded at 0 months (13%).

eTable 5. Patterns of missing DAS28CRP components at baseline and follow-up timepoints for all included patients
There are 699 patients with any DAS28CRP components at 3 months or 9 months or both (83%), regardless of missing data at 0 months.
There are 143 patients with DAS28CRP components at 0 months but no subsequent DAS28CRP components (17%).
There are 0 patients with no DAS28CRP components at 0 months but no DAS28CRP components at 3 or 9 months (0%).
The data do not appear to be monotone missing i.e. it is not the case that if a patient's data are missing at one visit, then the data are also missing at all subsequent visits for that patient.
The combination of patients with any DAS28CRP components at 3 or at 9 months follow-up for those with baseline DAS28CRP components observed is described in eTable 6. There are 143 patients with data on DAS28CRP components at baseline but not at 3 or 9 months. These patients were not included in the analysis that estimated the treatment effect at these follow-up timepoints, but were included in the imputation model to contribute information about the associations between DAS28CRP components at baseline and other baseline components.

Methodology for multiple imputation
RF-MI was performed using the mice package [6], generating m=10 imputed datasets. DAS28CRP was calculated from its imputed components in the complete datasets (see code).

eMethods 6: Stable Balancing Weights
Methodology for stable balancing weights SBW calculates weights for patients who receive ADA and whose outcomes are observed, ( , ) = (0,1) and for patients who receive TOF and whose outcomes are observed, ( , ) = (1,1). Weights are not calculated for patients whose outcomes are not observed, = 0.
The weights aim to balance the empirical distributions of the observed covariates, ( , 0 ), between the treatment groups for patients whose outcomes are observed, and also to balance these distributions with those for the entire sample (patients with at least one observed 0 , 1 or

eMethods 7: Bootstrapping Procedure
To estimate a 95% confidence interval for the ATE using the percentile method, the entire procedure (i.e. steps 1-4) was repeated in = 1000 bootstrap samples drawn from the original sample of n=842 patients, with replacement.
This was implemented using a series of nested for-loops, with the outer loop representing the bootstrap samples and the inner loop representing the imputed datasets.
The percentile method for calculating the 95% confidence interval for the point estimates was implemented (see code).

eResults 1: Performance of Multiple Imputation
Algorithm convergence for the imputation of the missing DAS28CRP components is shown in eFigure 5. As expected, the trace lines intermingle and there are no major trends at later iterations.

eResults 2: Properties of Stable Balancing Weights
Within each imputed dataset, the properties of weights were checked. Balance was also checked (eResults3) for the observed ADA and TOF groups, and for both groups against the whole sample of patients with observed and unobserved DAS28CRP at follow-up.
The properties of the weights estimated using SBW are described in eTable 7, with the minimum and maximum values of weights giving an indication of the dispersion of the weights. Patients who did not have any observed DAS28CRP components at follow-up had a weight of 0.

eResults 3: Balance Checks
Balance in the baseline characteristics between treatment groups and against the entire sample (patients with at least one observed 0 , 1 or 2 , n=842) was checked within each of the imputed datasets.
Within each imputed dataset, the baseline covariates including baseline DAS28CRP were multiplied by the weights for each individual. The mean for these variables in each treatment group are summarised below, along with the mean and standard deviation for each variable in the entire sample without weighting. The standardised mean difference for each covariate shown in eFigure 3 is the difference between treatment groups in the mean for that covariate divided by the standard deviation for the entire sample.
The maximum permitted differences in means for each covariate are also summarised. These values are equal to the standard deviation multiplied by a tolerance value for each treatment group. In a perfectly balanced dataset, the maximum differences in means for each covariate would be zero.
After applying the SBW, the maximum difference between the mean of each baseline characteristic in the ADA and TOF groups was less than 0.03% of the corresponding standard deviation in the entire sample, indicating reasonable balance was achieved.

Imputed dataset 1
The tolerance value for ADA was 0.002 and the tolerance value for TOF was 0.02.

Average treatment effect
The difference in mean weighted DAS28CRP for TOF versus mean weighted DAS28CRP for ADA was calculated at 3 and at 9 months within each imputed dataset, with estimates for the ATE pooled by taking the mean (see code).