In JAMA Network Open, Zhong et al1 present an illustrative application of how machine learning methods and statistical techniques can be used to adjust treatment effect size estimates in the presence of treatment nonadherence. The authors examined this in the context of the Effects of Aspirin in Gestation and Reproduction trial,2 which investigated the effect of daily use of low-dose aspirin on the proportion of pregnancies among patients with a history of pregnancy loss. A positive pregnancy outcome was indicated by a positive human chorionic gonadotropin (hCG) test result.
The trial randomized 1227 patients to receive 81 mg/d of aspirin or placebo, with both groups also receiving 400 μg of folic acid daily. Schisterman et al2 collected adherence information by asking whether patients took their daily medication. Approximately 20% of patients in their trial stopped their randomized treatment, either entirely or for an extended period. In their secondary analysis of the trial data, Zhong et al1 took a more rigorous approach to defining treatment adherence, requiring patients to take their medication on at least 5 of every 7 days for more than 80% of their study enrollment duration. By this definition, 369 patients (30.1%) were nonadherent, 190 of whom (51.5%) were in the low-dose aspirin group.
Schisterman et al2 used an intention-to-treat analysis, grouping patient data according to randomized treatment group regardless of whether the patients adhered to their prescribed treatment regimen. The authors found that patients receiving low-dose aspirin had an estimated 0.043 risk difference in the probability of an hCG-positive pregnancy, but this result was not statistically significant (95% CI, −0.011 to 0.096). However, when applying machine learning techniques, Zhong et al1 found a statistically significant risk difference of 0.080 (95% CI, 0.025-0.136). This result was adjusted for nonadherence and suggests that low-dose aspirin may be associated with increased hCG-positive pregnancies in patients who have a history of pregnancy loss but are not diagnosed as infertile.
Zhong et al1 used augmented inverse probability weighting (AIPW) in their analysis of data from the low-dose aspirin trial, likely because of several appealing features of AIPW compared with other statistical methods and because flexible machine learning techniques can strengthen these comparative advantages. The method requires a researcher to specify a probability model for treatment (aspirin use) based on patient covariates (including adherence) as well as a probability model for the outcome (hCG positivity) based on treatment and patient covariates.
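The mechanics of the estimator can be sketched as follows. This is a minimal illustration on simulated data with plain logistic regression used for both nuisance models; the data, variable names, and model choices are hypothetical stand-ins, not the implementation of Zhong et al,1 which is provided in R.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Simulated stand-in data: binary treatment, 2 covariates, binary outcome.
rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=(n, 2))                    # patient covariates
a = rng.binomial(1, 0.5, size=n)               # treatment actually received
y = rng.binomial(1, 1 / (1 + np.exp(-(0.3 * a + x[:, 0]))))  # binary outcome

# Treatment model: P(A = 1 | X), the propensity score.
ps = LogisticRegression().fit(x, a).predict_proba(x)[:, 1]

# Outcome model: P(Y = 1 | A, X), evaluated under both treatment assignments.
om = LogisticRegression().fit(np.column_stack([a, x]), y)
m1 = om.predict_proba(np.column_stack([np.ones(n), x]))[:, 1]
m0 = om.predict_proba(np.column_stack([np.zeros(n), x]))[:, 1]

# AIPW estimate of the risk difference E[Y(1)] - E[Y(0)]: the outcome-model
# prediction plus an inverse-probability-weighted correction in each arm.
aipw_rd = np.mean(
    m1 - m0
    + a * (y - m1) / ps
    - (1 - a) * (y - m0) / (1 - ps)
)
print(round(aipw_rd, 3))
```

In practice, each logistic regression above would be replaced by a flexible machine learning model, as Zhong et al1 did with stacked learning.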
Augmented inverse probability weighting has the appealing feature that only one of the two models, treatment or outcome, needs to be correctly specified, which is why the technique is considered “doubly robust.” This means that if the user correctly specifies the functional form of either probability model, the estimated mean treatment effect size converges to the true treatment effect size as the patient sample size increases.3 Zhong et al1 used flexible machine learning models (via stacked learning) to estimate these 2 probability models, which should increase the likelihood of correctly specifying at least one of them and make results more reliable. This is a potentially powerful property in medical discovery, because researchers can adequately adjust for concerns such as nonrandomization and nonadherence by carefully constructing the treatment and outcome probability models while using AIPW to obtain mean treatment effects.
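One way to see double robustness is with a small simulation in which treatment depends on a confounder (as actual treatment received does once nonadherence enters the picture), the outcome model is deliberately misspecified, and the treatment model is correct. The AIPW estimate still recovers the true risk difference, whereas the naive group comparison does not. The data-generating process below is entirely hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
x = rng.normal(size=n)                         # confounder
e = 1 / (1 + np.exp(-x))                       # true propensity P(A = 1 | X)
a = rng.binomial(1, e)                         # treatment depends on x
y = rng.binomial(1, np.clip(0.3 + 0.10 * a + 0.15 * x, 0.01, 0.99))
# True risk difference is approximately 0.10.

# Deliberately misspecified outcome model: it ignores x entirely.
m1 = np.full(n, y[a == 1].mean())
m0 = np.full(n, y[a == 0].mean())

naive_rd = m1[0] - m0[0]                       # confounded group comparison
# AIPW with the correctly specified treatment model still recovers the truth,
# despite the bad outcome model.
aipw_rd = np.mean(m1 - m0 + a * (y - m1) / e - (1 - a) * (y - m0) / (1 - e))
```

Reversing the roles, a correct outcome model with a misspecified treatment model, yields the same protection, which is the sense in which the researcher gets two chances to be right.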
By combining AIPW and a machine learning approach, Zhong et al1 found an association between low-dose aspirin and hCG-positive pregnancies, whereas the intention-to-treat analysis did not. Their analysis used the actual treatment status of all enrolled patients, and patients who did not adhere to the protocol were included. In other words, the methods described by Zhong et al1 adjust for nonadherence without excluding these patients or reclassifying their treatment status, as a per-protocol analysis would. They provided R code to implement this method in their Supplement 2 and have an associated publicly available AIPW R package.4 This approach represents a compromise between intention-to-treat and per-protocol analyses using modern statistical techniques and demonstrates how AIPW and machine learning techniques (eg, stacked learning) may be combined to properly adjust for nonadherence. I hope that this analysis will spark more collaborations between research clinicians and statisticians to tackle common problems in modern medical research, including nonrandomization, nonadherence, and other protocol deviations. Although not all statisticians may be aware of these methods, most PhD-level statisticians have the theoretical knowledge to implement the methods described in this study for other data sets. By working closely with a highly trained statistician, researchers can mitigate some common issues in medical research via modern statistical methods.
Published: March 9, 2022. doi:10.1001/jamanetworkopen.2021.43422
Open Access: This is an open access article distributed under the terms of the CC-BY License. © 2022 Chapple AG. JAMA Network Open.
Corresponding Author: Andrew G. Chapple, PhD, Biostatistics Program, School of Public Health, Louisiana State University Health Science Center, 2020 Gravier St, 3rd Floor, New Orleans, LA 70112 (email@example.com).
Conflict of Interest Disclosures: None reported.
Chapple AG. Use of Machine Learning Approaches and Statistical Techniques to Adjust for Nonadherence in Randomized Clinical Trials. JAMA Netw Open. 2022;5(3):e2143422. doi:10.1001/jamanetworkopen.2021.43422