A, The 3 data sets (plasma cell-free ribonucleic acid [cfRNA] or transcriptomics, metabolomics, and proteomics) produced a number of different features and had a range of correlations among the measured features. The internal correlation between features from each data set was quantified using the number of principal components (PCs) needed to capture 90% variance (eg, the cfRNA data set had the most features but was highly correlated internally; therefore, fewer PCs were needed). B, A 2-dimensional representation of all measurements demonstrates the correlation between subsets of urine metabolites and cfRNA detected in plasma as well as a limited number of plasma proteins.
A, A cross-validation strategy was used to simultaneously optimize the integrated model and test the performance of the model on previously unseen patients. Models built on all 3 modalities (transcriptomics, metabolomics, and proteomics) and the integrated model were statistically significantly correlated with GA at the time of sample collection (Bonferroni-adjusted Spearman correlation P < .05). B, The correlation between GA at the time of sample collection and the estimated values on the blinded samples are shown. The shaded area represents the 95% CI. C, The features correlated with the progression of pregnancy (Spearman correlation P < .05) are color-coded according to biological modality. FGF indicates fibroblast growth factor; IGSF3, immunoglobulin superfamily member 3; PAPP-A, pregnancy-associated plasma protein A; PGF, placental growth factor; and SIGLEC6, sialic acid binding Ig-like lectin 6.
A, This receiver operating characteristic (ROC) curve analysis used each biological modality and the integrated approach. The mean area under the ROC curve and 95% CI for each modality were as follows: transcriptomics (AUROC, 0.73; 95% CI, 0.61-0.83), metabolomics (AUROC, 0.59; 95% CI, 0.47-0.72), proteomics (AUROC, 0.75; 95% CI, 0.64-0.85), and integrated (AUROC, 0.83; 95% CI, 0.72-0.91). B, Circle size is proportional to −log10 (Wilcoxon) P value for discrimination between term pregnancies and PTBs. Top features included an inflammatory module (which included interleukin 6 [IL-6]; IL-1 receptor antagonist [IL-1RA], a regulatory member of the IL-1 family whose expression is induced by IL-1β under inflammatory conditions; granulocyte colony-stimulating factor [G-CSF]; retinoic acid receptor responder protein 2 [RARRES2]; chemokine ligand 3 [CCL3]; angiopoietin-like 4 [ANGPTL4]; protein-arginine deiminase type II [PADI2]; and transferrin receptor [TfR]) and a metabolomic module (which was enriched for glutamine and glutamate metabolism [Fisher test for pathway enrichment analysis P < 4.4 × 10⁻⁹] and valine, leucine, and isoleucine biosynthesis pathways [P < 7.3 × 10⁻⁶]).
eFigure 1. Data Quality Assessment
eFigure 2. Urine Metabolites as a Surrogate for PGF in Plasma
eFigure 3. Empirical Algorithm Comparison
eFigure 4. A Lower Bound for the Analysis Pipeline Using a Negative Example
eFigure 5. Analysis of Clinical Covariates
eTable. Table of Clinical Covariates Harmonized Across All Cohorts
eFigure 6. Comprehensive Visualization of Single-Cell-Level Intracellular Signaling in Response to Selected Plasma Proteins
eFigure 7. Top Proteomics Features Activate Intracellular Signaling Pathways in Peripheral Blood Classical Monocytes
eFigure 8. Top Proteomics Features Activate Cytokine Production in Peripheral Blood Classical Monocytes
Jehan F, Sazawal S, Baqui AH, et al. Multiomics Characterization of Preterm Birth in Low- and Middle-Income Countries. JAMA Netw Open. 2020;3(12):e2029655. doi:10.1001/jamanetworkopen.2020.29655
What maternal biological modalities are associated with preterm birth (PTB)?
In this diagnostic/prognostic study of 81 pregnant women from 5 birth cohorts in low- and middle-income countries, several biological measurements in urine and blood were found to be associated with PTB. Although cohort-specific signatures were present, a machine learning algorithm generated a model capable of predicting PTB across the cohorts.
Results of this study suggest that most PTBs can be predicted using blood and urine samples collected early in the pregnancy, providing opportunities for interventions.
Worldwide, preterm birth (PTB) is the single largest cause of deaths in the perinatal and neonatal period and is associated with increased morbidity in young children. The cause of PTB is multifactorial, and the development of generalizable biological models may enable early detection and guide therapeutic studies.
To investigate the ability of transcriptomics and proteomics profiling of plasma and metabolomics analysis of urine to identify early biological measurements associated with PTB.
Design, Setting, and Participants
This diagnostic/prognostic study analyzed plasma and urine samples collected from May 2014 to June 2017 from pregnant women in 5 biorepository cohorts in low- and middle-income countries (LMICs; ie, Matlab, Bangladesh; Lusaka, Zambia; Sylhet, Bangladesh; Karachi, Pakistan; and Pemba, Tanzania). These cohorts were established to study maternal and fetal outcomes and were supported by the Alliance for Maternal and Newborn Health Improvement and the Global Alliance to Prevent Prematurity and Stillbirth biorepositories. Data were analyzed from December 2018 to July 2019.
Blood and urine specimens that were collected early during pregnancy (median sampling time of 13.6 weeks of gestation, according to ultrasonography) were processed, stored, and shipped to the laboratories under uniform protocols. Plasma samples were assayed for targeted measurement of proteins and untargeted cell-free ribonucleic acid profiling; urine samples were assayed for metabolites.
Main Outcomes and Measures
The PTB phenotype was defined as the delivery of a live infant before completing 37 weeks of gestation.
Of the 81 pregnant women included in this study, 39 had PTBs (48.1%) and 42 had term pregnancies (51.9%) (mean [SD] age of 24.8 [5.3] years). Univariate analysis demonstrated functional biological differences across the 5 cohorts. A cohort-adjusted machine learning algorithm was applied to each biological data set, and then a higher-level machine learning modeling combined the results into a final integrative model. The integrated model was more accurate, with an area under the receiver operating characteristic curve (AUROC) of 0.83 (95% CI, 0.72-0.91) compared with the models derived for each independent biological modality (transcriptomics AUROC, 0.73 [95% CI, 0.61-0.83]; metabolomics AUROC, 0.59 [95% CI, 0.47-0.72]; and proteomics AUROC, 0.75 [95% CI, 0.64-0.85]). Primary features associated with PTB included an inflammatory module as well as a metabolomic module measured in urine associated with the glutamine and glutamate metabolism and valine, leucine, and isoleucine biosynthesis pathways.
Conclusions and Relevance
This study found that, in LMICs and high PTB settings, major biological adaptations during term pregnancy follow a generalizable model and the predictive accuracy for PTB was augmented by combining various omics data sets, suggesting that PTB is a condition that manifests within multiple biological systems. These data sets, with machine learning partnerships, may be a key step in developing valuable predictive tests and intervention candidates for preventing PTB.
Preterm birth (PTB) is defined by the World Health Organization as the delivery of a live infant before the completion of 37 weeks of gestation.1,2 The worldwide rate of PTB in 2014 was estimated to be 10.6% (uncertainty interval, 9.0%-12.0%), with 80% of all cases occurring in South Asia and sub-Saharan Africa.2 Many risk factors for PTB have been highlighted in previous studies and include obstetrical (eg, previous PTB and multiple gestation), medical (eg, maternal obesity, diabetes, and chronodisruption), and external (eg, smoking and maternal stress) conditions.3-9 For example, a meta-analysis of individual- and population-level attributes among 4.1 million births concluded that “unknown factors requiring further research to act upon account for ~2/3 of the preterm birth rate.”10(p13) Unveiling and elucidating the role of early biological antecedents of PTB has been deemed a necessary step toward developing new diagnostic tests and therapeutic interventions.11-13 Biological investigations into the mechanisms of PTB are complicated, as indicated by accumulating evidence that distinct patient subpopulations follow divergent biological trajectories.14,15 Given this heterogeneity, simultaneously studying diverse cohorts is critical for identification of generalizable biological pathways.16
Recent technological advances have enabled the characterization of a broad range of biological changes during pregnancy. Biological layers explored include single-cell profiling of signaling pathways,17 measurements of plasma cell-free ribonucleic acid (cfRNA),18 proteome19,20 and metabolome21 profiling, characterization of the microbiome,14,22 and detailed genomics analysis.23 In addition, a recent multiomics investigation demonstrated that biological changes during normal pregnancy involve a number of intricate interactions of biological processes, which can be measured using a coordinated set of assays.24 The integration of the large, multidimensional data sets generated in a multiomics setting requires complex machine learning pipelines that will remain robust in the face of the inconsistent intrinsic properties of these high-throughput assays and cohort-specific variations.15
To our knowledge, this is the first multiomics analysis of term and preterm pregnancies from multiple cohorts in low- and middle-income countries (LMICs). These cohorts were established using biorepositories of samples and phenotypic data for studying maternal and fetal outcomes collected and stored from diverse populations of South Asia and sub-Saharan Africa. The study aimed to investigate the ability of transcriptomics and proteomics profiling of blood plasma and metabolomics analysis of urine to identify early biological measurements associated with PTB.
Approval was obtained from the Stanford University Institutional Review Board, and ethical exemptions were sought and obtained independently from the respective country by each birth cohort supported by the Alliance for Maternal and Newborn Health Improvement (AMANHI) and the Global Alliance to Prevent Prematurity and Stillbirth (GAPPS) biorepositories. Written informed patient consent was obtained from each participant in the original cohorts and extends to the present study. We followed the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) reporting guideline. This study analyzed plasma and urine samples collected from May 2014 to June 2017, and data were analyzed from December 2018 to July 2019.
The study population comprised pregnant women selected from 5 biorepository-supported cohorts in Matlab, Bangladesh; Lusaka, Zambia; Sylhet, Bangladesh; Karachi, Pakistan; and Pemba, Tanzania. No compensation or incentives were provided for participating in this study.
Plasma samples were assayed to measure targeted proteins and cfRNA, and urine samples were analyzed for metabolites. The cfRNA analysis resulted in 20 659 measurements, the targeted proteomics assay measured 1002 proteins in plasma, and 6630 metabolites were measured in urine. The number of measurements of these assays did not correlate with their modularity, as indicated by the number of principal components needed to account for 90% of the total variance (Figure 1A). This result highlighted the need for a 2-layer metadimensional integrative approach to prevent the assays with more measurements from biasing the predictive models (eMethods in the Supplement). An overview of the entire data set was produced by first calculating a correlation network of all available measurements and then producing a 2-dimensional layout for visualization using the t-SNE25 algorithm (Figure 1B).
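The modularity measure described above (the number of principal components needed to reach 90% of the total variance) can be illustrated with a minimal sketch. It is written in Python with the standard library rather than the R code used in the study; the function name and both eigenvalue spectra are hypothetical, and the eigenvalues are assumed to come from a PCA computed elsewhere.

```python
def components_for_variance(eigenvalues, threshold=0.90):
    """Return the smallest number of principal components whose
    cumulative explained variance reaches `threshold`."""
    if not eigenvalues:
        raise ValueError("eigenvalues must be non-empty")
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    cumulative = 0.0
    for count, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total >= threshold:
            return count
    return len(ordered)


# Hypothetical spectra (not study data): one dominated by a few components,
# as expected for highly correlated features, and one that is flat.
correlated = [50.0, 30.0, 12.0, 4.0, 2.0, 1.0, 1.0]
flat = [1.0] * 8

print(components_for_variance(correlated))  # 3
print(components_for_variance(flat))        # 8
```

A data set with many features can therefore need fewer components than a smaller one when its features are strongly correlated, which is the pattern reported for cfRNA.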
From all AMANHI and GAPPS cohorts, trained phlebotomists collected blood samples for centrifugation and aliquoting of serum, plasma, and buffy coat for storage and future analyses. In addition, maternal urine was collected in parallel. Collection and processing of all sample types were performed according to harmonized operating procedures at all study cohorts. The eMethods in the Supplement provides details on the biological assays.
Data were analyzed from December 2018 to July 2019. All analyses were performed with R, version 3.6.1 (R Foundation for Statistical Computing). All multivariate modeling was performed with a 2-layer cross-validation strategy to prevent overfitting of the data and to ensure generalizability. Mixed-effect models were used to account for cohort-specific variations (eMethods in the Supplement). The analysis is independently reproducible. The measured features from all 3 omics data sets (transcriptomics, metabolomics, and proteomics); the algorithms and source codes for reproduction of the results; and an interactive website for visualizing the entire data set, the feature evaluation scores for PTB and gestational age (GA) at sampling, and the pathway enrichment analysis are available online (https://nalab.stanford.edu/multiomicsmulticohortpreterm/).
We used linear discriminant analysis and principal component analysis (PCA), respectively, to create a 2-dimensional representation of the entire cohort with cohort labels as the supervised guide and without supervised information. To confirm the presence of cohort-specific signatures, we used random forest analysis. We created models for each patient to estimate GA at the time of sample collection. To simultaneously optimize the integrative model and test the performance of the model on previously unseen patients, we applied a cross-validation strategy. To predict PTB (GA at delivery <37 weeks), we used a leave-one-out cross-validation procedure to test the models on blinded participants.
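The leave-one-out procedure can be sketched as follows. This is an illustrative stand-in, not the cohort-adjusted pipeline of the study: the univariate score, the nearest-centroid classifier, all function names, and the toy data are assumptions. The point it shows is that feature selection is repeated inside every fold, so the held-out participant never influences which features are chosen.

```python
from statistics import mean, pstdev


def top_features(rows, labels, k):
    """Rank features by a simple univariate score (absolute difference of
    class means over the overall standard deviation) and keep the top k."""
    scores = []
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        pos = [v for v, y in zip(column, labels) if y == 1]
        neg = [v for v, y in zip(column, labels) if y == 0]
        spread = pstdev(column) or 1.0  # guard against constant features
        scores.append((abs(mean(pos) - mean(neg)) / spread, j))
    return [j for _, j in sorted(scores, reverse=True)[:k]]


def centroid_score(train_rows, train_labels, test_row, features):
    """Score a held-out sample; positive values lean toward class 1."""
    def centroid(label):
        members = [r for r, y in zip(train_rows, train_labels) if y == label]
        return [mean(r[j] for r in members) for j in features]

    def sq_dist(center):
        return sum((test_row[j] - c) ** 2 for j, c in zip(features, center))

    return sq_dist(centroid(0)) - sq_dist(centroid(1))


def leave_one_out_scores(rows, labels, k=2):
    """Hold out each sample in turn; select features and fit on the rest."""
    scores = []
    for i in range(len(rows)):
        train_rows = rows[:i] + rows[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        features = top_features(train_rows, train_labels, k)
        scores.append(centroid_score(train_rows, train_labels, rows[i], features))
    return scores


# Toy data (hypothetical): two informative features and one noise feature.
rows = [
    [1.0, 5.0, 0.2], [1.2, 4.8, 0.9], [0.9, 5.1, 0.4],
    [3.0, 2.0, 0.7], [3.2, 1.9, 0.1], [2.9, 2.2, 0.6],
]
labels = [0, 0, 0, 1, 1, 1]
scores = leave_one_out_scores(rows, labels, k=2)
print([int(s > 0) for s in scores])
```

Selecting features once on the full data set and then cross-validating would leak information from the held-out sample and inflate the apparent accuracy.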
Of the 81 pregnant women included in this study, 39 had PTBs (48.1%) and 42 had term pregnancies (51.9%). The mean (SD) maternal age was 24.8 (5.3) years. The median sampling time was 13.6 weeks of gestation, according to ultrasonography (Figure 1A).
To investigate cohort-specific data signatures, PCA was used to create a 2-dimensional representation of the entire cohort for each biological modality and all modalities combined (eFigure 1A in the Supplement). The PCA demonstrated that the largest source of variation in the data was not driven by fundamental differences between the cohorts. Supervised linear discriminant analysis26 confirmed the existence of more subtle cohort-specific signatures that were not pronounced enough to be visible in an unsupervised PCA (eFigure 1B in the Supplement). The presence of cohort-specific signatures was confirmed using random forest analysis27 that underwent cross-validation to predict the cohort from which the patient was selected exclusively on the basis of each biological modality (eFigure 1C in the Supplement).
The impact of sample storage time was quantified with random forest analysis that underwent cross-validation in which the number of days between sample collection and laboratory analyses was used as a continuous prediction target. The results were statistically significant only for the urine metabolomics data set (P = 1.25762 × 10⁻¹ for transcriptomics, P = 8.83433 × 10⁻⁶ for metabolomics, and P = 5.56758 × 10⁻² for proteomics), indicating the potential for sample degradation over time (eFigure 1D in the Supplement). However, this result did not confound the design of this study because GA at delivery did not correlate with storage time (r = −0.092; P > .41).
We built models to estimate GA at the time of sample collection (as a surrogate for the chronicity of pregnancy) for each patient. A cross-validation strategy was used to simultaneously optimize the integrative model and test the performance of the model on previously unseen patients. Models built on all 3 modalities (transcriptomics, metabolomics, and proteomics) as well as the integrated model were statistically significantly correlated with GA at the time of sample collection (transcriptomics: 1.736089 × 10⁻³; metabolomics: 8.936983 × 10⁻²³; proteomics: 2.227379 × 10⁻¹⁹; and integrated model: 8.990768 × 10⁻²²; Bonferroni-adjusted Spearman correlation P < .05) (Figure 2A and B). The features that most correlated with the progression of pregnancy (Spearman correlation P < .05) are color-coded in Figure 2C. A cluster of highly correlated metabolomics and proteomics features was identified that included the trophoblast-derived placental growth factor (PGF). Previous studies have demonstrated that PGF plays a substantial role in the pathogenesis of preeclampsia but has not been associated with spontaneous PTB.28,29 Pathway analysis30 of the metabolites in this module indicated the enrichment of the steroid hormone biosynthesis pathway (Fisher test for pathway enrichment analysis P < 1.2 × 10⁻¹²). The purine metabolism pathway was enriched in an additional module of metabolites (Fisher test for pathway enrichment analysis P < 1.7 × 10⁻⁵). Other proteins that were included in the model and close to this cluster were PAPP-A (pregnancy-associated plasma protein A), MMP-7 (matrix metallopeptidase 7), FGF and FGFBP1 (fibroblast growth factors), and SIGLEC6 (sialic acid binding Ig-like lectin 6), all of which play important roles in placental development.31-34 An additional cluster of proteins associated with cell migration and localization was identified by gene ontology analysis (Protein Analysis Through Evolutionary Relationships overrepresentation P < 10 × 10⁻⁷).
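For readers less familiar with the statistics named above, the following minimal sketch shows how a Spearman correlation and a Bonferroni adjustment are computed. It uses only the Python standard library; the function names, the gestational ages, and the P values are hypothetical illustrations and are not taken from the study.

```python
def average_ranks(values):
    """Rank values from 1 to n, assigning tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of the 1-based positions start..end
        for position in range(start, end + 1):
            ranks[order[position]] = shared
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: the Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def bonferroni(p_values):
    """Multiply each P value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical values: GA at sampling (weeks) and a model's estimates.
ga_weeks = [8, 10, 12, 14, 16, 20]
estimated_weeks = [9.1, 9.8, 13.0, 12.5, 17.2, 19.0]
print(round(spearman_rho(ga_weeks, estimated_weeks), 3))  # 0.943

# Hypothetical raw P values for 4 models, adjusted for 4 comparisons.
print(bonferroni([0.01, 0.2, 0.5, 0.0004]))
```

Because the correlation is computed on ranks, it measures whether the estimates increase monotonically with GA rather than whether they match it on a linear scale.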
To further highlight the interplay between plasma proteins and urine metabolites, we developed a random forest model to estimate the PGF levels of each patient using only the urine metabolomics data set (eFigure 2 in the Supplement). Overall, this analysis highlighted the potential for biological profiling for estimating GA during pregnancy (a substantial challenge in LMICs) and the use of urine-based metabolite biomarkers as low-cost surrogates for models developed through multiomics analysis.
For prediction of PTB (GA at delivery <37 weeks), we used a leave-one-out cross-validation procedure to test the models on blinded participants. Before training the model using the entire data set, the feature space was limited to the top features in the cohort that corresponded to the blinded sample based on univariate testing. Overall, the models relied on a subset of all available features. The median number of features used by the models during cross-validation was 36 for transcriptomics, 35 for metabolomics, and 9 for proteomics. To combine predictions from each model, we developed an additional integration layer to produce the final weighted probabilities for statistical testing. The integrated model was more accurate than the model for each independent modality (Figure 3A). The mean area under the receiver operating characteristic curve (AUROC) and 95% CI for each modality were as follows: transcriptomics (AUROC, 0.73; 95% CI, 0.61-0.83), metabolomics (AUROC, 0.59; 95% CI, 0.47-0.72), proteomics (AUROC, 0.75; 95% CI, 0.64-0.85), and integrated (AUROC, 0.83; 95% CI, 0.72-0.91) (Figure 3A). eFigure 3 in the Supplement provides a comparison against other machine learning strategies applied to the same data set (support vector regression AUROC, 0.57; random forest AUROC, 0.66; lasso AUROC, 0.68; Gaussian process AUROC, 0.71; supervised learning cohort-adjusted model AUROC, 0.83; merging AUROC, 0.71; stacked generalization AUROC, 0.76; data integration cohort-adjusted model AUROC, 0.83). In an independent analysis, this same pipeline was used to model participants who were randomly assigned to case and control groups, confirming that the findings presented in Figure 3 did not result from model overfitting (transcriptomics AUROC, 0.54; metabolomics AUROC, 0.50; proteomics AUROC, 0.50; integrated AUROC, 0.50) (eFigure 4 in the Supplement).
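The AUROC values and 95% CIs reported above can be understood through the following minimal sketch. The AUROC is computed in its Mann-Whitney form, and the interval uses a percentile bootstrap; the bootstrap is an assumption made for illustration, because the text does not state how the study's intervals were derived. The function names, labels, and scores are hypothetical.

```python
import random


def auroc(labels, scores):
    """AUROC as the probability that a random positive outscores a random
    negative, counting ties as one half (Mann-Whitney formulation)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes are required")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def bootstrap_ci(labels, scores, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI; resamples lacking a class are redrawn."""
    rng = random.Random(seed)
    n = len(labels)
    estimates = []
    while len(estimates) < n_resamples:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:
            estimates.append(auroc(ys, [scores[i] for i in idx]))
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_resamples)]
    upper = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Hypothetical out-of-fold scores for 4 term (0) and 4 preterm (1) births.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.1, 0.3, 0.35, 0.6, 0.4, 0.7, 0.8, 0.9]
print(auroc(labels, scores))  # 0.9375
print(bootstrap_ci(labels, scores))
```

An AUROC of 0.5 corresponds to chance-level ranking, which is the value the negative-control analysis with randomly assigned case and control labels is expected to approach.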
Field workers were trained to collect detailed phenotypic and demographic data from the women and their families through scheduled household visits during pregnancy and postpartum. Clinical covariates were manually harmonized across all 5 cohorts. Of all the variables collected, only the weight of the baby and GA at delivery were statistically significantly correlated with the final outcome of the model predicting PTB (Spearman correlation = 0.73) (eFigure 5 and eTable in the Supplement). This finding confirmed that the model was not confounded by the other measured clinical covariates.
Given the statistically significant differences observed across various cohorts, we used mixed-effect models (with each cohort encoded as a random effect) to compare the distribution of each measurement between term pregnancies and PTBs (Figure 3B). Top features were contained within 2 correlated modules: (1) an inflammatory module, which included interleukin 6 (IL-6), IL-1 receptor antagonist (IL-1RA, a regulatory member of the IL-1 family whose expression is induced by IL-1β under inflammatory conditions35,36), granulocyte colony-stimulating factor (G-CSF), retinoic acid receptor responder 2 (RARRES2), and chemokine ligand 3 (CCL3), and (2) a metabolomic module, which primarily consisted of urine metabolites enriched for glutamine and glutamate metabolism (Fisher test for pathway enrichment analysis P < 4.4 × 10⁻⁹)30 and valine, leucine, and isoleucine biosynthesis pathways (P < 7.3 × 10⁻⁶).37
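A Fisher test for pathway enrichment asks whether a module contains more members of a pathway than would be expected by chance. The sketch below computes the one-sided (over-representation) P value from the hypergeometric distribution. The one-sided form, the function name, and the counts are assumptions made for illustration; they do not reproduce the enrichment tool or the data used in the study.

```python
from math import comb


def enrichment_p_value(n_background, n_in_pathway, n_selected, n_overlap):
    """One-sided Fisher exact (hypergeometric) P value: the probability of
    drawing at least `n_overlap` pathway members when `n_selected` items are
    sampled without replacement from `n_background` items."""
    total = comb(n_background, n_selected)
    upper = min(n_in_pathway, n_selected)
    tail = sum(
        comb(n_in_pathway, k) * comb(n_background - n_in_pathway, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 20 measured metabolites, 5 of them in the pathway,
# 4 metabolites in the module, 3 of which belong to the pathway.
p = enrichment_p_value(n_background=20, n_in_pathway=5, n_selected=4, n_overlap=3)
print(round(p, 4))  # 0.032
```

In this toy case the P value equals 155/4845, because 155 of the 4845 possible 4-metabolite modules contain 3 or more pathway members.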
The presence of inflammatory mediators among the features correlated with PTB is consistent with findings in previous studies that suggested dysfunctional immune adaptations during pregnancy were central to the pathogenesis of PTB.38,39 However, the predictive model also highlighted a set of proteomic features with no known inflammatory properties that were correlated with features from the inflammatory module. These proteins included protein-arginine deiminase type II (PADI2), a peptidylarginine deiminase that is responsible for protein citrullination and implicated in parturition and sensing infections40,41; transferrin receptor (TfR), which is implicated in iron transport; angiopoietin-like 4 (ANGPTL4), which regulates glucose homeostasis and lipid metabolism42; and RARRES2, an adipokine that is increased in metabolic syndrome and gestational diabetes.43,44
To ascertain whether observed correlations between these proteins and the inflammatory module reflected biologically relevant inflammatory properties, we examined the capacity of each of these factors to stimulate human peripheral blood leukocytes using an ex vivo mass cytometry assay.45 The activity of major intracellular signaling responses previously17 implicated in maternal immune adaptations during pregnancy was assessed at baseline and after a 15-minute stimulation in major innate and adaptive immune cell types (eMethods in the Supplement). As expected, robust and cell-specific signaling responses along the JAK/STAT and MyD88 signaling pathways were observed in classical monocytes (CMC) after stimulation with known proinflammatory cytokines, including IL-6 (mean [SD] pSTAT3 ArcSinh ratio over endogenous signal, 2.64 [0.22]; false discovery rate [FDR]–adjusted vs unstimulated P < 1.0 × 10⁻⁶), G-CSF (mean [SD] pSTAT5 ArcSinh ratio over endogenous signal, 0.42 [0.12]; P = .007), and CCL3 (mean [SD] pCREB ArcSinh ratio over endogenous signal, 0.35 [0.09]; P < 1.0 × 10⁻⁶) (eFigures 6 and 7 and the eMethods in the Supplement). Stimulation with PADI2 activated the key elements of the MyD88 pathway, including P38 (mean [SD] ArcSinh ratio over endogenous signal, 0.91 [0.52]; FDR-adjusted vs unstimulated P = .007), MK2 (mean [SD] ArcSinh ratio over endogenous signal, 0.38 [0.10]; P = .002), and NFkB (mean [SD] ArcSinh ratio over endogenous signal, 0.14 [0.03]; P = .009), in monocytes, although little or no signaling responses were observed after stimulation with ANGPTL4, TfR, or RARRES2.
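The ArcSinh ratio over endogenous signal is a common mass cytometry readout. The sketch below assumes the conventional definition, namely the difference between the inverse hyperbolic sine-transformed median intensities of the stimulated and unstimulated conditions, with a cofactor of 5. Both the definition and the cofactor are assumptions, because the exact computation is described in the eMethods rather than in the text above; the function name and intensities are hypothetical.

```python
from math import asinh


def arcsinh_ratio(stimulated_median, unstimulated_median, cofactor=5.0):
    """Difference of arcsinh-transformed median signal intensities
    (stimulated minus unstimulated); the cofactor of 5 is an assumed
    convention for mass cytometry data."""
    return asinh(stimulated_median / cofactor) - asinh(unstimulated_median / cofactor)


# Hypothetical median intensities of a phospho-protein in one cell type.
print(round(arcsinh_ratio(50.0, 5.0), 3))
print(arcsinh_ratio(12.0, 12.0))  # 0.0 when stimulation changes nothing
```

The transform behaves almost linearly near zero and logarithmically for large intensities, so the difference resembles a log fold change for bright signals while remaining defined for values near zero.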
We also tested whether stimulation with the most informative proteomic features of the predictive model of PTB would alter the effector function of circulating immune cells. To this end, we quantified the intracellular expression of select cytokines in circulating immune cells that were stimulated with the target proteins for 4 hours. In addition to the expected cytokine responses after exposure to CCL3, IL-6, and G-CSF, the results show that PADI2 and ANGPTL4 stimulated proinflammatory cytokine production in CMC (mean [SD] frequency of PADI2-stimulated IL-1β+ CMC: 18.66 [1.93], FDR-adjusted vs unstimulated P < 1.0 × 10⁻⁶; mean [SD] frequency of PADI2-stimulated IL-6+ CMC: 8.01 [1.47], P = 1.0 × 10⁻⁶; mean [SD] frequency of PADI2-stimulated TNF+ CMC: 7.43 [1.44], P = 1.0 × 10⁻⁶) (eFigure 8 and eMethods in the Supplement).
In contrast, stimulation with RARRES2 or TfR elicited little intracellular cytokine response (mean [SD] frequency of RARRES2-stimulated IL-1β+ CMC: 5.63 [0.25], FDR-adjusted vs unstimulated P < 1.0 × 10⁻⁶; mean [SD] frequency of TfR-stimulated IL-1β+ CMC: 2.25 [0.66], P = .16). These results provide evidence of the potential communication between different biological systems and add new elements to the complex pathogenesis of preterm birth. Furthermore, the results suggest that PADI2, in conjunction with other inflammatory cytokines (such as IL-1β), may exacerbate proinflammatory innate immune responses during PTBs, thereby playing a role in the early onset of labor.
To our knowledge, this study is the first multicohort and multiomics analysis of term and preterm birth conducted in LMICs through use of biorepository samples from relevant geographies in a harmonized fashion. The plasma and urine samples were collected, processed, stored, and shipped to the laboratories under uniform protocols. In this proof-of-concept study, a machine learning approach was implemented for quality control, analysis of the timing of pregnancy, and prediction of PTB. Cohort-specific signatures were observed in all cohorts, and data quality was consistent across all modalities.
The prediction of GA at the time of sample collection was driven by an internally correlated module of placenta-related plasma proteins and urine metabolites. Correlations within this module provided an excellent example of leveraging multiomics data for identification of low-cost surrogates in an accessible biological sample (in this case, urine) for an otherwise complex plasma-based measurement with direct applications in LMICs. Accurate prediction of GA through laboratory testing of blood or urine, if validated in larger and more diverse cohorts, has the potential for widespread implementation in settings in which ultrasonography-based GA dating is not available or is impractical.
Prediction of PTB using a multiomics model adjusted for each cohort resulted in an AUROC of 0.83. The sparse nature of the developed methods indicated the possibility of developing simplified models in a validation cohort for scalable analysis of larger cohorts. Mixed-effect modeling revealed several features of interest. The top-ranked features, including IL-1RA, pointed to promising anti-inflammatory therapy candidates that were under active development.46 Although the prediction of GA at the time of sample collection was consistent across all 5 cohorts, models for prediction of PTB required cohort-specific adjustments. This finding is consistent with that in previous publications that indicated that, although the normal chronicity of pregnancy may be shared across populations, pathological pregnancies are likely to be population-specific.47,48
Each multiomics data set differed not only across the subcohorts but also in terms of their size and internal complexities. Therefore, we used a 2-step machine learning strategy in which a model was first built on each omics data set and then combined for final predictions. This approach prevented large untargeted data sets from overwhelming small yet carefully targeted assays that could have a similar or even more discriminatory information content. This approach resulted in an increase in predictive power and improved interpretability of the results.
In the present study, the predictive accuracy for PTB was augmented by combining various omics data sets, which was consistent with previous studies suggesting that PTB was a condition manifesting within multiple biological systems.18,49-52 Observed differences between cohorts also highlighted that the causes of PTB may be associated with varying environmental and socioeconomic factors.53 From a biological standpoint, examination of individual components of the multiomics model emphasized the role of inflammation in the pathobiological features of PTB. As such, inflammatory cytokines previously shown to be elevated in PTBs, including IL-6 and IL-1RA (often considered as a surrogate marker of IL-1β expression54) were among the most informative features of the multiomics model.55 These cytokines were integrated within a broader inflammatory module that revealed novel factors associated with preterm labor with previously unsuspected properties (eg, PADI2). In neutrophils, citrullination of histones by PADI2 is an important step in the formation of neutrophil extracellular traps, a defensive immunity tool that allows neutrophils to trap and kill bacteria.56-60 Increased soluble PADI2 observed in PTBs may potentially reflect heightened inflammatory responses to a bacterial pathogen, consistent with an infectious cause for PTB. We show that soluble PADI2 can also directly activate proinflammatory signaling pathways and cytokine production in classical monocytes, highlighting a synergistic mechanism that may further enhance the inflammatory state of PTB.
This study had several strengths. First, the AMANHI and GAPPS biorepositories used accurate early trimester ultrasonography scans for GA dating. Second, urine and plasma specimens were collected, processed, and transported in a harmonized manner. All samples underwent a single freeze-thaw cycle only at Stanford University before final processing and analysis. Third, the machine learning strategy used was able to detect patterns that were generalizable across cohorts.
This study also had several limitations. First, it used a small sample size compared with the number of measurements (which we accounted for through a rigorous 2-step cross-validation process). Therefore, reproduction of these results in larger and more diverse cohorts remains a major priority for our future efforts. For reproduction of these results to be successful, the validation of a reduced model with increased scalability will be a key step. Second, given the exploratory nature of this study, the cohort was clinically homogeneous (eTable and eFigure 2 in the Supplement), which limits the generalizability of the results to real-world heterogeneous populations. Therefore, a future area of investigation is the direct integration of clinical covariates into the predictive models61 to increase the generalizability in data sets with diverse phenotypes.
This diagnostic/prognostic study found that, in LMICs and high PTB settings, major biological adaptations during pregnancy may follow a generalizable model and that the biological signals that correlate with or are potentially associated with PTB can be detected using robust machine learning algorithms. In addition, this study demonstrated that a multiomics approach has the potential both to improve predictive accuracy and to help identify low-cost predictive surrogates in accessible biological samples for LMICs. Research to expand this analysis to a larger patient population and to broader cohorts and omics platforms is already under way. The data sets, together with state-of-the-art machine learning partnerships,62 will be a key step in developing valuable predictive tests and intervention candidates to tackle the long-term clinical challenge of preventing PTB.
Accepted for Publication: September 17, 2020.
Published: December 18, 2020. doi:10.1001/jamanetworkopen.2020.29655
Correction: This article was corrected on February 12, 2021, to fix errors in the data of the eTable in the Supplement.
Open Access: This is an open access article distributed under the terms of the CC-BY License. © 2020 Jehan F et al. JAMA Network Open.
Corresponding Author: Nima Aghaeepour, PhD, Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University School of Medicine, 300 Pasteur Dr, Grant S280, Stanford, CA 94305-5117 (firstname.lastname@example.org).
Author Contributions: Drs Aghaeepour and Shaw had full access to all of the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis. Co-lead authorship: Drs Jehan, A. Rahman, and Ghaemi. Co-senior authorship: Drs Bahl, Stringer, Litch, Snyder, Quake, Angst, B. Gaudilliere, and Aghaeepour.
Concept and design: Jehan, Sazawal, Baqui, Nisar, Ilyas, Mitra, Mahmud, Ali, Nizar, Quaiyum, Manu, Bahl, Musonda, Ghaemi, Culos, Fallahzadeh, Ando, Wise, Darmstadt, Murray, Shaw, Stevenson, Snyder, Quake, B. Gaudilliere, Aghaeepour.
Acquisition, analysis, or interpretation of data: Jehan, Sazawal, Baqui, Nisar, Dhingra, Khanam, Ilyas, Dutta, Mehmood, Deb, Hotwani, S. Rahman, Ame, Moin, Muhammad, Chauhan, Begum, Khan, Das, Ahmed, Hasan, Khalid, Rizvi, Juma, Chowdhury, Kabir, Aftab, Yoshida, Bahl, A. Rahman, Pervin, Winston, Musonda, Stringer, Litch, Ghaemi, Moufarrej, Contrepois, Chen, Stelzer, Stanley, Chang, Hamad, Wong, Liu, Quaintance, Espinosa, Xenochristou, Becker, Fallahzadeh, Ganio, A. Tsai, D. Gaudilliere, E. Tsai, Han, Tingle, Marić, Wise, Winn, Druzin, Gibbs, Shaw, Stevenson, Quake, Angst, B. Gaudilliere, Aghaeepour.
Drafting of the manuscript: Jehan, Nisar, Mehmood, Hotwani, S. Rahman, Moin, Muhammad, Khalid, Rizvi, Chowdhury, Quaiyum, Musonda, Ghaemi, Chang, Hamad, Wong, Liu, Shaw, Stevenson, Angst, B. Gaudilliere, Aghaeepour.
Critical revision of the manuscript for important intellectual content: Jehan, Sazawal, Baqui, Nisar, Dhingra, Khanam, Ilyas, Dutta, Mitra, Deb, Mahmud, Hotwani, Ali, Nizar, Ame, Chauhan, Begum, Khan, Das, Ahmed, Hasan, Khalid, Juma, Chowdhury, Kabir, Aftab, Manu, Yoshida, Bahl, A. Rahman, Pervin, Winston, Stringer, Litch, Ghaemi, Moufarrej, Contrepois, Chen, Stelzer, Stanley, Chang, Wong, Quaintance, Culos, Espinosa, Xenochristou, Becker, Fallahzadeh, Ganio, A. Tsai, D. Gaudilliere, E. Tsai, Han, Ando, Tingle, Marić, Wise, Winn, Druzin, Gibbs, Darmstadt, Murray, Shaw, Stevenson, Snyder, Quake, Angst, B. Gaudilliere, Aghaeepour.
Statistical analysis: Sazawal, Ilyas, Moin, Muhammad, Rizvi, Chowdhury, Ghaemi, Moufarrej, Stelzer, Stanley, Chang, Hamad, Liu, Espinosa, Becker, Fallahzadeh, D. Gaudilliere, Shaw, B. Gaudilliere, Aghaeepour.
Obtained funding: Jehan, Sazawal, Baqui, Nisar, Mitra, Manu, Bahl, Stringer, Litch, D. Gaudilliere, Shaw, Stevenson, Quake, B. Gaudilliere.
Administrative, technical, or material support: Jehan, Baqui, Nisar, Dhingra, Khanam, Ilyas, Dutta, Mitra, Mehmood, Deb, Mahmud, Hotwani, Ali, S. Rahman, Nizar, Chauhan, Begum, Das, Ahmed, Hasan, Khalid, Chowdhury, Kabir, Aftab, Quaiyum, Manu, Yoshida, Musonda, Stringer, Litch, Ghaemi, Stelzer, Chang, Wong, Quaintance, Espinosa, Xenochristou, Ganio, A. Tsai, D. Gaudilliere, Ando, Tingle, Marić, Murray, Shaw, Stevenson, Quake, Angst, B. Gaudilliere, Aghaeepour.
Supervision: Sazawal, Nisar, Dhingra, Ilyas, Dutta, Mitra, Mehmood, Deb, Mahmud, Hotwani, Ame, Chauhan, Das, Khalid, Juma, Aftab, Manu, Bahl, Stringer, Contrepois, Wong, D. Gaudilliere, Ando, Wise, Stevenson, Snyder, Quake, B. Gaudilliere, Aghaeepour.
Other - Field implementation: Mahmud.
Other - Bioinformatics Analysis: Khan.
Other - Intellectual guidance on project related to pregnancy: Winn.
Other - Design of analytical methods: Culos.
Conflict of Interest Disclosures: Dr Jehan reported receiving grants from the Bill & Melinda Gates Foundation (BMGF), World Health Organization (WHO), and PATH as well as grants and nonfinancial support from Emory University during the conduct of the study. Dr Baqui reported that his employer Johns Hopkins University received a biorepository grant from the BMGF under which specimens were collected for this research. Dr Nisar reported receiving grants from the BMGF during the conduct of the study and grants from the BMGF, Vital Pakistan Trust, and the WHO outside the submitted work. Dr Dutta reported receiving grants from the BMGF during the conduct of the study. Dr Deb reported receiving grants from the BMGF during the conduct of the study. Dr Ali reported receiving grants from the BMGF during the conduct of the study. Dr Chauhan reported receiving grants from the BMGF during the conduct of the study. Dr Aftab reported receiving grants from the BMGF during the conduct of the study. Dr Bahl reported receiving grants from the BMGF during the conduct of the study. Dr Stringer reported receiving grants from the BMGF and the National Institutes of Health (NIH) during the conduct of the study. Dr Litch reported receiving grants from the BMGF during the conduct of the study. Ms Moufarrej reported receiving personal fees from Nemours, Stanford University, and Stanford BioX Bowes Fellowship outside the submitted work. Dr Contrepois reported holding a pending patent to Prediction of Gestational Age Using Urine Metabolites. Dr Stelzer reported receiving grants from German Research Foundation during the conduct of the study. Dr Quaintance reported receiving grants from the BMGF and the March of Dimes Foundation during the conduct of the study. Dr Murray reported being a former employee of the BMGF during the conduct of the study. Dr Stevenson reported receiving grants from the BMGF during the conduct of the study. 
Dr Snyder reported receiving grants from the BMGF during the conduct of the study, receiving nonfinancial support from MirVie Shareholder outside the submitted work, and holding a patent based on this work that will be submitted. Dr Quake reported being a shareholder, consultant, and board member of MirVie and holding a patent to Cell Free RNA Analysis of Preterm Birth that is licensed to MirVie. Dr Angst reported receiving grants from the BMGF and the March of Dimes Foundation during the conduct of the study as well as holding a patent to Onset of Labor. Dr Aghaeepour reported receiving grants from the BMGF, NIH, Burroughs Wellcome Fund, Robertson Foundation, and March of Dimes Foundation during the conduct of the study. No other disclosures were reported.
Funding/Support: This research was supported by grants OPP1112382 and OPP1113682 from the BMGF (Drs Stevenson, Angst, Aghaeepour, B. Gaudilliere, Litch, and Stringer), grant 22-FY20-181 from the March of Dimes Prematurity Research Center at Stanford University (Drs Stevenson and Shaw), grant 1019816 from the Burroughs Wellcome Fund (Dr Aghaeepour), grant KL2TR003143 from the National Center for Advancing Translational Sciences (Dr Aghaeepour), award R35GM138353 from the National Institute of General Medical Sciences of the NIH (Dr Aghaeepour), a gift from the Stanford Maternal and Child Health Research Institute (Drs Stevenson and B. Gaudilliere), a gift from the Intensive Care Nursery Unit Fund (Dr Stevenson), a gift from the Robertson Family Foundation (Dr Stevenson), a gift from the Mary L Johnson Research Fund (Drs Stevenson and Wong), an endowment from Christopher Hess Research Fund (Drs Stevenson and Wong), grant P30 AI50410 from the NIH (Dr Stringer), grant D43TW007585 from the Fogarty International Center (Drs Jehan, Nisar, and Khanam), grant OPP1033514 from the BMGF to the Global Alliance to Prevent Prematurity and Stillbirth (Seattle Children’s Hospital) (Drs A. Rahman, Litch, and Stringer), and grants OPP1054163 and OPP1138582 from the BMGF to the WHO (Dr Bahl).
Role of the Funder/Sponsor: The funders had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.
Group Information: Alliance for Maternal and Newborn Health Improvement: Fyezah Jehan, MBBS; Sunil Sazawal, PhD; Abdullah H. Baqui, DrPh; Muhammad Imran Nisar, MBBS; Usha Dhingra, MCA; Rasheda Khanam, PhD; Muhammad Ilyas, MSc; Arup Dutta, MBA; Dipak K. Mitra, PhD; Usma Mehmood, MSc; Saikat Deb, PhD; Arif Mahmud, MBBS; Aneeta Hotwani, MPhil; Said Mohammed Ali, MSc; Sayedur Rahman, MBBS; Ambreen Nizar, MSc; Shaali Makame Ame, PhD; Mamun Ibne Moin, BSc; Sajid Muhammad, MSc; Aishwarya Chauhan, PhD; Nazma Begum, MA; Waqasuddin Khan, PhD; Sayan Das, MSc; Salahuddin Ahmed, MBBS; Tarik Hasan, MSc; Javairia Khalid, MSc; Syed Jafar Raza Rizvi, MSc; Mohammed Hamad Juma, DCH; Nabidul Haque Chowdhury, BSc; Furqan Kabir, MPhil; Fahad Aftab, MTech; Abdul Quaiyum, MBBS; Alexander Manu, PhD; Sachiyo Yoshida, PhD; Rajiv Bahl, PhD. Global Alliance to Prevent Prematurity and Stillbirth: Anisur Rahman, PhD; Jesmin Pervin, PhD; Jennifer Winston, PhD; Patrick Musonda, PhD; Jeffrey S. A. Stringer, PhD; James A. Litch, PhD. Prematurity Research Center at Stanford University: Mohammad Sajjad Ghaemi, PhD; Mira N. Moufarrej, MSc; Kévin Contrepois, PhD; Songjie Chen, PhD; Ina A. Stelzer, PhD; Natalie Stanley, PhD; Alan L. Chang, PhD; Ghaith Bany Hammad, PhD; Ronald J. Wong, PhD; Candace Liu, BSc; Cecele C. Quaintance, BSc; Anthony Culos, BSc; Camilo Espinosa, BSc; Maria Xenochristou, PhD; Martin Becker, PhD; Ramin Fallahzadeh, PhD; Edward Ganio, PhD; Amy S. Tsai, BSc; Dyani Gaudilliere, MD; Eileen S. Tsai, BSc; Xiaoyuan Han, PhD; Kazuo Ando, MD; Martha Tingle, BSc; Ivana Marić, PhD; Paul H. Wise, PhD; Virginia D. Winn, MD, PhD; Maurice L. Druzin, MD; Ronald S. Gibbs, MD; Gary L. Darmstadt, MD; Jeffrey C. Murray, MD; Gary M. Shaw, PhD; David K. Stevenson, MD; Michael P. Snyder, PhD; Stephen R. Quake, PhD; Martin S. Angst, MD; Brice Gaudilliere, MD, PhD; Nima Aghaeepour, PhD.
Additional Information: The AMANHI biorepository was coordinated by the WHO (Drs Bahl, Manu, and Yoshida) and the BMGF.