Assessment of the Diagnostic Efficiency of a Liquid Biopsy Assay for Early Detection of Gastric Cancer

Key Points
Question: Can circulating microRNAs be used to derive clinically significant noninvasive diagnostic biomarkers for gastric cancer?
Findings: This diagnostic study used a multistep and comprehensive biomarker discovery approach to establish a novel, noninvasive, microRNA-based signature for the early detection of gastric cancer, which was retrospectively and prospectively validated in multicenter patient cohorts.
Meaning: For patients with gastric cancer and individuals with a high risk for gastric cancer, a microRNA-based signature may improve the early detection of gastric cancer.


Prospective serum validation cohort
For the prospective serum validation phase, patients with a history of recurrent or other metastatic cancer were excluded. Blood samples were taken at the time of the first diagnosis of GC, before any treatment. Diagnosis was made by imaging (computed tomography or magnetic resonance) and blood examination and was verified by histopathological examination. All patients provided informed consent and met the inclusion criteria at enrollment. CEA and CA19-9 levels were measured in serum specimens from patients with GC and healthy participants on a cobas e 601 analyzer (Roche Diagnostics) with original Roche reagents (ref. 11731629 and 11776193, respectively). Using qRT-PCR data for the 3-miRNA signature, the risk-scoring formula established in the previous cohort was applied and the performance of these circulating miRNA biomarkers was evaluated.

Sample collection and processing
Whole blood was collected in blood collection tubes without clot activator or anticoagulant and left at room temperature (15-25°C) until clotting was complete. The serum was transferred to a new tube after several rounds of centrifugation. Once separated from the whole blood, the serum was immediately stored at -80°C until further processing.

Genome-wide miRNA data analysis
In the discovery phase, we first analyzed genome-wide miRNA sequencing data from TCGA (discovery cohort) to identify candidate miRNAs for the early detection of GC. Specifically, level-3 miRNA expression data, comprising 436 tumors and 41 normal tissues, were downloaded from the Firehose Broad GDAC portal (http://gdac.broadinstitute.org/, accessed on Nov 1, 2015). The miRNA expression levels, measured as reads per million miRNAs mapped (RPM), were first log2-transformed. Differential miRNA expression analysis between GC and adjacent normal tissues was performed using the Bioconductor package "limma" in R (34). To further evaluate how well each miRNA's expression level distinguished GC from normal tissue, the AUC was calculated. Among 1046 targets, 104 were differentially expressed between GC and normal tissues (BH-adjusted P < 0.05, absolute log2 fold change > 1 (34)). Among these, 9 miRNAs (miR-21, -196a-1, -146b, -196b, -135b, -181b, -181a, -93 and -335) were selected based on the following criteria: BH-adjusted P < 1 × 10^-5, AUC > 0.9, and upregulation in GC. We also included two additional miRNAs, miR-196a-2 and miR-18a, because of their high discriminative power (AUC = 0.90 and 0.87, respectively) and because they were highly overexpressed in patients with GC (log2 fold change > 2, BH-adjusted P < 1 × 10^-5). Because of annotation differences between platforms, we combined the expression data of miR-196a-1 and miR-196a-2 into a single miRNA probe (miR-196a), which reduced the 11 candidate miRNAs to 10. These 10 miRNAs, differentially expressed and discriminatory between GC and normal tissues, were selected as the initial signature candidates.
To confirm their diagnostic potential, the 10 miRNAs were validated in two additional independent datasets. GSE23739 included 40 GC and 40 non-cancerous tissue specimens with miRNA profiling data acquired using the Agilent-019118 Human miRNA Microarray platform (35). GSE33743 included 37 primary GC tissues and 4 normal gastric mucosa specimens profiled for miRNA expression using miRNAChip_human_V2 miRNA microarrays (National DNA-Microarray Facility) containing 1175 probes (36). The miRNA expression data were downloaded from GEO using the Bioconductor package "GEOquery" in R and subsequently preprocessed using the methods described by Carvalho and colleagues (35,36).
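The candidate filter described for the discovery phase can be sketched in a few lines. The following is a minimal illustration only, not the authors' R/limma pipeline: it assumes per-miRNA raw P values, AUCs, and log2 fold changes are already available, and the function names (`bh_adjust`, `auc`, `select_candidates`) and dictionary keys are hypothetical. It does not reproduce limma's moderated statistics.

```python
from typing import Dict, List, Sequence


def bh_adjust(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def auc(tumor: Sequence[float], normal: Sequence[float]) -> float:
    """AUC as the share of tumor/normal pairs where the tumor value is
    higher; ties count as one half."""
    wins = 0.0
    for t in tumor:
        for n in normal:
            if t > n:
                wins += 1.0
            elif t == n:
                wins += 0.5
    return wins / (len(tumor) * len(normal))


def select_candidates(stats: Dict[str, dict]) -> List[str]:
    """Keep miRNAs with BH-adjusted P < 1e-5, AUC > 0.9, and a positive
    log2 fold change (upregulated in GC)."""
    names = list(stats)
    adjusted = bh_adjust([stats[n]["p"] for n in names])
    return [
        n for n, q in zip(names, adjusted)
        if q < 1e-5 and stats[n]["auc"] > 0.9 and stats[n]["log2fc"] > 0
    ]
```

The pairwise AUC is quadratic in the number of samples, which is acceptable at the cohort sizes reported here; a rank-based formulation would scale better.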

miRNA-mRNA regulatory network interaction and functional analysis
A miRNA-mRNA network was constructed to study the functional significance of the candidate miRNAs. Specifically, for each miRNA, its downstream target mRNAs were identified based on two key criteria: first, that each miRNA-mRNA interaction had been experimentally validated according to the miRTarBase database (V8) (21); second, that each downstream mRNA was differentially expressed between tumor and normal samples (|log2 fold change| > 2 and BH-adjusted P < 0.05) in the TCGA dataset. The functional analysis was performed based on hypergeometric tests using the "clusterProfiler" package (37), with the C2 (curated gene sets) and C5 (GO and KEGG gene sets) collections retrieved from the MSigDB Database (v7.0) (38). P values were corrected for multiple hypothesis testing using the BH procedure, and a BH-adjusted P < 0.0001 was considered statistically significant.
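The hypergeometric test behind this kind of enrichment analysis can be written out directly. The sketch below is illustrative and does not call clusterProfiler; the function name and argument names are hypothetical, and the choice of gene universe is left to the caller as an assumption.

```python
from math import comb


def hypergeom_enrichment_p(universe: int, set_size: int,
                           n_targets: int, overlap: int) -> float:
    """Upper-tail probability P(X >= overlap), where X is the number of
    gene-set members among n_targets genes drawn without replacement
    from a universe that contains set_size gene-set members."""
    total = comb(universe, n_targets)
    upper = min(set_size, n_targets)
    tail = sum(
        comb(set_size, k) * comb(universe - set_size, n_targets - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total
```

One such P value would be computed per gene set and then adjusted with the BH procedure across all tested sets.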

Random Forest classification
To further evaluate the robustness of the 10-miRNA signature across different datasets, a Random Forest classifier was trained on the expression levels of the 10 miRNAs in 41 GC and matched adjacent normal tissues from the discovery dataset. Z-normalization was performed for each miRNA separately in both validation datasets. Using the trained Random Forest classifier, predictive probabilities were calculated for identifying patients with GC, and the diagnostic performance of the combination of the 10 miRNA candidates was assessed based on the AUC for both validation datasets.
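The per-miRNA Z-normalization step can be illustrated as follows. This is a sketch under stated assumptions: the function name is hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption, since the text does not say which estimator was used.

```python
from statistics import mean, stdev
from typing import Dict, List


def z_normalize(expression: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Z-normalize each miRNA separately across the samples of one dataset."""
    normalized = {}
    for mirna, values in expression.items():
        mu = mean(values)
        sd = stdev(values)  # sample standard deviation; an assumption
        normalized[mirna] = [(v - mu) / sd for v in values]
    return normalized
```

Normalizing within each dataset places values measured on different array platforms on a comparable scale before they are passed to a classifier trained elsewhere. A miRNA with zero variance in a dataset would raise a division error and needs separate handling.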

Cost-effectiveness analysis
CE analysis was performed under the following clinical assumptions (Supplementary Table 11). Noninvasive screening was assumed to be performed on a high-risk population, Chinese men aged 50-75 years. The compliance rate was estimated to be approximately 45%. The test-positive group was assumed to undergo a confirmatory test using endoscopy and biopsy. The biopsy test was considered the gold standard, with 100% sensitivity and specificity. The test-negative group was assumed to have a 3-year follow-up, during which patients with cancer would be detected. For the non-screening group, 10% of the high-risk population was estimated to receive an endoscopy test to evaluate the incidence of cancer. Using the miRNA test in large-scale screening was estimated to increase the detection rate of early-stage GC; this rate was calculated from the sensitivity and specificity of the 3-miRNA signature under the assumed compliance rate.
For cancer treatment, early or advanced stages of GC (TNM Stage 1-3) were considered curable, with patients assumed to be cured after two years, subject to a stage-specific recurrence rate. Cured patients were estimated to incur additional medical expenditure (1000 CNY or 142.6 USD) every year before recurrence. Terminal-stage GC (TNM Stage 4) was considered untreatable, with only palliative care, and patients were assumed to die after one year. Because the prognosis following cancer recurrence is poor, all relapsed patients were assumed to have Stage 4 status. Costs and incidence rates were either collected from the literature or estimated based on in-house clinical records.
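The screening arm of such a model reduces to simple expected-count arithmetic. The sketch below is illustrative only: the function and field names are hypothetical, all numeric inputs are placeholders supplied by the caller rather than values from Supplementary Table 11, and the cost side of the model is not represented.

```python
from dataclasses import dataclass


@dataclass
class ScreeningOutcome:
    screened: float     # individuals who comply with the noninvasive test
    endoscopies: float  # test-positive individuals referred for confirmation
    detected: float     # cancers confirmed by endoscopy and biopsy
    missed: float       # cancers among test-negatives, left to follow-up


def screen(population: float, prevalence: float, compliance: float,
           sensitivity: float, specificity: float) -> ScreeningOutcome:
    """Expected counts for one screening round; endoscopy with biopsy is
    treated as a perfect confirmatory test, as assumed in the text."""
    screened = population * compliance
    with_cancer = screened * prevalence
    true_pos = with_cancer * sensitivity
    false_pos = (screened - with_cancer) * (1.0 - specificity)
    return ScreeningOutcome(
        screened=screened,
        endoscopies=true_pos + false_pos,
        detected=true_pos,
        missed=with_cancer - true_pos,
    )
```

For example, `screen(100_000, 0.01, 0.45, 0.9, 0.8)` uses the 45% compliance rate from the text together with a purely hypothetical prevalence, sensitivity, and specificity.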

RNA isolation and qRT-PCR
miRNA extraction from tissue specimens was performed using miRNeasy RNA isolation kits (Qiagen, Valencia, California, USA), whereas miRNA extraction from serum was performed using miRNeasy serum/plasma kits (Qiagen). TaqMan miRNA real-time qRT-PCR assays (Applied Biosystems, Foster City, California, USA) were used to detect and quantify miRNA expression. The expression levels of serum and tissue miRNAs were normalized against miR-16 and U6 expression levels, respectively, and results were calculated using the ΔCt method as previously described (39). To ensure consistent measurements across all assays, three independent RNA samples were loaded in each PCR amplification reaction as internal controls to account for potential plate-to-plate variation, and the results from each plate were normalized against these internal normalization controls. All experiments were performed in triplicate.
Total RNA enriched in small RNAs was purified from serum samples using the miRNeasy Serum/Plasma Kit (Qiagen). C. elegans miR-39 miRNA mimic (miRNeasy Serum/Plasma Spike-In Control, Qiagen) was mixed thoroughly into all samples during the RNA isolation procedure to normalize for sample-to-sample variation. RNA extracted from serum samples was reverse-transcribed using a TaqMan MicroRNA Reverse Transcription Kit (Applied Biosystems) for subsequent qRT-PCR assays. qRT-PCR was performed using a TaqMan MicroRNA Assay kit and TaqMan Universal Master Mix II, no UNG (Applied Biosystems) on a QuantStudio 7 Flex Real-Time PCR System (Applied Biosystems). The expression of miRNAs was normalized against the average expression level of miR-16 and miR-423-5p (Applied Biosystems) in all serum samples. All commercial kits were used according to the manufacturers' instructions.
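The ΔCt normalization against the average of the reference miRNAs can be sketched as below. The function names are hypothetical, and the conversion to relative expression as 2^(-ΔCt) is a common convention assumed here (it presumes roughly 100% amplification efficiency); the exact formula used is the one given in reference 39.

```python
from statistics import mean
from typing import Sequence


def delta_ct(target_ct: float, reference_cts: Sequence[float]) -> float:
    """ΔCt = Ct(target miRNA) - mean Ct(reference miRNAs)."""
    return target_ct - mean(reference_cts)


def relative_expression(target_ct: float,
                        reference_cts: Sequence[float]) -> float:
    """Relative expression as 2^(-ΔCt); assumes near-perfect amplification
    efficiency, which is a convention and not stated in the text."""
    return 2.0 ** (-delta_ct(target_ct, reference_cts))
```

For a serum sample, `reference_cts` would hold the Ct values of miR-16 and miR-423-5p measured in that same sample.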

Logistic regression analysis with elastic net regularization
Using 115 patients with GC and 115 healthy participants enrolled in the public serum cohort (GSE106817), we developed a 3-circulating-miRNA signature using logistic regression with elastic net regularization. Using the established risk scoring formula, we calculated risk probabilities for samples in the prospective serum cohort to evaluate the diagnostic performance.
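Applying a fitted logistic model to a new sample amounts to a weighted sum passed through the logistic function. The sketch below shows only that scoring step; the function name, miRNA labels, coefficients, and intercept are hypothetical placeholders, since the fitted values are not given in this section, and the elastic net fitting itself is not shown.

```python
from math import exp
from typing import Dict


def risk_probability(expression: Dict[str, float],
                     coefficients: Dict[str, float],
                     intercept: float) -> float:
    """Logistic risk probability for one sample from normalized miRNA
    expression values and previously fitted coefficients."""
    score = intercept + sum(
        coefficients[name] * expression[name] for name in coefficients
    )
    return 1.0 / (1.0 + exp(-score))
```

Because the coefficients are fixed in the training cohort, the same formula can be applied unchanged to every sample in an independent cohort, which is what makes the prospective evaluation a true validation.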

Statistical analysis
All statistical analyses were performed using MedCalc V.12.3.0 (Broekstraat 52, 9030; Mariakerke, Belgium), GraphPad Prism V5.0 (GraphPad Software, San Diego, California, USA), and R (3.3.3, R Development Core Team, https://cran.r-project.org/). Differential miRNA expression analysis was performed using the 'limma' package in R, and the resulting P values were adjusted using the Benjamini-Hochberg method. Wilcoxon's signed-rank test, the Mann-Whitney U test, and the Kruskal-Wallis test were used to analyze miRNA expression data obtained from qRT-PCR experiments, and results were expressed as mean ± standard error. Silhouette width was calculated using the R package 'cluster'.

eFigure 1. Study design for the identification of a circulating miRNA expression signature for early detection of GC.

eFigure 3. miRNA regulatory network analysis and functional analysis of the miRNA target genes. (A) A regulatory network constructed using experimentally validated miRNA-mRNA interactions from miRTarBase (V8). Node size indicates the -log10 transformed BH-adjusted P, and color indicates the log2 fold change between GC and normal samples in the TCGA dataset. (B) Functional analysis using hypergeometric tests on cancer hallmark and KEGG pathways. The significantly enriched signaling pathways (BH-adjusted P < 0.05) are illustrated in the bar plot. Bar length indicates the number of overlapping genes and color indicates the P value.

Correlation matrix of expression of five miRNAs (miR-93, miR-146b, miR-335, miR-18a and miR-181b) in the Kumamoto cohort. (I) Paired dot plots comparing risk scores calculated for 22 pairs of pre- and post-operative sera from patients with GC. A significant drop in GC risk was observed after curative surgery (P < 0.0001, Wilcoxon signed-rank test), suggesting that expression of the circulating miRNAs was derived from GC tissues.

eFigure 6. Validation of the 5 circulating miRNAs in a public serum cohort (GSE106817). Boxplots with one-tailed Wilcoxon signed-rank tests comparing the expression levels of the circulating miRNAs between patients with GC and healthy participants.

eFigure 7. Establishment of a 3-circulating-miRNA signature and evaluation in a prospective validation serum cohort. (A) Confusion matrices built from the diagnostic model prediction in the prospective validation cohort. (B) ROC curves demonstrating diagnostic performance of the 3-miRNA signature in all-stage GC samples, stage I GC samples, CEA and CA19-9 in a prospective validation cohort (Nanjing). ROC curves are shown with 95% CI (DeLong's test). (C) Boxplots comparing risk scores between patients with GC of different stages, precancerous lesion (pre) and healthy participants.

All ROC curves are shown with 95% CI. The 95% CI of sensitivity and specificity (green line) for each miRNA is also shown at the best threshold (green point).

Abbreviations: GC, gastric cancer. Healthy, healthy participants. CEA, carcinoembryonic antigen. CA19-9, cancer antigen 19-9. AUC, area under the ROC curve. PPV, positive predictive value. NPV, negative predictive value.

eTable 9. Univariate and multivariate analyses comparing our 3-circulating-miRNA signature with age, sex, CEA, and CA19-9 for noninvasive detection of GC across all stages and stage I in the prospective validation cohort.