The boxes span the first to the third quartile with the line inside the box representing the median value. The whiskers show the minimum and maximum values or values up to 1.5 times the interquartile range below or above the first or third quartile if outliers are present (shown as separate dots). Con indicates control participants; LCa, lung cancer; NTLD, nontumor lung diseases; OD, other diseases.
eTable 1. Overview of Age, Gender and Smoking Status Along With Secured Diagnoses and Tumor Stages for the Collected Samples
eTable 2. Sample Site and Sample Profiling Distribution
eTable 3. Results of the ANOVA for the Four Compared Groups
eTable 4. Result Metrics for the Comparison of Lung Cancer Cases Versus All Other Participants in the Study
eTable 5. Signatures for the Comparison of Lung Cancer Cases Versus All Other Participants in the Study
eTable 6. Result Metrics for the Comparison of Lung Cancer Cases Versus Patients With Non-Cancer Lung Diseases
eTable 7. Signatures for the Comparison of Lung Cancer Cases Versus Patients With Non-Cancer Lung Diseases
eTable 8. Result Metrics for the Comparison of Low Lung Cancer Stages Versus All Non-Lung-Cancer Patients
eTable 9. Signatures for the Comparison of Low Lung Cancer Stages Versus All Non-Lung-Cancer Patients
eTable 10. Classification Performance for Different Age/Smoking Thresholds
eTable 11. PPV and NPV for a Range of Prevalence Rates
eTable 12. Result Metrics for the Comparison of NSCLC Versus SCLC
eTable 13. Signature for the Comparison of NSCLC Versus SCLC
eTable 14. Signature Overlap Between the Three Classification Scenarios
eTable 15. An Overview of Studies on miRNA Expression Changes in Lung Cancer
eFigure 1. Disease Ontology
eFigure 2. ROC-Analysis of Single miRNAs and miRNA Signatures for the Three Scenarios
eFigure 3. Heatmap for the Classification Performance for Different Age/Smoking Thresholds
eFigure 4. Pearson Correlation for 41 Repeated Experiments to Assess the Reproducibility
Fehlmann T, Kahraman M, Ludwig N, et al. Evaluating the Use of Circulating MicroRNA Profiles for Lung Cancer Detection in Symptomatic Patients. JAMA Oncol. 2020;6(5):714–723. doi:10.1001/jamaoncol.2020.0001
Can the detection of lung cancer in symptomatic patients be improved by using circulating microRNAs as biomarkers?
This cohort study used genome-wide microRNA profiles from the blood samples of 3046 individuals to identify patients with lung cancer with 91.4% accuracy, 82.8% sensitivity, and 93.5% specificity.
The findings of this study suggest that the identified patterns of circulating microRNAs may enable them to be used as biomarkers in a liquid biopsy to complement imaging tests, sputum cytology, and biopsies.
The overall low survival rate of patients with lung cancer calls for improved detection tools to enable better treatment options and improved patient outcomes. Multivariable molecular signatures, such as blood-borne microRNA (miRNA) signatures, may have high rates of sensitivity and specificity but require additional studies with large cohorts and standardized measurements to confirm the generalizability of miRNA signatures.
To investigate the use of blood-borne miRNAs as potential circulating markers for detecting lung cancer in an extended cohort of symptomatic patients and control participants.
Design, Setting, and Participants
This multicenter cohort study included patients from case-control and cohort studies (TREND and COSYCONET); 3102 patients were enrolled by convenience sampling between March 3, 2009, and March 19, 2018. For the cohort study TREND, population sampling was performed. Clinical diagnoses were obtained for 3046 patients (606 patients with non–small cell and small cell lung cancer, 593 patients with nontumor lung diseases, 883 patients with diseases not affecting the lung, and 964 unaffected control participants). No samples were removed because of experimental issues. The collected data were analyzed between April 2018 and November 2019.
Main Outcomes and Measures
Sensitivity and specificity of liquid biopsy using miRNA signatures for detection of lung cancer.
A total of 3102 patients with a mean (SD) age of 61.1 (16.2) years were enrolled. Data on the sex of the participants were available for 2856 participants; 1727 (60.5%) were men. Genome-wide miRNA profiles of blood samples from 3046 individuals were evaluated by machine-learning methods. Three classification scenarios were investigated by splitting the samples equally into training and validation sets. First, a 15-miRNA signature from the training set was used to distinguish patients diagnosed with lung cancer from all other individuals in the validation set with an accuracy of 91.4% (95% CI, 91.0%-91.9%), a sensitivity of 82.8% (95% CI, 81.5%-84.1%), and a specificity of 93.5% (95% CI, 93.2%-93.8%). Second, a 14-miRNA signature from the training set was used to distinguish patients with lung cancer from patients with nontumor lung diseases in the validation set with an accuracy of 92.5% (95% CI, 92.1%-92.9%), sensitivity of 96.4% (95% CI, 95.9%-96.9%), and specificity of 88.6% (95% CI, 88.1%-89.2%). Third, a 14-miRNA signature from the training set was used to distinguish patients with early-stage lung cancer from all individuals without lung cancer in the validation set with an accuracy of 95.9% (95% CI, 95.7%-96.2%), sensitivity of 76.3% (95% CI, 74.5%-78.0%), and specificity of 97.5% (95% CI, 97.2%-97.7%).
Conclusions and Relevance
The findings of the study suggest that the identified patterns of miRNAs may be used as a component of a minimally invasive lung cancer test, complementing imaging, sputum cytology, and biopsy tests.
Lung cancer is among the 3 most common cancers among both sexes and is the leading cause of cancer-related deaths worldwide.1,2 Despite decreasing smoking rates in industrial countries, millions of current or former heavy smokers have an elevated risk of developing lung cancer, including more than 90 million people in the United States.3 Because the symptoms of lung cancer may not be obvious in its early stages, more than two-thirds of all cases are detected in late and inoperable stages. Patients with metastatic stage IV lung cancer have a 5-year survival rate of only 4.7% compared with a 5-year survival rate of up to 56.3% for those with stage I cancer.4
In 2011, the National Lung Screening Trial (NLST) reported a 20% decrease in lung cancer mortality associated with screening with low-dose computed tomography (CT) vs annual chest radiography in a high-risk population.3 Low-dose CT is, however, characterized by a high rate of false-positive results and an elevated risk of overdiagnosis.3,5 To detect lung tumors, liquid biopsy–based strategies have been explored.6-9 In addition to DNA and proteins, microRNAs (miRNAs) have been shown to have potential for detecting a wide variety of human pathologies.10-13 Several previous studies have examined miRNA signatures in patients with non–small cell lung cancer (NSCLC), those with chronic obstructive pulmonary disease (COPD),14-19 and those with metastases after tumor resection.20
Despite several publications on tumor biomarkers, few tissue-based biomarkers and virtually no new blood-borne biomarkers are used in clinical practice.21 The clinical application of biomarkers is largely hampered by the relatively small number of cases that have been analyzed in most preclinical studies. This limitation also applies to the studies10,11,14-25 of blood-borne miRNAs in patients with lung cancer. The problem is further aggravated by the fact that most studies only consider a small number of cases, limiting the validation of the miRNA signatures that were analyzed. In addition, different methodologies (for collection, storage, analysis) have a particularly strong influence on the results of small studies. This drawback also applies to small studies that use reverse transcriptase polymerase chain reaction. Alternatively, an extended number of cases are studied using few miRNA signatures mostly via reverse transcriptase polymerase chain reaction.16,17,22-24,26 For large studies that use reverse transcriptase polymerase chain reaction, a preselection bias from small studies can often be observed, resulting in a skewed miRNA signature. These problems cannot be overcome by meta-analyses that combine the results of studies that differ methodologically in terms of collection, storage, and analysis protocols.
The present study assessed the use of miRNA signatures in whole blood samples for diagnosis of lung cancer in symptomatic patients.
This study was approved by the local ethics committee, Ärztekammer des Saarlandes, in Saarbrücken, Germany. All participants provided verbal informed consent to participate in a study analyzing blood-borne RNA signatures. The study and the analysis of the data followed the Standards for Reporting of Diagnostic Accuracy (STARD) reporting guideline.
We assessed the diagnostic value of using blood-borne miRNAs to diagnose lung cancer in 3 different scenarios. First, patients with lung cancer were identified in a group encompassing lung cancer, other lung diseases, other diseases not affecting the lungs, and unaffected control participants. Second, patients with lung cancer were compared with patients with nontumor lung diseases. Third, patients with lung cancer (stage I or II lung tumors) were compared with patients with other lung diseases, patients with other diseases not affecting lungs, and unaffected control participants.
This retrospective, multicenter cohort study included 3046 blood samples obtained from 4 different groups: (1) patients with lung cancer, (2) patients with nontumor lung diseases, (3) patients with diseases not affecting the lungs, and (4) unaffected control participants. The patients with lung cancer included those with non–small cell lung cancer (NSCLC) and small cell lung cancer (SCLC). Patients with other lung diseases were mostly diagnosed with COPD. The samples from patients with COPD were provided by the COSYCONET cohort study. Patients in the group with diseases not affecting the lungs had multiple sclerosis, Parkinson disease, breast cancer, endometriosis, or various heart diseases; were undergoing abdominal surgery; or were developing sepsis. Physicians diagnosed the diseases according to the standard diagnostic procedure established at the hospital at the time of diagnosis. Samples from unaffected control participants were provided by the TREND cohort study. This heterogeneous background set was selected to discover miRNA markers that are not associated with the development of lung cancer but are general disease markers.10-13
The samples were collected between March 3, 2009, and March 19, 2018. Six years of follow-up data were available for patients from the TREND and COSYCONET cohort studies (336 and 538 patients, respectively), but only the data from the point of inclusion in the respective studies were used for the present study. For the other 2172 patients who were not included in the TREND and COSYCONET studies, samples were collected at 11 different sites (eTable 2 in the Supplement), but no follow-up data were collected. An overview of the patient groups, group sizes, and comparisons is presented in eFigure 1 in the Supplement. Table 1 and Table 2 provide the age, sex, and smoking history of the analyzed groups. Patient characteristics, including the diagnosis and assignment to 1 of the 4 groups for each sample are described in eTable 1 in the Supplement. The distribution of the samples per site is aggregated in eTable 2 in the Supplement. A total of 3102 patients were enrolled in the study. For 1.8% of these patients (56), no definitive clinical diagnosis was available; therefore, they were excluded. The collected data were analyzed between April 2018 and November 2019.
RNA was isolated as described previously.11,25 In brief, RNA from 3046 whole blood samples in PAXgene Tubes (BD Biosciences) was isolated using the PAXgene blood miRNA Kit (QIAGEN). All isolations were performed either manually or semiautomatically using the QIAcube robot (QIAGEN). The individuals who extracted blood samples (N.L., C.D., and S.D.) were familiar with the manufacturer’s instructions of the PAXgene blood tubes. RNA was quantified using a NanoDrop spectrophotometer (Thermo Fisher Scientific) and RNA integrity was checked on a bioanalyzer using the RNA Nano Kit (Agilent Technologies). The mean (SD) RNA integrity value was 7.9 (0.93).
The miRNA expression profile of mature human miRNAs was assessed as described previously using human miRNA microarrays and the miRNA Complete Labeling and Hyb Kit (Agilent Technologies).25,27,28 Labeled RNA was hybridized to the array slides for 20 hours at 55 °C with 20-rpm rotation. Arrays were washed twice, air dried, and scanned in the microarray scanner with a 3-μm resolution in double-path mode. Raw data were extracted using Agilent Feature Extraction software. The data are freely available under ArrayExpress accession E-MTAB-8026.29 All analyses were performed in 2 labs at Hummingbird Diagnostics in Heidelberg and at Saarland University. The larger number of samples (2219 of 3046) was measured in Heidelberg, and the remaining 827 samples were measured at Saarland University.
The miRNA expression intensities were background subtracted and the median intensity of replicates was computed. The resulting matrix was filtered for expressed miRNAs and normalized using quantile normalization (eAppendix in the Supplement). Analyses were performed on 1183 expressed miRNAs. As the hypothesis test, an unpaired 2-tailed t test was applied and Benjamini-Hochberg multiple testing correction was performed. In addition to the complete cohort, which is largely matched and contains patients with similar age, sex, and smoking status distribution, we performed perfect patient matching according to sex, smoking status (past, present, and never), and age, thereby yielding equally sized cohorts. Sex and smoking status were matched perfectly, and for age, a 1-year difference was allowed (eAppendix in the Supplement). To assess and compare the diagnostic performance of individual miRNAs and miRNA signatures, the area under the receiver operating characteristics curve (AUROC) was computed. The general distribution of miRNAs was investigated using analysis of variance (ANOVA). This analysis allowed identification of markers that were specifically dysregulated in only 1 or in several groups. Statistical analysis was performed using R, version 3.3 (R Foundation).
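The AUROC used to rank individual miRNAs can be read as the probability that a randomly chosen case has a higher value than a randomly chosen non-case. The following is a minimal sketch of that pairwise definition in plain Python, not the R code used in the study; the function name and the example values are hypothetical.

```python
def auroc(case_values, noncase_values):
    """AUROC as the fraction of (case, non-case) pairs in which the case
    has the higher value; ties count as half a win."""
    wins = 0.0
    for c in case_values:
        for n in noncase_values:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(case_values) * len(noncase_values))


# Hypothetical log2 expression values for one marker (illustration only).
cases = [7.1, 6.8, 7.4, 6.2]
noncases = [5.0, 6.5, 5.4, 6.9]
print(auroc(cases, noncases))  # 13 of 16 pairs favor the case group -> 0.8125
```

For a marker that is lower in cases, this value falls below 0.5; the same discriminative ability is then usually reported as 1 minus the computed value.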
Unless stated otherwise, the significance threshold was .05 and tests were 2-sided. Positive predictive values were calculated for all scenarios according to the formula (sensitivity × prevalence)/(sensitivity × prevalence + (1 − specificity) × (1 − prevalence)). If not specified otherwise, prevalence was defined as the fraction of individuals in the respective scenario. Reproducibility of measurements was assessed by computing the average Pearson correlation between repeated measurements of the same individual. We also corrected for technical batch effects as follows: (1) study or sampling site, (2) the site where the sample was profiled, and (3) the biochip identification number, using the limma package.30
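The dependence of the predictive values on prevalence can be sketched as follows. The PPV expression is the standard Bayes form of the formula stated in the text; the NPV expression is its standard counterpart and is an assumption here, because the text does not spell it out. Function names are hypothetical, and the sensitivity and specificity passed in are the scenario 1 validation values reported in this article.

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value: TP rate / (TP rate + FP rate)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


def npv(sensitivity, specificity, prevalence):
    """Negative predictive value: TN rate / (TN rate + FN rate)."""
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)


# Scenario 1 validation metrics from the text; prevalence values span the
# range discussed in the article (50% down to the NLST rate of 0.645%).
for prevalence in (0.50, 0.05, 0.00645):
    print(prevalence,
          round(ppv(0.828, 0.935, prevalence), 3),
          round(npv(0.828, 0.935, prevalence), 3))
```

With fixed sensitivity and specificity, lowering the prevalence lowers the PPV and raises the NPV, which is why the predictive values need to be confirmed in a cohort whose prevalence matches the intended use.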
To identify the groups with highest or lowest median expression we focused on the 100 most statistically significant miRNAs with an adjusted P < .05 obtained from the ANOVA.
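The adjusted P values used for this selection come from the Benjamini-Hochberg correction named in the statistical methods. A minimal plain-Python sketch of that step-up adjustment is shown below; it is an illustration rather than the R routine used in the study, and the function name and input values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending P
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.005]        # hypothetical raw P values
adj = benjamini_hochberg(raw)
significant = [i for i, p in enumerate(adj) if p < 0.05]
print(adj, significant)
```

Ranking markers by adjusted P value and keeping the top entries below the threshold corresponds to the selection of the most statistically significant miRNAs described above.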
The machine learning approach that we used is based on gradient boosted trees with a filter-based feature selection that relies on the feature importance reported by LightGBM31 (eAppendix in the Supplement). Two analysis approaches were tested and compared to evaluate predictive performance: models were trained on a training set (50% of data) and evaluated on a validation set (remaining 50% of the data), with equal distribution of individuals to the disease groups. The CIs were estimated by performing 10 times repeated 2-fold cross-validation. To further test stability, resampling was carried out. With a fixed seed used for the random number generator, 5 times repeated 5-fold cross-validation was performed. To test for potential overtraining, nonparametric permutation tests were performed. Machine learning analysis was performed using Python (Python Software Foundation) with the LightGBM 2.1.0,31 scikit-learn 0.19.1, and scipy 1.1 packages.
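The resampling scheme described above (repeated k-fold cross-validation with equal distribution of disease groups and a fixed random seed) can be sketched in plain Python as follows. This is an illustration under assumptions: the article states that scikit-learn was used but not which splitting utility, so the hypothetical generator below only mimics the idea; scikit-learn provides `RepeatedStratifiedKFold` for the same purpose.

```python
import random


def repeated_stratified_kfold(labels, n_splits, n_repeats, seed=0):
    """Yield (train_indices, test_indices) for repeated stratified k-fold CV."""
    rng = random.Random(seed)            # fixed seed makes the splits reproducible
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    for _ in range(n_repeats):
        folds = [[] for _ in range(n_splits)]
        # Deal each class round-robin so every fold keeps the class proportions.
        for indices in by_class.values():
            shuffled = indices[:]
            rng.shuffle(shuffled)
            for pos, idx in enumerate(shuffled):
                folds[pos % n_splits].append(idx)
        for k in range(n_splits):
            test = sorted(folds[k])
            train = sorted(i for j, fold in enumerate(folds) if j != k for i in fold)
            yield train, test


# Hypothetical toy labels; 10 repeats of 2-fold CV give 20 train/test splits.
labels = ["LCa"] * 4 + ["Con"] * 6
splits = list(repeated_stratified_kfold(labels, n_splits=2, n_repeats=10, seed=42))
print(len(splits))
```

Fitting a model on each training split and scoring it on the matching test split yields a distribution of performance values from which a mean and CI can be estimated.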
In this study, 3102 individuals were enrolled; 56 individuals were excluded because of missing clinical diagnoses. Clinical diagnoses were obtained for 3046 participants (mean [SD] age, 61.1 [16.3] years) who were included in the analysis; data on sex were available for 2856 individuals, with 1727 (60.5%) being men. Of 3046 participants, 606 had non–small cell or small cell lung cancer, 593 had nontumor lung diseases, 883 had diseases not affecting the lung, and 964 were unaffected control participants. Additional demographic data are provided in Table 1, Table 2, and eFigure 1 in the Supplement. Although the cohorts were matched for sex, similar age, and similar smoking history, referred to as largely matched cases, all analyses were also carried out for subcohorts, with perfect matching for age, sex, and smoking history (Table 3).
First, ANOVA was performed for miRNA expression in the 4 groups. The 5 most statistically significant markers, hsa-miR-17-3p, hsa-miR-21-3p, hsa-miR-193a-3p, hsa-miR-18b-5p, and hsa-miR-18a-5p, had adjusted P values less than 10^-150 after the Benjamini-Hochberg adjustment. All 5 miRNAs showed the highest mean (SD) expression in patients with nontumor lung diseases vs those with lung cancer (hsa-miR-17-3p, 161.36 [92.99] vs 51.21 [55.45]; hsa-miR-21-3p, 36.32 [21.77] vs 15.72 [11.14]; hsa-miR-193a-3p, 14.00 [4.21] vs 10.50 [3.56]; hsa-miR-18b-5p, 98.21 [65.66] vs 35.46 [37.11]; and hsa-miR-18a-5p, 202.94 [144.72] vs 65.21 [73.70]), and 3 of the 5 miRNAs showed the lowest mean (SD) expression in cases of lung cancer.
The minimal median expression value among the 100 most statistically significant miRNAs was observed for 47 miRNAs in the lung cancer group (maximal median expression, 41952.030; minimal median expression, 7.766; SD, 6268.537), 25 miRNAs in nontumor lung diseases (maximal median expression, 2675.910; minimal median expression, 7.080; SD, 809.671), 15 miRNAs in diseases not affecting the lung (maximal median expression, 984.194; minimal median expression, 8.008; SD, 258.121), and 13 miRNAs in unaffected control participants (maximal median expression, 1836.074; minimal median expression, 7.418; SD, 647.110). These findings suggest a statistically significantly lower expression of miRNAs in lung cancer and a higher expression in nontumor lung diseases (P = .01 assessed using the Fisher exact test). Raw and adjusted P values of the ANOVA are given in eTable 3 in the Supplement, and the Figure shows an upregulation of hsa-miR-17-3p in patients with nontumor lung diseases, but not in patients with lung cancer as well as a downregulation of hsa-miR-140-5p in patients with lung cancer only and a downregulation of hsa-miR-628-3p and hsa-miR-374c-5p in those with lung diseases in general.
First, blood samples from symptomatic patients diagnosed with lung cancer (n = 606) were compared with those from all other individuals (n = 2440). From the training set, a signature of 15 miRNAs was identified. When applied to the validation set, the signature distinguished blood samples from patients diagnosed with lung cancer from all other individuals with an accuracy of 91.4% (95% CI, 91.0%-91.9%), a sensitivity of 82.8% (95% CI, 81.5%-84.1%), and a specificity of 93.5% (95% CI, 93.2%-93.8%).
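The three reported metrics follow directly from the confusion matrix of the validation set. The sketch below shows the arithmetic in plain Python; the function name is hypothetical and the counts in the example are invented for illustration, not taken from the study.

```python
def classification_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity, and specificity from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # fraction of cases correctly flagged
    specificity = tn / (tn + fp)          # fraction of non-cases correctly cleared
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return accuracy, sensitivity, specificity


# Invented counts: 10 cases (8 detected) and 20 non-cases (18 cleared).
accuracy, sensitivity, specificity = classification_metrics(tp=8, fn=2, tn=18, fp=2)
print(accuracy, sensitivity, specificity)
```

Because non-cases outnumber cases roughly 4 to 1 in this scenario, accuracy is weighted toward specificity, which is consistent with the reported accuracy lying closer to the specificity than to the sensitivity.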
Next, the training and validation cohorts were jointly evaluated. Overall, hsa-miR-660-5p had the highest AUROC (0.745; 95% CI, 0.723-0.767; P = 1.27 × 10^-55). Raw and adjusted P values as well as the AUROC values for all miRNAs are given in eTable 4 in the Supplement. Resampling of the full cohort confirmed these results by predicting the validation set: the 5 times repeated 5-fold cross-validation highlighted a mean AUROC value of 0.965 (95% CI, 0.962-0.967). The receiver operating characteristic curve is shown in eFigure 2 in the Supplement. For the perfect matched scenario (matched age, sex, smoking status, and cohort size), 11 miRNAs were required (eTable 5 in the Supplement). The results yielded a similar mean AUROC value of 0.944 (95% CI, 0.938-0.950). In comparison, the mean permutation test AUROC was 0.499 (95% CI, 0.497-0.500) in the largely matched and in the perfectly matched comparison.
Blood samples from patients with lung cancer (n = 606) were compared with those from patients with a nontumor lung disease (n = 593). Although both cohorts had nearly the same mean (SD) age (65.4 [9.1] years for lung cancer vs 65.8 [10.1] years for patients with nontumor lung disease), the cohort of patients with lung cancer had a lower mean (SD) number of pack-years than patients with nontumor lung disease (42.3 [23.9] vs 49.3 [36.6], respectively).
Training our models on the training set yielded a 14-miRNA signature. When applied to the validation set, the model distinguished patients with lung cancer from those with a nontumor lung disease with an accuracy of 92.5% (95% CI, 92.1%-92.9%), a sensitivity of 96.4% (95% CI, 95.9%-96.9%), and a specificity of 88.6% (95% CI, 88.1%-89.2%).
The joint analysis of the training set and the validation set revealed that hsa-miR-17-3p distinguished patients with lung cancer (mean [SD] on log2 scale, 5.151 [1.215] [n = 606]) from those with nontumor lung diseases (mean [SD] on log2 scale, 7.084 [0.918] [n = 593]) with the highest significance (P = 6.94 × 10^-151) and an AUROC value of 0.899 (95% CI, 0.881-0.917) for this comparison. In addition to hsa-miR-17-3p, several other miRNAs had high AUROC values for this comparison (eTable 6 in the Supplement). The miRNA signatures from the 5 times repeated 5-fold cross-validation yielded a mean AUROC value of 0.977 (95% CI, 0.975-0.978). The same comparison in the perfectly matched cohort yielded a mean AUROC value of 0.974 (95% CI, 0.971-0.977) (eTable 7 in the Supplement).
Blood samples from patients with early-stage lung cancer, including Union for International Cancer Control stages I and II (n = 194), were compared with those from all patients without lung cancer, including patients with nontumor lung diseases (n = 593), patients with diseases other than lung cancer (n = 883), and control participants (n = 964). On the training set, a signature of 14 miRNAs was established. When applied to the validation set, the signature identified patients with early-stage lung cancer with an accuracy of 95.9% (95% CI, 95.7%-96.2%), a sensitivity of 76.3% (95% CI, 74.5%-78.0%), and a specificity of 97.5% (95% CI, 97.2%-97.7%).
The joint analysis of the samples revealed that hsa-miR-374b-5p distinguished patients with early-stage lung cancer (stage I or II) (mean [SD] on log2 scale, 6.993 [1.726]) from those without lung cancer (mean [SD] on log2 scale, 8.946 [1.280]) with high significance (P = 4.17 × 10^-34) and had an AUROC of 0.830 (95% CI, 0.800-0.859) (eTable 8 in the Supplement). The miRNA signatures for the 5 times repeated 5-fold cross-validation yielded a 9-miRNA signature with a mean AUROC of 0.960 (95% CI, 0.954-0.965) (eTable 9 in the Supplement). The same computation for the perfectly matched cohort identified a signature including 5 miRNAs with a mean AUROC value of 0.936 (95% CI, 0.929-0.944).
In the scenarios described herein, we found similar results between the largely and the perfectly matched subcohorts, indicating an association between the miRNA signatures and age, sex, and smoking status. To identify how different inclusion criteria for age and smoking status change the test performance, the AUROC values of the signatures were assessed separately for different age groups and different groups of tobacco use. We defined 6 age-related thresholds (<55, 55-59, 60-64, 65-69, 70-75, and >75 years) and 4 thresholds for the numbers of pack-years (individuals having a smoking history of at least 30, 40, 50, or 60 pack-years). The highest AUROC value (0.977; 95% CI, 0.971-0.984) was computed for smokers with a history of more than 50 pack-years without any age restriction. The patient group of particular interest (based on NLST criteria), that is patients who are older than 55 years with a smoking history of at least 30 pack-years, had a high AUROC value of 0.974 (95% CI, 0.971-0.976) (eFigure 3 and eTable 10 in the Supplement). To further assess the influence of confounding variables, models were built using the resampling approach combining the miRNA expression and the confounding variables age, sex, and smoking status. All models yielded slightly higher AUROC values (compared with the models without confounders) for the 3 scenarios presented above and for nonmatched and matched cohorts (scenario 1: 0.972 [95% CI, 0.970-0.974] vs 0.952 [95% CI, 0.949-0.955]; scenario 2: 0.977 [95% CI, 0.975-0.978] vs 0.979 [95% CI, 0.977-0.982]; scenario 3: 0.964 [95% CI, 0.959-0.969] vs 0.953 [95% CI, 0.948-0.957]).
The positive predictive value (PPV) and negative predictive value (NPV) were computed for different prevalence rates for symptomatic lung cancer, ranging from 0.645% (for the group in the NLST3) to 50% (eTable 11 in the Supplement). For the resampling approach, the NPV approaches 100% with decreasing prevalence while the PPV declines. For example, scenario 3 has a PPV of 97.0% (95% CI, 96.6%-97.4%) and an NPV of 83% (95% CI, 81.6%-84.5%) for a prevalence of 50% and a PPV of 62.8% (95% CI, 59.7%-66.5%) and an NPV of 98.9% (95% CI, 98.8%-99.0%) for a prevalence of 5%. The results based on the split in training and validation data sets revealed comparable results (eTable 11 in the Supplement).
Although the goal of this study was to identify general biomarkers for lung cancer, NSCLC and SCLC are biologically so different that we performed a classification of NSCLC vs SCLC cases. The single miRNA best able to distinguish between NSCLC and SCLC was hsa-miR-30a-5p (mean [SD] on log2 scale, 6.696 [1.317] for NSCLC [n = 405] vs 5.125 [0.846] for SCLC [n = 157]) (adjusted P = 1.02 × 10^-45), which had an AUROC of 0.844 (95% CI, 0.809-0.878). A 9-miRNA signature had an accuracy of 84.6% (95% CI, 83.7%-85.6%), a sensitivity of 90.1% (95% CI, 88.7%-91.4%), and a specificity of 70.6% (95% CI, 68.3%-72.9%) with the resampling approach (AUROC, 0.882; 95% CI, 0.870-0.893). We found a substantial overlap in the miRNAs present among the 3 diagnostic scenarios presented above, but none of the 9 miRNAs from the NSCLC vs SCLC comparison overlapped with any of the miRNAs from the 3 scenarios (Table 4). The results of the comparison are given in eTable 12 in the Supplement and the miRNA signature is provided in eTable 13 in the Supplement.
To identify potential batch effects and to test the reproducibility across time, we measured 1 individual 41 times, uniformly distributed over more than 1000 consecutive measurements. An average Pearson correlation of 0.995 (95% CI, 0.9948-0.9952) was calculated, demonstrating the stability of this approach (eFigure 4 in the Supplement). We observed a moderate decrease in the diagnostic performance, with resampling AUROC values for patients with lung cancer vs all other individuals, those with lung cancer vs nontumor lung disease, and those with early-stage lung cancer vs non-lung cancer of 0.917 (95% CI, 0.912-0.922), 0.908 (95% CI, 0.904-0.912), and 0.886 (95% CI, 0.880-0.892), respectively.
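The reproducibility figure is an average Pearson correlation across repeated measurements of one individual. A minimal plain-Python sketch is given below; averaging over all pairs of repeated profiles is an assumption, because the text does not state how the replicates were paired, and the function names and toy profiles are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def mean_pairwise_pearson(profiles):
    """Average Pearson correlation over all pairs of repeated profiles."""
    pairs = list(combinations(profiles, 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)


# Toy expression profiles for three repeated measurements (illustration only).
replicates = [
    [5.1, 7.0, 9.2, 6.4],
    [5.0, 7.1, 9.0, 6.5],
    [5.2, 6.9, 9.3, 6.3],
]
print(mean_pairwise_pearson(replicates))
```

Values close to 1 indicate that repeated profiles of the same sample are nearly identical up to scale and offset.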
This study on 3046 blood samples (from patients with lung cancer, nontumor lung diseases, diseases not affecting the lung, and unaffected controls) presents a comprehensive analysis of blood-borne miRNA changes in lung cancer. We identified miRNA signatures that have potential for (1) the differentiation of patients with lung cancer vs patients with nontumor lung diseases, those with diseases not affecting the lungs, and unaffected control participants; (2) the differentiation of patients with lung cancer vs patients with nontumor lung diseases; and (3) the differentiation of patients with early-stage lung cancer (Union for International Cancer Control stages I and II) vs patients with diseases other than lung cancer, and unaffected control participants. Although the present study focuses on symptomatic patients with lung cancer and the results cannot be directly translated to screening tests (eg, the low-dose CT screening established in the United States), screening represents a future application. However, this application requires studies with large cohorts to confirm miRNA signatures that yield high PPV and NPV in cohorts with low prevalence rates (<1%).
A significant overlap was observed in the signatures selected using the resampling approach (Table 4); we expected signatures used in resampling to be more stable than those selected by the models trained on only 50% of the data, which show only marginal overlap in signatures (eTable 14 in the Supplement). The signatures identified in scenarios 1 and 2 share 3 miRNAs; similarly, scenarios 2 and 3 had 3 miRNAs in common, and 6 common miRNAs were observed in scenarios 1 and 3. Several of the miRNAs that were found with altered abundances in at least 1 of the 3 scenarios have previously been associated with lung cancer, including hsa-miR-205-5p, hsa-miR-564, hsa-miR-1260b, and hsa-miR-1285-3p.19,32-34 Specific miRNAs have been previously found to have functional relevance for lung cancer (eg, hsa-miR-660-5p35) and other cancers (eg, miR-120236). However, one must be cautious in comparing studies that used different specimens, including whole blood or serum, different miRNA identifiers, and different approaches. eTable 15 in the Supplement provides an overview of selected studies, including a previous study with a much smaller cohort size,17 that shows that many of the miRNAs in the present study were relevant in previous studies.
Several challenges need to be overcome toward a clinical application of miRNA signatures for detection of lung cancer. Prospective studies with large cohorts of patients with specific diseases, a format that is readily applicable in clinical in vitro diagnostic tests (RT-qPCR or enzyme-linked immunosorbent assay [ELISA]37,38), and an evaluation of the extent to which the miRNA signatures may complement imaging, sputum cytology, or biopsy are needed for clinical application of miRNA for diagnosis of lung cancer. The general limitations of retrospective studies warrant a final prospective validation for such analyses.
Alternative strategies measure molecules in exhaled breath condensate toward an early diagnosis of lung cancer.39,40 Bronchial genomic classifiers are commercially available, but they require invasive bronchoscopy for obtaining examination material. In addition, bronchial genomic classifiers yield high sensitivity but lack specificity (as shown in AEGIS trials).41,42 Besides miRNAs, there are also other promising blood-borne biomarkers, such as Zyxin or marker panels of serum proteins, all with their own advantages.43,44
This study has some limitations. One limitation is its retrospective design. The resulting patient distribution is different from a clinical application scenario. Although we estimated the influence of the prevalence on the expected PPVs and NPVs, these values need to be confirmed with a prospective validation study. Another limitation is the format used to measure the miRNA expression values. Microarrays are not readily applicable in clinical in vitro diagnostic tests and thus other formats need to be tested as discussed in the Clinical Application section.
We believe this study presents a standardized approach that could be used to identify symptomatic patients with lung cancer based on blood-borne miRNA signatures. Of note, patients with early-stage cancer may be distinguished from other patients, including those with nontumor lung disease and diseases other than cancer. The circulating biomarker test may complement imaging, sputum cytology, and biopsy tests in the future, after being validated by prospective studies.
Accepted for Publication: December 14, 2019.
Corresponding Author: Andreas Keller, PhD, Chair for Clinical Bioinformatics, Medical Faculty, Saarland University, Building E 2.1, Saarbrücken 66123, Germany (firstname.lastname@example.org).
Published Online: March 5, 2020. doi:10.1001/jamaoncol.2020.0001
Author Contributions: Messrs Fehlmann and Kahraman contributed equally as co–first authors. Mr Kahraman and Mr Fehlmann had full access to all of the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis.
Concept and design: V. Keller, Vogelmeier, Meyer, Metzger, Abdul-Khaliq, Volk, Bals, Meese, A. Keller.
Acquisition, analysis, or interpretation of data: Fehlmann, Kahraman, Ludwig, Backes, Galata, Geffers, Mercaldo, Hornung, Weis, Kayvanpour, Abu-Halima, Deuschle, Schulte, Suenkel, von Thaler, Maetzler, Herr, Fähndrich, Guimaraes, Hecksteden, Meyer, Metzger, Diener, Deutscher, Stehle, Haeusler, Meiser, Groesdonk, Volk, Lenhof, Katus, Balling, Meder, Krueger, Huwer, Bals, A. Keller.
Drafting of the manuscript: Fehlmann, Kahraman, Galata, V. Keller, Mercaldo, Abu-Halima, Hecksteden, Metzger, Abdul-Khaliq, Stehle, Meese, A. Keller.
Critical revision of the manuscript for important intellectual content: Fehlmann, Ludwig, Backes, Geffers, Hornung, Weis, Kayvanpour, Deuschle, Schulte, Suenkel, von Thaler, Maetzler, Herr, Fähndrich, Vogelmeier, Guimaraes, Hecksteden, Meyer, Metzger, Diener, Deutscher, Haeusler, Meiser, Groesdonk, Volk, Lenhof, Katus, Balling, Meder, Krueger, Huwer, Bals, Meese, A. Keller.
Statistical analysis: Fehlmann, Kahraman, Mercaldo, Bals, A. Keller.
Obtained funding: Vogelmeier, Balling, Krueger, Bals, A. Keller.
Administrative, technical, or material support: Ludwig, Backes, Geffers, Hornung, Deuschle, Schulte, Suenkel, von Thaler, Maetzler, Herr, Hecksteden, Meyer, Metzger, Diener, Deutscher, Stehle, Groesdonk, Katus, Balling, Meder, Huwer, Bals, Meese, A. Keller.
Supervision: V. Keller, Fähndrich, Meyer, Metzger, Groesdonk, Volk, Lenhof, Meese, A. Keller.
Conflict of Interest Disclosures: Mr Kahraman reported receiving personal fees from Hummingbird Diagnostics (HBDx) during the conduct of the study outside the submitted work. Dr Backes reported receiving personal fees from HBDx during the conduct of the study. Dr V. Keller reported receiving personal fees from HBDx during the conduct of the study. Dr Maetzler reported receiving grants from Neuroallianz during the conduct of the study; grants from the European Union, Janssen, and the Michael J Fox Foundation; grants and personal fees from Lundbeck; and personal fees from Abbvie outside the submitted work. Dr Fähndrich reported receiving nonfinancial support and fees for lectures from Grifols, CSL Behring, AstraZeneca, Novartis, and BerlinChemie during the conduct of the study. Dr Vogelmeier reported receiving grants from the German Ministry of Research (BMBF), AstraZeneca, GlaxoSmithKline, Grifols, Novartis, Bayer-Schering, Merck Sharp & Dohme, and Pfizer during the conduct of the study; and personal fees from AstraZeneca, CSL Behring, Chiesi, GlaxoSmithKline, Grifols, Menarini, Mundipharma, and Novartis outside the submitted work. Dr Stehle reported receiving personal fees from Institut für medizinische Dokumentation, Gutachtenerstellung, Gesundheitsförderung und Qualitätssicherung, Roche, Novartis, MSD, and Boehringer Ingelheim outside the submitted work. Dr Meiser reported receiving personal fees from Pall Medical, Dahlhausen, Medtronics, and Sedana Medical outside the submitted work. Dr Katus reported receiving personal fees from Daiichi, AstraZeneca, and Bayer Vital outside the submitted work. Dr Balling reported being cofounder and holding shares in MEGENO, SARL, Information Technology for Translational Medicine SARL, and Theracule, SARL outside the submitted work.
Dr Meder reported receiving grants from University of Heidelberg, the BMBF, German Centre for Cardiovascular Research, and Else Kröner Fresenius Stiftung (Excellence Fellowship) during the conduct of the study. Dr Krueger reported receiving grants from Fonds National de Recherche during the conduct of the study. Dr Bals reported receiving personal fees from AstraZeneca; grants from Boehringer Ingelheim, BMBF, Competence Network Asthma and COPD (ASCONET), and Schwiete Stiftung; and grants and personal fees from GlaxoSmithKline, Novartis, and CSL Behring outside the submitted work. Dr Meese reported receiving grants from German Cancer Aid during the conduct of the study. Dr A. Keller reported receiving grants and personal fees from HBDx during the conduct of the study and had patent US18445209P issued and licensed. No other disclosures were reported.
Funding/Support: The study was partially funded by HBDx, the BestAgeing grant (EU FP7), the Michael J Fox Foundation, and the Deutsche Krebshilfe. Dr Krueger was supported by grants from the Luxembourg National Research Fund (FNR) within the PEARL program (FNR; FNR/P13/6682797 to RK) and the National Centre for Excellence in Research on Parkinson’s disease (NCER-PD; FNR/NCER13/BM/11264123), and by the European Union’s Horizon 2020 research and innovation program under grant agreement No 692320 (WIDESPREAD; CENTRE-PD to Krueger).
Role of the Funder/Sponsor: Hummingbird Diagnostics (HBDx) played no part in the design or conduct of the study. HBDx played a part in the collection of the data (HBDx contributed to RNA extraction, quality control of RNA, and microarray measurement). HBDx played no part in the analysis or interpretation of the data or in the preparation or review of the manuscript. HBDx approved the final manuscript but did not play a role in the decision to submit the manuscript for publication.
Additional Contributions: Anna Anbarcilar, Diplom-Ing; Hannah Schroers; Cassandra Zabler; Christopher Osterhaus; Simon Stowasser; and Dennis Noetzel, BSc, helped perform standard laboratory testing and were compensated for their contributions. Edanz Group (www.edanzediting.com/ac) provided editing assistance for a draft of this manuscript and was financially compensated from the internal funds of Saarland University.