Key Points
Question
In patients with cancer, is the detection of pathogenic germline genetic variation improved by incorporation of automated deep learning technology compared with standard methods?
Findings
In this cross-sectional analysis of 2 retrospectively collected convenience cohorts of patients with prostate cancer and melanoma, more pathogenic variants in 118 cancer-predisposition genes were found using deep learning technology compared with a standard genetic analysis method (198 vs 182 variants identified in 1072 patients with prostate cancer; 93 vs 74 variants identified in 1295 patients with melanoma).
Meaning
The number of cancer-predisposing pathogenic variants identified in patients with prostate cancer and melanoma depends partially on the automated approach used to analyze sequence data, but further research is needed to understand possible implications for clinical management and patient outcomes.
Importance
Less than 10% of patients with cancer have detectable pathogenic germline alterations, which may be partially due to incomplete pathogenic variant detection.
Objective
To evaluate if deep learning approaches identify more germline pathogenic variants in patients with cancer.
Design, Setting, and Participants
A cross-sectional study comparing a standard germline variant detection method with a deep learning method in 2 convenience cohorts of patients with prostate cancer and melanoma enrolled in the US and Europe between 2010 and 2017. The final date of clinical data collection was December 2017.
Exposures
Germline variant detection using standard or deep learning methods.
Main Outcomes and Measures
The primary outcomes included pathogenic variant detection performance in 118 cancer-predisposition genes estimated as sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV). The secondary outcomes were pathogenic variant detection performance in 59 genes deemed actionable by the American College of Medical Genetics and Genomics (ACMG) and 5197 clinically relevant mendelian genes. True sensitivity and true specificity could not be calculated due to lack of a criterion reference standard, but were estimated as the proportion of true-positive variants and true-negative variants, respectively, identified by each method in a reference variant set that consisted of all variants judged to be valid from either approach.
Results
The prostate cancer cohort included 1072 men (mean [SD] age at diagnosis, 63.7 [7.9] years; 857 [79.9%] with European ancestry) and the melanoma cohort included 1295 patients (mean [SD] age at diagnosis, 59.8 [15.6] years; 488 [37.7%] women; 1060 [81.9%] with European ancestry). The deep learning method identified more patients with pathogenic variants in cancer-predisposition genes than the standard method (prostate cancer: 198 vs 182; melanoma: 93 vs 74), with the following performance for the deep learning method vs the standard method: sensitivity (prostate cancer: 94.7% vs 87.1% [difference, 7.6%; 95% CI, 2.2% to 13.1%]; melanoma: 74.4% vs 59.2% [difference, 15.2%; 95% CI, 3.7% to 26.7%]), specificity (prostate cancer: 64.0% vs 36.0% [difference, 28.0%; 95% CI, 1.4% to 54.6%]; melanoma: 63.4% vs 36.6% [difference, 26.8%; 95% CI, 17.6% to 35.9%]), PPV (prostate cancer: 95.7% vs 91.9% [difference, 3.8%; 95% CI, –1.0% to 8.4%]; melanoma: 54.4% vs 35.4% [difference, 19.0%; 95% CI, 9.1% to 28.9%]), and NPV (prostate cancer: 59.3% vs 25.0% [difference, 34.3%; 95% CI, 10.9% to 57.6%]; melanoma: 80.8% vs 60.5% [difference, 20.3%; 95% CI, 10.0% to 30.7%]). For the ACMG genes, the sensitivity of the 2 methods was not significantly different in the prostate cancer cohort (94.9% vs 90.6% [difference, 4.3%; 95% CI, –2.3% to 10.9%]), but the deep learning method had a higher sensitivity in the melanoma cohort (71.6% vs 53.7% [difference, 17.9%; 95% CI, 1.82% to 34.0%]). The deep learning method also had higher sensitivity in the mendelian genes (prostate cancer: 99.7% vs 95.1% [difference, 4.6%; 95% CI, 3.0% to 6.3%]; melanoma: 91.7% vs 86.2% [difference, 5.5%; 95% CI, 2.2% to 8.8%]).
Conclusions and Relevance
Among a convenience sample of 2 independent cohorts of patients with prostate cancer and melanoma, germline genetic testing using deep learning, compared with the current standard genetic testing method, was associated with higher sensitivity and specificity for detection of pathogenic variants. Further research is needed to understand the relevance of these findings with regard to clinical outcomes.
Germline genetic testing is increasingly used to identify a class of inherited genetic changes called pathogenic variants, which are associated with an increased risk of developing cancer and other diseases. The detection of pathogenic variants associated with cancer can identify patients and families with inherited cancer susceptibility in whom established gene-specific screening recommendations can be implemented.1,2 Furthermore, germline genetic testing of patients with cancer can identify pathogenic variant carriers who tend to have genetically determined greater response to chemotherapy and targeted anticancer agents.3,4 However, even when the clinical presentation is highly suggestive of a particular genetic cancer-predisposition syndrome, only a small fraction of patients are found to carry germline pathogenic variants,5-8 raising concern about the possibility of incomplete detection of known or expected pathogenic variants by current standard germline variant detection methods, which include the Genome Analysis Toolkit.9
Computational methods that use deep learning neural networks, which incorporate layers of networks to learn and analyze complex patterns in data, have demonstrated superior performance compared with standard methods for disease recognition,10 pathological and radiological image analysis,11 and natural language processing.12 Deep learning methods also have shown enhanced germline variant detection compared with standard methods in laboratory samples with known genetic variation.13 However, it is unknown whether using deep learning approaches can result in identifying additional patients with pathogenic variants missed by the current standard analysis framework. In this study, it was hypothesized that deep learning variant detection would have higher sensitivity compared with the standard method for identifying clinically relevant pathogenic variants when applied to clinical samples from patients with cancer.
Ethics Approval and Consent to Participate
Written informed consent from patients and institutional review board approval, allowing comprehensive genetic analysis of germline samples, were obtained by the original studies that enrolled patients in the US and Europe between 2010 and 2017. The final date of clinical data collection was December 2017 (eMethods in Supplement 1). The secondary genomic and deep learning analyses performed for this study were approved under Dana-Farber Cancer Institute institutional review board protocols 19-139 and 02-293. This study conforms to the Declaration of Helsinki.
Patient Cohorts and Genomic Data Collection
Publicly available germline whole-exome sequencing data of 2 independent sets of published cohorts, each of which comprised convenience samples, were included in this study (Figure 1). One cohort of patients with prostate cancer was described by Armenia et al,14 and a second cohort of patients with melanoma was obtained from 10 publicly available data sets (eMethods in Supplement 1). All germline whole-exome sequencing data were generated by the original studies using paired-end, short-read Illumina platforms (Illumina Inc). Patient cohorts were not selected for a positive family history of cancer or early-onset disease. Germline genetic data of all cohorts were available for analysis. This analysis was not intended to change the management of the study cohorts.
Germline Variant Detection Methods
The Genome Analysis Toolkit (version 3.7),15 the most widely used germline variant detection method,16-18 was considered the standard method in this analysis and the DeepVariant method (version 0.6.0) was used to perform deep learning variant detection13,19 (Figure 1 and eFigure 1 in Supplement 1). The standard and deep learning methods were run using the recommended parameters.20,21 Details of the standard and deep learning analysis frameworks (along with the corresponding computer program codes) appear in the eNotes in Supplement 1.
Selection of Mendelian Gene Sets
In this study, pathogenic variants in 118 established cancer-predisposition genes and 59 mendelian high-penetrance genes deemed clinically actionable by the American College of Medical Genetics and Genomics (ACMG; collectively called the ACMG gene set) were analyzed (eTable 1 in Supplement 2). Patients with cancer can also carry disease-causing variants in autosomal recessive and low-penetrance genes unrelated to cancer that lead to a gene product with decreased or no function (termed putative loss of function; pLOF). Thus, pLOF variants in 5197 clinically relevant genes in the Online Mendelian Inheritance in Man (OMIM) database (collectively called the OMIM gene set) (eTable 1 in Supplement 2) and 12 clinically oriented multigene panels (eMethods in Supplement 1 and eTable 2 in Supplement 3) also were characterized.
Germline Variant Pathogenicity Evaluation
The identified germline variants in the cancer-predisposition gene set and in the ACMG gene set were independently classified by 2 clinical geneticists (S.H.A. and L.W.) into the 5 categories of benign, likely benign, uncertain significance, likely pathogenic, and pathogenic using established ACMG guidelines.22 Only pathogenic and likely pathogenic variants were included in this study (hereafter collectively referred to as pathogenic variants).
Validation of Identified Germline Variants
Given the lack of a criterion reference standard to independently validate the results of the standard genetic analysis method and the deep learning method at every position of each gene in each participant in this study, an established manual validation framework was adopted.23 In this framework, variants in the cancer-predisposition and ACMG gene sets were manually evaluated in an independent blind fashion by 3 examiners (S.H.A., J.R.C., and A.T.-W.) using a genetic data visualization tool called the Integrative Genomics Viewer.24 Identified pathogenic variants were judged to be valid (ie, true-positive) if they were deemed present in the raw genomic data by at least 2 of the 3 examiners. Otherwise, the variant was judged to be false-positive (eTable 3 in Supplement 1). Germline pLOF variants in the OMIM gene set and in the multigene panel analyses were judged to be valid by examining the independently sequenced tumor samples from the same patient for the presence of these variants (eMethods in Supplement 1). Genomic regions in which no pathogenic variants were identified by either method were not manually validated for the absence of variants and were assumed to be devoid of pathogenic variation.
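The 2-of-3 examiner agreement rule can be sketched as follows. This is an illustrative sketch only; the variant labels and votes are hypothetical and are not the study's actual validation code or data.

```python
# Sketch of the 2-of-3 examiner agreement rule used to judge variants
# valid (true-positive) vs false-positive. Labels and votes are
# hypothetical placeholders, not study data.

def judge_variant(votes):
    """Return 'true-positive' if at least 2 of 3 examiners deemed the
    variant present in the raw genomic data, else 'false-positive'."""
    if len(votes) != 3:
        raise ValueError("expected exactly 3 examiner votes")
    return "true-positive" if sum(votes) >= 2 else "false-positive"

# Each tuple holds one True/False vote per examiner (IGV review)
calls = {
    "hypothetical_variant_A": (True, True, False),
    "hypothetical_variant_B": (False, True, False),
}
judged = {variant: judge_variant(votes) for variant, votes in calls.items()}
```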
Calculating the Performance of the 2 Methods
Because of the lack of an independently generated criterion reference standard variant set for the analyzed data, the set of variants generated by combining all variants judged to be valid from either the standard genetic analysis approach or the deep learning method was used as a reference to evaluate each method’s performance (hereafter referred to as the reference variant set). Consequently, true sensitivity and specificity could not be calculated, so the terms sensitivity and specificity were defined and calculated for method comparisons as follows. Sensitivity was defined as the proportion of true-positive variants (ie, judged to be valid) in the reference variant set that were identified by each method. Specificity was defined as the proportion of true-negative variants in the reference variant set that were determined by each method (eTable 3 and eMethods in Supplement 1).
In addition, given the rare nature of disease-causing pathogenic variants, and to enhance study power, the sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) for the standard method and the deep learning method were calculated using 3 predefined gene sets (cancer predisposition, ACMG, and OMIM). In this study, PPV and NPV solely referred to the probability that a participant with a variant identified by these methods had (or did not have) the molecular genetic variant, not the clinical disease phenotype. Detailed definitions of the performance metrics used in this study appear in eTable 3 in Supplement 1. In addition, the performance of using the standard method and the deep learning method in tandem (combined methods) was evaluated by assessing the number of pathogenic and pLOF variants identified by either the standard method or the deep learning method and judged to be valid.
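Under these definitions, each method's metrics reduce to arithmetic on four counts per method. In the sketch below (function and variable names are ours, not the study's code, and the mapping of the variant tallies onto true-negative counts reflects our reading of the eTable 3 definitions), the deep learning method's prostate cancer figures can be reproduced from the tallies reported in the Results: 171 shared valid variants, 27 and 11 method-exclusive valid variants, and 9 and 16 method-exclusive false-positive calls.

```python
# Performance metrics relative to the reference variant set; names are
# illustrative, not the study's actual code.

def performance(tp, fp, tn, fn):
    return {
        "sensitivity": tp / (tp + fn),  # valid variants found / all valid variants
        "specificity": tn / (tn + fp),  # false calls avoided / all false calls
        "ppv": tp / (tp + fp),          # identified variants that are valid
        "npv": tn / (tn + fn),          # negative calls that are correct
    }

# Deep learning method, prostate cancer cohort: 171 + 27 = 198 valid calls,
# 9 false-positive calls, 11 of 209 reference variants missed, and 16 of the
# 25 total false calls correctly not made (assumed true-negative accounting).
dl = performance(tp=198, fp=9, tn=16, fn=11)
# Rounded, these match Table 2: sensitivity 94.7%, specificity 64.0%,
# PPV 95.7%, NPV 59.3%.
```

That the same four counts reproduce all four reported percentages is a useful internal consistency check on the tallies.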
The primary outcomes were defined as the absolute number of identified pathogenic variants judged to be valid on manual review and the sensitivity, specificity, PPV, and NPV of the deep learning and standard methods in the cancer-predisposition gene set (Figure 1). Secondary outcomes included the absolute number of pathogenic and pLOF variants judged to be valid and the sensitivity, specificity, PPV, and NPV of each method using the ACMG, OMIM, and multigene panel sets (Figure 1).
Two-sided χ2 tests were used to calculate the P values and 95% CIs for the differences in sensitivity, specificity, PPV, and NPV for each method. In addition, 2-sided binomial tests were used to calculate the 95% CIs for the proportions. Two-sided Mann-Whitney tests were used to evaluate the differences in the sequencing depth of the examined genomic regions. For the analysis of the clinical characteristics in the examined cohorts, patients with clinical data not reported in the originating study were included in a “not reported” category for each clinical characteristic.
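As a worked check of the difference-in-proportions CIs, a normal-approximation (Wald) interval computed from the sensitivity counts reported in the Results (deep learning, 198 of 209 reference variants; standard, 182 of 209) reproduces the reported prostate cancer interval. The study used the exact2x2 package in R, so this textbook approximation is a sketch, not the exact computation performed.

```python
import math

def wald_diff_ci(x1, n1, x2, n2, z=1.96):
    """Normal-approximation 95% CI for the difference in proportions p1 - p2."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff, diff - z * se, diff + z * se

# Prostate cancer sensitivity: deep learning 198/209 vs standard 182/209
diff, lo, hi = wald_diff_ci(198, 209, 182, 209)
# lo and hi round to 2.2% and 13.1%, matching the reported 95% CI
```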
To evaluate the performance of the standard and deep learning methods, pathogenic variants in the cancer-predisposition and ACMG gene sets (total: 151 genes) were combined to calculate the receiver operating characteristic curve using variant quality score thresholds as follows. For each threshold, the standard and deep learning models were used to determine whether the assessed variant was real or artifactual. These decisions were then compared with the results of the manual validation of these pathogenic variants, and the true-positive, true-negative, false-positive, and false-negative rates were calculated (eTable 3 and eMethods in Supplement 1). An assumption was made that positions in the genome with no variants identified by the standard or deep learning approach were true-negative variants.
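The threshold sweep can be sketched as below. The quality scores and validation labels are synthetic placeholders, and the trapezoidal integration is a generic stand-in for whatever ROC routine the study actually used.

```python
# Sketch of building an ROC curve by sweeping variant quality-score
# thresholds; scores and labels below are synthetic, not study data.

def roc_points(scores, labels):
    """Return (fpr, tpr) points across all thresholds plus trapezoidal AUC."""
    pos = sum(labels)
    neg = len(labels) - pos
    pts = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        pts.append((fp / neg, tp / pos))
    pts.append((1.0, 1.0))
    auc = sum((x2 - x1) * (y1 + y2) / 2
              for (x1, y1), (x2, y2) in zip(pts, pts[1:]))
    return pts, auc

# Perfectly separated synthetic calls (1 = manually validated true-positive)
scores = [0.95, 0.9, 0.8, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
pts, auc = roc_points(scores, labels)  # auc is 1.0 for perfect separation
```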
To evaluate the computational effects of defining true-positive variants as judged to be valid variants by all 3 examiners for the primary outcomes of this study, a post hoc analysis was conducted comparing the absolute number and the fraction of the manually validated pathogenic variants in the cancer-predisposition gene set identified by each method vs the total number of variants identified by each method using 2-sided χ2 tests. P values <.05 were considered statistically significant without adjustment for multiple comparisons. The findings for the secondary analyses should be interpreted as exploratory. Statistical analyses were performed using the exact2x2 (version 1.5.2), binom (version 1.1.1), and stats (version 3.5.1) packages on R version 3.5.1 (R Foundation for Statistical Computing).
Sequencing Metrics and Overall Germline Variant Detection
The mean age at diagnosis was 63.7 years (SD, 7.9 years) for the prostate cancer cohort (n = 1072 men) and 59.8 years (SD, 15.6 years) for the melanoma cohort (n = 1295; 488 [37.7%] women) (Table 1). Germline variants in the prostate cancer and melanoma cohorts were analyzed using the standard and deep learning methods (Figure 1 and eFigure 1 in Supplement 1). The mean exome-wide sequencing depth was 105.78 reads (SD, 52.92 reads) for the prostate cancer cohort and 86.85 reads (SD, 45.27 reads) for the melanoma cohort (eFigure 2 in Supplement 1). Of 37 373 535 germline genetic variants identified in 1072 germline prostate cancer exomes, 92.1% were identified by both the standard and deep learning methods (eFigure 3 in Supplement 1), leaving 7.9% of variants discordant between these variant detection approaches.
Detection of Pathogenic Variants in the Cancer-Predisposition Gene Set
A total of 171 pathogenic variants were identified by both the standard and deep learning methods in 118 cancer-predisposition genes in the prostate cancer cohort (n = 1072). The deep learning method exclusively identified 36 pathogenic variants, of which 27 (75% [95% CI, 58.9% to 86.2%]) were judged to be valid (true-positive findings) and 9 (25% [95% CI, 13.8% to 41.1%]) were judged to be false-positive findings (Figure 2A, Table 2, and eTable 4 in Supplement 1). The standard method exclusively identified 27 pathogenic variants, of which 11 (40.7% [95% CI, 24.5% to 59.3%]) were judged to be valid (true-positive findings) and 16 (59.3% [95% CI, 40.7% to 75.5%]) were judged to be false-positive findings (Figure 2B, Table 2, and eTable 5 in Supplement 1).
In the prostate cancer cohort, the deep learning method had higher sensitivity compared with the standard method (94.7% vs 87.1%, respectively; difference, 7.6% [95% CI, 2.2% to 13.1%]; P = .006), higher specificity (64.0% vs 36.0%; difference, 28.0% [95% CI, 1.4% to 54.6%]; P = .047), and higher NPV (59.3% vs 25.0%; difference, 34.3% [95% CI, 10.9% to 57.6%]; P = .006) (Table 2). However, the PPV for the deep learning method was not significantly different from the standard method (95.7% vs 91.9%, respectively; difference, 3.8% [95% CI, –1.0% to 8.4%]; P = .11).
The pathogenic variants exclusively identified by the deep learning method in the cancer-predisposition genes were more likely to be judged valid on manual review compared with variants exclusively identified by the standard method (75.0% vs 40.7%, respectively; difference, 34.3% [95% CI, 10.9% to 57.6%]; P = .006). Overall, the deep learning method identified 16 more patients with prostate cancer who had pathogenic variants associated with elevated cancer risk that were missed by the standard method and were judged to be valid on manual review (Table 2).
To explore the generalizability of these findings, germline whole-exome sequencing data from 1295 patients with melanoma also were analyzed. The deep learning method identified more patients with pathogenic variants judged to be valid (true-positive findings) compared with the standard method (93 vs 74, respectively) and identified fewer pathogenic variants judged to be false-positive findings (78 vs 135) (Table 2 and eFigure 4 and eTables 6 and 7 in Supplement 1). The deep learning method had higher sensitivity compared with the standard method (74.4% vs 59.2%, respectively; difference, 15.2% [95% CI, 3.7% to 26.7%]; P = .01), higher specificity (63.4% vs 36.6%; difference, 26.8% [95% CI, 17.6% to 35.9%]; P < .001), higher PPV (54.4% vs 35.4%; difference, 19.0% [95% CI, 9.1% to 28.9%]; P < .001), and higher NPV (80.8% vs 60.5%; difference, 20.3% [95% CI, 10.0% to 30.7%]; P < .001) (Table 2). Furthermore, pathogenic variants in the cancer-predisposition genes exclusively identified by the deep learning method were significantly more likely to be judged valid on manual review compared with those exclusively identified by the standard method (39.5% vs 19.2%, respectively; difference, 20.3% [95% CI, 10.0% to 30.7%]; P < .001).
The use of both the deep learning and standard methods in tandem (combined methods) resulted in the highest number of manually validated pathogenic variants in the cancer-predisposition genes. In the prostate cohort, there were 182 pathogenic variants judged to be valid (true-positive) using the standard method, 198 variants using the deep learning method, and 209 variants using combined methods. In the melanoma cohort, there were 74 pathogenic variants judged to be valid using the standard method, 93 variants using the deep learning method, and 125 variants using combined methods (Table 2).
Pathogenic variants exclusively identified by the deep learning method and judged to be valid on manual review included a frameshift in RAD51D (OMIM: 602954) (p.Ala142GlnfsTer14; rs730881935) that is associated with a 6-fold increased risk of ovarian cancer25 (Figure 2C), a nonsense variant in BRIP1 (OMIM: 605882) (p.Arg581Ter; rs780020495) that is associated with a 14-fold increased risk for high-grade serous ovarian cancer26 (Figure 2D), a truncating variant in ATM (OMIM: 607585) (p.Arg1875Ter; rs376603775) that, in the heterozygous state, confers a 2 to 5 times higher risk for breast, colorectal, and gastric cancers7,27 (Figure 2E), and a stop-codon variant in SDHA (OMIM: 600857) (p.Arg512Ter; rs748089700) that is associated with a 40% chance of developing pheochromocytoma or paraganglioma by 40 years of age28 (eFigure 5 in Supplement 1). The standard method also exclusively identified several clinically relevant pathogenic variants judged to be valid, including a frameshift in ATM (OMIM: 607585) (p.Thr2333AsnfsTer6; rs587781299) and several splice donor variants in MSH6 (OMIM: 600678) (NM_000179.3:c.4001 + 2del) (eTables 5 and 6 in Supplement 1).
Although the use of a more stringent criterion (agreement of 3 examiners instead of 2; eMethods in Supplement 1) reduced the absolute number and fraction of valid (true-positive) pathogenic variants identified by each approach, the deep learning method still identified significantly more true-positive pathogenic variants than the standard method regardless of the criteria or cohorts used (prostate cancer cohort: 62.5% vs 20%, P < .001; melanoma cohort: 23.1% vs 8.48%, P < .001) (eFigure 6 in Supplement 1).
Detection of Pathogenic Variants in ACMG Genes
When examining the 59 ACMG genes in 1072 patients with prostate cancer, the deep learning method identified more patients with pathogenic variants judged to be valid (true-positive) on manual review than the standard method (111 vs 106, respectively) and identified fewer variants judged to be false-positive findings (10 vs 18) (eTables 8 and 9 in Supplement 1). The deep learning method achieved a higher specificity than the standard method (64.3% vs 35.7%, respectively; difference, 28.6% [95% CI, 3.5% to 53.7%]; P = .03). However, the sensitivity of the deep learning method was not significantly different than the standard method (94.9% vs 90.6%, respectively; difference, 4.3% [95% CI, –2.3% to 10.9%]; P = .21), nor was the PPV (91.7% vs 85.5%; difference, 6.2% [95% CI, –1.7% to 14.1%]; P = .12) or the NPV (75.0% vs 47.6%; difference, 27.4% [95% CI, –0.12% to 54.9%]; P = .06) (Table 2).
A similar analysis of the melanoma cohort (n = 1295) revealed that the deep learning method identified 12 more patients with clinically actionable pathogenic variants judged to be valid in the ACMG gene set compared with the standard method (48 vs 36, respectively) (eTables 10 and 11 in Supplement 1). The deep learning method yielded a higher sensitivity than the standard method (71.6% vs 53.7%, respectively; difference, 17.9% [95% CI, 1.82% to 34.0%]; P = .03). However, the specificity of the deep learning method was not significantly different than the standard method (49.2% vs 50.8%, respectively; difference, –1.6% [95% CI, –13.6% to 10.5%]; P = .81), nor was the PPV (41.7% vs 35.6%; difference, 6.1% [95% CI, –6.9% to 19.1%]; P = .36) or the NPV (77.4% vs 68.4%; difference, 9.0% [95% CI, –3.8% to 21.9%]; P = .17) (Table 2).
The use of both the deep learning and standard methods in tandem (combined methods) achieved a higher detection rate than using either of these methods independently in the ACMG gene set. In the prostate cohort, there were 106 pathogenic variants judged to be valid using the standard method, 111 using the deep learning method, and 117 using the combined methods. In the melanoma cohort, there were 36 pathogenic variants judged to be valid using the standard method, 48 using the deep learning method, and 67 using the combined methods (Table 2).
The pathogenic variants exclusively identified by the deep learning method in the ACMG genes and judged to be valid on manual review included truncating pathogenic variants in ATP7B (OMIM: 606882) (Figure 2F and 2G), a gene associated with Wilson disease and fatal liver failure, and COL3A1 (OMIM: 120180) (Figure 2H), a gene associated with autosomal dominant vascular Ehlers-Danlos syndrome complicated by early-onset aortic dissection, viscus rupture, and premature mortality. The standard method exclusively identified pathogenic variants in the ACMG genes that were judged to be valid, including pathogenic variants in SCN5A (OMIM: 600163) that are associated with cardiac arrhythmias and variants in OTC (OMIM: 300461) associated with severe metabolic hyperammonemia.
Evaluation of Model Performance
In the analyses that combined pathogenic variants in the cancer-predisposition and ACMG gene sets, the deep learning method had an area under the curve of 0.94 (95% CI, 0.91 to 0.97) vs 0.89 (95% CI, 0.84 to 0.93) for the standard method in the prostate cancer cohort, and 0.76 (95% CI, 0.71 to 0.81) vs 0.60 (95% CI, 0.55 to 0.67) in the melanoma cohort (Figure 3 and eFigures 7 and 8 in Supplement 1). Results of other model performance metrics appear in eTable 12 in Supplement 1.
Detection of pLOF Variants in 5197 Clinically Relevant Mendelian Genes
Among 286 patients with prostate cancer whose tumors were available for independent validation, more germline pLOF variants validated in the tumor sequencing data were identified by the deep learning method than by the standard method (708 vs 675, respectively), resulting in higher sensitivity (99.7% vs 95.1%; difference, 4.6% [95% CI, 3.0% to 6.3%]; P < .001), lower specificity (11.8% vs 88.2%; difference, –76.4% [95% CI, −89.0% to −64.0%]; P < .001), and lower PPV (94.0% vs 99.1%; difference, –5.1% [95% CI, −6.9% to −3.3%]; P < .001) (Table 2). The NPV for the deep learning method was not significantly different from the standard method (75.0% vs 56.3%, respectively; difference, 18.7% [95% CI, –13.2% to 50.7%]; P = .31).
Similarly, in the melanoma cohort (n = 1295), more germline pLOF variants validated in the tumor sequencing data were identified by the deep learning method than by the standard method (619 vs 582, respectively), resulting in higher sensitivity (91.7% vs 86.2%; difference, 5.5% [95% CI, 2.2% to 8.8%]; P = .001), lower specificity (30.8% vs 69.2%; difference, –38.4% [95% CI, −43.6% to −33.3%]; P < .001), and lower PPV (59.4% vs 75.6%; difference, –16.2% [95% CI, −20.4% to −11.9%]; P < .001). The NPV for the deep learning method was not significantly different from the standard method (77.0% vs 82.0%, respectively; difference, –5.0% [95% CI, –11.2% to 1.3%], P = .11; Table 2).
Detection of pLOF Variants in 12 Commonly Used Clinical Multigene Panels
Among 286 patients with prostate cancer, the deep learning method vs the standard method identified pLOF variants that were judged to be valid in the following clinical multigene panels: cardiovascular disorders (36 vs 34, respectively), ciliopathies (43 vs 40), dermatological disorders (24 vs 23), hearing loss (33 vs 33), hematological disorders (38 vs 36), mitochondrial disorders (49 vs 48), neurological disorders (178 vs 173), neuromuscular disorders (33 vs 32), prenatal screening (118 vs 110), pulmonary disorders (19 vs 18), kidney disorders (48 vs 44), and retinal disorders (232 vs 223) (eFigures 9A and 10A in Supplement 1).
In the melanoma cohort (n = 1295), the deep learning method vs the standard method identified pLOF variants that were judged to be valid in the following clinical multigene panels: cardiovascular disorders (45 vs 44, respectively), ciliopathies (34 vs 37), dermatological disorders (19 vs 18), hearing loss (41 vs 39), hematological disorders (31 vs 30), mitochondrial disorders (32 vs 27), neurological disorders (162 vs 155), neuromuscular disorders (30 vs 23), prenatal screening (107 vs 104), pulmonary disorders (17 vs 17), kidney disorders (39 vs 41), and retinal disorders (212 vs 204) (eFigures 9B and 10B in Supplement 1).
Properties of Pathogenic Variants Exclusively Identified by 1 Method
The deep learning method vs the standard method identified 36 vs 19 frameshift variants, respectively, 31 vs 17 stop codon variants, and 40 vs 11 canonical splice-site variants that were judged to be valid (eFigure 11 in Supplement 1). For pathogenic variants exclusively identified using the deep learning method, false-positive variants occurred in regions with poorer sequencing coverage than true-positive variants (mean [SD], 7.1 [6.8] reads vs 21.4 [35.6] reads, respectively; P < .001).
In contrast, for pathogenic variants exclusively identified by the standard method, false-positive variants had sequencing coverage similar to that of true-positive variants (mean [SD], 44.3 [69.6] reads vs 43.1 [59.4] reads, respectively; P = .46; eFigure 12 in Supplement 1). In addition, even though the deep learning and standard methods identified the same number of common variants (minor allele frequency >1%), the mean number of additional rare variants (minor allele frequency <1%) identified by the deep learning method in each patient was 49.6 (95% CI, 46.7 to 52.7) variants per exome in the prostate cancer cohort and 101.2 (95% CI, 95.8 to 106.9) variants per exome in the melanoma cohort (eFigures 13A and 13B in Supplement 1).
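The sequencing-depth comparisons rely on the rank-based Mann-Whitney test named in the statistical analysis section. A standard-library sketch of the U statistic (with average ranks for ties) is shown below; the read-depth lists are synthetic illustrations, not study data.

```python
# Stdlib-only sketch of the Mann-Whitney U statistic for comparing
# sequencing depths of two variant groups; depth values are synthetic.

def mann_whitney_u(x, y):
    """U statistic for sample x vs y, using average ranks for ties."""
    combined = sorted((v, i) for i, v in enumerate(x + y))
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        j = i
        while j < len(combined) and combined[j][0] == combined[i][0]:
            j += 1  # extend over a run of tied values
        avg = (i + j + 1) / 2  # average of 1-based ranks i+1 .. j
        for k in range(i, j):
            ranks[combined[k][1]] = avg
        i = j
    r1 = sum(ranks[: len(x)])  # rank sum of the first sample
    return r1 - len(x) * (len(x) + 1) / 2

# Low-coverage false positives vs higher-coverage true positives (synthetic)
fp_depth = [5, 6, 7, 8]
tp_depth = [20, 22, 25, 30]
u = mann_whitney_u(fp_depth, tp_depth)  # 0.0: every fp depth below every tp depth
```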
Analysis of pathogenic variant detection in 2 cohorts of individuals with prostate cancer and melanoma showed that a deep learning method identified more pathogenic variants in cancer-predisposition genes that were judged to be valid (true-positive) than the current standard method, resulting in higher sensitivity, specificity, PPV, and NPV. However, these findings also demonstrated that the deep learning and standard methods were complementary in that the application of both approaches to the sequence data yielded the highest number of pathogenic variants judged to be valid.
Identification of pathogenic variants has substantial clinical implications for pathogenic variant carriers and their at-risk family members. For example, the National Comprehensive Cancer Network recommends offering risk-reducing salpingo-oophorectomy before the age of 50 years to female carriers of pathogenic variants in RAD51D (OMIM: 602954) or BRIP1 (OMIM: 605882),29 similar to those discovered only by the deep learning method used in this analysis. In addition, it is recommended to consider a more intensive breast cancer screening approach (using breast magnetic resonance imaging starting at the age of 40 years) for female carriers of pathogenic germline ATM (OMIM: 607585) variants.29 This clinical actionability also extends to many noncancer pathogenic variants exclusively discovered by the deep learning method, including those in ATP7B (OMIM: 606882) because presymptomatic initiation of chelating therapy can effectively prevent the life-threatening complications of Wilson disease30 and those in the multigene panels because any additional germline analysis yield may translate into more patients benefiting from the clinical utility of establishing a molecular diagnosis.
Overall, these findings suggested that although both methods had comparable performance for detecting common variants, the deep learning method had a higher sensitivity for detecting rare pathogenic variants, an observation that can be explained by examining the underlying approach of each method. The standard method uses joint genotyping, which leverages population-wide information from all analyzed samples and high-quality population-based data sets, such as 1000 Genomes31 and dbSNP,32 to determine the quality of each identified variant. Although this approach enables the standard method to effectively identify variants that are seen frequently in the analyzed and reference genomic data sets (ie, relatively common in the population), joint genotyping and the subsequent filtering step (variant quality score recalibration) are inherently biased toward filtering out variants that are very rare (ie, only encountered once in the analyzed data set).
Because 97.3% of the cancer-predisposition variants had an allele frequency of less than 1:10 000,33 exceptionally large patient cohorts are needed to effectively use the power of joint genotyping on these ultrarare variants. Conversely, the deep learning method used in this article evaluates the sequencing images of each variant individually using deep neural networks, thus mimicking the standard workflow in which geneticists assess the evidence supporting genetic variants in each sample independently.13,24 In addition to having a higher sensitivity and specificity compared with the standard method, this sample-based analysis approach also avoids the n + 1 problem for clinical genetics laboratories, in which all cohort samples need to be jointly reanalyzed every time a new sample is added to the study. Such joint reanalysis has proven impractical, resource intensive, and time-consuming, especially for large research studies such as the gnomAD database.34
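The scaling behind the n + 1 problem can be made concrete with a small sketch. The unit costs below are hypothetical (analyzing one sample is assumed to cost 1 unit of work); this is an illustration of quadratic vs linear growth, not a model of actual pipeline runtimes.

```python
# Back-of-the-envelope illustration of the "n + 1" problem: joint genotyping
# must reprocess the whole cohort each time a sample is added, whereas
# per-sample calling processes only the new sample. One sample = 1 unit of
# work (a hypothetical simplification).

def joint_cost(n_samples: int) -> int:
    """Total work if the full cohort is re-genotyped after each new sample."""
    return sum(range(1, n_samples + 1))  # 1 + 2 + ... + n, ie, n(n + 1)/2

def per_sample_cost(n_samples: int) -> int:
    """Total work if each sample is analyzed once, independently."""
    return n_samples

# For a cohort the size analyzed here (2367 exomes), incremental joint
# re-genotyping costs (n + 1)/2 = 1184 times the work of per-sample calling.
print(joint_cost(2367) // per_sample_cost(2367))  # prints 1184
```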
Collectively, this analysis of 2367 germline exomes of patients with cancer consistently showed a higher molecular diagnostic yield for deep learning–based germline pathogenic variant analysis compared with standard methods, regardless of the examined gene set. The higher sensitivity of the deep learning–based method may also lead to an improved ability to uncover novel gene-disease associations in already existing genomic data sets. However, the deep learning–based method was not able to detect all manually validated pathogenic variants in the analyzed data sets; thus, a hybrid variant detection approach may achieve higher sensitivity.
This study has several limitations. First, this study only included patients with cancer diagnoses, so the performance of the deep learning method may change when used for patients affected by other conditions. Second, this study largely included patients with European ancestry, and further studies are needed to evaluate the molecular diagnostic yield increment in other ancestral groups. Third, this analysis used convenience cohorts with limited available clinical outcomes, so prospective studies are needed to further evaluate the effect of deep learning variant detection on clinical outcomes. Fourth, given the lack of a practical independent validation process for all examined genomic positions, some true pathogenic and pLOF variants could have been potentially missed by both methods.
Fifth, this study used the best practices of the standard method, so analysis frameworks using alternative settings, or a modified version of the standard method, may have different pathogenic variant detection performance. Sixth, this study did not evaluate the performance of these methods on genetic data generated using technologies other than the paired-end, short-read Illumina platform. Seventh, although the PPV and NPV were calculated using the prostate cancer and melanoma cohorts, for whom germline analysis of the cancer-predisposition genes is frequently performed, these patient cohorts are not commonly tested for the noncancer ACMG or OMIM gene sets, so the calculated PPV and NPV for these gene sets may not represent the actual PPV and NPV of the standard and deep learning methods in patients for whom testing these gene sets is indicated. Eighth, the calculated PPV and NPV for these 2 methods reflect the probability of having the molecular genetic change, not the clinical disease, and were calculated using gene sets, so nucleotide-based and gene-based values may differ.
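As a reminder of how the four detection metrics discussed in this study relate to one another, a minimal sketch follows. The counts are hypothetical and are not taken from the study's data; they only illustrate the arithmetic by which sensitivity, specificity, PPV, and NPV are derived from a 2 × 2 confusion matrix against a reference variant set.

```python
# Detection metrics for one variant-calling method, computed from a 2x2
# confusion matrix. tp/fp/tn/fn counts below are hypothetical, for
# illustration only (NOT the study's data).

def detection_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Return sensitivity, specificity, PPV, and NPV."""
    return {
        "sensitivity": tp / (tp + fn),  # fraction of reference variants detected
        "specificity": tn / (tn + fp),  # fraction of reference negatives correctly excluded
        "ppv": tp / (tp + fp),          # probability a reported variant is valid
        "npv": tn / (tn + fn),          # probability an unreported site is truly negative
    }

# Hypothetical example: a method detects 95 of 100 reference variants
# (5 missed) and makes 2 false calls among 900 reference-negative sites.
m = detection_metrics(tp=95, fp=2, tn=898, fn=5)
print(m["sensitivity"], m["ppv"])  # prints 0.95 and ~0.979
```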
Among a convenience sample of 2 independent cohorts of patients with prostate cancer and melanoma, germline genetic testing using deep learning, compared with the current standard genetic testing method, was associated with higher sensitivity and specificity for detection of pathogenic variants. Further research is needed to understand the relevance of these findings with regard to clinical outcomes.
Corresponding Author: Eliezer M. Van Allen, MD, Department of Medical Oncology, Dana-Farber Cancer Institute, Harvard Medical School, 360 Longwood Ave, LC9329, Boston, MA 02215 (eliezerm_vanallen@dfci.harvard.edu).
Accepted for Publication: October 6, 2020.
Author Contributions: Drs AlDubayan and Van Allen had full access to all of the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis.
Concept and design: AlDubayan, Conway, Al-Rubaish, Al-Sulaiman, Al-Ali, Taylor-Weiner, Van Allen.
Acquisition, analysis, or interpretation of data: AlDubayan, Conway, Camp, Witkowski, Kofman, Reardon, Han, Moore, Elmarakeby, Salari, Choudhry, Al-Sulaiman, Taylor-Weiner, Van Allen.
Drafting of the manuscript: AlDubayan, Conway, Camp, Han, Taylor-Weiner, Van Allen.
Critical revision of the manuscript for important intellectual content: AlDubayan, Conway, Witkowski, Kofman, Reardon, Moore, Elmarakeby, Salari, Choudhry, Al-Rubaish, Al-Sulaiman, Al-Ali, Taylor-Weiner, Van Allen.
Statistical analysis: AlDubayan, Conway, Camp, Kofman, Reardon, Han, Elmarakeby, Salari, Choudhry, Taylor-Weiner, Van Allen.
Obtained funding: AlDubayan, Van Allen.
Administrative, technical, or material support: Moore, Al-Rubaish, Al-Sulaiman, Al-Ali, Van Allen.
Supervision: AlDubayan, Taylor-Weiner, Van Allen.
Conflict of Interest Disclosures: Dr Moore reported receiving personal fees from Immunity Health. Dr Van Allen reported serving on advisory boards or as a consultant to Tango Therapeutics, Genome Medical, Invitae, Illumina, Manifold Bio, Monte Rosa Therapeutics, and Enara Bio; receiving personal fees from Invitae, Tango Therapeutics, Genome Medical, Ervaxx, Roche/Genentech, and Janssen; receiving research support from Novartis and Bristol-Myers Squibb; having equity in Tango Therapeutics, Genome Medical, Syapse, Enara Bio, Manifold Bio, and Microsoft; receiving travel reimbursement from Roche and Genentech; and filing institutional patents (for ERCC2 variants and chemotherapy response, chromatin variants and immunotherapy response, and methods for clinical interpretation). No other disclosures were reported.
Funding/Support: This work was supported by Conquer Cancer Foundation Career Development Award 13167 from the American Society of Clinical Oncology (awarded to Dr AlDubayan), Young Investigator Award 18YOUN02 from the Prostate Cancer Foundation (awarded to Dr AlDubayan), the Challenge Award from the PCF-V Foundation (awarded to Dr Van Allen), the Emerging Leader Award from the Mark Foundation (awarded to Dr Van Allen), grant R01CA222574 from the National Institutes of Health (awarded to Dr Van Allen), and grant 12-MED2224-46 (for science and technology) from King Abdulaziz City (awarded to Drs Al-Rubaish, Al-Sulaiman, and Al-Ali).
Role of the Funder/Sponsor: The funders/sponsors had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.
Additional Contributions: We thank all the individuals who participated in this study. We also thank Eric Banks, PhD (data sciences platform, Broad Institute of Massachusetts Institute of Technology and Harvard University; no compensation was received), for his valuable insight into the underlying model of the Genome Analysis Toolkit and for his comments on the results of this study. We also thank Jeff Kohlwes, MD, MPH (general internal medicine, University of California, San Francisco; no compensation was received), Aaron Neinstein, MD (endocrinology and clinical informatics, University of California, San Francisco; no compensation was received), and Tara Vijayan, MD (infectious disease, University of California, Los Angeles; no compensation was received), for their feedback on the content in the manuscript.
Additional Information: The results are based, in part, on data generated by the Cancer Genome Atlas managed by the National Cancer Institute and the National Human Genome Research Institute. Information about the Cancer Genome Atlas can be found at http://cancergenome.nih.gov. The raw sequence data can be obtained through dbGaP (https://www.ncbi.nlm.nih.gov/gap) or as described in the original articles (details appear in the Methods section). All software tools used in this study are publicly available.
2. AlDubayan SH. Leveraging clinical tumor-profiling programs to achieve comprehensive germline-inclusive precision cancer medicine. JCO Precis Oncol. 2019;3:1-3. doi:10.1200/PO.19.00108
5. AlDubayan SH, Pyle LC, Gamulin M, et al; Regeneron Genetics Center (RGC) Research Team. Association of inherited pathogenic variants in checkpoint kinase 2 (CHEK2) with susceptibility to testicular germ cell tumors. JAMA Oncol. 2019;5(4):514-522. doi:10.1001/jamaoncol.2018.6477
12. Wu Y, Schuster M, Chen Z, et al. Google's Neural Machine Translation system: bridging the gap between human and machine translation. Published September 26, 2016. Accessed October 13, 2020. https://arxiv.org/abs/1609.08144
18. Tennessen JA, Bigham AW, O'Connor TD, et al; Broad GO; Seattle GO; NHLBI Exome Sequencing Project. Evolution and functional impact of rare coding variation from deep sequencing of human exomes. Science. 2012;337(6090):64-69. doi:10.1126/science.1219240
22. Richards S, Aziz N, Bale S, et al; ACMG Laboratory Quality Assurance Committee. Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet Med. 2015;17(5):405-424. doi:10.1038/gim.2015.30
25. Loveday C, Turnbull C, Ramsay E, et al; Breast Cancer Susceptibility Collaboration (UK). Germline mutations in RAD51D confer susceptibility to ovarian cancer. Nat Genet. 2011;43(9):879-882. doi:10.1038/ng.893
26. Ramus SJ, Song H, Dicks E, et al; AOCS Study Group; Ovarian Cancer Association Consortium. Germline mutations in the BRIP1, BARD1, PALB2, and NBN genes in women with ovarian cancer. J Natl Cancer Inst. 2015;107(11):djv214. doi:10.1093/jnci/djv214
28. Bausch B, Schiavi F, Ni Y, et al; European-American-Asian Pheochromocytoma-Paraganglioma Registry Study Group. Clinical characterization of the pheochromocytoma and paraganglioma susceptibility genes SDHA, TMEM127, MAX, and SDHAF2 for gene-informed prevention. JAMA Oncol. 2017;3(9):1204-1212. doi:10.1001/jamaoncol.2017.0223
31. 1000 Genomes Project Consortium. A global reference for human genetic variation. Nature. 2015;526(7571):68-74.
32. Sherry ST, Ward M, Sirotkin K. dbSNP—database for single nucleotide polymorphisms and other classes of minor genetic variation. Genome Res. 1999;9:677-679.
33. Kobayashi Y, Yang S, Nykamp K, Garcia J, Lincoln SE, Topper SE. Pathogenic variant burden in the ExAC database: an empirical approach to evaluating population data for clinical variant interpretation. Genome Med. 2017;9(1):13. doi:10.1186/s13073-017-0403-7