Original Investigation
December 5, 2018

Use of Big Data to Estimate Prevalence of Defective DNA Repair Variants in the US Population

Author Affiliations
  • 1 Laboratory of Cancer Biology and Genetics, Center for Cancer Research, National Cancer Institute, Bethesda, Maryland
  • 2 Academy Enrichment Program Scholar, Office of Intramural Training & Education, Office of the Director, National Institutes of Health, Bethesda, Maryland
  • 3 Human Genetics Program, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, Maryland
JAMA Dermatol. Published online December 5, 2018. doi:10.1001/jamadermatol.2018.4473
Key Points

Question  Do databases of exome sequences reliably correlate with the prevalence of individuals with defective DNA repair?

Findings  In this molecular epidemiologic study examining 3 large exome sequence databases totaling more than 200 000 alleles, unexpectedly high frequencies were found of 2 mutations associated with xeroderma pigmentosum in DNA repair genes (XPF [ERCC4] p.P379S, 0.4% and XPC p.P334H, 0.3%). These frequencies estimate the presence of more than 8000 people with xeroderma pigmentosum in the United States with these mutations, yet only 4 individuals were clinically identified in this study.

Meaning  Unsuspected mutations in known genes with a predisposition for skin cancer may be responsible for some of the high frequency of skin cancers in the general population.

Abstract

Importance  Wide use of genomic sequencing to diagnose disease has raised concern about the extent of genotype-phenotype correlations.

Objective  To correlate disease-associated allele frequencies with expected and reported prevalence of clinical disease.

Design, Setting, and Participants  Xeroderma pigmentosum (XP), a recessive, cancer-prone, neurocutaneous disorder, was used as a model for this study. From January 1, 2017, to May 4, 2018, the Human Gene Mutation Database and a cohort of patients at the National Institutes of Health were searched and screened to identify reported mutations associated with XP. The clinical phenotype of these patients was confirmed from reports in the literature and National Institutes of Health medical records. The genetically predicted prevalence of disease based on frequency of known pathogenic mutations was compared with the prevalence of patients clinically diagnosed with phenotypic XP. Exome sequencing of more than 200 000 alleles from the Genome Aggregation Database, the National Cancer Institute Division of Cancer Epidemiology and Genetics database of healthy controls, and an Inova Hospital Study database was used to investigate the frequencies of these mutations in the general population.

Main Outcomes and Measures  Listing of all reported mutations associated with XP, their frequencies in 3 large exome sequence databases, determination of the number of patients in the United States with XP using modeling equations, and comparison of the observed and reported numbers of patients with XP with specific mutations.

Results  A total of 156 pathogenic missense and nonsense mutations associated with XP were identified in the National Institutes of Health cohort and the Human Gene Mutation Database. The Genome Aggregation Database provided frequency data for 65 of these mutations, with a total allele frequency of 1.13%. The XPF (ERCC4) mutation, p.P379S, had an allele frequency of 0.4%, and the XPC mutation, p.P334H, had an allele frequency of 0.3%. With the Hardy-Weinberg equation, it was determined that there should be more than 8000 patients who are homozygous for these mutations in the United States. In contrast, only 3 patients with XP were reported as having the XPF mutation, and 1 patient was reported as having the XPC mutation.
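
The Hardy-Weinberg estimate in the Results can be reproduced approximately with a short calculation. The following is a minimal sketch, not the authors' actual model: it uses the rounded allele frequencies given in the abstract, and the US population figure of 325 million is an assumption introduced here for illustration (the abstract does not state the value used). Under Hardy-Weinberg equilibrium, the expected fraction of people homozygous for an allele with frequency q is q squared.

```python
# Hardy-Weinberg sketch: expected number of homozygotes for a recessive allele.
# ASSUMPTION: US population of about 325 million; this figure is not from the abstract.
US_POPULATION = 325_000_000

# Rounded allele frequencies as reported in the abstract (0.4% and 0.3%).
ALLELE_FREQUENCIES = {
    "XPF (ERCC4) p.P379S": 0.004,
    "XPC p.P334H": 0.003,
}


def expected_homozygotes(q: float, population: int) -> float:
    """Return the expected homozygote count, q^2 * N, under Hardy-Weinberg equilibrium."""
    return q * q * population


def total_expected_homozygotes(frequencies: dict, population: int) -> float:
    """Sum the expected homozygote counts over each mutation considered separately."""
    return sum(expected_homozygotes(q, population) for q in frequencies.values())


if __name__ == "__main__":
    for name, q in ALLELE_FREQUENCIES.items():
        count = expected_homozygotes(q, US_POPULATION)
        print(f"{name}: about {count:.0f} expected homozygotes")
    total = total_expected_homozygotes(ALLELE_FREQUENCIES, US_POPULATION)
    print(f"Total: about {total:.0f}")
```

With these assumed inputs the total comes to roughly 8100 expected homozygotes, consistent with the "more than 8000" reported above. The sketch counts only homozygotes for each mutation separately and ignores compound heterozygotes, so it illustrates the scale of the discrepancy with the 4 clinically identified patients rather than the exact published figure.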

Conclusions and Relevance  The findings from this study suggest that clinicians should approach large genomic databases with caution when trying to relate the clinical implications of genetic variants to the prevalence of disease. Unsuspected mutations in known genes with a predisposition for skin cancer may be responsible for some of the high frequency of skin cancers in the general population.
