A, Manhattan plot in the discovery set of 400 buccal samples. A total of 1501 smoking-associated CpGs passed a Bonferroni threshold of approximately 10⁻⁷ (indicated by the red dotted line). B, The methylation β value (y-axis) of the top-ranked CpG, mapping to the gene body of AHRR, as a function of smoking pack-years (x-axis). The P value from a linear regression is given. C, Manhattan plot in the replication set of 390 buccal samples. D, Scatterplot of the linear regression DNA methylation t statistics of the 1501 smoking-associated CpGs in the discovery set (x-axis) against those in the replication set (n = 390). The P value of agreement is from a Fisher exact test. Vertical green dashed lines indicate the level of significance as given by the Bonferroni threshold in the discovery set. Horizontal green dashed lines indicate a level of significance of P = .05 in the replication set.
A, Heat map of relative DNAme values (CpG methylation β values were standardized to mean zero and unit variance across the 400 buccal samples) of the 1501 smoking-associated CpGs separated according to hypermethylation and hypomethylation as indicated. The 400 buccal samples were ordered according to the smoking index, a measure of the deviation in DNAme from a normal reference (here buccal cells from nonsmokers). B, Results of the gene set enrichment analysis assessing enrichment of genes or transcription factor binding sites among the hypermethylated and hypomethylated CpG categories. We give the odds ratios and enrichment P values (1-tailed Fisher test) of some of the enriched categories. Enrichment was assessed for hypermethylated CpGs and hypomethylated CpGs separately. Blue color indicates enrichment in the hypermethylated class compared with CpGs not associated with smoking; orange indicates enrichment in the hypomethylated class compared with CpGs not associated with smoking.
A, Box-and-whisker plots comparing the smoking index of cancers (C) to their respective normal tissue (N) for 8 independent data sets encompassing the following cancers: LSCC (lung squamous cell carcinoma), LUAD1 and LUAD2 (lung adenocarcinoma data sets 1 and 2), HNSC (head and neck squamous carcinoma), ESCA (esophageal carcinoma), EC (endometrial cancer), BRCA (breast cancer), and BLCA (bladder cancer). The number of samples in each category is shown below the x-axis. P values are from a Wilcoxon rank sum test. The smoking index for each sample was computed using the 1501 smoking-associated CpGs derived from the discovery set of 400 buccal samples. The horizontal line in the middle of each box indicates the median, while the top and bottom borders of the box mark the 75th and 25th percentiles, respectively. The horizontal lines above and below the box mark deviations from the median given by 1.5 times the interquartile range. The points beyond these horizontal lines define outliers. B, Corresponding receiver operating characteristic (ROC) curve and area under the curve (AUC) analysis for each of the 8 data sets and for the smoking indices derived from the original 400 buccal samples (BC) (brown), the 152 matched buccal samples (orange), the 152 matched blood samples (WB) (blue), and a random 1501-CpG signature (black). C, Comparison of the smoking index (as calculated using the smoking DNA methylation signature derived from buccal cells) of preinvasive lung lesions that regress with those that progress to lung cancer. The Wilcoxon rank sum test P value is given. D, The ability of the smoking index to discriminate regressors from progressors in an ROC and AUC analysis. The AUC plus 95% confidence interval is given.
eFigure 1. Flowchart Figure
eFigure 2. Correlation between smoking pack years and the time of last quit before sample collection
eFigure 3. DNA methylation reversal for AHRR
eFigure 4. Singular Value Decomposition analysis of the discovery set DNA methylation data matrix of 400 buccal samples and 479,491 CpGs
eFigure 5. Correction for cellular heterogeneity using RefFreeEWAS in the discovery buccal sample set (n=400)
eFigure 6. Comparison of Buccal and Whole Blood smoking DNA methylation signatures
eFigure 7. Linear correlation between smoking index and smoking pack years
eFigure 8. The Smoking Index is aggravated in cancer
eFigure 9. The Smoking Index across normal/cancer sets (part-1), as evaluated by restricting to four different CpG subsets from the full 1501 smoking-associated DNAme signature
eFigure 10. The Smoking Index across normal/cancer sets (part-2), as evaluated by restricting to four different CpG subsets from the full 1501 smoking-associated DNAme signature
eFigure 11. The Smoking Index as evaluated in endometrial carcinogenesis
eFigure 12. The Smoking Index evaluated in a series of 152 cytologically normal cervical smear samples
eFigure 13. Functional significance of smoking DNAme signature
eFigure 14. The Smoking Index from three GSEA-enriched DNAme subsignatures in the normal cancer data sets (part-1)
eFigure 15. The Smoking Index from three GSEA-enriched DNAme subsignatures in the normal cancer data sets (part-2)
eFigure 16. Prediction of smoking status using DNA methylation profiles based on an elastic net classifier
eFigure 17. Prediction of smoking status from buccal DNA methylation profiles using an elastic net classifier, and using a different training/test set partition of the 790 buccal samples
eTable 1. Statistics of association of the 1501 smoking-associated CpGs
eTable 2. RefFreeEWAS selected CpGs
eTable 3. Gene Set Enrichment Analysis summary table of the hypermethylated smoking-associated CpGs
eTable 4. Gene Set Enrichment Analysis summary table of the hypomethylated smoking-associated CpGs
eTable 5. Enrichment Analysis Table of Transcription Factor (TF) Binding Sites
eTable 6. Smoking associated fold-expression changes in non-tumour lung tissue of smoking associated CpGs
Teschendorff AE, Yang Z, Wong A, et al. Correlation of Smoking-Associated DNA Methylation Changes in Buccal Cells With DNA Methylation Changes in Epithelial Cancer. JAMA Oncol. 2015;1(4):476–485. doi:10.1001/jamaoncol.2015.1053
Importance
The utility of buccal cells as an epithelial source tissue for epigenome-wide association studies (EWASs) remains to be demonstrated. Given the direct exposure of buccal cells to potent carcinogens such as smoke, epigenetic changes in these cells may provide insights into the development of smoke-related cancers.
Objective
To perform an EWAS in buccal and blood cells to assess the relative effect of smoking on the DNA methylation (DNAme) patterns in these cell types and to test whether these DNAme changes are also seen in epithelial cancer.
Design, Setting, and Participants
In 2013, we measured DNAme at more than 480 000 CpG sites in buccal samples provided in 1999 by 790 women (all aged 53 years in 1999) from the United Kingdom Medical Research Council National Survey of Health and Development. This included matched blood samples from 152 women. We constructed a DNAme-based smoking index and tested its sensitivity and specificity to discriminate normal from cancer tissue in more than 5000 samples.
Main Outcomes and Measures
CpG sites whose DNAme level correlates with smoking pack-years, and construction of an associated sample-specific smoking index, which measures the mean deviation of DNAme at smoking-associated CpG sites from a normal reference.
Results
In a discovery set of 400 women, we identified 1501 smoking-associated CpG sites at a genome-wide significance level of P < 10⁻⁷, which were validated in an independent set of 390 women. This represented a 40-fold increase of differentially methylated sites in buccal cells compared with matched blood samples. Hypermethylated sites were enriched for bivalently marked genes and binding sites of transcription factors implicated in DNA repair and chromatin architecture (P < 10⁻¹⁰). A smoking index constructed from the DNAme changes in buccal cells was able to discriminate normal tissue from cancer tissue with a mean receiver operating characteristic area under the curve of 0.99 (range, 0.99-1.00) for lung cancers and of 0.91 (range, 0.71-1.00) for 13 other organs. The corresponding area under the curve of a smoking signature derived from blood cells was lower than that derived from buccal cells in 14 of 15 cancer types (Wilcoxon signed rank test, P = .001).
Conclusions and Relevance
These data point toward buccal cells as being a more appropriate source of tissue than blood to conduct EWASs for smoking-related epithelial cancers.
Exposure to tobacco smoke is one of the best-known and most potent risk factors for many diseases, including epithelial cancers and notably lung cancer.1,2 Mortality rates are set to increase, with more than 1 billion expected tobacco-related deaths during this century.3,4 There is, therefore, an urgent need to advance our understanding of tobacco-related cancer etiology.5,6
Recent work has demonstrated a role for epigenetic changes, especially changes in DNA methylation (DNAme), during the earliest stages of carcinogenesis.7-14 It is therefore important to study the potential impact of tobacco smoke exposure on the epigenome. Although several epigenome-wide association studies (EWASs) conducted in whole blood have identified smoking-related DNAme changes centered around specific genes, eg, AHRR,15-19 these associations are relatively few in number and the role of these alterations in cancer etiology is unclear.
We hypothesized that a more natural and relevant tissue in which to perform an EWAS for smoking would be buccal cells because this constitutes an easily accessible source of epithelial cells with direct exposure to tobacco smoke. In particular, we posited that buccal cells would provide us with a more relevant tissue than blood to conduct an EWAS aimed at understanding epigenetic misprogramming in epithelial cancer. Specifically, we set out to explore the hypothesis that DNAme marks measured in buccal cells, and which correlate with a measure of the cumulative exposure to smoke, may exhibit similar changes in epithelial cancers, especially in those cancers strongly linked to smoking.
To test these hypotheses, we conducted an EWAS in 790 buccal samples collected from women within the Medical Research Council National Survey of Health and Development (NSHD),20,21 a longitudinal birth cohort with extensive epidemiological data, including detailed information on smoking history and smoking status at sample collection.22 To assess the relevance of smoking-associated DNAme changes in cancer, we further analyzed genome-wide DNAme data from more than 5000 samples encompassing normal, preneoplastic, and cancer tissue from 15 different epithelial cancer types.
The purpose of the research was to assess the suitability of buccal cells as an epithelial source of tissue to examine the effects of smoking on the epigenome, and to test whether these effects are also seen in smoke-related epithelial cancers.
Smoking is associated with widespread changes in the DNA methylation landscape of buccal cells.
Some smoking-associated DNA methylation changes are common to buccal and blood tissue, but buccal cells exhibit significantly more changes than blood cells.
Smoking-associated DNA methylation changes in buccal cells correlate with DNA methylation changes in epithelial cancers and do so most strongly in smoke-related epithelial cancer.
The overall analysis strategy is summarized in eFigure 1 in the Supplement.
In 2013, we analyzed buccal samples that had been provided in 1999 by 790 women enrolled in the NSHD, a birth cohort study of men and women all born in Britain in March 1946.22,23 All women gave written informed consent for their samples to be used in genetic studies of health, and the Central Manchester Ethics Committee approved the use of these samples for epigenetic studies of health in 2012. Women were selected from those who provided a buccal and blood sample at age 53 years in 1999, who had not previously developed any cancer, and who had complete information on epidemiological variables of interest and follow-up (Table 1). For 152 of the 790 women, we also analyzed a matched whole blood sample.
Samples from preinvasive lung lesions were taken from a cohort described recently.24,25 A subset of 24 laser-microdissected samples, consisting of lesions that did (n = 19) and did not (n = 5) progress to invasive lung cancer (all assessed by means of bronchoscopy) and that were matched for smoking pack-years (SPY), was used. In addition, 21 normal lung samples (bronchial brushings) from individuals at high risk of developing lung cancer were taken from anatomical sites with no documented history of preinvasive lesions. See eMethods in the Supplement for details regarding the data sets used.
Methylation analysis was performed on DNA from 790 buccal and 152 blood samples using the Illumina Infinium Human Methylation450 BeadChip array.26,27 The methylation status of a specific CpG site was calculated from the intensity of the methylated (M) and unmethylated (U) alleles, as the ratio of fluorescent signals β = max(M,0)/[max(M,0) + max(U,0) + 100]. On this scale, 0 < β < 1, with β values close to 1 (0) indicating 100% methylation (no methylation). Data were processed and normalized, correcting for type 2 probe bias,28,29 and using a quality control pipeline that assesses the nature of the largest components of variation30 (see eMethods in the Supplement). DNA from preinvasive lung lesions and normal adjacent tissue was extracted from fresh frozen laser capture microdissected sections (or bronchial brushings from controls), and genome-wide DNAme profiles were obtained using the Methylation450 BeadChip.
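As a minimal sketch of the β-value definition given above, the following Python function computes β from the methylated and unmethylated intensities. The function name, the parameter names, and the example intensities are hypothetical and chosen only for illustration; only the formula and the offset of 100 come from the text.

```python
def beta_value(methylated, unmethylated, offset=100.0):
    """Return the methylation beta value for one CpG site.

    Implements beta = max(M, 0) / (max(M, 0) + max(U, 0) + offset),
    with offset = 100 as stated in the text.
    """
    m = max(methylated, 0.0)    # negative intensities are clipped to 0
    u = max(unmethylated, 0.0)
    return m / (m + u + offset)


# Hypothetical intensities, chosen only to illustrate the scale.
mostly_methylated = beta_value(9000.0, 900.0)
mostly_unmethylated = beta_value(300.0, 9700.0)
clipped = beta_value(-50.0, 0.0)
print(mostly_methylated, mostly_unmethylated, clipped)
```

The offset in the denominator keeps the ratio defined when both intensities are zero or negative and keeps β strictly below 1.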
A discovery set was defined by randomly selecting 400 of the 790 buccal samples, and linear regression analyses adjusted for bisulfite conversion efficiency were used to identify CpGs correlating significantly with SPY. The Bonferroni threshold was subsequently used to define a 1501-CpG smoking-associated DNAme signature. Gene set enrichment analysis (GSEA) of this signature was done using the Molecular Signatures Database,31 using 1-tailed Fisher exact tests, as done previously.30 The 1501 smoking-associated CpGs were then used to construct a sample-specific smoking index, which measures the deviation of DNAme in a given sample from a normal reference, with the mean taken over the 1501 CpGs. In more detail, given a set of normal reference DNAme profiles, we computed, for each of the 1501 CpGs, the mean β-value, μc, and standard deviation, σc, across the reference samples. For any given sample, s, we then defined the smoking index score, SI(s), as
$$\mathrm{SI}(s) = \frac{1}{n}\sum_{c=1}^{n} w_c\,\frac{\beta_{cs}-\mu_c}{\sigma_c}$$
where wc is +1 (−1) if the smoking-associated CpG, c, is hypermethylated (hypomethylated) in smokers and where βcs is the β-methylation value of the CpG c in sample s. In the formula, n is the number of the 1501 CpGs that have β-values in the given samples, and the summation is over all n of these smoking-associated CpGs. In the case of the buccal set cohort, the normal reference samples were those of the nonsmokers. When computing the smoking index in the cancer samples from a given cancer type, the normal reference was chosen as the normal samples (from nonsmokers if this information was available) from the corresponding tissue type. This is key, because this avoids confounding of the smoking index by methylation differences between tissue types (see eMethods in the Supplement for full details).
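The smoking index can be sketched in a few lines of Python. This sketch reads the definition above as the mean, over the signature CpGs measured in a sample, of the signed standardized deviation wc(βcs − μc)/σc. The function names, the CpG identifiers, and the toy β-values are hypothetical, and the use of the sample (n − 1) standard deviation is an assumption, because the text does not say which estimator was used.

```python
from statistics import mean, stdev


def reference_stats(reference_profiles):
    """Per-CpG mean and standard deviation across normal reference samples.

    reference_profiles: list of dicts mapping CpG id -> beta value.
    Assumption: sample (n - 1) standard deviation.
    """
    stats = {}
    for cpg in reference_profiles[0]:
        values = [p[cpg] for p in reference_profiles if cpg in p]
        stats[cpg] = (mean(values), stdev(values))
    return stats


def smoking_index(sample, stats, weights):
    """Mean signed z-score of one sample over the signature CpGs.

    weights: CpG id -> +1 (hypermethylated in smokers) or -1 (hypomethylated).
    CpGs missing from the sample are skipped, so n counts measured CpGs only.
    """
    terms = [
        weights[cpg] * (sample[cpg] - stats[cpg][0]) / stats[cpg][1]
        for cpg in weights
        if cpg in sample and cpg in stats
    ]
    return sum(terms) / len(terms)


# Toy data: two hypothetical CpGs and three hypothetical nonsmoker references.
reference = [
    {"cg_hyper": 0.10, "cg_hypo": 0.80},
    {"cg_hyper": 0.12, "cg_hypo": 0.78},
    {"cg_hyper": 0.14, "cg_hypo": 0.82},
]
weights = {"cg_hyper": +1, "cg_hypo": -1}
stats = reference_stats(reference)

smoker_like = {"cg_hyper": 0.20, "cg_hypo": 0.70}
reference_like = {"cg_hyper": 0.12, "cg_hypo": 0.80}
si_smoker = smoking_index(smoker_like, stats, weights)
si_reference = smoking_index(reference_like, stats, weights)
print(si_smoker, si_reference)
```

Because each CpG is standardized against a reference from the same tissue, the score is expressed in reference standard deviations; a sample that resembles the reference scores near zero, and one that shifts in the smoking-associated direction scores higher.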
It is important to justify the use of SPY as the outcome of interest, and not the smoking status at sample collection. The latter would have provided a biased measure of the cumulative risk exposure of an individual, especially for ex-smokers. In fact, SPY anticorrelated significantly with the time between quitting and sample collection in ex-smokers (eFigure 2 in the Supplement). Moreover, we observed that specific genes such as AHRR exhibited a reversal of DNAme changes in ex-smokers who quit smoking at least 10 years before sample collection (eFigure 3 in the Supplement). Regression analysis using smoking status as a graded response variable (never-smokers, ex-smokers, smokers at sample draw) would have resulted in only 406 CpGs with P < 10⁻⁷ compared with 1501 when using SPY.
To assess whether smoking history affects the DNA methylome of buccal cells, we performed genome-wide DNAme analysis26 in a discovery set of 400 buccal samples from women all aged 53 years (Table 1), thus eliminating chronological age and sex as potential confounding sources of data variation.32 Singular value decomposition of the DNAme data matrix, encompassing 479 491 probes, revealed that the largest component of variation, accounting for approximately 55% of data variation, correlated with SPY, an epidemiological indicator of an individual’s smoking history (eFigure 4 in the Supplement). Other epidemiological variables, such as parity or body mass index, were not significantly correlated with smoking history (Table 1). None of these other variables showed a stronger association with the top principal component than SPY, supporting the view that most of the data variation is linked to smoking (eFigure 4 in the Supplement).
Using linear regression models, adjusted for bisulfite conversion efficiency, we identified 1501 CpGs whose DNAme β-values correlated with SPY, all passing a Bonferroni threshold of P < .05/479 491 (approximately 10⁻⁷) (Figure 1A and eTables 1 and 2 and eFigure 5 in the Supplement). Top-ranked CpGs were mostly hypomethylated in smokers (eTable 2 in the Supplement). Changes in DNAme were modest, with the top-ranked CpG (mapping to the gene AHRR) showing a 6% decrease for every 10 SPY (Figure 1B and Table 2). More than 73% of the 1501 smoking-associated CpGs were validated at the same Bonferroni significance level of 10⁻⁷ in an independent replication set of 390 buccal samples (Figure 1C and eTable 2 in the Supplement), with all sites exhibiting the same directional changes as in the discovery set (Figure 1D).
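The Bonferroni threshold quoted above is simple arithmetic, sketched here in Python together with the filtering step it implies. The function names and the example P values are hypothetical; only the significance level of .05 and the probe count of 479 491 come from the text.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


def significant(p_values, alpha, n_tests):
    """Keep the CpGs whose P value falls below the Bonferroni threshold."""
    threshold = bonferroni_threshold(alpha, n_tests)
    return {cpg: p for cpg, p in p_values.items() if p < threshold}


N_PROBES = 479_491
threshold = bonferroni_threshold(0.05, N_PROBES)
print(f"{threshold:.3e}")  # 1.043e-07, i.e. approximately 10^-7

# Hypothetical per-CpG P values from the regression against SPY.
example = {"cg_a": 3e-12, "cg_b": 2e-7, "cg_c": 0.03}
print(sorted(significant(example, 0.05, N_PROBES)))  # ['cg_a']
```

A CpG with P = 2 × 10⁻⁷ would look significant by conventional standards but does not survive the correction for 479 491 tests.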
The top-ranked CpGs were all hypomethylated with increasing SPY and many mapped to genes previously identified in smoking EWASs conducted in whole-blood tissue (eg, AHRR, CYP1A1, F2RL3, PTK2, GNG12, GFI1)15,18,33-35 (Table 2), suggesting that much of the DNA hypomethylation is common to both tissue types. To investigate this further, we conducted a detailed comparison on a matched subset of 152 women for whom both a blood and buccal sample had been collected at age 53 years. Using only the matched samples from these 152 women, we derived smoking-associated DNAme signatures from the buccal and blood cells. Focusing on the top-ranked CpGs, this revealed consistency and broad agreement between the 2 tissue types, driven mainly by the commonly hypomethylated sites (Table 2 and eFigure 6 in the Supplement).
However, the analysis also revealed significantly more associations in buccal cells (eFigure 6A in the Supplement). In the case of whole blood, only 38 CpGs passed a false discovery rate threshold of <.05, in contrast to more than 1500 (ie, a 40-fold increase) passing this same threshold in the 152 matched buccal samples. Thus, although the top-ranked CpGs, which were generally hypomethylated, agreed between the 2 tissue types, smoking was associated with a greater proportion of altered CpG sites in buccal cells.
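The text does not state which false discovery rate procedure was used. As an illustration only, the sketch below assumes the Benjamini-Hochberg procedure, a common choice, and shows how adjusted P values would be compared with the .05 threshold. The function name and the example P values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order.

    Assumption: BH is used here for illustration; the source does not
    name the FDR procedure.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical P values for five CpGs.
p_example = [0.001, 0.008, 0.039, 0.041, 0.9]
q_example = benjamini_hochberg(p_example)
n_passing = sum(q < 0.05 for q in q_example)
print(q_example, n_passing)
```

Applying the same threshold to both tissues is what makes the counts comparable: roughly 1500 passing CpGs in buccal cells against 38 in blood gives a ratio of about 40.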
To assess biological significance, we performed GSEA separately on the 912 hypermethylated and 589 hypomethylated sites of the 1501 differentially methylated CpGs (Figure 2). For the hypermethylated sites, the strongest enrichment was attained for genes bivalently marked in human embryonic stem cells, for binding sites of transcription factors implicated in chromatin organization and specification of stem cell identity (RAD21, CTCF, and EZH2),36-40 and finally also for genes hypermethylated in lung cancer,41 a cancer strongly linked to tobacco smoke exposure (Figure 2 and eTables 3-5 in the Supplement). The results of GSEA on the hypomethylated sites did not reveal a strong enrichment of bivalently marked genes but instead showed an enrichment of genes overexpressed in a poorly differentiated human papillomavirus–negative subtype of head and neck cancer,42 a cancer for which smoking is a main risk factor (Figure 2 and eTable 4 in the Supplement). Thus, the fact that the top-ranked enriched biological terms point toward smoking-related cancers strongly supports the biological relevance of our smoking DNAme–based signature.
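The enrichment test described above can be sketched as a 1-tailed Fisher exact test on a 2 × 2 table, where the P value is the upper tail of the hypergeometric distribution. This is a minimal illustration using only the Python standard library; the function name, argument names, and the small example counts are hypothetical and are not taken from the study.

```python
from math import comb


def fisher_enrichment(k, n_signature, n_category, n_total):
    """One-tailed Fisher exact test for over-representation.

    k           : signature CpGs that fall in the category
    n_signature : CpGs in the signature (eg, the hypermethylated set)
    n_category  : CpGs in the category (eg, a gene set) among all tested
    n_total     : all CpGs tested
    Returns (odds_ratio, p_value) with p_value = P(X >= k).
    """
    a = k
    b = n_signature - k
    c = n_category - k
    d = n_total - n_signature - n_category + k
    odds_ratio = (a * d) / (b * c) if b * c > 0 else float("inf")

    upper = min(n_signature, n_category)
    tail = sum(
        comb(n_category, x) * comb(n_total - n_category, n_signature - x)
        for x in range(k, upper + 1)
    )
    return odds_ratio, tail / comb(n_total, n_signature)


# Small hypothetical table: 8 CpGs, 4 in the signature, 4 in the category,
# 3 in both.
odds, p = fisher_enrichment(3, 4, 4, 8)
print(odds, p)
```

Running the test separately for the hypermethylated and hypomethylated sets, as the text describes, amounts to calling the function twice with different signature counts against the same background.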
Given the GSEA results, we reasoned that smoking-associated DNAme changes in buccal cells might be seen in epithelial cancers for which smoking is a potent risk factor. To investigate this and to further assess whether the changes are specific to smoke-related cancers, we collected DNAme data from 15 epithelial cancer types, profiled as part of the Cancer Genome Atlas, encompassing more than 5000 samples, some strongly linked to smoking (lung squamous cell carcinoma [LSCC] and lung adenocarcinoma [LUAD]), others for which smoking is a moderate risk factor (esophageal, head and neck, bladder), and others unrelated to smoking (endometrial and breast cancer).43,44
To be able to quantify the similarity of smoking-associated DNAme changes in buccal cells to those in cancer, we constructed a DNAme-based “smoking index,” computable for any given independent sample, from the 1501 smoking-associated CpGs of the discovery buccal set. To validate the smoking index, we verified that it correlated significantly (P < 10⁻¹⁰) with SPY in the independent replication buccal set (eFigure 7 in the Supplement).
Next, we computed the smoking index in the normal vs cancer data sets. This revealed higher values in each cancer than in its corresponding normal tissue, independently of tissue type (Figure 3A and eFigure 8 in the Supplement). Notably, the smoking index was highest for LSCC, the lung cancer with the highest proportion of smokers,45 followed by LUAD (Figure 3A). Importantly, the smoking index values were similar for 2 independent LUAD data sets, demonstrating the reproducibility of these scores (Figure 3A). All other cancer types (13 in total) exhibited significantly lower index values than the lung cancers (Wilcoxon P < .01), in line with smoking being a weaker risk factor for these other cancers (Figure 3A and eFigure 8 in the Supplement). Detailed receiver operating characteristic analyses showed that the smoking index was highly discriminative of normal vs cancer status in every tissue type (Figure 3B and eFigure 8 in the Supplement). Thus, these results indicate that smoking-associated DNAme changes in normal cells are present not only in smoking-associated cancers but also in those cancers for which smoking is not a risk factor, suggesting that other cancer risk factors may be causing similar epigenetic aberrations in normal cells. Similar results were obtained for the smoking signature derived from the 152 buccal samples (Figure 3B and eFigure 8 in the Supplement). In contrast, the signature derived from the matched 152 blood samples and a signature constructed by randomly selecting 1501 CpGs were significantly less accurate (Wilcoxon P = .001 [blood] and P < .001 [random]) (Figure 3B).
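The area under the ROC curve used in these comparisons has a simple rank interpretation: it is the probability that a randomly chosen cancer sample has a higher smoking index than a randomly chosen normal sample, with ties counted as one half. The sketch below computes it directly from that definition. The function name and the example scores are hypothetical, and the pairwise loop is chosen for clarity rather than speed.

```python
def roc_auc(normal_scores, cancer_scores):
    """AUC as P(cancer score > normal score), with ties counted as 1/2."""
    wins = 0.0
    for c in cancer_scores:
        for n in normal_scores:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(cancer_scores) * len(normal_scores))


# Hypothetical smoking index values for one tissue type.
normal = [-0.2, 0.1, 0.3]
cancer = [0.2, 0.9, 1.5]
auc = roc_auc(normal, cancer)
print(auc)
```

An AUC of 1 means every cancer sample scores above every normal sample, and 0.5 means the index carries no discriminative information, which is the behavior expected on average from a randomly selected CpG signature.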
The GSEA results also suggested to us that the power of the smoking signature to discriminate cancer from normal tissue could be due to the hypermethylated component. Thus, to dissect the relative contribution of hypermethylation and hypomethylation to the smoking index, we recomputed the smoking index in the normal vs cancer sets using 4 different subsets of the 1501 differentially methylated CpGs: (1) all 912 hypermethylated CpGs, (2) all 589 hypomethylated CpGs, (3) the top 50 hypermethylated CpGs, and (4) the top 50 hypomethylated sites. Including a gene set based on the top 50 CpGs allowed us to assess the significance of the CpGs exhibiting the strongest effect sizes. By restricting to the hypermethylated subsets, we observed that the smoking index nearly always increased in cancer, providing high discriminatory power (eFigures 9 and 10 in the Supplement). In contrast, by restricting to the hypomethylated subsets, we found that the smoking index was less frequently discriminative of normal vs cancer status (eFigures 9 and 10 in the Supplement). Focusing on the top 50 hypomethylated CpGs (which included genes such as AHRR, CYP1A1), we found that the changes seen in cancer were often random, with some cancer tissues exhibiting directional changes exactly opposite to those seen in smoking (eFigures 9 and 10 in the Supplement).
The observed correlation between smoking-associated DNAme changes in buccal cells with those seen in epithelial cancers prompted us to explore whether these specific DNAme changes are a consequence of cancer or whether they represent an early event in carcinogenesis. Indeed, the observed overlap between smoking-associated and cancer-associated DNAme changes could be the result of the widespread DNAme changes caused by uncontrolled proliferation of cancer cells. To address this, we asked whether the smoking index is increased in preneoplastic lesions. To this end, we generated Illumina 450k DNAme data for a series of 8 endometrial hyperplasias and 33 endometrial cancers46 and analyzed these data jointly with the 46 normal endometrial tissue samples and 403 endometrial cancers from the Cancer Genome Atlas.47 We observed a significant increase in the smoking index between normal tissues and hyperplasia, with a further increase between hyperplasia and cancer (eFigure 11 in the Supplement). We also computed the smoking index in a series of 152 cytologically normal cervical smear samples at different risk of neoplastic transformation,8 revealing an increased smoking index in cells at higher risk of neoplastic transformation (eFigure 12 in the Supplement). Thus, the DNAme changes seen in normal buccal cells may indeed represent early events in carcinogenesis.
Given the particularly strong association of the smoking index with lung cancers, we next asked whether the 1501 CpGs from our smoking signature are informative of smoking-associated gene expression changes in the normal tissue that gives rise to lung cancer. Specifically, we analyzed gene expression data of 344 nontumor lung tissue samples from smokers and nonsmokers, all of whom developed lung cancer (Bossé et al).48 For this analysis, we focused on the subset of the 1501 smoking-associated CpGs, which mapped to within 200 bp of the transcription start sites of genes profiled in Bossé et al.48 We observed that CpGs hypermethylated in buccal cells of smokers exhibited lower levels of expression in the normal lung tissue of patients who were smokers compared with those of patients who did not smoke, whereas hypomethylated CpGs showed significantly higher expression (eFigure 13 and eTable 6 in the Supplement). Importantly, these results were validated in 2 independent but equivalent cohorts totaling 509 (285 + 224) samples (eFigure 13 and eTable 6 in the Supplement). This supports the view that part of the smoking-associated DNAme changes identified in buccal cells, if also present in nontumor lung tissue of smokers, may represent functional changes in a tissue relevant to the development of lung cancer.
Given the fact that the smoking index is able to discriminate between normal tissue and cancer irrespective of the tissue analyzed, we wanted to test whether this signature is also able to predict the fate of lesions that originate from the same organ. Despite only analyzing 24 samples, we found that the smoking index was able to identify preinvasive lung lesions that subsequently progress to an invasive lung cancer with a high sensitivity and specificity (area under the curve, 0.88 [95% CI, 0.76-1.00]; P = .001) (Figure 3C and D).
The EWAS presented here has demonstrated the suitability of using buccal cells to examine the effects of smoking on the epigenome. Specifically, our key novel findings are as follows: (1) Smoking is associated with widespread changes in the DNAme landscape of buccal cells, in contrast to blood cells, supporting the view that buccal tissue is a more appropriate source to examine the effects of smoking. (2) Nevertheless, the top-ranked CpG sites, which were overwhelmingly hypomethylated in smokers and which mapped to genes such as AHRR, CYP1A1, and CYP1B1 (all involved in toxin response pathways), were common to buccal and blood tissue. (3) Smoking-associated DNAme changes in buccal cells, in particular, hypermethylation of bivalent marks and binding sites of transcription factors implicated in DNA repair (RAD21) and chromatin architecture (CTCF), correlated with DNAme changes in epithelial cancers and did so most strongly in smoke-related cancer, notably lung cancer (eFigures 14 and 15 in the Supplement). This suggests that smoking is associated with disruption of RAD21 binding and hence DNA repair deficiency, which in turn has been associated with increased lung cancer risk.49 (4) The smoking DNAme signature correlated with an increased risk that a preinvasive lung lesion will progress to lung cancer. This not only opens a new window of opportunities in personalized medicine but also provides additional evidence that alteration of the epigenome is an important early step in cancer development.
One of the most intriguing observations of our analysis is the increased smoking index observed in all epithelial cancers relative to their normal tissue, although in line with our expectations, this index was highest for lung cancers. We note, however, that the correlation with cancer was driven by hypermethylation of sites that are strongly enriched for bivalently marked genes in human embryonic stem cells, which in turn is a common feature of DNAme signatures associated with other cancer risk factors (including aging) and with cancer itself.10,12,14,32,50-52 We stress that hypomethylated CpG sites could not consistently discriminate normal from cancer tissue, unless we specifically focused on genes previously implicated in cancer, for instance those observed to be overexpressed in human papillomavirus–negative head and neck cancer (a cancer for which smoking is a risk factor). It would thus appear that most of the DNA hypomethylation seen in the buccal and blood cell epigenome of smokers is unrelated to cancer etiology. This is not inconsistent with the observation that global DNA hypomethylation is seen in cancer and preneoplastic lesions,10 because this widespread hypomethylation has so far only been seen in cells that have already undergone morphological transformation. In contrast, our buccal DNAme signature was derived from entirely normal cells exposed to different levels of a carcinogen, so the DNA hypomethylation that we observe in these normal cells could reflect a different underlying mechanism such as a specific response to smoke toxins.
We end by discussing 2 potential limitations of this study. First, the signature was derived from women only. Thus, whether the smoking DNAme signature would change substantially had we used a male population is unclear. Previous EWASs in blood suggest, however, that most smoking-related changes are independent of sex.15,18 A second limitation is that we did not analyze any functional or expression data in the same buccal samples. However, we did analyze gene expression data from nontumor lung tissue, comparing expression levels of genes implicated by our DNAme signature between smokers and nonsmokers, providing evidence that specific DNAme changes in a relevant cell of origin may indeed be functional.
This study has demonstrated that smoking is associated with a widespread alteration of the DNAme landscape in buccal cells but not so in blood tissue. The DNAme alterations seen in buccal tissue may be important for the etiology of specific epithelial cancers and the fate of preinvasive lesions.
Accepted for Publication: March 23, 2015.
Corresponding Authors: Martin Widschwendter, MD, Department of Women’s Cancer, EGA Institute for Women’s Health, University College London, 74 Huntley St, Rm 340, London WC1E 6AU, England (firstname.lastname@example.org) and Andrew E. Teschendorff, PhD, Statistical Cancer Genomics, UCL Cancer Institute, University College London, 72 Huntley Street, London WC1E 6BT, England (email@example.com).
Published Online: May 14, 2015. doi:10.1001/jamaoncol.2015.1053.
Author Contributions: Drs Teschendorff and Wong had full access to all of the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis. Drs Teschendorff, Yang, and Wong served as co–first authors, each with equal contribution to the manuscript.
Study concept and design: Teschendorff, Widschwendter.
Acquisition, analysis, or interpretation of data: All authors.
Drafting of the manuscript: Teschendorff, Widschwendter.
Critical revision of the manuscript for important intellectual content: All authors.
Statistical analysis: Teschendorff, Yang, Pipinikas, Jiao, Anjum.
Obtained funding: Teschendorff, Kuh, Janes, Widschwendter.
Administrative, technical, or material support: Wong, Pipinikas, Jones, Salvesen, Janes, Widschwendter.
Study supervision: Teschendorff, Thirlwell, Janes, Widschwendter.
Conflict of Interest Disclosures: None reported.
Funding/Support: This work was funded by the Eve Appeal (https://www.eveappeal.org.uk/) and the European Union's Seventh Framework Programme (FP7/2007-2013) under grant agreement 305428 (Project EpiFemCare) and was done at University College London Hospital/University College London, which received a proportion of its funding from the Department of Health National Institute for Health Research Biomedical Research Centres funding scheme. Drs Teschendorff and Yang and Mr Jiao are supported by the Chinese Academy of Sciences, the Shanghai Institute for Biological Sciences, and the Max-Planck-Gesellschaft. Drs Wong, Hardy, and Kuh are supported by the Medical Research Council (MC_UU_12019/1). Dr Janes is a Wellcome Trust Senior Fellow in Clinical Science (WT091730MA) and Drs Pipinikas and Janes were supported by the Rosetrees Trust, Roy Castle Lung Cancer Foundation, and University College London Hospital Charitable Foundation. Dr Thirlwell is a Cancer Research UK funded Clinician Scientist. Dr Widschwendter was supported by the Eve Appeal and the European Union's Seventh Framework Programme (FP7/2007-2013) under grant agreement 305428 (Project EpiFemCare).
Role of the Funder/Sponsor: The funding bodies had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.
Correction: This article was corrected online May 21, 2015, for an error in the abstract Results.