The model can be found at http://www.adjuvantonline.com. The numbers at risk refer to the number of patients at risk entering each 25-month interval. Low (≤25), intermediate (26-50), and high (>50) risk categories are based on prespecified risk scores generated using a computerized clinicopathological prognostic model based on age, comorbidities, estrogen receptor status, tumor grade, tumor size, and lymph node status.
D1 indicates initial discovery data set. Columns on heatmap represent patient samples. Rows are representative pathways that were evaluated for deregulation: WH, wound healing; TNF, tumor necrosis factor; IGS, invasiveness gene signature; EPI, epigenetic stem cell; CIN, chromosomal instability. Cluster assignments are based on the dominant expression pattern, where ≥50% of the samples express the phenotype (pathways activated) that drives the cluster formation. Pairwise cluster comparisons using Kaplan-Meier survival plots yield the associated statistical significance (log-rank P ≤ .05) of the clusters with unique biology associated with them in terms of pathway activation. Statistically nonsignificant comparisons of prognostic clusters (log-rank P > .05) are not reported. Clusters 1, 4, and 5 have prognostic significance. Clusters 1 and 5 represent patients with intermediate and good prognosis, respectively, and cluster 4 represents patients with the worst prognosis.
D1 indicates initial discovery data set. Columns on heatmap represent patient samples. Rows are representative pathways that were evaluated for deregulation: WH, wound healing; TNF, tumor necrosis factor; IGS, invasiveness gene signature; EPI, epigenetic stem cell; CIN, chromosomal instability. Cluster assignments are based on the dominant expression pattern, where ≥50% of the samples express the phenotype (pathways activated) that drives the cluster formation. Pairwise cluster comparisons using Kaplan-Meier survival plots yield the associated statistical significance (log-rank P ≤ .05) of the clusters with unique biology associated with them in terms of pathway activation. Statistically nonsignificant comparisons of prognostic clusters (log-rank P > .05) are not reported. Clusters 2 and 3 have prognostic significance. Cluster 2 represents patients with good prognosis and cluster 3 represents patients with poor prognosis.
D1 indicates initial discovery data set. Columns on heatmap represent patient samples. Rows are representative pathways that were evaluated for deregulation: WH, wound healing; TNF, tumor necrosis factor; IGS, invasiveness gene signature; EPI, epigenetic stem cell; CIN, chromosomal instability. Cluster assignments are based on the dominant expression pattern, where ≥50% of the samples express the phenotype (pathways activated) that drives the cluster formation. Pairwise cluster comparisons using Kaplan-Meier survival plots yield the associated statistical significance (log-rank P ≤ .05) of the clusters with unique biology associated with them in terms of pathway activation. Statistically nonsignificant comparisons of prognostic clusters (log-rank P > .05) are not reported. Clusters 1, 4, and 5 have prognostic significance. Clusters 1 and 5 represent patients with good to intermediate prognosis, and cluster 4 represents patients with poor prognosis.
D2 indicates validation data set. Columns on heatmap represent patient samples. Rows are representative pathways that were evaluated for deregulation: WH, wound healing; TNF, tumor necrosis factor; IGS, invasiveness gene signature; EPI, epigenetic stem cell; CIN, chromosomal instability. Low-risk cohort: patterns of oncogenic pathway activation and tumor biology/microenvironment deregulation are shown as clusters (clusters 1 and 2) representing prognostic subphenotypes of the low-risk cohort illustrating that the patterns of pathway activation identified in the initial discovery data set (D1) are reproducible in D2. In this case, Kaplan-Meier survival analysis illustrates prognostic clusters (clusters 1 and 2), with No. of patients at risk reported at 25-month intervals of follow-up. Intermediate-risk cohort: Kaplan-Meier survival analysis illustrates prognostic clusters (clusters 1 and 5), along with their respective patterns of oncogenic pathway and tumor biology/microenvironment deregulation shown as a heatmap. High-risk cohort: Kaplan-Meier survival analysis demonstrates the prognostic significance of cluster 1 and cluster 3 along with their patterns of oncogenic pathway and tumor biology/microenvironment deregulation as a heatmap. Patterns observed in D2 are identical to the patterns of pathway activation observed in the prognostic clusters (within the high-risk cohort) identified in D1. Red color in the heatmaps indicates a high probability of deregulation and blue indicates a low probability of deregulation. The cluster numbers in D2 are not the same expression patterns as clusters defined in D1.
Scatterplots depict classification of prognostic clusters based on the first 3 principal components (based on principal component analysis, the results of which contribute to most of the variation in a data set) to demonstrate that the poor prognostic clusters in D1 (red dots) and D2 (blue dots) are similar, while also being clearly distinct from the good prognostic cluster in D1 (light blue dots). Each dot represents a specific sample in that cluster with respect to the first 3 principal components. Each axis of the 3-dimensional scatterplot represents the amount of variance as represented by that principal component. The first 3 principal components present the maximum amount of variation in a data set.
In each instance, chemosensitivity predictions were plotted such that a high probability of sensitivity (or response) is indicated by red and a low probability of sensitivity (or resistance) is indicated by blue. Cluster designations indicate previously determined clusters of pathway patterns (see Figures 2, 3, and 4). Relative sensitivity patterns to cytotoxic agents used in breast cancer were identified from Kruskal-Wallis 1-way analysis of variance followed by Dunn's posttest performed on the statistically significant prognostic clusters; only results from Dunn's posttest at P < .05 are shown here. The chemosensitivity patterns as observed in the heatmap are used to determine whether the statistically significant (P < .05) prognostic clusters are resistant or sensitive to that particular cytotoxic agent.
Acharya CR, Hsu DS, Anders CK, Anguiano A, Salter KH, Walters KS, Redman RC, Tuchman SA, Moylan CA, Mukherjee S, Barry WT, Dressman HK, Ginsburg GS, Marcom KP, Garman KS, Lyman GH, Nevins JR, Potti A. Gene Expression Signatures, Clinicopathological Features, and Individualized Therapy in Breast Cancer. JAMA. 2008;299(13):1574-1587. doi:10.1001/jama.299.13.1574
Author Affiliations: Duke Institute for Genome Sciences and Policy (Drs Hsu, Anguiano, Redman, Tuchman, Moylan, Mukherjee, Barry, Dressman, Ginsburg, Garman, Nevins, and Potti, Mr Acharya, and Mss Salter and Walters) and Institute for Statistics and Decision Sciences (Drs Mukherjee and Barry), Duke University; and Department of Medicine, Duke University Medical Center (Drs Hsu, Anders, Anguiano, Redman, Tuchman, Moylan, Marcom, Garman, Lyman, and Potti), Durham, North Carolina.
Context Gene expression profiling may be useful for prognostic and therapeutic strategies in breast carcinoma.
Objectives To demonstrate the value in integrating genomic information with clinical and pathological risk factors, to refine prognosis, and to improve therapeutic strategies for early stage breast cancer.
Design, Setting, and Patients Retrospective study of patients with early stage breast carcinoma who were candidates for adjuvant chemotherapy; 964 clinically annotated breast tumor samples (573 in the initial discovery set and 391 in the validation cohort) with corresponding microarray data were used. All patients were assigned relapse risk scores based on their respective clinicopathological features. Signatures representing oncogenic pathway activation and tumor biology/microenvironment status were applied to these samples to obtain patterns of deregulation that correspond with relapse risk scores, with the aim of refining the prognosis obtained with the clinicopathological prognostic model alone. Predictors of chemotherapeutic response were also applied to further characterize clinically relevant heterogeneity in early stage breast cancer.
Main Outcome Measures Gene expression signatures and clinicopathological variables in early stage breast cancer to determine a refined estimation of relapse-free survival and sensitivity to chemotherapy.
Results In the initial data set of 573 patients, prognostically significant clusters representing patterns of oncogenic pathway activation and tumor biology/microenvironment states were identified within the low-risk (log-rank P = .004), intermediate-risk (log-rank P = .01), and high-risk (log-rank P = .003) model cohorts, representing clinically important genomic subphenotypes of breast cancer. As an example, in the low-risk cohort, of 6 prognostically significant clusters, patients in cluster 4 had an inferior relapse-free survival vs patients in cluster 1 (log-rank P = .004) and cluster 5 (log-rank P = .03). Median relapse-free survival for patients in cluster 4 was 16 months less than for patients in cluster 1 (95% CI, 7.5-24.5 months) and 19 months less than for patients in cluster 5 (95% CI, 10.5-27.5 months). Multivariate analyses confirmed the independent prognostic value of the genomic clusters (low risk, P = .05; high risk, P = .02). The reproducibility and validity of these patterns of pathway deregulation in predicting relapse risk was established using related but not identical clusters in the independent validation cohort. The prognostic clinicogenomic clusters also have unique sensitivity patterns to commonly used cytotoxic therapies.
Conclusions These results provide preliminary evidence that incorporation of gene expression signatures into clinical risk stratification can refine prognosis. Prospective studies are needed to determine the value of this approach for individualizing therapeutic strategies.
Cancer prognosis, including for breast cancer, is largely driven by the assessment of key clinical characteristics, including tumor size, nodal involvement, and the extent of metastatic spread. These are generally combined to categorize a patient in a clinical stage, which then defines the prognosis. Drawing on information from the Surveillance, Epidemiology, and End Results database, and the results of various individual clinical trials as well as the published literature, Ravdin et al1 developed a novel computerized system called Adjuvant! to provide an evidence-based tool that can facilitate clinical decisions based on relative risk and potential benefit from adjuvant therapy. This approach has become a widely used strategy2 in the assessment of risk of disease relapse and estimating the benefit from adjuvant therapy in patients with early stage breast cancer. In a clinical study involving 4083 patients, Adjuvant! Online (the clinicopathological prognostic model) has been shown to overestimate disease recurrence by as much as 14% in younger patients, although predicted outcomes are within 5% of the actual event-free survival for other populations.3
The advent of genomic technology for the analysis of human tumor samples has now added an additional source of information to aid prognosis and clinical decisions.4,5 In particular, the development of genomic profiles that accurately assess risk of recurrence offers the hope that this information will more precisely define clinical outcomes in breast cancer. The dimension and complexity of such data provide an opportunity to uncover clinically valid trends that can distinguish subtle phenotypes in ways that traditional methods cannot.6-10
Recently, many studies have used newer concepts of cancer biology to predict clinical outcomes in breast cancer.11 For example, a growing awareness of the important role of cancer-associated fibroblasts in tumor progression yielded the possibility that the gene expression pattern of serum-stimulated fibroblasts in culture might be related to the gene expression pattern of fibroblasts in wounded tissues, including tissue wounded by an invasive tumor.12 Specifically, patients who have breast tumors that express such a “wound response or wound healing” signature have a poor clinical outcome.12 Furthermore, aggressive tumors, which are more likely to have functional aneuploidy, resulted in the development of a “chromosomal instability” signature that is associated with poor outcome in many cancers.13 Similarly, the tantalizing aspect of the role of pluripotent cells in cancer initiation and progression led to the description of the invasiveness gene signature (IGS) in breast cancer.8 Finally, risk stratification of various tumors, including breast cancers, based on the gene signatures of deregulated signaling pathways were recently reported.6 The association between an aggressive form of breast cancer and markers of tumor biology and the microenvironment (chromosomal instability, wounded stroma, or invasiveness) and likewise the presence of certain hyperactive pathways suggests that cancer gene signatures may reflect specific biological traits of genetically unstable cancer cell populations that can evolve into increasingly aggressive entities under environmental pressures to which they adapt.14
The clinical value of these studies is to identify disease subtypes that represent distinct subphenotypes of breast carcinoma in order to better approach opportunities for individualized therapeutics. However, despite these advances, few studies have attempted to demonstrate the value in integrating genomic information with the traditional clinical risk factors to provide a more detailed assessment of clinical risk15 and an improved prediction of response to therapy.16
The results we present herein extend the application of gene expression profiling several important steps, by biologically dissecting a commonly used clinical prognostic classification in early stage breast cancer.1 We also present data that provide useful clinical insights into matching cytotoxic and targeted therapeutic strategies in individual risk cohorts of patients with early stage breast cancer.
A total of 964 clinically annotated breast tumor samples from 5 data sets (Gene Expression Omnibus [GEO] microarray data repository at http://www.ncbi.nlm.nih.gov/geo/), GSE3143 (CODEX),6 GSE2034,17 GSE4922,18 GSE6532,19 and GSE7849,20 were identified and chosen for our analysis. Data sets were selected based on the availability of clinically annotated data (Affymetrix Human Genome U133A, U133 Plus 2.0, or U95Av2) from early stage breast tumors.
Data sets GSE2034 (n = 247), GSE4922 (n = 105), GSE3143 (n = 161), and GSE7849 (n = 60) constituted the initial discovery data set (D1, n = 573). For samples included in GSE7849 (n = 60), total RNA was extracted from the tumor tissue with RNeasy kits (Qiagen, Valencia, California). The RNA quality was assessed with the use of a bioanalyzer (Agilent 2100 model; Agilent Technologies Inc, Santa Clara, California). Hybridization targets were prepared from the total RNA according to standard Affymetrix protocols. All patients were enrolled according to protocols approved by the institutional review board of Duke University, after written informed consent was obtained, and all data are reported according to Minimum Information About Microarray Experiment guidelines, a standard method of reporting data from microarray experiments that is intended to specify all the necessary information for easy and unambiguous interpretation of the results.21 Data set GSE6532 (n = 391) constituted the validation data set (D2). The clinical and demographic data for patients included in D1 and D2 are shown in Table 1 and Table 2. All patients in each of the data sets included in the study were followed up for an average of more than 11 years.
Both D1 and D2 were classified into 3 prespecified relapse risk categories (low, intermediate, and high), based on the recurrence scores generated by using the Standard Version (Clinical) 8.0 Adjuvant! Online (http://www.adjuvantonline.com) strategy, which uses age, comorbidities, estrogen receptor status, tumor grade, tumor size, and lymph node status as covariates. Low-risk patients (182 in D1 and 117 in D2) were defined as those patients with a relapse risk score of 25 or less, intermediate-risk patients (248 in D1 and 178 in D2) have a score between 26 and 50, and high-risk patients (143 in D1 and 96 in D2) have a score of more than 50. The risk score cutoffs were prespecified and were chosen to classify the data set into almost equal tertiles. However, to rule out the possibility that our results were dependent on the choice of the clinical cutoff, a sensitivity analysis was performed in which the cutoffs were varied between 15% and 75%, without distinction in results.
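The stratification rule described above can be illustrated with a short sketch. It assumes only the cutoffs stated in the text (a score of 25 or less is low risk, 26 to 50 is intermediate risk, and more than 50 is high risk); the function names and the example scores are hypothetical and are not part of Adjuvant! Online. Exposing the cutoffs as parameters mirrors the idea of the sensitivity analysis, in which the cutoffs were varied.

```python
def risk_category(score, low_cut=25, high_cut=50):
    """Map a relapse risk score to a risk cohort.

    Default cutoffs follow the text: <=25 low, 26-50 intermediate, >50 high.
    """
    if score <= low_cut:
        return "low"
    if score <= high_cut:
        return "intermediate"
    return "high"


def cohort_counts(scores, low_cut=25, high_cut=50):
    """Count how many patients fall into each risk cohort."""
    counts = {"low": 0, "intermediate": 0, "high": 0}
    for score in scores:
        counts[risk_category(score, low_cut, high_cut)] += 1
    return counts


# Hypothetical scores, for illustration only (not study data).
example_scores = [12, 25, 26, 50, 51, 80]
print(cohort_counts(example_scores))
# Re-running with different cutoffs shows how cohort sizes shift.
print(cohort_counts(example_scores, low_cut=15, high_cut=75))
```

Varying `low_cut` and `high_cut` and repeating the downstream survival comparison is one simple way to check that conclusions do not hinge on a particular cutoff.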
Signatures of oncogenic pathway activation, tumor biology/microenvironment status, and chemosensitivity were applied to each of the low-, intermediate-, and high-risk cohorts of patients to identify prognostic subclusters of patients within each risk category.9,16,22 Gene expression signatures represent a biological state in the form of a pattern of gene expression that is unique to that specific phenotype of disease.
An in-house program, Chip Comparer (http://tenero.duhs.duke.edu/genearray/perl/chip/chipcomparer.pl), was used to map probesets across various generations and platforms of Affymetrix GeneChip arrays.
To reduce the likelihood of batch effects, a normalizing algorithm, ComBat23 (http://statistics.byu.edu/johnson/ComBat/), was applied to D1 before performing any analysis. When data sets from different platforms and different experiments are combined, nonbiological experimental variation (batch effects) is commonly encountered. It is inappropriate to combine data sets without adjusting for batch effects. The ComBat method applies a parametric or nonparametric empirical Bayes framework for adjusting data for batch effects that is robust to outliers in a given data set.
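To make the idea of batch adjustment concrete, the following is a deliberately simplified sketch. It is not ComBat: it performs only a per-gene location and scale adjustment (each batch is standardized and then mapped back to the overall mean and pooled within-batch standard deviation) and omits the empirical Bayes shrinkage of batch parameters that ComBat applies. The function name and the example values are hypothetical.

```python
import math
from collections import defaultdict


def adjust_gene_for_batch(values, batches):
    """Location/scale batch adjustment for one gene (simplified stand-in, NOT ComBat).

    values:  expression values of a single gene, one per sample
    batches: batch label for each sample, in the same order
    """
    groups = defaultdict(list)
    for value, batch in zip(values, batches):
        groups[batch].append(value)

    grand_mean = sum(values) / len(values)

    # Per-batch mean and standard deviation, plus the pooled within-batch spread.
    stats = {}
    pooled_ss = 0.0
    for batch, vals in groups.items():
        batch_mean = sum(vals) / len(vals)
        ss = sum((v - batch_mean) ** 2 for v in vals)
        stats[batch] = (batch_mean, math.sqrt(ss / len(vals)))
        pooled_ss += ss
    pooled_sd = math.sqrt(pooled_ss / len(values))

    adjusted = []
    for value, batch in zip(values, batches):
        batch_mean, batch_sd = stats[batch]
        z = (value - batch_mean) / batch_sd if batch_sd > 0 else 0.0
        adjusted.append(grand_mean + z * pooled_sd)
    return adjusted


# Hypothetical example: batch "b" is shifted upward by a constant offset.
values = [1.0, 2.0, 3.0, 11.0, 12.0, 13.0]
batches = ["a", "a", "a", "b", "b", "b"]
print(adjust_gene_for_batch(values, batches))
```

After adjustment both batches share the same mean, so the constant offset between them no longer dominates downstream clustering. A real analysis would also need to guard against removing genuine biological differences that happen to be confounded with batch.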
After appropriate preprocessing to reduce batch variations across data sets, in vitro signatures of altered tumor microenvironment states, along with oncogenic pathway deregulation and chemotherapy response signatures, were applied to the discovery (D1) and validation (D2) cohorts.6,22 Briefly, genes that defined signatures of chromosomal instability,13 wound healing,24 IGS,8 epigenetic stem cell,25 and tumor necrosis factor α (TNF-α)26 were first obtained. Chip Comparer was used to consolidate all the signatures to the Affymetrix HG-U133A platform. Expression values constituting specific genes of each specific tumor microenvironment signature were extracted from a data set of 30 human cancer cell lines27 using an in-house program, FileMerger (http://tenero.duhs.duke.edu/genearray/perl/filemerger.pl), based on shared identifiers. An initial discovery set representing the biological phenotypes was first developed by a simple unsupervised hierarchical clustering of the gene expression data that constitutes a signature using the R/Bioconductor statistical package.28,29 An unsupervised method is a form of gene expression analysis that involves detection of empirical structures (patterns) in a given microarray data set with no prior knowledge of the underlying biology. Hierarchical clustering is an unsupervised approach to identify classes of biological states, including clinical states that often were not previously recognized. Class labels (0 or 1) were assigned to those classes that were placed into 2 distinct clusters. Using Bayesian binary regression methodologies previously described,22,30,31 predictors for each of the aforementioned tumor microenvironment signatures as well as previously described oncogenic pathways were developed. Predictions of relative oncogenic pathway deregulation and tumor microenvironment produced estimated relative probabilities, which measure the uncertainty of deregulation or expression across the validation samples.
A similar strategy was also implemented to predict the sensitivity of the tumor samples to commonly used chemotherapeutic agents (Adriamycin, docetaxel, paclitaxel, topotecan, etoposide, fluorouracil, and cyclophosphamide), using models of chemosensitivity.22
Hierarchical clustering of tumor predictions was performed by using R/Bioconductor statistical packages.28,29 The predictive probability values of the patient tumor samples and their associated oncogenic pathway deregulation and tumor microenvironment status were clustered together by using complete linkage clustering with the Euclidean distance metric, which will bring together objects whose absolute expressions are similar.32 This process was repeated on patient samples in all 3 risk categories for both D1 and D2 data sets. Using the clustered order of the patients from all 3 recurrence risk categories (based on the clinicopathological prognostic model), heatmaps were regenerated by using HeatmapViewer module of GenePattern version 2.0.33 A heatmap is a tool for visualizing patterns in microarray data. Gene expression patterns are best visualized through a heatmap, with red representing a higher differential expression and green/blue representing a lower differential expression of a gene across a data set.
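The clustering step described above (complete linkage with the Euclidean distance metric) can be sketched in a few lines. This is a naive illustration under stated assumptions, not the R/Bioconductor routine used in the study: each sample is represented by a hypothetical vector of predicted pathway probabilities, and merging stops at a requested number of clusters rather than producing a full dendrogram.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def complete_linkage(points, n_clusters):
    """Agglomerative clustering with complete linkage (naive O(n^3) sketch).

    Returns a list of clusters, each a sorted list of sample indices.
    """
    clusters = [[i] for i in range(len(points))]

    def linkage(c1, c2):
        # Complete linkage: distance between clusters is the LARGEST
        # pairwise distance between their members.
        return max(euclidean(points[i], points[j]) for i in c1 for j in c2)

    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage(clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Hypothetical predicted probabilities for 3 pathways in 4 samples.
samples = [
    [0.90, 0.80, 0.10],
    [0.85, 0.90, 0.20],
    [0.10, 0.20, 0.90],
    [0.20, 0.10, 0.80],
]
print(complete_linkage(samples, n_clusters=2))
```

Because complete linkage uses the largest pairwise distance, it tends to produce compact clusters whose members all have similar absolute probability profiles, which matches the stated goal of grouping samples with similar patterns.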
Standard Kaplan-Meier survival curves and their significance levels were generated for clusters with similar patterns of oncogenic pathway deregulation and tumor microenvironment status using GraphPad version 4.03 for Windows (GraphPad Software, San Diego, California; http://www.graphpad.com). Statistical significance of the prognostic clusters was determined from pairwise comparisons by using Kaplan-Meier survival plots. A prognostically significant result was defined by log-rank P ≤ .05. Statistically nonsignificant comparisons (log-rank P > .05) were not reported. Finally, Kruskal-Wallis 1-way analysis of variance, a nonparametric method, followed by Dunn's posttest, was applied by using GraphPad software to analyses of chemosensitivity patterns unique to subphenotypes or clusters.
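As an illustration of the pairwise survival comparison, the sketch below computes a Kaplan-Meier estimate and a 2-group log-rank test using only the Python standard library. It is a minimal textbook version, not the GraphPad procedure used in the study; the function names and the follow-up times are hypothetical. Events are coded 1 for relapse and 0 for censoring, and the P value uses the chi-square distribution with 1 degree of freedom.

```python
import math


def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / n_at_risk
            curve.append((t, survival))
        n_at_risk -= n_leaving
    return curve


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, P value)."""
    all_times = list(times_a) + list(times_b)
    all_events = list(events_a) + list(events_b)
    event_times = sorted({t for t, e in zip(all_times, all_events) if e})
    observed_a = expected_a = variance = 0.0
    for t in event_times:
        n_a = sum(1 for x in times_a if x >= t)
        n_b = sum(1 for x in times_b if x >= t)
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical relapse-free survival times in months (not study data).
poor = ([2, 3, 4, 5, 6], [1, 1, 1, 1, 1])
good = ([20, 21, 22, 23, 24], [1, 1, 1, 1, 1])
print(kaplan_meier(*poor))
print(logrank_test(*poor, *good))
```

Under the reporting rule in the text, a comparison would be reported only when the returned P value is .05 or less.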
In an effort to fully understand the prognostic significance of clusters identified through patterns of pathway deregulation within the low-, intermediate-, and high-risk cohorts, univariate and multivariate analyses were performed with the use of the Cox proportional hazards regression model. Analyses were performed separately on the 3 risk groups identified in D1. Only factors that were significant in univariate analyses were used in the multivariate models. Multivariate models included continuous covariates for age, tumor size, histologic grade, and dichotomous covariates for lymph node status and estrogen receptor positive/negative tumors. Missing values (estrogen receptor status, 7 [1.7%] and histologic grade, 61 [16%]) led to patients being excluded from the analysis of the validation cohort. No adjustment for multiple testing was necessary. Hazard ratios and 95% confidence intervals (CIs) were reported with respect to the cluster with the worst survival, and identified by Kaplan-Meier survival plots. In all 3 risk cohorts (low, intermediate, and high), based on the clinicopathological model, the reference group for further analysis was consistently made up of patients in the cluster with inferior prognosis. P values were based on likelihood ratio tests, and analyses were performed by using the statistical package R.29
All patients were stratified into 3 prespecified risk categories (low, intermediate, and high) based on relapse risk scores generated by using the clinicopathological prognostic model. To ascertain whether selection of a particular relapse risk score cutoff (ie, score of 25 or 50) would influence the results of survival analysis by gene signatures, a sensitivity analysis was performed in which the cutoffs used to define low-, intermediate-, and high-risk categories were varied sequentially. The resulting risk groups were then used to determine the effect, or lack thereof, on the survival patterns of patients in D1. Regardless of the cutoff chosen, the difference in time to relapse between low-, intermediate-, and high-risk groups was statistically significant (P < .001). Finally, results from previous studies would also suggest that the ascertainment of specific risk categories is independent of the choice of clinical cutoff.34 Based on these findings, we have used cutoffs that divide cohorts into 3 clinically relevant, almost equal tertiles to identify patients in the low-, intermediate-, and high-risk categories for the purposes of testing the hypothesis that incorporating gene signatures into currently used clinicopathological risk stratification systems in early stage breast cancer might further improve prognostic and predictive abilities.
Among the 573 patients identified within D1, there was a statistically significant difference in relapse-free survival as indicated by risk stratification via the clinicopathological prognostic model (P < .001) (Figure 1). Within each risk category defined by this clinicopathological prognostic model, genomic signatures, including both oncogenic pathway and tumor biology/microenvironment signatures, revealed subclusters holding prognostic significance. The subclusters in each risk category were identified based on the branching points of a dendrogram obtained after the hierarchical clustering of the patient samples. Each subcluster thus identified is associated with specific patterns of pathway activation. Statistical significance for the clusters thus identified was determined by Kaplan-Meier survival curves. Only clusters with a statistically significant log-rank P ≤ .05 are reported, while those that are not statistically significant (log-rank P > .05) are not. Specifically, patterns defined by the expression signatures among patients classified as low risk revealed 5 distinct clusters, each holding unique prognostic significance (log-rank P < .02) (Figure 2). Among patients deemed low risk for recurrence by traditional measures, signatures of oncogenic pathway activation and altered tumor microenvironment provide a mechanism to refine the prognosis by identifying patients with a poorer prognosis than that suggested by clinical data alone. Breast tumors identified within cluster 4 were associated with an inferior relapse-free survival compared with other subclusters (cluster 1 [P = .004] and cluster 5 [P = .03]). The median relapse-free survival time for patients in cluster 4 was 16 months less than that for patients in cluster 1 (95% CI, 7.5-24.5 months) and 19 months less than that for patients in cluster 5 (95% CI, 10.5-27.5 months).
Gene expression patterns not only provide a basis for dissecting the heterogeneity within the subgroups defined by conventional clinicopathological models, they also provide important information regarding tumor biology. For instance, the poor prognosis cluster (cluster 4) is characterized by a high probability of MYC pathway deregulation with a concurrent low probability of E2F transcription factor 1 (E2F1) and CTNNB (β-catenin) deregulation. Additionally, breast tumors in this poor prognosis cluster illustrate features of a very aggressive phenotype due to the observed high incidence of chromosomal instability, IGS (a signature of tumor invasiveness), and wound healing activation. Conversely, the good prognosis cluster (cluster 1) exhibited the inverse pattern, low chromosomal instability, wound healing, and IGS, but activation of CTNNB, SRC, and RAS.
Similar to findings among patients in the low-risk category, distinct patterns of oncogenic pathway and tumor biology/microenvironment deregulation were identified among patients classified as intermediate risk. Six main clusters holding prognostic significance were identified (Figure 3). This analysis revealed patient cohorts representing the extremes of prognosis: a good prognosis subgroup (cluster 2; SRC, RAS, epigenetic stem cell, and CTNNB pathway activation) and a poor prognosis subgroup (cluster 3; wound healing, chromosomal instability, IGS, and epigenetic stem cell activation) were identified within the intermediate-risk cohort (log-rank P = .01). The activity of the tumor biology and microenvironment pathways, including the wound healing, TNF-α, IGS, epigenetic stem cell, and chromosomal instability signatures, appears to be driving the aggressive nature of breast tumors classified as poor prognosis (cluster 3). The median relapse-free survival time for patients in cluster 3 was 54 months less than that for patients in cluster 2 (95% CI, 41.48-66.52).
Among patients classified as high risk based on the clinicopathological risk stratification model, hierarchical clustering of the gene signatures once again revealed several distinct cohorts of patients demonstrating prognostic differences, as driven by oncogenic pathway and tumor biology/microenvironment signatures (P = .04) (Figure 4). Guided by genomic differences, patients identified within cluster 4 illustrate inferior relapse-free survival compared with patients identified within other genomic clusters (cluster 1, P = .003; and cluster 5, P = .01). The median relapse-free survival time for patients in cluster 4 was 15 months less than that for patients in cluster 1 (95% CI, 8.1-21.9) and 8 months less than that for patients in cluster 5 (95% CI, 0.33-16.33). Tumor characteristics specific to patients in the poor prognosis cluster (cluster 4) include a high probability of SRC, RAS, and CTNNB pathway deregulation with concurrent high expression of the TNF-α tumor microenvironment signature. In contrast with patients with poor prognosis (cluster 4), patients classified within the good prognosis clusters (cluster 1 and cluster 5) illustrate low expression of the TNF-α tumor microenvironment signature, but high probabilities of wound healing, chromosomal instability, epigenetic stem cell, phosphatidylinositol-3-kinase (PI3K) (cluster 1), and SRC, RAS, and CTNNB deregulation (cluster 5).
As further confirmation that specific patterns of oncogenic pathway deregulation and tumor biology/microenvironment status complement the clinicopathological prognostic model's risk classification, we performed multivariate analyses by using a Cox proportional hazards regression model. As shown in Table 3, in the high-risk cohort, cluster 4, with activated TNF-α, SRC, RAS, and CTNNB pathways, was identified as an independent poor prognostic variable (P = .02). Similar analyses confirmed the independent prognostic value of pathway patterns in the low-risk cohort as well (P = .05) (Table 3). These findings demonstrate that patterns of pathway coactivation have prognostic implications independent of traditional prognostic criteria such as tumor size, receptor status, lymph node status, or histologic grade.
In addition to identifying prognostically significant subclusters of patients based on pathway deregulation within each risk category in D1 and confirming their independent prognostic significance in multivariate analyses, the results observed in D1 were further validated in a large independent data set (GSE6532, D2) (Table 4). Using precisely the same criteria with regard to patterns of pathway activation as those observed in D1, prognostically distinct cohorts were also observed in D2 in each of the risk categories (low, intermediate, and high) that were identified based on the prespecified patterns of pathway and tumor biology/microenvironment activation observed earlier in D1. As an example, patients from D2 with a low-risk score, as estimated by the clinicopathological prognostic model, were further stratified into 2 distinct prognostic subcohorts. Tumors showing activation of wound healing, IGS, chromosomal instability, and MYC pathways were likely to have a worse outcome, which mirrors the pattern observed in D1 (Figure 5). These findings of coactivated signaling pathway and tumor biology/microenvironment patterns defining distinct prognostic states in all 3 risk categories (low-, intermediate-, and high-risk cohorts) of D2 independently validate the results observed in D1 (Figure 5 and Table 4). Because the validation data set is completely independent of the initial data set, the cluster assignments (by number) are different, although patterns of pathway activation are similar between clusters from the initial discovery data set and the validation cohort. To further establish the molecular similarity between the poor prognostic clusters in D1 and D2, the extent to which the implicated clusters from both data sets (D1 and D2) overlap was evaluated by using principal component analysis.
Principal component analysis is an unsupervised technique that is used for dimensionality reduction of data sets by retaining those characteristics of the data set that contribute to most of its variance. Principal component analysis yields principal components (dominant expression patterns in a data set) that contribute to most of the variation in that data set. The samples are then projected on a scatterplot with each axis representing variation due to one principal component. The first 3 principal components typically account for most of the variation in the data set. In this context, principal component analysis was performed as a surrogate for predicting cluster membership and showed that the molecular traits of patients in the poor prognostic clusters were highly specific and distinct from those of the good prognostic clusters. In other words, poor prognostic clusters from both D1 and D2 overlap and form a distinct group compared with good prognostic clusters (Figure 6). Furthermore, a pooled analysis of samples from D1 and D2 by risk categories demonstrates that the poor and good prognosis patients cluster together irrespective of whether the samples were part of the initial discovery data (D1, in this case) or the independent validation (D2, in this case). Although there are several previously described independent prognostic variables (including other described genomic classifiers), improved prognostication can be obtained by integrating knowledge of prognostically relevant clinicopathological variables and their corresponding patterns of gene expression.
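The principal component analysis described above can be sketched in a few lines of code. The following is a minimal illustration only, not the software used in this study: the function names, the power-iteration method, and the 6 toy samples (3 hypothetical "poor prognosis" and 3 hypothetical "good prognosis" rows of pathway activation scores) are all assumptions made for the example.

```python
import math

def center_columns(rows):
    """Subtract each column's mean so every pathway score has mean zero."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]

def covariance(centered):
    """Sample covariance matrix (d x d) of mean-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def leading_eigenpair(matrix, iterations=500):
    """Power iteration: approximate the dominant eigenvector and eigenvalue."""
    d = len(matrix)
    # Asymmetric start vector; a start exactly orthogonal to the dominant
    # eigenvector would fail, which this toy sketch does not guard against.
    v = [float(i + 1) for i in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    mv = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
    eigenvalue = sum(v[i] * mv[i] for i in range(d))
    return v, eigenvalue

def principal_components(rows, k):
    """Return (components, explained variance fractions, per-sample scores)."""
    centered = center_columns(rows)
    cov = covariance(centered)
    d = len(cov)
    total_variance = sum(cov[i][i] for i in range(d))
    if total_variance == 0:
        raise ValueError("data has no variance")
    components, fractions = [], []
    for _ in range(k):
        v, lam = leading_eigenpair(cov)
        components.append(v)
        fractions.append(lam / total_variance)
        # Deflation: remove the variance already explained by this component.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    scores = [[sum(r[j] * v[j] for j in range(d)) for v in components]
              for r in centered]
    return components, fractions, scores

# Hypothetical pathway activation scores (rows: samples; columns: 3 pathways).
samples = [
    [0.90, 0.80, 0.20], [0.80, 0.90, 0.30], [0.85, 0.75, 0.25],  # "poor"
    [0.10, 0.20, 0.30], [0.20, 0.10, 0.20], [0.15, 0.25, 0.25],  # "good"
]
components, fractions, scores = principal_components(samples, k=2)
```

Each entry of `scores` gives one sample's coordinates on the first 2 principal components, which is what a scatterplot of the kind described above would display. In this toy data set the 2 hypothetical groups separate along the first component because they differ mainly in the first 2 pathway columns.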
We made use of a series of gene expression signatures developed to predict sensitivity to various cytotoxic therapies22 to identify patterns of chemotherapy sensitivity within the distinct clusters defined by unique patterns of oncogenic pathway and tumor microenvironment deregulation (Figures 2, 3, and 4), within the low- (n = 182), intermediate- (n = 248), and high-risk (n = 143) cohort of patients. Figure 7 shows in the low-risk cohort that patients with the worst prognosis (cluster 4) with activated wound healing, chromosomal instability, IGS, and MYC were predicted to be resistant to adriamycin and paclitaxel and sensitive to docetaxel, etoposide, and topotecan. In contrast, patients within the good prognosis cluster (cluster 5) with activated TNF-α, SRC, RAS, and CTNNB were predicted to be resistant to docetaxel, etoposide, and topotecan and sensitive to fluorouracil, adriamycin, cyclophosphamide (cytoxan), and paclitaxel, while patients within cluster 5 with activated chromosomal instability, SRC, MYC, and CTNNB were predicted to be resistant to fluorouracil and cyclophosphamide and sensitive to topotecan (Figure 7). Similarly, among patients identified within the intermediate-risk group, differences in chemotherapy sensitivity were identified between the poor prognosis cohort (cluster 3 with wound healing, chromosomal instability, IGS, and epigenetic stem cell pathway activation) and the good prognosis cohort (cluster 2 with SRC, RAS, epigenetic stem cell, and CTNNB pathway activation). Patients in cluster 3 were more likely to be resistant to docetaxel and topotecan, and patients in cluster 2 were more likely to be sensitive to both docetaxel and topotecan (Figure 7).
Among patients deemed to be high risk by the clinicopathological model, differences in chemotherapy sensitivity were also identified. When comparing patients at the extremes of prognosis, patients in cluster 4 (poor prognosis) with activated TNF-α, SRC, RAS, and CTNNB were predicted to be sensitive to fluorouracil and resistant to docetaxel compared with patients in cluster 1 (good prognosis) with wound healing, chromosomal instability, epigenetic stem cell, and PI3K pathway activation. Conversely, patients in cluster 1 were predicted to be sensitive to docetaxel and resistant to fluorouracil compared with cluster 4 (Figure 7). Taken together, these data further emphasize the heterogeneity within each of the prognostic subclasses of breast cancer, while also suggesting opportunities for therapeutic strategies that could potentially improve patient outcomes.
In all cancers, including early-stage breast cancer, the challenge of matching the right therapeutic regimen with the right patient is often limited by our inability to dissect the heterogeneity within these populations of patients. For example, although adjuvant chemotherapy/radiotherapy has been shown to be effective in reducing the risk of metastatic disease, the absolute survival benefit is still relatively modest, at approximately 15%.35 Various strategies have been developed with the goal of identifying more homogeneous subgroups of patients, including the use of prognostic information to stratify according to relative risk. Currently, common clinical and pathological variables such as tumor size, nodal status, estrogen receptor status, and histologic grade constitute a mechanism to guide patients diagnosed with breast cancer and their physicians in their options for adjuvant therapy.1,36 The clinicopathological prognostic model (Adjuvant!), a very common tool in clinical practice, integrates this information, together with knowledge of large studies measuring outcome, to develop a useful classification according to risk. Nevertheless, although these assessments do provide an opportunity to dissect a diverse and heterogeneous population of patients with breast cancer, they do not provide a strategy for therapeutic options that might best match the individual patient. For instance, several genomic classifiers have been previously described to identify breast cancer subtypes that have distinct clinical characteristics.7,37-39 However, these subdivisions generally do not offer improved strategies for individualizing therapeutic options.
As such, there is an opportunity to apply genomic signatures that can best capture the biological nature of a patient's tumor. Incorporating that knowledge with relevant clinical and pathological data may improve prognostic ability, yield a better understanding of the underlying biology of breast cancer, and identify effective therapeutic options for an individual patient. The study we describe herein represents one such approach: signatures reflecting pathway activation or chemotherapy sensitivity not only provide an opportunity to further dissect the heterogeneity of risk groups identified by traditional clinicopathological prognostic models but also provide information that may guide therapeutic decisions. The strength of such an approach is the ability to look at unique patterns or clusters of pathway deregulation, not single gene sets. The results we report herein suggest that this approach of looking at patterns of deregulation representing both oncogenic signaling pathways and relevant tumor microenvironment may lead to a better understanding of clinically relevant subphenotypes of breast cancer.
Beyond the practical impact of providing an integrated strategy for assessing risk and therapeutic opportunities, our results provide an opportunity to better understand the biology underlying the prognostic subphenotypes that are clinically relevant, because they build on the current standard of care, clinicopathological risk stratification. But beyond prognosis, this approach has the potential to dissect broad phenotypes while providing data to reveal novel therapeutic opportunities for patients at highest risk of recurrence. Our past work has demonstrated an association between predicting oncogenic pathway deregulation and sensitivity to therapeutics that target a component of the deregulated pathway.22 Our present analysis using signatures of oncogenic pathway deregulation, tumor microenvironment status, and chemotherapy sensitivity suggests several possibilities for patients stratified by prognostic group. As an example, a substantial fraction (>65%) of patients within the high-risk cohort exhibit evidence of multiple oncogenic pathway activation with concomitant alteration of the tumor microenvironment. In particular, activation of TNF-α, RAS, CTNNB, and SRC is associated with the worst prognosis. Although there are no specific targeted therapies linked to CTNNB pathway activation or modulation of TNF-α, there are agents developed to target SRC or activities downstream of SRC.40,41
Furthermore, identification of prognostically distinct subsets of patients using gene expression–based pathway signatures, which were confirmed in multivariate analysis and validated in large independent validation cohorts, not only provides an opportunity to tailor targeted therapeutic approaches but also suggests opportunities for selection of therapies that may be the most effective in patients with specific subphenotypes. For example, a cohort of poor prognostic patients within the high-risk group, identified based on pathway deregulation, is predicted to be sensitive to fluorouracil and thus may benefit from initial treatment with regimens including fluorouracil (eg, FAC: a combination chemotherapy regimen of fluorouracil, adriamycin, and cyclophosphamide) rather than a docetaxel-containing regimen. We also demonstrated that similar results and opportunities for further individualization of prognosis and therapy were observed in the low- and intermediate-risk cohorts. As such, these observations provide an approach to developing strategies for prognosis that build on clinically relevant risk stratification systems, and also identify the potential for novel therapeutic regimens tailored to the individual patterns of gene expression. Indeed, pending further prospective validation, we envision that the genomic information from an individual patient could easily be incorporated within the context of the clinicopathological prognostic model, to provide a basis for more refined prognosis.
One potential limitation of our study is the lack of data regarding hormonal therapy in some patients. However, the reproducibility and validity (in multivariate testing) of the integrated approach we describe suggest that incorporation of knowledge of hormonal receptor status (via the clinicopathological model) may function as a surrogate for response to hormonal therapy. Another relevant issue is that the panel of signatures representing sensitivity to cytotoxic agents in our present analysis is not exhaustive, and regimen-specific signatures for drug combinations proven to be effective in breast cancer would probably be more practical. However, we have previously demonstrated that individual signatures representing chemotherapeutic sensitivity can indeed be used to estimate the probability that a regimen incorporating multiple individual cytotoxic agents would be effective.18 Another limitation of our study is the small number of patients within certain pathway clusters, which hampers statistical comparisons.
Pending future prospective clinical validation, these results provide preliminary evidence that the profusion of gene expression signatures in defining breast cancer, if used appropriately, represents less of a paradox and should be viewed as an important complementary approach to current clinicopathological risk stratification systems. Furthermore, knowledge of increased likelihood of sensitivity to specific chemotherapeutic agents from a repertoire of drugs that are commonly used to treat breast cancer could be used more immediately in current clinical practice, once issues regarding cost and accessibility are addressed, in instances wherein multiple chemotherapeutics or chemotherapeutic combinations are Food and Drug Administration approved, as in early-stage breast cancer, and are considered the standard of care.
Corresponding Author: Anil Potti, MD, Duke Institute for Genome Sciences and Policy, 101 Science Dr, Box 3382, Duke University, Durham, NC 27708 (firstname.lastname@example.org).
Author Contributions: Dr Potti and Mr Acharya had full access to all of the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis.
Study concept and design: Acharya, Nevins, Potti.
Acquisition of data: Acharya, Hsu, Anders, Potti.
Analysis and interpretation of data: Acharya, Anguiano, Salter, Walters, Redman, Tuchman, Moylan, Mukherjee, Barry, Dressman, Ginsburg, Marcom, Garman, Lyman, Nevins, Potti.
Drafting of the manuscript: Acharya, Redman, Moylan, Garman, Nevins, Potti.
Critical revision of the manuscript for important intellectual content: Acharya, Hsu, Anders, Anguiano, Salter, Walters, Redman, Tuchman, Moylan, Mukherjee, Barry, Dressman, Ginsburg, Marcom, Garman, Lyman, Nevins, Potti.
Statistical analysis: Acharya, Mukherjee, Barry, Lyman, Potti.
Obtained funding: Potti.
Administrative, technical, or material support: Anders, Potti.
Study supervision: Acharya, Ginsburg, Nevins, Potti.
Financial Disclosures: None reported.
Funding/Support: This study was supported by research grants from the American Cancer Society, National Cancer Institute, the Emilene Brown Genomic Cancer Research Fund, and the Jimmy V Foundation.
Role of the Sponsors: All study funding was from public grants for scientific research. The funding organizations had no role in the design and conduct of the study, in the collection, analysis, and interpretation of the data, or in the preparation, review, or approval of the manuscript.
Additional Sources: Supplementary information is available from the authors online at http://data.genome.duke.edu/acharya_cr.
Additional Contributions: Gunjan Verma and Justin Guinney, both PhD candidates (Duke University), and Jeffrey T. Chang, PhD (Duke University), provided meaningful discussions on the statistical aspects of the study. John A. Foekens, PhD (Erasmus MC), Yixin Wang, PhD (Erasmus MC), Yi Zhang, PhD (Veridex Inc, Johnson & Johnson), and Lance D. Miller, PhD (Genome Institute of Singapore), provided additional clinical data on their samples. None of the above mentioned contributors received any kind of compensation for their time.