Prognostic and Predictive Value of Immune-Related Gene Expression Signatures vs Tumor-Infiltrating Lymphocytes in Early-Stage ERBB2/HER2-Positive Breast Cancer

Key Points Question Which immune-related biomarker provides the most valuable information to predict pathologic complete response and event-free survival in patients with early-stage ERBB2/HER2-positive breast cancer: tumor-infiltrating lymphocytes, immune-related gene expression signatures, or both? Findings In this predictive prognostic study in which a combined correlative analysis of the CALGB 40601 and PAMELA trials was conducted, 305 patients with early-stage ERBB2/HER2-positive breast cancer, 6 B-cell–related signatures were more strongly associated with pathologic complete response than were tumor-infiltrating lymphocytes. In a multivariable Cox model performed in the CALGB 40601 trial, the immunoglobulin G signature, but not tumor-infiltrating lymphocytes, was independently associated with event-free survival. Meaning Findings suggest that when both tumor-infiltrating lymphocytes and gene expression are available, the prognostic and predictive value of RNA sequencing–based immune signatures is superior.

During the last 2 decades, the outcome of patients with early-stage ERBB2/HER2-positive breast cancer has markedly improved owing to new treatment strategies combining polychemotherapy and multiple ERBB2/HER2-targeted drugs. 1-10 However, it is increasingly evident that many patients are overtreated by the recommended regimens, whereas others still experience metastatic relapse. A primary research focus in breast cancer is to better tailor treatments to risk; to accomplish this, effective prognostic and predictive biomarkers are needed.
Increasing evidence suggests that the activation of the host immune system mediates the response to ERBB2/HER2-targeted therapies in breast cancer. 11 Currently, there are several methods to assess intratumor immune activation. One is the assessment of tumor-infiltrating lymphocytes (TILs) in hematoxylin-eosin-stained tumor slides, for which an international working group has established standardized measurement tools. 12 The percentage of TILs that infiltrate the breast tumor is positively prognostic in patients with early-stage ERBB2/HER2-positive breast cancer treated with anti-ERBB2/HER2 therapies in multiple scenarios: in the neoadjuvant and adjuvant setting, in the presence or absence of chemotherapy, with single and dual ERBB2/HER2 blockade, and when assessed at baseline and during treatment. 13-18 Other than TILs, immune activation can also be measured by gene expression. 19,20 In patients with early-stage ERBB2/HER2-positive breast cancer treated in the neoadjuvant setting, immune-related gene expression signatures (iGESs) are associated with higher pathologic complete response (pCR) rates and prolonged survival. 21-23 Specifically, the immunoglobulin G (IgG) signature 24 has previously shown strong and independent prognostic value across many studies. 2,19,22,25 However, the comparative prognostic ability of these different means of measuring immune activation has not been well examined. In this retrospective predictive and prognostic study, we tested which biomarker, or combination of biomarkers, is the most powerful for response and survival in 2 independent clinical trials: the Cancer and Leukemia Group B (CALGB) 40601 trial (NCT00770809) and the PAMELA trial (NCT01973660). The CALGB is now part of the Alliance for Clinical Trials in Oncology.

Neoadjuvant Trials
The CALGB 40601 trial study design, pCR, event-free survival (EFS), overall survival, and genomic correlative studies have been previously published. 2,22 In this predictive and prognostic study, a total of 305 women with stage II to III ERBB2/HER2-positive breast cancer were randomly assigned to receive neoadjuvant weekly paclitaxel with the addition of trastuzumab, lapatinib, or both for 16 weeks. The primary end point was pCR, defined as no invasive tumor in the breast at surgery, and secondary end points included EFS. The PAMELA trial study design, pCR, and biomarker correlative studies have also been previously published. 14,26,27 In this phase 2 trial, 151 patients with stage I to IIIA ERBB2/HER2-positive breast cancer received neoadjuvant lapatinib plus trastuzumab for 18 weeks. The primary outcome was the ability of the HER2-enriched subtype to predict pCR, defined as no invasive tumor in the breast at surgery. Each trial participant signed an institutional review board-approved (National Cancer Institute Central institutional review board for the CALGB 40601 trial and Hospital Universitari Vall d'Hebron for the PAMELA trial), protocol-specific informed consent document following federal and institutional guidelines.

Tumor Gene Expression Analyses and iGESs
Gene expression profiles from pretreatment core biopsies were obtained from 264 of 305 CALGB 40601 trial participants (86.6%) and 142 of 151 PAMELA trial participants (94.0%) (eFigure 1 in the Supplement). Whole-transcriptome analyses by messenger RNA sequencing (RNA-Seq) were performed in the University of North Carolina High-Throughput Sequencing Facility and analyzed by the university's Lineberger Comprehensive Cancer Center Bioinformatics Core. The RNA sequencing libraries were made from total RNA with the TruSeq (Illumina) messenger RNA kit in the CALGB 40601 trial and the TruSeq RNA Access kit in the PAMELA trial and were sequenced on an Illumina HiSeq 2000 using a 2 × 50-base pair configuration. The CALGB 40601 trial RNA-Seq FASTQ files are available on the dbGAP repository (phs001570.v3.p1). The PAMELA trial RNA-Seq FASTQ files are available on EGA (EGAS00001006410/EGAD00001009054). Purity-filtered reads were aligned to the human reference GRCh38/hg38 genome, using Spliced Transcripts Aligned to a Reference, version 2.4.2a. 28 Transcript (GENCODE, version 22) abundance estimates were generated by Salmon, version 0.6.0 29 in "-quant" mode, based on the Spliced Transcripts Aligned to a Reference alignments. Raw read counts for all RNA-Seq samples were normalized to a fixed upper quartile. 30 Messenger RNA sequencing-normalized gene counts were then log2 transformed, and genes were filtered for those expressed in 70% of samples. The batch effect between the gene expression from the CALGB 40601 and PAMELA trials was corrected by applying the distance-weighted discrimination method, 31,32 version 1.0.2 33 (SlicerSALT) and using the CALGB 40601 trial as reference. Intrinsic subtypes were obtained from RNA-Seq gene expression data as described elsewhere. 22 Expression of 202 iGESs from 43 publications (eReferences in the Supplement) was calculated. The list of iGESs and the genes within each signature are summarized in eTable 1 in the Supplement, and the R code is provided. 34 Finally, the iGESs were classified into 22 immune classes based on their gene ontology, with CIBERSORT as reference. 35
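The count-processing steps described above (scaling to a fixed upper quartile, log2 transformation, and filtering on the fraction of samples in which a gene is expressed) can be illustrated with a minimal Python sketch. It is not the study's pipeline: the target value of 1000, the pseudocount of 1, the use of nonzero counts for the quartile, and the definition of "expressed" as a nonzero count are all assumptions made for illustration, and the function names are hypothetical.

```python
import math
import statistics


def upper_quartile(sample_counts):
    """75th percentile of the nonzero counts of one sample (assumed convention)."""
    nonzero = sorted(c for c in sample_counts if c > 0)
    # quantiles(n=4) returns the three quartile cut points; index 2 is Q3.
    return statistics.quantiles(nonzero, n=4, method="inclusive")[2]


def normalize_counts(counts, target=1000.0, min_fraction=0.7):
    """counts maps gene -> list of raw counts (one entry per sample).

    Returns gene -> log2(scaled count + 1) for genes with a nonzero count
    in at least min_fraction of the samples.
    """
    genes = list(counts)
    n_samples = len(counts[genes[0]])
    # One scaling factor per sample so that every sample's Q3 equals target.
    factors = []
    for j in range(n_samples):
        q3 = upper_quartile([counts[g][j] for g in genes])
        factors.append(target / q3)
    normalized = {}
    for g in genes:
        expressed = sum(1 for c in counts[g] if c > 0)
        if expressed / n_samples < min_fraction:
            continue  # gene expressed in too few samples
        normalized[g] = [math.log2(c * f + 1.0) for c, f in zip(counts[g], factors)]
    return normalized


# Toy data: sample 2 is sample 1 sequenced at twice the depth.
toy_counts = {
    "A": [10, 20, 40],
    "B": [0, 0, 8],
    "C": [30, 60, 120],
    "D": [20, 40, 80],
}
toy_normalized = normalize_counts(toy_counts)
```

In the toy data, gene B is dropped by the expression filter, and the first two samples yield identical normalized values for gene A because they differ only in sequencing depth.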

TIL Evaluation
In the CALGB 40601 and PAMELA trials, slides from core biopsies were available for 230 of 264 patients (87.1%) and 138 of 142 patients (97.2%) from the RNA-Seq cohort (eFigure 1 in the Supplement), respectively. The stromal TILs from both clinical trials were scored by the CALGB 40601 trial lead study pathologist (B.S.), following the International TILs Working Group recommendations. 12 In patients with more than 1 core biopsy available, the mean of the 2 TILs assessments was calculated, and patients without TILs assessment were censored.

Statistical Analysis
The criteria of the Reporting Recommendations for Tumour Marker Prognostic Studies (REMARK) guidelines were followed for this study. 36 Comparisons of differences in baseline clinicopathologic variables between the CALGB 40601 trial and the PAMELA trial were made with the Wilcoxon rank sum test (continuous variables) and the χ² test (categorical variables).
For pCR and EFS modeling, the iGES scores were analyzed as continuous variables. Stromal TILs were analyzed as continuous and discrete variables with different prespecified cutoffs (ie, 20%, 40%, and 60%). Immune-related gene expression signatures and TILs were also categorized by tertiles (ie, low, medium, and high) for visualization purposes.
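The two categorizations described above can be sketched as follows. This is only an illustration: whether a value equal to the cutoff counts as high, and how ties at a tertile boundary are assigned, are not stated in the text, so the choices below (>= for the cutoff, <= for the boundaries) are assumptions, as are the function names.

```python
import statistics


def dichotomize(tils_percent, cutoff):
    """Label each stromal TIL percentage as 'high' (>= cutoff) or 'low'."""
    return ["high" if t >= cutoff else "low" for t in tils_percent]


def tertile_labels(scores):
    """Label each score 'low', 'medium', or 'high' by sample tertiles."""
    # quantiles(n=3) returns the two cut points between the three groups.
    q1, q2 = statistics.quantiles(scores, n=3, method="inclusive")
    return ["low" if s <= q1 else "medium" if s <= q2 else "high" for s in scores]


# Toy values, not study data.
example_cut = dichotomize([5, 20, 45, 60], cutoff=40)
example_tertiles = tertile_labels([1, 2, 3, 4, 5, 6, 7, 8, 9])
```

Tertiles depend on the cohort they are computed in, which is one reason a fixed prespecified cutoff (20%, 40%, or 60%) is easier to carry across studies.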
The association between TILs and iGESs was measured with Spearman correlation coefficients. For differential gene expression analysis, we performed a multiclass significance analysis of microarrays. 37 The association of immune biomarkers with pCR was evaluated in the CALGB 40601 and PAMELA combined cohort by logistic regression models. P values were adjusted for multiple testing with the Benjamini-Hochberg method to control the false discovery rate. To compare the goodness of fit of 2 models, we used the Akaike information criterion (AIC). As an accuracy metric, we calculated the area under the receiver operating characteristic curve (AUC) for pCR, using the CALGB 40601 trial as the training set and the PAMELA trial as the validation set. In the CALGB 40601 trial, a mean of the AUC was calculated with 10-fold cross-validation.
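Two of the quantities named above, Benjamini-Hochberg adjusted P values and the AUC, can be computed from first principles. The sketch below is a plain-Python illustration with toy inputs, not the code used in the study; the function names are hypothetical. The AUC is computed through its rank interpretation: the probability that a randomly chosen responder has a higher score than a randomly chosen nonresponder, with ties counted as one-half.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def auc(scores, labels):
    """Area under the ROC curve; labels are 1 (event, eg, pCR) or 0."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy inputs, not study data.
toy_adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
toy_auc = auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
```

With one model per biomarker and 202 signatures tested, unadjusted P values near .05 would be expected by chance alone for roughly 10 signatures, which is why the adjustment matters here.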
In the CALGB 40601 trial, EFS was defined as the time from randomization to a breast cancer relapse after surgery, second primary malignant neoplasm, or death without recurrence for women who underwent surgery. For individuals who did not undergo surgery, the event was defined as death during clinical follow-up or noncompletion of neoadjuvant therapy due to progressive disease. The median follow-up was 9.1 years (IQR,). The association of immune biomarkers with EFS was evaluated with Cox regression models. P values were adjusted for multiple testing with a Benjamini-Hochberg correction. To compare the goodness of fit, we used the AIC. To evaluate the accuracy, we calculated an average C index using 5-fold cross-validation. To avoid a potential guarantee time bias in the multivariable EFS models including pCR status, we performed a 30-week landmark analysis. The landmark subpopulation included only patients without events who were followed up at 30 weeks after randomization. 38,39 Finally, to compare the prognostic ability of 2 nested models, we used the likelihood ratio test (LRT).
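The AIC and the likelihood ratio test both derive from model log-likelihoods (the partial log-likelihood in the case of Cox models). A minimal sketch follows, restricted to nested models that differ by a single parameter so that the chi-square tail probability can be written with the standard library alone. The log-likelihood values are hypothetical and are not taken from the study.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion; lower values indicate better fit."""
    return 2.0 * n_params - 2.0 * log_likelihood


def lrt_one_df(ll_reduced, ll_full):
    """Likelihood ratio test for nested models differing by 1 parameter.

    Returns (statistic, P value). For 1 degree of freedom the chi-square
    survival function is P(X > s) = erfc(sqrt(s / 2)).
    """
    stat = max(2.0 * (ll_full - ll_reduced), 0.0)
    return stat, math.erfc(math.sqrt(stat / 2.0))


# Hypothetical log-likelihoods: a base model with 5 parameters and the
# same model with 1 added biomarker.
ll_base, ll_plus_marker = -250.0, -246.0
lrt_stat, lrt_p = lrt_one_df(ll_base, ll_plus_marker)
aic_base = aic(ll_base, 5)
aic_plus_marker = aic(ll_plus_marker, 6)
```

The LRT applies only to nested models (such as a base model with and without one added biomarker), whereas the AIC can also rank non-nested models, such as a TILs model against an iGES model.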
All the analyses were based on the study clinical database frozen on June 10, 2021. All tests were 2-sided, and a .05 level of significance was used. All statistical analyses were performed with R version 3.5.2 (R Foundation for Statistical Computing) and Python version 3.6 (Python Software Foundation). Data analyses were performed from June 1, 2020, to January 1, 2022.

Baseline Patient Characteristics and TIL Distribution
The characteristics of the 305 patients included in the study are summarized in Table 1. Data on race and ethnicity were collected in the CALGB 40601 trial but not the PAMELA trial; therefore, we decided not to include the information in this study. The median age of the patients was 50 years (IQR, 42-50 years), and 305 (100%) were women. Patients enrolled in the CALGB 40601 trial were significantly younger, more likely to be premenopausal, and at a more advanced clinical stage at diagnosis than those enrolled in the nonchemotherapy PAMELA trial. There were no significant differences between the trials in the hormone receptor status and intrinsic subtype distribution. In the CALGB 40601 trial, there were no statistically significant differences in the baseline clinicopathologic characteristics between the TIL cohort (n = 230) and the landmark cohort (n = 227) (eTable 2 in the Supplement).

Association Between TILs and iGESs
We found that 166 of 202 iGESs (82.2%) were significantly correlated with TILs in both studies, 179 in the CALGB 40601 trial and 174 in the PAMELA trial (eTable 3 in the Supplement). Spearman correlation coefficients are summarized in eTable 3 in the Supplement, and the coefficients from the 20 signatures that were the most correlated with TILs in both studies are represented in eFigure 4 in the Supplement. The iGESs most correlated with TILs were largely T-cell related. The highest Spearman correlation coefficient for TILs was 0.61 in the CALGB 40601 trial and 0.71 in the PAMELA trial. Three signatures associated with resistance to immunotherapy, inflammation, and immunosuppression were significantly negatively correlated with TILs in both studies (Spearman correlation coefficients for the CALGB 40601 and PAMELA trials: −0.29 and −0.37,
Figure legend: Cancer and Leukemia Group B 40601 (CALGB 40601) and PAMELA trial samples were classified into 3 different groups by study, depending on the TIL and IgG levels by tertiles (ie, low, medium, and high TILs; and low, medium, and high IgG). Then, a multiclass significance analysis of microarrays was performed. The standardized mean differences between the immune signatures in 1 class vs the overall mean expression for each study are represented in 4 heatmaps. NK indicates natural killer; T regs, T regulatory cells.
To further study the association between TILs and iGESs, we compared the differences in immune cell infiltration, using our CIBERSORT-derived signatures to perform a multiclass significance analysis of microarrays by TIL levels (ie, low, medium, and high) and by IgG signature levels (ie, low, medium, and high) in the CALGB 40601 and PAMELA trials. The standardized mean differences between the iGESs in 1 class vs the overall mean expression are represented in Figure 2A (the CALGB 40601 trial) and Figure 2B (the PAMELA trial). Tumors with high TIL and IgG levels were significantly enriched for T cells compared with those with low TIL and IgG levels. However, although tumors with high IgG levels were enriched for B-cell and plasma cell signatures compared with those with low levels, in both studies, samples with high and low TIL levels showed a high expression of B-cell and plasma cell signatures. This analysis suggests that TILs do not recapitulate B-cell and plasma cell immune infiltration, and the 2 biomarkers should not be considered the same.
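The Spearman coefficients reported in this section are Pearson correlations computed on ranks. Because TIL percentages are scored in coarse increments, ties are common, so the sketch below assigns tied values their average rank. It is an illustrative implementation with toy values and hypothetical function names, not the study's code.

```python
import math


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman rank correlation with average ranks for ties."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy data: TIL percentages (with a tie) against a signature score.
toy_tils = [5, 10, 10, 40, 60]
toy_signature = [0.1, 0.3, 0.2, 0.9, 0.8]
toy_rho = spearman(toy_tils, toy_signature)
```

Because the coefficient depends only on ranks, it is unaffected by the log2 scale of the signature scores or by any other monotone transformation.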

Association of TILs and iGESs With pCR in the CALGB 40601 and PAMELA Trials
In the combined cohort, the percentage of TILs as a continuous variable was significantly associated with pCR, with an odds ratio of 1.01 (95% CI, 1.01-1.02; P = .02) for each 1% increase in TILs. This association was observed regardless of the clinical trial and treatment group (Figure 3A). High vs low levels of TILs using a cutoff of 20% and 40% were also significantly associated with pCR (20%: odds ratio, 1.86; 95% CI, 1.20-2.91; P = .04; 40%: odds ratio, 2.29; 95% CI, 1.40-3.77; P = .02). The model including TILs with a cutoff of 40% rather than TILs as a continuous variable better predicted pCR (AIC, 471.68 for TILs with a cutoff of 40% vs 472.23 for TILs as a continuous variable; AUC, 0.59 for TILs with a cutoff of 40% in the PAMELA trial vs 0.57 for TILs as a continuous variable).
Thirty-six of 202 iGESs (17.8%) were also significantly associated with pCR independently of the treatment group and study (eTable 4 in the Supplement; Figure 3B; eFigure 6 in the Supplement). The biomarker models from 13 of these signatures better predicted the response end point than the best TILs model for pCR prediction, showing lower AIC values (AIC, 450.06-471.30 for iGESs vs 471.68 for TILs with a cutoff of 40%) (eTable 4 in the Supplement). When the accuracy metrics were examined, 7 iGESs outperformed TILs for pCR, showing higher AUC values in the train (CALGB 40601 trial) and the validation set (PAMELA trial) (eTable 5 in the Supplement). Moreover, in multivariable models adjusted by multiple clinical parameters (study, treatment group, stage, age, hormone receptor status, menopausal status, and subtype), these signatures but not TILs were significantly associated with pCR (eTable 5 in the Supplement). Most of these signatures were associated with B cells, plasma cells, and immunoglobulins (eFigure 7 in the Supplement).

Association of TILs and iGESs With EFS in the CALGB 40601 Trial
In the CALGB 40601 trial, 37 iGESs, but not TILs, were significantly associated with EFS in Cox regression models adjusted by treatment group (eTable 6 in the Supplement). As with pCR, the top-performing immune signatures were also associated with B cells.
Finally, we wanted to test whether the combination of iGESs and TILs was more prognostic than each alone by comparing multiple multivariable Cox regression models. To test this, we selected 6 immune signatures that outperformed TILs for pCR and were also prognostic in the CALGB 40601 trial: 3 IgG signatures, 2 B-cell signatures, and 1 plasma cell signature. The results for 1 of the IgG signatures are shown in Table 2.
We first built a base model that included known prognostic variables: pCR status, treatment group (weekly paclitaxel plus trastuzumab plus lapatinib, weekly paclitaxel plus lapatinib, or weekly paclitaxel plus trastuzumab, where the latter is the reference group), hormone receptor status, clinical stage, and PAM50 intrinsic tumor subtype (HER2-enriched vs other subtypes) (model 1). In this model, treatment group, clinical stage, pCR status, and intrinsic subtype were significantly associated with EFS. Adding TILs (model 2) did not provide additional prognostic information (LRT P = .12). We then built a model adding our previously published IgG signature to model 1; in this model (model 3), treatment group, clinical stage, pCR status, intrinsic subtype, and the IgG signature were all significantly associated with EFS (IgG-adjusted hazard ratio, 0.63; 95% CI, 0.45-0.87; P = .006). Also, model 3 was significantly better than model 1 for EFS (LRT P = .005). The last model (model 4) included both TILs and the IgG signature. In this model, the IgG signature, but not TILs, was significantly associated with EFS (IgG-adjusted hazard ratio, 0.63; 95% CI, 0.42-0.93; P = .02; TIL-adjusted hazard ratio, 1.00; 95% CI, 0.98-1.02; P = .99) (Table 2).
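The hazard ratios, confidence intervals, and P values reported above can be checked against one another. The sketch below assumes that the intervals are Wald intervals that are symmetric on the log scale, which is the usual default for Cox models but is not stated in the text. Because the published values are rounded to 2 decimals, the recovered P values are approximate; the function name is hypothetical.

```python
import math


def wald_from_ci(hazard_ratio, lower, upper, z_crit=1.959964):
    """Recover the log-scale SE, z statistic, and 2-sided P from a 95% CI.

    Assumes a Wald interval: log(HR) +/- z_crit * SE.
    """
    se = (math.log(upper) - math.log(lower)) / (2.0 * z_crit)
    z = math.log(hazard_ratio) / se
    # Two-sided normal tail probability: 2 * (1 - Phi(|z|)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return se, z, p


# IgG signature in model 3 (reported P = .006) and model 4 (reported P = .02).
se3, z3, p3 = wald_from_ci(0.63, 0.45, 0.87)
se4, z4, p4 = wald_from_ci(0.63, 0.42, 0.93)
```

Under this assumption the recovered values are about .006 and .02, in line with the reported figures. The larger standard error in model 4 is the expected consequence of adding TILs, which are correlated with the IgG signature, to the same model.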
Model 4, including both TILs and the IgG signature, was significantly better than the model including only TILs (model 4 vs model 2; LRT P = .02), but it was not significantly better than model 3, which included only the IgG signature (model 4 vs model 3; LRT P = .99). Similar results were observed with the other 5 iGESs (eTable 7 in the Supplement), when TILs were used as a discrete variable with a cutoff of 40% (eTable 8 in the Supplement), and when a landmark analysis was performed (eTable 9 in the Supplement). When different multivariate models including clinical parameters and 1 iGES were compared, the immunoglobulin-The Cancer Genome Atlas signature model performed slightly better than the rest (eTable 10 and eTable 11 in the Supplement).

Discussion
In the CALGB 40601 and PAMELA trials, both the proportion of TILs and multiple iGESs were significantly associated with pCR. Moreover, in the CALGB 40601 trial, several immune signatures were also associated with EFS in univari-
The proportion of TILs as a continuous variable has proven to be an independent prognostic biomarker in early-stage ERBB2/HER2-positive breast cancer. 13-18 Thus, the evaluation of TILs has been proposed as a readily available tool to identify different prognostic groups in this setting. Despite the standardization of TIL scoring by the International TILs Working Group, this biomarker still has low reproducibility rates across pathologists, with κ values in the 0.4 to 0.6 range 44,45 even after efforts to define the optimal TIL cutoff. 46 However, there are ongoing harmonization efforts, and TILs have value, particularly in resource-constrained settings.
Not surprisingly, quantitative gene expression of immune-related genes and signatures is strongly associated with the infiltration of TILs. 47,48 Immune genes and signatures have also proven to have prognostic and predictive value in patients with early-stage ERBB2/HER2-positive breast cancer, and the IgG signature is included in the HER2DX genomic test. 21-23,25 To our knowledge, this is the first study demonstrating that when both TILs and immune gene expression data are available, iGESs, particularly B-cell signatures, provide more prognostic information in ERBB2/HER2-positive breast cancer without the additional value of adding TILs. Similar results suggest the potential superiority of iGESs over TILs in triple-negative breast cancer treated with chemotherapy alone in the CALGB 40603 trial, 48 in which B-cell features, including IgG expression, were found to be the most prognostically valuable metric.

Limitations
Our study has limitations. First, a substantial proportion of patients included in the CALGB 40601 and PAMELA trials received trastuzumab combined with lapatinib, a dual treatment used in the metastatic setting but not approved for early-stage ERBB2/HER2-positive breast cancer. Moreover, the PAMELA trial differed from the CALGB 40601 trial in that patients received only anti-ERBB2/HER2 therapy before surgery, without chemotherapy. Second, although EFS was a key secondary end point of the CALGB 40601 trial, the trial was not powered for long-term outcomes, so EFS prediction modeling should be interpreted with caution. Third, even when scored as a continuous variable, the proportion of TILs (ie, 0% to 100%) follows a semiquantitative pattern, with increments of 5% to 10%, and thus is not a true continuous variable like iGESs. However, when TILs were divided into high vs low levels based on multiple prespecified cutoffs, their ability to predict response and survival was lower than that of multiple B-cell-related signatures. Finally, we performed numerous statistical predictions simultaneously by building 1 model for each immune biomarker to predict pCR and EFS. To control type I error, we adjusted the models' P values for multiple testing.

Conclusions
To conclude, accumulating evidence supports the use of immune activation, which can be measured with TILs or immune-related gene expression biomarkers, to stratify patients with early-stage ERBB2/HER2-positive breast cancer into different prognostic groups. This study supports that measurement of immune activation, either by TIL measurement or by immune-related gene expression profiling, is predictive of treatment response and that immune-related gene expression is prognostic. In the presence of both immune biomarker types, iGESs, especially B-cell-related signatures, outperform TILs for both pCR and prognosis, and the combination of both biomarkers does not yield improved prognostic value. These results highlight the essential role of B cells in antitumor immunity and suggest that B-cell immune-related gene expression provides valuable prognostic information for treatment escalation and de-escalation in patients with early-stage ERBB2/HER2-positive breast cancer.