Objective
To compare the results of clinical and pathological staging for a large cohort of patients with head and neck squamous cell carcinoma (HNSCC) and to examine patterns and ramifications of the disparity between staging methods.
Design
Prospective inception cohort (median follow-up, 7 years).
Setting
Multi-institutional cooperative group study (Eastern Cooperative Oncology Group 4393/Radiation Therapy Oncology Group 9614) involving 17 academic medical centers.
Patients
A total of 560 patients with new-onset or recurrent HNSCC enrolled during a 7-year period.
Intervention
Surgical resection with curative intent with or without adjuvant or previous radiotherapy or chemotherapy.
Main Outcome Measures
Clinical staging and pathological staging and the component TN tumor categories were compared with overall and disease-specific survival. Association of survival with staging was derived by means of the proportional hazards model.
Results
Of the 501 cases in which both clinical and pathological staging was available, a disparity was found between at least 1 component tumor category assigned by the 2 methods in almost 50% of cases. Both methods showed a strong association of stage with overall survival for the cohort at large. However, pathological nodal category was a superior predictor (P < .001 vs P = .005), whereas there was an advantage to pathological tumor category in predicting disease-specific survival (P = .01).
Conclusions
Both staging methods are useful in predicting survival, whereas information gained at neck dissection regarding nodal metastases provides some refinement in prognostic results. These findings demonstrate the need for enhanced methods of tumor assessment and apparent benefit of data gathered at neck dissection for accurate disease assessment and stratification.
Clinical data regarding a large cohort of patients with head and neck squamous cell carcinoma (HNSCC) have been gathered through the auspices of 2 cooperative groups, the Eastern Cooperative Oncology Group (ECOG) and the Radiation Therapy Oncology Group (RTOG). Study ECOG 4393/RTOG 9614 has as its first objective to determine the clinical utility of molecular detection of cancer cells in tumor margins, which continues under analysis. An independent second objective of the protocol is to determine the incidence of TP53 (OMIM *191170) mutation in HNSCC and seek associations between TP53 status and clinical outcome. This analysis forms the basis of a recent publication.1 The current analysis was motivated by the observation that pathological N category, but not clinical T or N category, was associated with TP53 mutation status. The reasons for this variance were not immediately apparent, and so we set out to explore differences between the 2 staging paradigms. Our hypothesis was that pathological stage should correspond more faithfully to biological aggressiveness of disease, resulting ultimately in greater precision in predicting outcome. This study was undertaken to further elucidate the significance of clinical and pathological stage and the component T and N categories as prognostic factors for overall and disease-specific survival. This is, to our knowledge, the first such report in the literature to compare clinical and pathological stage in a large cohort of patients with HNSCC.
From January 17, 1996, to October 18, 2002, 560 patients with HNSCC were enrolled in a prospective multicenter study involving 17 member institutions of ECOG and RTOG. The protocol was approved by the cooperative groups and by each participating institutional investigational review board. Subjects provided written informed consent. Individuals with newly diagnosed or recurrent HNSCC were eligible if the treatment plan included primary surgical extirpation with curative intent. The cohort included 118 patients in whom cancer at presentation had persisted or recurred after previous therapy (typically radiotherapy or wide local excision).
Tumor and margin samples were collected during the operation. Demographic and clinical data were collected from participating institutions by the ECOG and RTOG data managers perioperatively and at scheduled intervals during the follow-up period. Follow-up data from patients who were alive without recurrence at the time of the analysis were censored at the last follow-up contact with those patients. Clinical stage was evaluated by each local treating surgeon and reported at study entry by means of American Joint Committee on Cancer criteria.2 Clinical stage of patients with recurrent or persistent cancer was to be that of the index presentation, following clinical staging convention. At the time of subject enrollment, computerized imaging (computed tomography and magnetic resonance imaging) was typically performed and results were included in the staging, but positron emission tomographic scanning was not used routinely. Pathology reports were submitted to ECOG/RTOG and the findings were reviewed, pathological stage was determined, and findings were tabulated. At 6-month intervals for the first 3 years and annually thereafter, the status of each subject, which included information about recurrent or second primary cancer and subsequent treatment, was reported to the ECOG central office.
Analysis was performed at the ECOG Statistical Center. Descriptive statistics were used to characterize patients at baseline. Survival curves were derived by the Kaplan-Meier method,3 and differences were examined by means of the log-rank test.4 Survival was defined as the time from study entry to death or to the last follow-up visit. Progression-free survival was defined as the time from study entry to death or recurrence. Data from patients who were alive without recurrence at the time of the analysis were censored at the last follow-up contact with those patients. The Fisher exact test,5 unpaired, 2-sided t test, and Mehta exact test for ordered categorical data6 were used to compare patient categories.
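The product-limit (Kaplan-Meier) estimate referred to above can be illustrated with a minimal sketch. The function name `kaplan_meier` and the toy follow-up data are assumptions made for illustration only; they are not taken from the study data or from the software used at the ECOG Statistical Center.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one death.

    times  -- follow-up time for each patient
    events -- 1 if death was observed at that time, 0 if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        d = deaths.get(t, 0)
        if d > 0:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        # Everyone whose follow-up ends at t (death or censoring)
        # leaves the risk set before the next time point.
        at_risk -= sum(1 for x in times if x == t)
    return curve


# Hypothetical toy data (years, event flag); not from the study.
toy_times = [1, 2, 2, 3, 4, 5]
toy_events = [1, 1, 0, 1, 0, 1]
curve = kaplan_meier(toy_times, toy_events)
```

Censored patients contribute to the risk set until their last contact and then drop out without lowering the curve, which is how the censoring rule described above enters the estimate.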
Proportional hazards models7 were used to assess the univariate prognostic significance of tumor variables on overall and disease-specific survival. P values shown with hazard ratios are from the likelihood ratio test. Hazard ratios were calculated relative to a reference group as shown for univariate analyses. To better elucidate cause-specific mortality, the method developed by Gray8 was used to examine competing causes of death, distinguishing between death from head and neck cancer and death from other or unknown causes.
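As a rough illustration of the competing-risks idea, the sketch below computes the standard nonparametric cumulative incidence of one cause of death in the presence of another. It is not Gray's k-sample test itself, which compares such curves between groups. The cause coding (0 = censored, 1 = death from head and neck cancer, 2 = death from other or unknown cause), the function name, and the toy data are assumptions for illustration.

```python
def cumulative_incidence(times, causes, cause):
    """Return [(t, CIF(t))] for one cause of death at each event time.

    causes uses a hypothetical coding: 0 = censored,
    1 = death from HNSCC, 2 = death from other or unknown cause.
    """
    at_risk = len(times)
    event_free = 1.0  # probability of no death from any cause before t
    cif = 0.0
    curve = []
    for t in sorted(set(times)):
        here = [c for x, c in zip(times, causes) if x == t]
        d_cause = sum(1 for c in here if c == cause)
        d_any = sum(1 for c in here if c != 0)
        if d_any:
            # Increment: still event-free just before t, then die of `cause`.
            cif += event_free * d_cause / at_risk
            event_free *= 1.0 - d_any / at_risk
            curve.append((t, cif))
        at_risk -= len(here)
    return curve


# Hypothetical toy data; not from the study.
toy_times = [1, 2, 3, 4, 5]
toy_causes = [1, 2, 0, 1, 2]
cif_disease = cumulative_incidence(toy_times, toy_causes, cause=1)
cif_other = cumulative_incidence(toy_times, toy_causes, cause=2)
```

Unlike 1 minus a Kaplan-Meier estimate that treats the other cause as censoring, the cause-specific incidences computed this way add up to the overall probability of death.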
To explore further the association between stage and survival, multivariable models were developed. Factors of potential interest included sex, age, site of the primary tumor, ECOG performance status, smoking history, alcohol use at study entry, cell differentiation, status of disease at study entry (eradicated, recurrent, residual disease after previous therapy, or untreated), and whether the tumor extended into adjacent structures. To reduce the impact of missing data, factors were introduced into the model as level-specific indicator variables, with a level for missing values. Models were evaluated by means of the Akaike information criterion,9 which moderates the effect of the number of covariates on changes in the log likelihood by taking the number of covariates into account. First, the full model was evaluated; factors were then removed if their elimination resulted in a model with improved Akaike information criterion.
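The elimination procedure described above can be sketched as follows, using the usual definition AIC = 2k - 2 log L, where k is the number of estimated parameters. The `fit` callable, the factor names, and the log-likelihood values are hypothetical stand-ins for a real proportional hazards fit. The rule of removing the first factor whose removal lowers the AIC is an assumption, because the text does not say how candidates were ordered.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion; lower values are preferred."""
    return 2 * n_params - 2 * log_likelihood


def backward_eliminate(factors, fit):
    """Drop factors one at a time while doing so lowers the AIC.

    fit(factors) must return (log_likelihood, n_params) for the model
    containing exactly those factors.
    """
    current = list(factors)
    best = aic(*fit(current))
    improved = True
    while improved and current:
        improved = False
        for f in list(current):
            trial = [g for g in current if g != f]
            score = aic(*fit(trial))
            if score < best:
                best, current, improved = score, trial, True
                break
    return current, best


# Hypothetical stand-in for a model fit: each factor adds a fixed gain
# to the log likelihood and a fixed number of indicator parameters.
GAIN = {"age": 5.0, "sex": 0.2, "site": 1.0}
DF = {"age": 1, "sex": 1, "site": 3}


def toy_fit(factors):
    return -100.0 + sum(GAIN[f] for f in factors), sum(DF[f] for f in factors)


selected, best_aic = backward_eliminate(["age", "sex", "site"], toy_fit)
```

In this toy setting a factor is retained only when its gain in log likelihood exceeds the number of parameters it adds, which is the sense in which the criterion takes the number of covariates into account.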
A total of 560 patients were registered to the study. Fifteen patients were ineligible because of concurrent cancer within 5 years (1 patient), consent given after surgery (1), metastases at study entry (2), distant metastases not assessed (2), no surgery performed (3), or no HNSCC identified at resection (6), and comparable stage information (both T and N category) was missing for 43 patients. One patient was missing a valid survival time interval. The remaining 501 patients are the focus of this analysis. This group is more extensive than the group analyzed for TP53 mutation status1 because inclusion was not dependent on the availability or integrity of tissues for molecular analysis.
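The exclusions listed above can be tallied with a short check. The variable names are illustrative; the counts are those reported in the text.

```python
registered = 560

# Reasons for ineligibility, as reported in the text.
ineligible = {
    "concurrent cancer within 5 years": 1,
    "consent given after surgery": 1,
    "metastases at study entry": 2,
    "distant metastases not assessed": 2,
    "no surgery performed": 3,
    "no HNSCC identified at resection": 6,
}
missing_stage = 43          # T or N category unavailable for comparison
missing_survival_time = 1   # no valid survival interval

n_ineligible = sum(ineligible.values())
analyzed = registered - n_ineligible - missing_stage - missing_survival_time
```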
Survival was evaluated as of June 16, 2008, and was defined as the time from registration to death or the last follow-up visit. Among the 501 patients, 289 had died. Median follow-up among patients still alive was 7 years. Characteristics of patients included in this analysis are provided in Table 1, along with numbers of deaths, median survival by category, 95% confidence intervals, and P values for the log-rank test.
Tables 2, 3, and 4 show the relationship between clinical and pathological T category, N category, and overall stage, respectively. Perfect concordance between clinical and pathological T category was 52.2%; N category, 53.5%; and overall stage, 54.9%. In particular, nearly 40% (38 of 97) of the cases judged to be T4 clinically were found to be in a lower T category at pathological evaluation, whereas more than 40% of cases (43 of 102) found to be T4 on pathological evaluation had been classified at less than T4 clinically (Table 2). Only 69.7% of cases judged to be N0 clinically (145 of 208) were found to be N0 pathologically (true-negative clinical staging), which resulted in 30.3% false-negative clinical staging (82 cases could not be scored for pN category because no neck dissection had been performed). Twenty-six of 172 cases (15.1%) found not to have tumor in dissected lymph nodes (pN0) had been clinically judged to have nodal metastasis (cN+) (false-positive clinical staging) (Table 3).
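The proportions quoted above follow directly from the reported counts, as the sketch below shows. The helper `pct` and the variable names are illustrative. Note that the "false-negative" figure is expressed as a share of clinically N0 cases, that is, the complement of the negative predictive value of clinical N0 status.

```python
def pct(numerator, denominator):
    """Percentage rounded to 1 decimal place."""
    return round(100 * numerator / denominator, 1)


# Counts as reported in the text (Tables 2 and 3).
ct4_lower_on_pathology = pct(38, 97)      # cT4 cases placed in a lower pT category
pt4_lower_on_clinical = pct(43, 102)      # pT4 cases classified below T4 clinically
cn0_confirmed_pn0 = pct(145, 208)         # cN0 cases confirmed as pN0
cn0_occult_disease = pct(208 - 145, 208)  # cN0 cases found to be pN+
pn0_called_positive = pct(26, 172)        # pN0 cases judged cN+ clinically
```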
Table 5 provides overall survival by each level of clinical and pathological T and N category and by combined stage for all patients with known stage. The numbers of patients and deaths may therefore vary among categories. There was a strong association between both clinical and pathological stage and overall survival. The median survival for those judged N0 by pathological evaluation was 7.8 years, compared with 5.9 years for those judged N0 on the basis of clinical criteria.
Table 6 provides survival by each level of clinical and pathological T and N category and by overall stage for the 380 patients who were registered with a disease status of “untreated” at study entry. These were assessed alone because it was assumed that the TNM category provided at study entry would be most closely associated with prognosis for these index-case patients. Again, among this subset of patients, there was a strong association between both clinical and pathological stage and overall survival.
Figure 1 and Figure 2 show Kaplan-Meier survival curves for overall survival for all patients and for untreated patients only by clinical and pathological T and N category. Note the wider separation of curves for pathological than for clinical categorization of nodal status. Figure 3 and Figure 4 show the cumulative incidence of death due to disease (competing cause of death: death from unrelated or unknown cause) for the entire population and also for patients entering the study with newly diagnosed cancers, demonstrating the association between rate of death and overall stage, both clinical and pathological. In each case, the curves for death owing to disease were separated more widely and were more clearly associated with stage than the curves for death due to other or unknown cause. For death owing to disease, the association of survival with pathological stage was stronger than the association with clinical stage.
Table 7 provides hazard ratios and 95% confidence intervals from the optimal multivariable model developed. We conclude that age, ECOG performance status, clinical tumor category, and pathological nodal category are significant prognostic factors for overall survival.
We explored whether subgroups that demonstrated disparity between clinical and pathological stage had corresponding differences in outcome. If patients with perfect T or N category concordance were compared with those without concordance (in which pathological staging differed from clinical), no statistically significant differences were identified (data not shown). Disparate groupings included the 8 cases evaluated as cT1 and pT4, 18 cases evaluated as cT4 and pT1 or pT2, 33 cases evaluated as cN0 but pN2, and 24 cases evaluated as cN1 or cN2 and pN0. Median overall survival and P values for the log-rank test for these groups of patients are given in Table 8. The discordant groups are small, and the statistical power to detect differences may be limited. However, cases for which nodal status changed from N0 to N+ and those changing from N+ to N0 had significantly different length of survival compared with cases correctly classified by clinical method, indicating the greater prognostic accuracy of pathological nodal assessment.
The results of our analysis reconfirm that both clinical and pathological staging are strongly predictive of clinical outcome, despite the fact that components of the 2 forms of staging are discordant in nearly one-half of cases. Among all patients, pathological T and N categories were slightly more closely associated with deaths from head and neck cancer (disease-specific survival) than was clinical stage. Pathological nodal category was a somewhat better predictor of overall survival. This was also true when previously untreated patients were considered alone. There were 274 patients who were discordant in either T or N category, and 181 (66.1%) of these had discordant overall stage. The other 93 patients had a T or N shift that did not result in a change to overall stage (change either did not affect overall stage or canceled reciprocal change in the other category). This may explain, in part, why discordances in T and N components were associated with detectable differences in outcome whereas comparison of the method of overall staging did not result in significant differences in survival. Alternatively, the results might be interpreted as demonstrating deficiencies in the overall staging approach (Table 7).
There are a number of reasons for differences between clinical and pathological staging. Judgment of size and depth of cancer penetration form the bases of most components of the primary T category. Clinical T category is based on size as judged by physical and radiographic examination. These methods have limited accuracy, making overestimation and underestimation possible. On the other hand, pathological evaluation of T category is done by the pathologist either at the time of gross inspection of the resected specimen or after fixation, sectioning, and microscopic analysis. Pathologic assessment, then, may be flawed by separation or filleting of tumor fragments at the time of resection and by shrinkage in the fixation process. For our population as a whole, the number of cases that had lower pathological than clinical T category was balanced by those that had greater pathological than clinical T status. In general, minor adjustments in T category comparing cT with pT did not result in meaningful differences in the association with survival. It would seem that, when a tumor thought on clinical evaluation to be T4 (usually because of suspected bone or cartilage involvement) was found to be of a lesser pathological class, the adjustment should result in a more accurate and better prognosis for that group. Similarly, cases judged not to be T4 clinically yet found to be T4 pathologically would be expected to have substantially worse prognosis. However, as seen in Table 8, differences in survival associated with these adjustments did not achieve statistical significance, although the median survivals of these groups appear to match the hypothesis. It is noteworthy that the pT4 group had a better survival outcome than the pT3 group. The reason for this unexpected result is not clear. The number of cases within subgroupings limits the power of the comparisons.
Although cT category had a lower P value for its association with overall survival, when the association between T category and disease-specific survival was evaluated, pT had a slight advantage (shown by more evenly distributed and wider separation between pT and cT category curves in Figure 3A and Figure 4A vs Figure 3B and Figure 4B).
Clinical nodal categorization is a simple process based primarily on the size of suspicious lymph nodes, combined with radiographic features such as loss of fatty hilum, central necrosis, or increased vascularity. (Note, again, that all cases were accrued and staged before the routine availability of combined positron emission tomographic-computed tomographic scanning.) None of the radiographic features is necessarily indicative of actual tumor involvement, and early nodal involvement may not be identified by these means. The decision of when to do a neck dissection was based on the clinical judgment of the participating surgeons. All were faculty members of major cancer centers, and it is expected that a similar rationale was used when a decision was made to dissect a cN0 neck (need for exposure of vessels for free tissue transfer; greater than 20% likelihood of occult nodal involvement on the basis of primary tumor site, stage, or thickness; history of previous treatment [neck dissection or radiotherapy]; availability of posttreatment adjuvant treatment options, etc). No neck dissection was performed in 82 cN0 cases. It is possible that bias was introduced in our results in that the dissected nodes would be selected for cases expected to have a greater likelihood of tumor involvement, thereby inflating the apparent value of neck dissection to upgrade nodal category by identifying occult disease.
Nevertheless, on the basis of the availability of more detailed information derived from surgical nodal sampling, a stronger association of pN than cN category with clinical outcome was demonstrated by the results. Comparison of cases in which nodal status changed from cN0 to pN+ (presence of occult nodal disease) with cases judged cN0 that were truly pN0 showed a significantly poorer outcome associated with pathological upgrade (P = .02; Table 8). Similarly, cases judged suggestive of metastatic disease on the basis of nodal size (cN+), yet found not to harbor tumor (pN0), had better survival than those judged rightly to have metastases by both methods (P = .01; Table 8). As a result, Kaplan-Meier survival curves with log-rank statistic show the pN category for the entire group to be more accurately associated with overall survival (Tables 5, 6, and 7; Figure 1 and Figure 2C vs D). Patel and Lydiatt,10 in their review of staging in head and neck cancer, stated that “the adverse effect of subclinical nodal disease has not been settled.”10 Our results indicate that accurate assessment of cN0 disease does contribute substantially to accurate prognostication.
The results of analysis of stage related to cause of death demonstrate that tumor stage affects death owing to disease, but not death from other causes or unknown cause (Figures 3 and 4). It may be implied that successful treatment for more extensive disease does not lead to earlier death owing to morbidity of treatment, nor is more extensive disease associated with higher comorbidity that would lead to earlier death. The independence of stage and comorbidity is also indicated in our multivariable model in which performance status at study entry (a rough measurement of comorbidity) and tumor stage are independent prognostic factors.
We hypothesized that inclusion of cases of recurrent cancer for which clinical stage remains that of index disease at initial presentation would introduce substantial inaccuracy. As such, it was expected that only newly diagnosed cases would have accurate clinical stage. However, our results indicate no substantial decrement in association of clinical stage with outcome when all cases—recurrent and newly diagnosed—were considered together. This could be because some recurrent cases were initially small, localized lesions amenable to simple salvage excision while others were extensive advancements from a small index lesion.
The results shown in Table 1 demonstrate the association with survival of several key clinical factors not included in the staging system. Not surprisingly, patients who experience significant weight loss before diagnosis and treatment and those with poorer performance status have a shorter median survival. It is interesting that smoking status did not correlate significantly with survival, whereas alcohol consumption had a strong prognostic value. The primary site of cases in our cohort was strongly associated with survival in ways that we have noted previously; particularly, cases of oropharyngeal cancer have a good prognosis, whereas those of the hypopharynx, which are most often associated with heavy smoking and drinking, have the poorest survival.
There are few reports of large cohorts of patients with HNSCC for which both clinical and pathological stage are available for comparison. De Waal and colleagues11 published retrospective data of neck nodal stage for 186 patients. Similar to our findings, the overall sensitivity of clinical staging was 80.1% compared with the standard criterion of pathological stage, whereas specificity was only 52.2%. There were occult nodal metastases in 32% of elective neck dissection specimens, making the specificity of intraoperative staging on the N0 neck 33.3% with sensitivity of 72.4%.
Our study is limited by the retrospective nature of determination of pathological staging. Cases were accrued from 17 institutions, each with its own approach to reporting surgical specimen results. Uniform processing and reporting protocols would have provided more reliable results and might have produced a clearer advantage for pathological T category. Another limitation is that our cases accrued before the regular availability and use of positron emission tomographic scanning. The improved sensitivity of positron emission tomography for occult nodal disease might improve the accuracy of clinical staging; however, specificity may be reduced.
There is often a substantial difference between pathological and clinical categorization, resulting in changes to overall staging 45% of the time (concordance is only 55%). However, the median survival for clinical and pathological staging is very similar, with nearly completely overlapping confidence intervals. Thus, one can reliably use either pathological or clinical stage to predict cancer-specific survival. There is not a major difference in the 2 staging methods as tools for prognostication. This is likely because of flaws in both systems, the plurality and complexity of factors involved in overall survival of a patient, and mutual cancellations of small changes in stage within a large cohort.
The pN category is a better individual predictor of overall survival than is pT, cT, or cN. The histologic assessment of lymph nodes provides accuracy in disease assessment that enhances outcome prediction. Nodal status as judged through the findings at neck dissection yields demonstrably more accurate predictive information for HNSCC, particularly when major corrections are made from N0 to N+ and N+ to N0.
Correspondence: Wayne M. Koch, MD, Department of Otolaryngology–Head and Neck Surgery, The Johns Hopkins University School of Medicine, 601 N Caroline St, Baltimore, MD 21287 (firstname.lastname@example.org).
Submitted for Publication: December 1, 2008; final revision received April 22, 2009; accepted April 26, 2009.
Author Contributions: Drs Koch, Ridge, and Forastiere had full access to all the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis. Study concept and design: Koch and Forastiere. Acquisition of data: Koch, Ridge, and Manola. Analysis and interpretation of data: Koch, Ridge, and Manola. Drafting of the manuscript: Koch, Ridge, and Manola. Critical revision of the manuscript for important intellectual content: Ridge, Forastiere, and Manola. Statistical analysis: Manola. Obtained funding: Koch. Administrative, technical, and material support: Ridge and Forastiere. Study supervision: Koch.
Financial Disclosure: None reported.
Funding/Support: This study was coordinated by the Eastern Cooperative Oncology Group (Robert L. Comis, MD, Chair) and supported in part by Public Health Service grants CA23318, CA66636, CA21115, CA16116, and CA27525; grant R01 DE013152 from the National Institute of Dental and Craniofacial Research; and grants from the National Cancer Institute and the Department of Health and Human Services.
Disclaimer: The contents of this report are solely the responsibility of the authors and do not necessarily represent the official views of the National Cancer Institute.
Previous Presentation: This study was presented at the Seventh International Conference on Head and Neck Cancer; July 22, 2008; San Francisco, California.
This article was corrected online for typographical errors on 9/21/2009.
1. et al. TP53 mutations and survival in squamous-cell carcinoma of the head and neck. N Engl J Med. 2552-2561.
2. American Joint Committee on Cancer. AJCC Cancer Staging Manual. 6th ed. New York, NY: Springer-Verlag; 2002.
3. P Nonparametric estimation from incomplete observations. J Am Stat Assoc. 1958;53:457-481.
4. J Asymptotically efficient rank invariant test procedures. J R Stat Soc Ser A. 185-206.
5. DR Analysis of Binary Data. London, England: Methuen & Co, Ltd; 1970.
6. AA Exact significance testing to establish treatment equivalence with ordered categorical data. Biometrics. 819-825.
7. DR Regression models and life tables (with discussion). J R Stat Soc Ser B. 187-220.
8. RJ A class of k-sample tests for comparing the cumulative incidence of a competing risk. Ann Stat. 1141-1154.
9. H A new look at the statistical model identification. IEEE Trans Automat Contr. 716-723.
10. WM Staging of head and neck cancers: is it time to change the balance between the ideal and the practical? J Surg Oncol. 653-657.
11. S Pre- and intra-operative staging of the neck in a developing world practice. J Laryngol Otol. 976-978.