Kaplan-Meier survival curves for patients with gastrointestinal stromal tumors. A, Tumor size for patients without metastatic disease. B, Tumor grade for patients without metastatic disease. C, Nodal status for all patients. D, Metastatic stage. For an explanation of grades, see the “Methods” section.
Kaplan-Meier survival curves for patients with gastrointestinal stromal tumors (GISTs). A, TG classification for patients without metastatic disease. B, TGM classification for all patients. C, TGM classification for patients diagnosed as having GISTs in 2000 or later. D, GM classification for patients diagnosed as having GISTs in 2000 or later.
Woodall CE, Brock GN, Fan J, Byam JA, Scoggins CR, McMasters KM, Martin RCG. An Evaluation of 2537 Gastrointestinal Stromal Tumors for a Proposed Clinical Staging System. Arch Surg. 2009;144(7):670-678. doi:10.1001/archsurg.2009.108
A gastrointestinal stromal tumor (GIST) staging system can be created with the Surveillance, Epidemiology and End Results (SEER) database.
A review of records in the SEER database from 2537 patients with GISTs from June 1, 1977, through August 1, 2004.
Patients and Methods
Patients were compared using all available clinicopathologic factors, and a TGM (tumor, grade, metastasis) staging system was created according to these parameters. Survival data were analyzed using Kaplan-Meier methods, log-rank analyses, and Cox regression models.
Median follow-up time was 21 months, 47.6% of patients were men, and the median age was 64 years; 5.0% had lymph node involvement, and 22.6% had distant metastasis. Tumor size (T1, ≤70 mm; T2, >70 mm; P <.001), grade (G1, grades I and II; G2, grades III and IV; P <.001), and the presence of metastases (M0, no; M1, yes; P <.001) did affect overall survival. When combined in a TGM staging system, grade and metastasis were the factors most predictive of survival.
A staging system for GISTs that provides valuable prognostic information was developed. Further work to refine this system and validate it with other data sets should be undertaken. Mitotic index and standardized reporting may provide additional prognostic information and should be recorded for all tumors so that the most accurate staging system can be created.
Gastrointestinal stromal tumors (GISTs) are the most common mesenchymal tumor of the gastrointestinal tract,1 and they represent a recently defined entity comprising most tumors previously labeled leiomyomas, leiomyosarcomas, and leiomyoblastomas, and some previously diagnosed as schwannomas or neurofibromas.2 Research remained limited mostly to pathologic studies until the late 1990s, when data demonstrated that gain-of-function mutations of the c-Kit gene were seen in the vast majority of GISTs.3 The creation of a standardized definition allowed estimates of incidence for the first time.4,5 Based largely on epidemiologic studies from other countries,6-8 the incidence of GISTs in the United States has been estimated to be 3300 to 6000 new cases per year.9 Determining exact numbers of cases is difficult because the varying biological behavior of GISTs likely leads to underreporting of lesions thought to be benign or asymptomatic. Of interest, some studies that include autopsy data have shown that 10% to 35% of stomachs contain GISTs 1 to 10 mm in size.8,10,11
Despite the interest in the literature regarding GISTs, a relative paucity of data exists regarding a clinically meaningful staging system. Although not widely accepted, a TGM (tumor, grade, metastasis) system, similar in design to the TNMG (tumor, node, metastasis, grade) soft-tissue sarcoma staging system now used by the American Joint Committee on Cancer (AJCC), was first proposed in 1992,12 providing guidance for the potential design of a GIST staging system. Since then, 2 large-scale risk stratification systems predictive of outcomes have been proposed,2,5,13,14 as have refinements to each.15,16 However, these schemes differ from the format of the AJCC TNM staging systems,17 the currently accepted standard of tumor staging. Given the lack of standardization in reporting for GISTs, along with recent advances in therapy, a clinically valid staging system that provides valuable prognostic information is needed.
We hypothesized that the Surveillance, Epidemiology and End Results (SEER) database (http://seer.cancer.gov/registries/index.html) would provide sufficient data to generate a prognostic staging system for GISTs. The data from the SEER database were used to create a staging system that stratifies patients according to readily available variables and provides clinically meaningful prognostic information that might allow health care providers to predict outcomes for patients with GISTs.
The SEER database was queried to find all cases of GIST as identified by International Statistical Classification of Diseases and Related Health Problems (ICD-O-3) code 8936/3, gastrointestinal stromal sarcoma/malignant, and an extended ICD-O-3 International Classification of Childhood Cancer site recode of XII(a.1), gastrointestinal stromal tumor.18 All records in the current database were included, and year of diagnosis ranged from 1978 to 2004.
Clinicopathologic factors were analyzed to determine the effect on overall survival using the log-rank test.19 Multivariate analysis was performed for statistically significant variables using the Cox proportional hazards model.20 All significant variables from univariate analysis were initially added to the multivariate model, and nonsignificant variables with P values greater than .05 were removed in a stepwise fashion.
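As an illustration of the log-rank comparison described above, the following is a minimal Python sketch of the two-group log-rank chi-square statistic. It is not the authors' code (the study used R); the function names logrank_chi2 and chi2_1df_pvalue and the toy follow-up times are hypothetical and chosen only for this example.

```python
import math


def logrank_chi2(times_a, events_a, times_b, events_b):
    """Two-group log-rank chi-square statistic (1 degree of freedom).

    times_*  : follow-up time for each patient
    events_* : 1 if the patient died at that time, 0 if censored
    """
    # Pool the observations, tagging each with its group (0 = A, 1 = B).
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]

    observed_a = 0.0  # deaths observed in group A
    expected_a = 0.0  # deaths expected in group A if survival were equal
    variance = 0.0    # hypergeometric variance summed over death times

    # Walk through each distinct time at which at least one death occurred.
    for t in sorted({u for u, e, _ in data if e}):
        n_a = sum(1 for u, _, g in data if u >= t and g == 0)  # at risk in A
        n_b = sum(1 for u, _, g in data if u >= t and g == 1)  # at risk in B
        d_a = sum(1 for u, e, g in data if u == t and e and g == 0)
        d_b = sum(1 for u, e, g in data if u == t and e and g == 1)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)

    if variance == 0:
        return 0.0
    return (observed_a - expected_a) ** 2 / variance


def chi2_1df_pvalue(stat):
    """Upper-tail P value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))


# Toy data (made up): group A dies early, group B dies late.
stat = logrank_chi2([1, 2, 3], [1, 1, 1], [10, 11, 12], [1, 1, 1])
print(stat, chi2_1df_pvalue(stat))
```

A statistic above 3.84 corresponds to P < .05 for a two-group comparison. In practice an established survival library would be preferable to a hand-written routine like this one.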
For each patient with a primary tumor and no metastatic disease at the time of diagnosis, T category and G category were defined. T category was defined by tumor size, and patients were dichotomized based on the cutoff point that gave the greatest separation in survival, 70 mm. Tumors 70 mm or smaller in diameter were staged as T1, and those greater than 70 mm were staged as T2. In a similar fashion, G category, based on histologic grade, was defined as G1 for grade I (well-differentiated) or II (moderately differentiated) tumors. Grade III (poorly differentiated) and IV (undifferentiated) tumors constituted the G2 group. Individuals with presence of metastatic (including lymph nodes) disease at the time of diagnosis were classified as M1, and those without were classified as M0. Kaplan-Meier curves were used to visualize survival for the groupings based on the T, G, and M stages, with differences between groups tested using the log-rank test.21
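The category definitions above can be written as a short Python sketch. The function and parameter names are hypothetical, and the numeric encoding of grade (1 through 4 for grades I through IV) is an assumption made for illustration; only the cutoff points and groupings come from the text.

```python
def t_category(size_mm):
    """T1 for tumors 70 mm or smaller, T2 for tumors larger than 70 mm."""
    if size_mm <= 0:
        raise ValueError("tumor size must be positive")
    return "T1" if size_mm <= 70 else "T2"


def g_category(grade):
    """G1 for grades I and II, G2 for grades III and IV.

    Grade is assumed to be encoded as an integer 1-4 (I-IV).
    """
    if grade in (1, 2):
        return "G1"
    if grade in (3, 4):
        return "G2"
    raise ValueError("grade must be 1, 2, 3, or 4")


def m_category(distant_metastasis, positive_nodes):
    """M1 if metastatic disease, including lymph nodes, is present at diagnosis."""
    return "M1" if (distant_metastasis or positive_nodes) else "M0"


# Example: a 73-mm, grade II tumor with no nodal or distant disease.
print(t_category(73), g_category(2), m_category(False, False))
```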
The T, G, and M categories were then combined in an attempt to form a standard 4-tier staging system for GISTs. Nodal status was not included as a separate factor in the proposed staging system because it was not significant in the multivariate statistical model. However, because of the low propensity for regional lymph node disease with GISTs and the poor prognosis this portends in other sarcomatous tumors, we evaluated whether defining patients with positive nodal involvement as having M1 disease improved the staging model. The T and G categories were combined with the M category to form an overall staging system.
The predictive ability of the individual T, G, and M categories was compared with that of the final TGM staging system using measures of predictive accuracy appropriate for survival data. Measures of prognostic separation,22 explained variation,23 and time-dependent receiver operating characteristic (ROC) curves24 were calculated. Prognostic separation (D) in essence measures the separation of the Kaplan-Meier survival curves defined by the risk groups, with larger values indicating greater separation, and, thus, differentiation, between prognostic groups. Explained variation (V and VW) gives the proportion of variation in survival that is accounted for by the staging system, with values closer to 1 indicating a greater predictive ability of the risk grouping. Time-dependent ROC curves plot the sensitivity and specificity calculated at specified times, with individuals still alive at time t serving as the control group and those not alive at time t serving as the case group. Time-dependent ROC curves were calculated at 5 and 8 years, and the area under the ROC curve was presented as a summary measure that gives the concordance of the risk grouping with the survival outcome at those times. The bootstrap percentile method was used to calculate 95% confidence intervals (CIs) for all the measures in each case, using 1000 bootstrap replicates.25 The R package survey26 was used for calculation of the V and VW measures, package survivalROC27 was used for calculation of the time-dependent ROC curves and area under the curve values, and the boot package28 was used for all bootstrap calculations. All statistical analyses for this study were completed using R statistical software, version 188.8.131.52
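The bootstrap percentile method mentioned above can be sketched in Python as follows. This is an illustrative sketch, not the authors' analysis (which used the R boot package): the function name bootstrap_percentile_ci, the fixed seed, and the toy sample are assumptions, and the statistic used here is a simple mean rather than the D, V, VW, or area under the curve measures reported in the study.

```python
import random
import statistics


def bootstrap_percentile_ci(sample, statistic, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for statistic(sample).

    Resamples the data with replacement n_boot times, recomputes the
    statistic on each resample, and returns the empirical percentiles
    that bracket the central `level` fraction of the estimates.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(sample)
    estimates = sorted(
        statistic([sample[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    alpha = (1 - level) / 2
    lo_idx = max(0, round(alpha * n_boot))
    hi_idx = min(n_boot - 1, round((1 - alpha) * n_boot) - 1)
    return estimates[lo_idx], estimates[hi_idx]


# Toy follow-up times in months (made up for illustration).
months = [3, 8, 12, 15, 21, 21, 26, 30, 44, 60, 75, 120]
low, high = bootstrap_percentile_ci(months, statistics.mean)
print(low, high)
```

With 1000 replicates and a 95% level, the interval runs from the 26th to the 975th ordered estimate. Other percentile conventions differ slightly at the boundaries.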
Clinicopathologic characteristics of the patient population are displayed in Table 1. Only 18.8% were diagnosed before 2000; therefore, 81.2% of patients were diagnosed as having a GIST in 2000 or later, and 81.1% underwent surgical treatment. Most primary GISTs were in the stomach (48.2%), with the small intestine having the second greatest number (29.4%). The mean and median sizes of the primary tumors in patients without metastatic disease were 88.8 and 73.0 mm, respectively (range, 1.0-45.0 mm). A large percentage of patients (22.6%) had distant metastases at the time of diagnosis, indicating that many patients with missing lymph node data might already have had other metastatic disease. Moderately differentiated was the most commonly recorded grade (11.8%); however, a large percentage of tumors (69.1%) did not have differentiation recorded because there is no standard for pathologic recording. The mean duration of follow-up was 26.7 months (median, 21 months; range, 0-303 months).
Multiple clinicopathologic characteristics were analyzed for their effect on overall survival (Table 2 and Table 3). Age older than 65 years, sex, year of diagnosis before 2000, surgical treatment, previous malignant disease, size of the primary tumor, degree of differentiation, lymph node involvement, and distant metastatic involvement were highly significant on univariate analysis (P <.001) (Table 2). For the multivariate model, age older than 65 years, year of diagnosis before 2000, surgical treatment, size of tumor, tumor grade, and metastatic disease remained significant (Table 3).
Statistically significant variables from our analysis were used to stratify patients and develop a staging system. Various cutoff points of tumor size (20, 50, 70, and 100 mm) for patients without metastatic disease were evaluated and compared via log-rank tests, and the cutoff point of 70 mm gave the greatest separation in survival curves (hazard ratio [HR], 1.62; 95% CI, 1.36-1.94; P <.001) (Figure 1A). These points were selected based on their use in the National Institutes of Health (NIH) and Armed Forces Institute of Pathology (AFIP) systems (20, 50, and 100 mm), as well as an approximation of the median nonmetastatic tumor size (70 mm). Differences in survival based on tumor grades I and II vs III and IV for patients without metastatic disease were also highly significant (HR, 3.73; 95% CI, 2.88-4.83; P <.001) (Figure 1B). Lymph node status was highly significant in univariate analysis (HR, 2.6; 95% CI, 1.96-3.46; P <.001) (Figure 1C) but was not significant in multivariate analysis. Thus, incorporation of lymph node status as an independent factor was not helpful in development of the staging system. However, nodal involvement was included as M1 disease in the final model. M classification was clearly significant in univariate and multivariate analyses (HR, 3.42; 95% CI, 2.93-4.00; P <.001) (Figure 1D).
The combinations of tumor size (T1 vs T2) and grade (G1 vs G2) in patients without metastatic disease were compared using the log-rank test to form groups with different survival rates. Among the 4 possible combinations, only the T1G2 and T2G2 groups did not differ significantly from each other. The survival curves of the 3 TG groupings are displayed in Figure 2A. We used patients with M1 and N1 tumors to define stage IV disease. Defining stage IV disease as M1 category alone vs M1 or N1 did not alter the risk stratification curves significantly; hence, we elected to define stage IV as either M1 or N1 status, which is in line with other soft-tissue sarcomas. Thus, the final staging system was determined as follows: stage I, T1G1M0; stage II, T2G1M0; stage III, T(any)G2M0; and stage IV, T(any)G(any)M1. Because nodal involvement was included as M1 disease, the system was designated a TGM staging system. The proposed staging system for GISTs is displayed in Table 4, along with HRs and CIs for each stage relative to stage I. In a multivariate model including age, surgical treatment, and diagnosis before and after 2000, the HRs associated with each stage changed only slightly, except the one associated with stage IV (HR, 5.6; 95% CI, 3.6-8.7). Survival differences between stages were statistically significant (P <.001) (Figure 2B). Stage I disease was found in 187 patients, stage II in 153, stage III in 196, and stage IV in 639. Five-year survival rates were 82.4%, 74.1%, 50.5%, and 32.4% for stages I, II, III, and IV, respectively.
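The final TGM stage groupings stated above (stage I, T1G1M0; stage II, T2G1M0; stage III, T[any]G2M0; stage IV, T[any]G[any]M1) reduce to a simple decision rule, sketched here in Python. The function name tgm_stage and the string encoding of the categories are hypothetical; nodal involvement is assumed to have already been counted as M1, as described in the text.

```python
def tgm_stage(t, g, m):
    """Return the proposed TGM stage ("I" to "IV") for one patient.

    t : "T1" or "T2"
    g : "G1" or "G2"
    m : "M0" or "M1" (M1 includes nodal involvement)
    """
    if t not in ("T1", "T2") or g not in ("G1", "G2") or m not in ("M0", "M1"):
        raise ValueError("unrecognized T, G, or M category")
    if m == "M1":
        return "IV"   # any T, any G, metastatic or node-positive
    if g == "G2":
        return "III"  # any T, high grade, no metastasis
    if t == "T2":
        return "II"   # larger than 70 mm, low grade, no metastasis
    return "I"        # 70 mm or smaller, low grade, no metastasis


# Example: a large, low-grade tumor without metastatic disease.
print(tgm_stage("T2", "G1", "M0"))
```

The checks run from the most to the least influential factor, mirroring the finding that metastasis and grade outweigh tumor size in this system.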
Evaluation of the proposed TGM staging system using only patients diagnosed in year 2000 or later revealed that disease stages I and II were not statistically different in this patient group (P = .76) (Figure 2C). Because stages I and II are characterized by differences in tumor size (T1 vs T2) for patients with G1M0 tumors, we modified our staging system to include only 3 stages defined by tumor grade and metastasis (G1M0, G2M0, and G[any]M1) (Table 5). Again, because patients with N1 and M1 tumors are grouped together, we refer to this as a GM staging system. Of 2061 patients diagnosed during or after the year 2000, stage I disease was found in 245 patients, stage II in 145, and stage III in 517 (1154 patients had missing values). Overall differences in survival rates among the 3 stages were highly significant (P <.001). Three-year survival rates were 87.7%, 59.6%, and 48.7% for stages I, II, and III, respectively (Table 6).
Given the increasing interest in GISTs, there is a need for standardization of a staging system. Staging systems for most tumors, including uncommon ones such as soft-tissue sarcoma, currently exist.20 Accurate staging of cancer allows standardized reporting of disease and treatment outcomes. Staging helps identify those patients most likely to benefit from therapy and can aid in determining the need for follow-up surveillance. Because of the rarity of GISTs and, therefore, the lack of large-scale numbers of specimens and patients, we elected to use the SEER database as a source of clinical, pathologic, and survival data to determine the usefulness of the proposed TGM clinical staging system.
Ideally, cancer staging systems should be simple yet accurate. Although there is, to date, no widely accepted staging system for GISTs, a classification system representing a consensus opinion13 was put forth at an NIH-sponsored meeting in 2001. This system differentiates, based on size and mitotic index, between very low–, low-, intermediate-, and high-risk lesions. Since it was initially put forth, the NIH model has been shown to be accurate in predicting biological characteristics of tumors,30 but some have found that it better differentiates tumor aggressiveness as a 2-tier, high risk vs all others, system.8,31,32 Other authors have combined the very low– and low-risk groups and further stratified the high-risk group to enhance the predictive probability.15
A second proposed system, offered by the AFIP, further differentiates risk based not only on size and mitotic rate but also between gastric vs nongastric primary tumors.2,4,5,14 This system has not been as widely adopted, possibly because it is a more complicated 4-tier system with 15 subcategories. The NIH and AFIP systems incorporate size and mitotic rate but do not address the role of metastases, a significant predictor of survival in other studies,33 including one based on a SEER data set similar to that used in the current report.34 Although predictive, both systems, as well as their proposed modifications, differ significantly from the AJCC-preferred35 TNM format that is commonplace in oncology.
Primary tumor size is a well-known component of sarcoma staging systems. In the current system, tumor grade and presence of metastatic disease were more predictive of outcome than size. T1 tumors were defined as 70 mm or smaller and T2 were larger than 70 mm. Although significantly larger than the cutoff point for a “small” tumor in the NIH and AFIP systems, 70 mm was similar to the median tumor size in this cohort and was within the range of other authors' published data,9,36 confirming its relevance. It was also the point most predictive of outcome. In addition, a single cutoff point allowed for a simplified staging system with only 2 tumor size subclasses.
Grade in GISTs has been largely replaced by mitotic rate, as defined by the number of mitoses per 50 high-power fields. Mitotic rate is known to be predictive of recurrence37 and survival.38 It is generally discussed as fewer than 5, 5 to 10, and more than 10 mitoses per 50 high-power fields for low-, intermediate-, and high-risk groups, respectively.2,5,13,14 However, valid limitations, such as the lack of a standardized definition of 1 high-power field, and some confusion regarding mitotic index count thresholds have been suggested by others.39 When the SEER database was created, the role of mitotic index had not yet been realized; therefore, the data set did not include data related to mitotic index. Data on grade were available and were therefore used as a surrogate because grade had been shown to be a predictor of outcome in GISTs in earlier studies12,40,41 and differentiates well between low- and high-risk lesions. In fact, some authors found it to be the major prognostic determinant of survival in patients undergoing complete surgical extirpation.42 Grade continues to be used as a significant predictor of outcome for staging non-GIST soft-tissue sarcomas.
From a histologic standpoint, mitotic rate is incorporated as one criterion for differentiating low-grade from high-grade tumors. Other findings factored into final grade include differentiation, cellularity, and necrosis. Over time, calls from the pathology community for the standardization of sarcoma grading43 have been addressed.44 Despite the heterogeneity of sarcomas, and the knowledge that no system is ideal for every tumor in the group, identifying tumor grade seems to be reproducible.45 At present, sarcoma grading defined as either low (grades I and II) or high (grades III and IV) is incorporated into the AJCC soft-tissue sarcoma staging system and was used in the past to more precisely predict gastrointestinal mesenchymal tumor behavior.2,7,13 Like these past studies, the proposed system uses a dichotomous low vs high tumor grading scheme.
Patients with lymph node metastasis, although rare in this data set (5.0%), were analyzed separately initially but included in the M1 group in the final staging system. Although it was significant in univariate analysis, nodal involvement was not significant in the multivariate model and did not improve the ability to predict survival as a separate factor in the TGM staging system. However, it did improve predictive capability when included in the M category.
This data set contains records from only those patients considered to have malignant lesions, which may reflect some patient selection bias. There is likely no truly benign GIST,31,46 but rather a continuum of lesions with varying potential to recur and/or metastasize. Most authors suggest prolonged surveillance, even for low-risk lesions.13,30 The data from this system would suggest that even patients with stage I disease, a group that would include most unreported “benign” lesions, warrant appropriate surveillance for recurrence or metastasis. But inclusion of these additional low-stage patients may result in stage migration, which may impede the effectiveness of this system in predicting survival. Therefore, more work needs to be done in terms of staging lesions thought to be benign.
The present system uses grade as a surrogate for the more commonly discussed mitotic index. Although the determination of grade includes mitotic index as one of its components, this might be less sensitive than mitotic index in staging GISTs.47,48 Historically, grade has been used when discussing the biological aggressiveness of GISTs. One potential drawback for using traditional sarcoma parameters is that the mitotic rate thresholds are set too high and, therefore, are not as sensitive for GISTs. However, grade did perform well, both alone and in combination with tumor size in predicting survival in this system. Perhaps future studies can address this by incorporating mitoses.
As with mitotic index, the SEER database presently does not include documentation of c-Kit status for any of the tumors used to create this staging system. Tumors included in this study were entered into the SEER database as malignant GISTs by ICD-O-3 code. Given the evolution of this diagnosis since the SEER database was created, it is feasible that some non-GIST mesenchymal tumors make up a small percentage of the data used for this staging system.
Creation of a staging system for GISTs using the SEER database was feasible, and the proposed system seems to stratify patients effectively. Future modification of this scheme by incorporating mitotic rate may enhance its predictive power. Further research using independent data sources should be considered to refine and validate this proposed staging system.
Correspondence: Robert C. G. Martin II, MD, PhD, Division of Surgical Oncology, Department of Surgery, University of Louisville, Norton Healthcare Pavilion, 315 E Broadway, Ste 311, Louisville, KY 40202 (firstname.lastname@example.org).
Accepted for Publication: January 30, 2009.
Author Contributions: Drs Woodall, Brock, Byam, Scoggins, and Martin had full access to all the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis. Study concept and design: Woodall, Scoggins, McMasters, and Martin. Acquisition of data: Woodall, Byam, and Scoggins. Analysis and interpretation of data: Woodall, Brock, Fan, McMasters, and Martin. Drafting of the manuscript: Woodall, Brock, Fan, Byam, McMasters, and Martin. Critical revision of the manuscript for important intellectual content: Woodall, Brock, Scoggins, McMasters, and Martin. Statistical analysis: Woodall, Brock, and Fan. Administrative, technical, or material support: Byam, Scoggins, McMasters, and Martin. Study supervision: Scoggins, McMasters, and Martin.
Financial Disclosure: None reported.
Previous Presentation: This paper was presented at the 116th Annual Meeting of the Western Surgical Association; November 11, 2008; Santa Fe, New Mexico; and is published after peer review and revision. The discussions that follow this article are based on the originally submitted manuscript and not the revised manuscript.
Mark Talamonti, MD, Evanston, Illinois: Gastrointestinal stromal tumors, or GISTs, are relatively rare tumors with an annual incidence of approximately 5000 to 6000 new cases. Yet, there are several important reasons why these tumors have warranted such recent scrutiny and study. The recent development of clinically effective inhibitors, such as imatinib mesylate, better known as Gleevec [Novartis, Basel, Switzerland], targeting the mutated receptor unique to this specific tumor ushered in the era of molecularly targeted cancer therapy. So why is it important for surgeons to study these tumors? Because they represent the paradigm by which cancer therapy will be developed in the future for much more common tumors, such as lung, breast and colon cancers.
And why is it important for surgeons to be involved in the development of staging systems for such relatively rare tumors? Because these new targeted therapies provide an opportunity to study adjuvant or neoadjuvant therapy for localized disease and, indeed, formed the basis for 2 adjuvant trials initiated by the American College of Surgeons’ Oncology Group for these very tumors. Yet, the pricing of new cancer drugs in combination with limited health care resources and our current economic downturn mandate the selective and judicious use of these expensive therapies in those patients most likely to benefit from them, and thus the relevance and importance of papers like this one.
I have 3 questions for the authors. You have used the large SEER database to generate a prognostic staging system for GIST, and your proposed staging system is significantly more straightforward than 2 currently used systems developed by the NIH and the AFIP for GIST risk stratification. But many variables found to be important in other staging systems and institutional series such as tumor location, tumor rupture, and margin status were not studied or were not significant in your database. I suspect that is because you used overall survival as the outcome metric instead of recurrence-free survival. When limiting your patient base to the modern Gleevec era, do you not think it might be more appropriate to determine the significance of these clinical and pathologic variables on the effect until first recurrence rather than overall survival, since we know overall survival will be affected by subsequent treatment with Gleevec? Have you run the same statistical models using time to first recurrence as your outcome metric rather than time to death and, if so, were there differences?
Similarly, my second question has to do with the inclusion of patients who may have undergone subsequent salvage therapy with repeat resections. Limited recurrences with GISTs can sometimes undergo subsequent operations which may extend survival and perhaps even offer reasonable chances for cure. These patients may confound the determinacy of prognostic significance of variables at initial presentation, and I would like to know if they were screened in your initial inclusion criteria. Additionally, it would confirm the relevancy of your findings if these same variables had prognostic significance in this repeat resection group.
Finally, your results confirm that patients with metastases do worse than those without, that size matters but probably not that much, and that tumor biology as reflected by tumor grade is probably the most important determinant of outcomes in patients with localized disease. In your presentation and the article, the distinction between tumor differentiation and tumor grade is a bit ambiguous, and the use of tumor grade as a surrogate for more objective measures of biological aggressiveness such as cellularity, necrosis, vascular invasion, and mitotic index is a bit of a projection. In a prospective and to-be-determined staging system, what biological factors would you propose as ones that should be included in determining the inherent biological aggressiveness of these tumors?
Dr Martin: We have seen an explosion of papers relating to GISTs, and, obviously, I do not believe that it is because of an explosion of the actual disease but simply because we are now more accurately diagnosing this disease. We have been misdiagnosing it or, more importantly and probably more dangerously, calling it a benign tumor for a long period of time. I also want to echo Dr Talamonti's comments in that we must be the leaders in the staging of this disease. With the expanded use of Gleevec, there has been considerable use and possibly overuse for this disease, especially in patients with very early stage disease, with no long-term outcome data.
To answer your questions, all staging systems based on AJCC staging are related to overall survival. I echo your belief that recurrence-free survival is of utmost importance. However, that does not relate to the overall survival. Be that as it may, though, overall survival and the prognostic factors related to that can be utilized in reverse as a surrogate marker for relapse. Patients who underwent salvage procedures were excluded.
Last, your question about grade. Without a doubt there are limitations to this staging system. It is oversimplified because of the lack of pathologic standardization. It has been well established that grade and mitotic index play a role in increasing the risk of overall recurrence and decreased survival. Current staging systems prospectively should include both grade differentiation and mitotic index.