Overall Survival Among Patients With De Novo Stage IV Metastatic and Distant Metastatic Recurrent Non–Small Cell Lung Cancer

This cohort study evaluates whether there is a survival difference between de novo stage IV and distant recurrent metastatic non–small cell lung cancer.

This supplemental material has been provided by the authors to give readers additional information about their work.
Note: Highlighted variables were used as covariates in the primary analysis and were therefore also included in the validation analysis.
* De novo Stage IV is defined as patients who were initially diagnosed at Stage IV with metastatic lung cancer; Recurrent is defined as patients who developed metastatic disease after an initial diagnosis of Stage I-III lung cancer.
† Classification of 'other' race includes Native Hawaiian or Other Pacific Islander and all other race groups.
‡ Classification of 'Non-small cell' includes all histology types other than adenocarcinoma, squamous cell carcinoma, and large cell carcinoma.
†† Tumor burden was defined as the sum of the longest dimension of all cancerous lesions indicated on a radiology report at the time of distant metastasis; high tumor burden was defined as values above the 75th percentile (i.e., 141.75 mm), and low otherwise.
** Good ECOG was defined as ECOG 0-1; poor ECOG was defined as ECOG 2-4.
eMethods 1. Tumor burden was curated from the radiology report nearest to the date of diagnosis of metastatic disease. If more than one report was present within 30 days before or after the date of diagnosis, the report was selected in the following priority: 1) CT Chest Abdomen Pelvis, 2) CT Abdomen Liver with IV Contrast Triphasic Chest Pelvis, 3) CT Chest + CT Abdomen Pelvis, 4) PET-CT, 5) CT Chest or CT Thorax. Tumor burden is defined as the sum of the longest dimension of all cancerous lesions indicated on a radiology report at the time of distant metastasis.17 This measure was categorized into low versus high groups: high if above the 75th percentile (i.e., 141.75 mm) of the entire cohort, and low otherwise.
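The report selection and tumor burden rules described above can be sketched in a few lines of Python. This is only an illustration, not the curation procedure actually used (which was manual): the record layout (`type`, `date`, `lesions_mm`), the function names, the toy reports, the tie-break by proximity to the diagnosis date, and returning `None` when no report falls inside the window are all assumptions. The 141.75 mm cutoff is the 75th percentile reported in the text.

```python
from datetime import date

# Report types in the priority order listed in eMethods 1 (first = preferred).
REPORT_PRIORITY = [
    "CT Chest Abdomen Pelvis",
    "CT Abdomen Liver with IV Contrast Triphasic Chest Pelvis",
    "CT Chest + CT Abdomen Pelvis",
    "PET-CT",
    "CT Chest or CT Thorax",
]
HIGH_BURDEN_CUTOFF_MM = 141.75  # 75th percentile of the cohort, per the text


def select_report(reports, diagnosis_date, window_days=30):
    """Pick one report within +/- window_days of diagnosis, by modality priority.

    Ties within a modality are broken by proximity to the diagnosis date
    (an assumption made for this sketch).
    """
    eligible = [
        r for r in reports
        if r["type"] in REPORT_PRIORITY
        and abs((r["date"] - diagnosis_date).days) <= window_days
    ]
    if not eligible:
        return None  # assumption: no usable report inside the window
    return min(
        eligible,
        key=lambda r: (REPORT_PRIORITY.index(r["type"]),
                       abs((r["date"] - diagnosis_date).days)),
    )


def tumor_burden_mm(report):
    """Sum of the longest dimension (mm) of every lesion in the report."""
    return sum(report["lesions_mm"])


def burden_group(burden_mm, cutoff=HIGH_BURDEN_CUTOFF_MM):
    """'high' if strictly above the cutoff, otherwise 'low'."""
    return "high" if burden_mm > cutoff else "low"


# Toy reports for one hypothetical patient diagnosed on 2020-03-05.
reports = [
    {"type": "PET-CT", "date": date(2020, 3, 2), "lesions_mm": [40.0, 25.5]},
    {"type": "CT Chest Abdomen Pelvis", "date": date(2020, 3, 10),
     "lesions_mm": [60.0, 50.0, 45.0]},
    {"type": "CT Chest or CT Thorax", "date": date(2020, 6, 1),
     "lesions_mm": [10.0]},
]
chosen = select_report(reports, diagnosis_date=date(2020, 3, 5))
burden = tumor_burden_mm(chosen)
print(chosen["type"], burden, burden_group(burden))
```

In this toy case the CT Chest Abdomen Pelvis report is preferred over the PET-CT even though the PET-CT is closer in time, because modality priority is applied first.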
All other curated variables (e.g., driver mutations present, treatments for metastatic disease, and site of metastatic disease at the time of diagnosis of distant metastatic disease) were obtained from oncology progress notes, and their categories are listed in Supplementary Table S1.
eFigure 1. Overview of Validation Cohort
At SHC, 547 patients with metastatic NSCLC who had molecular profiling were identified. Of these patients, 200 were randomly selected to undergo manual chart review. After data curation, the validation cohort consisted of 180 patients with confirmed distant metastasis.

Calculating Propensity Score
In a clinical study, the propensity score is the estimated probability that a patient is in the exposed group, as opposed to the unexposed group, given the covariates that represent the patient's characteristics. The goal of the propensity score is to account for potential confounders, and it is most often applied in observational studies when randomization did not or cannot take place. In these studies, the exposed and unexposed groups may not be comparable because of inherent differences that lead patients to be in one group or the other. For example, a patient who received surgery (and is thus in the exposed group) may have undergone surgery because of better medical condition and prognosis. The goal of calculating propensity scores is to balance measured potential confounders and make the two groups comparable.
The propensity score is calculated using logistic regression. It is the probability of being in the exposed group given the covariates being balanced (Eqn. 1). The logit model is built with the exposure as the binary outcome and the potential confounders as covariates (Eqn. 2). The model yields a probability of being in the exposed group for each subject in the study cohort. With this method, a large number of covariates can be reduced to a single probability.

$$P(\text{Distant recurrence} \mid \text{Covariates}) \qquad \text{(Eqn. 1)}$$

$$\operatorname{logit}(P) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots \qquad \text{(Eqn. 2)}$$

Here $P$ is the probability in Eqn. 1, $X_1, X_2, \ldots$ are the covariates, and $\beta_0, \beta_1, \beta_2, \ldots$ are the regression coefficients.

In this study, the "exposed" group consists of patients with distant recurrence and the "unexposed" group consists of patients with de novo Stage IV metastatic Non-Small Cell Lung Cancer (NSCLC). In our propensity score model, we included both variables that are already adjusted for in the competing risk regression models (i.e., age at LC diagnosis, sex, race, histology of NSCLC, screening method, and smoking status) and additional variables that may be potential confounders and were previously used in propensity score matching by other studies of overall survival after diagnosis of metastatic disease. These additional variables are comorbid disease history and marital status. Comorbid disease history is defined by the presence of one or more of the following diseases: asthma, asbestosis, bronchiectasis, childhood asthma, chronic bronchitis, COPD, diabetes, emphysema, fibrosis of the lung, heart disease or heart attack, hypertension, pneumonia, sarcoidosis, silicosis, stroke, and tuberculosis. Marital status comprises two categories: married/living together and not married/not living together. Those who were married were assigned to the former category, and those who were divorced, separated, widowed, or never married were assigned to the latter category. To estimate the propensity score, we fitted a logistic regression model using type of metastatic disease as the outcome variable and the variables listed above as independent variables.
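As a rough illustration of Eqn. 1 and Eqn. 2, the sketch below fits a logistic regression by plain gradient ascent on made-up data and returns each subject's propensity score. It is not the authors' code: the two binary covariates, the toy values, the learning rate, the iteration count, and the function names are all hypothetical, and a real analysis would use the full covariate set listed above with standard statistical software.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def linear_predictor(beta, row):
    """beta[0] + beta[1]*x1 + beta[2]*x2 + ... for one subject."""
    return beta[0] + sum(b * x for b, x in zip(beta[1:], row))


def fit_logistic(X, y, lr=0.5, n_iter=2000):
    """Fit logit(P) = b0 + b1*x1 + ... by batch gradient ascent on the log-likelihood."""
    n, k = len(X), len(X[0])
    beta = [0.0] * (k + 1)
    for _ in range(n_iter):
        grad = [0.0] * (k + 1)
        for row, yi in zip(X, y):
            err = yi - sigmoid(linear_predictor(beta, row))
            grad[0] += err
            for j, x in enumerate(row):
                grad[j + 1] += err * x
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


def propensity_scores(X, beta):
    """Estimated P(exposed | covariates) for every subject."""
    return [sigmoid(linear_predictor(beta, row)) for row in X]


# Toy data: two hypothetical binary covariates per patient.
# y = 1 for distant recurrence (exposed), 0 for de novo Stage IV (unexposed).
X = [[0, 0], [0, 0], [0, 0], [0, 1], [0, 1],
     [1, 0], [1, 0], [1, 1], [1, 1], [1, 1]]
y = [0, 0, 1, 0, 1, 1, 0, 1, 1, 0]

beta = fit_logistic(X, y)
scores = propensity_scores(X, beta)
for row, s in zip(X, scores):
    print(row, round(s, 3))
```

Each patient's covariate vector is reduced to one number between 0 and 1, which is the quantity the matching step works with.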

Matching
Subjects with similar propensity scores are theoretically interchangeable. If two subjects have similar propensity scores but are in different exposure groups, then any difference in outcome could be attributed to the exposure. In our study, we matched subjects in the distant recurrence group with subjects in the de novo metastatic group in a 1:1 ratio based on their propensity scores. After propensity score matching, we checked that all covariates were balanced using the Chi-square test or Fisher's exact test (sparse variables) for categorical variables, and the t-test (parametric) or Mann-Whitney U test (non-parametric) for continuous variables.
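The text does not state which matching algorithm was used, so the following is only one common possibility: greedy 1:1 nearest-neighbor matching on the propensity score without replacement, followed by a Pearson chi-square balance check for a single binary covariate. The patient identifiers, scores, and function names are hypothetical; no caliper is applied, and the chi-square helper assumes a 2x2 table with no empty row or column.

```python
import math


def greedy_match(exposed, unexposed):
    """1:1 nearest-neighbor matching on the propensity score, without replacement.

    exposed / unexposed: dicts mapping patient id -> propensity score.
    Returns a list of (exposed_id, unexposed_id) pairs.
    """
    available = dict(unexposed)
    pairs = []
    # Process exposed patients in score order so the result is reproducible.
    for eid, escore in sorted(exposed.items(), key=lambda kv: kv[1]):
        if not available:
            break
        uid = min(available, key=lambda u: abs(available[u] - escore))
        pairs.append((eid, uid))
        del available[uid]  # each unexposed patient is used at most once
    return pairs


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (statistic, p_value). With 1 degree of freedom the upper-tail
    probability equals erfc(sqrt(statistic / 2)).
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    stat = n * (a * d - b * c) ** 2 / denom
    return stat, math.erfc(math.sqrt(stat / 2))


# Hypothetical propensity scores.
recurrent = {"R1": 0.62, "R2": 0.35, "R3": 0.80}
de_novo = {"D1": 0.30, "D2": 0.60, "D3": 0.78, "D4": 0.10, "D5": 0.55}

pairs = greedy_match(recurrent, de_novo)
print(pairs)

# Balance check for one binary covariate after matching, e.g. counts of
# female/male patients in each matched group (made-up numbers).
stat, p_value = chi_square_2x2(10, 10, 10, 10)
print(stat, p_value)
```

A large p-value in the balance check is consistent with the covariate being similarly distributed in the two matched groups; the same check would be repeated for every covariate in the propensity model.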

Single imputation
Single imputation by chained equations was performed for missing data, as the rate of missingness ranged from 2.2% to 23.3%, using the full set of variables curated through manual chart review. In addition to the variables listed in Supplementary Table S2, the other variables included for single imputation were: presence of smoking history, method of detection for initial primary lung cancer (LDCT screening, symptom-based, or incidental), index tumor size at initial diagnosis, presence of RET fusion mutation, presence of NTRK3 fusion mutation, presence of ROS1 fusion mutation, presence of ERBB2 mutation, presence of ALK mutation, presence of MET mutation, presence of BRAF mutation, treatment received for initial lung cancer (for patients with distant recurrence), distant metastasis to other sites not listed in Table S2, and treatment received for advanced disease not listed in Table S2.
ᵃ Sample size excludes patients with regional recurrence (N=39) from the entire cohort (N=660).

eFigure 2. Distribution of Follow-Up Times (i.e., Time From the Diagnosis of Metastatic Disease to Death or Censoring) in the Overall NLST Cohort and by Metastatic Disease Type
The overall cohort (N=660) has a median follow-up time of 0.5 years (IQR: 0.2-1.3), in Panel A. By metastatic disease type, the median follow-up time is 0.5 years (IQR: 0.2-1.0) for patients with de novo metastatic disease, 0.7 years (IQR: 0.3-1.6) for patients with distant recurrence, and 0.7 years (IQR: 0.3-2.1) for patients with regional recurrence, in Panel B.

eFigure 3. Difference in Cumulative Mortality Between Patients With Distant Recurrence and Patients With De Novo Metastatic Disease Among Those With Non-Small Cell Lung Cancer of the NLST Cohort (N = 621)ᵃ, in A, and the SHC Cohort (N = 180), in B

eFigure 4. Sensitivity Analysis to Evaluate Overall Survival¹ Difference Between Patients With De Novo Metastatic Disease Versus Patients With Distant Recurrence Who Received Curative-Intent Therapy During Initial LC Diagnosis
¹ Overall survival was calculated from the time of first distant metastatic disease.
* P-value was calculated based on hazard ratio and confidence interval adjusted for age, sex, race, histology of primary lung cancer, smoking status, and screening modality.

eFigure 5. Sensitivity Analysis Using Propensity Score Matching to Evaluate Overall Survival¹ Difference Between Patients With Distant Recurrence vs De Novo Metastatic Disease in NLST (N=434)
Survival Difference Between Patients With De Novo Metastases (N=217) and Patients With Distant Recurrence (N=217)
¹ Overall survival was calculated from the time of first distant metastatic disease.
* P-value was calculated based on hazard ratio and confidence interval adjusted for age, sex, race, histology of primary lung cancer, smoking status, and screening modality.

eTable 4. Sensitivity Analysis Using Propensity Score Matching (n=434) to Evaluate the Association Between Type of Metastatic Disease (Distant Recurrent vs De Novo) and Overall Survival Using an Unadjusted Cox Proportional Hazards Model
The characteristics of the propensity score-matched patients are shown in Table S2, and the relevant methods are described in Supplementary Methods 2.

eFigure 6. Sensitivity Analysis to Evaluate Overall Survival¹ Difference Between Patients With Any Recurrence (Distant or Regional Recurrent) vs Patients With De Novo Metastatic Disease in the Overall NLST Cohort (N=660**)
Survival Difference Between Patients With De Novo Metastases (N=392) and Patients With Any Recurrence (N=268)

eFigure 7. Distribution of Follow-Up Times (ie, Time From the Diagnosis of Metastatic Disease to Death or Censoring) in Overall SHC Validation Cohort and by Metastatic Disease Type
The overall validation cohort (N=180) has a median follow-up time of 1.3 years (IQR: 0.4-2.9), in Panel A. By metastatic disease type, the median follow-up time is 1.0 years (IQR: 0.4-2.6) for patients with de novo metastatic disease and 2.1 years (IQR: 1.0-3.1) for patients with distant recurrence, in Panel B.

eTable 1. Characteristics of the SHC Validation Cohort Grouped by Type of Metastatic Disease
eTable 2. Characteristics Post-Propensity Score Matching Grouped by Type of Metastatic Disease in the NLST Cohort
eFigure 2. Distribution of Follow-Up Times
eFigure 3. Difference in Cumulative Mortality Between Patients With Distant Recurrence and Patients With De Novo Metastatic Disease Among Those With Non-Small Cell Lung Cancer of the NLST Cohort
eFigure 4. Sensitivity Analysis to Evaluate Overall Survival¹ Difference Between Patients With De Novo Metastatic Disease Versus Patients With Distant Recurrence Who Received Curative-Intent Therapy During Initial LC Diagnosis
eTable 3. Sensitivity Analysis Using Multivariable Cox Regression to Evaluate Association Between Type of Metastatic Disease and Overall Survival Comparing Patients With De Novo Stage IV Disease to Subgroup of Distant Recurrent Patients Who Received Curative-Intent Therapy (ie, Surgery or Radiation) in the NLST Primary Cohort and in the SHC Cohort
eFigure 5. Sensitivity Analysis Using Propensity Score Matching to Evaluate Overall Survival¹ Difference Between Patients With Distant Recurrence vs De Novo Metastatic Disease in NLST
eTable 4. Sensitivity Analysis Using Propensity Score Matching to Evaluate the Association Between Type of Metastatic Disease (Distant Recurrent vs De Novo) and Overall Survival Using an Unadjusted Cox Proportional Hazards Model
eFigure 6. Sensitivity Analysis to Evaluate Overall Survival¹ Difference Between Patients With Any Recurrence (Distant or Regional Recurrent) vs Patients With De Novo Metastatic Disease in the Overall NLST Cohort
eTable 5. Sensitivity Analysis to Evaluate the Association Between Any Recurrent (i.e., Regional or Distant Recurrence) vs De Novo Metastatic Disease and Overall Survival in the NLST Cohort
eFigure 7. Distribution of Follow-Up Times (ie, Time From the Diagnosis of Metastatic Disease to Death or Censoring) in Overall SHC Validation Cohort and by Metastatic Disease Type
eTable 6. Multivariable Cox Regression to Evaluate Association Between Type of Metastatic Disease (De Novo vs Distant Recurrent) and Overall Survival in the Validation SHC Cohort, Among Patients Who Received Curative Therapy at Initial LC Diagnosis, Additionally Adjusting for Variables That Were Associated With Type of Metastatic Disease in Univariate Analysis

Abbreviations: SHC, Stanford Healthcare; IQR, interquartile range; SD, standard deviation; ECOG, Eastern Cooperative Oncology Group.

Data Curation for Stanford Healthcare Cohort
Data were manually curated by a senior graduate student in collaboration with a medical oncologist. A total of 200 patients were randomly selected from the initial cohort of 547 patients with metastatic disease and molecular profiling at SHC. Patients were defined as having "De novo Stage IV" metastatic disease if they were diagnosed at Stage IV, and as having "Distant Recurrent" metastatic disease if they had recurrence to pleura, contralateral lung, skin, adrenal, bone, liver, brain, and/or any other distant organs outside the chest at the time of diagnosis of distant metastatic disease. All data were entered manually in REDCap. The Stanford University IRB approved this study, and a waiver of informed consent was received as the study presented minimal risk. ECOG performance status was obtained from the value recorded in an oncology progress note at the time of distant metastatic disease. This variable was recoded into a binary variable of good versus poor performance status, where ECOG 0-1 was considered good performance status and ECOG 2-4 was considered poor performance status.
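The ECOG recoding described above amounts to a simple threshold rule. A minimal sketch follows; the function name and the choice to pass missing values through unchanged are assumptions made for illustration.

```python
def ecog_group(ecog):
    """Map an ECOG performance status (0-4) to 'good' (0-1) or 'poor' (2-4)."""
    if ecog is None:
        return None  # assumption: a missing value stays missing
    if ecog not in (0, 1, 2, 3, 4):
        raise ValueError(f"ECOG outside the 0-4 range used here: {ecog!r}")
    return "good" if ecog <= 1 else "poor"


# Recode a handful of hypothetical recorded values.
recorded = [0, 1, 2, 4, None]
recoded = [ecog_group(v) for v in recorded]
print(recoded)
```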
Method of detection for initial primary lung cancer was curated based on clinical history in progress notes. It is defined as 'surveillance' if detected on a CT scan for routine follow-up or for assessing treatment response for those who eventually developed distant recurrence, or if detected on low-dose CT screening for those who eventually developed de novo disease; 'incidental' if detected on a CT scan not indicated for cancer work-up; and 'symptom-based' if detected on an interval CT scan/LDCT screening performed primarily because of symptomatic complaints.
eTable 2. Characteristics Post-Propensity Score Matching Grouped by Type of Metastatic Disease* in the NLST Cohort
* De novo Stage IV is defined as patients who were initially diagnosed at Stage IV with metastatic lung cancer; Recurrent is defined as patients who developed metastatic disease after an initial diagnosis of Stage I-III lung cancer.
† Classification of 'other' race includes American Indian or Alaskan Native, Native Hawaiian or Other Pacific Islander, and more than one race.
‡ Classification of 'Not married/Not living together' includes never married, divorced, separated, and widowed.
§ Classification of 'Non-small cell' is based on ICD-O-3 code 8046, referring to non-small cell carcinoma not further specified to be adenocarcinoma, squamous cell, or one of the other more specific non-small cell categories.
†† Smoking status is the status at the time of randomization in NLST.

Table S2, and the relevant methods are described in Supplementary Methods 2. De novo Stage IV is defined as patients who were initially diagnosed at Stage IV with metastatic lung cancer; Recurrent is defined as patients who developed metastatic disease after an initial diagnosis of Stage I-III lung cancer.