A, Predicted probability of lung cancer according to the smoking risk prediction model based on age in years and smoking history. The rug plot shows the observed distribution of age in the validation study (European Prospective Investigation into Cancer and Nutrition [EPIC] and Northern Sweden Health and Disease Study [NSHDS], ever smokers). B, Predicted probability of lung cancer according to the integrated risk prediction model based on the biomarker score and the smoking history. The rug plot shows the observed distribution of the biomarker score in the validation study (EPIC and NSHDS, ever smokers). The vertical lines correspond to the quartile thresholds for the biomarker score among controls (Q1, Q2, Q3, and Q4).
The validation samples consist of EPIC and NSHDS ever-smoking participants who received a diagnosis of lung cancer within 1 year after blood collection. For the controls, the size of the points is proportional to the number of eligible participants represented (corresponding to the inverse of the sampling probability). The right panel represents a magnified excerpt of the full figure.
A, ROC curve analysis in the validation study (EPIC and NSHDS ever-smoker participants who received a diagnosis of lung cancer within 1 year after blood collection) for 2 risk prediction models: a model that used smoking variables only (smoking) and an integrated model with the smoking variables and the biomarker score combined (smoking + biomarkers). AUC indicates area under the curve; USPSTF, US Preventive Services Task Force. The horizontal dashed line indicates sensitivity and the vertical dashed line, specificity. B, Sensitivity and specificity in relation to the probability of lung cancer within 1 year predicted by the integrated model.
eMethods. Supplementary Methods
eResults. Supplementary Results
eTables 1 through 10. Supplementary Tables
eFigures 1 through 10. Supplementary Figures
Integrative Analysis of Lung Cancer Etiology and Risk (INTEGRAL) Consortium for Early Detection of Lung Cancer. Assessment of Lung Cancer Risk on the Basis of a Biomarker Panel of Circulating Proteins. JAMA Oncol. 2018;4(10):e182078. doi:10.1001/jamaoncol.2018.2078
Can a risk prediction model based on circulating protein biomarkers improve on a traditional risk prediction model for lung cancer and the current US screening criteria?
In a validation study of 63 ever-smoking patients with lung cancer and 90 matched controls, a biomarker-based risk prediction model consisting of 4 protein markers that was developed in a cohort of US individuals at high risk of lung cancer outperformed a model based on smoking history alone when blindly validated using prediagnostic samples from 2 European cohorts.
Biomarker-based risk profiling has the potential to improve eligibility criteria for lung cancer screening.
Importance
There is an urgent need to improve lung cancer risk assessment because current screening criteria miss a large proportion of cases.
Objective
To investigate whether a lung cancer risk prediction model based on a panel of selected circulating protein biomarkers can outperform a traditional risk prediction model and current US screening criteria.
Design, Setting, and Participants
Prediagnostic samples from 108 ever-smoking patients with lung cancer diagnosed within 1 year after blood collection and samples from 216 smoking-matched controls from the Carotene and Retinol Efficacy Trial (CARET) cohort were used to develop a biomarker risk score based on 4 proteins (cancer antigen 125 [CA125], carcinoembryonic antigen [CEA], cytokeratin-19 fragment [CYFRA 21-1], and the precursor form of surfactant protein B [Pro-SFTPB]). The biomarker score was subsequently validated blindly using absolute risk estimates among 63 ever-smoking patients with lung cancer diagnosed within 1 year after blood collection and 90 matched controls from 2 large European population-based cohorts, the European Prospective Investigation into Cancer and Nutrition (EPIC) and the Northern Sweden Health and Disease Study (NSHDS).
Main Outcomes and Measures
Model validity in discriminating between future lung cancer cases and controls. Discrimination estimates (area under the receiver operating characteristic curve [AUC], sensitivity, and specificity) were weighted to reflect the background populations of the EPIC and NSHDS validation studies.
Results
In the validation study of 63 ever-smoking patients with lung cancer and 90 matched controls (mean [SD] age, 57.7 [8.7] years; 68.6% men) from EPIC and NSHDS, an integrated risk prediction model that combined smoking exposure with the biomarker score yielded an AUC of 0.83 (95% CI, 0.76-0.90) compared with 0.73 (95% CI, 0.64-0.82) for a model based on smoking exposure alone (P = .003 for difference in AUC). At an overall specificity of 0.83, based on the US Preventive Services Task Force screening criteria, the sensitivity of the integrated risk prediction (biomarker) model was 0.63 compared with 0.43 for the smoking model. Conversely, at an overall sensitivity of 0.42, based on the US Preventive Services Task Force screening criteria, the integrated risk prediction model yielded a specificity of 0.95 compared with 0.86 for the smoking model.
Conclusions and Relevance
This study provided a proof of principle in showing that a panel of circulating protein biomarkers may improve lung cancer risk assessment and may be used to define eligibility for computed tomography screening.
The National Lung Screening Trial (NLST) findings suggested that screening with low-dose computed tomography (LDCT) can reduce lung cancer mortality.1 As a result, the US Preventive Services Task Force (USPSTF) recommends LDCT screening for lung cancer among individuals aged 55 to 80 years who have smoked 30 pack-years with up to 15 years since quitting smoking.1,2 However, LDCT screening results in a large number of indeterminate nodules,1 and less than 50% of incident lung cancer cases are among individuals who are eligible for screening.3 Biomarkers may improve lung cancer risk assessment over and beyond traditional smoking-based risk models and improve current screening eligibility criteria.4,5
Previous studies have shown that the precursor form of surfactant protein B (Pro-SFTPB) is predictive of lung cancer risk.5,6 Other markers that have been shown to be useful for the workup and diagnosis of lung cancer include cancer antigen 125 (CA125), cytokeratin-19 fragment (CYFRA 21-1), carcinoembryonic antigen (CEA), and human epididymis protein 4 (HE4).7-12 However, there are limited data regarding the performance of these markers in discriminating between future lung cancer cases and controls.
This study aimed to assess the potential of these 5 protein biomarkers to inform about lung cancer risk when tested blindly using prediagnostic samples.
A full account of the methods is provided in the eMethods in the Supplement. In brief, samples obtained from ever-smoking patients with lung cancer (cases) diagnosed within 1 year after blood collection (n = 108) and smoking-matched controls (n = 216) from the US Carotene and Retinol Efficacy Trial (CARET) cohort were used to develop a biomarker score based on circulating measures of Pro-SFTPB, CA125, CEA, HE4, and CYFRA 21-1 using logistic regression. All study participants gave written informed consent to participate in the study, and the research was approved by the institutional review boards of all of the participating institutions.
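As an illustration only, a score built by logistic regression can be read as a linear predictor over the marker measurements. The sketch below uses the 4 proteins named in the abstract; the intercept, the coefficients, the log transformation, and the marker key names are all hypothetical placeholders, not values or choices reported in this article. The fitted model is described in the eMethods.

```python
import math

# HYPOTHETICAL intercept and coefficients, chosen only for illustration.
# They are NOT the fitted values from the CARET training study.
INTERCEPT = -1.0
COEFFICIENTS = {"CA125": 0.5, "CEA": 0.4, "CYFRA21_1": 0.6, "ProSFTPB": 0.3}


def biomarker_score(concentrations):
    """Return the linear predictor of a logistic model.

    Assumes (hypothetically) that each concentration is log-transformed
    before being multiplied by its coefficient.
    """
    return INTERCEPT + sum(
        beta * math.log(concentrations[name])
        for name, beta in COEFFICIENTS.items()
    )


def logistic(x):
    """Map a linear predictor to the (0, 1) scale."""
    return 1.0 / (1.0 + math.exp(-x))


# Toy participant with all concentrations equal to 1 (so every log term is 0).
example = {"CA125": 1.0, "CEA": 1.0, "CYFRA21_1": 1.0, "ProSFTPB": 1.0}
score = biomarker_score(example)
```

In a matched case-control sample, the logistic transform of such a score is not an absolute risk, which is one reason the validation step estimated 1-year risks with survival models instead.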
The extent to which the biomarker score improved discrimination of incident lung cancer cases and controls was validated externally using ever-smoking patients with lung cancer (cases) diagnosed within 1 year after blood collection (n = 63) and matched controls (n = 90) from the European Prospective Investigation into Cancer and Nutrition (EPIC) study and the Northern Sweden Health and Disease Study (NSHDS) (eFigure 1 in the Supplement). Absolute 1-year risks of lung cancer were estimated for each study participant in the validation study by modeling the cumulative hazards of lung cancer using flexible parametric survival models.13 Two models were evaluated: a traditional smoking history–based risk model and an integrated risk prediction model that combined the smoking model and the biomarker score. Model discrimination was assessed by receiver operating characteristic (ROC) analysis using the predicted 1-year lung cancer risks as the scoring rule. Discrimination estimates included area under the ROC curve (AUC), sensitivity, and specificity, which were weighted to reflect the background populations. In the context of using the 1-year absolute risk of lung cancer to define screening eligibility, the sensitivity provides an estimate of the fraction of future lung cancer cases that would be eligible for screening at a certain absolute risk threshold. Conversely, the specificity provides an estimate of the fraction of individuals from the background population who remain healthy and would not be eligible for screening. A sensitivity of 1.00 (or 100%) would indicate that all lung cancer cases are eligible for screening and a specificity of 1.00 (100%) would indicate that all individuals who remain healthy are not eligible for screening (ie, that there are no false-positive controls). Statistical significance was assumed at a 2-sided P < .05.
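The weighted discrimination estimates described above can be sketched as follows. This is a minimal illustration, assuming each participant carries a weight equal to the inverse of the sampling probability; the function names and the toy data are hypothetical and are not taken from the study.

```python
def weighted_sens_spec(risks, is_case, weights, threshold):
    """Weighted sensitivity and specificity.

    A participant is treated as screening-eligible when the predicted
    risk is at or above the threshold.
    """
    rows = list(zip(risks, is_case, weights))
    case_total = sum(w for _, y, w in rows if y)
    control_total = sum(w for _, y, w in rows if not y)
    true_pos = sum(w for r, y, w in rows if y and r >= threshold)
    true_neg = sum(w for r, y, w in rows if not y and r < threshold)
    return true_pos / case_total, true_neg / control_total


def weighted_auc(risks, is_case, weights):
    """Weighted probability that a case outranks a control (ties count half)."""
    cases = [(r, w) for r, y, w in zip(risks, is_case, weights) if y]
    controls = [(r, w) for r, y, w in zip(risks, is_case, weights) if not y]
    concordant = 0.0
    total = 0.0
    for case_risk, case_w in cases:
        for control_risk, control_w in controls:
            pair_w = case_w * control_w
            total += pair_w
            if case_risk > control_risk:
                concordant += pair_w
            elif case_risk == control_risk:
                concordant += 0.5 * pair_w
    return concordant / total


# Toy data (hypothetical): 3 cases with weight 1 and 3 controls whose
# weights stand for the number of cohort members each one represents.
risks = [0.020, 0.004, 0.001, 0.003, 0.0005, 0.006]
is_case = [True, True, True, False, False, False]
weights = [1, 1, 1, 50, 200, 10]

sens, spec = weighted_sens_spec(risks, is_case, weights, threshold=0.0035)
auc_weighted = weighted_auc(risks, is_case, weights)
auc_naive = weighted_auc(risks, is_case, [1] * len(risks))
```

In this toy example the weighted and unweighted AUCs differ because the high-risk control represents few cohort members, which illustrates why estimates from a matched sample may not carry over to the background population without weighting.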
Details of the biomarker score and discrimination estimates in the CARET training study are available in eTables 1 and 2 and eFigures 2 and 3 in the Supplement. In the validation study of 63 ever-smoking patients with lung cancer and 90 matched controls (mean [SD] age, 57.7 [8.7] years; 68.6% men) from EPIC and NSHDS, the predicted risk of receiving a diagnosis of lung cancer within 1 year for a 60-year-old man with 30 pack-years of smoking history was estimated at 0.37% using the smoking model (Figure 1). In comparison, using the integrated risk prediction model, we estimated 1-year risks at 0.07% and 1.56% for the same man assuming a biomarker score equal to the average of the first and fourth quartile, respectively. The 1-year lung cancer risk estimates for each study participant in the validation study according to the smoking and integrated risk prediction models are shown in Figure 2. In comparison with the smoking model, the median 1-year risk estimates from the integrated risk prediction model increased for cases from 0.27% (interquartile range [IQR], 0.14%-0.50%) to 0.45% (IQR, 0.18%-1.5%) and decreased for controls from 0.12% (IQR, 0.05%-0.21%) to 0.04% (IQR, 0.015%-0.17%).
In the validation study, the population-weighted AUC was 0.73 (95% CI, 0.64-0.82) for the smoking model and 0.83 (95% CI, 0.76-0.90) for the integrated risk prediction model (P = .003 for difference in AUC) (Figure 3A). The AUCs were consistently higher for the integrated model than for the smoking model across relevant strata (eTable 3 in the Supplement). At an overall specificity of 0.83 based on the USPSTF screening criteria, the integrated risk prediction model yielded a sensitivity of 0.63 (95% CI, 0.49-0.76) compared with 0.43 (95% CI, 0.23-0.65) for the smoking model. Similarly, at an overall sensitivity of 0.42 (USPSTF), the integrated risk prediction model yielded a specificity of 0.95 (95% CI, 0.85-0.99) compared with 0.86 (95% CI, 0.72-0.94) for the smoking model. The improvement in AUC for the integrated risk prediction model (AUC, 0.80; 95% CI, 0.75-0.85) over the smoking model (AUC, 0.73; 95% CI, 0.68-0.79) was more modest when cases diagnosed up to 2 years after blood draw were considered (eFigure 4 in the Supplement). A full account of all conducted analyses is provided in the eResults; eTables 1, 2, and 4 to 10; and eFigures 6 to 10 of the Supplement.
This is, to our knowledge, the first study in which a blood-based biomarker score was developed using one cohort and externally validated using prediagnostic samples from other independent cohorts. We observed a notable improvement in discrimination between future lung cancer cases and controls over a traditional smoking-based risk prediction model by incorporating information from a biomarker score consisting of 4 circulating proteins.
In our validation study, 26 of the 62 incident lung cancer cases (42%, corresponding to a sensitivity of 0.42) would have qualified for LDCT screening according to USPSTF criteria (USPSTF eligibility criteria could not be assessed for 1 case). Using the biomarker score together with smoking information, we estimated that 40 of 63 cases (63%, corresponding to a sensitivity of 0.63) could be identified without increasing the number of eligible controls (ie, without decreasing the specificity). The data further suggested that the biomarker score could alternatively be used to reduce screening of individuals not destined to develop lung cancer (false positives) from 15 of 90 controls (17%) to 4 of 90 controls (5%) without affecting the uptake of future lung cancer cases (sensitivity). These improvements in sensitivity and specificity were consistently observed across each evaluated stratum. Our findings also indicated that the improvement in discrimination afforded by the biomarker score is more modest beyond the initial year after blood draw, which suggests that an annual biomarker test may be necessary in a screening program.
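The proportions quoted above follow directly from the stated counts, as the short check below shows. The variable and function names are illustrative only.

```python
def fraction(count, total):
    """Simple proportion of count out of total."""
    return count / total


# Sensitivity: share of future cases that would be screening-eligible.
sens_uspstf = fraction(26, 62)       # USPSTF criteria (1 case not assessable)
sens_integrated = fraction(40, 63)   # integrated model at the same specificity

# False-positive fraction: share of controls that would be screening-eligible.
fpf_uspstf = fraction(15, 90)
fpf_integrated = fraction(4, 90)

# Specificity is the complement of the false-positive fraction.
spec_uspstf = 1 - fpf_uspstf
spec_integrated = 1 - fpf_integrated
```

The raw counts reproduce the reported sensitivities of 0.42 and 0.63 and the USPSTF specificity of 0.83. The raw fraction 4 of 90 is about 4.4% (specificity about 0.96), slightly different from the reported 5% and 0.95; this may reflect the population weighting of the published estimates, which raw counts cannot reproduce.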
Naive discrimination estimates, as typically provided in a matched, nested, case-control setting, are inherently biased. An important strength of our study was the use of absolute risks and population-based discrimination estimates, which were necessary to estimate the number of individuals who would be selected for screening using the biomarker-based eligibility criterion in the overall background cohorts, beyond our specific case-control study.
A limitation of our study was that 3 variables that were originally included in a validated risk prediction model (the Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial model from 2012 [PLCOM2012]) were not available in our validation studies.14 However, with use of the original PLCO data, the exclusion of these variables from the PLCOM2012 model only nominally decreased the model’s performance, which suggests that our risk prediction model represented a valid comparison for the biomarker score (eMethods and eFigure 5 in the Supplement).14
Although this study provided a proof of principle of the potential of using biomarkers in lung cancer risk assessment to define screening eligibility, the integrated risk prediction model clearly needs to be validated and calibrated in a larger set of prediagnostic samples before such a risk prediction tool can be used in practice. A larger sample size will also allow stratified analysis to evaluate the performance of the biomarker panel in predicting lung cancer cases associated with different characteristics, such as stage at diagnosis and histologic subtype. Furthermore, our study was limited to a select panel of circulating proteins, and we note that other types of biomarkers may also be informative.4,5 We also note that the population that would most benefit from a biomarker test before undergoing LDCT screening remains to be defined. A thorough cost-effectiveness assessment based on a large study sample is warranted to determine the threshold in absolute risk of developing lung cancer during a specific period, above which the benefits of screening outweigh the harms.15
This study provides a proof of principle in demonstrating that circulating biomarkers have the potential to inform lung cancer risk assessment and substantially improve on current criteria for LDCT screening.
Accepted for Publication: April 10, 2018.
Published Online: July 12, 2018. doi:10.1001/jamaoncol.2018.2078
Correction: This article was corrected on September 13, 2018, to correct the author surname for Elisabete Weiderpass and on November 14, 2019, to add an omitted patent filing by Drs Taguchi, Feng, and Hanash.
Corresponding Authors: Samir M. Hanash, MD, PhD, University of Texas MD Anderson Cancer Center, 6767 Bertner Ave, Houston, TX 77030 (firstname.lastname@example.org); Mattias Johansson, PhD, International Agency for Research on Cancer, 150 Cours Albert Thomas, 69372 Lyon CEDEX 08, France (email@example.com).
Integrative Analysis of Lung Cancer Etiology and Risk (INTEGRAL) Consortium for Early Detection of Lung Cancer Group Authors: The following investigators take authorship responsibility for the study results: Florence Guida, PhD; Nan Sun, PhD; Leonidas E. Bantis, PhD; David C. Muller, PhD; Peng Li, PhD; Ayumu Taguchi, MD, PhD; Dilsher Dhillon, MS; Deepali L. Kundnani, MS; Nikul J. Patel, MS; Qingxiang Yan, PhD; Graham Byrnes, PhD; Karel G. M. Moons, PhD; Anne Tjønneland, MD, PhD; Salvatore Panico, MD, MS; Claudia Agnoli, PhD; Paolo Vineis, MD, MPH, FFPH; Domenico Palli, MD; Bas Bueno-de-Mesquita, MD, MPH, PhD; Petra H. Peeters, MD, PhD; Antonio Agudo, MD, PhD; Jose M. Huerta, PhD; Miren Dorronsoro, MD; Miguel Rodriguez Barranco, PhD; Eva Ardanaz, MD, PhD; Ruth C. Travis, DPhil; Karl Smith Byrne, DPhil; Heiner Boeing, PhD; Annika Steffen, PhD; Rudolf Kaaks, PhD; Anika Hüsing, MS; Antonia Trichopoulou, PhD; Pagona Lagiou, MD; Carlo La Vecchia, MD; Gianluca Severi, PhD; Marie-Christine Boutron-Ruault, PhD; Torkjel M. Sandanger, PhD; Elisabete Weiderpass, MD, PhD; Therese H. Nøst, PhD; Kostas Tsilidis, PhD; Elio Riboli, MD, MPH, MSc; Kjell Grankvist, MD, PhD; Mikael Johansson, PhD; Gary E. Goodman, MD, MS; Ziding Feng, PhD; Paul Brennan, PhD; Mattias Johansson, PhD; Samir M. Hanash, MD, PhD.
Affiliations of Integrative Analysis of Lung Cancer Etiology and Risk (INTEGRAL) Consortium for Early Detection of Lung Cancer Group Authors: Genetic Epidemiology Group, International Agency for Research on Cancer, Lyon, France (Guida, Li, Brennan, Mattias Johansson); Department of Clinical Cancer Prevention, The University of Texas MD Anderson Cancer Center, Houston (Sun, Dhillon, Kundnani, Patel, Hanash); Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston (Bantis, Yan, Feng); Department of Epidemiology and Biostatistics, Imperial College London School of Public Health, London, United Kingdom (Muller, Vineis, Bueno-de-Mesquita, Tsilidis, Riboli); Laboratory of Population Health, Max Planck Institute for Demographic Research, Rostock, Germany (Li); Department of Translational Molecular Pathology, The University of Texas MD Anderson Cancer Center, Houston (Taguchi); Environment and Radiation Section, International Agency for Research on Cancer, Lyon, France (Byrnes); Department of Epidemiology, Julius Center for Health Sciences and Primary Care, University Medical Center, Utrecht, Netherlands (Moons, Peeters); Unit of Diet, Genes, and Environment, Danish Cancer Society Research Center, Copenhagen (Tjønneland); Department of Clinical Medicine and Surgery, Federico II University, Naples, Italy (Panico); Epidemiology and Prevention Unit, Fondazione IRCCS Istituto Nazionale dei Tumori, Milan, Italy (Agnoli); Molecular and Genetic Epidemiology Unit, Human Genetics Foundation, Torino, Italy (Vineis, Severi); Cancer Risk Factors and Life-Style Epidemiology Unit, Cancer Research and Prevention Institute–Istituto per lo Studio e la Prevenzione Oncologica, Florence, Italy (Palli); Department for Determinants of Chronic Diseases, National Institute for Public Health and the Environment, Bilthoven, Netherlands (Bueno-de-Mesquita); Unit of Nutrition and Cancer, Cancer Epidemiology Research Program, Catalan Institute of Oncology, Bellvitge Institute for Biomedical Research (IDIBELL), L'Hospitalet de Llobregat, Barcelona, Spain (Agudo); Department of Epidemiology, Murcia Regional Health Council, Biomedical Research Institute of Murcia (IMIB-Arrixaca), Murcia, Spain (Huerta); Centro de Investigación Biomédica en Red Epidemiología y Salud Pública (CIBERESP), Madrid, Spain (Huerta, Barranco, Ardanaz); Public Health Direction and Biodonostia Research Institute–CIBERESP, San Sebastian, Spain (Dorronsoro); Escuela Andaluza de Salud Pública, Instituto de Investigación Biosanitaria, Granada, Spain (Barranco); Hospitales Universitarios de Granada/Universidad de Granada, Granada, Spain (Barranco); Epidemiology, Prevention, and Promotion Health Service, Navarra Public Health Institute, Pamplona, Spain (Ardanaz); Instituto de Investigación Sanitaria de Navarra (IdiSNA), Navarra Institute for Health Research, Pamplona, Spain (Ardanaz); Cancer Epidemiology Unit, Nuffield Department of Population Health, University of Oxford, Oxford, United Kingdom (Travis, Byrne); Department of Epidemiology, German Institute of Human Nutrition, Potsdam-Rehbruecke (Boeing, Steffen); Division of Cancer Epidemiology, German Cancer Research Center (DKFZ), Heidelberg (Kaaks, Hüsing); Hellenic Health Foundation, Athens, Greece (Trichopoulou, Lagiou, La Vecchia); World Health Organization Collaborating Center for Nutrition and Health, Unit of Nutritional Epidemiology and Nutrition in Public Health, Department of Hygiene, Epidemiology and Medical Statistics, School of Medicine, National and Kapodistrian University of Athens, Athens, Greece (Trichopoulou, Lagiou); Department of Epidemiology, Harvard School of Public Health, Boston, Massachusetts (Lagiou); Department of Clinical Sciences and Community Health, Università degli Studi di Milano, Milano, Italy (La Vecchia); Université Paris-Saclay, Université Paris-Sud, Université de Versailles Saint-Quentin-en-Yvelines, Centre de Recherche en Epidémiologie et Santé des Populations, National Institute for Health and Medical Research (INSERM), Villejuif, France (Severi, Boutron-Ruault); Department of Community Medicine, University of Tromsø, Arctic University of Norway, Tromsø (Sandanger, Weiderpass, Nøst); Department of Research, Cancer Registry of Norway, Institute of Population-Based Cancer Research, Oslo, Norway (Weiderpass); Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden (Weiderpass); Genetic Epidemiology Group, Folkhälsan Research Center, Helsinki, Finland (Weiderpass); Department of Hygiene and Epidemiology, School of Medicine, University of Ioannina, Ioannina, Greece (Tsilidis); Department of Medical Biosciences, Clinical Chemistry, Umeå University, Umeå, Sweden (Grankvist); Department of Radiation Sciences, Oncology, Umeå University, Umeå, Sweden (Mikael Johansson); Public Health Sciences Division, Program in Epidemiology, Fred Hutchinson Cancer Research Center, Seattle, Washington (Goodman).
Author Contributions: Drs Guida, Sun, Bantis, and Muller contributed equally to this study. Dr Hanash had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Concept and design: Feng, Brennan, Mattias Johansson, Hanash.
Acquisition, analysis, or interpretation of data: All authors.
Drafting of the manuscript: Guida, Sun, Bantis, Muller, Feng, Brennan, Mattias Johansson, Hanash.
Critical revision of the manuscript for important intellectual content: Guida, Sun, Bantis, Muller, Li, Taguchi, Dhillon, Kundnani, Patel, Yan, Byrnes, Moons, Tjønneland, Panico, Agnoli, Vineis, Palli, Bueno-de-Mesquita, Peeters, Agudo, Huerta, Dorronsoro, Rodriguez Barranco, Ardanaz, Travis, Smith Byrne, Boeing, Steffen, Kaaks, Hüsing, Trichopoulou, Lagiou, La Vecchia, Severi, Boutron-Ruault, Sandanger, Weiderpass, Nøst, Tsilidis, Riboli, Grankvist, Mikael Johansson, Goodman, Mattias Johansson, Hanash.
Statistical analysis: Guida, Bantis, Muller, Li.
Obtained funding: Mattias Johansson, Hanash.
Administrative, technical, or material support: Dhillon, Kundnani, Patel, Grankvist, Mikael Johansson, Mattias Johansson, Hanash.
Supervision: Feng, Brennan, Mattias Johansson, Hanash.
Conflict of Interest Disclosures: Drs Taguchi, Feng, and Hanash report the filing of a patent, Methods for the Detection and Treatment of Lung Cancer (WO2018148600), based on the data included in this article. No other disclosures were reported.
Funding/Support: This study was supported by grants 1U19CA203654 and UO1194733 from the National Cancer Institute and the National Cancer Institute Early Detection Research Network, grant INCa_ARC_10450 from Fondation ARC pour la recherche sur le cancer and INCa, the MD Anderson Lung Cancer Moon Shot Program and the Lyda Hill Foundation, the Canary Foundation, the Lungevity Foundation, and the S. Rubenstein Family Foundation. The EPIC study has been supported by the Europe Against Cancer Program of the European Commission (SANCO); Deutsche Krebshilfe; Deutsches Krebsforschungszentrum; German Federal Ministry of Education and Research; Danish Cancer Society; Health Research Fund (FIS) of the Spanish Ministry of Health; Spanish Regional Governments of Andalucia, Asturias, Basque Country, Murcia, and Navarra; Catalan Institute of Oncology, Spain; grant RETICC DR06/0020 from the ISCIII of the Spanish Ministry of Health; Cancer Research UK; Medical Research Council, United Kingdom; Greek Ministry of Health; Stavros Niarchos Foundation; Hellenic Health Foundation; Italian Association for Research on Cancer (AIRC); Italian National Research Council; Fondazione-Istituto Banco Napoli, Italy; Associazione Italiana per la Ricerca sul Cancro–AIRC-Milan; Compagnia di San Paolo; Dutch Ministry of Public Health, Welfare, and Sports; World Cancer Research Fund; Swedish Cancer Society; Swedish Scientific Council; Regional Government of Västerbotten, Sweden; NordForsk (Centre of excellence programme HELGA), Norway; French League against Cancer (LNCC), France; National Institute for Health and Medical Research (INSERM), France; Mutuelle Générale de l’Education Nationale (MGEN), France; 3M Co, France; Gustave Roussy Institute (IGR), France; and General Councils of France.
Role of the Funder/Sponsor: The funding organizations had no role in design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.
Additional Contributions: We thank the study participants for their contribution.