Evidence reviews for the US Preventive Services Task Force (USPSTF) use an analytic framework to visually display the key questions that the review will address to allow the USPSTF to evaluate the effectiveness and safety of a preventive service. The questions are depicted by linkages that relate interventions and outcomes. Additional details are provided in the USPSTF Procedure Manual.10 I-ELCAP indicates International Early Lung Cancer Action Program; NLST, National Lung Screening Trial; and SBRT, stereotactic body radiotherapy.
aThe evaluation of evidence on treatment was limited to studies of surgical resection or SBRT for stage I non–small cell lung cancer.
ICTRP indicates International Clinical Trials Registry Platform; KQ, key question; SABR, stereotactic ablative radiation; SBRT, stereotactic body radiotherapy; USPSTF, US Preventive Services Task Force; WHO, World Health Organization.
aBecause many articles contribute to 1 or more KQs, the number of articles listed per KQ in this section does not add up to 223.
DANTE indicates Detection and Screening of Early Lung Cancer by Novel Imaging Technology and Molecular Essays; DLCST, Danish Lung Cancer Screening Trial; IRR, incidence rate ratio; ITALUNG, Italian Lung Cancer Screening Trial; NELSON, Nederlands-Leuvens Longkanker Screenings Onderzoek; and NLST, National Lung Screening Trial.
DANTE indicates Detection and Screening of Early Lung Cancer by Novel Imaging Technology and Molecular Essays; DLCST, Danish Lung Cancer Screening Trial; IRR, incidence rate ratio; ITALUNG, Italian Lung Cancer Screening Trial; LDCT, low-dose computed tomography; LSS, Lung Screening Study; NELSON, Nederlands-Leuvens Longkanker Screenings Onderzoek; NLST, National Lung Screening Trial; NR, not reported.
eMethods. Literature Search Strategy
eTable 1. NSCLC Staging Overview, Typical 5-Year Survival, and Treatment Approaches
eTable 2. Eligibility Criteria
eTable 3. Risk of Bias and Overall Quality Assessment Ratings for Nonrandomized Studies
eTable 4. Risk of Bias and Overall Quality Assessment Ratings for Randomized Studies: Part 1
eTable 5. Risk of Bias and Overall Quality Assessment Ratings for Randomized Studies: Part 2
eTable 6. Risk of Bias and Overall Quality Assessment Ratings for Randomized Studies: Part 3
eTable 7. Risk of Bias and Overall Quality Assessment Ratings for Risk Prediction Model (KQ 2) Studies: Part 1
eTable 8. Risk of Bias and Overall Quality Assessment Ratings for Risk Prediction Model (KQ 2) Studies: Part 2
eTable 9. Risk of Bias and Overall Quality Assessment Ratings for Risk Prediction Model (KQ 2) Studies: Part 3
eTable 10. Risk of Bias and Overall Quality Assessment Ratings for Risk Prediction Model (KQ 2) Studies: Part 4
eTable 11. Risk of Bias and Overall Quality Assessment Ratings for Accuracy (KQ 3) Studies: Part 1
eTable 12. Risk of Bias and Overall Quality Assessment Ratings for Accuracy (KQ 3) Studies: Part 2
eTable 13. Risk of Bias and Overall Quality Assessment Ratings for Accuracy (KQ 3) Studies: Part 3
eTable 14. Predictors Used in Risk Prediction Models for Identifying Adults at Higher Risk of Lung Cancer Mortality and Model Applicability
eTable 15. PLCOm2012 Model Estimated Benefits and Harms Over 6 Years Compared With USPSTF or NLST Criteria
eTable 16. Summary of Modeling Studies Evaluating Screen-Prevented Lung Cancer Deaths and NNS to Prevent One Lung Cancer Death
eTable 17. Accuracy of LDCT for Lung Cancer Screening in RCTs (KQ 3)
eTable 18. Accuracy of LDCT for Lung Cancer Screening in Nonrandomized Studies (KQ 3)
eTable 19. LDCT Parameters, by Study Type
eTable 20. Number and Percentage of False-Positive Results After Screening With LDCT
eTable 21. False-Positive Evaluations
eTable 22. Incidental Findings from KQ 4 Studies
eTable 23. KQs 6 and 7: Study and Patient Characteristics Table
eTable 24. KQs 6 and 7 Surgery Results
eTable 25. KQs 6 and 7 SBRT Results
eTable 26. KQs 6 and 7 Update Search Results: Study and Patient Characteristics
eTable 27. KQs 6 and 7 Update Search Results: Surgery Study Outcomes
eTable 28. KQs 6 and 7 Update Search Results: SBRT Study Outcomes
eTable 29. Summary of Multilevel Barriers to Effective Lung Cancer Screening
eTable 30. Summary of Randomized, Controlled Trials in the Systematic Review From 2019
eFigure. Trial Results for Lung Cancer Incidence (KQ 1)
Jonas DE, Reuland DS, Reddy SM, et al. Screening for Lung Cancer With Low-Dose Computed Tomography: Updated Evidence Report and Systematic Review for the US Preventive Services Task Force. JAMA. 2021;325(10):971–987. doi:10.1001/jama.2021.0377
Lung cancer is the leading cause of cancer-related death in the US.
To review the evidence on screening for lung cancer with low-dose computed tomography (LDCT) to inform the US Preventive Services Task Force (USPSTF).
MEDLINE, Cochrane Library, and trial registries through May 2019; references; experts; and literature surveillance through November 20, 2020.
English-language studies of screening with LDCT, accuracy of LDCT, risk prediction models, or treatment for early-stage lung cancer.
Data Extraction and Synthesis
Dual review of abstracts, full-text articles, and study quality; qualitative synthesis of findings. Data were not pooled because of heterogeneity of populations and screening protocols.
Main Outcomes and Measures
Lung cancer incidence, lung cancer mortality, all-cause mortality, test accuracy, and harms.
This review included 223 publications. Seven randomized clinical trials (RCTs) (N = 86 486) evaluated lung cancer screening with LDCT; the National Lung Screening Trial (NLST, N = 53 454) and Nederlands-Leuvens Longkanker Screenings Onderzoek (NELSON, N = 15 792) were the largest RCTs. Participants were more likely to benefit than the US screening-eligible population (eg, based on life expectancy). The NLST found a reduction in lung cancer mortality (incidence rate ratio [IRR], 0.85 [95% CI, 0.75-0.96]; number needed to screen [NNS] to prevent 1 lung cancer death, 323 over 6.5 years of follow-up) with 3 rounds of annual LDCT screening compared with chest radiograph for high-risk current and former smokers aged 55 to 74 years. NELSON found a reduction in lung cancer mortality (IRR, 0.75 [95% CI, 0.61-0.90]; NNS to prevent 1 lung cancer death of 130 over 10 years of follow-up) with 4 rounds of LDCT screening with increasing intervals compared with no screening for high-risk current and former smokers aged 50 to 74 years. Harms of screening included radiation-induced cancer, false-positive results leading to unnecessary tests and invasive procedures, overdiagnosis, incidental findings, and increases in distress. For every 1000 persons screened in the NLST, false-positive results led to 17 invasive procedures (number needed to harm, 59) and fewer than 1 person having a major complication. Overdiagnosis estimates varied greatly (0%-67% chance that a lung cancer was overdiagnosed). Incidental findings were common, and estimates varied widely (4.4%-40.7% of persons screened).
Conclusions and Relevance
Screening high-risk persons with LDCT can reduce lung cancer mortality but also causes false-positive results leading to unnecessary tests and invasive procedures, overdiagnosis, incidental findings, increases in distress, and, rarely, radiation-induced cancers. Most studies reviewed did not use current nodule evaluation protocols, which might reduce false-positive results and invasive procedures for false-positive results.
In 2020, lung cancer was the second most common cancer and the leading cause of cancer-related death in both men and women in the US.1 Most patients diagnosed with lung cancer presented with distant or metastatic disease; less than 20% were diagnosed with localized (ie, stage 1) disease.1 Lung cancer has traditionally been classified into 2 major categories: (1) non–small cell lung cancer (NSCLC) (eTable 1 in the Supplement), which collectively comprises adenocarcinoma, squamous cell carcinoma, and large cell carcinoma, and (2) small cell lung cancer, which is more aggressive and has worse survival rates.2 Approximately 85% of lung cancers are NSCLC.3 The risk of developing lung cancer is largely driven by age and smoking status, with smoking estimated to account for nearly 90% of all lung cancers.4-6 Other risk factors for lung cancer include environmental exposures, radiation therapy, other (noncancer) lung diseases, race/ethnicity, and family history.7
In 2013, the US Preventive Services Task Force (USPSTF) recommended annual screening for lung cancer with low-dose computed tomography (LDCT) in adults aged 55 to 80 years who have a 30–pack-year smoking history and currently smoke or have quit within the past 15 years (B recommendation).8 The USPSTF recommended that screening should be discontinued once a person has not smoked for 15 years or develops a health problem that substantially limits life expectancy or the ability or willingness to have curative lung surgery.8 This updated review evaluates the current evidence on screening for lung cancer with LDCT for populations and settings relevant to primary care in the US to inform an updated recommendation by the USPSTF.
Figure 1 shows the analytic framework and key questions (KQs) that guided the review. Detailed methods are available in the full evidence report.7
PubMed/MEDLINE and the Cochrane Library were searched for English-language articles published through May 2019. Search strategies are listed in the eMethods in the Supplement. Clinical trial registries were searched for unpublished studies. To supplement electronic searches, investigators reviewed reference lists of pertinent articles, studies suggested by reviewers, and comments received during public commenting periods. Since May 2019, ongoing surveillance was conducted through article alerts and targeted searches of journals to identify major studies published in the interim that may affect the conclusions or understanding of the evidence and the related USPSTF recommendation. The last surveillance was conducted on November 20, 2020.
Two investigators independently reviewed titles, abstracts, and full-text articles to determine eligibility using prespecified criteria (eTable 2 in the Supplement). Disagreements were resolved by discussion and consensus. English-language studies of adults aged 18 years or older conducted in countries categorized as “very high” on the Human Development Index,9 rated as fair or good quality, and published in or after 2001 were included. For all KQs, randomized clinical trials (RCTs) and nonrandomized controlled intervention studies were eligible. Cohort studies based on prospectively collected data that were intended to be used for evaluations relevant to this review were also eligible for KQs on harms of screening or workup (KQs 4 and 5) and treatment (KQs 6 and 7).
For KQ2 (on risk prediction), externally validated models aimed at identifying persons at increased risk of lung cancer using multiple variables, including at least age and smoking history, were included. Eligible risk prediction models had to be compared with either the 2013 USPSTF recommendations or criteria used by trials showing benefit. Eligible outcomes included estimated screen-preventable lung cancer deaths or all-cause mortality, estimated screening effectiveness (eg, number needed to screen [NNS]), and estimated screening harms.
For each included study, 1 investigator extracted pertinent information about the populations, tests or treatments, comparators, outcomes, settings, and designs, and a second investigator reviewed this information for completeness and accuracy. Two independent investigators assessed the quality of studies as good, fair, or poor, using predefined criteria developed by the USPSTF and adapted for this topic.10 Disagreements were resolved by discussion.
Findings for each KQ were summarized in tabular and narrative format. The overall strength of the evidence for each KQ was assessed as high, moderate, low, or insufficient based on the overall quality of the studies, consistency of results between studies, precision of findings, risk of reporting bias, and limitations of the body of evidence, using methods developed for the USPSTF (and the Evidence-based Practice Center program).10 Additionally, the applicability of the findings to US primary care populations and settings was assessed. Discrepancies were resolved through discussion.
To determine whether meta-analyses were appropriate, the clinical and methodological heterogeneity of the studies was assessed according to established guidance.11 Meta-analyses were not conducted because of substantial clinical and methodological heterogeneity. For example, the trials of lung cancer screening differed in eligibility criteria (eg, age, pack-years of smoking, years since quitting), number of screening rounds (from 2 to 5), screening intervals (eg, annual, biennial, or escalating), thresholds for a positive screen (eg, 4 mm, 5 mm, or based on volume), and comparators (chest radiograph or no screening). For KQ1, forest plots were created to display the findings of each study by calculating incidence rate ratios (IRRs), using number of events and person-years of follow-up, for lung cancer incidence, lung cancer mortality, and all-cause mortality. Quantitative analyses were conducted using Stata version 14 (StataCorp).
A total of 223 publications were included (Figure 2). Twenty-six articles addressed whether screening improves health outcomes. Most articles assessed accuracy, harms, or effectiveness of surgery or stereotactic body radiotherapy for early NSCLC. Results for KQs 6, 7, and 8 are in the eResults in the Supplement. Individual study quality ratings are reported in eTables 3 to 13 in the Supplement.
Key Question 1a. Does screening for lung cancer with LDCT change the incidence of lung cancer and the distribution of lung cancer types and stages (ie, stage shift)?
Key Question 1b. Does screening for lung cancer with LDCT change all-cause mortality, lung cancer mortality, or quality of life?
Key Question 1c. Does the effectiveness of screening for lung cancer with LDCT differ for subgroups defined by age, sex, race/ethnicity, presence of comorbid conditions, or other lung cancer risk factors?
Key Question 1d. Does the effectiveness of screening for lung cancer with LDCT differ by the number or frequency of LDCT scans (eg, annual screening for 3 years, the protocol used in the National Lung Screening Trial [NLST] vs other approaches)?
Seven RCTs (described in 26 articles) were included (Table 1): NLST, Detection and Screening of Early Lung Cancer by Novel Imaging Technology and Molecular Essays (DANTE), Danish Lung Cancer Screening Trial (DLCST), Italian Lung Cancer Screening Trial (ITALUNG), Lung Screening Study (LSS), the German Lung Cancer Screening Intervention Trial (LUSI), and the Nederlands-Leuvens Longkanker Screenings Onderzoek (NELSON) study.12-37 Two trials in the US compared LDCT with chest radiography (LSS and NLST), and 5 trials in Europe compared LDCT with no screening (DANTE, DLCST, ITALUNG, LUSI, and NELSON). Only the NLST (53 454 participants) and NELSON (15 792 participants) were adequately powered to assess for lung cancer mortality benefit.24,31 The majority of participants were White in all trials; in the NLST, 91% were White, less than 5% were Black, and less than 2% were Hispanic or Latino.
Trials varied in their definition of a positive screen and in the follow-up evaluation process. NELSON was unique in using volumetric measurements of nodules and calculating volume doubling. Compared with the prior systematic review conducted for the USPSTF,38,39 longer follow-up or more complete end point verification was available from DANTE,12 DLCST,16 LSS,20 and the NLST,33,37 and 3 additional trials—NELSON,24 ITALUNG,17 and LUSI21,23—reported data relevant to this KQ.
The cumulative incidence of lung cancer was higher in LDCT groups than in control groups for all studies except ITALUNG (eFigure in the Supplement). Figure 3 shows the increases in early-stage (I-II) and decreases in late-stage (III-IV) lung cancer incidence.
Figure 4 shows the calculated IRRs for the trials that reported lung cancer mortality. Over almost 7 years of follow-up and more than 140 000 person-years of follow-up in each group, the NLST found a significant reduction in lung cancer mortality with 3 rounds of annual LDCT screening compared with chest radiography (calculated IRR, 0.85 [95% CI, 0.75-0.96]). These findings indicate an NNS to prevent 1 lung cancer death of 323 over 6.5 years of follow-up. Analysis of extended follow-up data of NLST participants at 12.3 years after randomization found a similar absolute difference between groups (1147 vs 1236 lung cancer deaths; risk ratio [RR], 0.92 [95% CI, 0.85-1.00]; absolute difference between groups of 3.3 [95% CI, −0.2 to 6.8] lung cancer deaths per 1000 participants). The NELSON trial reported a reduction in lung cancer mortality for 4 rounds of screening with increasing intervals between LDCTs (combining data for males and females, calculated IRR, 0.75 [95% CI, 0.61-0.90]; NNS to prevent 1 lung cancer death of 130 over 10 years of follow-up). Results of the other trials were very imprecise and did not show statistically significant differences between groups (Figure 4).
The NLST found a reduction in all-cause mortality with LDCT screening compared with chest radiography (1912 vs 2039 deaths; 1141 per 100 000 person-years vs 1225 per 100 000 person-years; calculated IRR, 0.93 [95% CI, 0.88-0.99]). The other trials found no statistically significant differences between groups, but results were imprecise (Figure 5).
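The calculated IRR for all-cause mortality in the NLST can be checked from the event counts and rates reported above. The following is a minimal sketch, not the Stata code used for the review: it assumes a Wald 95% CI on the log scale with standard error sqrt(1/events_a + 1/events_b), and the function name is hypothetical.

```python
import math


def incidence_rate_ratio(events_a, rate_a, events_b, rate_b, z=1.96):
    """Return (IRR, lower, upper) for group A relative to group B.

    rate_a and rate_b must be in the same units (eg, per 100 000 person-years).
    Assumption: Wald interval on the log scale, SE = sqrt(1/events_a + 1/events_b).
    """
    irr = rate_a / rate_b
    se_log = math.sqrt(1 / events_a + 1 / events_b)
    lower = math.exp(math.log(irr) - z * se_log)
    upper = math.exp(math.log(irr) + z * se_log)
    return irr, lower, upper


# NLST all-cause mortality: LDCT (1912 deaths, 1141 per 100 000 person-years)
# vs chest radiography (2039 deaths, 1225 per 100 000 person-years).
irr, lower, upper = incidence_rate_ratio(1912, 1141, 2039, 1225)
print(f"IRR, {irr:.2f} (95% CI, {lower:.2f}-{upper:.2f})")
```

Under these assumptions the sketch gives an IRR of 0.93 (95% CI, 0.88-0.99), in agreement with the calculated value reported for the NLST.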
All included trials enrolled participants at high risk for lung cancer (based on age and smoking history). Seven publications using DLCST, LUSI, NELSON, or NLST data described subgroup analyses for age, sex, race/ethnicity, smoking status and pack-years, history of chronic obstructive pulmonary disease (COPD), or other pulmonary conditions.16,23,24,33-35,37 A post hoc analysis of NLST data reported that 88% of the benefit (lung cancer deaths averted) was achieved by screening the 60% of participants at highest risk for lung cancer death.29 Other post hoc analyses of NLST data reported lung cancer mortality by sex (RR, 0.73 for women vs 0.92 for men; P = .08), age (RR, 0.82 for <65 years vs 0.87 for ≥65 years; P = .60), race/ethnicity (hazard ratio [HR], 0.61 for Black individuals vs 0.86 for White individuals; P = .29), and smoking status (RR, 0.81 for current smokers vs 0.91 for former smokers; P = .40), and did not identify statistically significant differences between groups.33-35 A long-term follow-up of NLST participants at 12.3 years reported similar results for subgroups and did not identify statistically significant interactions by sex, age, or smoking status (sex: RR, 0.86 for women vs 0.97 for men, P = .17; age: RR, 0.86 for <65 years vs 1.01 for ≥65 years, P = .051; smoking status: RR, 0.88 for current smokers vs 1.01 for former smokers, P = .12).37 Both LUSI and NELSON reported a similar pattern for subgroups by sex as found in the NLST that was not statistically significantly different between groups (LUSI: women, HR, 0.31 [95% CI, 0.10-0.96] vs men, HR, 0.94 [95% CI, 0.54-1.61], P = .09) or without reporting an interaction test (NELSON: women, RR, 0.67 [95% CI, 0.38-1.14] vs men, RR, 0.76 [95% CI, 0.61-0.94] at 10 years of follow-up).23,24 NELSON reported analyses by age group among the men in the trial (not including the women in those analyses) but did not report interaction tests for subgroups defined by age (RRs ranged from 0.59 [95% CI, 0.35-0.98] for persons aged 65 to 69 years at randomization to 0.85 [95% CI, 0.48-1.50] for persons aged 50 to 54 years at randomization).24
Key Question 2. Does the use of risk prediction models for identifying adults at higher risk of lung cancer mortality improve the balance of benefits and harms of screening compared with the use of trial eligibility criteria (eg, NLST criteria) or the 2013 USPSTF recommendations?
Detailed results for this KQ are in eResults and eTables 14-16 in the Supplement. In summary, 4 studies of 3 different risk prediction models (a modified version of a model developed from participants of the Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial [PLCOm2012], the Lung Cancer Death Risk Assessment Tool [LCDRAT], and Kovalchik model) estimating outcomes in 4 different cohorts reported increased screen-preventable deaths compared with the risk factor–based criteria used by the NLST or USPSTF (in the 2013 recommendations). Three studies demonstrated improved screening efficiency (determined by the NNS) of risk prediction models compared with risk factor−based screening, while 1 study showed mixed results. For harms, 8 studies of 13 different risk prediction models (PLCOm2012, simplified PLCOm2012, Bach, Liverpool Lung Project [LLP], simplified LLP, Knoke, Two-Stage Clonal Expansion [TSCE] incidence, TSCE Cancer Prevention Study death, TSCE Nurses’ Health Study/Health Professionals Follow-up Study, the HUNT Lung Cancer model, LCDRAT, COPD-LUCSS [Lung Cancer Screening Score], Kovalchik) estimating outcomes in 4 different cohorts reported similar numbers of false-positive selections from risk prediction (ie, the risk prediction model selected persons to be screened who did not have or develop lung cancer) and mixed findings for rates of false-positive selections when comparing risk prediction models with the risk factor–based criteria used by the NLST or USPSTF. In general, estimates were consistent but imprecise, primarily because of a lack of an established risk threshold to apply the model.
Key Question 3a. What is the accuracy of screening for lung cancer with LDCT?
Key Question 3b. Does the accuracy of screening for lung cancer with LDCT differ for subgroups defined by age, sex, race/ethnicity, presence of comorbid conditions, or other lung cancer risk factors?
Key Question 3c. Does the accuracy of screening for lung cancer with LDCT differ for various approaches to nodule classification (ie, those based on nodule size and characteristics)?
Detailed results for this KQ are in eResults and eTables 17 and 18 in the Supplement. Fifty-three articles were eligible for this KQ.12,13,19,21,22,24-28,30-32,34,40-77 Of those, 24 publications with the most complete data are described.12,21,24,34,41,43-45,47-49,52,53,56,58,60,64,66-68,71,72,75,76 Sensitivity of LDCT from 13 studies (76 856 total participants) ranged from 59% to 100%; all but 3 studies reported sensitivity greater than 80%. Specificity of LDCT from 13 studies (75 819 total participants) ranged from 26.4% to 99.7%; all but 3 reported specificity greater than 75%. Positive predictive value (14 studies, 77 840 participants) ranged from 3.3% to 43.5%. Negative predictive value (9 studies, 47 496 participants) ranged from 97.7% to 100%. Variability in accuracy was mainly attributed to heterogeneity of eligibility criteria, screening protocols (eg, number of screening rounds, screening intervals), follow-up length (eg, to identify false-negative screens), and definitions (eg, of positive tests, indeterminate tests). Three studies (73 404 participants) compared various approaches to nodule classification (Lung-RADS or International Early Lung Cancer Action Program [I-ELCAP]) and found that using Lung-RADS in the NLST would have increased specificity while decreasing sensitivity and that increases in positive predictive value are seen with increasing nodule size thresholds.44,49,52
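For readers less familiar with the accuracy measures summarized above, the sketch below shows how sensitivity, specificity, positive predictive value, and negative predictive value follow from a 2 × 2 table of screening results against final diagnosis. The counts are hypothetical and chosen only for illustration; they are not taken from any included study, and the function name is an assumption.

```python
def accuracy_measures(tp, fp, fn, tn):
    """Compute standard test accuracy measures from 2 x 2 table counts.

    tp: positive screen, lung cancer confirmed
    fp: positive screen, no lung cancer
    fn: negative screen, lung cancer diagnosed during follow-up
    tn: negative screen, no lung cancer
    """
    return {
        "sensitivity": tp / (tp + fn),  # share of cancers detected
        "specificity": tn / (tn + fp),  # share of non-cancers screened negative
        "ppv": tp / (tp + fp),          # share of positive screens that are cancer
        "npv": tn / (tn + fn),          # share of negative screens without cancer
    }


# Hypothetical counts for illustration only (not study data).
measures = accuracy_measures(tp=90, fp=200, fn=10, tn=700)
for name, value in measures.items():
    print(f"{name}: {value:.1%}")
```

Because lung cancer is uncommon even among high-risk persons, positive predictive value can be low when sensitivity and specificity are both high, which is consistent with the pattern of ranges reported across studies.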
Key Question 4a. What are the harms associated with screening for lung cancer with LDCT?
Key Question 4b. Do the harms of screening for lung cancer with LDCT differ with the use of Lung-RADS, I-ELCAP, or similar approaches (eg, to reduce false-positive results)?
Key Question 4c. Do the harms of screening for lung cancer with LDCT differ for subgroups defined by age, sex, race/ethnicity, presence of comorbid conditions, or other lung cancer risk factors?
Key Question 5a. What are the harms associated with workup or surveillance of nodules?
Key Question 5b. Do the harms of workup or surveillance of nodules differ with the use of Lung-RADS, I-ELCAP, or similar approaches (eg, to reduce false-positive results)?
Key Question 5c. Do the harms of workup or surveillance of nodules differ for subgroups defined by age, sex, race/ethnicity, presence of comorbid conditions, or other lung cancer risk factors?
Detailed results are in eResults in the Supplement.
Nine publications reported on radiation associated with LDCT.16,31,56,69,75,78-81 Most of those reported the radiation associated with 1 LDCT, with ranges from 0.65 mSv to 2.36 mSv (eTable 19 in the Supplement). Two of the studies evaluated the cumulative radiation exposure for participants undergoing screening with LDCT78,80; using those studies to estimate cumulative exposure for an individual from 25 years of annual screening (ie, from age 55 to 80 years as recommended by the USPSTF in 2013) yields 20.8 mSv to 32.5 mSv. One study estimated the lifetime risk of cancer from radiation of 10 annual LDCTs was 0.26 to 0.81 major cancers for every 1000 people screened.80
Twenty-seven publications reported enough information to determine the rate of false-positives, defined as any result leading to additional evaluation (eg, repeat LDCT scan before the next annual screening, biopsy) that did not result in a diagnosis of cancer.15,18,19,21,24,30-32,34,40,46,47,49,52,56,62,65-68,73,75-77,82-84 False-positive rates varied widely across studies, most likely because of differences in definitions of positive results, such as cutoffs for nodule size (eg, 4 mm vs 5 mm vs 6 mm), use of volume-doubling time, and nodule characteristics considered. The range of false-positive rates overall was 7.9% to 49.3% for baseline screening and 0.6% to 28.6% for individual incidence screening rounds, although rates for some subgroups were higher (eg, age ≥65 years) (eTable 20 in the Supplement). False-positive rates generally declined with each screening round.34,47,65,66,73,76
Among the trials that found lung cancer screening mortality benefit and cohort studies based in the US, false-positive rates were 9.6% to 28.9% for baseline and 5.0% to 28.6% for incidence rounds. The NLST reported false-positive rates for baseline, year 1, and year 2 of 26.3%, 27.2%, and 15.9%, respectively.31 The NELSON trial noted false-positive rates of 19.8% at baseline, 7.1% at year 1, 9.0% for males at year 3, and 3.9% for males at year 5.5 of screening.24,65 One study of 112 radiologists from 32 screening centers who each interpreted 100 or more NLST scans reported a mean (SD) false-positive rate of 28.7% (13.7) (range, 3.8%-69.0%).46 Mean rates were similar for academic (n = 25) and nonacademic (n = 7) centers (27.9% vs 26.7%, respectively).46 An implementation study through the Veterans Health Administration revealed a false-positive rate of 28.9% of veterans eligible for screening (58% of those who were actually screened) at baseline screening.40 False-positive rates varied across 8 study sites, ranging from 12.6% to 45.8% of veterans eligible for screening.40
Fourteen studies reported on the evaluation of false-positive results.22,30,31,34,43,62-64,66,72,75,79,81,85 Among all patients screened, the percentage who had a needle biopsy for a false-positive result ranged from 0.09% to 0.56% (eTable 21 in the Supplement). Surgical procedures for false-positive results were reported in 0.5% to 1.3% and surgical resections for false-positive results were reported in 0.1% to 0.5% of all screened participants.
In the NLST, false-positive results led to invasive procedures (needle biopsy, thoracotomy, thoracoscopy, mediastinoscopy, and bronchoscopy) in 1.7% of those screened (number needed to harm, 59). Complications occurred in 0.1% of those screened (number needed to harm, 1000), with major, intermediate, and minor complications occurring in 0.03%, 0.05%, and 0.01%, respectively, of those screened. Death in the 60 days following the most invasive procedure performed occurred in 0.007% of those screened.31 One study using NLST data estimated that 117 invasive procedures for false-positive results (23.4% of all invasive procedures for false-positive results from the NLST) would be prevented by using Lung-RADS criteria.44
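The numbers needed to harm quoted above are the reciprocals of the corresponding percentages of persons affected. The sketch below reproduces that arithmetic; rounding to the nearest whole number is an assumption about the convention used, and the function names are hypothetical.

```python
def number_needed_to_harm(percent_affected):
    """Number of persons screened per 1 person affected (reciprocal of the risk)."""
    return round(100 / percent_affected)


def affected_per_1000(percent_affected):
    """Expected number of persons affected for every 1000 screened."""
    return round(percent_affected * 10)


# NLST: invasive procedures after false-positive results in 1.7% of those screened.
print(number_needed_to_harm(1.7), affected_per_1000(1.7))

# NLST: complications in 0.1% of those screened.
print(number_needed_to_harm(0.1), affected_per_1000(0.1))
```

With the reported percentages, this gives a number needed to harm of 59 (17 per 1000 screened) for invasive procedures and 1000 (1 per 1000 screened) for complications.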
Five studies specifically examined overdiagnosis,81,86-89 and 7 additional trials were examined for differences in cancer incidence between LDCT and comparison groups.14,17,19,24,31,90,91 Estimates of the probability that a screen-detected lung cancer is overdiagnosed ranged from 0% to 67.2%.
One RCT (DLCST; 4075 participants), studies of participants from RCTs (NELSON, NLST, LSS; 19 426 total participants), and 3 cohort studies (ELCAP, Mayo Lung Project, and Pittsburgh Lung Screening Study [PLuSS]; 5537 total participants) included evaluations of the effect of LDCT screening or screening results on smoking cessation and relapse.91-100 Studies comparing LDCT vs controls (no screening or chest radiography) for smoking cessation or abstinence outcomes do not indicate that screening leads to false reassurance. Abnormal or indeterminate screening test results may increase cessation and continued abstinence, but normal screening test results had no influence. Regarding smoking intensity, evidence was minimal, and no study showed influence of screening or test result on smoking intensity.
Four RCTs (DLCST, NELSON, NLST, and UK Lung Cancer Screening [UKLS] trial; 12 096 total participants) reported in 6 publications,62,101-105 1 uncontrolled cohort study (PLuSS, 400 participants),106 and 2 studies of participants from the screening arm of an RCT (NELSON, 630 participants107; UKLS, 1589 participants108) included an evaluation of potential psychosocial consequences of LDCT screening. These studies evaluated general health-related quality of life (HRQoL; 3 studies),101,104,107 anxiety (8 studies),62,101-107 depression (2 studies),62,102 distress (3 studies),62,104,107 and other psychosocial consequences of LDCT screening (5 studies).62,103,105,106,108 Taken together, there is moderate evidence to suggest that, compared with no screening, persons who receive LDCT screening do not have worse general HRQoL, anxiety, or distress over 2 years of follow-up. Some evidence suggests differential consequences by screening result such that general HRQoL and anxiety were worse, at least in the short term, for individuals who received true-positive results compared with other screening results; distress was worse for participants who received an indeterminate screening result compared with other results. The strength of evidence is low for other psychosocial consequences, largely because of unknown consistency, imprecision, and the assessment of each outcome by only 1 or 2 studies.
Studies reported a wide range of screening-related incidental findings (4.4% to 40.7%) that were deemed significant or requiring further evaluation (eResults and eTable 22 in the Supplement).34,40,62,82,109-112 Rates varied considerably in part because there was no consistent definition of what constitutes an incidental finding nor which findings were “actionable” or “clinically significant.” Older age was associated with a greater likelihood of incidental findings. Common incidental findings included coronary artery calcification, aortic aneurysms, emphysema, infectious and inflammatory processes, masses, nodules, or cysts of the kidney, breast, adrenal, liver, thyroid, pancreas, spine, and lymph nodes. Incidental findings led to downstream evaluation, including consultations, additional imaging, and invasive procedures with associated costs and burdens.
This evidence review evaluated screening for lung cancer with LDCT in populations and settings relevant to US primary care; a summary of the evidence is provided in Table 2. Screening high-risk persons with LDCT can reduce lung cancer mortality but also causes a range of harms. For benefits of screening, the NLST demonstrated a reduction in lung cancer mortality and all-cause mortality with 3 rounds of annual LDCT screening compared with chest radiography, and the NELSON trial demonstrated a reduction in lung cancer mortality with 4 rounds of LDCT screening with increasing intervals. Harms of screening include false-positive results leading to unnecessary tests and invasive procedures, overdiagnosis, incidental findings, short-term increases in distress because of indeterminate results, and, rarely, radiation-induced cancer.
NLST and NELSON results are generally applicable to high-risk current and former smokers aged 50 to 74 years, but participants were younger, more highly educated, and less likely to be current smokers than the US screening-eligible population and had limited racial and ethnic diversity. The general US population eligible for lung cancer screening may be less likely to benefit from early detection compared with NLST and NELSON participants because they face a high risk of death from competing causes, such as heart disease and stroke.113 Data from the 2012 Health and Retirement Study showed a lower 5-year survival rate and life expectancy in screening-eligible persons compared with NLST participants.113 NELSON did not allow enrollment of persons with moderate or severe health problems and an inability to climb 2 flights of stairs; weight over 140 kg; or current or past kidney cancer, melanoma, or breast cancer.
The trials were mainly conducted at large academic centers, potentially limiting applicability to community-based practice (eg, because of challenges with implementation [eContextual Questions in the Supplement], level of multidisciplinary expertise). Many of the trial centers are well recognized for expertise in thoracic radiology as well as cancer diagnosis and treatment.31 The NLST noted that mortality associated with surgical resection was much lower in the trial than that reported for the US population (1% vs 4%).31,114
Guidelines recommend that clinicians conduct a rigorous process of informed and shared decision-making about the benefits and harms of lung cancer screening before initiating screening. However, given the complex nature of benefits and harms associated with screening, there is some concern that robust shared decision-making is impractical to implement in actual practice.115-117 eContextual question 1 in the Supplement describes the barriers to implementing lung cancer screening and surveillance in clinical practice in the US.
Most studies reviewed in this article (including the NLST) did not use current nodule evaluation protocols such as Lung-RADS (endorsed by the American College of Radiology).118 A study included in this review estimated that Lung-RADS would reduce false-positive results compared with NLST criteria and that about 23% of all invasive procedures for false-positive results from the NLST would have been prevented by using Lung-RADS criteria.44
Application of lung cancer screening with (1) current nodule management protocols and (2) the use of risk prediction models might improve the balance of benefits and harms, although the strength of evidence supporting this possibility was low. There remains considerable uncertainty about how such approaches would perform in actual practice because the evidence was largely derived from post hoc application of criteria to trial data (for Lung-RADS) and from modeling studies (for risk prediction) and does not include prospective clinical utility studies. Additional discussion of the evidence on risk prediction models is provided in the eDiscussion in the Supplement. When applied to current clinical practice, lung cancer screening programs have demonstrated significant variation, even within a single institution type.40
This review has several limitations. First, non–English-language articles were excluded, as were studies with sample size less than 500 or 1000 for some KQs to focus on the best evidence. Doing so omitted some smaller studies that reported on harms of screening. For example, a study of 351 participants in the NELSON trial examined discomfort of LDCT scanning and waiting for the LDCT results.119 Most participants (88%-99%) reported experiencing no discomfort related to the LDCT scan, but about half reported at least some discomfort from waiting for the result (46%) and dreading the result (51%). Second, the KQ on risk prediction models (KQ2) was limited to how well risk prediction models perform vs current recommended risk factor–based criteria for lung cancer screening. KQ2 complements the decision analysis report120 by evaluating previously published studies that apply risk prediction models to cohorts or representative samples of the US population rather than simulated populations. Third, for accuracy, some included studies did not report accuracy metrics; rather, when sufficient data were reported, values were calculated from the study data. This approach introduces uncertainty and may account for variability.
Screening high-risk persons with LDCT can reduce lung cancer mortality but also causes false-positive results leading to unnecessary tests and invasive procedures, overdiagnosis, incidental findings, increases in distress, and, rarely, radiation-induced cancers. Most studies reviewed did not use current nodule evaluation protocols, which might reduce false-positive results and invasive procedures for false-positive results.
Corresponding Author: Daniel E. Jonas, MD, MPH, Department of Internal Medicine, Division of General Internal Medicine, The Ohio State University, 2050 Kenny Rd, Columbus, OH 43221 (Daniel.Jonas@osumc.edu).
Accepted for Publication: January 19, 2021.
Author Contributions: Dr Jonas had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Concept and design: Jonas, Reuland, Nagle, Clark, Weber, Enyioha, Armstrong, Voisin.
Acquisition, analysis, or interpretation of data: Jonas, Reuland, Reddy, Nagle, Clark, Weber, Enyioha, Malo, Brenner, Armstrong, Coker-Schwimmer, Middleton, Harris.
Drafting of the manuscript: Jonas, Reuland, Reddy, Nagle, Clark, Enyioha, Malo, Brenner, Armstrong, Middleton, Voisin.
Critical revision of the manuscript for important intellectual content: Jonas, Reuland, Reddy, Weber, Brenner, Coker-Schwimmer, Harris.
Statistical analysis: Jonas, Reddy, Weber, Enyioha, Middleton.
Obtained funding: Jonas.
Administrative, technical, or material support: Jonas, Reddy, Clark, Weber, Armstrong, Middleton, Voisin.
Supervision: Jonas, Reuland, Armstrong, Harris.
Conflict of Interest Disclosures: Dr Jonas reported receiving a contract from the Agency for Healthcare Research and Quality (AHRQ) during the conduct of the study. Dr Reuland reported receiving a contract from the AHRQ to complete review during the conduct of the study. Dr Reddy reported receiving a contract from the AHRQ to complete review during the conduct of the study. Dr Clark reported receiving a contract from the AHRQ to complete review during the conduct of the study. Dr Weber reported receiving a contract from the AHRQ to conduct review during the conduct of the study. Dr Malo reported receiving a contract from the AHRQ to complete review during the conduct of the study. Dr Armstrong reported receiving a contract from the AHRQ to complete review during the conduct of the study. Dr Coker-Schwimmer reported receiving a contract from the AHRQ, which funded the research under contract No. HHSA-290-2015-00011-I, Task Order No. 11, during the conduct of the study. Dr Middleton reported receiving a contract from the AHRQ to complete review during the conduct of the study. Dr Voisin reported receiving a contract from the AHRQ to complete review during the conduct of the study. No other disclosures were reported.
Funding/Support: This research was funded under contract HHSA-290-2015-00011-I, Task Order 11, from the AHRQ, US Department of Health and Human Services, under a contract to support the US Preventive Services Task Force (USPSTF).
Role of the Funder/Sponsor: Investigators worked with USPSTF members and AHRQ staff to develop the scope, analytic framework, and key questions for this review. AHRQ had no role in study selection, quality assessment, or synthesis. AHRQ staff provided project oversight, reviewed the report to ensure that the analysis met methodological standards, and distributed the draft for public comment and review by federal partners. Otherwise, AHRQ had no role in the conduct of the study; collection, management, analysis, and interpretation of the data; and preparation, review, or approval of the manuscript findings. The opinions expressed in this document are those of the authors and do not reflect the official position of AHRQ or the US Department of Health and Human Services.
Additional Contributions: We gratefully acknowledge the following individuals for their contributions to this project, including AHRQ staff (Howard Tracer, MD, and Tracy Wolff, MD, MPH), RTI International–University of North Carolina at Chapel Hill Evidence-based Practice Center staff (Carol Woodell, BSPH; Sharon Barrell, MA; and Loraine Monroe), and expert consultant M. Patricia Rivera, MD, professor of medicine, Division of Pulmonary Diseases and Critical Care Medicine at University of North Carolina at Chapel Hill. The USPSTF members, expert consultants, peer reviewers, and federal partner reviewers did not receive financial compensation for their contributions. Ms Woodell, Ms Barrell, and Ms Monroe received compensation for their roles in this project.
Additional Information: A draft version of the full evidence report underwent external peer review from 5 content experts (Deni Aberle, MD, UCLA Medical Center; Peter Bach, MD, MAPP, Memorial Sloan-Kettering Cancer Center; Tanner Caverly, MD, University of Michigan; Michael Jaklitsch, MD, Brigham and Women’s Hospital; and Renda Soylemez Wiener, MD, MPH, US Department of Veterans Affairs, Boston University School of Medicine) and 3 federal partner reviewers (from the Centers for Disease Control and Prevention, the National Cancer Institute, and the National Institute of Nursing Research). Comments from reviewers were presented to the USPSTF during its deliberation of the evidence and were considered in preparing the final evidence review. USPSTF members and peer reviewers did not receive financial compensation for their contributions.
Editorial Disclaimer: This evidence report is presented as a document in support of the accompanying USPSTF Recommendation Statement. It did not undergo additional peer review after submission to JAMA.