Jonsson S, Thorsteinsdottir U, Gudbjartsson DF, Jonsson HH, Kristjansson K, Arnason S, Gudnason V, Isaksson HJ, Hallgrimsson J, Gulcher JR, Amundadottir LT, Kong A, Stefansson K. Familial Risk of Lung Carcinoma in the Icelandic Population. JAMA. 2004;292(24):2977-2983. doi:10.1001/jama.292.24.2977
Author Affiliations: Departments of Medicine
and Pathology, Landspitali-University Hospital (Drs S. Jonsson, Isaksson,
and Hallgrimsson), deCODE Genetics (Drs Thorsteinsdottir, Gudbjartsson, Kristjansson,
Arnason, Gulcher, Amundadottir, Kong, and Stefansson, and Mr H. Jonsson),
and Icelandic Heart Association (Dr Gudnason), Reykjavík, Iceland.
Context: The dominant role of tobacco smoke as a causative factor in lung carcinoma
is well established; however, an inherited predisposition may also be an important
factor in the susceptibility to lung carcinoma.
Objective: To investigate the contribution of genetic factors to the risk of developing
lung carcinoma in the Icelandic population.
Design, Setting, and Participants: Risk ratios (RRs) of lung carcinoma for first-, second-, and third-degree
relatives of patients with lung carcinoma were estimated by linking records
from the Icelandic Cancer Registry (ICR) of all 2756 patients diagnosed with
lung carcinoma within the Icelandic population from January 1, 1955, to February
28, 2002, with an extensive genealogical database containing all living Icelanders
and most of their ancestors since the settlement of Iceland. The RR for smoking
was similarly estimated using a random population-based cohort of 10 541
smokers from the Reykjavik Heart Study who had smoked for more than 10 years.
Of these smokers, 562 developed lung cancer according to the ICR list of
patients with lung cancer.
Main Outcome Measures: Estimation of RRs of close and distant relatives of patients with lung
carcinoma and comparison with RRs for close and distant relatives of smokers.
Results: A familial factor for lung carcinoma was shown to extend beyond the
nuclear family, as evidenced by significantly increased RR for first-degree
relatives (for parents: RR, 2.69; 95% confidence interval [CI], 2.20-3.23;
for siblings: RR, 2.02; 95% CI, 1.77-2.23; and for children: RR, 1.96; 95%
CI, 1.53-2.39), second-degree relatives (for uncles/aunts: RR, 1.34; 95% CI,
1.15-1.49; and for nephews/nieces: RR, 1.28; 95% CI, 1.10-1.43), and third-degree
relatives (for cousins: RR, 1.14; 95% CI, 1.05-1.22) of patients with lung
carcinoma. This effect was stronger for relatives of patients with early-onset
disease (age at onset ≤60 years) (for parents: RR, 3.48; 95% CI, 1.83-8.21;
for siblings: RR, 3.30; 95% CI, 2.19-4.58; and for children: RR, 2.84; 95%
CI, 1.34-7.21). The hypothesis that this increased risk is solely due to the
effects of smoking was rejected for all relationships, except cousins and
spouses, with a single-sided test of the RRs for lung carcinoma vs RRs for smoking.
Conclusions: These results underscore the importance of genetic predisposition in
the development of lung carcinoma, with its strongest effect in patients with
early-onset disease. However, tobacco smoke plays a dominant role in the pathogenesis
of this disease, even among those individuals who are genetically predisposed
to lung carcinoma.
Lung carcinoma is the leading cause of death from cancer among men and
women in many Western countries.1 Mortality
due to lung carcinoma in the United States exceeds the death rate from breast,
prostate, and colon cancer combined.2 Treatment
results for lung carcinoma have remained disappointing and only marginal
gains have been made during the last 30 to 40 years. Five-year survival is
now approaching 14% given the best available diagnostic and treatment modalities.3
The dominant role of tobacco smoke as a causative factor in lung carcinoma
is well established. Most studies report that more than 90% of patients with
lung carcinoma are smokers.1 Previous epidemiological
case-control studies have shown an approximately 2-fold increase in the development
of lung carcinoma in first-degree relatives of patients with lung carcinoma,
after controlling for confounding factors, such as smoking and age, suggesting
a genetic predisposition.4-7
Similar risk has also been observed for relatives of patients with lung
carcinoma in larger registry-based studies utilizing the Utah Population and
Cancer Registry Database8,9 and
the Swedish Family-Cancer Database.10-12
Registry-based studies are more meaningful because they are less prone to sampling
bias resulting from proband identification and oversampling of families with
several affected members.13 However, none of
these larger studies were controlled for smoking. It is important to control
for smoking for 2 reasons. First, it is possible that the increased incidence
of lung carcinoma in first-degree relatives is due to shared environment (second-hand
smoke or other environmental factors), as demonstrated by increased lung cancer
risk for spouses of patients with lung cancer in 1 of the Swedish studies.12 Second, the familiality of lung cancer could be entirely
due to the familiality of nicotine addiction and smoking.
In our study, we estimated the familiality of lung carcinoma in the
Icelandic population by linking together records from the Icelandic Cancer
Registry (ICR)14,15 of all cases
of lung carcinoma diagnosed in Iceland from January 1, 1955, to February 28,
2002, with a nationwide genealogical database containing all living Icelanders
and the majority of their ancestors since the settlement of Iceland in 870 AD. This allowed us to examine all relationships among all of the
lung carcinoma cases registered in the ICR and to estimate risk for lung carcinoma
development beyond first-degree relatives of patients with lung carcinoma,
thus reducing the effects of shared environment. Furthermore, by using information
on smoking history from the Reykjavik Heart Study,14 we
estimated the familiality of smoking, and compared the risk ratio (RR) of
lung carcinoma with the RR of smoking to examine whether there is a genetic
component to the risk of lung carcinoma.
The study population included all patients diagnosed with lung carcinoma
in Iceland from January 1, 1955, to February 28, 2002. These cases were all
registered in the ICR.14,15 Lung
carcinoma was defined as a malignant neoplasm of epithelial origin according
to the World Health Organization histological classification.16 Carcinoid
tumors as well as tumors of lymphoid and mesenchymal origin were excluded
from our analysis. Information in the ICR includes year of diagnosis, year
of death, Systematized Nomenclature of Medicine code, International
Classification of Diseases, 10th Revision (ICD-10), and mode of lung carcinoma verification. During this 47-year period,
2756 patients with lung carcinoma were identified (1504 men and 1252 women).
Histological and cytological verification was available for 2516 patients
with lung carcinoma; the remaining 240 patients were diagnosed clinically.
A random collection of 10 541 adult smokers from the Icelandic
population was obtained from the Icelandic Heart Association. These were individuals
who had been randomly selected to take part in a nationwide study of cardiovascular
risk factors (the Reykjavik Heart Study) during the years 1967 to 2002 and
had answered a questionnaire on entry, which included information about their
smoking habits. All individuals who had smoked for more than 10 years were
defined as smokers. Of the 10 541 smokers in the study, 562 developed
lung carcinoma. Because we had smoking information only on a small proportion
of all patients with lung carcinoma and their relatives, we could not calculate
lung carcinoma RR directly, taking smoking into account. Instead, we used
the random sample of smokers to estimate the familiality of smoking.
All data were encrypted through a process approved by the Data Protection
Commission of Iceland before being sent to our laboratory for analysis.17 The study was approved by the National Bioethics
Committee of Iceland and the Data Protection Commission of Iceland.
We have built a computerized database of genealogical information in
Iceland, including the names of all 284 000 living Icelanders and their
deceased ancestors.18 Currently, more than
685 000 individuals are registered in the database. Control groups were
assembled to match the group of patients with lung carcinoma according to year
of birth, sex, and number of ancestors within the database in the preceding
5 generations. The Data Protection Commission of Iceland reversibly encrypted
the data along with the genealogical database, before making it available
to our laboratory.17
To evaluate familial risk of lung carcinoma in the Icelandic population,
we calculated RRs of close and distant relatives of the probands.18 The RR for relatives of patients with lung carcinoma
was defined as the risk of lung carcinoma in the relatives of affected individuals
divided by the prevalence in the general population. In other words, if P denotes the event in which the proband is affected and R denotes the event in which the relative is affected,
the RR is defined as
RR = P(R|P)/P(R).
When calculating the risk of lung carcinoma in relatives, we restricted
our analyses to relatives born during the period covering the lifespan of
the group of patients in question. We used the same restriction according
to year of birth in estimating the risk in the general population for the
The RR of smoking was evaluated in a similar way as the RR of lung carcinoma
using the list of the 10 541 smokers and the Icelandic genealogical database.
The RR for smoking together with the RR for lung carcinoma allows for a statistical
test on the effects of smoking on lung carcinoma.
Let r be the number of relatives of probands
(counting multiple times individuals who are relatives of multiple probands19), a the number of relatives
of probands that are affected (again possibly counting the same individual
more than once), n the size of the population, and x the number of affected individuals in the population.
If P(R) and P(R|P) can reasonably
be assumed to be constant in the population, then x/n and a/r, respectively, are estimates of these probabilities. Given the estimates, RR is
consistently estimated by
(a/r)/(x/n).
Assuming the population may be split into N subpopulations,
within each of which P(R)
and P(R|P) can reasonably be assumed to be constant, although they may vary
between subpopulations, and assuming RR is the same in all subpopulations,
it is consistently estimated by any weighted sum of the estimates for the N subpopulations. We chose to select weights such that
the efficiency of the estimator is at maximum for RR equal to 1. Making the
simplifying assumption that the relatives are independent (although this assumption
is obviously wrong, it only affects efficiency, not validity), the optimal
weight for group j is
wj = rj xj / (nj − xj)
(this is the inverse of the variance of the estimate for RR in subpopulation j), where the meaning of a, r, x, and n is the same as above, restricted to the subpopulation j, except that all affected individuals in the population are still
taken as probands and not just the individuals in the subpopulation. Given
these weights, our estimate of RR is
RR = [Σj wj (aj / rj)/(xj / nj)] / [Σj wj].
In our analysis, potential differences in P(R) and P(R|P) between subpopulations stem from time-dependent censoring
of affection statuses and possibly sex-specific differences. Therefore, we
have taken j to run over groups of relatives born
in the same 5-year period and of the same sex. The patients with lung carcinoma
in our analysis were born between the years 1868 and 1977, yielding 44 subpopulations.
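The stratified estimate described above can be sketched in Python. This is an illustration under stated assumptions, not the authors' code: the `Stratum` class and the function names are hypothetical, the counts in the usage note are invented, and the weight r_j x_j / (n_j − x_j) is what the inverse-variance description yields if a_j is treated as binomial with RR equal to 1 and x_j / n_j is treated as fixed.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Stratum:
    """Counts for one birth-period-by-sex stratum (hypothetical container)."""
    a: float  # affected relatives of probands, counted with multiplicity
    r: float  # relatives of probands, counted with multiplicity
    x: float  # affected individuals in the stratum population
    n: float  # size of the stratum population


def stratum_rr(s: Stratum) -> float:
    """Per-stratum estimate (a/r)/(x/n)."""
    return (s.a / s.r) / (s.x / s.n)


def stratum_weight(s: Stratum) -> float:
    """Assumed weight r*x/(n - x): inverse variance of the stratum
    estimate when RR = 1, a is binomial, and x/n is taken as fixed."""
    if s.n <= s.x:
        raise ValueError("stratum needs at least one unaffected individual")
    return s.r * s.x / (s.n - s.x)


def pooled_rr(strata: Iterable[Stratum]) -> float:
    """Weighted average of the per-stratum estimates."""
    strata = list(strata)
    total_weight = sum(stratum_weight(s) for s in strata)
    if total_weight == 0:
        raise ValueError("all strata have zero weight")
    weighted = sum(stratum_weight(s) * stratum_rr(s) for s in strata)
    return weighted / total_weight
```

For example, with two invented strata, `pooled_rr([Stratum(30, 1000, 150, 10000), Stratum(8, 400, 50, 5000)])` pools two per-stratum estimates that each equal 2, so the pooled value is also 2 (up to floating-point rounding).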
In the case of smoking, our list is only a random sample of all the
smokers. By applying the same method to estimate RR with this partial list, aj / rj is an underestimate
of P(R|P) and xj / nj is
an underestimate of P(R).
However, since these estimates should be off by the same factor, (aj / rj)/(xj / nj) continues to be a valid estimate
of RR.
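The cancellation can be written out explicitly. As a sketch, assume each smoker appears on the list with the same probability f, independently of family relationships (the symbol f and this sampling assumption are introduced here for illustration and are not in the original). Both the numerator and the denominator are then scaled by f, and the ratio is unchanged:

```latex
\frac{a_j / r_j}{x_j / n_j}
  \;\approx\; \frac{f \, P(R \mid P)}{f \, P(R)}
  \;=\; \frac{P(R \mid P)}{P(R)}
  \;=\; \mathrm{RR}
```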
Because a person can both be a proband and a relative of 1 or more other
probands, aj does not have a binomial
distribution. In general, for stratum j, aj / rj can be considered as a weighted
average of many unbiased but correlated estimates of P(R|P). It follows that (aj / rj)/(xj / nj)
is a ratio of 2 unbiased estimates and a consistent estimate of RR. Our overall
estimate of RR is a weighted average of the estimates obtained from the various
strata and is itself a consistent estimate. However, appropriate simulations,
instead of purely analytical calculations, are needed to study its sampling
variation. To assess the significance of the RR obtained for a given group
of patients, we compared their observed values with the RR computed for 1000
independently drawn and matched groups of control individuals. Each patient
was matched to a specific control individual in each control group. The control
individuals were drawn at random from the genealogical database and had the
same year of birth, the same sex, and the same number of ancestors recorded
in the database, as did the patients to whom they were matched. A reported P = .05 for the RR would indicate that 50 of the 1000 matched
control groups had values as large or larger than that for the patient’s
relatives or spouses. When none of the values computed for the control groups
were larger than the value for the patient’s relatives or spouses, we
report P<.001. Using a variance stabilizing square
root transform, an approximate confidence interval (CI) may be constructed
based on the control distribution.19
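The empirical P value described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, and the sketch covers only the P value, not the square-root-transform CI. It counts control values that are as large as or larger than the observed RR, following the first definition in the text.

```python
from typing import Sequence


def empirical_p_value(observed_rr: float, control_rrs: Sequence[float]) -> float:
    """Fraction of matched control groups whose RR is as large as or
    larger than the RR observed for the patient group."""
    if not control_rrs:
        raise ValueError("at least one control group is required")
    as_large = sum(1 for value in control_rrs if value >= observed_rr)
    return as_large / len(control_rrs)


def report_p_value(observed_rr: float, control_rrs: Sequence[float]) -> str:
    """Format the result in the style used in the text.

    When no control group reaches the observed value, only an upper
    bound of 1/(number of control groups) can be reported. The
    three-decimal format assumes about 1000 control groups.
    """
    p = empirical_p_value(observed_rr, control_rrs)
    if p == 0:
        bound = 1 / len(control_rrs)
        return "P<" + f"{bound:.3f}".lstrip("0")
    return "P = " + f"{p:.3f}".lstrip("0")
```

With 1000 control values of which 50 are at least as large as the observed RR, `report_p_value` returns `"P = .050"`; with none that large, it returns `"P<.001"`.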
We show that, assuming that the familial clustering of lung carcinoma
is entirely explained by the familial clustering of smoking, the RR of smoking
must be greater than that of lung carcinoma. Mathematically, when we say “the
familial clustering of lung carcinoma is entirely explained by the familial
clustering of smoking,” we mean that the 4 random variables, proband
lung carcinoma status, proband smoking status, relative smoking status, and
relative lung carcinoma status, form a Markov Chain. For example, this means
that relative lung carcinoma status is conditionally independent of proband
lung carcinoma status, given the smoking status of either the proband or the
relative.
Let PLC, PS, RS, and RLC denote the events that the proband has lung carcinoma,
the proband smokes, the relative smokes, and the relative has lung carcinoma,
respectively. Given that these events are all positively correlated and if
we make the Markov assumption described above, then
(1) P(RLC|PLC) ≤ P(RLC|PS)
(2) P(PS|RLC) ≤ P(PS|RS).
We want to prove that
(*) [P(RLC|PLC)/P (RLC)]≤[P(RS|PS)/P(RS)].
Because of (1),
to prove (*), it is sufficient to show that
(**) [P(RLC|PS)/P(RLC)] ≤ [P(RS|PS)/P(RS)].
Applying Bayes’ Rule, the left-hand side of (**) can be rewritten as
(3) P(PS|RLC)/P(PS)
and the right-hand side of (**) can be rewritten as
(4) P(PS|RS)/P(PS).
It follows from (2) that (3) ≤ (4). Hence, (**) and (*)
hold. It is also worth noting that equality holds in (*) if and only if (1)
and (2) are both equalities. The latter is true if and only if P(PS|PLC) = 1 and P(RS|RLC) = 1.
In other words, equality holds in (*) if and only if an individual must smoke
to get lung carcinoma.
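The inequality (*) can be checked numerically on a small Markov chain. The sketch below is illustrative only: the function name, the parameterization, and the example probabilities are assumptions made here, with the proband and the relative given the same smoking prevalence and the same smoking-specific risks.

```python
from typing import Tuple


def familial_rrs(
    p_smoke: float,
    p_rel_smoke_given_proband_smoke: float,
    risk_smoker: float,
    risk_nonsmoker: float,
) -> Tuple[float, float]:
    """Return (RR of lung carcinoma, RR of smoking) under the chain
    PLC - PS - RS - RLC, where carcinoma status depends only on the
    individual's own smoking status."""
    p, q = p_smoke, p_rel_smoke_given_proband_smoke
    # Joint distribution of (proband smokes, relative smokes), built so
    # that both individuals have marginal smoking probability p.
    joint = {
        (1, 1): p * q,
        (1, 0): p * (1 - q),
        (0, 1): p * (1 - q),
        (0, 0): 1 - p * q - 2 * p * (1 - q),
    }
    if min(joint.values()) < 0:
        raise ValueError("parameters do not define a valid distribution")
    risk = {1: risk_smoker, 0: risk_nonsmoker}
    p_lc = p * risk_smoker + (1 - p) * risk_nonsmoker
    # P(PLC and RLC): sum over smoking states, using conditional
    # independence of carcinoma statuses given smoking statuses.
    p_both = sum(prob * risk[s] * risk[t] for (s, t), prob in joint.items())
    rr_lung_carcinoma = p_both / (p_lc * p_lc)  # P(RLC|PLC)/P(RLC)
    rr_smoking = q / p                          # P(RS|PS)/P(RS)
    return rr_lung_carcinoma, rr_smoking
```

With the invented values `familial_rrs(0.3, 0.5, 0.1, 0.01)`, the RR of lung carcinoma comes out below the RR of smoking. Setting the nonsmoker risk to 0 makes the two RRs equal, which matches the equality condition stated above.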
When the 2756 patients with lung carcinoma were matched to the Icelandic
genealogical database, 274 affected sibling pairs, 296 affected avuncular
pairs, and 724 affected cousin pairs were observed.
Estimates of the RR for relatives of the 2756 patients are shown in Table 1. Parents, siblings, and children (first-degree
relatives) had RRs of 2.69 (95% CI, 2.20-3.23), 2.02 (95% CI, 1.77-2.23),
and 1.96 (95% CI, 1.53-2.39), respectively. The RRs for uncles/aunts and nephews/nieces
(second-degree relatives) and for cousins (third-degree relatives) were less
than that of first-degree relatives but were also significantly increased.
The RR for spouses was also significantly increased, although less than that
for first-degree relatives.
To determine whether the risk of developing lung carcinoma is greater
for relatives of patients with early-onset vs late-onset disease, we calculated
the RR for relatives of patients diagnosed with lung carcinoma at 60 years
or younger (Table 1). For all groups
of relatives analyzed, the risk was greater for relatives of patients with
early-onset disease than for relatives of all patients with lung carcinoma.
Thus, the risk for second-degree relatives (RR, 1.96; 95% CI, 1.35-2.78, for
uncles/aunts; and RR, 1.94; 95% CI, 1.32-2.72, for nephews/nieces) of patients
with early-onset disease is similar to the risk for children and siblings
(RR, 1.96; 95% CI, 1.53-2.39; and RR, 2.02; 95% CI, 1.77-2.23, respectively)
of all patients with lung carcinoma.
All 4 major histological types of lung carcinoma (adenocarcinoma and
small cell, large cell, and squamous cell carcinoma) are significantly associated
with smoking, and the risk of developing lung carcinoma increases with number
of cigarettes smoked and the duration of smoking. However, the strength of
this relationship varies between the histological types with adenocarcinoma
displaying the weakest overall relationship to smoking.20,21 Due
to this difference, we calculated the risk of lung carcinoma development for
relatives and spouses for adenocarcinoma separately from the other major histological
types of lung carcinoma (ie, small cell, large cell, and squamous cell carcinomas)
(Table 2). No significant difference
in lung carcinoma risk was detected between relatives and spouses of patients
with lung carcinoma from these 2 histological groups. However, the risk for
spouses of patients with adenocarcinoma of the lung was only half of that
of spouses of the combined group of small cell, large cell, and squamous cell
lung carcinoma. Although this difference was large, it was not significant
as the CI for the spouses of patients with adenocarcinoma lung cancer was
wide due to low number of spouses in that cohort.
It has been proposed that nicotine addiction (smoking) is at least in
part inherited. We thus calculated the risk of smoking for relatives and spouses
of smokers using a random list of 10 541 individuals who had smoked at
least 1 package of cigarettes per day for more than 10 years. As shown in Table 3, the risk of having smoked for more than
10 years is significantly increased for first-, second-, and third-degree relatives of
smokers. The risk was, however, highest for spouses of smokers (RR, 2.39;
95% CI, 2.28-2.48), suggesting that in addition to genetic factors, environmental
factors and/or nonrandom mating have a substantial effect on smoking habits.
Prolonged exposure to tobacco smoke precedes the development of lung
carcinoma in the vast majority of patients with lung carcinoma. We demonstrate
mathematically that if the familiality of lung carcinoma is entirely explained
by the familiality of smoking, the risk for smoking (Table 3) must be higher than that of lung carcinoma (Table 1). Therefore, if the RR of lung carcinoma is actually higher
than the RR of smoking, the null hypothesis that the familiality of
lung carcinoma is entirely due to smoking is rejected. Based on a single-sided test of
the RRs for lung carcinoma vs RRs for smoking, the null hypothesis was rejected
beyond the nuclear family (Table 4).
This was evident by significantly higher RRs for lung carcinoma than for smoking
for all relationships except for cousins. In contrast, the RR for smoking
of spouses was significantly higher than the RR for lung carcinoma.
Taken together, our data on the nationwide evaluation of lung carcinoma
familiality in Iceland demonstrate that heritable factors are indeed involved
in the etiology of lung carcinoma. Furthermore, this genetic predisposition
goes beyond the predisposition to smoking.
We investigated the role of genetic factors in the development of lung
carcinoma by linking together information on all lung carcinoma cases diagnosed
within the Icelandic population from January 1, 1955, to February 28, 2002,
with an extensive genealogical database covering all Icelanders living during
this time and most of their ancestors. Using these data, we found that there
is a familial predisposition to the development of lung carcinoma, as RR estimates
for first-, second-, and third-degree relatives of patients with lung carcinoma
were all significantly increased. This effect was strongest for relatives
of patients with early-onset lung carcinoma, in accordance with previous articles.22 Significantly increased RR for spouses of patients
with lung carcinoma also indicates the presence of shared environmental factors
and/or nonrandom mating.
The nationwide genealogy database used in our study provided a means
for uncovering the familial component by revealing more connections between
patients, missed in most other populations. The first-degree relatives (siblings,
children, and parents) of patients with lung carcinoma (early- and late-onset)
have a 2- to 3.5-fold higher risk of developing lung carcinoma than the
general population. However, members of a nuclear family share environment,
as evidenced by the 1.75-fold risk of lung carcinoma development in spouses.
Thus, this RR increase in first-degree relatives of patients with lung carcinoma
is the result of environmental factors, genetic factors, or both.
Using genealogy, our study goes further than other reported studies by demonstrating
that this familial factor extends beyond the nuclear family as evidenced by
significantly increased RR for second- and third-degree relatives of patients
with lung carcinoma. In the more distant relationships, shared environmental
factors are likely to be of less significance, providing stronger evidence
for genetic factors, given the excess RR.
We had smoking information only for a proportion of our nationwide cohort
of patients with lung carcinoma and therefore could not estimate RR directly
taking smoking into account. However, we demonstrated mathematically that
single-sided comparison of the RR for smoking to that of lung carcinoma in
relatives and spouses of smokers and patients with lung carcinoma, respectively,
can be used to determine whether lung carcinoma is entirely due to smoking.
When that comparison is applied, the risk for lung carcinoma is significantly
higher than the risk for smoking beyond the first-degree relatives of patients
with lung carcinoma, demonstrating that increased risk for relatives of patients
with lung carcinoma is not solely due to smoking. In contrast, this effect
for spouses is opposite (the RR for smoking is higher than for lung carcinoma).
These results suggest that the increased risk for lung carcinoma among spouses
may be solely due to tobacco exposure. Furthermore, and more importantly,
these data also demonstrate that the increased risk for close and distant
relatives of patients with lung carcinoma is not solely due to tobacco smoke
exposure. A similar conclusion was also reached in a study in which survival
models were applied in a case-control analysis of lung carcinoma (ie, the
familial aggregation of lung carcinoma could not be fully explained by the
familial aggregation of smoking).23 Based on
previous theoretical analysis by Khoury et al,24 it
is unlikely that other unknown environmental factors could explain fully the
increased familial risk in lung carcinoma, implying an underlying genetic
predisposition in lung carcinoma.
When we compared the risk of lung carcinoma for spouses and relatives
of patients with adenocarcinoma to that of spouses and relatives of patients
with other histological types of lung carcinoma, the greatest difference (more
than half, although not significant) was observed between the spouses of these
2 groups. This suggests a weaker environmental influence for adenocarcinoma
than for the 3 other major histology types of lung carcinoma. These data concur
with epidemiological studies that have demonstrated a weaker association between
smoking and adenocarcinoma vs other histological types of lung carcinoma.20,21
Comparison of the concordance of cancer between monozygotic and dizygotic
pairs of twins has been used to quantify the extent to which an observed familial
pattern is due to genetic or shared environmental factors.25 However,
these studies are limited because twins are rare and few twin registries go
far enough back in time for cancer assessment.26 The
largest of these studies have suggested a limited heritability of lung carcinoma,
although none reached statistical significance.25
In previous epidemiological studies on lung carcinoma using segregation
analysis, a codominant model of inheritance best fitted the data, suggesting
that a rare major autosomal gene plays a role in the development of lung carcinoma.27 Other studies have suggested that a number of low-penetrance,
high-frequency polymorphisms are likely to account for a proportion of lung
carcinoma risk.28 Polymorphisms in these genes
could explain individual differences in susceptibility to tobacco carcinogens
and are likely to include genes involved in decreasing or increasing the activity
of carcinogens (eg, CYP1A, CYP2E, and GSTM1) and genes involved in monitoring
and repairing tobacco carcinogen-induced DNA damage (eg, p53 and ERCC1).29-31 Our
results of RR calculation cannot discriminate between different models of
inheritance. Recently, a major lung cancer susceptibility locus was mapped
to chromosome 6q23-25 using multigenerational densely-affected families.32 The characteristics of this locus are consistent
with a dominant or codominant major locus. Information gained from epidemiological
and genetic studies such as our study may be of particular importance in
allowing for risk stratification with respect to lung carcinoma. Further information
gained from linkage and association studies may give additional value in this
regard.
In conclusion, to our knowledge, this study is the first population-based
study using a comprehensive and extensive genealogy database, taking into
account the effects of smoking, which demonstrates a familial nature of lung
carcinoma that strongly suggests a genetic predisposition to the disease.
However, although the results presented here support a role for genetics in
the risk of lung carcinoma, it should be emphasized that tobacco smoke plays
a dominant role in the pathogenesis of this disease, even among those individuals
who are genetically predisposed to lung carcinoma.
Corresponding Authors: Kari Stefansson,
MD, PhD, and Unnur Thorsteinsdottir, PhD, deCODE Genetics, Sturlugata 8, 101
Reykjavík, Iceland (firstname.lastname@example.org and email@example.com).
Author Contributions: Drs Jonsson and Stefansson
had full access to all of the data in the study and take responsibility for
the integrity of the data and the accuracy of the data analysis.
Study concept and design: S. Jonsson, Thorsteinsdottir,
Kristjansson, Arnason, Hallgrimsson, Gulcher, Amundadottir, Stefansson.
Acquisition of data: S. Jonsson, Isaksson.
Analysis and interpretation of data: S. Jonsson,
Thorsteinsdottir, H. Jonsson, Kong, Gudbjartsson.
Drafting of the manuscript: S. Jonsson, Thorsteinsdottir.
Critical revision of the manuscript for important
intellectual content: S. Jonsson, Thorsteinsdottir, H. Jonsson, Kong,
Gudbjartsson, Kristjansson, Arnason, Isaksson, Hallgrimsson, Gulcher, Amundadottir,
Statistical analysis: H. Jonsson, Kong, Gudbjartsson.
Obtained funding: Stefansson.
Administrative, technical, or material support:
S. Jonsson, Thorsteinsdottir, Kristjansson, Arnason, Isaksson, Gulcher, Amundadottir,
Study supervision: S. Jonsson, Thorsteinsdottir.
Funding/Support: All of the work, data generation,
and analysis of this study was supported by deCODE Genetics.
Role of the Sponsor: deCODE Genetics participated
in the design and conduct of the study, the collection, analysis, and interpretation
of the data, and the preparation, review, and approval of the manuscript.
Independent Statistical Analysis: Kristjan
Jonasson, PhD, Associate Professor, Department of Mathematics, Faculty of
Science, University of Iceland, was given access to the complete data, including
genealogical data and lung cancer and smoking data, after coding of personal
identification numbers. Dr Jonasson completed a thorough check of the methods
and data analysis, and confirmed that the results reported in the submitted
manuscript are both statistically correct and in accordance with the data.
Acknowledgment: We thank the Icelandic Cancer
Registry for providing us with the list of patients with lung carcinoma.