Figure 1. Study design: a comparison of the 6-year cumulative incidence of invasive breast cancer among women who received biennial screening vs controls who received only a prevalence screen at the end of their observation period.
Figure 2. Expected and observed cumulative incidence of invasive breast cancer among women who received biennial screening vs controls who received only a prevalence screen at the end of their observation period. A, What would be expected given the conventional model of cancer progression: invasive breast cancers in the control group that would have been detected by regular screening ultimately either progress to be detected clinically or persist to be detected by the prevalence screen. Thus, the 6-year cumulative incidence would be the same in both groups. B, What was observed in our study: a deficit in cumulative incidence persists in the control group following the prevalence screen. CI indicates confidence interval; RR, relative rate.
Zahl P, Mæhlen J, Welch HG. The Natural History of Invasive Breast Cancers Detected by Screening Mammography. Arch Intern Med. 2008;168(21):2311–2316. doi:10.1001/archinte.168.21.2311
The introduction of screening mammography has been associated with sustained increases in breast cancer incidence. The natural history of these screen-detected cancers is not well understood.
We compared cumulative breast cancer incidence in age-matched cohorts of women residing in 4 Norwegian counties before and after the initiation of biennial mammography. The screened group included all women who were invited for all 3 rounds of screening during the period 1996 through 2001 (age range in 1996, 50-64 years). The control group included all women who would have been invited for screening had there been a screening program during the period 1992 through 1997 (age range in 1992, 50-64 years). All women in the control group were invited to undergo a 1-time prevalence screen at the end of their observation period. Screening attendance was similar in both groups (screened, 78.3%, and controls, 79.5%). Counts of incident invasive breast cancers were obtained from the Norwegian Cancer Registry (in situ cancers were excluded).
As expected, before the age-matched controls were invited to be screened at the end of their observation period, the cumulative incidence of invasive breast cancer was significantly higher in the screened group than in the controls (4-year cumulative incidence: 1268 vs 810 per 100 000 population; relative rate, 1.57; 95% confidence interval, 1.44-1.70). Even after prevalence screening in controls, however, the cumulative incidence of invasive breast cancer remained 22% higher in the screened group (6-year cumulative incidence: 1909 vs 1564 per 100 000 population; relative rate, 1.22; 95% confidence interval, 1.16-1.30). Higher incidence was observed in screened women at each year of age.
Because the cumulative incidence among controls never reached that of the screened group, it appears that some breast cancers detected by repeated mammographic screening would not persist to be detectable by a single mammogram at the end of 6 years. This raises the possibility that the natural course of some screen-detected invasive breast cancers is to spontaneously regress.
Throughout Europe, including Denmark,1 Italy,2 Norway,3 Sweden,3 and the United Kingdom,4 the initiation of screening mammography has been associated with increased breast cancer incidence among women of screening age. If all of these newly detected cancers were destined to progress and become clinically evident as women age, a fall in incidence among older women should soon follow. The fact that this decrease is not evident raises the question: What is the natural history of these additional screen-detected cancers?
We consider herein the possibility of spontaneous regression in screen-detected invasive breast cancer. We make use of an exceptional natural experiment—the rapid and comprehensive introduction of biennial screening mammography in 4 Norwegian counties—to compare 6-year cumulative breast cancer incidence in a cohort of women aged 50 to 64 years at the start of the program with that of an age-matched cohort from 4 years earlier. The timing for this control cohort was chosen to overlap with the introduction of mammography and thus included a single 2-view mammogram (prevalence screen) at the end of the period. If spontaneous regression did not occur (ie, if all screen-detected breast cancers were to progress or even remain the same size), the cumulative incidence in the 2 cohorts would therefore be expected to be equal.
The Norwegian Breast Cancer Screening Program is a program of 2-view biennial screening mammography initiated in 1996 by the Ministry of Health and Care Services. The program invited women aged 50 to 69 years to undergo a first round of screening in 1996-1997, a second round in 1998-1999, and a third round in 2000-2001.
Our study population was drawn from women residing in 1 of 4 counties (Akershus, Oslo, Rogaland, Hordaland) during the period 1992 through 2001. The mobility in this population was low (approximately 1.2% of women aged 50-69 years move each year, mostly to another municipality within the 4 counties5) and the participation in screening was high (80%, 79%, and 78% participation for the 3 screening rounds, respectively6,7). The screened and control groups were selected so that both groups were potentially eligible for screening over a 6-year period (Figure 1).
The screened group included all women who were invited for screening (age range, 50-69 years) in all 3 rounds (first, second, and third round) from 1996 through 2001. Thus, the screened group was restricted to women aged between 50 and 64 years (inclusive) in the year 1996 (and who were therefore aged 55-69 years at the end of the analysis 5 years later in 2001).
The control group included all women who would have been invited for screening (age range, 50-69 years) had there been a screening program during the period 1992 through 1997. Thus, the control group was restricted to women between the ages of 50 and 64 years (inclusive) in the year 1992 (and who were therefore aged 55-69 years at the end of the analysis 5 years later in 1997). Because this period includes the years 1996-1997 (the first 2 years of the Norwegian Breast Cancer Screening Program), all women in this group were invited to undergo a 1-time prevalence screen at the close of the period.
Our primary outcome was the cumulative incidence of invasive breast cancer over a 6-year period. In situ cancers, such as ductal carcinoma in situ, were specifically excluded in the calculation of this cumulative incidence. The annual number of incident invasive breast cancers (1992-2001) for women of each year of age was obtained from the Norwegian Cancer Registry. This computerized population-based cancer registry was established in 1951 and has been demonstrated to register virtually all cancers diagnosed in Norway by using the national personal identification number system to link information from several sources.8 The denominators for our incidence calculations—the number of women in each 1-year age group for each year of our analysis—were obtained from Statistics Norway, Oslo.
We calculated the annual incidence of invasive breast cancer in the screened and control group and compared the cumulative incidence in women in the screened group with age-matched women in the control group at 2, 4, and 6 years. We then calculated the relative rate (RR) for being diagnosed with invasive cancer (screened vs control) and estimated 95% confidence intervals (CIs) using Poisson regression. Because the 1996-1997 cells contribute data to both the screened and control group in the 6-year analysis, we also empirically estimated 95% CIs using a bootstrap technique (simulating the distribution of rates for each age group/year cell).9
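As a rough illustration of how a relative rate and its interval can be obtained from case counts and denominators, the following is a minimal sketch that uses a Wald interval on the log scale. It is a simpler stand-in, not the Poisson regression or bootstrap used in our analysis, and the function name, argument names, and input counts are all hypothetical.

```python
import math

def rate_ratio_wald_ci(cases_a, pyears_a, cases_b, pyears_b, z=1.96):
    """Rate ratio (group a vs group b) with a Wald CI on the log scale.

    Assumes the case counts are independent Poisson variables; this is an
    approximation, not the regression model described in the text.
    """
    rate_a = cases_a / pyears_a
    rate_b = cases_b / pyears_b
    rr = rate_a / rate_b
    # Standard error of log(RR) for two independent Poisson counts
    se_log_rr = math.sqrt(1.0 / cases_a + 1.0 / cases_b)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper

# Hypothetical counts and denominators, chosen only to exercise the function
rr, ci_low, ci_high = rate_ratio_wald_ci(200, 100_000, 100, 100_000)
print(f"RR = {rr:.2f} (95% CI, {ci_low:.2f}-{ci_high:.2f})")
```

Because this sketch treats the two counts as independent, it would not account for the 1996-1997 cells that contribute to both groups, which is the reason a bootstrap was also used.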
We also sought to determine how the effect was modified by age. Because the annual number of invasive breast cancers among women at any year of age (eg, 50 years) is small, we collapsed women into 4-year age groups at entry (eg, 50-53 years, 51-54 years, 52-55 years, and so on through age 61-64 years) and report the RR for each age group. These RRs represent a moving average of how the effect of being in the screened group is modified by age. The 4-year age groups serve another purpose. In our overall analysis (age 50-64 years at entry), a substantial number of women contribute information to both the screened and control group (although at a different point in their life, eg, a 50-year-old woman at entry in the control group will be 54 years old at entry in the screened group). In the 4-year age groups, no individual can appear in both the screened and control group. To illustrate our method, an online appendix gives the detailed RR calculation for a single 4-year age group (women aged 56-59 years at entry) (http://www.vaoutcomes.org/downloads/Appendix_Table.pdf).
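The overlapping age groups described above, and the reason a 4-year window prevents any woman from appearing in both cohorts, can be sketched as follows. The function names are hypothetical; the 4-year lag reflects the gap between entry into the control group (1992) and the screened group (1996).

```python
def moving_age_groups(first_age=50, last_age=64, width=4):
    """Overlapping age-at-entry windows: 50-53, 51-54, ..., 61-64."""
    return [(start, start + width - 1)
            for start in range(first_age, last_age - width + 2)]

def appears_in_both(group, lag_years=4):
    """True if a woman in this window at control entry could still fall
    in the same window at screened-group entry lag_years later."""
    low, high = group
    return any(low <= age + lag_years <= high for age in range(low, high + 1))

groups = moving_age_groups()
overlaps = [g for g in groups if appears_in_both(g)]

print(len(groups), groups[0], groups[-1])
print("4-year windows with shared women:", overlaps)
print("Full 50-64 range shares women:", appears_in_both((50, 64)))
```

A woman aged 50 to 53 years in 1992 is aged 54 to 57 years in 1996, so she falls outside the same window in the later cohort, whereas in the full 50-64 range she would be counted in both.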
Table 1 gives the characteristics of the screened and control group. Because both groups consist of women aged 50 to 64 years at the start of the observation period, the finding that their mean age is about the same is expected. Furthermore, their educational attainment and family income are also similar. More important, however, is the comparability of their reproductive history—a key risk factor for breast cancer. The proportion that is nulliparous, mean age at first birth, and the number of births are all similar. Finally, both groups had similarly high attendance at screening at the end of the observation period (78.3% of screened women attended the third round of screening; 79.5% of controls attended the prevalence screen).
As expected, invitation to the first round of mammography was associated with a dramatic rise in invasive breast cancer incidence in the screened group relative to age-matched controls (2-year cumulative incidence, 660 vs 384 per 100 000 population; RR, 1.72 [95% CI, 1.53-1.94]). As time passed and cancer in the control group had the opportunity to become clinically evident, the difference narrowed (4-year cumulative incidence, 1268 vs 810 per 100 000 population; RR, 1.57 [95% CI, 1.44-1.70]). Even after prevalence screening in controls, however, the cumulative incidence of invasive breast cancer remained 22% higher in the screened group (6-year cumulative incidence, 1909 vs 1564 per 100 000 population; RR, 1.22 [95% CI, 1.16-1.30]). The expected and observed cumulative incidence over the 6-year period is shown in Figure 2.
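The reported RRs can be checked against the cumulative incidence figures quoted above. The sketch below takes crude ratios of the published rates per 100 000 population; it is only an arithmetic check under that assumption, not a reproduction of the regression, and the variable names are illustrative.

```python
# Cumulative incidence per 100 000 population as reported in the text:
# years of follow-up -> (screened group, control group)
reported = {2: (660, 384), 4: (1268, 810), 6: (1909, 1564)}

def relative_rate(screened, control):
    """Crude ratio of two cumulative incidence figures."""
    return screened / control

rrs = {years: round(relative_rate(s, c), 2) for years, (s, c) in reported.items()}
excess_at_6_years = reported[6][0] - reported[6][1]

for years, rr in rrs.items():
    print(f"{years}-year RR: {rr:.2f}")
print(f"Excess at 6 years: {excess_at_6_years} per 100 000")
```

The crude ratios agree with the reported RRs of 1.72, 1.57, and 1.22 to 2 decimal places.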
Higher incidence was observed in screened women at each year of age. Table 2 gives the cumulative incidence for each 4-year age group in the analysis as well as the risk of breast cancer diagnosis in the screened group relative to controls. The RR is significantly elevated (>1) in each group except the very oldest (starting age, 61-64 years).
To determine how sensitive our results are to the length of follow-up, we repeated the analysis using 8 years of observation for both the screened (1996-2003) and control group (1992-1999). This analysis compares 4 biennial screening rounds (over 8 years) in the screened group with 4 years without screening followed by 2 biennial screening rounds (over 4 years) in the control group. In other words, it addresses the concern about the imperfect sensitivity of mammography by allowing the control group to have 2 screening rounds to identify cancer. Extending follow-up, however, had little effect on the excess incidence observed (8-year cumulative incidence, 2580 vs 2152 per 100 000 population; RR, 1.20 [95% CI, 1.14-1.25]).
Because most cancers that are detected are also treated, there are only a few reports documenting spontaneous regression of breast cancer.10,11 However, spontaneous regression of advanced cancer has long been recognized in metastatic melanoma12 and metastatic renal cell carcinoma,13 and, in fact, such observations have motivated the interest in immunotherapy in these settings.14 Furthermore, more systematic investigations of spontaneous regression are beginning to be reported in the context of screen-detected abnormalities. There are data suggesting that regression routinely occurs in colonic adenomas (both from the National Polyp Study15 and others16) and a growing literature documenting regression in precancerous lesions of the cervix.17,18 Documentation of regression in screen-detected cancer is limited to neuroblastoma, for which investigators have found that screening detects far more cancer than will ever become clinically apparent19 and that a substantial proportion regress.20
The rapid and comprehensive introduction of screening among women aged 50 to 69 years in Akershus, Oslo, Rogaland, and Hordaland counties offers an exceptional opportunity to examine the possibility of regression in screen-detected breast cancer. Some of us (P.-H.Z. and J.M.) have reported elsewhere on the surge in incidence that ensued and argued that, because it was not compensated by a drop in incidence among women older than 69 years, much of the increase represented overdiagnosis.3 Herein, we investigated the issue further by comparing cumulative breast cancer incidence over 6 years in age-matched cohorts of women before and after the initiation of biennial mammography.
We found that the initiation of screening was associated with a substantial rise in incidence in the screened group relative to controls—a finding that is expected as the time of diagnosis is advanced (and a finding that is, in fact, necessary if screening is to reduce mortality). We also found, however, something much less expected—the excess incidence did not completely disappear following a prevalence screen in the control group. Thus, it appears that some invasive breast cancers detected by repeated mammographic screening would not persist to be detectable by a single screening at the end of 6 years. In other words, the natural course for some screen-detected breast cancers may be to spontaneously regress.
Although not widely known, there are corroborating data from randomized trials of mammography. A long-term follow-up of the Malmö trial recently reported a 10% rate of overdiagnosis 15 years after the trial was completed, most of which was diagnosed as invasive cancer.21 When calculated in terms of the cancers detected during the intervention period (as is reported herein), this extra diagnosis corresponds to an excess incidence of 19% (RR, 1.07-1.33).22 Both Canadian trials23,24 reported that a portion of the invasive breast cancers detected in the screening arm never presented in the control arm, despite screening for 4 years following the end of the trial. In the Canadian trial of women aged 50 to 59 years, the excess incidence in the screened group was about 7% (RR, 1.07; 95% CI, 0.96-1.19). Although this is a small effect relative to that reported herein, it reflects that the control group was not truly unscreened, but instead regularly screened via physical examination. The effect of the regular screening among controls was evident by aspiration, needle biopsy, and surgical biopsy rates that were only slightly lower than those of the mammography arm.25 The Canadian trial of women aged 40 to 49 years had a truly unscreened control group and is thus the most comparable to our study. In this trial the excess incidence in the screened group was 22% (RR, 1.22; 95% CI, 1.09-1.37), essentially matching our finding.
It is important, of course, to consider alternative explanations for the increased incidence we observed in the screened group. One possibility is that the case ascertainment in the Norwegian Cancer Registry became more complete over the period. However, during the period, the registry was documented to have almost perfect (98%-99%) solid tumor ascertainment rates,26,27 and its reported breast cancer incidence among women not of screening age (ie, age 30-49 years and ≥70 years) was remarkably constant.3 Another possibility is that screened and control groups are not comparable in terms of their risk of developing breast cancer. Table 1, however, demonstrates just how similar the 2 groups are, an ensured similarity given our staggered cohort design in which a substantial number of women contribute information to both groups (albeit at different periods in their life).
Some might argue that our finding of increased incidence in the screened group reflects the increasing sensitivity of mammography. A number of factors leading to increased sensitivity over time could be hypothesized, such as the following: the interpretation skills of mammographers may be increasing with practice, prior mammograms may be increasingly available for comparison, and the technology itself may be improving. The available data, however, suggest that the sensitivity is stable. First, the mean diameter of screen-detected tumors showed little change over the period (14.4 mm in 1998-1999 and 14.0 mm in 2000-2001).7 Second, the incidence among women aged 50 to 51 years—those being invited to screening for the first time—did not increase over time (annual incidence, 281 per 100 000 in 1996-1997, 286 per 100 000 in 1998-1999, and 260 per 100 000 in 2000-2001). Finally, estimated sensitivity (the ratio of screen-detected cases over the sum of screen-detected cases and interval cancers in the subsequent 2 years) itself did not increase (77.7% after the first round and 74.0% after the second round).7
Another possibility is that excess incidence reflects a temporal increase in the underlying incidence of breast cancer. To explain an increase of the magnitude we observed, however, would require a dramatic increase in cancer incidence. There are no population-based data to support this; the annual increase in breast cancer incidence in these 4 counties before screening was less than 1% per year.3 Because our comparison groups involve observation periods only 4 years apart, rising underlying incidence can explain only 4% of the increased incidence reported herein.
Nevertheless, one could posit a dramatic change in one exposure relevant to the development of breast cancer, ie, hormone therapy (HT) (estrogen + progestin), which was associated with a 24% increase in invasive breast cancer incidence in the Women's Health Initiative randomized trial.28 In fact, there was a substantial increase in mean HT exposure in women residing in the 4 counties between the 2 periods: according to data from the Norwegian Institute of Public Health, approximately 32 000 women were treated in the period 1992 through 1997 and 46 000 in the period 1996 through 2001.
In a sensitivity analysis assuming an extreme condition, ie, that all these women were in the age group analyzed herein (age 50-64 years at entry), we calculated that the addition of 14 000 women at higher risk (because of HT) within the approximately 119 000 women in the screened group would explain less than 3% of the increased incidence reported herein (24% × 14 000/119 000 = 2.8%). Even the most extreme condition possible, ie, no HT use in the control group and 46 000 women using HT in the screened group, would explain less than 10% of the observed effect (24% × 46 000/119 000 = 9.3%). The small influence of HT is also supported by the observation of constant breast cancer incidence from 2002 through 2005, during which the sales of estrogen-progestin combinations dropped 60%.29
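The arithmetic behind these 2 bounds can be reproduced in a few lines. This is a minimal sketch using only the figures quoted above; the constant and function names are illustrative.

```python
HT_EXCESS_RISK = 0.24          # 24% increase reported in the Women's Health Initiative
SCREENED_GROUP_SIZE = 119_000  # approximate number of women in the screened group
HT_USERS_CONTROL_PERIOD = 32_000
HT_USERS_SCREENED_PERIOD = 46_000

def ht_share(extra_users):
    """Excess incidence attributable to HT if extra_users carry the added risk."""
    return HT_EXCESS_RISK * extra_users / SCREENED_GROUP_SIZE

# All additional HT users assumed to fall in the analyzed age group
moderate = ht_share(HT_USERS_SCREENED_PERIOD - HT_USERS_CONTROL_PERIOD)
# No HT use at all in the control group
extreme = ht_share(HT_USERS_SCREENED_PERIOD)

print(f"Additional 14 000 users: {moderate:.1%}")
print(f"All 46 000 users: {extreme:.1%}")
```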
Then again, there are 2 reasons to posit that our results might underestimate the increased incidence associated with screening and thus underestimate the frequency of regression among screen-detected cancers. First, some women in the control group undoubtedly had a mammogram prior to their prevalence screen. A questionnaire given to women at the time of the prevalence screen revealed that about one-half had had a previous mammogram, suggesting that opportunistic screening (screening outside of an organized program) occurred in the 4 counties during the period 1992 through 1995.30 Second, attendance for mammography was not 100% in the screened group. This contamination of both the screened and control groups tends to bias our results toward the null and make our finding an underestimate of the true effect of mammography on cumulative incidence.
The finding of increased incidence leads to an obvious question: What happened to those extra cancers that likely existed in the control group, yet were never detected? Two possibilities exist: (1) they regressed or (2) they remained dormant and then were missed on the prevalence screen.
Could a substantial population of dormant, stable cancers—cancers that neither progress nor regress—explain the excess incidence in the screened group? To answer this, consider a simple example in which there are 2000 cancers available for detection over 6 years in both the screened and control groups. Furthermore, consider an extreme condition, namely, half the cancers are stable and the sensitivity of mammography for these lesions is 50%. One thousand progressive cancers are detected either clinically or by screening in both groups (because both groups include a screen at the end of the period, progressive subclinical cancers at close of study are detected equally). Among the 1000 stable cancers in the control group, 500 are detected by the prevalence screening, making the total detection among controls 1500 (1000 + 500). Among the 1000 stable cancers in the screened group, 500 are detected in the first round of screening. Among the remaining 500, 250 are detected in the second round, and among the remaining 250, 125 are detected in the third round. The total detection among the screened group is then 1875 (1000 + 500 + 250 + 125). This example, with relatively extreme numbers, leads to a cumulative incidence ratio of 1.25.
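The example above can be written as a short simulation. This is a sketch of the illustrative scenario only, with hypothetical function names; it assumes each screening round detects a fixed fraction of the stable cancers that remain undetected.

```python
def detected_stable(stable, sensitivity, rounds):
    """Stable cancers found over repeated rounds, each round detecting a
    fixed fraction (sensitivity) of those still undetected."""
    remaining = stable
    detected = 0.0
    for _ in range(rounds):
        found = remaining * sensitivity
        detected += found
        remaining -= found
    return detected

def cumulative_incidence_ratio(progressive, stable, sensitivity,
                               screened_rounds, control_rounds):
    """Ratio of total detected cancers, screened group vs control group.
    All progressive cancers are assumed to be detected in both groups."""
    screened = progressive + detected_stable(stable, sensitivity, screened_rounds)
    control = progressive + detected_stable(stable, sensitivity, control_rounds)
    return screened / control

# 1000 progressive and 1000 stable cancers, 50% sensitivity for stable lesions,
# 3 rounds in the screened group vs 1 prevalence screen in controls
ratio = cumulative_incidence_ratio(1000, 1000, 0.5, 3, 1)
print(f"Cumulative incidence ratio: {ratio:.2f}")
```

With these inputs the screened group accumulates 1875 detected cancers against 1500 in controls, giving the ratio of 1.25 stated in the text.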
This example highlights that for our findings to be explained by stable cancers, not only must they be very common, but also the incidence of cancer must drop in the screened group as screening continues. A drop between the first and second round is always expected as prevalent progressive cancers are detected by the first screening, as was observed in our study (incidence of invasive cancer per 100 000 population was 350 in the first round and 295 in the second round). However, with a high proportion of stable cancers, a continued drop between the second and third round would be expected as the reservoir of stable cancers is depleted. No such drop was observed; the incidence per 100 000 population was 295 in the second round and 293 in the third round. Constant incidence in the second and third rounds was also reported in Canada.21
If we add 1 more screening round to both groups in the prior example, the cumulative incidence ratio drops to 1.11. However, when we repeated our analysis after extending the follow-up to recreate this condition (comparing 4 rounds in the screening group with 2 rounds in the controls), we found little reduction in the incidence ratio (1.20 vs 1.22). Thus, the existence of a substantial reservoir of stable cancers seems improbable, and we believe that the most tenable explanation of our findings is that some screen-detected breast cancers spontaneously regress.
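The stable-cancer example can be summarized in closed form. The notation is introduced here for illustration and is not part of the original analysis: $P$ is the number of progressive cancers, $S$ the number of stable cancers, $s$ the per-round sensitivity for stable lesions, and $n$ and $m$ the number of screening rounds in the screened and control groups.

```latex
% Stable cancers detected after k rounds: S * (1 - (1 - s)^k)
R(n, m) = \frac{P + S\left[1 - (1 - s)^{n}\right]}{P + S\left[1 - (1 - s)^{m}\right]}

% With P = S = 1000 and s = 0.5:
R(3, 1) = \frac{1000 + 875}{1000 + 500} = \frac{1875}{1500} = 1.25
\qquad
R(4, 2) = \frac{1000 + 937.5}{1000 + 750} = \frac{1937.5}{1750} \approx 1.11
```

Under this model each added round in both groups moves the ratio toward 1, which is why the nearly unchanged observed ratio (1.20 vs 1.22) argues against a large reservoir of stable cancers.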
Although many clinicians may be skeptical of the idea, the excess incidence associated with repeated mammography demands that spontaneous regression be considered carefully. Spontaneous regression of invasive breast cancer has been reported, with a recent literature review identifying 32 reported cases.31 This is a relatively small number given such a common disease. However, as some observers have pointed out, the fact that documented observations are rare does not mean that regression rarely occurs. It may instead reflect the fact that these cancers are rarely allowed to follow their natural course.32
Another piece of supporting evidence for regression comes from the Wisconsin Breast Cancer Epidemiology Simulation Model.33 It uses a stochastic simulation to replicate breast cancer incidence and mortality rates in the US population during the period 1975 through 2000, when screening was introduced. To fit the observed statistics, it was necessary to postulate that approximately 40% of initiated breast cancers fell in a class of so-called limited malignant potential, ie, tumors that “progress to a maximum of approximately 1-cm diameter, dwell at this size for 2 years, and then regress if undetected.”33(p43)
It is important to further test the possibility of regression by analyzing data from other screening programs. One condition for confirming the possibility of regression, ie, that the introduction of the screening program has caused a marked and sustained increase in incidence, seems to be present in almost all 40 counties in Norway and Sweden.34 Why and how such regression may occur and whether the regression is complete are questions of considerable biological and clinical interest.
Finally, it is also important to emphasize that our findings have no bearing on the debate on whether screening mammography reduces breast cancer mortality. Our findings are equally consistent with the possibility that mammography either leads to a reduction in breast cancer mortality or has no effect at all. Instead, our findings simply provide new insight on what is arguably the major harm associated with mammographic screening, namely, the detection and treatment of cancers that would otherwise regress.
Correspondence: H. Gilbert Welch, MD, MPH, VA Outcomes Group, Department of Veterans Affairs Medical Center, White River Junction, VT 05009 (email@example.com).
Accepted for Publication: April 8, 2008.
Author Contributions: Study concept and design: Zahl, Mæhlen, and Welch. Acquisition of data: Mæhlen. Analysis and interpretation of data: Welch. Drafting of the manuscript: Zahl, Mæhlen, and Welch. Critical revision of the manuscript for important intellectual content: Welch. Statistical analysis: Zahl, Mæhlen, and Welch. Obtained funding: Mæhlen. Administrative, technical, and material support: Welch. Study supervision: Welch.
Financial Disclosure: None reported.
Funding/Support: This study was supported in part by a Research Enhancement Award from the Department of Veterans Affairs (03-098).
Disclaimer: The views expressed herein do not necessarily represent the views of the Department of Veterans Affairs or the US government.
Additional Information: An appendix that provides the detailed RR calculation for a single 4-year age group (women aged 56 through 59 years at entry) is provided online (http://www.vaoutcomes.org/downloads/Appendix_Table.pdf).
Additional Contributions: William Black, MD, Eric Larson, MD, MPH, and colleagues in the Norwegian Institute of Public Health, the University of Oslo, and the VA Outcomes Group provided feedback regarding the manuscript, which enhanced both our thinking and the presentation of our results.