Figure 1. Kaiser Permanente Northern California, 2011 to 2012. Numbers on Google Earth images indicate clusters in 1, Marin County; 2, west central Contra Costa County; 3, east Contra Costa and west San Joaquin Counties; 4, eastern Sacramento County; 5, southern San Mateo and eastern Santa Clara Counties; 6, central Placer County near Lincoln; 7, eastern Sonoma and northern Napa Counties.
Figure 2. Kaiser Permanente Northern California, 2011 to 2012. Numbers on Google Earth images indicate clusters in 1, Marin County; 2, northeast Sacramento County; 3, east Contra Costa and west San Joaquin Counties; 4, west central Contra Costa County; 5, eastern Sonoma and northern Napa Counties.
Ray GT, Kulldorff M, Asgari MM. Geographic Clusters of Basal Cell Carcinoma in a Northern California Health Plan Population. JAMA Dermatol. 2016;152(11):1218–1224. doi:10.1001/jamadermatol.2016.2536
Are there geographic clusters of basal cell carcinoma among the membership of a large health plan in northern California?
In this study of data from a basal cell carcinoma registry, after adjustment for age, sex, and neighborhood socioeconomic status, 5 discrete geographic clusters of basal cell carcinoma emerged.
Identifying geographic clusters can help inform future research on the underlying etiology of the clustering including factors related to the environment, health care access, or other characteristics of the resident population, and can help target screening efforts to areas of highest yield.
Importance
Rates of skin cancer, including basal cell carcinoma (BCC), the most common cancer, have been increasing over the past 3 decades. A better understanding of geographic clustering of BCCs can help target screening and prevention efforts.
Objective
To present a methodology to identify spatial clusters of BCC and to identify such clusters in a northern California population.
Design, Setting, and Participants
This retrospective study used a BCC registry to determine rates of BCC by census block group, and used spatial scan statistics to identify statistically significant geographic clusters of BCCs, adjusting for age, sex, and socioeconomic status. The study population consisted of white, non-Hispanic members of Kaiser Permanente Northern California during years 2011 and 2012.
Main Outcomes and Measures
Statistically significant geographic clusters of BCC as determined by spatial scan statistics.
Results
Spatial analysis of 28 408 individuals who received a diagnosis of at least 1 BCC in 2011 or 2012 revealed distinct geographic areas with elevated BCC rates. Among the 14 counties studied, BCC incidence ranged from 661 to 1598 per 100 000 person-years. After adjustment for age, sex, and neighborhood socioeconomic status, a pattern of 5 discrete geographic clusters emerged, with a relative risk ranging from 1.12 (95% CI, 1.03-1.21; P = .006) for a cluster in eastern Sonoma and northern Napa Counties to 1.40 (95% CI, 1.15-1.71; P < .001) for a cluster in east Contra Costa and west San Joaquin Counties, compared with persons residing outside that cluster.
Conclusions and Relevance
In this study of a northern California population, we identified several geographic clusters with modestly elevated incidence of BCC. Knowledge of geographic clusters can help inform future research on the underlying etiology of the clustering including factors related to the environment, health care access, or other characteristics of the resident population, and can help target screening efforts to areas of highest yield.
Basal cell carcinoma (BCC) is the most common cancer in the United States, estimated to affect between 1 and 2 million Americans annually.1,2 Despite the large population affected, BCCs have been difficult to study owing in part to their exclusion from national cancer registries such as the Surveillance, Epidemiology, and End Results program and, prior to 2012, their lack of unique International Classification of Diseases identifiers.3 Such data limitations have hampered the study of BCC etiology, incidence, and disease burden and the development of consistent health care policies to guide screening and preventive measures.4 To overcome these data limitations, we developed and validated a registry that uses electronic pathology records to identify BCCs among members of Kaiser Permanente Northern California (KPNC), a large integrated health care delivery system.5 We used this BCC registry to identify geographic clusters of BCCs among the KPNC membership.
Identification of clusters is important in epidemiologic studies of cancer.6 When a statistically significant excess of cases is noted, subsequent epidemiologic studies can investigate whether the cluster is related to the environment, health care access, or other characteristics of the resident population, and the investigation of clusters may uncover previously unknown etiologies. In addition, knowledge of the geographic patterns of cancers in a population can be used to more effectively direct screening and prevention efforts. In the United States, spatial clustering studies have been performed for melanoma in the state of Massachusetts7 and more recently in northern California.8 However, to our knowledge there have been no published reports in the US of spatial clustering for BCC.
In this study, we used spatial scan statistics to identify BCC geographic clusters among a northern California population in the years 2011 to 2012. After adjustment for the known BCC risk factors of age and sex, the role of neighborhood socioeconomic status (SES) was explored.
Kaiser Permanente Northern California is a large, integrated health care delivery system that provides health care to more than 3.5 million members residing in northern California, representing approximately 33% of the insured population and 28% of the total population in its service area. The member population reflects the general population in the northern California region, although, as an insured population, it underrepresents persons with low levels of education and income.9
Kaiser Permanente Northern California maintains a computerized system for all pathology specimens that records specimen types, anatomic locations, gross and microscopic diagnoses, and Systematized Nomenclature of Medicine (SNOMED) codes, which are assigned based on standardized classifications of pathology diagnoses. These codes were used to create a BCC registry at KPNC, which was used in this study and has previously been validated.5
This study was approved by the Kaiser Foundation Research Institute Institutional Review Board. The Declaration of Helsinki protocols were followed and a waiver of informed consent was obtained.
We extracted all electronic pathology reports of specimens collected during years 2011 and 2012 with SNOMED code M809xx. We have previously shown that pathology records with SNOMED code M809xx have a positive predictive value of 99.2% for being a true BCC.5 Consistent with prior studies of BCC,2,10 we retained, for each patient, only those BCC specimens not preceded by another BCC specimen in the prior 365 days, and considered these to be new primary BCCs. We retained only BCCs from patients who were active KPNC members when the specimen was collected, which ensured the availability of a current address and made numerator and denominator inclusion requirements consistent. Patient age, sex, self-reported race/ethnicity, and home address at the time of BCC diagnosis were extracted from KPNC membership databases. Because white, non-Hispanic patients accounted for almost all (92%) of the BCC cases,2 only these patients were included. Patients were assigned to census block groups from the US 2010 decennial census based on home address.11 We retained patients who lived within 14 counties that we considered to be KPNC’s primary service area.
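The 365-day rule for defining new primary BCCs can be illustrated with a minimal sketch. This is not the registry's actual code: the function name `new_primary_bccs`, the `(patient_id, collection_date)` input layout, and the sample dates are hypothetical. Because specimens are sorted by date within each patient, comparing each specimen with its immediate predecessor is enough to know whether any BCC specimen fell in the prior 365 days.

```python
from collections import defaultdict
from datetime import date


def new_primary_bccs(specimens, window_days=365):
    """Keep specimens not preceded by another specimen within window_days.

    specimens: iterable of (patient_id, collection_date) pairs (hypothetical layout).
    Returns the retained (patient_id, collection_date) pairs, grouped by patient.
    """
    by_patient = defaultdict(list)
    for patient_id, collected in specimens:
        by_patient[patient_id].append(collected)

    kept = []
    for patient_id, dates in by_patient.items():
        dates.sort()
        previous = None  # most recent earlier specimen, retained or not
        for collected in dates:
            if previous is None or (collected - previous).days > window_days:
                kept.append((patient_id, collected))
            previous = collected
    return kept


# Made-up example: the second specimen for p1 falls within 365 days of the first.
example = [
    ("p1", date(2011, 1, 10)),
    ("p1", date(2011, 6, 1)),
    ("p1", date(2012, 8, 1)),
    ("p2", date(2012, 3, 5)),
]
retained = new_primary_bccs(example)
```

In practice the lookback for specimens collected early in 2011 would require pathology records from 2010 as well; the sketch assumes the input already includes them.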
Living in areas with less socioeconomic deprivation is associated with higher risk of melanoma8,12,13 and BCC.14 As a measure of SES, we used a neighborhood deprivation index (NDI), which has been described previously15 and has been used in prior studies of KPNC members.16,17 The NDI was created using data from the 2006 through 2010 American Community Survey collected by the US Census Bureau11,18 and was generated through principal components analysis of 8 variables at the census tract level (the lowest level for which all NDI variables are available), including percentages of males working in management and professional occupations, residents living in crowded housing, households in poverty, households headed by females with dependents, households receiving public assistance, households earning less than $30 000 per year, residents at least 25 years of age with less than a high school education, and residents at least 16 years of age who are unemployed.16 The NDI was standardized to have a mean of 0 and standard deviation of 1 by dividing it by the square root of the eigenvalue.16 Higher NDIs equate to greater deprivation.
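The standardization step can be checked with a short derivation. This is a sketch under the usual principal components assumptions (the 8 input variables are centered, and the raw index is the score on the first principal component); the symbols x, w, Σ, λ, and z are notation introduced here for illustration and do not come from the original description of the NDI.

```latex
\begin{aligned}
z &= w^{\top} x, \qquad \Sigma w = \lambda w, \qquad w^{\top} w = 1,\\
\operatorname{E}[z] &= w^{\top}\operatorname{E}[x] = 0,\\
\operatorname{Var}(z) &= w^{\top} \Sigma\, w = \lambda,\\
\mathrm{NDI} &= \frac{z}{\sqrt{\lambda}}
\quad\Longrightarrow\quad
\operatorname{Var}(\mathrm{NDI}) = \frac{\lambda}{\lambda} = 1 .
\end{aligned}
```

Because the variance of a principal component score equals its eigenvalue, dividing the score by the square root of that eigenvalue gives a standard deviation of 1.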
The denominator for calculating BCC incidence rates was KPNC member-years. We identified all white, non-Hispanic persons who were members of KPNC at any time during years 2011 and 2012. Membership systems track membership on a monthly basis, and persons could enter and leave the health plan throughout this time. Member addresses were assessed each month, and only months in which the member was living within 1 of the 14 service-area counties were included. We summarized member-years (over the 2-year period) into strata by census block group, sex, and age (in 8 categories: 0 to <30 years, 30 to <90 in 10-year increments, and ≥90). Cases of BCC were summarized into the same strata, and (when calculating incidence rates) persons could contribute more than 1 case to the numerators (although only 1 in any given year). Incidence rates were calculated by dividing the number of cases in each stratum by the person-years in that stratum.
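The stratified rate calculation described above can be sketched as follows. The function names, the tuple layouts, and the sample records are hypothetical and chosen only for illustration; the 8 age categories follow the cut points given in the text.

```python
from collections import defaultdict

# Lower bounds of age categories 1..7; category 0 is 0 to <30, category 7 is >=90.
AGE_BREAKS = [30, 40, 50, 60, 70, 80, 90]


def age_category(age):
    """Map an age in years to one of the 8 categories (0 through 7)."""
    return sum(age >= lower for lower in AGE_BREAKS)


def incidence_rates(member_months, cases):
    """Return cases per 100 000 person-years for each stratum.

    member_months: iterable of (block_group, sex, age, months_enrolled)
    cases:         iterable of (block_group, sex, age)
    A stratum is the triple (block_group, sex, age_category).
    """
    person_years = defaultdict(float)
    case_counts = defaultdict(int)
    for block_group, sex, age, months in member_months:
        person_years[(block_group, sex, age_category(age))] += months / 12
    for block_group, sex, age in cases:
        case_counts[(block_group, sex, age_category(age))] += 1
    return {
        stratum: 100_000 * case_counts.get(stratum, 0) / years
        for stratum, years in person_years.items()
        if years > 0
    }


# Made-up records: 3 person-years and 1 case among men aged 70 to <80 in block group "A".
rates = incidence_rates(
    member_months=[("A", "M", 72, 24), ("A", "M", 75, 12), ("A", "F", 45, 24)],
    cases=[("A", "M", 72)],
)
```

Counting months rather than whole years lets members who join or leave the plan mid-period contribute only the time they were actually enrolled and living in the service area.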
Geographic clusters of BCC were identified using the spatial scan statistic and the free SaTScan software.19 The discrete Poisson version of the spatial scan statistic was used to detect and evaluate geographical clusters where a higher than expected number of cases were diagnosed, given the underlying covariate-adjusted population-years, at a statistical significance level of P < .05. This is done by gradually scanning a circular window of variable size across space, noting the number of observed and expected cases inside each of many thousands of evaluated circles, and calculating the likelihood for each. The circle with the maximum likelihood is the most likely cluster, that is, the cluster least likely to be due to chance. Secondary clusters are also identified. Using Monte Carlo hypothesis testing, the P value assigned to each cluster is adjusted for the multiple testing inherent in the large number of circles evaluated. The following SaTScan options were used: Maximum Spatial Cluster Size: 10% of the population at risk; Maximum Monte Carlo replications: 99 999; Criteria for Reporting Clusters: Gini Optimized Cluster Sizes; Gini Index Based Collection: Optimal Index Only. The spatial analysis included only the first BCC per patient and was first performed adjusting only for sex and age (categorical: 0 to <30 years, 30 to <90 in 10-year increments, and ≥90). We ran a subsequent analysis adjusting for the standardized NDI (in 4 groups: <−1, −1 to <0, 0 to <1, ≥1), in addition to sex and age. For all primary spatial analyses, case and denominator data were loaded into SaTScan at the census block group level.
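To make the scanning procedure concrete, the following is a simplified sketch of a purely spatial, discrete Poisson scan with Monte Carlo inference. It is not SaTScan and omits much of what that software does (covariate adjustment, secondary clusters, Gini-based reporting). All function names, the toy coordinates, and the toy counts are hypothetical; expected counts here are simply proportional to population, and the 50% window limit and 199 replications are chosen only to keep the toy example small.

```python
import math
import random


def poisson_llr(c, e, total):
    """Log-likelihood ratio for a window with c observed and e expected cases."""
    if e <= 0 or c <= e:  # only elevated-rate windows are of interest
        return 0.0
    llr = c * math.log(c / e)
    if total - c > 0:
        llr += (total - c) * math.log((total - c) / (total - e))
    return llr


def circular_windows(coords, population, max_fraction):
    """Grow a window around each centroid, nearest areas first, up to a population cap."""
    limit = max_fraction * sum(population)
    windows = set()
    for cx, cy in coords:
        order = sorted(range(len(coords)),
                       key=lambda j: math.hypot(coords[j][0] - cx, coords[j][1] - cy))
        members, pop = [], 0.0
        for j in order:
            pop += population[j]
            if pop > limit:
                break
            members.append(j)
            windows.add(frozenset(members))
    return windows


def max_llr(cases, expected, windows):
    """Return the largest log-likelihood ratio and the window that attains it."""
    total = sum(cases)
    best, best_window = 0.0, None
    for window in windows:
        c = sum(cases[j] for j in window)
        e = sum(expected[j] for j in window)
        llr = poisson_llr(c, e, total)
        if llr > best:
            best, best_window = llr, window
    return best, best_window


def scan(cases, expected, coords, population, max_fraction, replications, seed=0):
    """Most likely cluster plus a Monte Carlo P value adjusted for all windows tried."""
    rng = random.Random(seed)
    windows = circular_windows(coords, population, max_fraction)
    observed_llr, cluster = max_llr(cases, expected, windows)
    total, areas = sum(cases), range(len(cases))
    at_least_as_extreme = 0
    for _ in range(replications):
        simulated = [0] * len(cases)
        # Under the null hypothesis, cases fall in areas in proportion to expected counts.
        for j in rng.choices(areas, weights=expected, k=total):
            simulated[j] += 1
        if max_llr(simulated, expected, windows)[0] >= observed_llr:
            at_least_as_extreme += 1
    return cluster, observed_llr, (at_least_as_extreme + 1) / (replications + 1)


# Toy data: 6 equally populated areas on a line, with excess cases in areas 3 and 4.
coords = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (5, 0)]
population = [1000] * 6
cases = [10, 10, 10, 40, 35, 10]
expected = [sum(cases) * p / sum(population) for p in population]
cluster, llr, p_value = scan(cases, expected, coords, population,
                             max_fraction=0.5, replications=199)
```

Comparing the observed maximum with the maximum from each simulated data set, rather than testing each circle separately, is what adjusts the P value for the many overlapping windows evaluated.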
Because SaTScan results can vary depending on the area unit chosen (such as county, census tract, or census block group),20 we conducted a sensitivity analysis in which we ran the sex- and age-adjusted spatial analysis at the census tract, rather than the census block group, level.
An important risk factor for skin cancers is exposure to UV radiation.8,21-25 We had no measures of individual-level UV exposure—either recent or lifetime. To explore the relationship between BCC and UV exposure, we used the UV watt-hour per square meter measure from the county-level UV exposure database.26 The methodology used to calculate county-based watt-hours per square meter is described by Tatalovich et al24 and has been used as an ecologic measure of UV exposure in studies of skin cancer risk.27,28
We considered running a spatial analysis adjusting for the county-level UV (in addition to age, sex, and NDI). However, our data showed no correlation between BCC rates by county and the UV measure. In fact, there was a modest, nonsignificant, negative correlation. Prior reports have noted that ecologic measures of UV exposure may not be good proxies for individual-level exposure.28 Given these preliminary results, we decided not to adjust for UV exposure because the county-level measures cannot explain the high-incidence areas.
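The county-level check described above amounts to a correlation between two short lists of county values. A minimal sketch of a Pearson correlation follows; the helper name `pearson_r` and the four pairs of values are made up for illustration and are not the study's county data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical county values: ambient UV (Wh/m2) and BCC rate per 100 000 person-years.
uv_exposure = [4500, 4700, 4900, 5100]
bcc_rate = [1200, 800, 1000, 900]
r = pearson_r(uv_exposure, bcc_rate)  # negative for these made-up values
```

With only 14 counties and a narrow UV range, a coefficient of this kind is estimated imprecisely, which is consistent with the decision not to use the county-level measure as an adjustment variable.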
We identified 29 266 BCC cases in the white, non-Hispanic KPNC population during years 2011 and 2012, representing 28 408 unique persons. Patient mean (SD) age was 68.5 (13) years, and 57% (n = 16 234) were male (Table 1).
The BCC incidence rate in years 2011 to 2012 was 930 cases per 100 000 person-years, with rates for males being 55% higher than for females (1144 vs 737 cases per 100 000 person-years). Rates increased with age, with the highest rates occurring in those 80 to 90 years of age.
Incidence of BCC ranged from a low of 661/100 000 person-years in San Joaquin County to a high of 1598/100 000 person-years in Marin County. With adjustment for sex and age only, the spatial analysis identified 7 statistically significant BCC clusters (pictured over incidence rate quintiles by census block group) (Figure 1). One cluster that was identified covered most of southern Marin County. Persons living in this cluster had a 33% (relative risk [RR], 1.33; 95% CI, 1.12-1.57; P < .001) increased risk of BCC compared with persons living outside that cluster (Table 2). Other clusters identified were an area in western Contra Costa County centered near Walnut Creek (RR, 1.25; 95% CI, 1.10-1.43; P < .001); an area encompassing the border between Contra Costa County and San Joaquin County, with most of the population in this cluster residing in eastern Contra Costa county (RR, 1.42; 95% CI, 1.15-1.75; P < .001); an area in eastern Sacramento County encompassing Rancho Cordova (RR, 1.23; 95% CI, 1.09-1.39; P < .001); southern San Mateo and eastern Santa Clara counties, mostly including the populations just to the west of Redwood City, Palo Alto, Sunnyvale, Cupertino, and Saratoga (RR, 1.22; 95% CI, 1.08-1.37; P < .001); a small cluster near the town of Lincoln in Placer County (RR, 1.37; 95% CI, 1.11-1.70; P = .004); and eastern Sonoma and northern Napa counties (RR, 1.20; 95% CI, 1.04-1.38; P = .01). Of the 28 408 patients with a BCC case during this period, 7684 lived in one of the identified clusters. The overall BCC incidence rate per 100 000 person-years within the clusters was 1537 vs 944 outside the clusters. The mean standardized NDI (weighted by person-years) of the census block groups in the 7 clusters was −0.65, whereas in all block groups outside the clusters it was −0.28, indicating that neighborhoods within BCC clusters were less deprived than those outside the clusters. 
Selecting 1 component of the NDI, we found that average median family income in block groups inside the clusters was $112 175, whereas outside the clusters it was $91 791.
In the sensitivity analysis whereby patients were geocoded to the larger census tract level, the clusters identified were similar to those derived from the block group analysis, although some clusters expanded or contracted modestly.
After adjustment for neighborhood SES, the Marin County cluster remained almost identical; clusters 5 and 6 (the southern San Mateo/eastern Santa Clara cluster and the Placer County cluster) from the original analysis were no longer significant. The remaining clusters were similar—although of somewhat different size and placement—to the original clusters (Table 3 and Figure 2).
Using a large, validated BCC registry, we found the incidence of BCCs in this white, non-Hispanic, northern California population to be 930 cases per 100 000 person-years, similar to previously published findings in this setting.2 When adjusting for sex and age, we identified 7 geographic clusters of BCC. These clusters were areas of relatively low socioeconomic deprivation—findings consistent with a prior BCC study, which showed education and disposable income to be positively associated with BCC risk.29 After adjustment for deprivation, 5 clusters remained, each similar to 1 of the original 7 clusters.
Epidemiologists have often described and mapped health data by geographic units such as state, county, zip code, or census tract, in a manner similar to the choropleth maps of BCC rates by census block group in the background of Figures 1 and 2. However, these maps are quantitatively difficult to assess due to the arbitrariness of the geographic boundaries and the potentially large differences in the size of the population contained within each unit, and thus the significance of any given unit’s rate.20,30 The spatial scan statistic avoids these problems by using a rigorous statistical approach to identifying statistically significant clusters, adjusting for multiple testing. To our knowledge, the present study is the first to use the spatial scan statistic to identify BCC clusters.
Our primary goal was to apply the spatial scan statistic to BCC incidence to identify geographic areas where screening and prevention efforts might be targeted, and to motivate future research relating to etiology. Although our purpose was not to identify causes of BCC or nonspatial correlates of BCC, our analyses were consistent with prior findings that persons in less socioeconomically deprived areas are at higher risk of BCC. Clusters 5 and 6 in the sex- and age-adjusted analysis (the clusters encompassing parts of Silicon Valley and the small cluster in Placer County) had the third-lowest and lowest mean NDI of the 7 clusters and dropped out after adjustment for neighborhood deprivation. However, other clusters encompassing low-deprivation census tracts such as the one in Marin County and west central Contra Costa County remained essentially the same after adjustment. Our measure of deprivation was at the census tract, not the individual, level, and may not be completely adjusting for SES. We considered census tract NDI to be primarily a proxy for individual SES because we presume that the primary relationship between BCC and SES operates at the individual level. However, there may be a neighborhood-level association in addition to any individual-level association. Future research could help distinguish between the individual- and neighborhood-level effects.
One of the BCC clusters was in part of Marin County, and this is consistent with reports that melanoma incidence is also higher in Marin County than in the surrounding area.31 The authors of a report on high melanoma incidence in Marin County concluded that the elevated incidence was likely explained by a higher concentration of persons with known risk factors for melanoma, as opposed to any characteristics unique to the geography of the county,31 and we suspect that this is also true for BCC. In either case, the potential usefulness of identifying clusters for the targeting of screening and prevention holds. However, other possible explanations for the existence of BCC clusters may be related to differences in access to services and differences in the propensity of persons to seek treatment. Differences due to access to care should be attenuated in this population of insured persons belonging to the same health plan. However, if an increased propensity to seek treatment is the primary cause of the clusters, then the clusters may not be identifying true “hot spots” of BCC.
Strengths of this study include its large sample size, the ability to ascertain BCCs, the availability of residence address information and neighborhood deprivation status, and the application of a rigorous methodology for identifying statistically significant geographic clusters. The large sample size made it possible to detect clusters even though the relative risks within them were modest, with none being higher than 1.42. This is reassuring, in that it indicates that there are no particularly large hot-spot clusters in this region. A limitation is that we were not able to investigate the possible relationship between UV radiation and BCC incidence. Our measure of UV radiation at the county level was not correlated with BCC incidence at the county level, which precluded its use as an adjustment variable. Within a limited geographic region where the differences in ambient UV radiation are small (a range of approximately 4500 to 5100 Wh/m² among the 14 counties in this study) and where the population is relatively transient,32 measures of ambient UV, even at a more refined geographic level, might be weak proxies for individual-level exposure. We also did not have data on skin classification types for KPNC members and thus could not adjust for them. Whereas such measures as individual UV radiation exposure and skin type would be useful for identifying risk factors for BCC, the identification of BCC clusters still has the value of identifying locations where rates are higher, regardless of the causes of those higher rates. We report analyses with and without adjustment for neighborhood SES because a relationship has been found in other studies between SES and risk of BCC.14 This association may be due to differential UV exposure; for example, higher-SES persons may be more likely to vacation in sunny places or use tanning beds. Alternatively, they may be more likely to seek medical help for skin abnormalities.
Thus, another limitation of this study is that we cannot discern the underlying etiology of variables, such as SES, that affect BCC risk. Finally, the KPNC population may not be completely representative of the US population and, in particular, may underrepresent persons at the high or low ends of the income spectrum.
In this study of a northern California population, we identified several geographic clusters with modestly elevated incidence of BCC. Although identification of geographic clusters does not reveal causes, the statistical technique used to identify clusters can be used to direct future studies identifying the reason for clustering. Knowledge of such clusters can motivate hypothesis generation and help direct and inform future research on the underlying etiology of the clustering, including factors related to the environment, health care access, or other characteristics of the resident population, and can help target screening efforts to areas of highest yield.
Accepted for Publication: June 8, 2016.
Corresponding Author: G. Thomas Ray, MBA, Division of Research, Kaiser Permanente, 2000 Broadway, Oakland, CA 94612 (email@example.com).
Published Online: July 20, 2016. doi:10.1001/jamadermatol.2016.2536
Author Contributions: Mr Ray had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Concept and design: Ray.
Acquisition, analysis, or interpretation of data: All authors.
Drafting of the manuscript: Ray.
Critical revision of the manuscript for important intellectual content: All authors.
Statistical analysis: Kulldorff.
Obtaining funding: Asgari.
Study supervision: Asgari.
No additional contributions: Ray.
Conflict of Interest Disclosures: Mr Ray has received research support from grants through his institution in the past 3 years from Pfizer, Merck, Genentech, and Purdue Pharma. Dr Asgari has served as an investigator for research studies funded by Valeant Pharmaceuticals and Pfizer Inc. No other disclosures are reported.
Funding/Support: This study was supported in part by the National Cancer Institute (Cancer Research Network Across Health Care Systems) grant U19CA79689 (to Dr Asgari), and in part by National Cancer Institute grant R01CA165057 (to Dr Kulldorff).
Role of the Funder/Sponsor: The funding sources had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.