Activity metrics for 4 UNGD phases: well pad preparation (pads developed), well drilling (after wells spudded), stimulation (hydraulic fracturing, or “fracking”), and production.
aNew York patients with asthma (n = 72) are not shown.
eFigure 1. Statistical model.
eFigure 2. Locations of cases and controls by quartile of spud activity metric.
Customize your JAMA Network experience by selecting one or more topics from the list below.
Rasmussen SG, Ogburn EL, McCormack M, et al. Association Between Unconventional Natural Gas Development in the Marcellus Shale and Asthma Exacerbations. JAMA Intern Med. 2016;176(9):1334–1343. doi:10.1001/jamainternmed.2016.2436
Copyright 2016 American Medical Association. All Rights Reserved. Applicable FARS/DFARS Restrictions Apply to Government Use.
Asthma is common and can be exacerbated by air pollution and stress. Unconventional natural gas development (UNGD) has community and environmental impacts. In Pennsylvania, UNGD began in 2005, and by 2012, 6253 wells had been drilled. There are no prior studies of UNGD and objective respiratory outcomes.
To evaluate associations between UNGD and asthma exacerbations.
A nested case-control study comparing patients with asthma with and without exacerbations from 2005 through 2012 treated at the Geisinger Clinic, which provides primary care services to over 400 000 patients in Pennsylvania. Patients with asthma aged 5 to 90 years (n = 35 508) were identified in electronic health records; those with exacerbations were frequency matched on age, sex, and year of event to those without.
On the day before each patient’s index date (cases, date of event or medication order; controls, contact date), we estimated activity metrics for 4 UNGD phases (pad preparation, drilling, stimulation [hydraulic fracturing, or “fracking”], and production) using distance from the patient’s home to the well, well characteristics, and the dates and durations of phases.
Main Outcomes and Measures
We identified and defined asthma exacerbations as mild (new oral corticosteroid medication order), moderate (emergency department encounter), or severe (hospitalization).
We identified 20 749 mild, 1870 moderate, and 4782 severe asthma exacerbations, and frequency matched these to 18 693, 9350, and 14 104 control index dates, respectively. In 3-level adjusted models, there was an association between the highest group of the activity metric for each UNGD phase compared with the lowest group for 11 of 12 UNGD-outcome pairs: odds ratios (ORs) ranged from 1.5 (95% CI, 1.2-1.7) for the association of the pad metric with severe exacerbations to 4.4 (95% CI, 3.8-5.2) for the association of the production metric with mild exacerbations. Six of the 12 UNGD-outcome associations had increasing ORs across quartiles. Our findings were robust to increasing levels of covariate control and in sensitivity analyses that included evaluation of some possible sources of unmeasured confounding.
Conclusions and Relevance
Residential UNGD activity metrics were statistically associated with increased risk of mild, moderate, and severe asthma exacerbations. Whether these associations are causal awaits further investigation, including more detailed exposure assessment.
Asthma is a common chronic disease—in 2010, 25.7 million people in the United States had asthma, a prevalence of 8.4%.1 Asthma is characterized by variable and recurring symptoms (including cough, wheezing, shortness of breath, and chest tightness), reversible airflow obstruction, bronchial hyperresponsiveness, and underlying inflammation.2,3 In 2009, there were 11.8 million outpatient visits, 2.1 million emergency department visits, and 479 300 hospitalizations for asthma in the United States.1
Outdoor air pollution is a recognized cause of asthma exacerbations. A large body of literature links asthma exacerbations to exposure to air pollutants, including ozone, particulate matter, nitrogen dioxide, and sulfur dioxide,2,4 and exposure to even low levels of these pollutants has been associated with asthma hospitalizations, emergency department visits, and rescue medication use, with latency between 0 and 5 days.5-11 Stress at the individual and community levels is also associated with asthma exacerbations.12 Psychosocial stress can modify the effects of environmental triggers13 and is associated with worse asthma control and medication aderence.14
Unconventional natural gas development (UNGD) has recently become a major energy source domestically and worldwide. Pennsylvania has proceeded with UNGD rapidly—between the mid-2000s and 2012, 6253 wells were drilled. In contrast, New York and Maryland, also in the Marcellus shale, have not developed.15,16 Despite calls for research on the health effects of the industry, there are few published studies of public health impacts of UNGD.17,18
The first step of UNGD is well pad preparation, lasting about 30 days, during which 3 to 5 acres are cleared, and materials are brought to the site.19 Drilling begins on the spud date and typically lasts up to a month as a well is drilled vertically 2000 to 3000 m and horizontally 600 to 3000 m.19 After drilling is completed, the horizontal portion is perforated. Stimulation, also called hydraulic fracturing or “fracking,” follows; this process lasts about a week and requires 11 to 19 million liters of water, sand, and chemical additives (eg, friction reducers, biocides, gelling agents).19,20 Development to this point requires over 1000 truck trips per well.19 After stimulation, gas production begins. The Pennsylvania Department of Environmental Protection requires companies to submit documentation at most of these stages of well development.21
UNGD has been associated with air quality and community social impacts.22-29 Psychosocial stress,12 exposure to air pollution4,30 including from truck traffic,31 sleep disruption,32,33 and reduced socioeconomic status34 are all biologically plausible pathways for UNGD to affect asthma exacerbations. To date, there have been no epidemiologic studies of UNGD and objective respiratory outcomes. Respiratory outcomes are appropriate outcomes to assess potential health impacts of UNGD because these have clear links to air pollution and stress, have short latency between exposure and health effects, are common in the general population, and prompt patients to seek care and so are captured by health system data. Using electronic health record (EHR) data from the Geisinger Clinic, located in over 35 counties in Pennsylvania, including many with active UNGD, we conducted a nested case-control study of the association between 4 UNGD activity metrics and asthma exacerbations.
Question Is there an association between unconventional natural gas development (UNGD) and asthma exacerbations?
Findings In this nested case-control study of 35 508 patients with asthma, those in the highest quartile of residential UNGD activity had significantly higher odds of 3 types of asthma exacerbations (new oral corticosteroid medication orders, emergency department visits, and hospitalizations) than those in the lowest quartile.
Meaning UNGD activity near patient residences was associated with increased odds of mild, moderate, and severe asthma exacerbations.
We identified patients with asthma from the Geisinger Clinic population, which is representative of the general population in the region.35 We included Pennsylvania and New York patients and, using International Classification of Diseases, 9th Revision, Clinical Modification (ICD-9-CM) codes, excluded patients with cystic fibrosis (277.0x), chronic pulmonary heart disease (416.x), paralysis of vocal cords or larynx (478.3x), bronchiectasis (494.xx), and pneumoconiosis (500.xx-508.xx). We required patients to have at least 2 encounters or medication orders with ICD-9-CM codes for asthma on different days.36 Patients were geocoded using previously published methods,37 88.9% to home address, 2.6% to ZIP + 4, and 8.5% to ZIP code centroid. Inclusion criteria also included contact with Geisinger from 2005 through 2012 while between the ages 5 and 90 years and recorded information on sex (n = 35 508). The study was approved by the Geisinger Health System institutional review board (which has an authorization agreement with the Johns Hopkins Bloomberg School of Public Health) with a waiver of consent and a waiver of HIPAA authorization. Patients were not paid for their participation.
We identified and defined new oral corticosteroid (OCS) medication orders, asthma emergency department encounters, and asthma hospitalizations as mild, moderate, and severe exacerbations, respectively. For patients with more than 1 exacerbation of a given type within a calendar year, we randomly selected 1 event. For mild exacerbations, we distinguished new OCS medication orders from 2008 through 2012 for an asthma exacerbation from standing orders or OCS ordered for other diseases (Figure 1). The medication order date was considered the index date. OCS orders from before 2008 were excluded because these were not consistently captured before then. For moderate and severe exacerbations, we identified all emergency and hospitalization encounters from 2005 through 2012. Primary or secondary diagnoses for asthma (ICD-9-CM code 493.x) were used to identify emergency or hospitalization encounters. Patients who had multiple emergency or hospitalization encounters within 72 hours were considered to have a single event. Emergency and hospitalization encounters within 72 hours were identified as a single hospitalization. The first encounter or admission date of each group of combined encounters was the index date. For patients with more than 1 type of exacerbation within a week, we retained only the higher category.
We identified controls from patients with asthma under observation by the health system, so that if the patient were to have an exacerbation, it would be captured by the EHR. All patient contact dates were identified (eg, encounter, order, test). Because many of the covariates and the UNGD metrics were time varying, we needed a single date on which to assign these variables. Therefore, for controls, we randomly selected 1 contact date per year per patient. A case patient was always eligible to be a control for a less severe event or for an event of equal or greater severity until the year of the case patient’s event. We frequency matched cases to controls by age category (5-12, 13-18, 19-44, 45-61, 62-74, or ≥75 years), sex (male or female), and year of encounter.
We created time-varying covariates (age, season of event, smoking status, overweight and obesity status, Medical Assistance [as a measure of low family socioeconomic status], type 2 diabetes) for each index date and non–time-varying covariates (sex and race/ethnicity) for each patient. Race/ethnicity was assessed by patient self-report and was included because it is a well-documented confounder in studies of asthma.2 We estimated the patients’ distance to the nearest major and minor road using a network from the Federal Highway Administration38 and used patients’ geographic coordinates to assign them to a community using a mixed definition of place and calculated community socioeconomic deprivation for these places.37,39 In cities, communities were defined by census tracts; elsewhere, communities were defined by minor civil divisions (townships and boroughs). We estimated the peak temperature on the day before each index date using data from the nearest weather station to each patient.40
Well data were obtained from the Pennsylvania Department of Environmental Protection for well spud (start of drilling) and production, the Pennsylvania Department of Conservation and Natural Resources for information on well stimulation (hydraulic fracturing) and depths, and SkyTruth, which used crowdsourcing of aerial photographs from the US Department of Agriculture to identify the location of wellpads.41 For each well, we had information on the well pad; latitude and longitude; dates of spudding, stimulation, and production; total depth; and volume of natural gas produced and the number of production days. We imputed missing total depths (0.4%) using conditional mean imputation. We estimated missing production quantities (0.2%) by averaging production quantities in the prior and following period. We extrapolated missing spud (2.0%) and stimulation (34.6%) dates using the well’s available dates of development by requiring that the stimulation date fall between the spud and production start dates and by using median durations between phases from wells without any missing dates.
We estimated the UNGD activity metrics using an inverse distance-squared method for pad preparation, spud, stimulation, and production phases. We compared activity metrics on the day before, 3 days before, the sum of 3 to 5 days before, and the sum of 1 to 5 days before the index date, and because they were highly correlated (Spearman correlation coefficients ranged from 0.96 to 1.00), we used only the day before the index date.
For the pad preparation and spud metrics, we used Equation 1:
where n is the number of wells and d2ij is the squared distance (in meters) between well i and patient j. For the stimulation metric, we used Equation 2:
where n is the number of wells, dij2 is the squared distance (in meters) between well i and patient j, and ti is the total well depth (in meters) of well i. Total depth was used as a surrogate for truck traffic because volume of water used during stimulation42 was highly correlated with total depth, and water is trucked to the well during stimulation. For the production metric, we used Equation 3:
where n is the number of wells, dij2 is the squared distance (in meters) between well i and patient j, and vi is the daily natural gas production volume (cubic meters) of well i. Production volume was used as a surrogate for fugitive emissions and compressor engine activity.22
Based on descriptions of the process19 and our data, we estimated that pad development lasted 30 days before the spud date for the first well on a pad; drilling lasted between 1 and 30 days after the spud date based on total depth; and stimulation lasted 7 days. All wells in Pennsylvania in a given phase on the day prior to an index date contributed to that phase’s activity metric (Equations 1-3). We divided the 4 continuous metrics (pad preparation, drilling, stimulation, and production) into quartiles using all 69 548 index dates from all 3 outcomes (mild, moderate, or severe asthma exacerbation), so the cut points were the same for all outcomes (very low, low, medium, or high).
To assess the association of the 4 UNGD activity metrics with the 3 types of asthma exacerbations, we used multilevel logistic regression with random intercept for patient and community to account for multiple events per patient and patient clustering within communities. The base model included 1 of the 4 UNGD activity metrics (very low, low, medium, or high), age category (5-12, 13-18, 19-44, 45-61, 62-74, or ≥75 years), sex (male or female), race/ethnicity (black, Hispanic, white, or other), family history of asthma (yes or no), smoking status (former, current, never, or data missing), season (summer, fall, winter, or spring), Medical Assistance (yes or no), and overweight/obesity status (using BMI percentile for children and BMI for adults43) as covariates. We then added, 1 at a time, type 2 diabetes (yes or no), community socioeconomic deprivation (across quartiles),37,39 distances to nearest major and minor arterial road (in meters, z transformed), and maximum temperature on the day prior to the event (degrees Celsius, per interquartile range [IQR]) (eFigure 1 in the Supplement). We included the continuous covariates as linear and quadratic terms to allow for nonlinearity and used a 2-sided type 1 error rate of .05 for significance testing. We used Stata software, version 11.2 (StataCorp LP) and R software, version 3.1.2 (R Foundation for Statistical Computing), for our analyses.
We calculated the intraclass correlation coefficient for the person and community levels. The proportions of total variance that were accounted for by between-community variation and between-person variation, respectively, were 14% and 63% for severe exacerbations, 41% and 89% for moderate exacerbations, and 1% and 59% for mild exacerbations. We evaluated covariates for conditional significance as they were added to the models.
To evaluate how the 4 separate UNGD activity metrics compared with a summary measure, we calculated z scores using continuous metrics, summed the z scores, and re-ran the final models with this combined UNGD activity metric (across quartiles). To explore whether an unmeasured confounder was responsible for our associations, we evaluated associations with encounters for a negative control44 (intestinal infectious disease and noninfectious gastroenteritis, ICD-9-CM codes 001-009 and 558.9, respectively) among patients with asthma, and we also replaced the UNGD activity metric with indicators for counties. We were concerned about the unbalanced numbers of cases and controls for certain age categories, sex, and years in the mild exacerbations analysis, so we reran the analysis dropping the unbalanced cells. To check the sensitivity to geocoding level, we re-ran the final model for the production UNGD metric and each outcome using only patients who were geocoded to their home address. We estimated how large an unmeasured confounder would need to be to account for the observed associations, in whole or in part.45
Between 2005 and 2012, 6253 unconventional natural gas wells were spudded on 2710 pads; 4728 were stimulated; and 3706 were in production. The median number of wells per pad was 1 (IQR, 1-3), and the median total depth was 3394 m (IQR, 2934-3839 m). Most development occurred after 2007 (Figure 2). On their index date, patients in the highest group of the spud metric lived a median of 19 km from the closest spudded well compared with 63 km for patients in the lowest group. We identified 5600 severe, 2291 moderate, and 25 647 mild exacerbations. After retaining 1 event per type per year per person, 4782 severe, 1870 moderate, and 20 749 mild exacerbations were included. There was substantial overlap of patients and wells in the northern counties (Figure 3) and substantial overlap of patients by quartile of UNGD activity metric (eFigure 2 in the Supplement).
Demographic and clinical variables differed by outcome, in many cases significantly, and the specific data quantifying these significant differences are reported in Table 1. Compared with patients with mild and moderate exacerbations, patients with severe exacerbations were more likely to be female, older, current smokers, and obese (all P < .001; see Table 1 for all supporting data). Patients with moderate exacerbations were more likely to be on Medical Assistance and of black race than patients with the other 2 outcomes, and patients with mild exacerbations were more likely to live in townships than patients with the other 2 outcomes (all P < .001; see Table 1 for all supporting data).
For severe, moderate, and mild exacerbations, the average percentage changes for all odds ratios (ORs), from simple models with random intercepts for person and place without covariates to fully adjusted multilevel models, were −8.5%, −0.2%, and 6.0%, respectively, suggesting little sensitivity of the associations with measured covariates. In adjusted models, the high activity (vs very low) of each UNGD metric was associated with each asthma outcome (Table 2), except for the pad metric with mild exacerbations. Associations for the other 11 exposure-outcome pairs ranged from OR, 1.5 (95% CI, 1.2-1.7) for pad metric with severe exacerbations to OR, 4.4 (95% CI, 3.8-5.2) for production metric with mild exacerbations. Of the 12 activity metric-outcome pairs, 6 had increasing ORs across quartiles 2, 3, and 4.
The 4 UNGD activity metrics, calculated for all case and control index dates (n = 69 548), were correlated with one another (Spearman correlation coefficients of the continuous variables ranged from 0.73 to 0.91). In the analysis to evaluate associations of a combined UNGD activity metric of the 4 phases of development, the OR point estimates were between those from regressions of each phase separately. In the negative disease control analysis, we found no association of the spud activity metric with gastrointestinal illness. In a model evaluating associations of counties with outcomes (UNGD metrics removed), counties with high UNGD activity were not associated with outcomes. In the analysis that removed cells with unbalanced numbers of cases and controls in the mild exacerbation analysis, associations were attenuated (ORs decreased by 5%, 17%, 37%, and 55% for the high group OR for the pad, spud, stimulation, and production metrics, respectively). In the analysis to evaluate the impact of different quality of geocoding, associations were unchanged. In the analysis of the mild and severe exacerbations, we determined that even an unmeasured confounder strongly associated with both UNGD activity and outcome (eg, both ORs, 3.0), and a prevalence of 0.3 in the exposed group, would not likely change our inference about associations, given our models. However, for moderate exacerbations, an unmeasured confounder with the same characteristics could account for 2 of the 3 statistically significant associations.
We conducted a nested case-control study in a large number of patients with asthma using EHR data in Pennsylvania from 2005 through 2012, a period of rapid development. In this first study of UNGD and objective respiratory outcomes, we found consistent associations of 4 UNGD activity metrics with 3 types of asthma exacerbations. Whether these associations are causal awaits further investigation, including more detailed exposure assessment.
Asthma is a suitable outcome because UNGD has community and environmental impacts that could affect it: it is highly prevalent; it can be exacerbated by stress and small changes in air quality with short latency; and patients usually seek care for exacerbations so they are captured by an EHR. By leveraging longitudinal EHR data, we were able to complete a number of sensitivity analyses that suggested that the associations were robust to increasing levels of adjustment, although in some cases they were attenuated.
Studies of air pollution and asthma exacerbations have generally found small but consistently increased risks. A study of pediatric emergency department visits for asthma in Atlanta found that a standard deviation increase in pollution had associated risk ratios of 1.020, 1.036, and 1.062 for particulate matter smaller than 10 μm, nitrogen dioxide, and ozone, respectively.46 Studies on psychosocial stress have found that in children with asthma, the risk of an asthma exacerbation increased 4.7 times in the 2 days following a very stressful event.47 Adults exposed to violence in their community have 2.3 and 2.5 times the risk of an asthma emergency department visit and hospitalization, respectively, than those not exposed to community violence.48
Two sensitivity analyses were directed to the very important possibility that unmeasured confounding could account for our results. First, UNGD metrics were not associated with the negative disease control. Second, in the analysis replacing UNGD metrics with indicators for counties, counties with UNGD were not associated with severe exacerbations. These both provide evidence that unmeasured confounding is unlikely to account for our findings, but we acknowledge that the possibility still exists. We note that an unmeasured confounder would need to be strongly associated with both UNGD and asthma outcomes to account for our results. In sensitivity analysis to address unbalanced numbers of cases and controls, results were attenuated; the majority of dropped patients made up the most susceptible groups (younger and older) in the most exposed years, so attenuation was not unexpected. Finally, geocoding method and analysis with an overall activity metric did not change inferences.
This study had several strengths, including a large sample size from a population that represents the general population in the region. In addition, our exposure assessment improved on that used in prior studies,49,50 which used categorical distance-based metrics that did not account for UNGD phases. Our metric incorporated the temporality and duration of phases, gas production volume, and a surrogate for truck traffic. This study also improved on outcome ascertainment used in the previous study on UNGD and respiratory outcomes,50 which relied on self-reported outcomes and grouped several respiratory symptoms and conditions together (including asthma). We used documented asthma exacerbations. Our findings were robust to increasing levels of covariate control and in several sensitivity analyses.
This study also had limitations. The EHR did not contain information on occupation and only reflects patients’ most recent address. However, comparing addresses used in a prior study35 with addresses used in this study (39 months apart), 79.8% of patients were at the same address, and an additional 7.4% and 7.6% were less than 3.2 km and from 3.2 to 16.0 km, respectively, from their prior address, indicating little residential mobility. The EHR contained data only on events that occur at Geisinger facilities, but ambulances go to the closest hospital, so we may have undercounted events. We were unable to differentiate between asthma exacerbations that were hospitalized from those that occurred while hospitalized. We frequency matched cases and controls for year because UNGD activity metrics and year were highly correlated. We did not include year in the final model because of this high correlation, so there remains the possibility of unmeasured residual confounding by factors that strongly vary by year. We kept all 4 UNGD metrics because of a priori evidence that exposures differed by phase, but because metrics were highly correlated, we were unable to definitively distinguish among them. Furthermore, our UNGD metrics do not provide insight into the mechanism of the associations we observed.
Asthma is a common disease with large individual and societal burdens, so the possibility that UNGD may increase risk for asthma exacerbations requires public health attention. As ours is the first study to our knowledge of UNGD and objective respiratory outcomes, and several other health outcomes have not been investigated to date, there is an urgent need for more health studies. These should include more detailed exposure assessment to better characterize pathways and to identify the phases of development that present the most risk.
Corresponding Author: Brian S. Schwartz, MD, MS, Johns Hopkins Bloomberg School of Public Health, 615 N Wolfe St, Room W7041, Baltimore, MD 21205 (email@example.com).
Accepted for Publication: March 27, 2016.
Published Online: July 18, 2016. doi:10.1001/jamainternmed.2016.2436.
Author Contributions: Ms Rasmussen and Dr Schwartz had full access to all the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis.
Study concept and design: Rasmussen, Bandeen-Roche, Schwartz.
Acquisition, analysis, or interpretation of data: Rasmussen, Ogburn, McCormack, Casey, Bandeen-Roche, Mercer, Schwartz.
Drafting of the manuscript: Rasmussen, Mercer.
Critical revision of the manuscript for important intellectual content: Rasmussen, Ogburn, McCormack, Casey, Bandeen-Roche, Schwartz.
Statistical analysis: Rasmussen, Ogburn, McCormack, Casey, Bandeen-Roche, Schwartz.
Obtained funding: Schwartz.
Administrative, technical, or material support: Rasmussen, McCormack, Casey, Mercer, Schwartz.
Study supervision: Schwartz.
Conflict of Interest Disclosures: Dr Schwartz is a Fellow of the Post Carbon Institute (PCI), serving as an informal advisor on climate, energy, and health issues. He receives no payment for this role. His research is entirely independent of PCI and is not motivated, reviewed, or funded by PCI. No other disclosures are reported.
Funding/Support: This study was funded by National Institute of Environmental Health Sciences grant ES023675-01 (Dr Schwartz) and training grant ES07141 (Ms Rasmussen). Additional support was provided by the Degenstein Foundation for compilation of well data, the Robert Wood Johnson Foundation Health & Society Scholars program (Dr Casey), and the National Science Foundation Integrative Graduate Education and Research Traineeship (Ms Rasmussen).
Role of the Funder/Sponsor: The funders had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.
Additional Contributions: We thank Joseph J. DeWalle, BS, Jennifer K. Irving, BA, and Joshua M. Crisp, BS (Geisinger Center for Health Research), for patient geocoding and assistance in assembling the UNGD dataset; SkyTruth in Shepherdstown, West Virginia, for the well pad data; Kirsten Koehler, PhD (Johns Hopkins School of Public Health), for assistance with the temperature data; Kara Rudolph, PhD, MHS (Robert Wood Johnson Foundation Health and Society Scholars Program, University of California, San Francisco and University of California, Berkeley) for the code for the unmeasured confounder graphs; and Jonathan S. Pollak, MPP (Johns Hopkins School of Public Health) for identifying the patients with asthma from the general Geisinger population. All except Drs Koehler and Rudolph received compensation for their contributions.