Association of Breastfeeding and Air Pollution Exposure With Lung Function in Chinese Children

This cross-sectional study evaluates the association of air pollution exposure and having been breastfed or not with lung function among children in China.


eMethods 1. Description of the Study Design and Random Sampling Strategy
The Seven Northeastern Cities (SNEC) study is a cross-sectional study of children's health outcomes based on exposure to ambient air pollutants. This region encompasses more than 20million people residing in 14 cities in Liaoning province in Northeastern China. To maximize the inter-and intra-city gradients of the pollutants of interest and also to minimize the correlation between district-specific ambient pollutants, in April 2012 the seven cities of Shenyang, Dalian, Anshan, Fushun, Benxi, Liaoyang, and Dandong in Liaoning province were selected as study sites, based on air pollution measurements taken between 2009 and 2011 (Map 1). In each of the seven cities, we selected all urban districts for the study. There are five districts in Shenyang, four districts in Dalian and Fushun, three districts in Anshan, Benxi, and Dandong, and two districts in Liaoyang, respectively.

Map 1. Locations of the study cities in Liaoning province, northeast of China
In each of the 24 study districts, there was only one available municipal air monitoring station. To generate a representative sample, we randomly selected one or two elementary schools and one or two middle schools, located within 2 km of a municipal monitoring station. If the number of students in the first selected school was below 500, we included a second school. The resulting 62 schools were included. Within each school, we randomly selected one or two classrooms depending on the class size from each grade level to enroll study participants.
The Map 2 below shows an area including up to a 2 km radius around one of our monitoring stations, eMethods 3. Description of the Ground-Monitored PM1, PM2.5, PM10, and NO2 Data Ground-monitored airborne particulate matter with a diameter of 1 μm or less (PM1), airborne particulate matter with a diameter of 2.5 μm or less (PM2.5) and airborne particulate matter with a diameter of 10 μm or less (PM10) were obtained from the China Atmosphere Watch Network (CAWNET) of the China Meteorological Administration (CMA). The network consisted of 96 stations across mainland China. Concentrations of PM1, PM2.5 and PM10 at all stations were measured with GRIMM 180 Environmental dust monitors (Model 1.108, Grimm Aerosol Technik GmbH, Ainring, Germany). Daily concentration of NO2 was estimated with satellite-derived OMI data (Daily Level-3 Nitrogen Dioxide Product) and other predictors. Two quality-control procedures were applied to all PM measurements: a "limit check" and "climatological check". For the limit check, we evaluated each valid PM measurement to determine whether it fell within its possible limits, otherwise, they were removed. In the climatological check, the median and standard deviation (SD) of hourly PM measurements were calculated at each PM observational site. Any PM values lying outside of more than three SDs from the median PM have been removed. Daily PM1, PM2.5, PM10, and NO2 concentrations were estimated by using a spatial statistical model with a machine learning method matched to the children's geocoded home addresses. Briefly, each participant's home address was geocoded as a geographical longitude and latitude, and superimposed over the predicted daily PM1, PM2.5, PM10, and NO2 concentrations' grids, and then the exposure parameters were calculated by averaging the daily concentrations for PM1, PM2.5, PM10, and NO2 over the four-year period of 2009-2012.
This method is user-friendly, as there is no need to define the complex relationships between predictors (e.g., linear or nonlinear relationships and interactions). Also, the variable importance measures provided by random forests help the user to identify important variables and noise variables. The final model is shown as following: PMij = AODij + TEMPij + RHij + BPij + WSij+ NDVI + Urban_cover + doy + log(elev) NO2ij = OMIij + TEMPij + BPij + RHij + WSij + NDVIij+ Urban_coverij + doyi + log (elevj) where PM2.5ij is the PM2.5 or PM10 concentration on day i at station j; NO2ij is the NO2 concentration on day i at station j; AODij is the combined AOD; OMIij is the satellite-derived OMI value; TEMP, RH, BP, and WS are mean temperature, relative humidity, barometric pressure, and wind speed on day i, respectively; NDVI is the monthly average NDVI value; Urban_cover is the percentage of urban cover with a buffer radius of 10 km; doy is day of the year; and log(elev) is the log transformed elevation. To evaluate the predictive ability of the final model, a 10-fold cross-validation (CV) was performed.
The results of a 10-fold cross-validation showed R 2 values for daily and annual predictions were 55% and 75% for PM1, 83% and 86% for PM2.5, 78% and 81% for PM10, and 64% and 72% for NO2, respectively. The Root Mean Squared Error (RMSE) values for daily and annual predictions were 20.5 µg/m 3 and 8.8 µg/m 3 for PM1, 18.1 µg/m 3 and 6.9 µg/m 3 for PM2.5, 31.5 µg/m 3 and 14.4 µg/m 3 for PM10, and 12.4µg/m 3 and 6.5 µg/m 3 for NO2, respectively. eMethods 4. Description of the PM10, SO2, NO2, and O3 Data The operation of the monitoring stations has strictly followed the quality assurance/quality control (QA/QC) procedure set by the State Environmental Protection Administration of China (SEPAC,1992). The environmental monitoring centers in each of the three cities conducted regularly performance audits and precision checks on the air-monitoring equipment. Quarterly performance audits are conducted to assess data accuracy on airborne particulate matter with a diameter of 10 μm or less (PM10), sulfur dioxide (SO2), nitrogen dioxide (NO2), and ozone (O3) monitoring systems.

1) The calculation method
The calculation method is performed according to Chinese National standards (GB8170-87). The unit of monitored pollutants is mg/m 3 accurate to the third decimal. The units can also be expressed as μg/m 3 , depending on the pollutant's concentration. For concentrations that were too low to be measured, half of the lowest checking limit of the equipment will be used as the measured value.
2) Outliers When the measured concentration is too low (e.g. background value), a negative value can be obtained because of the zero drift of the monitor. There is no physical meaning to this value. This negative value can be regarded as a value of "unable to measure." For the monitoring station with an automatic calibration system, if equipment zero drift/span drift exceeds the control range during the period of zero/span calibration, the data from the time it becomes out of control until the equipment is recovered should be regarded as invalid data. The data cannot be used statistically.
The data during the period of zero calibration/span calibration should be regarded as invalid data. It cannot be used statistically, but a flag should be made on these data and the records stored as evidence.
When values are missing because of a loss of power, any data received by the central control station during the period of the loss of power should be regarded as invalid data. The period of loss of power should be counted at the start of power outage until complete warm-up of equipment. The data cannot be used statistically.
Because pollutant concentrations change over time and change slowly, there should be no swift change in pollutant concentration in the results of normal monitoring. Either a swift change or no change indicates that there is an equipment problem. The problem should be identified, and the data between the start of problem to recovery should be regarded as outliers. These data cannot be used statistically.

3) Statistics of monitoring data
One time value The central control station uses an average of 15 minutes of pollutant concentrations measured at the branch station as a one-time value. The central control modifies this value and judge whether this value is an outlier using the report software.

eMethods 5. Description of the 2-Level Binary Logistic Regression Model
We assessed normality and described distributions for continuous variables as the mean ± standard deviation (SD), and categorical variables as n (%), comparing breastfed to non-breastfed children by Student's t-test or χ 2 -test as appropriate. To investigate the relationship between the pulmonary function tests (PFT) and ambient air pollution, we considered a two-level logistic regression model in which children were the first-level units and the districts were the second-level units. At the child level, we predicted the logit of the prevalence rate for a given impaired lung function outcome by breastfeeding (BF) and k covariates (X1 ….Xk) as follows: logit [P(symptomij )] = αj + λjBFij + β1X1ij + ….+ βkX1ij + eij (1) where the subscript j is for districts (j=1,…, 25), the subscript i is for children (i=1,..nj), αj are the intercepts at the district level, λj are the regression coefficients for breastfeeding, β1 …βk are the regression coefficients of covariates, and eij are the random errors, assumed to have mean of zero and constant variance. The αj and λj are random coefficients as they are assumed to vary across districts. In general, a district with a high αj is predicted to have higher prevalence rates than a district with a low αj. Similarly, differences in λj indicate that the relationship between breastfeeding and prevalence rates is not the same in all districts. In districts with a high (low) λj, breastfeeding has a large (small) effect on prevalence rates (i.e., the difference between breastfed children and non-breastfed children is relatively large (small)). At the district level, we regressed the district-specific intercepts αj and coefficients λj on the district-specific pollutant level (Zj) to explain the variations of αj and λj, as follows: αj = α + γ1Zj + u1j (2) λj = λ + γ2Zj + u2j (3) Equation (2) predicts the prevalence rates in a district by Zj. If γ1 is positive, then adjusting for covariates, the prevalence rates are higher in districts with a higher pollutant level. Conversely, if γ1 is negative, then adjusting for covariates, the prevalence rates are lower in districts with a higher pollutant level. Equation (3) states that, adjusting for covariates, the relationship between prevalence rates and breastfeeding in a district depends on the district's pollutant level Zj. If γ2 is positive, then adjusting for covariates, the breastfeeding effect on prevalence rates is larger with a higher pollutant level. Conversely, if γ2 is negative, then adjusting for covariates, the breastfeeding effect on prevalence rates is smaller with a higher pollutant level. The -terms u1j and u2j are random errors at the district level, assumed to be independent and have mean of zero and constant variance. These random errors characterize the between-district variation and are assumed to be independent from eij at the child level. Note that α, λ, β1,…, βk, γ1, and γ2 are fixed effects and so do not vary across districts (they therefore have no subscript j to indicate district). The above models can be written as a single regression equation by substituting equations (2) The terms in the first and second parentheses in equation (4) are often called the fixed (or deterministic) and random (or stochastic) parts of the model, respectively. The product term ZjBFij is a cross-level interaction between the child-level variable BFij and the district-level variable Zj. The random error u2j BFij is different for different children, a situation that in ordinary multiple regression analysis is called heteroscedasticity. To evaluate the robustness of our estimates, we conducted a number of sensitivity analyses, including stratifying by child's age, excluding children with mixed feeding, excluding children with low birth weight or preterm birth, excluding children with doctordiagnosed asthma, and randomly excluding one district. Analyses were also adjusted for a potential confounders selected a priori based on literature evidence for associations with air pollution exposure and respiratory function. These factors included age, sex, height, birth weight, preterm birth, parental education, family income per year, exercise per week, passive smoke exposure, home coal use, house pet, home renovation in the past 2 years, area of residence per person, doctor-diagnosed asthma, family history of atopy, and short-term air pollution concentrations. Short-term air pollution concentrations referred to the daily mean pollutant levels from routine air monitoring data during lung function measurement in the children. If the estimated regression effects for pollutants changed by at least 10% upon inclusion in the base model, the covariate was retained in the final model as a confounding variable. All analyses were conducted using the GLIMMIX procedure in SAS 9.4 (SAS Institute, Cary, NC USA). All statistical tests were two-tailed, and p-values less than 0.05 were considered statistically significant except for the interaction term where statistical significance was asserted at p-values less than 0.1.