Association of COVID-19 Incidence and Mortality Rates With School Reopening in Brazil During the COVID-19 Pandemic

Key Points
Question: Is the reopening of schools during the COVID-19 pandemic associated with increased COVID-19 incidence and mortality?
Findings: In this cross-sectional study of 643 Brazilian municipalities including 18 761 schools, on average, there was no systematic association between school reopening and COVID-19 incidence or mortality in São Paulo State up to 12 weeks after reopening, which was also the case for schools in the most vulnerable conditions. Aggregate mobility was already high before the school reopening and did not significantly increase afterwards.
Meaning: The results of this study suggest that reopening schools under appropriate protocols in low- and middle-income countries during the pandemic is unlikely to be associated with higher aggregate COVID-19 cases or deaths when counterfactual mobility is already high.


eMethods.
eAppendix.
eFigure 1. Timeline of school reopening
eTable 3. Heterogeneous treatment effects estimated through differences-in-differences
eTable 4. Differences in cases for school-age children aggregated by cohort
eTable 5. Differences in cases for parent-age adults aggregated by cohort
eTable 6. Difference-in-differences estimates aggregated by cohorts with matched sample
eTable 7. Difference-in-differences model with continuous treatment

In eTable 2, we provide a description of all variables used in the paper.

Quality of school infrastructure
We combine different school characteristics to capture the quality of school infrastructure, using the principal component method.
We select the following variables from the 2019 Brazilian school census: the availability of bathrooms, school staff's bathrooms, shower, kitchen, garbage collection, piped water, basic sanitation, and the average number of students per class.
We select the first principal component as our composite index of school infrastructure, computed at the municipality level averaging across its schools. This first component accounts for approximately 40% of the total variance of the selected variables.
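As a rough illustration of how such an index can be built, the sketch below standardizes a few school-level variables, extracts the first principal component by power iteration on their correlation matrix, and averages the resulting school scores within each municipality. The toy data, the three-variable subset, the function names, and the choice to orient the component on the first variable are all assumptions made for the example; the source does not describe the software or standardization actually used.

```python
import math
from collections import defaultdict

def standardize(rows):
    """Z-score each column (assumes no column is constant)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) for j in range(p)]
    return [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]

def first_component(z, iters=500):
    """Loadings and variance share of the first principal component,
    found by power iteration on the correlation matrix."""
    n, p = len(z), len(z[0])
    corr = [[sum(r[a] * r[b] for r in z) / n for b in range(p)] for a in range(p)]
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iters):
        w = [sum(corr[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[a] * corr[a][b] * v[b] for a in range(p) for b in range(p))
    if v[0] < 0:  # the sign of a component is arbitrary; orient it on the first variable
        v = [-x for x in v]
    return v, eigenvalue / p  # correlation matrix has trace p, so this is the variance share

# Hypothetical schools: (municipality, [has_bathroom, has_piped_water, students_per_class])
schools = [
    ("A", [1, 1, 25]),
    ("A", [1, 1, 30]),
    ("B", [0, 1, 38]),
    ("B", [0, 0, 42]),
    ("C", [1, 0, 35]),
]

z = standardize([values for _, values in schools])
loadings, variance_share = first_component(z)

# School-level score on the first component, then the municipal average.
scores = defaultdict(list)
for (muni, _), row in zip(schools, z):
    scores[muni].append(sum(load * x for load, x in zip(loadings, row)))
infrastructure_index = {m: sum(s) / len(s) for m, s in scores.items()}
```

With this orientation, larger classes load negatively, so a higher index reads as better infrastructure in the toy data.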

Mobility
Our mobility data are based on Google's mobility reports, which are computed from mobile-phone GPS information. Google calculates daily mobility information for more than 400 municipalities in São Paulo State. As such, we have mobility data for roughly 78% of the municipalities that reopened schools in 2020, and 56% of those that did not.
Google provides information concerning different types of mobility, including (1) retail and recreation; (2) grocery and pharmacy; (3) parks; (4) transit stations; and (5) workplaces. We average across these five measures to generate a daily aggregate mobility index. We then average that daily aggregate measure within each week to generate a weekly municipal-level mobility index.
Mobility is expressed in percentage-point deviations from February 15, 2020, when Google started collecting mobility data.
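The two averaging steps can be sketched as follows. The record layout, the category keys, and the use of ISO calendar weeks are assumptions for illustration; the source does not say how weeks are delimited or how days with missing categories are handled (the sketch assumes all five categories are present every day).

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical keys for the five mobility categories named in the text.
CATEGORIES = (
    "retail_and_recreation",
    "grocery_and_pharmacy",
    "parks",
    "transit_stations",
    "workplaces",
)

def weekly_mobility(daily):
    """daily: iterable of (municipality, date, {category: deviation from baseline}).

    Returns {(municipality, iso_year, iso_week): weekly mobility index}.
    """
    by_week = defaultdict(list)
    for muni, day, values in daily:
        daily_index = mean(values[c] for c in CATEGORIES)  # step 1: across categories
        iso = day.isocalendar()
        by_week[(muni, iso[0], iso[1])].append(daily_index)
    return {key: mean(vals) for key, vals in by_week.items()}  # step 2: within week

# Two days of the same week for one hypothetical municipality.
records = [
    ("X", date(2020, 10, 5), dict(zip(CATEGORIES, (-10, -20, -30, -40, -50)))),
    ("X", date(2020, 10, 6), dict(zip(CATEGORIES, (0, -10, -20, -30, -40)))),
]
weekly = weekly_mobility(records)
```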

Under-reporting of COVID-19 cases
As in several other developing countries, limited testing in Brazil has been linked to substantial under-reporting of COVID-19 cases. Estimates suggest that the actual number of COVID-19 cases could be three to ten times larger than that registered in official statistics.¹
Nevertheless, we do not expect under-reporting to affect the findings of this study. The reason is two-fold. First, our estimates are based on a differences-in-differences strategy, which contrasts outcomes across municipalities that authorized schools to reopen and those that did not, before and after in-person activities could resume. As such, under-reporting would only bias our results if it were systematically different between groups and periods. We see no reason why under-reporting should have been higher within municipalities that reopened schools, or why it should have increased after in-person activities could resume.
Second, and most importantly, this study provides first-hand evidence on the relationship between school reopening and COVID-19 deaths. Deaths tend to be much more precisely reported than cases; indeed, official statistics for COVID-19-related deaths tended to closely track the total number of excess deaths in 2020 and 2021, relative to previous years.² As we estimate a similar relationship for both cases and deaths, we conclude that our results are not driven by under-reporting.
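The first argument can be checked with a small numerical sketch. It assumes, for illustration only, that under-reporting acts as a multiplicative factor on true case counts and that outcomes enter in logs; all numbers are made up. A reporting rate that is common to all groups and periods cancels out of the difference-in-differences, whereas a rate that changes only for the treated group after reopening does not.

```python
import math

def log_did(treated_pre, treated_post, control_pre, control_post):
    """Difference-in-differences of log outcomes."""
    treated_change = math.log(treated_post) - math.log(treated_pre)
    control_change = math.log(control_post) - math.log(control_pre)
    return treated_change - control_change

# Hypothetical true case counts.
true_cases = dict(treated_pre=400.0, treated_post=900.0,
                  control_pre=250.0, control_post=500.0)

# Uniform under-reporting: one in five cases is recorded everywhere.
reported = {k: v / 5 for k, v in true_cases.items()}

# Differential under-reporting: one in four cases is recorded,
# but only in treated municipalities after reopening.
skewed = dict(reported, treated_post=true_cases["treated_post"] / 4)

did_true = log_did(**true_cases)
did_reported = log_did(**reported)   # equals did_true
did_skewed = log_did(**skewed)       # differs from did_true by log(5/4)
```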

Estimation method
To estimate the effect of school reopening on the outcomes of interest, we rely on the differences-in-differences approach. That is, we compare the trends of the dependent variables for treated and non-treated municipalities. However, recent literature suggests that a straightforward two-way fixed-effect regression is not appropriate in the context of an application with multiple periods and staggered treatment.
To circumvent this problem, we implement the Callaway and Sant'Anna procedure.³ It can be seen as a three-step estimator. First, we divide the treatment group (the set of municipalities that authorized schools to reopen) into cohorts according to the week in which municipalities authorized schools to reopen. Let $G_{m,g}$ be a dummy variable indicating that municipality $m$ reopened schools at period $g$, and let $C_m$ be a dummy variable indicating that municipality $m$ did not authorize schools to reopen in 2020. Let $N_g$ be the number of municipalities treated at time $g$, and $N$ be the total number of time periods after treatment (=5 in our sample). Next, let $\mathcal{G} = \{0,5,6,8,9\}$ be the set that identifies the weeks at which different cohorts were treated in our sample. Then, in the first step of the procedure, we estimate:

$$P\left(G_{m,g} = 1 \mid X_m,\; G_{m,g} + C_m = 1\right) = \Phi\left(X_m'\beta_g\right),$$

where $\Phi(\cdot)$ is the normal cumulative distribution function and $X_m$ are municipal time-invariant characteristics. That is, we use a Probit model to predict how the likelihood of being treated at $g \in \mathcal{G}$ relates to municipal characteristics ($X_m'\beta_g$). Then, we calculate the propensity score of being treated in a certain period as the predicted probability from the first stage ($\hat{p}_g(X_m)$).

Next, in the second step, we estimate cohort-time treatment effects, as follows:

$$\widehat{ATT}(g,t) = \frac{1}{N_g}\sum_{m} G_{m,g}\left(Y_{m,t} - Y_{m,g-1}\right) - \sum_{m} w_{m,g}\, C_m \left(Y_{m,t} - Y_{m,g-1}\right),$$

where $Y_{m,t}$ is the outcome of interest in municipality $m$ at time $t$, and $w_{m,g}$ is a weight computed as follows:

$$w_{m,g} = \frac{\hat{p}_g(X_m)\,/\,\left(1-\hat{p}_g(X_m)\right)}{\sum_{j} C_j\,\hat{p}_g(X_j)\,/\,\left(1-\hat{p}_g(X_j)\right)}.$$

In words, the second step calculates difference-in-differences estimates by comparing the change in outcomes between the last pre-treatment period, $g-1$, and period $t$ for units first treated at time $g$ with the corresponding change for units in the control group, attributing higher weight to control observations that have larger probabilities of being treated based on their characteristics.
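A minimal sketch of the weighted comparison in the second step is given below. It takes the propensity scores as given rather than fitting the Probit model, uses the last pre-treatment week as the base period, and normalizes the control weights to sum to one; these choices follow the inverse-probability-weighting form of the Callaway and Sant'Anna estimator and are assumptions about details the text does not spell out. The data layout and function name are hypothetical.

```python
def att_gt(units, g, t):
    """Inverse-probability-weighted DiD for cohort g at week t.

    units: list of dicts with keys
      'cohort' - week of reopening, or None for never-treated municipalities
      'pscore' - estimated probability of belonging to cohort g
      'y'      - {week: outcome}
    """
    treated = [u for u in units if u["cohort"] == g]
    controls = [u for u in units if u["cohort"] is None]

    # Controls that resemble treated units (high propensity) get more weight.
    odds = [u["pscore"] / (1 - u["pscore"]) for u in controls]
    total_odds = sum(odds)

    base = g - 1  # last week before cohort g is treated
    treated_change = sum(u["y"][t] - u["y"][base] for u in treated) / len(treated)
    control_change = sum(
        w / total_odds * (u["y"][t] - u["y"][base]) for w, u in zip(odds, controls)
    )
    return treated_change - control_change

# Hypothetical data: cohort treated at week 5, evaluated at week 7.
units = [
    {"cohort": 5, "pscore": 0.60, "y": {4: 1.0, 7: 1.5}},
    {"cohort": 5, "pscore": 0.70, "y": {4: 2.0, 7: 2.7}},
    {"cohort": None, "pscore": 0.50, "y": {4: 1.0, 7: 1.2}},
    {"cohort": None, "pscore": 0.75, "y": {4: 2.0, 7: 2.4}},
]
estimate = att_gt(units, g=5, t=7)
```

In the toy data the treated units change by 0.6 on average, the weighted controls by 0.35, so the estimate is 0.25.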
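Lastly, once we estimate cohort-time treatment effects, we can aggregate them in multiple ways in the third step. In particular, writing $ATT(g,t)$ for the effect on cohort $g$ at week $t$, $N_g$ for the number of municipalities in cohort $g$, and $\mathcal{G}$ for the set of cohort weeks, it is possible to compute dynamic treatment effects as:

$$ATT^{dyn}(T) = \sum_{g \in \mathcal{G}} \frac{N_g}{\sum_{g' \in \mathcal{G}} N_{g'}}\; ATT(g,\, g+T),$$

which is just the average of cohort-time treatment effects evaluated $T$ periods after treatment onset, where $T$ can even be negative (prior to the treatment onset, in the case of falsification tests). Alternatively, we could compute cohort-specific treatment effects as:

$$ATT^{coh}(g) = \frac{1}{\bar{t} - g + 1}\sum_{t=g}^{\bar{t}} ATT(g,t),$$

where $\bar{t}$ is the last week in the sample.

We compute standard errors using Callaway and Sant'Anna's standard procedure, block-bootstrapping at the municipality level. The block-bootstrap procedure allows for arbitrary correlation between regression errors within the same municipality over time.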
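The aggregation and the municipality-level bootstrap can be sketched as below. Two things are assumptions rather than the authors' stated choices: the dynamic aggregation weights cohorts by their size, and the bootstrap is a plain resampling of whole municipalities (each drawn unit carries its entire time series, here collapsed into a single pre-to-post change). All names and numbers are hypothetical.

```python
import random
import statistics

def dynamic_effect(att, cohort_size, event_time):
    """Cohort-size-weighted average of ATT(g, g + event_time) over cohorts."""
    terms = [(cohort_size[g], att[(g, g + event_time)])
             for g in cohort_size if (g, g + event_time) in att]
    total = sum(n for n, _ in terms)
    return sum(n * a for n, a in terms) / total

def cohort_effect(att, g):
    """Simple average of cohort g's post-treatment effects."""
    post = [a for (cohort, t), a in att.items() if cohort == g and t >= g]
    return sum(post) / len(post)

def block_bootstrap_se(units, statistic, reps=200, seed=0):
    """Resample whole municipalities with replacement and recompute the statistic."""
    rng = random.Random(seed)
    draws = []
    while len(draws) < reps:
        sample = [rng.choice(units) for _ in units]
        value = statistic(sample)
        if value is not None:  # skip draws that lack a treated or a control unit
            draws.append(value)
    return statistics.stdev(draws)

def simple_did(sample):
    """Mean change among treated minus mean change among controls."""
    treated = [u["change"] for u in sample if u["treated"]]
    control = [u["change"] for u in sample if not u["treated"]]
    if not treated or not control:
        return None
    return statistics.mean(treated) - statistics.mean(control)

# Hypothetical cohort-time estimates and cohort sizes.
att = {(5, 5): 0.10, (5, 6): 0.20, (6, 6): 0.30, (6, 7): 0.40}
cohort_size = {5: 30, 6: 10}
on_impact = dynamic_effect(att, cohort_size, 0)       # (30*0.10 + 10*0.30) / 40
one_week_later = dynamic_effect(att, cohort_size, 1)  # (30*0.20 + 10*0.40) / 40
cohort_5 = cohort_effect(att, 5)

# Hypothetical municipalities for the bootstrap.
munis = ([{"treated": True, "change": c} for c in (0.5, 0.7, 0.6)]
         + [{"treated": False, "change": c} for c in (0.3, 0.4, 0.2)])
se = block_bootstrap_se(munis, simple_did)
```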
The fundamental identification assumption for the differences-in-differences strategy is that of parallel trends for the outcomes of interest. That is, we assume that, in the absence of school reopening, the log of each outcome would have followed the same time trend across both groups. We point out in the main text that there are significant initial differences between the two groups at the time of reopening. This does not invalidate the identification assumption, since differences-in-differences nets out any difference in outcome levels across groups at the baseline period.
Some factors contribute to the plausibility of this assumption. First, the authorization to reopen schools is defined at the health region level, but the actual decision to reopen schools or not is made at the municipal level. As such, there was considerable variation in reopening decisions even within each health region in the State. Moreover, the authorization to reopen schools was subject to the endorsement of local legislators, introducing randomness in the timing of the actual return of in-person school activities even conditional on municipal reopening decisions. We present falsification tests in Figure E3 whereby we test and fail to reject the presence of parallel trends before school reopening.
The Callaway and Sant'Anna estimator additionally requires that treatment be irreversible: once treated, a unit cannot become untreated. For this reason, we drop from the analyses 2 municipalities that authorized schools to reopen but reversed those decisions shortly after.

Nearest-neighbor matching
As a robustness check, we also implement nearest-neighbor propensity score matching. We implement the procedure sequentially, starting with a sample consisting of the first cohort and the control group (never-treated municipalities). We estimate a Probit model on this sample, as before:

$$P\left(G_{m,g} = 1 \mid X_m\right) = \Phi\left(X_m'\beta_g\right),$$

where $G_{m,g}$ indicates that municipality $m$ belongs to cohort $g$ and $X_m$ collects the controls listed below. Then, we calculate the propensity score for this sample as the predicted value from the estimate above. We include the following variables as controls: per capita income, population, number of students, school infrastructure, and the number of baseline cases and deaths.
For each municipality in the treated cohort, we find the municipality in the control group with the closest propensity score, without replacement. After we match all municipalities in the first cohort, we build a new sample, including the second treated cohort and the control group without the municipalities matched in the first step of the process. Then, we re-estimate the Probit model for this new sample and match all municipalities in the second cohort. We repeat this procedure until every treated cohort has been matched.
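The sequential procedure can be sketched as follows. The Probit re-estimation is represented by a caller-supplied function (`fit_scores`, a hypothetical name) that returns a propensity score for every unit in the current sample; the toy example simply looks scores up in a fixed table. The greedy order in which treated municipalities are matched within a cohort is an assumption, since the text does not specify it, and the sketch assumes there are always enough unmatched controls left.

```python
def match_cohort(treated, controls, score):
    """Greedy 1:1 nearest-neighbor match on the propensity score, without replacement."""
    available = list(controls)
    pairs = []
    for unit in treated:
        best = min(available, key=lambda c: abs(score[c] - score[unit]))
        pairs.append((unit, best))
        available.remove(best)  # a control can be used only once
    return pairs, available

def sequential_match(cohorts, controls, fit_scores):
    """cohorts: list of (cohort_week, [treated units]) in order of treatment."""
    pool = list(controls)
    matches = {}
    for g, treated in cohorts:
        score = fit_scores(treated, pool)  # re-estimated on this cohort + remaining controls
        matches[g], pool = match_cohort(treated, pool, score)
    return matches

# Hypothetical units and a stand-in for the Probit step.
fixed_scores = {"t1": 0.60, "t2": 0.30, "t3": 0.65,
                "c1": 0.58, "c2": 0.35, "c3": 0.70, "c4": 0.10}

def lookup_scores(treated, pool):
    return {u: fixed_scores[u] for u in list(treated) + list(pool)}

matches = sequential_match(
    cohorts=[(0, ["t1", "t2"]), (5, ["t3"])],
    controls=["c1", "c2", "c3", "c4"],
    fit_scores=lookup_scores,
)
```

Because matched controls leave the pool, the second cohort here can only be paired with `c3` or `c4`.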
We implement this sequential algorithm for two samples: first, the full sample of the study, comprising 643 São Paulo municipalities; second, the sub-sample of municipalities with available mobility data.

eFigure 1 starts by documenting the proportion of municipalities that reopened schools at different points in time, and the number of schools authorized to reopen as a result. The figure highlights that reopening decisions were staggered, with in-person school activities picking up only from November onwards; even then, in-person school activities were far from universal in the State.

eFigure 1: Timeline of school reopening

Next, eTable 3 estimates the relationship between school reopening and disease activity separately for below- and above-median (a) per capita income, (b) average quality of school infrastructure, (c) senior population share, and (d) baseline COVID-19 deaths. We find no statistically significant association even for the highest-risk or most vulnerable sub-samples.

eTable 4 estimates whether school reopening affected school-aged children using a triple-differences strategy, whereby we compare not only municipalities that authorized schools to reopen to those that did not, before and after in-person activities could resume, but also school-aged children (ages 7-18) to young adults (ages 19-22) within each group and period. This strategy provides evidence on the direct effects of school reopening on COVID-19 incidence, since the latter age group should not be directly affected by school reopening. The analysis relies on additional data from COVID-19 testing (publicly available from SUS), which allow patients' ages to be identified. The limitation is that age-specific data are only available for cases, and only for specific weeks.
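The triple-differences comparison can be illustrated with cell means. The sketch below computes a difference-in-differences for school-aged children, the same quantity for young adults, and their difference; the cell labels and numbers are hypothetical, and the actual estimates in eTable 4 come from the cohort-based estimator rather than from raw cell means.

```python
def triple_difference(cell_mean):
    """cell_mean: {(municipality_group, age_group, period): mean outcome}."""
    def did(age):
        reopened = cell_mean[("reopened", age, "post")] - cell_mean[("reopened", age, "pre")]
        closed = cell_mean[("closed", age, "post")] - cell_mean[("closed", age, "pre")]
        return reopened - closed

    # Young adults absorb shocks common to both age groups in reopening municipalities.
    return did("school_age") - did("young_adult")

# Hypothetical mean (log) case counts by cell.
cells = {
    ("reopened", "school_age", "pre"): 2.0, ("reopened", "school_age", "post"): 2.9,
    ("reopened", "young_adult", "pre"): 3.0, ("reopened", "young_adult", "post"): 3.7,
    ("closed", "school_age", "pre"): 1.8, ("closed", "school_age", "post"): 2.5,
    ("closed", "young_adult", "pre"): 2.9, ("closed", "young_adult", "post"): 3.5,
}
ddd = triple_difference(cells)  # (0.9 - 0.7) - (0.7 - 0.6)
```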
Next, eTable 7 estimates the relationship between school reopening and COVID-19 cases and deaths more flexibly, allowing for the possibility that the decision to reopen schools locally might also affect disease activity in neighboring municipalities. To do that, we implement a fixed-effects estimator with a continuous treatment variable, defined as the negative of the log distance to the nearest municipality that reopened schools at each week (=0 if the municipality itself has authorized schools to reopen by then). This specification also does not detect a systematic relationship between proximity to schools authorized to reopen and local disease activity.
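One way to construct such a treatment variable is sketched below. Several details are assumptions not stated in the text: distances are straight-line distances between municipal reference points given in kilometers on a flat plane, a municipality counts as reopened from its authorization week onwards, and the variable is left undefined (`None`) in weeks before any municipality has reopened.

```python
import math

def proximity_treatment(muni, week, coords, reopen_week):
    """Negative log distance to the nearest municipality that has reopened by `week`.

    coords:      {municipality: (x_km, y_km)}, hypothetical planar coordinates
    reopen_week: {municipality: week of authorization, or None if never in 2020}
    """
    own = reopen_week.get(muni)
    if own is not None and own <= week:
        return 0.0  # the municipality itself has authorized reopening
    reopened = [m for m, w in reopen_week.items()
                if m != muni and w is not None and w <= week]
    if not reopened:
        return None  # nobody has reopened yet
    nearest = min(math.dist(coords[muni], coords[m]) for m in reopened)
    return -math.log(nearest)

# Hypothetical municipalities.
coords = {"A": (0.0, 0.0), "B": (3.0, 4.0), "C": (0.0, 10.0)}
reopen_week = {"A": 5, "B": None, "C": None}

b_week6 = proximity_treatment("B", 6, coords, reopen_week)  # 5 km from A
c_week6 = proximity_treatment("C", 6, coords, reopen_week)  # 10 km from A
```

Among municipalities that have not reopened, the variable is larger (less negative) the closer they are to one that has.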