Evaluation of Allocation Schemes of COVID-19 Testing Resources in a Community-Based Door-to-Door Testing Program

Key Points
Question What are effective mechanisms to identify and reach vulnerable populations and equalize access to COVID-19 testing resources in the presence of substantial demographic disparities?
Findings In this cohort study of 756 participants, a door-to-door program with community-based health workers was associated with a substantial increase in the proportion of Latinx and elderly individuals undergoing testing, relative to neighborhood testing sites. The protocol associated with the greatest increase in testing at-risk individuals was uncertainty sampling, followed by local knowledge, and then targeting households in areas with a high number of index cases.
Meaning These findings suggest that community-based testing programs that allocate resources using uncertainty sampling might effectively reduce COVID-19 testing disparities.


eMethods. Definitions and Procedures

Segment Definitions
There are 20 census tracts contained in the three ZIP codes 95122, 95116, and 95127. Each census tract was refined into segments: smaller, contiguous, geographic units that the promotores would be able to cover in one day. Segments were created with an algorithm that grouped parcels with matching street names, then split longer streets while combining shorter adjacent streets, and iteratively aimed for a compact arrangement of approximately 100 households, accounting for multifamily buildings. eFigure 1 illustrates the segments and situates where the intervention was implemented.
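The grouping step can be illustrated with a deliberately simplified sketch. It is not the authors' algorithm: it assumes parcels arrive as hypothetical `(street_name, n_households)` pairs in an order where neighbouring entries are physically adjacent, and it omits the iterative compactness refinement described above. It only shows how long streets get split and short adjacent streets get combined around a target of roughly 100 households, with multifamily buildings counted by their household total.

```python
def build_segments(parcels, target=100):
    """Greedy sketch: pack parcels into segments of at most `target` households.

    parcels: list of (street_name, n_households) tuples (hypothetical input
    format). List order is assumed to reflect geographic adjacency.
    """
    # Step 1: group parcels with matching street names, keeping first-seen order.
    streets = {}
    for street, households in parcels:
        streets.setdefault(street, []).append((street, households))

    # Step 2: walk the streets in order. A long street is split when the
    # running total would exceed the target; short streets keep filling
    # the current segment until it is full.
    segments, current, size = [], [], 0
    for street_parcels in streets.values():
        for parcel in street_parcels:
            if current and size + parcel[1] > target:
                segments.append(current)
                current, size = [], 0
            current.append(parcel)
            size += parcel[1]
    if current:
        segments.append(current)
    return segments
```

A greedy pass like this can leave a small final segment. A production version would presumably rebalance and check geographic compactness, which this sketch does not attempt.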

Field Details and Data Entry
The door-to-door testing program involved collaboration between a research team at Stanford, Case Investigation and Contact Tracing (CICT) teams at the Santa Clara County Department of Public Health, a large testing laboratory, and the teams of promotores from META (Mujeres Empresarias Tomando Acción). Key steps in the information and data flow can be categorized into three phases as shown in eFigure 2.
Before the Visit
Through a data-sharing agreement with the County, the Stanford team gained secure access to daily COVID-19 testing data from the CalREDIE state database. Each day before a given field assignment, the Stanford team processed the most recent seven days of positive case counts to identify recent index cases (for the index area selection protocol) and calculate the upper confidence bound positivity rate (for the uncertainty sampling protocol) for each census tract (see eFigure 3). From the top census tracts based on recent tests, the Stanford team generated assigned segments, which were sent to the promotores for use the following day, and control segments, which were retained for future comparative analysis. The assigned segments were provided as CSV files with supplementary maps as HTML files through a secure platform. County staff then imported the CSV files into Excel workbooks that would be accessible by the promotores in the field through secure, County-issued mobile devices, with each pair having a specific segment's worth of addresses and fields of information to complete for each address.

Door-to-Door House Visits
The promotores teams traveled to their assigned segment for the day and visited as many addresses as they could manage within a single shift, typically 3 hours. At each address, using the editable fields in the Excel workbook, the teams indicated whether they were able to visit the house and knock on the door or ring the doorbell, then whether they were able to have a conversation with a resident at the address. After following a standard script and protocol for offering and administering testing, the promotores indicated on the workbook the number of tests they collected from the address, which was later used as a cross-check with lab data, and any miscellaneous notes. In addition to visiting the Stanford-assigned addresses, the promotores had the discretion to also visit other addresses during their shift, which they could enter into the same workbook, and which would be identifiable in the data analysis as addresses generated through local knowledge.
After the Visit
At the end of the daily shift, the promotores teams dropped off physical testing samples at a local collection site, after which the samples were delivered to and tested by the laboratory, and the resulting data was routed to the CalREDIE system. Through an arrangement with the lab, the data for the tests collected by the promotores was also flagged and directly routed to specialized CICT teams, so that the residents who interfaced with the door-to-door program could receive high-touch support, and to the Stanford team for detailed analysis. The laboratory also provided data for the Church and Fairgrounds testing sites, and this dataset included demographic data about testing site visitors which is not otherwise available from CalREDIE. The promotores teams and County staff also cleaned up and finalized the Excel workbook entries at the end of each day and transferred the data back to the Stanford team. Given the identification of assigned and control segments for a specific day, the CalREDIE system was used a few days later to identify segment-level testing and positivity rates.

Protocol Details
We provide more formal definitions of the risk-based protocols, as well as some simulations to illustrate when uncertainty sampling provides reliable gains over naive prevalence sampling.
For both protocols, once assigned (visited) and control segments were chosen for a day, they were removed from consideration for the following two weeks. After that, they were re-eligible for selection. A two-week period was chosen based on the incubation period for COVID-19.¹
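The exclusion rule can be sketched as a simple date filter. The function name and the `last_chosen` mapping are hypothetical, and the exact boundary is an assumption: here a segment becomes eligible again once 14 full days have elapsed.

```python
from datetime import date, timedelta

COOLDOWN = timedelta(weeks=2)  # exclusion window stated in the text

def eligible_segments(segments, last_chosen, today):
    """Return segments that may be selected on `today`.

    last_chosen maps a segment id to the date it was last picked as an
    assigned or control segment (hypothetical structure).
    Assumption: eligibility returns once 14 full days have elapsed.
    """
    return [
        s for s in segments
        if s not in last_chosen or today - last_chosen[s] >= COOLDOWN
    ]
```

For example, a segment chosen on January 1 would be excluded through January 14 and selectable again on January 15 under this boundary assumption.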

Index Area Selection
Let $c_s$ be the number of index cases in segment $s$ over the past day, and $n_s$ its population size. Index area selection was made by ordering segments according to the quantity $c_s / n_s$ and choosing either the top two or four (depending on whether there were one or two teams). We then randomized the selected segments, withholding half as a control set.
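A minimal sketch of this ranking and randomization follows. The function and argument names are illustrative rather than taken from the study's code, and ties are broken by input order, which is an assumption.

```python
import random

def select_index_segments(index_cases, population, n_teams, rng=None):
    """Rank segments by index cases per resident, then split into visit/control.

    index_cases, population: dicts keyed by segment id (hypothetical format).
    Returns (assigned, control), each holding half of the top-ranked segments.
    """
    rng = rng or random.Random()
    rates = {
        s: index_cases.get(s, 0) / population[s]
        for s in population
        if population[s] > 0
    }
    n_pick = 2 * n_teams  # top two for one team, top four for two teams
    top = sorted(rates, key=rates.get, reverse=True)[:n_pick]
    rng.shuffle(top)  # randomize which of the top segments are withheld
    half = len(top) // 2
    return top[:half], top[half:]
```

With one team, the two highest-ranked segments are selected, and the shuffle decides which one is visited and which is held out as the control.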
Uncertainty Sampling
For a census tract $i$, let $x_i$ and $y_i$ be the total number of positive and negative COVID-19 tests over the past seven days, in accordance with County practice. We do not consider inconclusive tests. The positivity rate is:
$$\hat{p}_i = \frac{x_i}{x_i + y_i}.$$
The upper confidence bound is calculated as the upper bound of the Clopper-Pearson 95% confidence interval,² i.e., the maximum (supremum) $\theta$ such that
$$\Pr\bigl(B(n_i, \theta) \le x_i\bigr) > 0.025,$$
where $n_i = x_i + y_i$ and $B(n, \theta)$ is a Binomial random variable with $n$ trials and success probability $\theta$. Once the tract(s) with the highest upper confidence bound is chosen, we randomly select two segments from within the tract: one to visit and one to use as a control. The upper confidence bound is a function of both the underlying infection rate in the population tested and the number of tests, due to sampling variability. eFigure 3, for instance, illustrates the positivity rate and upper confidence bound in census tracts for a specific period of time. While the point estimate of census tract 3 is higher than that of tract 4, for instance, the low test volume in the latter means that we have more uncertainty about how high the rate might in fact be. This means that we would allocate more testing resources to the latter.
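To make the upper confidence bound concrete, the sketch below computes the Clopper-Pearson upper limit by bisection on the binomial distribution function, using only the Python standard library. The function names and the example counts are illustrative and are not the study's data. Treating a tract with no tests as having an upper bound of 1 is an assumption.

```python
from math import comb

def binom_cdf(x, n, theta):
    """P(X <= x) for X ~ Binomial(n, theta)."""
    return sum(comb(n, k) * theta**k * (1 - theta) ** (n - k) for k in range(x + 1))

def clopper_pearson_upper(x, n, alpha=0.05, tol=1e-10):
    """Largest theta with P(Binomial(n, theta) <= x) > alpha / 2."""
    if n == 0 or x >= n:
        return 1.0  # assumption: no data (or all positive) gives the maximal bound
    lo, hi = x / n, 1.0  # the CDF decreases in theta, so bisect on [x/n, 1]
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if binom_cdf(x, n, mid) > alpha / 2:
            lo = mid
        else:
            hi = mid
    return lo

def pick_tract(counts):
    """counts: dict tract -> (positives, negatives); returns tract with highest bound."""
    return max(counts, key=lambda t: clopper_pearson_upper(counts[t][0], sum(counts[t])))

# Illustrative counts: "A" has the higher point estimate (30/100 vs 2/8),
# but "B" has far fewer tests and therefore the higher upper bound.
example = {"A": (30, 70), "B": (2, 6)}
chosen = pick_tract(example)
```

In this illustration the low-volume tract is chosen even though its point estimate is lower, which is the behaviour the protocol relies on.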
Next, we perform a Monte Carlo simulation to demonstrate the benefits of uncertainty sampling when prevalence rates and testing rates may be only semi-correlated or even uncorrelated. We generate k = 20 areas and draw vectors of prevalence rates and testing rates uniformly at random (independently of one another). We calculate the resulting positivity rate from naive sampling (allocating tests to where the point estimate is highest) and uncertainty sampling. eFigure 4 summarizes the results from 5000 simulations, each with 100 simulated COVID-19 tests.
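One way such a simulation could be set up is sketched below. Several details are assumptions not specified in the text: the uniform ranges (`max_prev`, `max_hist`), treating the "testing rate" as a count of historical tests per area, allocating all 100 new tests to the single selected area, and the default of 200 simulations (chosen for speed). The sketch shows the mechanics only and is not a reproduction of eFigure 4.

```python
import random
from math import comb

def upper_bound(x, n, alpha=0.05, tol=1e-6):
    """Clopper-Pearson upper limit via bisection on the binomial CDF."""
    if x >= n:
        return 1.0
    lo, hi = x / n, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        cdf = sum(comb(n, k) * mid**k * (1 - mid) ** (n - k) for k in range(x + 1))
        if cdf > alpha / 2:
            lo = mid
        else:
            hi = mid
    return lo

def binomial(rng, n, p):
    """Draw one Binomial(n, p) variate using the standard library only."""
    return sum(rng.random() < p for _ in range(n))

def simulate(n_sims=200, k=20, n_new_tests=100, max_prev=0.3, max_hist=30, seed=0):
    """Return mean positivity of the new tests under (naive, uncertainty) sampling."""
    rng = random.Random(seed)
    naive_total = ucb_total = 0.0
    for _ in range(n_sims):
        # Assumed ranges: prevalence ~ U(0, max_prev), historical tests ~ U{1..max_hist}
        prev = [rng.uniform(0, max_prev) for _ in range(k)]
        hist = [rng.randint(1, max_hist) for _ in range(k)]
        pos = [binomial(rng, n, p) for n, p in zip(hist, prev)]
        # Naive: highest point estimate. Uncertainty: highest upper bound.
        naive_pick = max(range(k), key=lambda i: pos[i] / hist[i])
        ucb_pick = max(range(k), key=lambda i: upper_bound(pos[i], hist[i]))
        naive_total += binomial(rng, n_new_tests, prev[naive_pick]) / n_new_tests
        ucb_total += binomial(rng, n_new_tests, prev[ucb_pick]) / n_new_tests
    return naive_total / n_sims, ucb_total / n_sims
```

Calling `simulate(n_sims=5000)` would match the number of simulations reported above, at the cost of a longer run time. Introducing correlation between `prev` and `hist` would be the natural extension for exploring the semi-correlated case.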
These simulations show that high variance in testing and prevalence rates results in the greatest performance gains from uncertainty sampling. Moreover, we see that uncertainty sampling never performs worse than prevalence sampling, unless the testing rates and prevalence rates are perfectly correlated. The contrary conditions are, of course, precisely what motivated this allocation protocol: Santa Clara County was facing substantial disparities in case prevalence rates, and the testing intervention was a response to the perceived inadequacies of testing resources in vulnerable communities. We therefore believe that these two factors are critical for applying this intervention in other areas.

Demographic Differences within Door-to-Door Protocols
While there are sharp demographic differences between door-to-door testing and conventional testing sites, the demographic differences within the three selection protocols are more muted (see eTable 1). The local knowledge protocol tested a higher percentage of Latinx individuals than risk selection (94% compared to 78% and 85%, p-value < 0.001, χ² test). Both risk selection strategies also seemed to test older individuals (65+) at higher rates than local knowledge: 15% and 19% compared to 9% (p-value = 0.02, χ² test). Regardless of selection protocol, door-to-door testing reached more Latinx, women, and elderly individuals.

eFigure 1. Segments Where Intervention Was Implemented
Upper left: State of California (outlined in black), with Santa Clara County shaded in red. Bottom left: Santa Clara County (outlined in black) with the three ZIP codes of interest shaded in red. Right: Census tracts (outlined in black) entirely within the three ZIP codes and the refinement of those tracts into "segments" (colored). One colored segment represents approximately 100 households. We focused on the census tracts contained entirely within the three ZIP codes due to resource constraints.

eTable. Demographic Breakdown of the 3 Door-to-Door Intervention Strategies
Demographic breakdown of the three door-to-door intervention strategies.

[Table: column p-value; rows under Gender: Female, Male, Unknown / Other]