International Analysis of Electronic Health Records of Children and Youth Hospitalized With COVID-19 Infection in 6 Countries

This cohort study aims to describe international hospitalization trends and key epidemiological and clinical features of children and youth with COVID-19.


Introduction
The clinical presentation of coronavirus disease 2019 (COVID-19) differs substantially between children and youth and adults. The unique clinical features, complications, and outcomes of COVID-19 among children and youth warrant special consideration in epidemiologic, management, and prevention studies. 1 However, the low prevalence of disease in children and youth, compounded by the routine challenges of conducting large clinical trials in pediatric populations, has limited their inclusion in many studies. 2 Key questions remain related to risk factors for severe and rare disease manifestations and optimal use of clinical interventions. 3 The experience with COVID-19 has highlighted the critical need to have efficient methods to complement traditional clinical investigations and public health surveillance to study pediatric populations during a rapidly evolving pandemic.
Large volumes of clinical data are available in electronic health records (EHRs) to support epidemiological studies of medical conditions and analyze real-world outcomes related to specific populations and interventions. 4 When used appropriately, these data represent a powerful tool to fill in gaps and address shortcomings of conventional clinical trials. For example, EHR data have been applied to more efficiently assess medication safety in children and youth or to test at scale potential associations between risk factors and pediatric conditions. 5,6 These data are particularly conducive to the study of small populations or rare events that can be difficult to capture in smaller data sets. 7,8 Other key benefits of EHR data include the ability to ascertain clinical trajectories and to facilitate multinational studies by combining data across health care systems. Soon, EHR-based observational data may also contribute to assessing the impact of vaccines in children and youth, including efficacy and long-term safety in pediatric subpopulations with limited representation or follow-up in clinical trials.
The Consortium for Clinical Characterization of COVID-19 by EHR (4CE) is an international collaborative covering 351 adult and pediatric hospitals in 7 countries that has collected patient-level EHR data on 39 200 hospitalized patients with polymerase chain reaction (PCR)-confirmed diagnosis of SARS-CoV-2. 9 The use of common data elements across a federated network allows for integration and harmonization of data to enable analyses of the disease manifestation and epidemiology of COVID-19 across health care sites. Focusing on adult populations, studies have used data from the 4CE initiative to measure the prevalence of specific types of clinical complications, develop EHR-based severity algorithms, 10 identify laboratory tests predicting severity in patients with COVID-19, 11 and define country-level differences in demographic and epidemiological presentation. 9 Leveraging data from this collaborative, our objective was to demonstrate large-scale, multinational use of EHR data to study COVID-19 in children and youth and describe hospitalization trends and key epidemiological and clinical features of the disease.

Methods
In this cohort study, each participating site obtained institutional review board approval to share deidentified, aggregated patient data with the 4CE consortium. Informed consent was waived because the patient data were deidentified. The study followed the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) reporting guideline.

Participating Sites and Cohort Identification
Participating 4CE sites in France, Germany, Spain, Singapore, the UK, and the US reported pediatric-specific data and contributed patients to this cohort analysis. We analyzed patients younger than 21 years who were hospitalized between February 2 and October 10, 2020, and had a positive reverse transcription PCR test for SARS-CoV-2 infection 7 days before to 14 days after the date of admission.
Positive tests were identified by local data managers at each site who mapped internal codes for SARS-CoV-2 laboratory results. Demographic information on a subset of patients admitted through April 11, 2020, was previously described. 9
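The inclusion criteria above can be expressed as a simple date-window check. The following is a minimal sketch, not the consortium's extraction code (which the authors state was written in Structured Query Language). The function name `in_cohort` is hypothetical, and the sketch assumes that the study period applies to the admission date and that all window boundaries are inclusive.

```python
from datetime import date, timedelta

# Values taken from the cohort definition in the text.
MAX_AGE_YEARS = 21                 # patients younger than 21 years
STUDY_START = date(2020, 2, 2)
STUDY_END = date(2020, 10, 10)
DAYS_BEFORE_ADMISSION = 7
DAYS_AFTER_ADMISSION = 14


def in_cohort(age_years, admission, positive_test):
    """Return True if a hospitalization meets the stated inclusion criteria.

    Assumptions (not confirmed by the source): the study period refers to
    the admission date, and every boundary is treated as inclusive.
    """
    if age_years >= MAX_AGE_YEARS:
        return False
    if not (STUDY_START <= admission <= STUDY_END):
        return False
    window_start = admission - timedelta(days=DAYS_BEFORE_ADMISSION)
    window_end = admission + timedelta(days=DAYS_AFTER_ADMISSION)
    return window_start <= positive_test <= window_end
```

For example, under these assumptions a 10-year-old admitted on April 1, 2020, with a positive test on March 25, 2020, falls inside the window, whereas a positive test on March 24, 2020, does not.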

Data Extraction and Aggregation
Several sites included multiple hospitals, and pediatric data were extracted from each hospital participating in the pediatric substudy (eTable in Supplement 1). Certain sites applied obfuscation thresholds to minimize disclosure risks related to small patient numbers. When values were obfuscated, we inserted a value of 0.5 times the obfuscation threshold.
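The handling of obfuscated values can be illustrated with a short sketch. It assumes, hypothetically, that a masked count arrives as `None` and that each site's obfuscation threshold is known; the function name `impute_obfuscated` is illustrative and does not come from the 4CE codebase.

```python
def impute_obfuscated(count, threshold):
    """Return the reported count, or 0.5 * threshold when the count is masked.

    `count` is None when a site masked the value because it fell below
    its obfuscation threshold (an assumed encoding for this sketch).
    """
    if count is None:
        return 0.5 * threshold
    return float(count)


# Hypothetical site report with one masked value and a threshold of 10.
reported = [12, None, 30]
imputed = [impute_obfuscated(c, threshold=10) for c in reported]
```

Imputing the midpoint between 0 and the threshold avoids treating a masked small count as either 0 or the threshold itself.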
Sites executed queries on local clinical data warehouses containing patient-level EHR data. 9 Sites reported diagnoses using International Statistical Classification of Diseases and Related Health Problems, Tenth Revision (ICD-10) codes and used Logical Observation Identifiers Names and Codes (LOINC) for laboratory tests and Anatomical Therapeutic Chemical codes or National Drug Codes for medications. Each contributing site uploaded its files to a central 4CE data upload tool, where quality control and validation steps were performed before analysis. 11 Patient-level files remained at each site and were not centrally shared at any point.

Demographic and Clinical Variables
Race and ethnicity data were collected by participating hospitals based on routine practices using local race and ethnicity classifications. Sites mapped these categories to the standard categories provided by the US National Institutes of Health before the file upload to 4CE. 12 We chose to assess race and ethnicity in this study because prior reports have indicated an association between race and ethnicity and clinical outcomes for children and youth with COVID-19. 13-15
A set of 16 laboratory values was selected, reflecting laboratory tests commonly performed as well as tests reported in prior studies to be abnormal in patients with COVID-19. 16 To describe clinical complications, we analyzed all diagnostic codes assigned to patients during the hospitalization. The diagnosis codes were reported from all sites using ICD-10. These codes were truncated to the first 3 characters, which represent the disease category. The codes that follow the first 3 characters add more detailed information about etiology, anatomic site, or manifestations, but would have resulted in too many categories with very low counts. To assess medication use, we determined the number of patients treated with a prespecified set of medications. These included repurposed agents used to manage COVID-19 during the study period (eg, hydroxychloroquine), investigational agents (eg, remdesivir), and adjunctive therapies used to manage complications related to COVID-19. 17
Three-digit diagnosis codes were consistent with the ICD dictionary. Because all laboratory tests were mapped to the same LOINC codes with unified units, laboratory test values from each site were manually reviewed to ensure the result ranges were generally consistent with data observed across other sites. Sites with implausible laboratory values or values consistently lower or higher than other sites were contacted for further investigation and correction as needed. 11 The local investigations and final assessment of accepted values considered age-specific reference ranges as well as clinical assays and site ranges.
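The truncation of ICD-10 codes to their 3-character category can be sketched as follows. This is an illustrative example only. The function names are hypothetical, and counting each patient once per category is an assumption rather than something the text specifies.

```python
from collections import Counter


def icd10_category(code):
    """Return the first 3 characters of an ICD-10 code (the disease category)."""
    return code.strip().upper().replace(".", "")[:3]


def count_patients_per_category(codes_per_patient):
    """Count patients per 3-character category.

    `codes_per_patient` is a list with one list of ICD-10 codes per patient.
    Assumption: a patient contributes at most once to each category.
    """
    counts = Counter()
    for codes in codes_per_patient:
        # A set removes duplicate categories within one patient.
        counts.update({icd10_category(c) for c in codes})
    return counts


# Hypothetical example: two patients, detailed codes collapse to J12 and U07.
example = [["J12.89", "J12.81", "U07.1"], ["U07.1"]]
category_counts = count_patients_per_category(example)
```

Collapsing `J12.89` and `J12.81` into `J12` is what keeps low-frequency subcodes from fragmenting into many categories with very small counts.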

Statistical Analysis
We summarized the daily hospitalized case counts over time and the breakdown of the cases by demographic subgroups based on pooled analysis across participating hospitals by country. To describe the clinical profile of hospitalized cases, we reported mean laboratory values at admission and percentages of frequently observed complications. Mean values and percentages with 95% CIs were aggregated across all sites based on random-effects meta-analysis. 18 To summarize temporal trends of laboratory values, we combined data from sites with at least 3 observations and calculated mean laboratory values on each day of hospitalization, also using random-effects meta-analysis. 18 Additional details on this approach are provided in the eMethods and eFigures 2 and 3 in Supplement 1. We based 95% CIs on the z-statistic with normal approximations for both continuous outcomes and the proportion of binary outcomes. Statistical significance was prespecified at P < .05 and tests were 2-tailed.
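To make the pooling step concrete, the sketch below combines site-level estimates with a random-effects model and a z-based 95% CI. The text does not name the estimator for the between-site variance, so the DerSimonian-Laird method used here is an assumption, as are the function names. The authors' actual analysis was written in R.

```python
import math

Z_95 = 1.959963984540054  # two-sided 95% quantile of the standard normal


def proportion_with_variance(events, n):
    """Site-level proportion and its normal-approximation variance p(1-p)/n."""
    p = events / n
    return p, p * (1 - p) / n


def random_effects_pool(estimates, variances):
    """Pool site estimates with a DerSimonian-Laird random-effects model.

    Returns (pooled estimate, (ci_low, ci_high), tau2), where tau2 is the
    estimated between-site variance. The choice of DerSimonian-Laird is an
    assumption made for this sketch.
    """
    k = len(estimates)
    w = [1.0 / v for v in variances]              # inverse-variance weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    w_star = [1.0 / (v + tau2) for v in variances]  # random-effects weights
    sw_star = sum(w_star)
    pooled = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sw_star
    se = math.sqrt(1.0 / sw_star)
    return pooled, (pooled - Z_95 * se, pooled + Z_95 * se), tau2
```

When site estimates agree closely, the between-site variance is estimated as 0 and the result reduces to a fixed-effect inverse-variance average. When sites disagree, the interval widens to reflect that heterogeneity.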

Statistical analyses and visualizations were performed in R version 3.5.1 (R Project for Statistical Computing) and Python version 3.7 (Python). We used the Altair package 19 to create figures for static publication and interactive web-based exploration. The Structured Query Language code used for data extraction, R code used for analysis, and mapping tables used for laboratory tests and medications are available on GitHub.

Results

Study Cohort
There were 347 male patients (52%; 95% CI, 48.5%-55.3%) and 324 female patients (48%; 95% CI, 44.4%-51.3%) in our cohort. There was a bimodal age distribution, with the greatest proportion of
Figure note: The race/ethnicity variable was based on the categories as defined by the US National Institutes of Health. 12 For Singapore, the term Asian includes Chinese, Asian Indian, and Malaysian and the term other was used for Eurasian and other races and ethnicities. For patients in the UK and the US, the term other represents other races and ethnicities, mixed races, and missing information on race. Information on race and ethnicity was not collected in France, Germany, and Spain.

Clinical Features
A total of 27 364 laboratory values were obtained for the 16 laboratory tests examined (Table 1).

Figure 2 note: For France, daily pediatric hospitalization data were obtained from Santé Publique France. 20 For Germany, weekly pediatric hospitalization data were obtained from the German Society for Pediatric Infectious Diseases. 21 National pediatric hospitalization data were not available for Singapore. For Spain, weekly pediatric hospitalization data were obtained from the Spanish National Epidemiological Surveillance Network, which lacks hospitalization counts between May 11 and July 15, 2020. 22 For the UK, daily pediatric hospitalization data were obtained from the Royal College of Paediatrics and Child Health and represent pediatric hospitalizations in England. 23 For the US, weekly pediatric hospitalization data between July 31, 2020, and October 9, 2020, were obtained from the Department of Health and Human Services. 24 The y-axis scales for country-level data are independent to compare country-level trends with Consortium for Clinical Characterization of COVID-19 by EHR (4CE) trends. The plots in Figure 2A display the counts with a 14-day (centered) rolling mean.
2 to 4. For example, compared with the initial values for C-reactive protein, 4-day measurements showed a decrease of 18 mg/L (95% CI, −16 to 54 mg/L). Interestingly, there was a peak in several laboratory values, such as albumin, D-dimer, and lactate dehydrogenase, starting on hospital days 6 to 8. For example, compared with the initial values for D-dimer, 8-day measurements showed an increase of 1.45 μg/mL (95% CI, 0.59-2.31 μg/mL).

Medication Use
To examine the use of specific drug classes in the treatment of COVID-19 in children and youth, we determined the number of sites treating at least 3 patients with a range of drugs considered candidate therapeutic agents in adults during the study period or used to manage certain complications and underlying conditions potentially exacerbated by COVID-19 (Table 2). Only 2 sites treated at least 3 patients with an aminoquinoline, which includes hydroxychloroquine, and 1 site administered remdesivir to at least 3 patients. More sites administered adjunctive therapies, such as antithrombotic agents (8 sites), diuretics (8 sites), interleukin inhibitors (3 sites), and angiotensin-converting enzyme inhibitors (3 sites).
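The site-level tally described above amounts to counting, for each drug class, the sites that report at least 3 treated patients. The sketch below illustrates that logic. The data structure, site names, and counts are hypothetical and do not reproduce Table 2.

```python
from collections import defaultdict

MIN_PATIENTS = 3  # threshold stated in the text


def sites_meeting_threshold(site_counts, min_patients=MIN_PATIENTS):
    """Return {drug_class: number of sites with >= min_patients treated}.

    `site_counts` maps a site name to {drug_class: patients treated}.
    """
    result = defaultdict(int)
    for per_class in site_counts.values():
        for drug_class, n_patients in per_class.items():
            if n_patients >= min_patients:
                result[drug_class] += 1
    return dict(result)


# Hypothetical aggregate counts from three sites.
example_counts = {
    "site_a": {"antithrombotic": 5, "diuretic": 2},
    "site_b": {"antithrombotic": 3, "diuretic": 4},
    "site_c": {"antithrombotic": 1},
}
sites_per_class = sites_meeting_threshold(example_counts)
```

Reporting the number of sites above a threshold, rather than exact patient counts, fits a setting in which some sites obfuscate small counts.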

Discussion
Using patient-level EHR data extracted from health care systems across 6 countries, this study offers insights on international trends of hospitalizations for children and youth with COVID-19 and defines epidemiological and clinical features associated with the disease in children and youth. Even among countries with few participating sites, hospitalization counts for children and youth over an 8-month period approximated population-level infection rates, demonstrating the potential application of this approach to monitoring disease activity in pediatric populations. Consistent with prior reports, we found greater proportions of younger children among hospitalized patients. 26,27 Laboratory tests obtained on hospital admission indicated abnormalities in inflammation and coagulation.

JAMA Network Open | Health Informatics
Examination of management patterns revealed that the use of candidate therapeutic agents adopted in adult populations remained low in children and youth.
Our study demonstrates the value of using routinely collected data from EHRs to complement other forms of disease surveillance, especially when disease prevalence is low and rapid progression precludes the development of prospective research infrastructures. 28 These data may be particularly valuable in advancing our understanding of COVID-19 in children and youth, where fewer resources have focused on COVID-19-related illness because of the less severe impact of the disease and much lower disease prevalence. While there are important limitations to EHR data, including inconsistent
Figure note: Mean daily values across sites were calculated using random-effects meta-analysis. Values in parentheses represent the minimum and maximum numbers of patients contributing data on any single day during the 14-day observation period. The shaded areas represent 95% CIs. SI conversion factors: To convert alanine aminotransferase to microkatal per liter, multiply by 0.0167; albumin to g/L, multiply by 10; aspartate aminotransferase to microkatal per liter, multiply by 0.0167; C-reactive protein to milligrams per liter; creatinine to micromoles per liter, multiply by 76. 25
The multinational design of 4CE allows ascertainment of differences in regional management patterns and uptake of therapeutic interventions in children and youth. Early in the pandemic, many agents emerged as candidate therapies for COVID-19, including both repurposed drugs and investigational agents. 17 Observational studies 31,34-38 indicate that many of these drugs were widely used among hospitalized adult patients, although use in children and youth appears to have been lower. This likely reflects the less severe disease course in children and youth and is also consistent with patterns in off-label medication prescribing in pediatric patients, where use in pediatric populations tends to follow adoption in adults. 39 It also relates to the lower number of clinical trials performed in pediatric patients to test new therapies, including remdesivir. 2 Monitoring the use of pharmacotherapies in children and youth, including defining regional and country-level differences, will support activities to optimize and standardize care for children and youth with COVID-19 and guide prioritization of research activities to ensure availability of safe and effective pediatric therapies.
An area for further development of 4CE data is in the collection and analysis of race and ethnicity information. During this first phase of data collection, the race variable was limited to standard categories as defined by the US National Institutes of Health and as used in many US-based studies. 12 However, this categorization is subject to 2 major limitations for our purposes. First, it combines race and ethnicity in a way that race cannot be reported for Hispanic and Latino individuals. This results in missing race information if a patient is recorded as Hispanic or Latino or missing ethnicity data if race is prioritized in local data collection. Second, this categorization does not lend itself to use in other countries, such as the UK or Singapore, where the primary racial and ethnic categories differ from those in the US. To meaningfully capture race and ethnicity information across countries, country-specific categories must be used. Accurate collection of race and ethnicity information is critical to advancing our understanding of differences in risk factors, infection rates, and health care use that have been reported in prior studies. 15,40,41 In future phases of 4CE, we plan to implement country-specific ontologies for collection of race and ethnicity data.

Limitations
This study has limitations. To enable an international federated network and preserve patient data privacy from each participating site, only aggregate counts were analyzed, limiting the ability to combine values or follow individual patients longitudinally. For example, while we can ascertain mean laboratory values across individual sites and even track these throughout the hospitalization, we cannot link laboratory results to specific patient characteristics. In the next phase of 4CE studies, prespecified analyses will be run within the primary data sets at each of the individual sites before aggregation at the consortium level, enabling patient-level analyses. Additional limitations relate to the use of observational data, including nonsystematic recording of certain clinical data elements and shifting testing strategies for COVID-19, which may inform characteristics of the study population. 29

Conclusions
In this study of EHRs of children and youth hospitalized with COVID-19 in 6 countries, we