Prevalence, Comorbidity, and Sociodemographic Correlates of Psychiatric Disorders Reported in the All of Us Research Program

This cross-sectional study examines electronic health record data from the All of Us Research Program to measure prevalence, correlates, and overlap between psychiatric disorders.


SENSITIVITY ANALYSES OF EHR-BASED PHENOTYPES
Using electronic health records (EHR) for research is a rapidly advancing area. Prior work in medical informatics has shown that the reliability of case definitions derived from EHR diagnoses and other administrative data varies by type of disorder, with more severe disorders generally yielding the most reliable measures [23]. Previous work in the PsycheMERGE consortium has used two or more registered ICD codes as the standard definition of a "case" [18].
To determine whether more stringent definitions of a "case" altered the prevalence of psychiatric problems in the All of Us database, we ran a series of sensitivity analyses in which the case definition was made progressively stricter, from 1 or more registered ICD codes up to 4 or more registered ICD codes. Across all disorders, the total number of cases (not mutually exclusive) dropped as thresholds became more restrictive (1+ ICD codes = 202,422; 2+ ICD codes = 137,265; 3+ ICD codes = 106,737; 4+ ICD codes = 88,093). eFigure 2 presents, for each disorder included in the analysis, the decline in the number of participants meeting case criteria as the threshold becomes more restrictive.
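The threshold sweep described above can be sketched as follows. This is a minimal illustration, not the study's pipeline: it assumes a hypothetical input layout in which each disorder maps to a list of per-participant counts of registered ICD codes, and the function names (`cases_by_threshold`, `total_cases`, `retention`) are hypothetical. The only real numbers used are the reported totals, from which the sketch computes the share of the 1+ total retained at each stricter threshold.

```python
from typing import Dict, List, Sequence


def cases_by_threshold(
    code_counts: Dict[str, List[int]],
    thresholds: Sequence[int] = (1, 2, 3, 4),
) -> Dict[int, Dict[str, int]]:
    """For each threshold k, count participants with >= k registered ICD codes per disorder.

    code_counts is a hypothetical layout: disorder -> per-participant code counts.
    """
    return {
        k: {disorder: sum(1 for c in counts if c >= k) for disorder, counts in code_counts.items()}
        for k in thresholds
    }


def total_cases(by_threshold: Dict[int, Dict[str, int]]) -> Dict[int, int]:
    """Sum cases across disorders at each threshold.

    Totals are not mutually exclusive: a participant who is a case for two
    disorders is counted twice, as in the totals reported in the text.
    """
    return {k: sum(per_disorder.values()) for k, per_disorder in by_threshold.items()}


def retention(totals: Dict[int, int]) -> Dict[int, float]:
    """Proportion of the least restrictive threshold's total kept at each threshold."""
    base = totals[min(totals)]
    return {k: n / base for k, n in totals.items()}


# Totals reported in the text for the 1+ through 4+ ICD code definitions.
REPORTED_TOTALS = {1: 202_422, 2: 137_265, 3: 106_737, 4: 88_093}

if __name__ == "__main__":
    # Toy data only; these are not All of Us counts.
    toy = {"MOOD": [0, 1, 2, 5], "SUD": [1, 1, 3, 0]}
    print(cases_by_threshold(toy))
    for k, share in retention(REPORTED_TOTALS).items():
        print(f"{k}+ ICD codes: {share:.1%} of the 1+ total")
```

Applied to the reported totals, about 67.8%, 52.7%, and 43.5% of the 1+ cases remain at the 2+, 3+, and 4+ thresholds, respectively.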
Overall, each disorder follows a similar pattern regardless of its prevalence (see eTable 3 for exact counts by disorder). The patterns of comorbidity across ICD thresholds (eFigure 3) are also largely unchanged from those obtained using any diagnosis, with the exception that mood disorders became slightly more prevalent than SUD among those with only one registered diagnosis. The main difference across thresholds reflects the drop in raw prevalence described above: the more restrictive the threshold, the fewer participants meet criteria for a given disorder. Regardless of the threshold used to define cases, the correlations between disorders are approximately the same (eFigure 4). Finally, estimates of risk across sociodemographic covariates are relatively stable across inclusion thresholds (see eTable 4 and eTable 5).
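One way to check whether between-disorder correlations hold across thresholds is to recompute the correlation of binary case indicators at each threshold. The text does not state which correlation measure eFigure 4 uses; this sketch assumes the phi coefficient (the Pearson correlation of two 0/1 vectors), and a tetrachoric correlation would be a common alternative for binary diagnoses. The input layout and function names are hypothetical, and the toy data are not study data.

```python
import math
from typing import Dict, List, Sequence


def case_indicator(counts: List[int], k: int) -> List[int]:
    """1 if a participant has >= k registered ICD codes for the disorder, else 0."""
    return [1 if c >= k else 0 for c in counts]


def phi(x: List[int], y: List[int]) -> float:
    """Phi coefficient of two 0/1 vectors, computed from their 2x2 table."""
    n11 = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    n10 = sum(1 for a, b in zip(x, y) if a == 1 and b == 0)
    n01 = sum(1 for a, b in zip(x, y) if a == 0 and b == 1)
    n00 = sum(1 for a, b in zip(x, y) if a == 0 and b == 0)
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    # Undefined when either disorder has no cases (or no non-cases) at this threshold.
    return (n11 * n00 - n10 * n01) / denom if denom else float("nan")


def phi_by_threshold(
    counts_a: List[int],
    counts_b: List[int],
    thresholds: Sequence[int] = (1, 2, 3, 4),
) -> Dict[int, float]:
    """Phi between two disorders' case indicators at each ICD-count threshold."""
    return {
        k: phi(case_indicator(counts_a, k), case_indicator(counts_b, k))
        for k in thresholds
    }


if __name__ == "__main__":
    # Toy per-participant ICD code counts for two disorders (same participant order).
    mood = [0, 1, 2, 5, 3, 0]
    sud = [0, 2, 2, 4, 0, 1]
    for k, r in phi_by_threshold(mood, sud).items():
        print(f"{k}+ ICD codes: phi = {r:.3f}")
```

Because raising the threshold removes cases, the 2x2 cells shrink quickly for rare disorders, so correlations at strict thresholds can be noisy or undefined even when the underlying relationship is stable.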
Diagnoses based on a single ICD code represent a "best-case scenario" for those interested in psychiatric disorders in the All of Us biobank. We have no "gold standard" measurement (such as physician chart review or diagnoses confirmed with the Structured Clinical Interview for DSM Disorders [24]) against which to validate each threshold. However, future research can leverage the genetic data to compare genetic correlations across thresholds with results from published genome-wide association studies (GWAS) of confirmed cases. For example, research with the Alcohol Use Disorders Identification Test (AUDIT) has followed this approach and identified the AUDIT thresholds most likely to capture those who meet criteria for AUD [25].

SOURCES FOR PREVALENCE OF PSYCHIATRIC DISORDERS
The prevalence of each disorder is lower than that reported in nationally representative samples [6-8, 26-31]. Below are the sources for estimates of prevalence in the general population.
* p < .05/6 = .0083. OR = odds ratio; SE = standard error; MOOD = any mood disorder; ANX = any anxiety disorder; SUD = any substance use disorder; STRESS = any stress-related disorder; SCZ = schizophrenia; PERS = any personality disorder. All estimates are conditional on all other covariates included in the model.