Genetic and Phenotypic Features of Schizophrenia in the UK Biobank

Key Points
Question How do individuals with a diagnosis of schizophrenia recruited in a large volunteer-based research resource (UK Biobank) differ from those in the Psychiatric Genomics Consortium (PGC) or those recruited from clinical settings?
Findings In this cross-sectional study including more than 517 000 individuals, liability to schizophrenia in the UK Biobank had a high genetic correlation with the PGC. Compared with 4 clinically ascertained schizophrenia samples, UK Biobank participants with schizophrenia had significantly lower schizophrenia genetic liability as indexed by polygenic risk score, lower rates of copy number variants, and fewer phenotypic features of poor outcome.
Meaning In this study, individuals with schizophrenia in the UK Biobank had features of less severe illness, which indicates that registries such as the UK Biobank can help to capture the full range of heterogeneity in schizophrenia research.


Large population-based volunteer biobanks are increasingly being used to study human disease. Data from millions of participants in newly established biobanks across the world will become available for research within the next 5 years. However, these samples are known to be subject to ascertainment biases, 1 in particular healthy volunteer bias. For example, of the 9.2 million people invited to participate in the UK Biobank, the 5.5% who participated are disproportionately female, socioeconomically advantaged, and White. They are also less likely to be obese or to smoke, report fewer health conditions, and have lower mortality rates. 2 While ascertainment bias clearly affects prevalence estimates, it has been argued that it does not affect exposure-disease associations or scientific inference. 3,4 However, studies have shown that these biases can change effect sizes in genetic association studies 5 and affect downstream analyses, 1 and new methods are being developed to detect and offset them. 6 It is unclear how these selection biases, coupled with differing methods of identifying and defining affected status (such as the use of self-report and electronic health records), influence the features of schizophrenia cohorts identified through large population-based samples, and how such cohorts will differ from clinically ascertained samples. Nonrandom participation does not affect only population-based cohorts. Clinically ascertained studies of serious mental illness, typically recruited through secondary care, can underrepresent those who have difficulty obtaining such care because of socioeconomic and other causes of health care disparities. 7,8 They are also likely to underrepresent people with mild forms of the disorder, who may not be referred to secondary care, much less hospitalized, while those with excellent clinical outcomes are likely to be discharged early from secondary care, biasing secondary care or hospital-based recruitment toward more severe cases.
The UK Biobank offers the opportunity to learn lessons of general relevance for large-scale volunteer-based studies. 9 While the UK Biobank population as a whole has been well characterized, the genetic and phenotypic features of those with serious mental illness have not. Here, we investigate the extent to which schizophrenia as diagnosed in UK Biobank resembles schizophrenia in large genetic studies, as represented by those included in the Psychiatric Genomics Consortium (PGC), or as diagnosed in clinically ascertained samples. We estimated genetic correlations between schizophrenia in the UK Biobank and in the PGC, and compared polygenic risk scores (PRS), rates of copy number variants (CNVs), and phenotypic features of individuals with schizophrenia in the UK Biobank with those in 4 independent UK-based samples. These findings are of general relevance to studies from other human biobanks, mental health cohorts defined from electronic health records, and other alternative sources.

Participants
Participants were included from the UK Biobank 10 (approximately 500 000 individuals) and 4 schizophrenia sample collections: CLOZUK 11 (approximately 14 000 individuals), CardiffCOGS 12 (n = 767), Cardiff F-Series 12 (n = 648), and Cardiff Affected Sib-Pairs 12 (n = 381) (Table 1). Genetic analyses included all samples, and phenotypic analyses included all samples apart from CLOZUK. This study followed the Strengthening the Reporting of Genetic Association Studies (STREGA) reporting guideline.
UK Biobank is a biomedical database and research resource of approximately 500 000 individuals from across the UK aged 40 to 69 years at recruitment (between 2006 and 2010). 10 There are 4 sources from which a schizophrenia diagnosis can be detected in UK Biobank: self-report (field identifiers 20002 and 20544); International Statistical Classification of Diseases and Related Health Problems, Tenth Revision (ICD-10) code F20 diagnoses in hospital admission records (field identifiers 41202 and 41204); ICD-10 code F20 in death records (field identifiers 40001 and 40002); and an equivalent Read code in primary care records (field identifier 130875). eAppendix 1 in Supplement 1 further describes these sources. We defined schizophrenia in UK Biobank as a schizophrenia diagnosis reported from at least 1 of these sources. A total of 1438 participants met 1 or more of these criteria at the time of analysis (eTable 1 in Supplement 1), which was based on data extracted in July 2021. Controls were defined as participants with no indication of schizophrenia or of another psychotic disorder (ICD-10 codes F21-F29 inclusive) from the above sources. The North-West Multi-Centre Ethics Committee granted ethical approval to UK Biobank, and all participants provided written informed consent. This study was conducted under UK Biobank project numbers 13310 and 14421.
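The case and control definitions above amount to a union rule over sources plus an exclusion rule. A minimal Python sketch, assuming each participant's records have already been reduced to ICD-10 code sets and yes/no flags; the `Participant` record, its flag names, and the "excluded" label (our reading of participants with an F21-F29 code but no schizophrenia diagnosis) are hypothetical, and deriving the flags from fields 20002, 20544, and 130875 is not shown:

```python
from dataclasses import dataclass, field


@dataclass
class Participant:
    eid: int
    # Hypothetical pre-derived flags (from fields 20002/20544 and 130875).
    self_report_scz: bool = False
    primary_care_scz: bool = False
    hospital_icd10: set = field(default_factory=set)  # codes from fields 41202/41204
    death_icd10: set = field(default_factory=set)     # codes from fields 40001/40002


def icd10_block(code):
    """Return the numeric F block of an ICD-10 code such as 'F200' or 'F20.0', else None."""
    if len(code) >= 3 and code[0] == "F" and code[1:3].isdigit():
        return int(code[1:3])
    return None


def classify(p):
    """Case if any source reports schizophrenia; control if no psychotic disorder is recorded."""
    blocks = {icd10_block(c) for c in p.hospital_icd10 | p.death_icd10}
    if p.self_report_scz or p.primary_care_scz or 20 in blocks:
        return "case"
    if any(b is not None and 21 <= b <= 29 for b in blocks):
        return "excluded"  # other psychotic disorder: neither case nor control
    return "control"
```

For example, `classify(Participant(1, hospital_icd10={"F200"}))` returns `"case"`. Note that the GWAS described later used a stricter control definition (no F00-F99 code at all), which this sketch does not implement.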
CLOZUK is an anonymized sample of approximately 14 000 individuals in the UK taking clozapine with a diagnosis of treatment-resistant schizophrenia, as previously described. 11 CardiffCOGS (n = 767), Cardiff F-Series (n = 648), and Cardiff Affected Sib-Pairs (n = 381) participants were recruited from community, inpatient, and voluntary mental health services in the UK. 12 The Cardiff Affected Sib-Pairs sample includes families with 2 or more siblings diagnosed with schizophrenia (or schizoaffective disorder, provided one of the siblings had schizophrenia). ICD-10 code F20 or DSM-IV schizophrenia diagnoses in CardiffCOGS, Cardiff F-Series, and Cardiff Affected Sib-Pairs were based on Schedules for Clinical Assessment in Neuropsychiatry 13 interviews and lifetime psychiatric clinical case notes. All schizophrenia sample collections received UK National Research Ethics Service approval, and study participants provided written informed consent.

Statistical Analysis
Comparison With PGC
We conducted a genome-wide association study (GWAS) of schizophrenia in the UK Biobank and used the results to calculate genetic correlations with the PGC samples. The GWAS compared UK Biobank participants with schizophrenia with participants without any mental or behavioral disorder (ICD-10 codes F00-F99 in field category 1712), to avoid artificially inflating the genetic correlations with other psychiatric conditions. UK Biobank participants were genotyped on either the UK Biobank Axiom or the UK BiLEVE Axiom purpose-built array. Standard quality-control procedures were applied prior to imputation using the Haplotype Reference Consortium panel, as previously described. 14,15 Single-nucleotide variants (SNVs) were excluded using PLINK version 2.0 16 in line with thresholds used by the PGC 17 : minor allele frequency less than 0.01, Hardy-Weinberg equilibrium P value less than 1.00 × 10⁻⁶ (using the midp and keep-fewhet options for multipopulation datasets), imputation quality information score less than 0.9, and SNV call rate less than 0.95. Individuals with SNV missingness greater than 0.05 were excluded.
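The SNV and sample exclusions above are simple threshold checks. A minimal Python sketch, assuming per-variant summary statistics have already been computed (in the study, PLINK 2.0 computed and applied them); the `VariantStats` record and function names are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class VariantStats:
    snv_id: str
    maf: float        # minor allele frequency
    hwe_p: float      # Hardy-Weinberg equilibrium P value
    info: float       # imputation quality (INFO) score
    call_rate: float  # fraction of samples with a non-missing genotype call


# Thresholds as stated in the text; variants below them are excluded.
MIN_MAF = 0.01
MIN_HWE_P = 1e-6
MIN_INFO = 0.9
MIN_CALL_RATE = 0.95
MAX_SAMPLE_MISSINGNESS = 0.05


def passes_snv_qc(v):
    return (v.maf >= MIN_MAF and v.hwe_p >= MIN_HWE_P
            and v.info >= MIN_INFO and v.call_rate >= MIN_CALL_RATE)


def filter_snvs(variants):
    """Keep only variants that pass every SNV-level threshold."""
    return [v for v in variants if passes_snv_qc(v)]


def filter_samples(missingness):
    """missingness maps sample ID -> fraction of SNVs without a call; keep IDs at or below 0.05."""
    return {s for s, m in missingness.items() if m <= MAX_SAMPLE_MISSINGNESS}
```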
Association testing was based on the Scalable and Accurate Implementation of Generalized Mixed Model (SAIGE) method. 18 The SAIGE method is appropriate when case-control numbers are unbalanced and/or in the presence of population structure. The null logistic model was fitted on a reduced dataset of relatively independent SNVs (n = 90 684), created using PLINK's 16 pruning procedure (r² less than .05 within a 500-kilobase window). Covariates in the null logistic model were genotyping array, self-reported ethnicity, sex, age at interview (individuals with schizophrenia were younger than unaffected controls), the first 5 principal components, and any additional principal components among the first 20 that were associated with schizophrenia. The leave-1-chromosome-out option was implemented to account for related individuals. Post-GWAS processing was conducted using FUMA GWAS version 1.5.0 (Department of Complex Trait Genetics at VU University Amsterdam) 19 to annotate and visualize the results.
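The pruning step keeps a set of SNVs whose pairwise r² stays below .05 within 500 kilobases. The sketch below is a simplified greedy version in plain Python, written only to illustrate the r² and window idea; it is not PLINK's sliding-window algorithm (which also uses a step size and its own rule for which member of a correlated pair to drop), and the function names and tuple layout are hypothetical:

```python
def r_squared(x, y):
    """Squared Pearson correlation between two genotype dosage vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic variant: treat as uncorrelated
    return sxy * sxy / (sxx * syy)


def greedy_ld_prune(variants, r2_max=0.05, window_bp=500_000):
    """variants: list of (snv_id, chrom, pos, dosages) tuples.

    Visit variants in genomic order and keep one only if its r² with every
    already-kept variant on the same chromosome within window_bp is below r2_max.
    """
    kept = []
    for snv_id, chrom, pos, dos in sorted(variants, key=lambda v: (v[1], v[2])):
        independent = all(
            r_squared(dos, kdos) < r2_max
            for _, kchrom, kpos, kdos in kept
            if kchrom == chrom and pos - kpos <= window_bp
        )
        if independent:
            kept.append((snv_id, chrom, pos, dos))
    return [v[0] for v in kept]
```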
Genetic correlations were calculated using linkage disequilibrium score regression 20,21 between the schizophrenia GWAS in UK Biobank and GWAS for schizophrenia, 17 bipolar disorder, 22 major depressive disorder (MDD), 23 attention-deficit/hyperactivity disorder, 24 autism spectrum disorder, 25 anorexia nervosa, 26 cannabis use disorder, 27 alcohol use disorder, 28 and intelligence. 29 Corresponding genetic correlations were also calculated for the PGC schizophrenia GWAS, and differences from the UK Biobank schizophrenia results were assessed via χ² tests. Unless otherwise stated, significance was set at P < .05. All P values were 2-tailed.
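One simple form of such a χ² test is a 1-df Wald test on the difference between two genetic correlations. The sketch below assumes the two estimates are independent and ignores their covariance, which shared traits and overlapping samples will violate to some degree; the exact test used in the study may differ, and the function name is hypothetical:

```python
import math


def rg_difference_test(rg1, se1, rg2, se2):
    """Wald chi-square (1 df) test of H0: rg1 == rg2, ignoring covariance between estimates."""
    chi2 = (rg1 - rg2) ** 2 / (se1 ** 2 + se2 ** 2)
    # Upper tail of a 1-df chi-square: P(Z^2 > chi2) = erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p
```

For instance, the UK Biobank estimate for bipolar disorder (rg = 0.73; SE, 0.14) would be passed as `rg1, se1`, with the corresponding PGC schizophrenia estimate as `rg2, se2`.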
A schizophrenia PRS was calculated in UK Biobank using a method consistent with the PGC 17 to allow comparison of the variance in schizophrenia case-control status explained in the PGC and in UK Biobank. The PRS was calculated via a clumping and thresholding approach in PRSice-2 30 for individuals of European genetic ancestry, as previously described. 14
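Clumping and thresholding keeps the most significant SNV in each linkage disequilibrium block, drops SNVs above a P value threshold, and sums effect-weighted dosages. A minimal plain-Python sketch of the idea; the clumping r², window, and P threshold shown are illustrative values rather than the study's settings, the dictionary layout and function names are hypothetical, and PRSice-2 itself evaluates many thresholds and by default scales the score differently:

```python
def r_squared(x, y):
    """Squared Pearson correlation between two dosage vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy * sxy / (sxx * syy)


def clump(snvs, dosages, r2_max=0.1, window_bp=250_000):
    """snvs: dicts with id, chrom, pos, beta (log OR), p. Returns index SNVs, most significant first."""
    by_p = sorted(snvs, key=lambda s: s["p"])
    absorbed, index = set(), []
    for s in by_p:
        if s["id"] in absorbed:
            continue
        index.append(s)
        # Absorb nearby SNVs in LD with this index SNV.
        for t in by_p:
            if (t is not s and t["id"] not in absorbed
                    and t["chrom"] == s["chrom"]
                    and abs(t["pos"] - s["pos"]) <= window_bp
                    and r_squared(dosages[s["id"]], dosages[t["id"]]) > r2_max):
                absorbed.add(t["id"])
    return index


def select_by_threshold(index_snvs, p_threshold):
    return [s for s in index_snvs if s["p"] <= p_threshold]


def polygenic_score(selected, dosages):
    """Sum of beta x effect-allele dosage over the selected SNVs, per individual."""
    n = len(next(iter(dosages.values())))
    scores = [0.0] * n
    for s in selected:
        for i, d in enumerate(dosages[s["id"]]):
            scores[i] += s["beta"] * d
    return scores
```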

Comparison With Clinically Ascertained Cohorts
We compared PRSs, rates of CNVs, and phenotypic features between cohorts.Unless otherwise stated, statistical analyses were conducted in R version 4.2.1 (The R Foundation).
PRS | CardiffCOGS, Cardiff F-Series, and Cardiff Affected Sib-Pairs were genotyped on the Illumina HumanOmniExpress version 8 or 12 (Illumina). CLOZUK samples were genotyped on either the Illumina HumanOmniExpress-12 or the Illumina HumanOmniExome-8 array. 11 For Cardiff University samples, quality control and imputation using the Haplotype Reference Consortium panel were conducted as part of the DRAGON-Data protocol. 31 The steps taken to combine the genetic data from UK Biobank and our clinically ascertained cohorts to calculate PRS are described in eAppendix 2 in Supplement 1. A subset of SNVs from this combined dataset with low levels of linkage disequilibrium (r² less than 0.2 within a 500-kilobase window) was used to identify unrelated individuals and to calculate principal components. Randomly selected unrelated individuals were identified using the Kinship-Based Inference for GWAS (KING) robust kinship estimator in PLINK. A kinship cutoff of 0.044 was used, equivalent to removing third-degree relatives. Principal components were calculated using PC-AiR 32 from the GENESIS package. Plots comparing principal components by study showed no evidence of differences by study and genotyping array (eFigure 1 in Supplement 1). PRS were calculated for schizophrenia, 17 bipolar disorder, 22 MDD, 33 and intelligence 29 based on GWAS summary statistics that did not overlap with the samples in the present study. In collaboration with Cardiff University and with permission from UK Biobank under project number 13310, the Schizophrenia Working Group of the PGC generated a custom GWAS that excluded UK Biobank participants (identified using checksums derived from the genomic data) and the Cardiff University samples. Intelligence summary statistics were derived as part of a related project. 14,29 Bipolar disorder 22 and MDD 33 summary statistics were obtained from the PGC. Summary statistics were cleaned using summaRygwasqc. 34
Using all SNVs in the combined dataset, we used PRS-CS 35 and PLINK to calculate the PRS with the EUR UK Biobank reference dataset, 10 000 burn-in iterations, 25 000 Markov chain Monte Carlo iterations, and a φ value of 1 for schizophrenia and the default φ value for intelligence, bipolar disorder, and MDD.
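Selecting unrelated individuals at random can be done greedily once pairwise kinship coefficients are available. The cutoff 0.044 is approximately 2^-4.5, the lower bound of the third-degree range for KING kinship coefficients. The sketch assumes a precomputed dictionary of KING coefficients; the greedy rule is one simple strategy and not necessarily the study's exact procedure, and the function and argument names are hypothetical:

```python
import random

KINSHIP_CUTOFF = 0.044  # ~2**-4.5, lower bound of KING's third-degree range


def unrelated_subset(ids, kinship_pairs, cutoff=KINSHIP_CUTOFF, seed=0):
    """ids: iterable of sample IDs; kinship_pairs: dict (id1, id2) -> KING kinship coefficient.

    Visit individuals in random order and keep each one who is not related
    above the cutoff to anyone already kept, so one member of each related
    pair is dropped at random.
    """
    related = {i: set() for i in ids}
    for (a, b), phi in kinship_pairs.items():
        if phi > cutoff:
            related.setdefault(a, set()).add(b)
            related.setdefault(b, set()).add(a)
    order = list(related)
    random.Random(seed).shuffle(order)
    kept = set()
    for i in order:
        if not related[i] & kept:
            kept.add(i)
    return kept
```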
We scaled the PRS in all samples using principal components 36 to allow comparisons regardless of ancestry.This approach was effective as demonstrated by eFigure 2 in Supplement 1, which displays the adjusted and unadjusted PRS in biogeographical genetic ancestry groups 37 (eAppendix 3 and eFigure 3 in Supplement 1).Pairwise comparisons for the PRS were made between individuals with schizophrenia in UK Biobank and other samples using logistic regression controlling for sex.A Bonferroni correction was applied (20 tests; P < .0025) to determine significance.We repeated analyses in individuals of European genetic ancestry as defined by biogeographical grouping to ensure results were consistent.
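A common way to put PRSs on an ancestry-comparable scale is to regress the raw score on principal components and standardize the residuals. The NumPy sketch below assumes this mean-only adjustment with ordinary least squares; the cited method may differ (for example, by also modeling how PRS variance depends on ancestry), and the function name is hypothetical:

```python
import numpy as np


def pc_adjusted_prs(prs, pcs):
    """prs: shape (n,); pcs: shape (n, k).

    Regress the PRS on an intercept plus the principal components and
    return the residuals standardized to mean 0 and SD 1.
    """
    prs = np.asarray(prs, dtype=float)
    X = np.column_stack([np.ones(len(prs)), np.asarray(pcs, dtype=float)])
    coef, *_ = np.linalg.lstsq(X, prs, rcond=None)
    resid = prs - X @ coef
    return (resid - resid.mean()) / resid.std(ddof=1)
```

In practice the regression coefficients might be estimated in a reference group (such as controls) and then applied to all samples; this sketch fits them on whatever data it receives.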
Schizophrenia-Associated CNVs | Details of CNV calling have been described for CLOZUK, 38,39 UK Biobank, 40 and CardiffCOGS. 39 CNVs in the Cardiff F-Series and Cardiff Affected Sib-Pairs samples were called as part of the DRAGON-Data protocol. 31 One member of each third-degree (or more closely) related pair within each dataset was removed at random. As the CNVs of interest are rare, we combined the participants from CardiffCOGS, Cardiff F-Series, and Cardiff Affected Sib-Pairs. Analyses were restricted to individuals of European genetic ancestry, as defined in eAppendix 4 in Supplement 1, because of the low numbers of observations and because most individuals in the clinically ascertained schizophrenia samples were of European genetic ancestry. We compared the number of individuals with schizophrenia in UK Biobank who carried any of 12 schizophrenia-associated CNVs 38 (eTable 2 in Supplement 1) with the number in the other samples using pairwise Firth logistic regressions covarying for sex.
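With rare carriers, ordinary logistic regression can give infinite or strongly biased estimates (for example, when no individual in one sample carries a CNV); Firth's penalized likelihood keeps the estimates finite. A minimal NumPy sketch of the standard Firth modified-score iteration, assuming a design matrix that already contains an intercept column; the function name and the cap on step size are our choices, and an established implementation (such as the R package logistf) would normally be used instead:

```python
import numpy as np


def firth_logistic(X, y, max_iter=200, tol=1e-10, max_step=5.0):
    """Firth-penalized logistic regression.

    X: (n, k) design matrix including an intercept column; y: 0/1 outcomes.
    Returns coefficient estimates and an approximate covariance matrix.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        info = X.T @ (X * w[:, None])                 # Fisher information X'WX
        cov = np.linalg.inv(info)
        h = w * np.einsum("ij,jk,ik->i", X, cov, X)   # diagonal of the hat matrix
        score = X.T @ (y - p + h * (0.5 - p))         # Firth-modified score
        step = cov @ score
        largest = np.max(np.abs(step))
        if largest > max_step:                        # cap very large steps
            step *= max_step / largest
        beta = beta + step
        if largest < tol:
            break
    return beta, cov
```

With a single binary predictor, the Firth estimate equals the log odds ratio after adding 0.5 to each cell of the 2 × 2 table, which gives a convenient check.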
Schizophrenia-Related Phenotypes | UK Biobank participants with schizophrenia were compared with those in CardiffCOGS, Cardiff F-Series, and Cardiff Affected Sib-Pairs and with controls in UK Biobank for phenotypes known to be related to schizophrenia, including demographic characteristics, educational attainment, cognitive ability, and known psychiatric and physical comorbidities of schizophrenia. CLOZUK was not included because of the absence of relevant phenotypic data. It was not possible to include phenotypes from UK Biobank's mental health questionnaire because of the low completion rate among individuals with schizophrenia (14.5%), much lower than for the UK Biobank as a whole (31.5%). Comparisons were made only when equivalent definitions were available across samples, ie, when phenotypes were assessed using similar wording on the respective questionnaires and/or responses could be harmonized into comparable categories. eTable 3 in Supplement 1 details each phenotype and its definition in each sample. Pairwise comparisons between schizophrenia cases in UK Biobank and each of the other samples were made using logistic regression controlling for sex and age at recruitment, with year of birth also included as a covariate for the education variables. A Bonferroni correction for the number of tests was applied (52 tests; P < 9.62 × 10⁻⁴). Secondary analyses were restricted to individuals of European genetic ancestry (eMethods 3 in Supplement 1).
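Both Bonferroni thresholds used in this section follow from dividing a familywise α of .05 by the number of tests:

```latex
\alpha_{\text{PRS}} = \frac{0.05}{20} = 0.0025,
\qquad
\alpha_{\text{phenotypes}} = \frac{0.05}{52} \approx 9.62 \times 10^{-4}
```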

Comparison With PGC
After quality control, a GWAS including 1363 individuals with schizophrenia and 358 774 controls from UK Biobank did not identify any genome-wide significant loci (threshold P < 5 × 10⁻⁸), as expected for a case sample of this size (genomic control λ = 1.03; eFigures 4 and 5 in Supplement 1). Schizophrenia in the UK Biobank had a genetic correlation with the latest PGC schizophrenia GWAS 17 that was close to 1 (rg = 0.98; SE, 0.18). The genetic correlations between UK Biobank schizophrenia and bipolar disorder (rg = 0.73; SE, 0.14), MDD (rg = 0.34; SE, 0.08), intelligence (rg = −0.14; SE, 0.06), and each of the other neuropsychiatric disorders were not significantly different from the genetic correlations between those traits and the latest PGC schizophrenia GWAS (Figure 1; eTable 4 in Supplement 1).
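The genomic control λ reported above is the median observed 1-df association χ² statistic divided by its expected median under the null (about 0.455). A minimal standard-library sketch that computes it from GWAS P values; the function name is hypothetical:

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 df, about 0.4549.
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def genomic_control_lambda(p_values):
    """Genomic inflation factor from 2-sided P values in (0, 1]."""
    std_normal = NormalDist()
    # Convert each P value to its 1-df chi-square statistic; inv_cdf(p / 2)
    # stays numerically stable for very small P values.
    chi2 = [std_normal.inv_cdf(p / 2) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN
```

Values near 1, such as the 1.03 reported here, indicate little inflation of test statistics from population structure or other confounding.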

Schizophrenia-Related Phenotypes
Phenotypic features of the samples are displayed in Figure 3 and eTable 7 in Supplement 1.Rates of comorbid affective diagnoses for the UK Biobank participants with schizophrenia are described in eAppendix 5 in Supplement 1.Compared with the clinically ascertained schizophrenia samples, UK Biobank participants had patterns consistent with lower severity of illness (Table 2); they were less likely to be male, and male participants were more likely to have children (there was no difference in female participants).All cognitive indices, including educational attainment and cognitive ability, were higher in UK Biobank participants with schizophrenia.These participants had higher rates of current employment and an older self-reported age of onset of psychosis compared with the clinically ascertained samples.UK Biobank participants with schizophrenia had lower rates of smoking but equivalent rates of comorbid physical illness once age was adjusted for.
Compared with controls, UK Biobank participants with schizophrenia were more likely to be male, less likely to have been married or to have had children, had lower educational outcomes indexed by a high school qualification (General Certificate of Secondary Education) or a higher-education degree, and had lower cognitive ability as measured by fluid intelligence (Figure 3; Table 2).They had higher rates of depression, tobacco use, epilepsy, heart disease, and type 2 diabetes.Individuals with schizophrenia of working age had lower levels of current employment.
There was no evidence of an underrepresentation of individuals with schizophrenia from ethnic minority groups compared with CLOZUK or controls (eTables 8 and 9 in Supplement 1).All phenotypic analyses were consistent in analyses restricted to those of European genetic ancestry.

Discussion
We compared individuals with schizophrenia from UK Biobank with those with schizophrenia in the PGC and with 4 clinically ascertained schizophrenia research samples. Participants with schizophrenia in UK Biobank had the genomic and phenotypic features expected from previous research, but these features were consistent with them being less severely affected.
Schizophrenia in UK Biobank had a genetic correlation of 0.98 (SE, 0.18) with the latest PGC schizophrenia GWAS. 17 Schizophrenia PRS explained 6.8% of the variance in liability for schizophrenia case-control status in those of European genetic ancestry in UK Biobank, which, while lower than the variance explained across the PGC samples as a whole (7.3% in all samples; 8.1% in those of European genetic ancestry), 17 is within the range of other schizophrenia PGC samples. The association between schizophrenia PRS and schizophrenia case-control status in UK Biobank (all samples: OR, 1.69; 95% CI, 1.59-1.78; European genetic ancestry: OR, 2.04; 95% CI, 1.92-2.17) was also comparable with estimates from the PsycheMERGE consortium 41 (OR, 1.55; 95% CI, 1.39-1.72) and the US Veterans Affairs Health Care System 42 (OR, 1.56; 95% CI, 1.52-1.61). A 2023 study found that the average schizophrenia PRS did not differ between individuals with schizophrenia identified via different diagnostic sources in UK Biobank. 43 In addition, UK Biobank participants with schizophrenia showed the phenotypic associations expected of schizophrenia, such as an excess of male sex, lower cognitive outcomes, and low rates of current employment, as well as rates of physical health comorbidities comparable with those in epidemiological samples of schizophrenia. 44
After PGC schizophrenia, the next highest genetic correlation for UK Biobank participants with schizophrenia was with bipolar disorder 22 (0.73; SE, 0.14), the psychiatric disorder most genetically correlated with schizophrenia, and correlations with other psychiatric disorders were consistent with those from the PGC schizophrenia GWAS, 17 indicating that the genetics of the schizophrenia diagnosis in UK Biobank is compatible with that of the diagnoses typically used in genomic studies. This is further supported by the strength of the association of the schizophrenia PRS (OR, 1.69; 95% CI, 1.59-1.78) with schizophrenia case-control status, in contrast to the bipolar disorder PRS (OR, 1.20; 95% CI, 1.13-1.27) or the MDD PRS (OR, 1.06; 95% CI, 1.00-1.12).
Comparisons with clinically ascertained schizophrenia cohorts indicated that those with schizophrenia in UK Biobank likely represent less severely affected cases. Compared with the other schizophrenia samples, UK Biobank participants with schizophrenia had lower rates of male sex, higher cognitive ability and educational attainment, lower rates of smoking, older age of onset of psychosis, and higher current employment. Further, the rate of schizophrenia-associated CNVs and the schizophrenia PRS were lower in UK Biobank cases than in the Cardiff schizophrenia samples, although the latter is well within the range of values for individual studies included in the PGC. 17 The findings reported here almost certainly reflect, in part, ascertainment differences. Focusing research on clinically ascertained samples is likely to bias estimates toward more severe outcomes, and UK Biobank could offer an opportunity to study those with better outcomes. In addition, we found ethnic minority groups to be equally represented among UK Biobank participants with schizophrenia compared with CLOZUK participants or controls. While biobanks have advantages, they also have biases and tend to undersample individuals with serious mental illness; hence, they are an inefficient way to recruit large numbers of representative schizophrenia cases. Further, many phenotypes routinely collected in clinical schizophrenia cohorts were not available in UK Biobank, and most people with schizophrenia did not complete online follow-up questionnaires, such as the mental health questionnaire. Given that future studies of the genetic basis of heterogeneity in schizophrenia will require both large numbers and high-quality assessments, targeted cohorts will still be needed, but these could be enhanced by the use of linked electronic medical records. 45
There is an inevitable added cost for studies recruiting individuals with serious forms of mental illness, but this is essential if we are to base our research on representative samples and be able to generalize our findings. Our findings have important implications for schizophrenia research conducted within and outside of UK Biobank. They indicate the need to integrate both cases recruited from secondary mental health services, which will be weighted toward more severe outcomes, and those from biobank resources, which will capture a higher proportion of less severely affected cases, to encompass the full spectrum of schizophrenia.

Limitations
This study has limitations.In this article, we selected a pragmatic definition of schizophrenia that will be applicable to other biobank studies.More sophisticated definitions based on diagnostic algorithms or machine learning approaches are being developed and could offer further advantages to the field in the future.This study was conducted within the UK, and many of the individuals were identified from linked medical records, so results will need replication to ensure generalizability to other biobanks, countries, and health care settings.The small sample size for CNV analyses meant that power to demonstrate significant differences was low.

Conclusions
Individuals with schizophrenia in UK Biobank have genomic and phenotypic features consistent with expectations for those with a diagnosis of schizophrenia but represent those less severely affected.The inclusion of such cases in wider schizophrenia studies has the potential to enhance representation of the spectrum of illness severity.

Table 1. Cohort Descriptions
Abbreviations: NA, not applicable; PGC, Psychiatric Genomics Consortium; TRS, treatment-resistant schizophrenia.
a CLOZUK age at recruitment is estimated from CLOZUK2 only and from the age at registration with Leyden Delta's monitoring system.
Genetic and Phenotypic Features of Schizophrenia in the UK Biobank Original Investigation Research jamapsychiatry.com (Reprinted) JAMA Psychiatry Published online March 27, 2024 E3 Downloaded from jamanetwork.com by guest on 06/18/2024
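Figure 1. Genetic correlations between the schizophrenia genome-wide association study in UK Biobank and Psychiatric Genomics Consortium (PGC) schizophrenia, 17 bipolar disorder, 22 major depressive disorder (MDD), 23 attention-deficit/hyperactivity disorder (ADHD), 24 autism spectrum disorder (ASD), 25 anorexia nervosa, 26 and intelligence. 29 Comparison correlations with PGC schizophrenia are given in the second row. The color of each box indicates the magnitude of the correlation. Statistics and statistical comparisons between the correlations are provided in eTable 5 in Supplement 1.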

Table 2. Phenotypic Comparisons Between UK Biobank Participants With Schizophrenia and Each Cohort a
Abbreviations: GCSE, General Certificate of Secondary Education; ICD-10, International Statistical Classification of Diseases and Related Health Problems, Tenth Revision; NA, not available; OR, odds ratio.
a Results from pairwise regressions between UK Biobank participants with schizophrenia and each of the other samples for each phenotype assessed. ORs refer to risk in UK Biobank participants with schizophrenia; an OR greater than 1 indicates higher rates (or higher values for continuous phenotypes) of the phenotype in these participants compared with the other sample. Corresponding proportions and means are presented in eTable 7 in Supplement 1. Onset of psychosis (n = 638) and cognitive ability (n = 451) were only available for a subset of UK Biobank participants with schizophrenia.
b Below the threshold the software could calculate.
c Cognitive ability defined in the UK Biobank from fluid intelligence (field identifier 20016) and from the MATRICS composite score in CardiffCOGS (eTable 3 in Supplement 1).
d Depression defined as ICD-10 codes F32 and F33.