Sex Differences in Genetic Associations With Longevity

Key Points Question Are there sex differences in genetic associations with longevity? Findings In this case-control study of 2178 cases and 2299 controls who were Chinese with Han ethnicity, sex-specific genome-wide association study and sex-specific polygenic risk score analyses on longevity showed substantial and significant differences in genetic associations with longevity between men and women. Findings indicated that previously published genome-wide association studies on longevity identified some sex-independent genetic variants but missed sex-specific longevity loci and pathways. Meaning These novel findings contribute to filling the gaps in the research literature, and further investigations may substantially contribute to individualized health care and more effective and targeted health interventions for male and female elderly individuals.

implemented in PLINK.
We performed principal components analysis (PCA) to evaluate genomic correlations in our dataset on the basis of pairwise identity by state (IBS) for all of the successfully genotyped samples using PLINK (1.06). We used the filter option (MAF >0.01 and genotyping rate >0.9) as discussed above; about 80% of the SNPs were used to calculate IBS. After excluding the 15 outliers, our PCA showed that the cases and controls in this study were of Han Chinese ancestry and were well matched, without evidence of gross population stratification. In general, cases and controls were not separated on the basis of the first two principal components, and the cases and controls are evenly distributed in clusters of PC1 vs. PC2, PC1 vs. PC3 and PC2 vs. PC3 (see Supplementary Fig. 1 of ref. 19). We computed the inflation factor (λ), a metric describing genome-wide inflation in association statistics [20] , both before and after correcting for stratification using PLINK (1.06). Values of λ after correcting along 0, 1, 2, 3, or 4 eigenvectors are 1.109, 1.0241, 1.0235, 1.0226, or 1.0221, respectively, demonstrating that the correction is important and the top two eigenvectors correct nearly all of the stratification that can be corrected using the 4 eigenvectors. Consequently, in the subsequent GWAS regression models, we adjusted for the top two eigenvectors to minimize the effects of population stratification, following the approach adopted in the reference [21] . The genomic inflation factors (λ) in the South, North and combined datasets were 1.022, 1.010 and 1.022, respectively, indicating that the effects of population stratification on genetic analysis are well controlled [9] . The Manhattan plots and quantile-quantile plots for male/female and North/South datasets indicate that there is no inflation of the associations (eFigures 1-4).
Finally, after various stages of sample filtering, in total 4,477 of the CLHLS participants were included in the GWAS dataset (eTable 1), including 564 male and 1614 female centenarians (male mean age 101.5 with SD 3.58 and female mean age 103.1 with SD 3.41) and 773 male and 1526 female middle-aged controls (male mean age 48.7 with SD 6.68 and female mean age 48. times as large as that of the next largest GWAS on longevity. We also conducted quality-control filtering of the GWAS data from the total 4,477 individuals. Following the standard adopted in the published studies of GWAS using the same 900k SNPs Illumina HumanOmniZhongHua-8 BeadChips [22] as the one used in the present study, SNPs with call rates of less than 90% were removed from our GWAS dataset; SNPs were also excluded if they had a MAF less than 1% or if there was significant deviation from Hardy-Weinberg equilibrium in the samples, defined as P < 10 -5 . SNPs on the X and Y chromosomes and mitochondria were removed from the initial statistical analysis. Only 242 out of 743 (~30%) GWAS conducted from 2005 to 2011 considered chromosome X in their analyses [23] . Wise et al. [23] discussed several reasons for the exclusion of chromosome X, including a lower proportion of genes on chromosome X and a lower coverage of chromosome X on current genotyping platforms compared with autosomal coverage. They also described a number of technical hurdles that might add to the reluctance to include chromosome X in a GWAS, including complications in genotype calling, imputation, selection of test statistics, and the lack of readily available implementations. Accordingly, we excluded the chromosome X. After quality filtering and cleaning, 818,084 genotyped autosomal SNPs remained for association analysis in our final GWAS dataset.
To further increase genome coverage, we performed imputation analysis to infer the genotypes of all autosomal SNPs (MAF≥0.01) using IMPUTE software version 2 [24] and the 1000 Genomes Project integrated phase 1 release as reference panel. SNPs with a quality score (Rsq) of <0.9 were discarded before analysis. After standard GWAS quality-control filtering for subjects and SNPs as described above, we obtained data for 5,595,657 autosomal SNPs (818,084 genotyped SNPs and 4,777,573 imputed SNPs) in 2,178 centenarian cases and 2,299 middle-aged controls for the GWAS analyses.

S3.2 Stratifying South and North regions of China for discovery-evaluation analysis
China is considered to have geographically and traditionally distinct South and North regions that have existed for a few thousand years, although there is no legal or official administrative meaning to this division. As reviewed in Xu et al. [19] , previous analyses of anthropological, linguistic, and genetic data . Given the nature of our population analysis and due to the fact that all of the samples analyzed in this study belong to the same ethnic group of Han Chinese living in one country with the same culture, we also performed a combined analysis comparing all male/female centenarians with all male/female controls, respectively. The genomic inflation factors (λ) in the South and North datasets and the combined dataset were 1.022, 1.010 and 1.022, respectively, suggesting that the association test statistics conformed to the underlying null distribution and would not require further adjustment for genomic control.

S4. A bi-directional discovery-evaluation approach
In the traditional uni-directional discovery-evaluation approach, the entire sample is divided into two datasets, with one dataset used for discovery and the other dataset for replication or evaluation. The value threshold are analyzed in the evaluation dataset. The SNPs with a P value lower than the threshold (e.g., P <10 -4 ) in the discovery stage and nominal significance (e.g., P <0.05) in the evaluation stage are identified as significant and replicated SNPs; the significant and replicated SNPs are then examined through association analysis using the discovery-evaluation combined dataset.
This traditional uni-directional discovery-evaluation approach is especially useful when the expensive GWAS serves as the first stage of discovery and the much less expensive second stage of evaluation genotypes only the top SNPs with a P value lower than the threshold found in the discovery stage.
However, when analyzing two available independent GWAS datasets, such as our present study, the uni-directional discovery-evaluation approach of fixing one GWAS dataset as discovery and another GWAS dataset as evaluation would have a much higher false-negative rate, missing a substantial number of significant and replicated SNPs which have a p-value higher than the threshold and lower than the nominal significance level in the fixed discovery GWAS dataset but reach the threshold significance level in the fixed evaluation GWAS dataset.
To avoid the high false-negative rate and to fully utilize all information in the sex-specific independent North and South GWAS datasets available to us, we applied a novel bi-directional discovery-evaluation approach [25] in our sex-specific GWAS and sex-specific PRS to identify the sexspecific loci significantly associated with longevity. Following this approach, we first analyzed the sexspecific North region GWAS dataset as discovery sample and the sex-specific South region GWAS dataset as evaluation sample (or target sample in sex-specific PRS analyses); we then reversed the process and analyzed the sex-specific South region GWAS dataset as discovery sample and the sexspecific North region GWAS dataset as evaluation (or target) sample. The results presented in Tables 1 and 2 indicate that our bi-directional approach identified 11 male-specific loci and 11 female-specific loci (in total 22 sex-specific loci) significantly associated with longevity in one sex but not significant in other sex and replicated across North and South regions. The results also show that if we adopted the classic uni-directional approach which fixes North as discovery and South as evaluation without reversing the process, 10 (45.5%) of the 22 sex-specific loci would be missed because they have a would be missed by the classic uni-directional approach include the last 3 loci in Table 1 and last 7 loci   in Table 2). Clearly, our bi-directional approach substantially reduces the false-negative rate. At the same time, the overall false positive rate may increase as we identified more candidate sex-specific loci. However, this drawback is outweighed by the benefits, because the bi-directional discoveryevaluation approach substantially reduces the number of false negatives and creates a much more complete list of candidate sex-specific loci for further assessment in the sex-specific PRS analysis compared to the traditional uni-directional approach.
S5. The procedure of PRS analysis to identify truly sex-specific loci jointly associated with longevity using a more relaxed P T threshold As presented in Tables 1 and 2, we discovered 11/11 male/female specific longevity loci, which were replicated between North and South samples with a prior threshold of P<10 -3 in the discovery step and had a P<10 -5 in one sex but P>0.05 in the other sex and P<0.05 for the loci-sex interaction effects, using the North-South combined dataset. In addition to these findings, our sex-specific GWAS also found that 47/34 male/female loci, which were associated with longevity and replicated across North and South samples with a more relaxed prior threshold of P<0.01 in the discovery step and had a 10 -5 ≤P<10 -4 in one sex but P>0.05 in the other sex and P<0.05 for the loci-sex interaction effects, using the North-South combined dataset. We did not find any loci associated with longevity with a P<10 -3 in both sexes but with effects in opposite directions.
The results presented in sections A1, B1 and C1 show that sex-specific joint associations with longevity of the 11/11 male/female loci were replicated across North and South samples; namely, they were jointly and highly significantly associated with longevity in one, but not jointly associated with longevity in the other sex; the PRS-sex interaction effects were highly significant, either using North sample as discovery and South as target, or vice versa. However, our PRS analysis indicated that the 47/34 male/female candidate loci are jointly and highly significantly associated with longevity with P<10 -8 in one sex and also jointly and significantly associated with longevity in the other sex (P=0.007~0.001). Thus the groups of 47/34 male/female loci cannot be claimed as sex-specific longevity loci because they are also jointly and significantly associated with longevity in the other sex.
The joint and significant associations with longevity of these candidate loci in the other sex are because they also include some not-truly sex-specific loci, each of which is individually and slightly associated with longevity with a P larger than 0.05 and less than a P-threshold, but their joint effects are significant in the other sex. Thus, we need to filter and exclude those not-truly sex-specific loci from the 47/34 male/female loci in order to identify the groups of true sex-specific longevity loci.
Following a trial and error approach for selecting an ideal P T threshold to provide the best-fit PRS using the PRSice method and software [26] , we selected the P threshold in the other sex to exclude the not-truly sex-specific loci. As shown in panels A2, B2 and C2 of Table 4, among the 47/34 male/female candidate loci, we identified 35/25 male/female specific loci which were also true sexspecific longevity loci, after filtering out the not-true sex-specific loci with the P T threshold in the other sex identified by the PRSice. The North-South replicated 35 male-specific loci had a 10 -5 ≤P<10 -4 in males but P >0.25 in females and P<0.05 for the loci-sex interaction effects; the North-South replicated 25 female-specific loci had a 10 -5 ≤P<10 -4 in females but P >0.35 in males and P<0.05 for the loci-sex interaction effects.
We also tried but could not identify much larger sex-specific groups of loci with P<10 -3 or P<0.05, which are jointly and highly associated with longevity in one sex but are not jointly significant in the other sex and the sex-specific association replicated across North and South datasets. This is likely because inclusion of much large number of loci with P<10 -3 or P<0.05 could lead to overfitting -adding complexity but increasing bias [27] .

S6. Power estimates
Using the QUANTO (1.1) software, we evaluated the power of the sex-specific GWAS to detect genetic effects on longevity for samples of 564 male and 1,614 female centenarians and 773 male and 1,526 female middle-aged controls to achieve genome-wide significance (P=5×10 -8 ) and suggestive significance (P=5×10 -7 , P=5×10 -6 , P=5×10 -5 , P=5×10 -4 , P=5×10 -3 ) levels for different minor allele frequency (MAF), respectively. We used an additive genetic model and assumed that the prevalence of living to 100 years of age among male and female Han Chinese born in 1898-1908 is 2.33/10 6 and 7.83/10 6 , respectively, based on estimates from previous study [18] and the ratio of male centenarians to female centenarians from the Chinese census [28] . The detailed power estimates results and parameters used are presented in eTables 6a-6b and indicate that our female-specific GWAS has a rather good power and the power of our male-specific GWAS is also acceptably good.
We conducted the power estimates for the sex-specific PRS analyses on longevity using the AVENGEME R program [29] . The detailed results and parameters used in the power estimates are presented in eTable 7 and indicate that the power for both male-specific and female-specific PRS analyses are excellent: 0.997~0.999 for males and 1.00 for females. Morphogenesis of an organ. An organ is defined as a tissue or set of tissues that work together to perform a specific function or functions. Morphogenesis is the process by which anatomical structures are generated and organized. Organs are commonly observed as visibly distinct structures, but may also exist as loosely associated clusters of cells that work together to perform a specific function or functions.

POSITIVE REGULATION OF DEVELOPMENTAL PROCESS
Genes annotated by the GO term GO:0051094. Any process that activates or increases the rate or extent of development, the biological process whose specific outcome is the progression of an organism over time from an initial condition (e.g. a zygote, or a young adult) to a later condition (e.g. a multicellular animal or an aged adult).

REGULATION OF MULTICELLULAR
Genes annotated by the GO term GO:0051241. Any process that stops, prevents or reduces the frequency, rate or extent of an organismal process, the processes pertinent to the 0.0040 0.033724137 2/11/32 ORGANISMAL PROCESS function of an organism above the cellular level; includes the integrated processes of tissues and organs.

REGULATION OF APOPTOSIS
Genes annotated by the GO term GO:0042981. Any process that modulates the occurrence or rate of cell death by apoptosis.
Significant genes:  APOE, BIRC5, SAP30BP, HTT, IFI16, IL2RB,  ADAMTSL4, SMAD3, ARHGDIA, ABL1, CFL1,  TNFSF10, TAX1BP1, DNAJB6, VEGFA,  Genes annotated by the GO term GO:0048646. The process pertaining to the initial formation of an anatomical structure from unspecified parts. This process begins with the specific processes that contribute to the appearance of the discrete structure and ends when the structural rudiment is recognizable. An anatomical structure is any biological entity that occupies space and is distinguished from its surroundings. Anatomical structures can be macroscopic such as a carpel, or microscopic such as an acrosome.

REGULATION OF DEVELOPMENTAL PROCESS
Genes annotated by the GO term GO:0050793. Any process that modulates the frequency, rate or extent of development, the biological process whose specific outcome is the progression of a multicellular organism over time from an initial condition (e.g. a zygote, or a young adult) to a later condition (e.g. a multicellular animal or an aged adult). Notes: (1) In panel (C), the estimates of P of (loci x sex) interaction effects are based on logistic regressions including the loci, sex and (loci x sex) interaction term, using the North-South combined dataset; the estimates of P and OR of the main effects of the loci in males and females are not presented in panel (C) because they are almost exactly the same as the P and OR of the loci in panels (A-3) and (B), respectively, estimated based on the male and female datasets separately, which are statistically expected.  True  True  True  True  True  True  weighted  True  True  True  True  True  True  binary  True  True  True  True  True  True  lambdaS  --------- The following definitions of the parameters were taken from the Users' Manual of the AVENGEME software [29] 1) nsnp: number of independent SNPs in the GWAS. Clumping was performed from the total 5,595,657 SNPs using PLINK to retain SNPs with R2<0.1 within 250 kb windows. 2) discovery/target: size of discovery sample / size of target sample. 3) vg1: proportion of trait variance explained by the entire set of SNPs in the discovery sample. 4) cov12: covariance between genetic effects in the discovery and target samples, estimated by the estimatePolygenic Model function in AVENGEME.