Molecular Diversity of Clinically Stable Human Kidney Allografts

Key Points Question Can immunologic heterogeneity be identified in histologically stable kidney allografts? Findings This prognostic study used 28 public kidney transplant data sets with 2273 kidney tissue to develop and validate an unbiased, 6-gene and 5-cell-type transcriptional Instability Score to provide histology-independent reclassification of human transplant samples. Using this score, 46% of histologically stable samples were found to have molecular evidence of rejection, which was validated by an independent cohort that showed undiagnosed graft rejection and poor projected graft function survival. Meaning These findings suggest that the Instability Score could provide an important adjunct for comprehensive and highly quantitative phenotyping of protocol kidney transplant biopsy samples and could be integrated into clinical care for accurate estimation of subsequent patient clinical outcomes.

Data processing and normalization. Several pre-processing steps were applied prior to the main analysis. Raw fluorescence intensity data stored in .CEL or .txt files were downloaded and pre-processed depending on the platform. The data processing included background correction, log2 transformation, quantile normalization and probe to gene mapping using R language version 3.5.1 3 . For the Affymetrix platform, we used the R package SCAN.UPC 4 available at Bioconductor 5 (http://ww.bioconductor.org). In contrast to some other popular multi-array normalization algorithms like RMA 6 that estimate probe-level effects and standardize variances across arrays based on the information from a whole dataset, SCAN.UPC is a single-array method that normalizes every sample independently from other samples. This is considered as an advantage 4 since this approach is robust to any influence from possible outliers in the data. The database for the mapping between probes and Entrez gene IDs were taken from the BrainArray resource 7 version 22 (http://brainarray.mbni.med.umich.edu/Brainarray/Database/CustomCDF/22.0.0/entrezg.asp). For the Agilent and Illumina platforms, we downloaded non-normalized raw data and performed data processing using neqc() function within limma package 8 from Bioconductor. This algorithm 9 estimates parameters based on normal-exponential (normexp) convolutional model with joint likelihood estimation and with the help of negative control probes. The offset 16 was added to the intensities after the background adjustment by default, as it was shown as the most optimal value to improve FDR of the normexp algorithm 8 . However, some data from the Agilent and Stanford platforms did not contain any negative control probes. Therefore, we used similar processing steps manually to reproduce the methodology by applying backgroundCorrect() function from limma package by the mle normexp The Quantile Normalization technique is the same method used for normalization at the probe level, but as it was shown before 21 it can be applied for batch effect elimination as well.
The RUV method 22 is based on adjustment on so-called negative control genes, that are expected not to be differentially expressed across phenotypes. There is still an open question which set of genes would be most suitable for this role. Basically, so far there are two possible ways to obtain a list of control genes: housekeeping genes and empirically found ones. The housekeeping genes are defined to be expressed on the similar level across all tissues and involved in basic cell processes 24 . However, besides the fact that there is no fully established list of housekeeping genes 25 , it was shown that they might not be the best to represent negative controls in adjustments 22 since they can be differentially expressed in some tissues 12 or related to diseases 26 . Another method is to empirically find genes that are expressed steadily across merging studies. However, there is some freedom in setting a number of such genes. Too small number can be not enough for proper adjustment, too largecan include some genes important for a current study. Some dependence on studies involved is presented in the search for negative control genes. For some discussion of advantages and disadvantages of both methods see reference 27 . Another challenging factor in applying the RUV method is the parameters adjustment. Depending on implementation of the algorithm, there are two main parameters to adjust: the dimension of the unwanted systematic noise and the ridge smoothing parameter. To find an optimal set of parameters is actually a tricky problem. In our study, we examined three different implementations of RUV method in R packages RUVcorr 27 , RUVnormalize 28 , and RUVSeq 29 (RUVg function). The first two packages implement naïve RUV-random method which is a variation of the RUV-2 method originally described in 22 . The third method RUVg within RUVseq package is originally designed as a discrete version of RUV-2 methods for RNA-seq counts data. We used this method for performance comparison with other approaches. We compared the performance of these RUV implementations with the housekeeping and empirical negative control genes. We also varied the parameter of the noise dimension k but set the smoothing ridge parameter to 0.001 for RUVnormalize and 0 for RUVcorr.
Another promising normalization method we examined is Harman 23 . This method is a Principal Component Analysis based optimization technique that maximizes batch removal but keeps some probability of the overcorrection as a parameter. We compared the method performance with the overcorrection parameter set to 0.95.
We considered three types of benchmarks to perform the comparison for normalization methods. We computed the percentage of variability in first ten principal components that can be explained by batch (i.e. by dataset study) and kept the maximum value among those ten as one of three benchmarks. For the other two we used the R package gPCA 30 to compute guided principal components (i.e. the principal components of batch modified data) and obtained a p-value (i.e. the probability of having batch effect in data) and delta statistic (i.e. the ratio of guided and unguided first principal component; the lower the better). Since all three metrics are in the range from 0 to 1, we sumed them and normalize to one and used as a final metric to justify the normalization method performance. The results are represented in the eFigure 7.
We found the Harman normalization to outperform all methods followed by ComBat and RUV implementations in RUVcorr and RUVseq with housekeeping genes at the noise dimension parameter k = 15-17. Further comparison analysis showed that while the data seems batch free, RUV and Harman adjusted a bit too much leaving minimum of biological variability, resulting in much fewer differentially expressed genes in comparison to ComBat. Therefore, we decided to use ComBat for cross-study normalization to keep as much heterogeneity as possible while successfully adjusting for batch effects. We observed that the results of the normalization performance vary depending on datasets: their number, platform, processing methods. There is no one-fit-all technique that should be used blindly to correct for batch effects when merging data. Therefore, it should be advised to perform such comparisons of normalization methods each time performing meta-analysis to choose best among available methods 14 .
Statistics. To identify differentially expressed genes in the first analysis AR vs Normal we used the Significance Analysis of Microarrays (SAM) 31 method that was implemented in the R package siggenes 32 . We utilized the false discovery rate (FDR) 33 with Benjamini-Hochberg procedure 34 for multiple testing correction and use the adjusted cutoff of 0.05. For the second level of significance, we selected only those genes that have the fold change greater than 1.5.
Pathway Analysis. We leveraged the Gene Ontology database using the gene set enrichment analysis implemented in the R package clusterProfiler 35 to perform functional annotations for the significantly up-and down-regulated genes. We used FDR multiple correction method with the enrichment significance cut-off at level 0.05. For the gene network analysis, we utilized the STRING protein-protein association networks database 36 (https://string-db.org).
Cell type enrichment analysis. In order to estimate the presence of certain cell types in biopsy samples, we leveraged a recently published cell type enrichment tool xCell 37 . xCell leverages gene expression data from microarray and RNA-seq experiments and is used to perform enrichment analysis for up to 64 immune and stromal cell types. We focused on 34 immune related and 11 non-immune cell types (eTable 3) that we selected manually as relevant to the transplant injury process. In our analysis, we used a dedicated R package that is available on the author's GitHub account for this purpose. This analytical method is a gene signatures-based method and converts the gene expression into cell type enrichment scores. The authors especially emphasize that this is not a deconvolution method that provides percentage of cell types containing in a tissue but rather the enrichment tool allowing to compare samples for each cell type but not otherwise. The enrichment scores for each cell type were used to compare AR and Normal samples and to identify cell types that are significantly different in individuals with AR as opposed to normal controls by performing the non-parametric two-sample Mann-Whitney-Wilcoxon statistical test. We utilized the multiple testing correction by using the Benjamini-Hochberg method. Adjusted p-value < 0.05 was used as the threshold.
Feature selection procedure. In our efforts to select the most important features in distinguishing AR vs Normal samples, we performed the following steps. First, we split the whole data into training and testing sets in the ratio 80:20 and performed all feature selection procedures on the training set with benchmarking on the testing set. After identifying the significant features from a statistical test described above on the training set, we searched for features that correlate with the outcome no less than 2/3 of a maximum correlation value. In the final step, we applied Recursive Feature Elimination (RFE) technique with Random Forest (RF) model from the R package caret 38 . We used 5-fold cross validation (CV) technique with 100 repeats. To benchmark the model, we used the area under the ROC curve that is more suitable for data with some disbalance in outcome -in our case the ratio AR:Normal is 0.84. To decrease the bias of random split as well as to avoid the model overfitting, we also introduced the tolerance of 1% to the feature selection mechanism, i.e. the algorithm was choosing the simplest model with the smallest number of features that performs within range 99-100% of the best model. To perform the steps described above, we adopted the R package feseR 39 and modified it to implement the parallel computations, the AUROC metric for model benchmarking, and the tolerance parameter of model performance. After the feature selection steps, we benchmarked the features with RF model on the testing set.
Instability Score and hSTA sub-phenotyping. The method of sub-phenotyping hSTA samples is formed on creating a scoring system based on selected features and scoring the hSTA samples. Based on the score values, histologically STA samples are identified as molecularly AR or molecularly STA. We denoted this split as hSTA/mAR and hSTA/mSTA, respectively.
For our current analysis, two types of data were available: gene expression data and cell type enrichment scores (obtained computationally using xCell). Based on these two data types, we performed the feature selection procedure described above to find sets of genes and cell types that highly associated with AR. Next, we z-scaled each feature, a gene or a cell type, and built a logistic regression model with all features to identify feature importance as model coefficients. Using these coefficients, we created a linear formula to compute a score, that we called the Instability Score (or InstaScore):