Artificial Intelligence Algorithms to Assess Hormonal Status From Tissue Microarrays in Patients With Breast Cancer

Key Points Question Can molecular markers of cancer be extracted from tissue morphology as seen in hematoxylin-eosin–stained images? Findings In this diagnostic study of tissue microarray hematoxylin-eosin–stained images from 5356 patients with breast cancer, molecular biomarker expression was found to be significantly associated with tissue histomorphology. A deep learning model was able to predict estrogen receptor expression solely from hematoxylin-eosin–stained images with noninferior accuracy to standard immunohistochemistry. Meaning These results suggest that deep learning models may assist pathologists in molecular profiling of cancer with practically no added cost and time.


eMethods 2. Practical Considerations for Combining Image Scores Into a Single Patient Score
The logistic regression takes a set of features encoding a single H&E image and outputs its r score. Given k images with output scores r1, r2, …, rk belonging to the same patient, we computed the patient's combined r score similarly to [1]. Namely, we averaged the feature set across all images belonging to this patient and then applied the logistic regression. This can be interpreted as concatenating all TMA images of the patient and applying the pipeline to the concatenated image. In practice, this procedure is equivalent to averaging the prediction scores li before the sigmoid operation.

To change the resolution according to the parameter R, all images were resized to R × R pixels using standard bicubic interpolation. To change the cut size according to the parameter C, a square of C × C pixels was cut out from the center of each image. To change the number of TMA images per patient according to the parameter S, S of the images belonging to each patient were selected at random; then, only these images were taken into consideration in the prediction process. To change the number of patients according to the parameter P, P patients were selected randomly to form a new dataset. Each of the parameters P, R, C, and S was changed one at a time while the rest were kept fixed at their maximal values. Since a random selection was used for the parameters P and S, we repeated each experiment N times and averaged the results. The number of repetitions N was set such that the 95% confidence interval of the resulting average was less than 1% around the average value.
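The equivalence claimed above follows from the linearity of the logistic-regression scoring function: averaging features and then applying the model gives the same result as averaging the per-image logits before the sigmoid. A minimal sketch (with hypothetical weights and a 64-dimensional feature vector for brevity; the actual dimensionality and parameters come from the trained model):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
w, b = rng.normal(size=64), 0.5          # hypothetical trained weights and bias

# k = 3 TMA images of one patient, each encoded as a 64-dim feature vector.
feats = rng.normal(size=(3, 64))

# Option 1: average the feature vectors, then apply the logistic regression.
r_avg_feats = sigmoid(w @ feats.mean(axis=0) + b)

# Option 2: average the per-image logits l_i = w . x_i + b before the sigmoid.
r_avg_logits = sigmoid((feats @ w + b).mean())

# Both give the identical combined patient score, since w . mean(x_i) + b
# equals mean(w . x_i + b) by linearity.
assert np.isclose(r_avg_feats, r_avg_logits)
```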

eMethods 4. A Full Description of the MBMP Process
The MBMP process consists of four stages. The first three stages are aimed at constructing the model, and are done only once per cohort, per cancer type, and per marker. The fourth stage is a decision module that outputs the final prediction based on the patient's H&E-stained images.

Data collection
In the first stage, digital H&E-stained images should be collected from a set of patients in the cohort. Our experiments show that with better resolution, a larger cut size, a larger number of patients, and especially a larger number of TMA images per patient, the system's performance is likely to improve. In our work we used TMA images of size 512 × 512 pixels. If the work is done on whole-slide images (WSIs) rather than TMAs, we recommend extracting nonoverlapping patches from each WSI and treating them as multiple TMA images. The molecular expression of the molecule in question should be collected for each patient. In our work we focused on binary labels (positive/negative status). However, we showed that the actual expression percentage, and not only the binary status, is reflected in the morphology, and thus could probably be predicted given enough data. The molecular expression labels of the patients are then assigned to their H&E images. The patients (and their images) in the collected dataset should be split into a training set and a validation set. For both cohorts in our work, we used 5/6 of the patients for the training set and the remaining 1/6 for the validation set. A different split may be more beneficial for different types of datasets.
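The WSI recommendation above amounts to tiling the slide into nonoverlapping 512 × 512 patches. A minimal sketch, assuming the slide is already loaded as an (H, W, 3) NumPy array (in practice a WSI library would supply the pixel data):

```python
import numpy as np

def extract_patches(wsi, size=512):
    """Split a whole-slide image (H, W, 3 array) into nonoverlapping
    size x size patches, each treated like a TMA image downstream."""
    h, w, _ = wsi.shape
    patches = []
    for y in range(0, h - size + 1, size):
        for x in range(0, w - size + 1, size):
            patches.append(wsi[y:y + size, x:x + size])
    return patches

# A small synthetic "slide": 1024 x 1536 pixels yields a 2 x 3 grid of patches.
slide = np.zeros((1024, 1536, 3), dtype=np.uint8)
patches = extract_patches(slide)
assert len(patches) == 6 and patches[0].shape == (512, 512, 3)
```

Any remainder narrower than the patch size at the right or bottom edge is simply discarded here; padding it instead is an equally valid design choice.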
Training or using an existing cohort-specific and cross-cohort ResNet
Once H&E images and their corresponding molecular expression labels are collected, the next step is to train the cohort-specific ResNet to predict the labels from the H&E images. The architecture of the ResNet we used is described in eFigure 3A. We used the common ResNet architecture [2] without any architectural changes. The ResNet should be trained using the training set. If a cross-cohort ResNet that was already trained on several other cohorts is available, it might be sufficient to use the existing one. Otherwise, to improve performance, another cross-cohort ResNet should be trained to predict the label from H&E images belonging to all available cohorts (including the current one). This ResNet should not be trained on the validation set of the current cohort.

Training the logistic regression on a validation set
In the next step, the two ResNets should be applied to the images of the validation-set patients, which were not seen during ResNet training. The 64 features from each ResNet should be concatenated into a vector of 128 features per image, according to the inference model in eFigure 3B. A logistic regression should then be trained on these features to predict the final label.
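The concatenation-and-fit step above can be sketched as follows, with random arrays standing in for the two ResNets' 64-dimensional outputs and for the binary marker labels (all names here are illustrative):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n_images = 200

# Stand-ins for the 64-dim features from the cohort-specific and the
# cross-cohort ResNet, one row per validation image.
f_cohort = rng.normal(size=(n_images, 64))
f_cross = rng.normal(size=(n_images, 64))
X = np.concatenate([f_cohort, f_cross], axis=1)   # 128 features per image
y = rng.integers(0, 2, size=n_images)             # binary marker labels

clf = LogisticRegression(max_iter=1000).fit(X, y)
scores = clf.predict_proba(X)[:, 1]               # per-image r scores in (0, 1)
assert X.shape == (n_images, 128) and scores.shape == (n_images,)
```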

Inference and decision
The previous stages should be done only once per cohort, cancer type, and molecule in question. Given the H&E images of a test patient never seen by the system, the two trained ResNets should be applied to all images to extract 128 features from each. These features should be averaged to obtain one vector of 128 features. Then, the trained logistic regression should be applied to this vector (eFigure 3B). The output of the logistic regression is a score r that denotes the probability that the molecule in question is expressed in this patient. The molecular expression should be predicted as negative for r < Tl, positive for Th < r, and inconclusive for Tl < r < Th. These thresholds should satisfy the condition 0 < Tl ≤ Th < 1, and should be determined according to the desired accuracy and screening process (see Methods and Results in the main text).
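The three-way decision rule above is straightforward to state in code; this sketch just encodes the thresholding described in the text (the threshold values are placeholders, to be set per the desired accuracy and screening process):

```python
def decide(r, t_low, t_high):
    """Map a patient's r score to a prediction.
    Requires 0 < t_low <= t_high < 1."""
    assert 0 < t_low <= t_high < 1
    if r < t_low:
        return "negative"
    if r > t_high:
        return "positive"
    return "inconclusive"

# Illustrative thresholds only; the paper selects them from the
# desired accuracy on the validation set.
assert decide(0.05, 0.2, 0.8) == "negative"
assert decide(0.95, 0.2, 0.8) == "positive"
assert decide(0.50, 0.2, 0.8) == "inconclusive"
```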

eMethods 5. Additional Details and Considerations Regarding the Train and Test Partitions
For the logistic regression, performance was assessed by 10-fold cross-validation, where in each fold the system was trained on 9 subsets of the patients and tested on the held-out patients. This partition setting exploited 9/10 = 90% of the data for training the model, while statistical outcomes were obtained and evaluated for all cases. The deep CNN was trained and assessed by 6-fold cross-validation, where in each fold, 5 groups were used to train the ResNet units. The trained ResNet units were applied to the remaining group to extract the features. Then, these features were used to train and test the logistic regression unit in an inner 10-fold cross-validation manner. This partition setting exploited 5/6 = 83.3% of the data for training the CNN model and 9/10 × 1/6 = 15% of the data for training the logistic regression on the CNN-based features. We randomly chose 3 of the 6 folds and performed the training and inference process; thus, statistical outcomes were obtained and evaluated for 3/6 = 50% of the cases. The performance of each of the presented systems was assessed on images of previously unseen patients. Extra care was taken to ensure that any data belonging to the test patients, including their H&E-stained TMA images, IHC-stained TMA images, and biomarker annotations, were concealed from the systems during their training.

(B) The H&E image is decoupled into red, green, and blue (RGB) channels, such that each channel is a two-dimensional array. In the same manner, it is decoupled into hue, saturation, and value channels. Lastly, it is decoupled into hematoxylin and eosin channels, according to the color unmixing model previously described [4]. Overall, the H&E image is transformed into 8 two-dimensional images.
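The channel decoupling can be sketched with scikit-image, whose `rgb2hed` implements Ruifrok-Johnston-style stain unmixing (this may differ in detail from the unmixing model of [4]; the synthetic image here is a stand-in for a real H&E tile):

```python
import numpy as np
from skimage.color import rgb2hsv, rgb2hed

# Hypothetical H&E image as an (H, W, 3) float array in [0, 1].
rng = np.random.default_rng(2)
img = rng.random((64, 64, 3))

hsv = rgb2hsv(img)
hed = rgb2hed(img)   # stain unmixing: hematoxylin, eosin, and DAB planes

# Stack the 8 two-dimensional channel images:
# R, G, B, hue, saturation, value, hematoxylin, eosin.
channels = np.stack(
    [img[..., i] for i in range(3)]
    + [hsv[..., i] for i in range(3)]
    + [hed[..., 0], hed[..., 1]],
    axis=0,
)
assert channels.shape == (8, 64, 64)
```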
(C) Gradient (magnitude) and Laplacian of Gaussian (LoG) are widely used operators that convolve the image with appropriate filters to produce another image, typically used for finding edges and extracting structures of interest such as blobs. Here, each of these two operators is applied to each of the 8 channels to obtain an overall of 8 × 3 = 24 two-dimensional channel images (including the original 8 channels). Note that these channels inherit the superpixel segmentation of stage (A). (D) Unlike operators that map an image to an image, the local operators in this stage map each superpixel patch to a scalar value. We used 9 such operators: min, max, mean, median, standard deviation (STD), the standard first-order histogram features skewness, energy, and entropy, and distance to the nearest patch neighbor (between centroids). In this manner, 24 × 9 = 216 scalar values are extracted from each patch. (E) The features of each patch are integrated with the features of its 8 nearest patch neighbors (measured via the distance between their centroids) by applying 6 operators: min, max, mean, median, sum, and STD, overall obtaining 216 × 6 = 1,296 scalar values per patch. (F) Finally, each feature is averaged across all patches to obtain a final vector of 1,296 scalar values that encodes the entire image.
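The channel expansion of stage (C) can be sketched with SciPy's Gaussian-derivative filters (the smoothing scale `sigma` is a free choice not specified here, and the random array stands in for the 8 channel images of the previous stage):

```python
import numpy as np
from scipy.ndimage import gaussian_gradient_magnitude, gaussian_laplace

rng = np.random.default_rng(3)
channels = rng.random((8, 64, 64))   # stand-in for the 8 channel images

# Apply gradient magnitude and Laplacian of Gaussian to every channel,
# keeping the originals, for 8 * 3 = 24 channel images overall.
expanded = np.concatenate(
    [
        channels,
        np.stack([gaussian_gradient_magnitude(c, sigma=1.0) for c in channels]),
        np.stack([gaussian_laplace(c, sigma=1.0) for c in channels]),
    ],
    axis=0,
)
assert expanded.shape == (24, 64, 64)
```

Stages (D) through (F) then reduce these 24 channels to scalars per superpixel patch (9 local statistics), aggregate over the 8 nearest neighboring patches (6 operators), and average over all patches, yielding the 24 × 9 × 6 = 1,296-dimensional image encoding.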

eTable 1. Cut Points, Number of Patients, and Number of H&E Images
The molecular biomarkers in Cohort 2 (A) and Cohort 1 (B) are shown with their corresponding cut-off points. The number of patients participating in the experiment varies for different biomarkers and, together with the total number of H&E images, is indicated for each biomarker. The cut-off points were obtained from the papers describing the expression of these biomarkers for Cohort 1 and Cohort 2 (http://www.gpecimage.ubc.ca/). (ITILs: intratumoral tumor-infiltrating lymphocytes; STILs: stromal tumor-infiltrating lymphocytes.)