Evaluation of a Machine Learning Model Based on Pretreatment Symptoms and Electroencephalographic Features to Predict Outcomes of Antidepressant Treatment in Adults With Depression

Key Points
Question: Can machine learning models predict improvement of various depressive symptoms with antidepressant treatment based on pretreatment symptom scores and electroencephalographic measures?
Findings: In this prognostic study, using the machine learning approach of gradient-boosted decision trees, the ElecTreeScore algorithm could reliably distinguish the patients who responded to treatment from those who did not based on various depressive symptoms using pretreatment symptom scores and electroencephalographic features (using the cross-validation approach on 518 patients).
Meaning: Machine learning approaches that include pretreatment symptom scores and electroencephalographic features may help predict which depressive symptoms will improve with antidepressants.


Introduction
Major depressive disorder (MDD) is the second leading cause of years lived with disability worldwide, affecting 16 million adults in the United States each year. 1 Typically, less than 50% of patients with MDD respond (≥50% reduction in depressive symptoms) to their initial antidepressant medication, and even fewer achieve remission (symptoms return to the healthy range). 2 Clinicians must decide for each patient whether antidepressant treatment is likely to increase the chances of response and ideally remission, weighing the benefits against the undesirable outcomes, including adverse effect burden. 3
The Hamilton Rating Scale for Depression (HRSD) is a widely used test to quantify the severity of illness in patients with a diagnosis of depression. 4,5 The HRSD consists of 17 symptoms of depression, including loss of weight, thoughts of suicide, and feelings of guilt, which are rated on either a 3-point or 5-point scale, and 4 additional symptoms that are used to subtype depression but not to assess its severity. Most studies of depression sum all of the 17 symptoms to a single score for assessing severity of depression, treating depression as a single, unidimensional condition. 6-9 Two individuals with equal HRSD total scores may have very different clinical conditions 10 ; specific depressive symptoms such as sad mood, insomnia, and suicidal ideation may be understood as distinct phenomena that differ from each other in important dimensions. 12-15 However, we lack an understanding of whether electroencephalographic (EEG) biomarkers predict improvement in specific clinical symptoms, as well as robust toolkits to use in making such predictions. 10,16,17
Understanding the association between EEG-recorded neural activity and response to antidepressant medication for patients with MDD has long been a topic of inquiry. Prior studies have highlighted the relevance of particular EEG frequency bands in antidepressant response. For example, patients who did not respond to antidepressants have been characterized by relatively elevated theta power at rest, 18,19 although the reverse outcome of relatively reduced theta has also been observed. 20 Using source localization, theta activity relevant to predicting response among those taking fluoxetine hydrochloride or venlafaxine hydrochloride has been localized to the rostral anterior cingulate and medial orbitofrontal regions. 14 A distinct profile of alpha power has been associated with antidepressant response. For example, response (rather than nonresponse) to antidepressants has been associated with elevated alpha source density. 11 Other lines of investigation have examined metrics for quantifying alpha asymmetry. Although there is evidence that relatively greater right-sided alpha distinguishes patients who responded to antidepressants from those who did not, 21 other studies observe such an alpha asymmetry effect only in women with depression. 22
Although, to our knowledge, there is little work using EEG biomarkers to probe drug-specific antidepressant effects, one analysis from the International Study to Predict Optimized Treatment in Depression (iSPOT-D) indicated that abnormalities in EEG peak alpha may be alleviated by sertraline hydrochloride in particular. 23 By contrast, alpha peak frequency may predict a poorer response among patients taking escitalopram oxalate and extended-release venlafaxine hydrochloride. 24 Another study using data gathered by CAN-BIND (Canadian Biomarker Integration Network for Depression) found that the patients who responded to escitalopram were identified by elevated absolute alpha and relative delta power in the left hemisphere, whereas the patients who did not respond to escitalopram showed the opposite. 25 Machine learning methods have been used to identify EEG features predictive of symptom response to other psychoactive drugs, such as clozapine. 26 These studies show that EEG features are not only useful for predicting improvement in general but may also be useful differential predictors of improvement.
In this study, we developed the ElecTreeScore algorithm, a machine learning model to predict the treatment response of antidepressant medications for each symptom of the HRSD based on pretreatment EEG in addition to symptom severity. We developed the ElecTreeScore using data from iSPOT-D, 27 which has a sufficiently large sample to obtain reliable associations between EEG markers and individual symptoms, and validated the predictive performance of the machine learning model on a holdout test set. We investigated the most important HRSD and EEG features for the prediction and the outcome of depression using the HRSD and EEG features in combination vs using either alone. This approach afforded the opportunity to identify the association of baseline symptoms and EEG features and to evaluate the extent to which EEG features are associated with depression over and above symptom severity. Drawing on prior findings from the application of EEG in characterizing antidepressant response, our study investigated whether a machine learning approach, using gradient-boosted decision trees (GBDTs), could accurately predict acute improvement in individual depressive symptoms with antidepressants based on pretreatment symptom scores and EEG.

Methods
The study was approved by each site's governing institutional review board

Data Set
The data set used in this study was collected as part of iSPOT-D, an international multicenter, Because a pragmatic design was used to deliberately mimic real-world practice in which the goal is to select among active treatments, no placebo control was included.
At the baseline and week 8 clinic visits, the severity of the participant's depressive symptoms was rated on the 21-symptom HRSD (HRSD-21). Study clinical personnel made the ratings based on the participant's reported information during a semistructured interview. Ten of the HRSD-21 symptoms are rated on a 5-point scale (0 = absent; 1 = doubtful or mild; 2 = mild to moderate; 3 = moderate to severe; and 4 = very severe), while the other 11 symptoms are rated on a 3-point scale (0 = absent; 1 = doubtful or mild; and 2 = clearly present).
In addition, electrophysiological measures were also acquired; resting-state EEG was recorded for 2 minutes while participants were relaxed with eyes closed and eyes open.
Electroencephalograms were continuously recorded from 26 sites in 5 regions (frontal, temporal, central, parietal, and occipital) with a NuAmps system (Compumedics) and QuickCap (Compumedics). For each site, we computed absolute and relative band powers for the delta, theta, alpha, beta, and gamma bands.
The data available for the study were from the first 1008 participants with MDD, of whom we excluded those who dropped out (n = 286), those with missing EEGs (n = 125), and those with missing features (n = 79). Previously published work using the iSPOT-D data set has shown that there are no significant differences in attrition across treatment groups 29 and no significant differences in baseline HRSD scores between those who completed the study and those who dropped out. 30 The flow of patients for the resulting data set (n = 518) is summarized in Figure 1. The statistics for the HRSD score at baseline and after treatment are shown in eTable 1 of the Supplement. The iSPOT-D study was approved by the institutional review boards at all of the participating sites, and the associated trial was registered with ClinicalTrials.gov (NCT00693849).

Symptom Improvement Prediction
Our primary objective was to predict improvement in individual symptoms, defined as the difference in score for each of the symptoms on the HRSD-21 report from the baseline visit to the week 8 clinical visit, using pretreatment EEG features. We first extracted electrophysiological features from the raw EEGs recorded at the baseline visit and then developed a machine learning approach for the prediction task.
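The outcome definition above can be sketched in a few lines of Python. This is a minimal illustration, assuming the sign convention given in the Table 1 title (week 8 score minus baseline score, so negative values indicate improvement); the item names and ratings are invented for the example and are not study data.

```python
def item_improvement(baseline, week8):
    """Per-item change: week 8 score minus baseline score (negative = improved)."""
    if baseline.keys() != week8.keys():
        raise ValueError("baseline and week 8 ratings must cover the same items")
    return {item: week8[item] - baseline[item] for item in baseline}


# Hypothetical ratings for three HRSD-21 items (values invented for illustration).
baseline = {"depressed_mood": 3, "trouble_sleeping": 2, "loss_of_insight": 1}
week8 = {"depressed_mood": 1, "trouble_sleeping": 2, "loss_of_insight": 0}

change = item_improvement(baseline, week8)
```

Each of the 21 items would yield one such outcome, and one model would be trained per item.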

Extracting EEG Features
Pretreatment EEG recordings at the baseline visit were processed to generate EEG features. Data on the power of the EEG signals in each frequency range at each electrode site were extracted using the Welch method for spectral density estimation. Specifically, the Welch method was carried out by dividing the EEG signal into successive overlapping windows, forming the periodogram for each block, and then averaging; the Hanning window was chosen to reduce the side-lobe level in the spectral density estimate, with an overlap of 50% as a tradeoff between frequency resolution and smoothness.
At each electrode, the absolute power and the relative power were computed using the Simpson rule for the frequency ranges of delta (0.5-4 Hz), theta (4-8 Hz), alpha (8-12 Hz), beta (12-30 Hz), and gamma (30-100 Hz). Two additional features were computed: a frontal alpha asymmetry feature, by subtracting alpha power for a left scalp site (F3) from the homologous right site (F4), and a beta-alpha ratio feature, by taking the ratio of the beta features at each of the sites with the corresponding alpha features. Furthermore, power features were optionally filtered to only include occipital sites (O1, Oz, and O2) and/or frontal sites (F7, F3, Fz, F4, and F8).
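The feature extraction described above can be sketched with the Python standard library alone. This is an illustrative sketch, not the study's code: the sampling rate, segment length, synthetic 10-Hz test signals, and the choice of total power over 0.5-100 Hz as the denominator for relative power are all assumptions, and the naive DFT is used only to keep the example dependency-free (in practice a routine such as `scipy.signal.welch` would normally be used).

```python
import cmath
import math

# Band edges in Hz, as listed in the text.
BANDS = {"delta": (0.5, 4), "theta": (4, 8), "alpha": (8, 12),
         "beta": (12, 30), "gamma": (30, 100)}


def welch_psd(x, fs, nperseg):
    """One-sided Welch PSD: Hann window, 50% overlap, averaged periodograms."""
    if len(x) < nperseg:
        raise ValueError("signal is shorter than one segment")
    step = nperseg // 2
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / nperseg) for n in range(nperseg)]
    scale = fs * sum(w * w for w in window)
    twiddle = [cmath.exp(-2j * math.pi * m / nperseg) for m in range(nperseg)]
    nfreq = nperseg // 2 + 1
    psd = [0.0] * nfreq
    starts = range(0, len(x) - nperseg + 1, step)
    for s in starts:
        seg = x[s:s + nperseg]
        mean = sum(seg) / nperseg
        seg = [(v - mean) * w for v, w in zip(seg, window)]
        for k in range(nfreq):
            coef = sum(v * twiddle[(k * n) % nperseg] for n, v in enumerate(seg))
            power = abs(coef) ** 2 / scale
            # Double interior bins to fold negative frequencies into one side.
            psd[k] += power if k == 0 or 2 * k == nperseg else 2 * power
    freqs = [k * fs / nperseg for k in range(nfreq)]
    return freqs, [p / len(starts) for p in psd]


def simpson(y, dx):
    """Composite Simpson rule; a trailing odd interval falls back to the trapezoid rule."""
    n = len(y) - 1  # number of intervals
    total = 0.0
    if n >= 1 and n % 2 == 1:
        total += 0.5 * dx * (y[-2] + y[-1])
        y, n = y[:-1], n - 1
    if n >= 2:
        total += dx / 3 * (y[0] + y[-1] + 4 * sum(y[1:-1:2]) + 2 * sum(y[2:-1:2]))
    return total


def band_power(freqs, psd, lo, hi):
    """Integrate the PSD over the bins that fall inside [lo, hi] Hz."""
    ys = [p for f, p in zip(freqs, psd) if lo <= f <= hi]
    return simpson(ys, freqs[1] - freqs[0])


fs, nperseg = 256, 256  # assumed sampling rate; 1-second segments give 1-Hz bins
t = [n / fs for n in range(2 * fs)]
signals = {  # synthetic 10-Hz (alpha-band) sine waves standing in for EEG channels
    "F3": [1.0 * math.sin(2 * math.pi * 10 * s) for s in t],
    "F4": [1.5 * math.sin(2 * math.pi * 10 * s) for s in t],
}

features = {}
for site, signal in signals.items():
    freqs, psd = welch_psd(signal, fs, nperseg)
    total = band_power(freqs, psd, 0.5, 100)  # assumed denominator for relative power
    for band, (lo, hi) in BANDS.items():
        absolute = band_power(freqs, psd, lo, hi)
        features[f"{site} {band} absolute"] = absolute
        features[f"{site} {band} relative"] = absolute / total

# Right (F4) minus left (F3) alpha power, and beta divided by alpha at one site.
features["frontal alpha asymmetry"] = features["F4 alpha absolute"] - features["F3 alpha absolute"]
features["F3 beta-alpha ratio"] = features["F3 beta absolute"] / features["F3 alpha absolute"]
```

Because both synthetic channels contain only a 10-Hz component, nearly all power lands in the alpha band, and the asymmetry feature is positive because the F4 amplitude was set higher than the F3 amplitude.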

ElecTreeScore Algorithm
We developed ElecTreeScore, a machine learning model using GBDTs for the task of predicting improvement in individual symptoms using pretreatment EEG and baseline HRSD scores. Gradient-boosted decision trees are a type of machine learning model that can capture nonlinear associations in data that traditional linear models are unable to capture and can handle mixes of categorical and continuous covariates. 31 The training procedure for GBDTs involves the construction of an ensemble of decision trees such that each tree learns from the errors of the prior tree to iteratively improve predictions. 32 Concretely, with each iteration, a new tree is constructed by sampling from the data and first identifying which variable most effectively divides the members into groups with low within-group variation in symptom improvement and high between-group variation in symptom improvement; then, the variable selection process is repeated to further divide each resulting subset of the data, producing a series of branches in the decision tree. The next tree is fit using the same process on the residuals of the previous learner. The implementation details for the model are provided in the eAppendix in the Supplement.
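The residual-fitting loop described above can be illustrated with a deliberately small sketch. It is not the ElecTreeScore implementation: it assumes squared-error loss, uses depth-1 trees (stumps) instead of deeper trees, omits the row sampling mentioned in the text, and runs on invented data in which the first column plays the role of a baseline symptom score and the second a hypothetical EEG feature.

```python
def fit_stump(X, residuals):
    """Best single split (feature index, threshold, left mean, right mean) by squared error."""
    best = None
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [r for row, r in zip(X, residuals) if row[j] <= threshold]
            right = [r for row, r in zip(X, residuals) if row[j] > threshold]
            left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
            sse = (sum((r - left_mean) ** 2 for r in left)
                   + sum((r - right_mean) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, threshold, left_mean, right_mean)
    return None if best is None else best[1:]


def fit_gbdt(X, y, n_trees=100, learning_rate=0.1):
    """Start from the mean outcome, then repeatedly fit a stump to the current residuals."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(X, residuals)
        if stump is None:  # no feature has two distinct values to split on
            break
        j, threshold, left_mean, right_mean = stump
        stumps.append(stump)
        preds = [p + learning_rate * (left_mean if row[j] <= threshold else right_mean)
                 for p, row in zip(preds, X)]
    return base, learning_rate, stumps


def predict(model, row):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(
        left if row[j] <= threshold else right for j, threshold, left, right in stumps)


def mse(y, preds):
    return sum((a - b) ** 2 for a, b in zip(y, preds)) / len(y)


# Invented data: [baseline symptom score, hypothetical EEG feature] -> change in score.
X = [[0, 0.2], [1, 0.1], [2, 0.9], [3, 0.4], [4, 0.5], [2, 0.3]]
y = [0, 0, -1, -2, -3, -1]

model = fit_gbdt(X, y)
mse_mean_only = mse(y, [sum(y) / len(y)] * len(y))
mse_boosted = mse(y, [predict(model, row) for row in X])
```

The learning rate shrinks each tree's contribution, which is why many small corrections are accumulated rather than one large one; the comparison here is on the training data only and says nothing about out-of-sample performance.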
We trained GBDTs for each of the 21 HRSD categories across several possible combinations of both input features and parameters for the model. Models were trained on valid combinations of EEG bands, relative and absolute power for frequency bands, electrode site-specific features, and asymmetry features. The combination process first chooses whether to use relative or absolute power, then iterates over combinations of EEG bands, including alpha, beta, delta, theta, and gamma bands (1 possible selection is choosing only alpha and beta bands). Finally, the process iterates over regions where EEG bands are obtained, namely the frontal and occipital regions. After the EEG feature selection process, a list of input features, such as "Fz alpha absolute," was chosen by the algorithm. We use terms such as "Fz alpha absolute" as abbreviations to communicate which regions, bands, and power metric (absolute or relative) are reported in the results. Coupled with the input feature search is a grid search across GBDT parameters, including the number of estimators, the maximum depth of each tree, and the number of leaves. The possible combinations of both input features and parameters for the models, as well as the details for the stratified k-fold validation, are provided in the eAppendix in the Supplement.
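The joint search over input features and model parameters can be enumerated as in the following sketch. The band names and electrode sites come from the text; the hyperparameter values in `PARAM_GRID` and the exact set of region choices are assumptions made for illustration, since the actual grid is given only in the eAppendix.

```python
from itertools import combinations, product

BANDS = ("delta", "theta", "alpha", "beta", "gamma")
REGIONS = {
    "frontal": ("F7", "F3", "Fz", "F4", "F8"),
    "occipital": ("O1", "Oz", "O2"),
}
# Hypothetical hyperparameter grid; the values used in the study are not stated here.
PARAM_GRID = {"n_estimators": (50, 100), "max_depth": (2, 3), "num_leaves": (4, 8)}


def nonempty_subsets(items):
    """Every non-empty combination of the given items, smallest first."""
    for size in range(1, len(items) + 1):
        yield from combinations(items, size)


def feature_names(power, bands, regions):
    """Names in the 'site band power' form used in the text, e.g. 'Fz alpha absolute'."""
    sites = [site for region in regions for site in REGIONS[region]]
    return [f"{site} {band} {power}" for site in sites for band in bands]


def search_space():
    """Yield (input feature list, parameter dict) for every configuration to train."""
    param_sets = [dict(zip(PARAM_GRID, values)) for values in product(*PARAM_GRID.values())]
    feature_choices = product(("absolute", "relative"),
                              nonempty_subsets(BANDS),
                              nonempty_subsets(tuple(REGIONS)))
    for power, bands, regions in feature_choices:
        for params in param_sets:
            yield feature_names(power, bands, regions), params


configurations = list(search_space())
example = feature_names("absolute", ("alpha", "beta"), ("frontal",))
```

Under these assumptions there are 2 power types, 31 band subsets, 3 region choices, and 8 parameter sets, so each symptom model would be selected from 1488 candidate configurations.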

Statistical Analysis
Statistical analysis was conducted from January 5 to June 30, 2019. We evaluated the performance of the improvement prediction models on their discriminative ability. Discrimination measures a predictor's ability to separate patients with different responses. The C index, a widely applicable measure of predictive discrimination and a generalization of the area under the receiver operating characteristic curve statistic, is defined as the proportion of all usable patient pairs in which the predictions and outcomes are concordant. 33 Concretely, the C index can be interpreted as the probability that the algorithm will correctly identify, given 2 random patients with different improvement levels, which patient showed greater improvement. We also reported model goodness of fit using the coefficient of determination (R²) and the mean absolute error using output after model calibration. The calibration is computed between the training outputs of the GBDT and the corresponding ground truth values. A linear regression with an L1 regularization coefficient of 0.01 was chosen to be the calibration model. We have also reported model calibration using regression slope and intercept. We computed 95% CIs for these metrics using the nonparametric bootstrap with 1000 bootstrap replicates.

JAMA Network Open | Psychiatry
The model was trained and validated using stratified k-fold cross-validation with k set to 5. In this procedure, the data set was randomly partitioned into 5 equally sized subsamples (with no patient overlap) consisting of an approximately equal percentage of each class. In the cross-validation procedure, of the k subsamples, a single subsample was retained as the validation data for testing the model, and the remaining k − 1 subsamples were used as training data. The cross-validation process was then repeated k times, with each of the k subsamples used exactly once as the validation data. The predictions on the k subsamples were then pooled, and the C index was computed; we assessed the variability in our estimates of the C index by using the nonparametric bootstrap with 1000 bootstrap replicates on the pooled cohort.
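The evaluation loop described above (stratified folds, pooled out-of-fold predictions, C index, percentile bootstrap) can be sketched as follows. Several details are assumptions: prediction ties are given half credit in the C index, folds are stratified on the outcome value, the "model" is a trivial stand-in that predicts the mean training-fold change for patients with the same baseline score, and the data are randomly generated rather than taken from the study.

```python
import random
from itertools import combinations


def c_index(y_true, y_pred):
    """Share of usable pairs (different outcomes) whose predictions are ordered the same way."""
    concordant = usable = 0.0
    for (t1, p1), (t2, p2) in combinations(zip(y_true, y_pred), 2):
        if t1 == t2:
            continue  # pairs with equal outcomes are not usable
        usable += 1
        if p1 == p2:
            concordant += 0.5  # assumed convention: tied predictions get half credit
        elif (p1 < p2) == (t1 < t2):
            concordant += 1
    return concordant / usable if usable else float("nan")


def stratified_folds(labels, k=5, seed=0):
    """Deal shuffled indices of each class round-robin into k folds."""
    rng = random.Random(seed)
    by_class = {}
    for i, label in enumerate(labels):
        by_class.setdefault(label, []).append(i)
    folds = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        for pos, i in enumerate(members):
            folds[(offset + pos) % k].append(i)
        offset += len(members)
    return folds


def bootstrap_ci(y_true, y_pred, n_boot=1000, seed=0):
    """Percentile 95% CI of the C index from resampling patients with replacement."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        c = c_index([y_true[i] for i in idx], [y_pred[i] for i in idx])
        if c == c:  # drop NaN replicates that contain no usable pairs
            stats.append(c)
    stats.sort()
    return stats[int(0.025 * len(stats))], stats[int(0.975 * len(stats)) - 1]


# Invented data: baseline item score and change at week 8 (negative = improved).
rng = random.Random(7)
baseline = [rng.randint(0, 4) for _ in range(60)]
change = [-min(b, rng.randint(0, 2)) for b in baseline]

folds = stratified_folds(change, k=5, seed=1)
pooled_pred = [None] * len(change)
for held_out in folds:
    held = set(held_out)
    train = [i for i in range(len(change)) if i not in held]
    overall = sum(change[i] for i in train) / len(train)
    for i in held_out:
        # Stand-in model: mean training change among patients with the same baseline score.
        same = [change[j] for j in train if baseline[j] == baseline[i]]
        pooled_pred[i] = sum(same) / len(same) if same else overall

pooled_c = c_index(change, pooled_pred)
ci_low, ci_high = bootstrap_ci(change, pooled_pred, n_boot=200, seed=2)
```

The demo uses 200 replicates only to keep the run short; the text specifies 1000. Each patient receives exactly one prediction, made by a model that never saw that patient during training, before the C index is computed on the pooled set.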

Feature Importances
We used SHAP (Shapley Additive Explanations) to quantify the effect of each feature on the models. 34 Shapley values explain a prediction by allocating credit among the various input features (such as "Fz alpha absolute," interpreted as "absolute alpha bandpower at the medial frontal [Fz] site"); feature credit is calculated as the change in the expected value of the model's prediction of improvement for a symptom when a feature is observed vs unknown. To uncover clinically important EEG features that were globally predictive of the improvement for each of the individual symptoms on the HRSD, we aggregated the Shapley values for features on individual predictions and reported the top features per model along with their averaged Shapley contributions as a percentage of the associations of all the features.
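The idea of allocating credit and then aggregating it into percentages can be shown with a brute-force calculation. This sketch does not use the SHAP library and is not the study's procedure: it computes exact Shapley values by enumerating feature subsets, treats an "unknown" feature as one averaged over a small background set, and uses an invented three-feature model in which the third feature is deliberately ignored.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, background):
    """Exact Shapley values for the prediction f(x).

    A feature outside the 'known' set takes its value from each background row
    in turn, and the predictions are averaged (an assumed way to model 'unknown').
    """
    n = len(x)

    def value(known):
        total = 0.0
        for b in background:
            total += f([x[i] if i in known else b[i] for i in range(n)])
        return total / len(background)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                known = set(subset)
                phi[i] += weight * (value(known | {i}) - value(known))
    return phi


def model(z):
    """Hypothetical stand-in for a fitted model; the third feature is unused."""
    baseline_score, eeg_power, unused = z
    return -0.8 * baseline_score + (0.5 if eeg_power > 1.0 else 0.0)


# Invented rows: [baseline symptom score, hypothetical EEG power, irrelevant feature].
rows = [[0, 0.4, 5.0], [1, 1.6, 2.0], [3, 0.9, 7.0], [4, 2.2, 1.0]]

all_phi = [shapley_values(model, row, rows) for row in rows]
mean_abs = [sum(abs(phi[i]) for phi in all_phi) / len(all_phi) for i in range(3)]
importance_pct = [100 * m / sum(mean_abs) for m in mean_abs]
```

For each row the Shapley values sum to the difference between that row's prediction and the average prediction over the background set, and the percentages give each feature's share of the mean absolute contribution. Subset enumeration grows exponentially with the number of features, so it is practical only for toy examples like this one.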

Using Both EEG and Baseline Symptoms vs Using Baseline Symptoms Alone
We assessed whether the combination of baseline symptom scores and EEG features provides additional predictive value for symptom improvement compared with the baseline symptom scores alone. Thus, for each symptom, we trained additional models that used only the baseline symptom scores as input. We computed the increase in the C index of the default (EEG + HRSD) models compared with models that contained only baseline symptom scores.
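One way to attach an interval to the increase in the C index is a paired bootstrap, sketched below. This is an assumption about method rather than a description of the study's procedure: both models are scored on the same resampled patients in each replicate, prediction ties get half credit, and the outcomes and predictions are randomly generated for illustration.

```python
import random
from itertools import combinations


def c_index(y_true, y_pred):
    """Share of usable pairs (different outcomes) ranked in the same order by the predictions."""
    concordant = usable = 0.0
    for (t1, p1), (t2, p2) in combinations(zip(y_true, y_pred), 2):
        if t1 == t2:
            continue
        usable += 1
        if p1 == p2:
            concordant += 0.5
        elif (p1 < p2) == (t1 < t2):
            concordant += 1
    return concordant / usable


def c_index_gain_ci(y_true, pred_full, pred_base, n_boot=1000, seed=0):
    """Point estimate and percentile 95% CI for C(full model) minus C(baseline-only model)."""
    rng = random.Random(seed)
    n = len(y_true)
    gains = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        y = [y_true[i] for i in idx]
        if len(set(y)) < 2:
            continue  # no usable pairs in this replicate
        full = c_index(y, [pred_full[i] for i in idx])
        base = c_index(y, [pred_base[i] for i in idx])
        gains.append(full - base)
    gains.sort()
    point = c_index(y_true, pred_full) - c_index(y_true, pred_base)
    return point, gains[int(0.025 * len(gains))], gains[int(0.975 * len(gains)) - 1]


# Invented pooled predictions from two hypothetical models for one symptom.
rng = random.Random(11)
y = [rng.randint(-3, 0) for _ in range(40)]
pred_hrsd_only = [v + rng.gauss(0, 1.5) for v in y]
pred_eeg_plus_hrsd = [v + rng.gauss(0, 0.5) for v in y]

gain, gain_low, gain_high = c_index_gain_ci(y, pred_eeg_plus_hrsd, pred_hrsd_only, n_boot=300)
```

Under this reading, an increase would be called significant when the interval excludes 0; resampling the same patients for both models keeps the comparison paired.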

Incorporation of Treatment Group
As an exploratory analysis, we assessed whether the incorporation of the treatment group would increase the performance of the models in the prediction of symptom improvement. For each item, we retrained the model with inclusion of 3 binary features indicating the presence of each treatment, using the same EEG input features as in the model without the treatment group, and tuning the model across the same grid search parameters. We computed the difference in the C index of the models with and without the additional treatment features.
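The 3 binary treatment features amount to a one-hot encoding of the treatment arm, as in this small sketch. The arm labels and the feature values are placeholders invented for the example; only the encoding pattern is taken from the text.

```python
# Placeholder labels for the 3 treatment arms (hypothetical names).
ARMS = ("arm_a", "arm_b", "arm_c")


def add_treatment_features(features, arm):
    """Return a copy of the feature dict with one binary indicator per treatment arm."""
    if arm not in ARMS:
        raise ValueError(f"unknown treatment arm: {arm!r}")
    encoded = dict(features)
    for name in ARMS:
        encoded[f"treatment_{name}"] = 1 if arm == name else 0
    return encoded


# Invented feature values for one patient.
patient = {"baseline loss of insight": 1, "O1 delta absolute": 3.2}
with_treatment = add_treatment_features(patient, "arm_b")
```

The EEG and symptom inputs are left unchanged, so any difference in the C index between the two models can be attributed to the added treatment indicators.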
Our implementation used Python, version 3.

Results
The

Feature Importance
The most important feature for each symptom was the score of that symptom at baseline. The importance of the baseline symptom score was higher than 20% on all symptoms, with the highest association for waking early (64.3%), and lowest association for depressed mood (23.2%) (Table 2).
On 10 symptoms, prediction of improvement in a particular symptom involved associations from other symptoms as 1 of the 3 most important features, with the highest association of nighttime awakening (9.2% importance) with the prediction of improvement on the obsessive thoughts symptom.
The importance of any single EEG feature was higher than 5% for prediction of 7 symptoms (trouble sleeping, weight loss, agitation, worrying, obsessive thoughts, health preoccupation, and loss of insight), indicating the potential independent associations of pretreatment EEG. The most important EEG features were the absolute delta band power at the occipital electrode sites (O1, 18.8%; and Oz, 6.7%) for loss of insight (Table 2). Other notable EEG features included absolute occipital (O1) theta power for predicting improvement in obsessive thoughts (7.3%), relative central (C4) theta power for improvement in health preoccupation (6.8%), absolute temporal (T7 and T3) alpha power for improvement in trouble sleeping (6.7%), absolute occipital (Oz) alpha power for

Using Both EEG and Baseline Symptoms vs Using Baseline Symptoms Alone
Over and above the use of baseline symptom scores alone, the use of both EEG and baseline symptom features produced a significant increase in the C index for improvement in 4 symptoms, including energy loss (C index increase, 0.035 [95% CI, 0.011-0.059]), appetite changes (C index increase, 0.017 [95% CI, 0.003-0.030]), psychomotor retardation (C index increase, 0.020 [95% CI, 0.008-0.032]), and loss of insight (C index increase, 0.012 [95% CI, 0.001-0.020]) (Table 3). On the R² metric, for loss of insight, use of both EEG and baseline symptom features produced an R² of 0.551 (95% CI, 0.473-0.639), significantly higher than the R² of 0.375 (95% CI, 0.31-0.448) produced by the use of the baseline symptom features alone. The differences for individual symptoms are reported in Table 3, and the absolute performances under both conditions are detailed in eTables 2, 3, 4, 5, 6, and 7 in the Supplement.

Association of Treatment Group
There was no significant increase detected in the C index of any of the 21 items with the inclusion of the treatment group feature. The performances of the models for individual symptoms are reported in the eFigure and eTable 8 in the Supplement.

Discussion
In this study, we developed a learning algorithm, ElecTreeScore, to evaluate the association of objective EEG measures acquired before treatment with the prediction of acute antidepressant response for individual symptoms of depression. Under this approach, we took into account the important associations between baseline severity and treatment-associated change in symptoms and considered the association of EEG features in their own right and to what extent EEG features have a meaningful association with outcomes in addition to symptom severity.
Our machine learning approach resulted in 3 main findings. First, we found that different specific topologic characteristics and frequencies of neural activity assessed by the EEG were important for the prediction of antidepressant-associated improvement in specific symptoms in models with high discriminative performance. Second, although we found that baseline scores for individual symptoms of depression are strong predictors by themselves, as expected, we also found that EEG features add 5% or more in importance to the discriminative performance for 7 of the symptoms: trouble sleeping, weight loss, agitation, worrying, obsessive thoughts, health preoccupation, and loss of insight. Third, we demonstrated the value of the pretreatment EEG features in predicting improvement in a subset of specific depressive symptoms (loss of insight, energy loss, appetite changes, and psychomotor retardation) significantly better than with pretreatment symptom severity alone.

Figure 2 legend (panel labels: HRSD baseline scores, EEG features). Left, the electroencephalographic (EEG) features and Hamilton Rating Scale for Depression (HRSD) baseline features for the test patient at baseline. Four of the HRSD features and 4 of the EEG features are depicted as examples. Right, one of the decision trees used by ElecTreeScore to make its prediction. The light gray boxes correspond to decision points where left branches are followed when the feature value is smaller than the decision boundary, while right branches are followed when the feature value is larger than the decision boundary. The other boxes, in different, darker shades of gray, correspond to the level of treatment response predicted by the model. The categories of "none," "low," "medium," and "high" are used for the purposes of visualizing and communicating the results, without losing the essence of the statistical findings.
As expected, the most important feature was the score of the symptom at baseline, as seen when comparing the discriminative performance of training on only the EEG features with that of also adding the HRSD survey scores as inputs. However, our machine learning model suggests that EEG features are meaningfully associated with predicting individual symptom improvement both in combination with baseline symptom severity and over and above symptom severity as independent predictors. To identify independent predictors, we evaluated the addition of EEG features to baseline symptom severity and, in this model, 4 categories saw a significant increase in discriminative power: energy loss, psychomotor retardation, appetite changes, and loss of insight. Previous studies, with few exceptions, 35 have focused on using EEG features to predict response or remission, which are defined by differences in summed symptom scores, 24,36,37 and have yielded mixed outcomes. 20,22 Electroencephalographic features that predict the change in summed symptom scores may not be replicated across populations of depression in which the primary depressive symptoms are highly heterogeneous; thus, our findings offer an indication that the use of individual symptoms may be one means to address the replication gap in evaluating the potential value of EEG biomarkers of treatment outcomes in future studies. This approach might also help determine if EEG features add value to the previous suggestion that symptoms may have a differential rate of improvement. 10,13
Our results expand our growing knowledge of the neurobiology of depression by revealing the relative importance of specific EEG markers in predicting treatment-associated changes in specific symptom domains beyond the association of baseline symptoms alone. In particular, we observed that prediction of treatment-associated changes in psychomotor retardation, energy loss, appetite changes, and loss of insight is improved significantly with the inclusion of EEG features, with parietal alpha power providing the largest association for psychomotor retardation, parietal delta power providing the largest association for energy loss, frontal alpha power providing the largest association for appetite changes, and occipital delta power providing the largest association for loss of insight. These associations of baseline EEG markers build on findings for the implication of EEG marker abnormalities in depression and point to future lines of investigation for treatment trials. For example, hedonic hunger signals and altered eating behaviors have been previously associated with frontal alpha power 38 ; our finding that occipital delta power is substantially associated with improvement in the symptom of loss of insight is in accordance with prior work showing altered delta power in depression. 39,40 Loss of insight is implicated in higher risks of suicide and self-harm 41 and delayed treatment seeking 42-45 ; in this context, we speculate that knowing about pretreatment delta power might be of use in identifying an important feature for treatment in patients at risk of a poor prognosis. Energy loss and psychomotor retardation are also implicated in anhedonic forms of depression that have a poor prognosis. Together, these findings suggest that changes in specific pretreatment EEG features are not just implicated in the pathophysiological characteristics of depression, but may be associated with antidepressant response in specific symptoms. Our models therefore generate testable hypotheses about the potential mechanisms of symptom change over time that may be tested in future studies.

Table 3 footnote a: Positive means that performance was higher with both sets of features included.
In our exploratory analyses, we did not find evidence that the inclusion of treatment group significantly improved model performance. This finding suggests that the EEG markers associated with changes in symptom scores were general predictors of treatment outcome rather than differentiating response among the treatment types. In a previous functional neuroimaging study of a subset of this sample, resting-state predictors were also robust, general predictors of treatment outcome. 46 By contrast, specific task-evoked markers have been found to be differential predictors of response to different treatments. 47,48 Therefore, future studies may investigate task-evoked EEG markers in determining differential treatment response.
Although EEG offers one of the most proximal measures of neural function, there have been barriers to its use as a pertinent objective predictor of antidepressant response. 36,37 A recent meta-analysis reported that only 6 of 71 studies of EEG markers and antidepressant outcomes were studied with cross-validation or another out-of-sample verification. 17 As the field develops, and the opportunity for acquiring larger samples becomes feasible, we can further address the understandable power constraints of these foundational studies. Prior treatment studies have also understandably focused on response outcomes based on averaged symptom ratings. It is notable that prediction by EEG markers in our model was specific to individual symptoms. Evaluation of individual symptoms (rather than summed severity scores) may thus be valuable in the future application of machine learning with biomarkers such as those derived from EEG recordings. Because direct symptom measurement is increasingly included as a routine part of clinical psychiatry, 49 it is feasible to consider how clinicians of the future will have access to symptom profiles linked to biomarkers through machine learning algorithms. A first-use case might be for detection of high-risk patients; for example, symptoms such as loss of touch with reality (loss of insight, and unreality and nihilism) are included in primary care guidelines as an indication of elevated suicide risk, 41 for which same-day mental health care is recommended.
Regarding clinical applications in treatment management, our models provide a first proof of principle that noninvasive neurobiological markers and pretreatment symptom assessments may be used to determine whether specific symptom domains are likely to persist with standard antidepressant treatment. Currently, only approximately 30% of patients recover with the first antidepressant treatment attempted, and only approximately one-half of patients show some symptom response. 29 Physicians lack algorithmic support for determining who will respond to available treatments, as well as a means to select between them. To reduce patients' burden of trying multiple rounds of unsuccessful treatments (often associated with worsening of symptom severity), models such as ours, when validated in prospective clinical settings, could be used to predict outcomes ahead of time. Future studies may attempt to recruit individuals with a more constrained definition of baseline severity in specific symptom domains (eg, balanced samples with exceptionally high or low scores on 1 symptom domain) to determine more directly the maximum additional benefit of EEG markers once the variance of baseline severity has been more constrained. These results bring us closer to a future of using predictive models to guide individualized treatment strategies on the basis of specific symptom domains in combination with objective markers.

Limitations
This study has several important limitations. First, we only explored the interactions of markers from EEGs recorded with eyes closed; this decision was based on previous literature, but using EEGs recorded with eyes open is an area of further investigation. Second, while we found that the model was a general predictor of response across treatments, we did not perform a subgroup analysis of performance on each treatment or analyze the performance of models separately trained on each treatment, which may be able to capture adverse effects associated with certain antidepressants.
Third, we did not evaluate the performance of our algorithm for other treatments for depression (such as repetitive transcranial magnetic stimulation, for which EEG markers may also be able to predict response) 17 or for treatments that add a second medication to an initial, ineffective antidepressant drug. 42 Fourth, the absence of a placebo means that we are unable to determine with our present models whether the changes in symptoms observed are specifically caused by the antidepressant treatments used, but future studies may use our modeling approach to address this possibility in placebo-controlled trials. Fifth, our models have been validated retrospectively, and on the same data set (iSPOT-D) on which the model was developed, necessarily so given the limited availability of large data sets with pretreatment EEG recordings with associated pretreatment and posttreatment scores. Future studies should investigate the utility of ElecTreeScore in prospective data sets to advance the translational goal of application for clinical use.

Conclusions
A machine learning model was developed to predict improvement of specific symptoms associated with antidepressants using symptom ratings and EEG measures acquired at the pretreatment baseline. We found that the model had high discriminative performance for identifying improvement in specific symptoms, reflected in high C index scores of 0.8 or higher on 12 of 21 clinician-rated symptoms. The most important feature in the prediction of symptom improvements was the symptom score at baseline, whereas EEG features had smaller but meaningful associations with the prediction of specific symptom improvements. Overall, our findings build on prior work in 2 key ways: first, by demonstrating that predictive models can capitalize on established roles for using EEG markers to quantify neural activity in psychiatric illness to predict treatment-associated changes over time, and second, by explicitly using individual symptoms as independent outcome variables, to parse the extreme heterogeneity of major depression. Future work should investigate the performance of this model prospectively and in application to independent samples and clinical settings.

Figure 1. Patient Flow Diagram (first box: 1008 patients assessed)

Figure 2. The ElecTreeScore Algorithm Applied to a Sample Patient in the Data Set to Predict Level of Improvement of the Loss of Insight Depressive Symptom

The machine learning model achieved C index scores, indicative of discriminative performance, of 0.8 or higher on 12 of 21 clinician-rated symptoms. The highest C index scores for prediction of improvement were for the following symptoms: loss of insight (C index, 0.963 [95% CI, 0.939-1.000]), unreality and nihilism (C index, 0.951 [95% CI, 0.932-0.976]), and weight loss (C index, 0.923 [95% CI, 0.896-0.953]) (Table 2). The lowest C index scores were for the following machine learning model on each symptom are detailed in Table 2. An example of the machine learning model applied to a sample patient in the data set is illustrated in Figure 2.

Table 1. Distribution of the Improvement Outcome (Symptom Score at Week 8 Minus Symptom Score at Baseline) on Each of 21 Symptoms on the HRSD-21 Report in the Data Set
a Negative values for mean magnitude change are indicative of improvement in symptoms.

Table 2. Performance of Machine Learning Model on Predicting the Improvement for Each Symptom of the HRSD-21 Depression Assessment Scale Using Pretreatment EEG Features and Baseline HRSD-21 Scores
a The 3 most important features for each model, and their relative contributions computed using Shapley values, are reported.

Table 3. Difference in C Index on the Prediction Task Using Combinations of HRSD-21 and EEG Features

48. Tozzi L, Goldstein-Piekarski AN, Korgaonkar MS, Williams LM. Connectivity of the cognitive control network during response inhibition as a predictive and response biomarker in major depression: evidence from a randomized clinical trial. Biol Psychiatry. 2020;87(5):462-472. doi:10.1016/j.biopsych.2019.08.005
49. Kroenke K, Spitzer RL, Williams JBW. The PHQ-9: validity of a brief depression severity measure. J Gen Intern Med. 2001;16(9):606-613. doi:10.1046/j.1525-1497.2001.016009606.x

eTable 1. Mean and Standard Deviation Scores for Each of the 21 Items on the HRSD-21 Report at the Baseline Visit, the Week 8 Clinical Visit on the Entire Dataset
eTable 2. The C-Indices of the Machine Learning Models on the Improvement Prediction (Reduction in HRSD Score) Using Baseline HRSD Features With and Without the EEG Features (Positive Means That Performance Was Higher With the EEG Features Included)
eTable 3. The C-Indices of the Machine Learning Models on the Improvement Prediction Task (Reduction in HRSD Score) Using Baseline EEG Features With and Without HRSD Features (Positive Means That Performance Was Higher With the HRSD Features Included)
eTable 4. Comparison of R2 Score Computed on Calibrated Machine Learning Model Predictions
eTable 5. Comparison of MAE Score Computed on Calibrated Machine Learning Model Predictions
eTable 6. Comparison of Regression Slope Computed on Calibrated Machine Learning Model Predictions
eTable 7. Comparison of Regression Intercept Computed on Calibrated Machine Learning Model Predictions
eTable 8. Short Notation of HRSD Targets, Used in eFigure
eFigure. Visualization of Comparison of Confidence Interval Between Models That Use One-Hot Encoded Treatment Arm as Input and Models That Do Not Use Treatment Information
eAppendix. Supplementary Information