Reinforcement Learning Disruptions in Individuals With Depression and Sensitivity to Symptom Change Following Cognitive Behavioral Therapy

Key Points
Question Are depression symptoms associated with features of reinforcement learning, and if so, is treatment-related symptom change associated with learning changes?
Findings In this mixed cross-sectional–cohort study including 101 participants, participants with and without depression completed a probabilistic learning task during functional magnetic resonance imaging; participants with depression were reassessed after cognitive behavioral therapy (CBT). Computational model–based analyses of behavioral choices and neural data identified associations of learning with symptoms during reward learning and loss learning, respectively; symptom improvement following CBT was associated with normalization of learning parameters.
Meaning Mapping reinforcement learning processes to symptoms of depression reveals mechanistic features of these symptoms and points to possible learning-based therapeutic processes and targets.

Major depressive disorder affects approximately 7% of people in the US each year 1 and is among the highest causes of disability in the world. 2 However, characterizing and treating major depressive disorder is hampered by significant symptom heterogeneity. 3 Recent paradigms 4 suggest that moving beyond diagnostic status to focus on associations of major depression's central impairments of anhedonia and negative affect 5 with neurocomputational substrates of reinforcement learning 6,7 may more precisely identify disrupted processes in individuals with depression and novel treatment targets. To that end, we sought to investigate the association of computational model-derived learning impairments with canonical depression symptoms and tested the translational relevance of these impairments by examining their responsiveness to symptom change after cognitive behavioral therapy (CBT).
According to computational formalizations of reinforcement learning, expectations about the outcomes of choices are updated based on prediction errors. [7][8][9] This framework separates learning into computationally derived components associated with behaviorally and neurobiologically distinguishable processes (eg, outcome valuation vs expectation updating 10,11 ). Computational model-based analyses differentiate and quantify these learning processes as model parameters that may then be associated with symptoms at the individual level. For depression, this approach has the potential to identify sources of disrupted responsivity to rewards and losses, including altered value updating following feedback, relative valuation of positive or negative feedback, and changes in overall valuation of outcomes. 6 Such learning processes may be affected by both stimulus valence (eg, learning from rewards vs losses) and depression symptoms. 6,[12][13][14][15][16][17][18][19][20][21] With regard to the canonical symptoms of depression, anhedonia (ie, reduced experience of pleasure) affects reward learning more than depression as a whole, 10,[22][23][24][25] while negative affect, characterized by subjective distress and negative cognitions, may be associated with altered loss and error processing. 12,13,18,20 This link between symptom clusters and neurobehavioral alterations is consistent with other work showing symptom, not diagnosis, effects. 26-28 Initial findings combining these literatures 13,17,18,29 suggest valence-dependent roles of learning anomalies in depression, but to our knowledge, no study has fully examined which reward learning and loss learning processes are associated with the core depressive symptoms of anhedonia and negative affect.
Demonstrating sensitivity to symptom change is critical to establishing the translational relevance of biobehavioral markers of psychiatric illness. 30,31 Some evidence suggests that successful depression treatment may normalize reward responses and, in youth, reduce overresponsivity to punishments, 32-34 but how these changes are associated with baseline impairments and whether they map onto mechanistic learning processes are unclear. To address these issues, we examined participants with depression who engaged in CBT, an efficacious psychotherapy theorized to reduce symptoms in part through changing learning, 35-37 and tested whether symptom improvement following CBT was associated with changes in learning components. Given previous work indicating correlated decreases in all symptom measures after CBT, 38,39 these analyses focused on learning parameter changes associated with overall symptom change rather than specific symptom subscales following CBT.
To summarize, we examined participants with and without a depression diagnosis performing reward and loss variants of a learning task while undergoing functional magnetic resonance imaging; a subset of the participants with depression was retested after completing CBT. We hypothesized that distinct processes in reward and loss learning, captured by computational model-derived parameters measuring aspects of updating and valuation and their corresponding neural signals, would be associated with symptoms of anhedonia and negative affect, respectively. Moreover, we posited that changes in these reward and loss learning parameters would be correlated with symptom improvement after treatment.

Study Design and Participants
A total of 101 participants were recruited via community advertisements from southwest Virginia and Houston, Texas. The Baylor College of Medicine and Virginia Tech institutional review boards approved study procedures, and all participants provided written informed consent after receiving a complete description of the study. A total of 69 participants with depression had a primary DSM-IV 40 diagnosis of major depressive disorder or dysthymia, assessed with the Structured Clinical Interview for DSM-IV 41 ; 32 nonpsychiatric control participants had no history of any DSM disorder. Participants completed a battery of measures, including the Mood and Anxiety Symptom Questionnaire (MASQ), 42 a validated self-report measure of symptom clusters of anhedonia (anhedonic depression subscale), negative affect (general distress subscale), and arousal (anxious arousal subscale), which were the primary symptoms of interest, as well as the Beck Depression Inventory-II 43 to assess overall depression severity, the Wechsler Test of Adult Reading 44 to estimate verbal IQ, and a demographics questionnaire. The eMethods and eTable 8 in the Supplement contain further details about participants and measures. This study followed the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) reporting guideline.

Reinforcement Learning Task
Participants completed reward and loss variants of a probabilistic operant learning task (Figure 1A) with the goal of learning which of 2 options was more likely to lead to a higher outcome (larger reward in reward learning blocks or smaller loss in loss learning blocks; Figure 1B) 45 while undergoing functional magnetic resonance imaging. The task was presented in pseudorandomized blocks of trials consisting of all reward outcomes or all loss outcomes (learning curves by baseline symptoms shown in Figure 1C). The eMethods in the Supplement contains further task design details.

Behavioral Analyses at Baseline
Model-based analyses used reinforcement learning models to test hypotheses about potential sources of learning disruptions in participants with depression (updating, relative valuation of more negative or positive outcomes, and overall valuation changes 6,10,46 ). The best-fitting model separated learning by valence (reward vs loss) and included 3 free parameters for both reward learning and loss learning: learning rate (α), which indexed the degree of updating based on prediction error; outcome sensitivity (ρ), which multiplicatively scaled more extreme outcome values, resulting in differential valuation of large vs small outcome values; and outcome shift (τ), which linearly shifted all outcome values, resulting in an overall positive or negative valuation bias. Model validation showed that parameters could be independently estimated, were associated with model-agnostic behavior, had good split-half and test-retest reliability, and were stable over time in participants without depression (eMethods and eFigures 2 to 4 and 12 in the Supplement).

Figure 1. A, Overall task structure. Participants completed trials of the same valence (reward learning or loss learning) with stimuli remaining consistent throughout a block. Once participants had learned stimulus contingencies for a block, a new block began with different stimuli and the other valence (loss or reward). The task ended when participants had at least 25 correct trials and 50 total trials for both reward learning and loss learning (median number of trials completed, 50). B, Schematic depiction of reward learning and loss learning trials. On each trial, participants were presented with 2 abstract stimuli. After choosing a stimulus, the chosen option was highlighted for a brief period and then an outcome (monetary reward or loss) was shown. Participants learned which option led to the better (75% probability of high reward or low loss) outcome. The task involved blocks consisting of trials with all reward (top) and all loss (bottom) outcomes. C, Reward learning and loss learning performance by symptom severity among all 101 participants. Performance was quantified as proportion of choices that were the better option. Over time, participants showed learning (running mean over 3 trials; averaged over all blocks by valence). Top panels comprise reward learning blocks while bottom panels comprise loss learning blocks. Behavior is separated by anhedonia, negative affect, and anxious arousal symptom severity, with participants with symptoms in the lowest tercile marked by a solid line, the middle tercile by a dashed line, and the highest tercile with a dotted line. The lines indicate mean scores, and the shaded areas indicate SEs.
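To make the roles of the learning rate (α), outcome sensitivity (ρ), and outcome shift (τ) parameters concrete, the following is a minimal sketch of one possible reading of the model described under Behavioral Analyses at Baseline. Several pieces are assumptions that the text does not specify: the subjective outcome is taken to be ρ × outcome + τ, choices follow a logistic (softmax) rule with unit temperature, outcome magnitudes are 1 (high) and 0 (low), and the worse option yields the high outcome 25% of the time. The function names are hypothetical, and the exact model used in the study is given in the eMethods.

```python
import math
import random


def subjective_value(outcome, rho, tau):
    """Assumed transform: rho scales the outcome, tau shifts it."""
    return rho * outcome + tau


def rl_update(q, outcome, alpha, rho, tau):
    """One delta-rule update; returns (new expected value, prediction error)."""
    delta = subjective_value(outcome, rho, tau) - q  # prediction error
    return q + alpha * delta, delta


def choice_prob_first(q_first, q_second):
    """Probability of choosing the first option (assumed logistic rule)."""
    return 1.0 / (1.0 + math.exp(-(q_first - q_second)))


def simulate_block(alpha, rho, tau, n_trials=50, p_better=0.75,
                   high=1.0, low=0.0, seed=0):
    """Simulate one block; option 0 is the better option (assumed setup)."""
    rng = random.Random(seed)
    q = [0.0, 0.0]
    n_better = 0
    for _ in range(n_trials):
        choice = 0 if rng.random() < choice_prob_first(q[0], q[1]) else 1
        p_high = p_better if choice == 0 else 1.0 - p_better
        outcome = high if rng.random() < p_high else low
        q[choice], _delta = rl_update(q[choice], outcome, alpha, rho, tau)
        n_better += int(choice == 0)
    return q, n_better / n_trials


if __name__ == "__main__":
    values, prop_better = simulate_block(alpha=0.3, rho=2.0, tau=0.0)
    print(values, prop_better)
```

Under these assumptions, a lower α slows how quickly expected values track outcomes, a larger ρ widens the gap between high and low outcomes, and a more negative τ lowers the valuation of every outcome, which mirrors the verbal definitions of the three parameters.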

Baseline Imaging Analyses
Parametric regressors of interest in first-level imaging analyses used prediction error δ_t at outcome and chosen expected value Q_t at stimulus onset. Analyses focused on meta-analytically-defined 47 regions of interest (ROIs) in ventral striatum and ventromedial prefrontal cortex/subgenual anterior cingulate cortex, primary brain regions implicated in reinforcement learning. 48 Functional magnetic resonance imaging data collection, preprocessing, and further analysis information are contained in the eMethods in the Supplement.

Behavioral Analyses of Changes in Symptoms and Learning Parameters Following Cognitive Behavioral Therapy
A total of 28 participants with depression who elected to engage in a 12-week course of standard, manual-guided CBT 49 were assessed following CBT completion with the same procedures described above (see eFigure 10 in the Supplement for a diagram of participant flow). These analyses assessed associations between symptom improvement, well-established with CBT, 35,50 and learning parameters. Further details on CBT, CBT-related analyses, and symptom-independent parameter change (including stability of parameters in 20 nonpsychiatric control participants) can be found in eMethods and eFigures 8 and 9 in the Supplement.

Statistical Analysis
Associations between symptoms and learning parameters were estimated within models fit to participants' choices. Models were fit using hierarchical Bayesian estimation 51 with data from all participants in the pertinent analysis; significant associations between symptoms and learning parameters were defined as a 95% posterior credible interval (CrI) of the regression coefficient for the association of symptom with learning parameter excluding 0, analogous to a frequentist α of .05. The posterior mean, the posterior mean divided by the posterior standard deviation (approximate standardized regression β), and 95% CrIs are reported. To control for false-positive rates, all associations were assessed at baseline (and similarly for treatment analyses) using hierarchical modeling 52 ; additional Bayesian error control 53 approaches restricted experimentwise error to 5%. Simulation-based power analyses, assuming 80% power, indicated we were powered to detect small effect sizes in regression analyses between symptom severity measures and learning parameters involving all 101 participants and medium effect sizes in analyses of 69 participants with depression only. See the eMethods in the Supplement for modeling details.
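The significance criterion described above can be illustrated with a short sketch that summarizes posterior draws of a regression coefficient. It assumes the draws are already available as a plain list of numbers (for example, exported from a fitted model), uses equal-tailed percentile intervals with linear interpolation, and uses a hypothetical function name; none of these details are specified in the text.

```python
import statistics


def summarize_posterior(samples, level=0.95):
    """Posterior mean, mean/SD, and an equal-tailed credible interval."""
    s = sorted(samples)
    n = len(s)

    def quantile(q):
        # Linear interpolation between neighboring order statistics.
        pos = q * (n - 1)
        i = int(pos)
        if i + 1 >= n:
            return s[-1]
        return s[i] + (pos - i) * (s[i + 1] - s[i])

    tail = (1.0 - level) / 2.0
    lower, upper = quantile(tail), quantile(1.0 - tail)
    mean = statistics.fmean(s)
    sd = statistics.stdev(s)
    return {
        "mean": mean,
        "std_beta": mean / sd,  # approximate standardized coefficient
        "cri": (lower, upper),
        "excludes_zero": lower > 0.0 or upper < 0.0,
    }


if __name__ == "__main__":
    # Illustrative draws only; these are not study data.
    draws = [0.1 + 0.001 * i for i in range(1001)]
    print(summarize_posterior(draws, level=0.95))
```

Passing `level=0.90` gives the narrower interval used for the directional treatment analyses.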
To test the association of neural activation with symptom measures, regressor-related blood oxygenation level-dependent activity (ROI values) was correlated with symptom measures using linear regressions and Pearson correlations. Following previous literature, for reward learning, the moderation of expected value and prediction error-related activity by symptoms was evaluated by testing the interaction of prediction error neural signal and symptoms on expected value neural signal in striatal ROIs. 25 Frequentist analyses used an α level of .05 from 2-tailed tests.
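One common way to write the moderation test described above is as a linear regression across participants with an interaction term. The symbols below are illustrative and not taken from the source: EV_i and PE_i denote participant i's striatal expected value and prediction error signals, and S_i the symptom score.

```latex
\begin{equation}
\mathrm{EV}_i = b_0 + b_1\,\mathrm{PE}_i + b_2\,S_i + b_3\,(\mathrm{PE}_i \times S_i) + \varepsilon_i
\end{equation}
```

Under this reading, moderation by symptoms corresponds to a nonzero interaction coefficient b_3. With 101 participants and 4 coefficients, the residual degrees of freedom would be 101 − 4 = 97.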
Effects of symptom change following CBT (time × symptom analysis) were assessed as the association between changes in learning parameters and symptom changes (improvement pretreatment to posttreatment, similar to a mixed-effects analysis with 2 time points) in the participants with depression, controlling for initial symptom severity. Like baseline analyses, the CrI (set to 90% for significance to reflect directionality of hypotheses), the mean value of this association, and the standardized mean value as a measure of effect size are reported. As symptom subscale changes with CBT are typically highly intercorrelated, 38,39 primary analyses focused on changes in learning parameters against changes in overall symptoms; significant associations were then examined with exploratory analyses within anhedonia, negative affect, and arousal subscales. Frequentist analyses involving symptom change used an α level of .05 and 1-tailed tests based on directional hypotheses. Analyses were carried out in R version 3.6.0 (The R Foundation) and Stan version 2.19 (using the rstan package in R).

Participant Characteristics at Baseline
Of 101 included adults, 69 (68.3%) were female, and the mean (SD) age was 34.4 (11.2) years. A total of 69 participants with a depression diagnosis and 32 participants without a depression diagnosis were included at baseline; 48 participants (28 with depression who received CBT and 20 without depression) were included at follow-up (mean [SD] of 115.1 [15.6] days). Clinical and demographic data are reported in Table 1. As expected, participants with a depression diagnosis had elevated symptoms but did not differ from participants without depression on estimated IQ, age, or self-reported gender.

Baseline Model-Based Analyses
We tested associations of computational model-derived learning parameters (learning rate, outcome sensitivity, and outcome shift) with symptom severity (MASQ subscales of anhedonia, negative affect, and anxious arousal, tested in separate regressions) during reward and loss learning (Table 2). Follow-up analyses simultaneously assessed all 3 MASQ subscales in the same analysis and tested depression diagnosis and Beck Depression Inventory-II score as a measure of overall depression severity (eResults in the Supplement). Analyses were carried out across all participants and within participants with a diagnosis of depression only.

Reward Learning
Behavioral | During reward learning, in participants with depression, greater anhedonia was associated with reduced learning rate, indicating slower updating of reward values with increased anhedonia, and with greater outcome sensitivity parameter values (learning rate: posterior mean regression β = −0.14; 95% CrI, −0.12 to −0.03; outcome sensitivity: posterior mean regression β = 0.18; 95% CrI, 0.02 to 0.37; Figure 2A). These associations were apparent with anhedonia and absent with negative affect, arousal, or depression diagnosis. No measures were associated with changes in the outcome shift parameter. Results were similar when assessing all MASQ scales in the same analysis (eResults in the Supplement). When including participants without depression, these associations were not present.
Neural | Reward prediction error and expected value signals did not vary with any symptom measure or depression diagnosis (meta-analytically defined striatal or ventromedial prefrontal cortex ROIs or exploratory whole-brain analysis; eFigure 6 and eTables 1 to 4 in the Supplement). Following previous work, to assess if anhedonia disrupted associations among otherwise intact striatal signals, 25,54 we investigated associations between prediction error (at outcome) and expected value (at choice) in ventral striatum. In line with this previous work, anhedonia moderated the association between expected value and prediction error signals (interaction: t_97 = −2.10; P = .04; Figure 2B), which was not associated with learning rate differences.

Loss Learning
Behavioral | During loss learning, a different pattern of associations emerged such that negative affect severity was associated with more negative outcome shift parameter values, indicating more negative valuation of losses (outcome shift: posterior mean regression β = −0.11; 95% CrI, −0.20 to −0.01; Figure 2C). This association was present in all participants, regardless of depression diagnosis, and was not observed with anhedonia, arousal, or depression diagnosis. Associations were similar when assessing all MASQ scales simultaneously. No symptom subscales were associated with loss learning rate or outcome sensitivity parameters.
Neural | Prediction error activity in the subgenual anterior cingulate cortex ROI was negatively associated with negative affect (r = −0.28; P = .005; Figure 2D), with no differences in striatal activity or expected value signals. Exploratory follow-up analyses (eMethods, eFigure 7, and eTables 5 to 7 in the Supplement) suggested reduced subgenual anterior cingulate cortex representation of outcome value in participants with high negative affect drove this association. Expected value and the association between expected value and prediction error were unrelated to symptom measures during loss learning.

Associations Between Learning Parameter Changes and Symptom Changes Following Cognitive Behavioral Therapy
The association of baseline symptoms with learning parameters suggested the translational potential of reinforcement learning processes beyond descriptive characterizations of depression. We thus sought to assess whether these altered learning processes were associated with symptom changes following CBT in participants with depression. As expected, after CBT, participants showed large mean decreases in all symptoms (Figure 3A). Consistent with the literature, 55 participants showed heterogeneous degrees of change, enabling investigation of individual differences in symptom change (Figure 3A). As within-participant changes in symptom measures were highly correlated (eg, correlation between change in anhedonia and negative affect: r = 0.62; P < .001), analyses focused on the association of overall improvement (the sum of the anhedonia, negative affect, and arousal scales) with changes in learning parameters (outcome sensitivity, outcome shift, and learning rate during reward learning and loss learning). Where this association was significant, exploratory analyses, reported in eTable 9 in the Supplement, examined change in individual subscales.

Behavioral
As described above, at baseline, reward learning rate was negatively correlated and reward outcome sensitivity positively correlated with anhedonia. Increases in reward learning rate following CBT were significantly associated with overall symptom improvement, including improved anhedonia (reward learning rate: posterior mean regression β = 0.15; 90% CrI, 0.001 to 0.41; Figure 3B; Table 2). Changes in reward outcome sensitivity were not significantly associated with overall symptom improvement. Changes in reward outcome shift, unrelated to symptoms at baseline, were also not associated with symptom change.
During loss learning at baseline, the outcome shift learning parameter showed a negative association with negative affect. Increases in loss outcome shift after CBT were significantly associated with overall symptom improvement, including improved negative affect (loss outcome shift: posterior mean regression β = 0.42; 90% CrI, 0.09 to 0.77; Figure 3B; Table 2). Changes in loss learning rate and outcome sensitivity, which were unrelated to symptoms at baseline, were also not associated with changes in overall symptoms.

Neural
For reward learning, at baseline, anhedonia moderated associations between prediction error and expected value signaling in striatum; following CBT, participants with depression with high anhedonia showed a significant change in the correlation between ventral striatum signaling to prediction error and expected value from pretreatment to posttreatment (Fisher r to z = 1.65; 1-tailed P = .05; eFigure 11A in the Supplement). In participants without depression, this correlation was stable across time (z = 0.46; P = .65), and the overall interaction across all participants was significant (interaction of baseline anhedonia with the association between changes in expected value and prediction error signals in ventral striatum: t_43 = 1.85; 1-tailed P = .04), indicating a shift only in participants with high anhedonia. For loss learning, changes in prediction error signaling in subgenual anterior cingulate cortex (related to negative affect at baseline) were not associated with improvements in negative affect (eFigure 11B in the Supplement).
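The Fisher r-to-z comparison of two correlations mentioned above can be sketched as follows. This is the textbook version for 2 independent correlations; pretreatment and posttreatment correlations from the same participants are not independent, and the text does not state which variant was used, so this should be read as an illustration of the transformation rather than a reproduction of the reported analysis. The function name and example inputs are hypothetical.

```python
import math
from statistics import NormalDist


def fisher_r_to_z_test(r1, n1, r2, n2):
    """Compare 2 correlations; returns (z, one-tailed P, two-tailed P)."""
    z1 = math.atanh(r1)  # Fisher transformation of each correlation
    z2 = math.atanh(r2)
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    p_one = 1.0 - NormalDist().cdf(abs(z))
    return z, p_one, min(1.0, 2.0 * p_one)


if __name__ == "__main__":
    # Illustrative inputs only; these are not study data.
    print(fisher_r_to_z_test(0.6, 28, 0.1, 28))
```

The reported P values follow from the standard normal distribution: z = 1.65 corresponds to a 1-tailed P of approximately .05, and z = 0.46 to a 2-tailed P of approximately .65.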

Discussion
Here, we used a computational model of reinforcement learning to distinguish among learning processes in participants with and without depression and showed, across neural and behavioral levels, associations of anhedonia with reduced updating but greater differentiation of rewards (captured by reward learning rate and outcome sensitivity parameters, respectively) and of negative affect with more negative valuation of losses (captured by the loss outcome shift parameter). Broad symptom improvement after CBT, including improved anhedonia and negative affect, was associated with normalization of reward learning rate and loss outcome shift disruptions, respectively.
Similar to other studies with large patient samples, 25,54,56 for reward learning, we found no support for a reduction in valuation with anhedonia but rather a moderation of neural expected value-prediction error correlations in ventral striatum. These results suggest that highly anhedonic individuals paradoxically process large rewards as more rewarding but then fail to update future reward expectations and are consistent with previous findings of increased immediate responsivity but reduced long-term effects of rewards in individuals with depression. 57,58 During loss learning, participants higher in negative affect showed more negative valuation of outcomes and no learning rate variation, suggesting that maladaptive overresponsivity to losses observed in individuals with depression 12,15,20 may be due to valuing negative feedback more negatively and not to overadjusting following negative feedback. Overall, our findings of altered learning processes at baseline show that depression impairs aspects of reward learning and loss learning but in potentially distinct ways: poor reward learning is due to slow updating, whereas disrupted loss learning results from pessimistic valuation of outcomes. Of interest, the increased reward outcome sensitivity in participants with depression with greater anhedonia is discrepant from previous reports 10,59 that included participants lower in anhedonia or smaller patient samples not permitting dimensional analyses. Future studies are warranted to fully delineate the nature and impact of associations between anhedonia and outcome sensitivity in depression.
Of clinical importance is whether reinforcement learning processes are sensitive to treatment, which would indicate a potential causal relationship between learning changes and symptom improvement. We indeed found that symptom change following CBT was correlated with remediation of altered learning parameters and neural responses. Specifically, greater symptom improvement was accompanied by increased reward learning rate, a normalized association between neural signals of expected value and prediction error during reward learning, and a more positive outcome shift during loss learning. The learning changes have conceptual overlap with CBT's focus on challenging negative evaluations and reflecting on outcomes of pleasurable activities. 49 These associations suggest that model-derived learning parameters go beyond describing alterations at baseline and are sensitive to treatment-induced changes in symptoms. The data may thus inform the future development and testing of symptom-based or parameter-based therapies that directly target behavioral or neural circuits involved in reinforcement learning. Model-derived learning parameters may also be used to improve outcomes and tailor extant treatments to individual symptom presentations (eg, focusing on updating reward expectations in patients high in anhedonia vs more positive valuation of negative outcomes in patients high in negative affect).

Limitations
The limitations of this work warrant attention. First, although our sample size was comparatively large, even larger samples would ensure adequate power to detect smaller effects, particularly with more conservative Bayesian multilevel analyses, 53,60 and to statistically dissociate changes in specific symptom clusters after treatment. Second, while comparable with other work of this scope, our exclusion rate owing to issues with scanning or behavioral data was relatively high and may affect the generalizability of results. Indeed, although participants with depression were not excluded more often than those without depression, excluded participants did have lower estimated IQ. In addition, larger, more clinically heterogeneous samples may be needed to detect symptom subscale-specific changes or reward outcome sensitivity changes following CBT. Future work may also clarify the specificity and sensitivity of learning parameters to change by comparing changes in associations of learning with symptoms between CBT and other treatments or natural variation in symptoms over time.

Conclusions
In individuals with depression, associations between symptoms and disrupted valuation have long been hypothesized but difficult to dissociate. By parsing components of value-based learning in a large, well-characterized sample, we show that associations of learning with symptoms in individuals with depression are present and may vary by valence and learning process. The remediation of computational model-identified learning processes associated with symptom changes after CBT suggests a mechanistic role of learning disruptions in those with depression. More broadly, this work may provide a bridge between behaviorally oriented clinicians and computational (neuro)scientists toward novel integrative ways for understanding and treating depression.