Figure 1. Example of reinforcement learning task stimuli and feedback. A, Feedback delivered after a correct choice (indicated by a blue border) in the reward trials. B, Feedback delivered following an incorrect choice. C, Feedback delivered following a correct choice in the loss-avoidance trials. D, Feedback delivered following an incorrect choice.
Figure 2. Differences in reinforcement learning among patients and healthy control (HC) subjects in 90% and 80% probability gain and loss-avoidance conditions. A and B, Performance in the 90% and 80% gain conditions, respectively. C and D, Performance in the 90% and 80% loss-avoidance conditions, respectively. HNS indicates high-negative symptom; LNS, low-negative symptom.
Figure 3. Performance on the gain and loss-avoidance difference score among patients and healthy control (HC) subjects. The difference score was calculated using block 4 performance. Scores above zero indicate better learning from gain than from loss avoidance, while scores below zero indicate better learning from loss avoidance than from gain. HNS indicates high-negative symptom; LNS, low-negative symptom.
Figure 4. Observed and model simulation results for end acquisition and transfer test phase performance in patients and healthy control (HC) subjects. A and B, Observed (A) and simulated (B) end acquisition performance across groups, showing how the modeled controls had a preference for learning from gains relative to losses, an effect that is reduced in the low-negative symptom (LNS) group and absent in the high-negative symptom (HNS) group. C and D, Observed transfer test phase performance (C) and simulation results (D). Note that the simulations capture the reduced preference for frequent winners (FW) over frequent loss avoiders (FLA) in the HNS group (the only significant difference in the behavioral analyses of the transfer test phase pairs), coupled with a preserved preference for frequent winners over frequent losers (FL) and infrequent winners (IW). All groups and simulated groups show a preference for frequent loss avoiders over infrequent winners, despite having lower expected value.
Figure 5. The relative contribution of Q learning and actor-critic learning to behavioral choices. A, Greater contribution of Q learning in healthy control (HC) subjects relative to the patient groups. Only the contrast between the HC group and the high-negative symptom (HNS) group was statistically significant. B, Predicted performance in a model of pure actor-critic (AC) or pure Q learning (Q) in the 2 diagnostic transfer test phase pairs. The Q model shows clear preference for frequent winners (FW) over frequent loss avoiders (FLA), whereas the actor-critic model does not. The 2 models show opposite preferences for frequent loss avoiders over infrequent winners (IW). One thousand model simulations were run to generate these predictions using parameters fit to the controls, but the pattern is robust to parameter changes. LNS indicates low-negative symptom.
Gold JM, Waltz JA, Matveeva TM, Kasanova Z, Strauss GP, Herbener ES, Collins AGE, Frank MJ. Negative Symptoms and the Failure to Represent the Expected Reward Value of Actions: Behavioral and Computational Modeling Evidence. Arch Gen Psychiatry. 2012;69(2):129-138. doi:10.1001/archgenpsychiatry.2011.1269
Author Affiliations: Department of Psychiatry, Maryland Psychiatric Research Center, University of Maryland School of Medicine, Baltimore (Drs Gold, Waltz, and Strauss and Mss Matveeva and Kasanova); Departments of Psychiatry and Psychology, University of Illinois, Chicago (Dr Herbener); and Departments of Cognitive, Linguistic & Psychological Sciences and Psychiatry and Human Behavior, Brown University, Providence, Rhode Island (Drs Collins and Frank).
Context: Negative symptoms are a core feature of schizophrenia, but their pathogenesis remains unclear. Negative symptoms are defined by the absence of normal function. However, there must be a productive mechanism that leads to this absence.
Objective: To test a reinforcement learning account suggesting that negative symptoms result from a failure in the representation of the expected value of rewards coupled with preserved loss-avoidance learning.
Design: Participants performed a probabilistic reinforcement learning paradigm involving stimulus pairs in which choices resulted in reward or in loss avoidance. Following training, participants indicated their valuation of the stimuli in a transfer test phase. Computational modeling was used to distinguish between alternative accounts of the data.
Setting: A tertiary care research outpatient clinic.
Patients: In total, 47 clinically stable patients with a diagnosis of schizophrenia or schizoaffective disorder and 28 healthy volunteers participated in the study. Patients were divided into a high-negative symptom group and a low-negative symptom group.
Main Outcome Measures: The number of choices leading to reward or loss avoidance, as well as performance in the transfer test phase. Quantitative fits from 3 different models were examined.
Results: Patients in the high-negative symptom group demonstrated impaired learning from rewards but intact loss-avoidance learning and failed to distinguish rewarding stimuli from loss-avoiding stimuli in the transfer test phase. Model fits revealed that patients in the high-negative symptom group were better characterized by an “actor-critic” model, learning stimulus-response associations, whereas control subjects and patients in the low-negative symptom group incorporated expected value of their actions (“Q learning”) into the selection process.
Conclusions: Negative symptoms in schizophrenia are associated with a specific reinforcement learning abnormality: patients with high-negative symptoms do not represent the expected value of rewards when making decisions but learn to avoid punishments through the use of prediction errors. This computational framework offers the potential to understand negative symptoms at a mechanistic level.
In the past decade, interest in the role of deficits in reinforcement learning (RL) and reward processing for understanding the symptoms of schizophrenia has been increasing.1-4 This work has been shaped by studies5,6 of behaving primates showing that the pattern of dopamine cell firing seems to code reward prediction errors (PEs), with cells increasing their phasic firing rates when outcomes are better than expected (positive PEs) and briefly ceasing to fire when outcomes are worse than expected (negative PEs). It is thought that positive PE signals are broadcast to dopamine cell target areas and serve to reinforce currently active motor responses and representations that are associated with better-than-expected outcomes. In contrast, transient cessations in dopamine cell activity indicate that current actions have resulted in poorer-than-expected outcomes and should be avoided. This pattern of dopamine cell firing has been successfully modeled using RL algorithms,7-9 and there is consistent support for the notion that phasic dopamine signals modify synaptic plasticity in corticostriatal circuits associated with action selection.10-12
It is well documented that antipsychotic medications achieve their effect through the blockade of dopamine receptors, supporting the inference that psychosis is linked to excessive dopamine release.13-16 It has been proposed that excessive dopamine cell firing might “reinforce” or inappropriately increase the salience of stimuli and responses, driving aberrant learning processes that contribute to psychosis.3,17 This hypothesis has been supported by empirical evidence indicating associations between abnormal PE signaling and the presence of psychosis18 and the severity of delusions.19
Waltz et al20 argued that the negative symptoms of schizophrenia may be understood as reflecting a different RL abnormality.2 It was found that patients with schizophrenia show reduced learning from positive outcomes compared with control subjects but do not differ from control subjects in learning from negative outcomes.20 This pattern of performance is most pronounced in patients with high-negative symptoms. However, it is unclear whether impairments in learning from positive outcomes reflect impaired learning from positive PEs per se or a deficit in representing the expected reward value of choices themselves. Past investigations used learning tasks in which the optimal response was associated with reward receipt that should generate a positive PE and the less optimal choice was associated with an actual loss or withholding an expected reward, both of which should generate negative PEs.20 Therefore, these earlier studies could not distinguish between a failure to learn from positive PEs and a failure in the representation of the prospective reward values during decision making.
This distinction between positive PEs and the valuation of positive outcomes when making choices maps onto the distributed neural system that is involved with decision making. In the basal ganglia, reinforcement outcomes influence subsequent behavioral choices through synaptic plasticity in response to PEs signaled by dopamine neurons. This “slow” learning system is complemented by the contribution of the orbitofrontal cortex (OFC),21 thought to represent the expected value of potential outcomes in working memory.22,23 These OFC value representations are believed to be more rapidly and flexibly updated than those in the basal ganglia and provide additional “top-down” influence on decision making. Therefore, reward-based decision making involves (at least) 2 separate processes, namely, a learning mechanism that reinforces choices that have led to positive PEs in the past and a representation of the expected value of a situation-action pair.23-25
In computational models of reward learning and decision making, the contribution of the basal ganglia system is often formalized using an “actor-critic” framework or a “Q learning” framework.9 In the former, a “critic” evaluates the reward values of particular states, and the “actor” selects responses as a function of learned stimulus-response weights. When outcomes differ from expectations, PEs are used to modify learning in the critic itself (to better predict reward values in the future). The critic's PEs also serve to increase and decrease stimulus-response weights in the actor. With learning, this scheme allows the actor to select actions with strong stimulus-response weights (ie, those that have produced more positive than negative PEs) without representing the expected reward values of the actions themselves. In contrast, in Q learning, instead of learning the value of particular states, the expected quality (“Q value”) of each action is learned separately; actions are selected by comparing the Q values of the candidate actions and probabilistically choosing the largest one. In this case, PEs are computed with respect to the expected Q value of the selected action and are used to adjust expected action value directly. Therefore, whereas the actor in the actor-critic scheme does not consider the outcome values of competing actions, the Q-learning scheme makes these fundamental. There is compelling evidence that the OFC has a critical role in maintaining these kinds of value representations.23,26-28
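The contrast between the two schemes described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' fitted models (those are specified in the Appendix): the function names, the shared learning rate of 0.1, the deterministic outcomes of +1, 0, and -1, and the alternating choice sequence are all hypothetical simplifications of the probabilistic task.

```python
from collections import defaultdict

def actor_critic_update(V, W, state, action, reward, alpha=0.1):
    """One actor-critic step: the critic's PE trains both critic and actor."""
    pe = reward - V[state]            # PE is relative to the value of the state
    V[state] += alpha * pe            # critic learns to predict the state's reward
    W[(state, action)] += alpha * pe  # actor's stimulus-response weight follows the PE
    return pe

def q_learning_update(Q, state, action, reward, alpha=0.1):
    """One Q-learning step: the PE is relative to the chosen action's own value."""
    pe = reward - Q[(state, action)]
    Q[(state, action)] += alpha * pe  # action value moves toward the outcome
    return pe

# Hypothetical deterministic outcomes (assumption; the real task was probabilistic).
OUTCOMES = {
    ("gain", "best"): 1.0, ("gain", "worst"): 0.0,   # win vs no win
    ("loss", "best"): 0.0, ("loss", "worst"): -1.0,  # loss avoided vs loss
}

V, W, Q = defaultdict(float), defaultdict(float), defaultdict(float)
for trial in range(40):
    # Alternate choices so that both items in each pair are sampled equally.
    action = "best" if trial % 2 == 0 else "worst"
    for state in ("gain", "loss"):
        reward = OUTCOMES[(state, action)]
        actor_critic_update(V, W, state, action, reward)
        q_learning_update(Q, state, action, reward)

print("actor weights:", W[("gain", "best")], W[("loss", "best")])
print("Q values:     ", Q[("gain", "best")], Q[("loss", "best")])
```

In this toy run, the actor acquires a positive weight for both the rewarded item and the loss-avoiding item, because both produce positive PEs relative to their context. The Q values keep the two apart: the rewarded item approaches +1, while the loss avoider stays at 0.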
Although these 2 RL algorithms are similar,9 they make different predictions about the nature of representations used in reward-based decision making that can be highlighted with appropriate task manipulations. Herein, we examine a hybrid model in which (putatively striatal) action weights are learned as a function of PEs but are also modulated by representation of the expected outcome Q value (due to putative top-down input from the OFC).23
Moreover, this modeling framework provides an important means of contrasting different hypotheses about the origins of the reward learning deficits found in earlier work. Specifically, deficits in learning from rewarding outcomes could be the result of a primary failure in the ability to signal positive PEs or, alternatively, impairments in the ability to represent the positive expected value of decision outcomes.
Following work by Pessiglione et al29 and Kim et al,30 we implemented a task in which participants were asked to simultaneously learn 4 discriminations. In 2 pairs, the choice of the optimal stimulus is probabilistically associated with the receipt of money (a positive PE), and the choice of the nonoptimal stimulus results in no reward (ie, a zero outcome). Failure to obtain a reward in these pairs should result in a negative PE. In these pairs, as in prior work, rewards and positive PEs are conflated. In 2 other pairs, the choice of the optimal stimulus results in no loss (ie, loss avoidance, a zero outcome), whereas the choice of the nonoptimal stimulus results in overt monetary loss. In this overall design, the same “zero” outcome would result in a positive PE when it occurs in the context of potential negative outcomes31,32 but would result in a negative PE when encountered in the context of potential rewarding outcomes. This interpretation is consistent with computational models indicating that active avoidance relies on learning from positive PEs33-35 and that avoidance of an aversive outcome activates reward areas.30
After the initial acquisition phase of the task, participants completed a transfer test phase in which they chose between novel combinations of all the trained stimuli without additional feedback.36 Critically, some pairs involved selecting between an action that had been rewarding and one that had simply avoided a loss. Both actions should have produced positive PEs during learning, leading to an increased tendency to select them. If one's choices are solely determined by the strength of association with positive PEs (as in actor-critic), the rewarding stimulus and loss-avoiding stimulus would be of equal value. Alternatively, if one was sensitive to the expected outcome of the action (eg, if action selection relies on Q values), participants should prefer the gain-producing action over the action with zero outcome.
We hypothesized that patients having schizophrenia with high-negative symptoms would show specific impairment in representing reward value. This would likely implicate OFC dysfunction in these patients.20,37
Forty-seven patients (45 outpatients and 2 inpatients) meeting DSM-IV38 criteria for schizophrenia (n = 42) or schizoaffective disorder (n = 5) and 28 demographically similar volunteer healthy control (HC) subjects participated in the study. All patients had been on a stable medication regimen for at least 4 weeks at the time of testing and were considered clinically stable by treatment providers. The outpatients were recruited from the Maryland Psychiatric Research Center and from local clinics. The 2 inpatients were recruited from the Maryland Psychiatric Research Center Treatment Research Unit. All were taking antipsychotic medication (Table). Fifteen patients were also treated with antidepressants, 9 with mood stabilizers, 7 with anxiolytic agents, and 6 with anticholinergic medication.
The HCs were recruited from the community via random-digit dialing and word of mouth among recruited participants. They had no current Axis I or Axis II diagnoses as established by the Structured Clinical Interview for DSM-IV–Axis I Disorders39 and Structured Interview for DSM-IV Personality,40 reported no family history of psychosis, and were taking no psychotropic medications. All participants had no history of significant neurological injury or disease and reported no significant medical or substance use disorders. All participants provided informed consent for a protocol approved by the University of Maryland Institutional Review Board.
Patients were divided into a high-negative symptom (HNS) group and a low-negative symptom (LNS) group by a median split on the sum of the avolition and anhedonia global items on the Scale for the Assessment of Negative Symptoms.41 These items were selected because they have been found to reflect a single factor42,43 and are more theoretically relevant to reward learning than the “restricted affect” factor of the Scale for the Assessment of Negative Symptoms.
The 3 groups did not significantly differ in age, sex, or race/ethnicity (Table). The HC group had more years of education than both patient groups, and the LNS group completed more years of education than the HNS group. Both patient groups had moderate symptom severity, as indicated by their Brief Psychiatric Rating Scale44 total scores. When Brief Psychiatric Rating Scale factors45 were examined, the HNS group had greater severity on the negative cluster symptom factor, while the 2 patient groups did not differ on the positive cluster or disorganized cluster factors.
All participants completed measures of word reading,46,47 general intelligence,48 and the MATRICS (Measurement and Treatment Research to Improve Cognition in Schizophrenia Consensus Cognitive) battery.49 The HC group scored significantly higher than both patient groups on all standard measures (Table). The 2 patient groups showed almost identical performance on all standard cognitive measures.
The learning task was administered via commercially available software (E-Prime; Psychology Software Tools) and was run on a laptop computer with a 17-in monitor. Stimuli were color images of landscapes appearing on a gray background. Participants were presented with 4 pairs of landscape items, 1 pair at a time (Figure 1). Two pairs involved potential gain; if the correct item was selected, participants saw an image of a nickel coupled with the word “Win!,” whereas if the incorrect item was selected, they saw “Not a winner, Try again!” The correct response was reinforced on 90% of trials in one pair and on 80% of trials in the other pair. Two other pairs involved learning to avoid losses; in these pairs, selection of the correct response received the feedback “Keep your money!,” whereas selection of the incorrect item resulted in the feedback “Lose!” Therefore, if the best item in the loss-avoiding pairs was selected, participants avoided a loss 90% or 80% of the time. A brief 12-trial practice session was administered to ensure task comprehension, followed by 160 learning trials with all pair types presented in a randomized order. Each pair was shown 40 times during training. To examine learning, the 160 trials were divided into 4 learning blocks of 40 trials.
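The structure of the training phase can be sketched as follows. This is a hypothetical reconstruction from the description above, not the E-Prime script used in the study. The pair labels and function names are invented for illustration, and two details are assumptions: that the incorrect item yielded the good outcome with the complementary probability (10% or 20%), and that the randomization was unconstrained across the 160 trials.

```python
import random
from collections import Counter

# Pair label -> (feedback valence, probability that the correct item gives the good outcome).
PAIRS = {
    "gain90": ("gain", 0.90),
    "gain80": ("gain", 0.80),
    "loss90": ("loss_avoidance", 0.90),
    "loss80": ("loss_avoidance", 0.80),
}
TRIALS_PER_PAIR = 40
BLOCK_SIZE = 40

def build_schedule(seed=0):
    """Return 4 blocks of 40 trials, with each pair shown 40 times in random order."""
    rng = random.Random(seed)
    order = [name for name in PAIRS for _ in range(TRIALS_PER_PAIR)]
    rng.shuffle(order)
    return [order[i:i + BLOCK_SIZE] for i in range(0, len(order), BLOCK_SIZE)]

def feedback(pair, chose_correct, rng):
    """Sample the feedback text for one choice.

    Assumption: the incorrect item gives the good outcome with probability 1 - p.
    """
    valence, p = PAIRS[pair]
    good = rng.random() < (p if chose_correct else 1 - p)
    if valence == "gain":
        return "Win!" if good else "Not a winner, Try again!"
    return "Keep your money!" if good else "Lose!"

blocks = build_schedule()
print(len(blocks), [len(block) for block in blocks])
print(Counter(name for block in blocks for name in block))
```

Learning curves such as those in Figure 2 would then be obtained by scoring the proportion of correct choices per pair within each block.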
Following training, the transfer test phase was presented. In these 64 trials, the original 4 training pairs were each presented 4 times, and the 24 novel pairings were each presented twice. For novel pairings, each trained item was presented with every other trained item (ie, an item that had been a 90% winner was paired with both items from the 80% gain pair, the 90% loss-avoidance pair, and the 80% loss-avoidance pair). Participants were instructed to pick the item in the pair that they thought was “best” based on their earlier learning. No feedback was administered during this phase.
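The trial counts in the transfer test phase follow from simple combinatorics, which the sketch below checks. The item labels are hypothetical; only the counts (8 trained items, 4 trained pairs shown 4 times, novel pairs shown twice) come from the description above.

```python
from itertools import combinations

# Hypothetical labels for the 8 trained items, grouped by training pair.
TRAINED_PAIRS = [
    ("gain90_correct", "gain90_incorrect"),
    ("gain80_correct", "gain80_incorrect"),
    ("loss90_correct", "loss90_incorrect"),
    ("loss80_correct", "loss80_incorrect"),
]

items = [item for pair in TRAINED_PAIRS for item in pair]
trained = {frozenset(pair) for pair in TRAINED_PAIRS}

# Every pairing of two distinct items that was not itself a training pair.
novel_pairs = [p for p in combinations(items, 2) if frozenset(p) not in trained]

n_trials = 4 * len(TRAINED_PAIRS) + 2 * len(novel_pairs)
print(len(novel_pairs), n_trials)  # 24 novel pairings, 64 trials
```

Each item appears in 6 novel pairings, one for each item from the other 3 training pairs, which matches the example given for the 90% winner.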
We examined the ability of the following 3 different models to fit each participant's trial-by-trial sequence of choices across training and transfer test phases: (1) a standard actor-critic architecture simulating pure basal ganglia–dependent learning, (2) a pure Q-learning model simulating action selection as a function of learned expected reward value, and (3) a hybrid model in which an actor-critic is “augmented” by a Q-learning component meant to capture the top-down influence of OFC value representations onto striatum. See the Appendix, eFigure 1, and eFigure 2.
An omnibus repeated-measures analysis of variance (ANOVA) was first conducted with a between-subject factor of group (HC, LNS, and HNS) and within-subject factors for feedback valence (gain vs loss avoidance), probability (90% and 80%), and learning block (blocks 1-4). Huynh-Feldt correction was applied if the assumption of sphericity was violated; unless indicated, sphericity was not violated, and no correction was made. Significant interactions were followed by a series of ANOVA and post hoc least significant difference (LSD) test contrasts examining differences in block 4 performances. To examine the balance of learning from gain vs loss avoidance, we subtracted block 4 loss-avoidance performance from block 4 gain performance (so that positive scores indicate better learning from gain), testing for group differences using ANOVA, followed by within-group paired-sample t tests. Transfer test phase performance was examined using 1-way ANOVA, followed by LSD post hoc contrasts.
As shown in Figure 2A, the HC group and the LNS group demonstrated robust learning in the 90% gain condition, with the HNS group demonstrating limited learning. In contrast, the groups performed similarly in the 80% gain condition (Figure 2B). The loss-avoidance learning blocks produced different results. Here, the HNS group matched or performed at slightly higher levels than the other 2 groups, suggesting that their learning is more effectively driven by loss avoidance than by gain seeking (Figure 2C and D).
The omnibus repeated-measures ANOVA with factors of group, feedback valence, probability, and learning block yielded main effects of probability (better performance in the 90% condition than the 80% condition; F1,72 = 6.08, P = .02), learning block (better performance over time; F3,216 = 18.34, P < .001), and a probability × group interaction (where both the HC group and the LNS group show better performance in the 90% condition than the 80% condition, while the HNS group shows similar performance with both probabilities; F2,72 = 3.89, P = .03). In addition, there was a significant feedback valence × learning block interaction (F6,72 = 4.42, P = .005), qualified by a trend toward a feedback valence × learning block × group interaction (F6,72 = 2.22, P = .06 after Huynh-Feldt correction). This last interaction suggests that the groups learned differently over time as a function of whether they were learning from rewards or from loss avoidance.
To assess whether feedback valence differentially affected final performance levels, we conducted a 2 (feedback valence) × 2 (probability) × 3 (group) repeated-measures ANOVA with block 4 performance as the dependent variable because it captured asymptotic learning levels. This analysis produced a significant main effect of probability (F1,72 = 4.77, P = .03 [90% greater than 80% stimuli]) and a significant group × feedback valence interaction (F2,72 = 4.51, P = .01) (ie, the groups learned differently as a function of feedback valence). The probability × group interaction fell short of significance (F2,72 = 2.43, P = .10); no other effects approached significance. One-way ANOVAs examining performance for each of the 4 stimulus pairs were conducted to explore the nature of the feedback valence × group interaction. The only significant overall group difference was found on the 90% rewarded stimulus (F2,74 = 3.83, P = .03). Post hoc LSD contrasts indicated that the HC group demonstrated significantly greater learning on this stimulus than the HNS group (P = .007); no other contrasts were significant.
To further examine feedback valence effects on learning, we computed difference scores for both the 90% and 80% conditions between end acquisition performance on gain-seeking trials and loss-avoidance trials (Figure 3). A positive difference score indicated better learning from gain, while a negative difference score indicated better learning from loss avoidance. Individual 1-way ANOVAs indicated that the 3 groups differed significantly on the 90% pairs (F2,72 = 4.56, P = .01). Post hoc LSD contrasts indicated significantly better learning from gain than from loss avoidance in the HC group than in the HNS group (P = .01); all other contrasts and tests of other pairs were nonsignificant.
Finally, we conducted within-group paired-sample t tests to test the comparative influence of learning achieved from gain vs loss avoidance at each probability level. There was only one statistically significant difference: the HNS group learned significantly more from the 90% loss-avoidance stimulus than from the 90% gain stimulus (P < .05).
Performance on 9 types of novel stimulus pairings was examined for the transfer test phase (Appendix, eFigure 1, eFigure 2, and Figure 4C). Pairings in which participants were confronted with the most frequently rewarded stimuli (FW in the figures) and the stimuli that most reliably avoided losses (FLA in the figures) provided the critical test of the hypothesis that the HNS group showed a specific impairment in representation of expected positive value of decision outcomes rather than learning from positive PEs. The 1-way ANOVA examining differences among the groups was significant (F2,74 = 5.81, P = .005), with post hoc LSD comparisons indicating a significant difference between the HC group and the HNS group (P = .001) and an approach toward a significant difference between the LNS group and the HNS group (P = .06). As shown in Figure 4C, the HC group showed a robust preference for frequently rewarded stimuli over loss avoiders, consistent with the pattern expected if they were representing the positive expected value of the stimuli rather than relying on the number of times a stimulus has been associated with a positive PE. In contrast, the HNS group showed no preference for gain relative to loss avoiders, indicating that their preferences were based on the accumulation of positive PEs and did not take into account the value associated with those positive PEs. Although we assessed whether there were significant differences between groups in other stimulus–feedback valence comparisons, no other statistically significant differences were found.
An alternative explanation for our results is that the lack of preference for gain over loss avoidance in the HNS group might be due to difficulty in learning about rewards in general. However, as shown in Figure 4, the HNS group demonstrated a robust preference for frequently rewarded stimuli over frequently losing (FW vs FL in the figure) stimuli during the transfer test phase, with no differences observed among the 3 groups (overall F2,74 = 2.06, P = .14). Furthermore, the HNS group preferred frequently rewarded stimuli over infrequently rewarded stimuli (FW vs IW in the figure). Therefore, the failure to prefer “winners” over loss avoiders cannot be explained by a failure to have learned which stimuli were associated with reward receipt.
We also examined the preference for frequent loss avoiders over infrequent winners (FLA vs IW in the figures). All 3 groups had a robust preference for the loss avoiders, despite the fact that the infrequent winner actually had a slightly positive expected value that was higher than that of loss avoiders. Therefore, all 3 groups preferred the stimulus that was more frequently associated with a positive PE over a choice that had a higher expected value but was also associated with more frequent negative PEs during learning.
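The expected-value ordering referred to above can be made concrete with a short calculation for the 90% pairs. Two inputs are assumptions not stated in the text: that the monetary loss equals the nickel gain in magnitude, and that the incorrect item in each pair yielded the good outcome on the complementary 10% of trials. The labels follow the figure abbreviations.

```python
from fractions import Fraction

NICKEL = 5  # cents; the loss is assumed to match the gain in magnitude

def expected_value(p_good, good, bad):
    """Expected outcome in cents when the good outcome occurs with probability p_good."""
    return p_good * good + (1 - p_good) * bad

p = Fraction(9, 10)  # reinforcement probability for the 90% pairs

ev = {
    "FW": expected_value(p, NICKEL, 0),        # frequent winner: wins on 90% of trials
    "IW": expected_value(1 - p, NICKEL, 0),    # infrequent winner: wins on 10% of trials
    "FLA": expected_value(p, 0, -NICKEL),      # frequent loss avoider: loses on 10% of trials
    "FL": expected_value(1 - p, 0, -NICKEL),   # frequent loser: loses on 90% of trials
}

for label, value in ev.items():
    print(label, float(value))
```

Under these assumptions, the infrequent winner is worth slightly more than zero and the frequent loss avoider slightly less, so a chooser guided purely by expected value would pick the infrequent winner. The observed preference runs the other way.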
We calculated haloperidol equivalents for antipsychotic medication dosage for each patient using Expert Consensus Panel guidelines.50 There was no difference in overall antipsychotic burden between the HNS group and the LNS group (t = 0.58, P = .56). Furthermore, we found no significant correlations between medication dosage and any measures of acquisition, training, or transfer test phase performance. These results suggest that antipsychotic burden is unlikely to account for our findings; however, we cannot rule out an effect of antipsychotic medication on performance that might only be observed by studying nonmedicated patients.
The goal of computational modeling was to provide quantitative fits of the overall pattern of acquisition and transfer test phase data by each of 3 models (Appendix, eFigure 1, and eFigure 2). Figure 4B and D show that the best-fitting model reproduces the central features of the data in both training and transfer test phases, including better learning from gain than from loss avoidance (Figure 4B) and preference for frequent winners over frequent loss avoiders at the transfer test phase in the HC group (Figure 4D). Both of these effects are severely attenuated in the HNS group. The simple actor-critic model was insufficient for the HC group because it captured neither (1) more robust acquisition for winners vs loss avoiders (Figure 4A) nor (2) the observed robust preference for winners over loss avoiders at the transfer test phase. The pure Q-learning model could not account for the observed preference of frequent loss avoiders (FLA in the figures) compared with infrequent winners (IW in the figures) across all groups because infrequent winners have higher expected value (Figure 5B). The critical results are that the hybrid actor-critic–Q-learning model provided the best overall fit to the data and that the HNS group differed from the HC group and the LNS group specifically by demonstrating a reduced Q-learning component.
We tested whether the fitted parameter values from the hybrid model differed by group using ANOVA. We found a main effect of group for the mixing parameter c (Figure 5A) (F2,67 = 3.8, P = .03), indicating a significant difference between groups in the degree to which the Q-value component influenced choices. Follow-up analyses revealed significantly lower contribution of Q values for the HNS group compared with the HC group (t = 2.77, P = .008), as well as a trend in the comparison of the LNS group with the HC group (t = 1.70, P = .09). As shown in Figure 5A, the HC group data were characterized by greater influence of Q learning than actor-critic learning, whereas the HNS group showed the opposite pattern.
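One way to picture the role of the mixing parameter c is a choice rule that blends actor weights with Q values before a softmax comparison. The sketch below is an assumption for illustration only: the exact form of the hybrid model is given in the Appendix, and the linear blend, the softmax gain beta, and the numeric weights and values used here are all hypothetical.

```python
import math

def hybrid_choice_prob(w_a, w_b, q_a, q_b, c, beta=3.0):
    """Probability of choosing item a over item b.

    c = 0 gives a pure actor-critic chooser and c = 1 a pure Q-value chooser
    (assumed linear mixture; beta is a hypothetical softmax gain).
    """
    h_a = (1 - c) * w_a + c * q_a
    h_b = (1 - c) * w_b + c * q_b
    return 1.0 / (1.0 + math.exp(-beta * (h_a - h_b)))

# Illustrative values for a frequent winner (FW) vs a frequent loss avoider (FLA):
# equal actor weights (both items produced positive PEs) but different Q values.
w_fw, w_fla = 0.5, 0.5
q_fw, q_fla = 0.9, 0.0

for c in (0.0, 0.2, 0.8):
    print(c, round(hybrid_choice_prob(w_fw, w_fla, q_fw, q_fla, c), 3))
```

With c = 0 the chooser is indifferent between the two items, and the preference for the frequent winner grows as c increases. This is the qualitative pattern that separates the HNS group from the HC group in the transfer test phase.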
These results provide insight into the origins of avolition and anhedonia in schizophrenia. First, patients with the most severe negative symptoms demonstrate deficits in learning from rewarding outcomes. This deficit is not a manifestation of a general learning impairment because the HNS group performed at levels similar to those of the HC group when learning to avoid losses. Second, in the transfer test phase, the HNS group did not show a preference for a frequently rewarded stimulus over a frequent loss avoider; that is, they were less able to take expected reward values into account during decision making; therefore, decisions were based on stimulus-response weights learned from prior PEs.
This is an RL formula for avolition: patients are better able to learn actions that lead to the avoidance of punishing outcomes than they are to learn actions that lead to positive outcomes. This pattern of data suggests that negative symptoms are not associated with reduced learning from positive PEs per se, as previously suggested, but rather with impairment in the representation of positive expected value to guide decisions. This conclusion is consistent with other data suggesting that negative symptoms are associated with deficits in reward-based tasks that depend on prefrontal or orbitofrontal cortical function.20,37,51
It is notable that the LNS group differed minimally from the HC group in RL behavior, with no statistically significant differences observed. Therefore, RL impairments may not be characteristic of all patients with schizophrenia but may be most evident in patients with HNS. Furthermore, the fact that the performance of the LNS group approached that of the HC group suggests that RL deficits are not caused by the use of antipsychotic medications: both patient groups were similarly medicated, and only the HNS group showed a deficit in learning from gain. Further study is needed in medication-free patients to address this question more definitively.
How do we account for impairment in learning from rewards with spared loss-avoidance learning in patients with HNS? Herein, the computational modeling serves to constrain our interpretation by providing a formalization of behavioral deficits grounded by a convergence of theoretical, cognitive, and neuroscientific constructs.52 By reducing the Q-learning contribution, which is thought to reflect the top-down influence of the OFC, we were able to closely simulate the pattern of data observed in both the training and transfer test phases in the HNS group. Insofar as the role of Q learning in the model is consistent with current evidence about OFC function,53,54 the modeling results provide proof of principle that this type of mechanism can account for the origins of severe negative symptoms. Clearly, this is an oversimplification because many other neuromodulatory systems and anatomic areas are involved in reward learning and may be implicated in the impairments documented herein. However, the modeling results demonstrate that it is possible to account for patient behavior in our task environment with a simple RL approach. The finding that patients and the HC group differed not only within the parameters of a given model but also in the best-fitting model itself implies that caution should be applied when interpreting functional imaging or behavioral data that assume that patients and control subjects are using the same neural and cognitive strategy (ie, the same model).
Overall, our data suggest that abnormalities in the reward system of patients with HNS are more strongly due to abnormalities in the cortical (representational) part of the reward system than to the basic machinery of dopamine signaling in the basal ganglia and limbic system. The representation of goal-directed action-outcome associations has been shown to rely on prefrontal cortical function,55 and degraded prefrontal cortical representations may explain why the HNS group showed no preference for a gain-producing stimulus over a loss-avoiding stimulus, despite the fact that one was associated with a positive outcome and the other with a zero outcome. These interpretations also converge with findings suggesting that negative symptoms are associated with a reduced tendency to make strategic exploratory responses to determine whether better rewards may be available than those experienced thus far,37 the same pattern observed in healthy individuals with the COMT Val/Val genotype56 and associated with prefrontal cortical activation.57
Other findings from our group are consistent with the results reported herein. Waltz et al20 reported that patients with schizophrenia showed impaired learning from frequently rewarded stimuli but showed intact avoidance of infrequently rewarded stimuli. In a reanalysis of that data set stimulated by the present findings, it was clear that patients in the HNS group drove the impaired reward learning effect (Appendix, eFigure 1, and eFigure 2). Furthermore, in functional magnetic resonance imaging, Waltz et al57 showed intact modulation of blood oxygenation level–dependent signal response in the striatum in response to negative PEs but showed decreased signal in response to reward receipt. In addition, Waltz et al20 and Strauss et al37 demonstrated impairments in learning from positive rewards and spared learning from negative outcomes using “Go” vs “NoGo” learning paradigms with different behavioral end points. The present experiment extends these findings in a critical fashion by implicating the abnormal valuation of positive outcomes in patients' blunted learning from positive PEs associated with rewards.
Most work investigating PE signaling in schizophrenia has focused on the possibility that aberrant positive PEs may underlie positive symptoms.18,19,58 Our focus is different, and the present design is not optimal for detecting aberrant positive PEs. Therefore, our results do not contradict prior studies but rather suggest that PE-driven RL models may also offer a means of understanding negative symptoms.
The origin of negative symptoms remains a major puzzle. By definition, such symptoms are the absence of normal function. Yet, such an absence must implicate the presence of an underlying causal mechanism. Our data suggest that patients with HNS fail to represent the relative value of different rewards when making decisions, while remaining able to avoid losses and punishing outcomes. This is an RL formula for avolition, likely resulting in a narrowing of patients' behavioral repertoires and a failure to activate behavior to accomplish goals.
Correspondence: James M. Gold, PhD, Department of Psychiatry, Maryland Psychiatric Research Center, University of Maryland School of Medicine, PO Box 21247, Baltimore, MD 21228 (firstname.lastname@example.org).
Submitted for Publication: February 2, 2011; final revision received July 29, 2011; accepted August 4, 2011.
Financial Disclosure: Dr Gold receives royalty payments from the Brief Assessment of Cognition in Schizophrenia and has consulted for Merck, Pfizer, Solvay, GlaxoSmithKline, and AstraZeneca.
Funding/Support: This work was supported by grant R01 MH080066 from the National Institute of Mental Health.
Role of the Sponsors: The funding organization had no role in the design or conduct of the study; the collection, management, analysis, or interpretation of the data; or in the preparation, review, or approval of the manuscript.
Previous Presentation: This study was presented at the 13th International Congress on Schizophrenia Research; April 5, 2011; Colorado Springs, Colorado.
Additional Contributions: Sharon August, MA; Leeka Hubzin, MA; Jacqueline Kiwanuka, MBA; and Dhivya Pahwa, BA, contributed to the study.