Evaluation of Early Ketamine Effects on Belief-Updating Biases in Patients With Treatment-Resistant Depression

Key Points
Question: What are the effects of ketamine on belief updating in patients with treatment-resistant depression (TRD)?
Findings: This case-control study in patients with TRD showed that belief updating became more optimistically biased as soon as 4 hours after a first ketamine infusion. This early cognitive effect of ketamine was formalized by stronger asymmetrical reinforcement learning and mediated the clinical antidepressant effect at 1 week of treatment.
Meaning: These findings provide new perspectives for understanding the cognitive effects of fast-acting antidepressants, which could potentially be leveraged to promote sustained clinical improvement and treatment responsiveness.


eMethods 3. Correlation Between Patients' Anticipation of Treatment Outcomes and MADRS Scores
Meta-analyses of pharmacological antidepressant treatment have shown that patients who expect to benefit from treatment respond best to antidepressant medication 4.
This relationship between pharmacological effects and beliefs has led to the response expectancy theory of placebo effects 5. To assess the extent to which patients anticipated positive treatment outcomes, expectancy ratings were collected at T0, 24 h prior to the first ketamine infusion. Patients rated the following three types of expectancy on a visual analogue scale from 0 to 100: (A) expected drug efficiency (i.e., how much they expect the treatment to be efficient within the next few days), (B) expected response (i.e., how intense they expect their depression to be after treatment with ketamine), and (C) expectation of remission (i.e., how much they expect remission after treatment). Due to technical problems, these ratings were obtained for only 18 of the 26 TRD patients.
We explored the idea that patients who believe they will get better are also those who benefit most from treatment by assessing how strongly a patient's anticipation of a positive treatment outcome was related to their global clinical improvement after ketamine treatment. Three Pearson's correlations were conducted between global clinical improvement (i.e., the difference in the MADRS score before and one week after ketamine treatment) and the ratings of (A) expected drug efficiency, (B) expected response, and (C) expectation of remission, all measured at T0, 24 h before the first ketamine infusion. The threshold for statistical significance was p = 0.016, corresponding to a Bonferroni correction for three comparisons (0.05/3).
We found a significant positive correlation between expected drug efficiency and global clinical improvement at one week after treatment relative to baseline (r = 0.63, p = 0.004, eFigure 1A). Moreover, patients who expected less intense depressive symptoms after treatment also showed fewer depressive symptoms after one week relative to baseline (r = -0.58, p = 0.01, eFigure 1B). Expected remission from depression and global clinical improvement were positively but not significantly correlated (r = 0.28, p = 0.25).
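The Bonferroni decision rule applied to the three correlations above can be sketched as follows. This is an illustration only: the dictionary keys and function names are hypothetical, the r and p values are copied from the text, and the t statistic is the standard conversion for a Pearson correlation with n = 18 pairs (the number of patients with expectancy ratings).

```python
import math

ALPHA = 0.05
N_TESTS = 3  # three expectancy ratings correlated with MADRS improvement
N_PATIENTS = 18  # patients with available expectancy ratings


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


def t_from_r(r: float, n: int) -> float:
    """t statistic (df = n - 2) for a Pearson correlation r from n pairs."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


# Reported (r, p) pairs; the key names are illustrative labels.
reported = {
    "expected_drug_efficiency": (0.63, 0.004),
    "expected_response": (-0.58, 0.01),
    "expected_remission": (0.28, 0.25),
}

threshold = bonferroni_threshold(ALPHA, N_TESTS)
significant = {name: p < threshold for name, (_, p) in reported.items()}
t_values = {name: t_from_r(r, N_PATIENTS) for name, (r, _) in reported.items()}

for name in reported:
    print(name, round(t_values[name], 2), significant[name])
```

Under this rule, the first two correlations pass the corrected threshold and the third does not.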
The positive correlation between anticipated treatment outcome and clinical improvement is in accordance with the results of previous studies 4. Patients with TRD have often experienced the failure of many different lines of treatment and often perceive ketamine as the last chance for a cure. Here, patients were referred to our mood center after the failure of at least two antidepressants, which might have raised the hope of benefiting from a new treatment strategy, such as ketamine. At the time of the study, ketamine was a novel treatment strategy for depression and was implemented in only a handful of clinical departments in France. This context may have induced anticipation of positive treatment outcomes by patients and may explain the beneficial role of such expectations in treatment responsiveness 4,5. Of note, these two types of expectation (i.e., prognostic response and drug efficiency) did not significantly correlate with the emergence of optimism biases in belief updating. This finding suggests that expectations may improve depressive symptoms via two distinct pathways: belief-updating bias on the one hand, and prognostic response and expected drug efficiency on the other.
In total, 120 adverse life events were used and randomly allotted to three lists of 40 trials. The actual base rates for 60 of the 120 events were taken from previously published studies on healthy participants and depressed patients 6,7. The remaining 60 events and their base rates were newly created for this study. Base rates ranged between 10 and 70%. To ensure that the range of possible overestimation was equal to the range of possible underestimation, participants could give estimates anywhere between 5 and 95%.
Participants used the numerical buttons on the computer keyboard to enter their responses and pressed the space key to confirm each response. All responses were self-paced, and participants were required to respond to proceed to the next rating and trial. All subjects completed a practice session of three trials before beginning the main experiment. The task was identical at all testing timepoints, except that a different set of adverse life events was used in each testing session. Thus, the 120 life events were randomly divided into three lists of 40 different adverse life events, with one list for each timepoint (T0, T1, T2). The order of the lists was counterbalanced across participants (Latin square design).
The task measured the dependent variable belief updating (UPD), which reflected the amount (in percentage points) by which participants changed their initial belief estimate (E1) relative to their second belief estimate (E2), given after they were provided with information about the actual base rate of experiencing a given adverse life event, according to equation 1:
(1) UPD = E1 - E2
Belief updating (UPD) was driven by the magnitude of the estimation errors (EEs), which indicated whether participants initially overestimated or underestimated their likelihood of experiencing an adverse life event (E1) relative to its actual base rate (aBR), and was calculated according to equation 2:
(2) EE = E1 - aBR
The estimation error was further used to categorize trials into good or bad news trials. For good news trials, the estimation error was positive (EE > 0), which indicated an initial overestimation of one's likelihood of experiencing an adverse life event relative to the actual base rate of that event (E1 > aBR). For bad news trials, the estimation error was negative (EE < 0), which indicated an initial underestimation of one's likelihood of experiencing an adverse life event relative to its actual base rate of occurrence (E1 < aBR).
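As an illustration of equations 1 and 2 and of the good news/bad news split, a minimal sketch is given here. The `Trial` class, the function names, and the example values are hypothetical and are not taken from the study's analysis code.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    e1: float   # first estimate of one's own risk (%)
    e2: float   # second estimate, given after seeing the base rate (%)
    abr: float  # actual base rate shown to the participant (%)


def update(trial: Trial) -> float:
    """Equation 1: UPD = E1 - E2."""
    return trial.e1 - trial.e2


def estimation_error(trial: Trial) -> float:
    """Equation 2: EE = E1 - aBR."""
    return trial.e1 - trial.abr


def news_type(trial: Trial) -> str:
    """Good news if the risk was overestimated (EE > 0), bad news if underestimated."""
    ee = estimation_error(trial)
    if ee > 0:
        return "good"
    if ee < 0:
        return "bad"
    return "none"  # zero estimation error: neither good nor bad news


overestimate = Trial(e1=50, e2=40, abr=30)   # risk overestimated, then lowered
underestimate = Trial(e1=20, e2=25, abr=40)  # risk underestimated, then raised

for t in (overestimate, underestimate):
    print(news_type(t), update(t), estimation_error(t))
```

With this sign convention, an update in the direction of the base rate has the same sign as the estimation error in both trial types.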
The following additional variables were calculated from the participants' responses during the belief-updating task:
The updating bias: A complementary way to test belief-updating biases is to directly calculate the updating bias (UDB) according to equation 3:
(3) UDB = |UPD|good news - |UPD|bad news
A positive difference indicates that participants updated beliefs about their lifetime risks of experiencing adverse life events to a greater extent after good news than after bad news. However, belief updating is driven by the magnitude of estimation errors.
To remove this confounder, the absolute updates for each participant were first averaged for good and bad news trials and then divided by the average absolute estimation error (|EE|) in good news and bad news trials. The updating bias (UDB) normalized by the magnitude of EE was then calculated according to equation 4:
(4) UDB = (|UPD| / |EE|)good news - (|UPD| / |EE|)bad news
Again, a positive difference indicates that participants updated beliefs about their lifetime risks of experiencing adverse life events to a greater extent after good news than after bad news. Of note, trials with zero estimation error and non-responses were excluded from these calculations.
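Equations 3 and 4 can be illustrated with the short sketch below. It assumes that each trial is given as an (UPD, EE) pair and that non-responses have already been removed; the function name and the example values are hypothetical.

```python
from statistics import mean


def update_bias(trials, normalize=False):
    """Good news minus bad news absolute update (equation 3).

    With normalize=True, each mean absolute update is first divided by the
    mean absolute estimation error of the same trial type (equation 4).
    Trials with zero estimation error are ignored.
    """
    good = [(abs(u), abs(e)) for u, e in trials if e > 0]
    bad = [(abs(u), abs(e)) for u, e in trials if e < 0]
    if not good or not bad:
        raise ValueError("need at least one good news and one bad news trial")

    def score(rows):
        mean_update = mean(u for u, _ in rows)
        if not normalize:
            return mean_update
        return mean_update / mean(e for _, e in rows)

    return score(good) - score(bad)


# (UPD, EE) pairs for one hypothetical participant; the last trial has EE = 0.
example_trials = [(10, 20), (8, 16), (-5, -20), (-3, -12), (4, 0)]

raw_bias = update_bias(example_trials)
normalized_bias = update_bias(example_trials, normalize=True)
print(raw_bias, normalized_bias)
```

In this toy example both scores are positive, that is, the hypothetical participant updated more after good news than after bad news, with and without the normalization.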
The general knowledge about lifetime risks, called the distance: This variable expresses how much participants consider their own likelihood of experiencing a given adverse event (E1) to be different from the lifetime risk they estimate for a person with a similar socio-economic background (eBR). It is calculated according to equation 5:
(5) distance = eBR - E1
Moreover, this difference and the estimation error were used to calculate the personal relevance (PR) of a life event following equations 6 and 7:
(6) PR = 1 - (distance / (eBR - 1)) for trials when E1 < eBR
(7) PR = 1 - ((1 - distance) / (99 - eBR)) for trials when E1 > eBR
This measure corresponds to the concept of 'relative personal relevance', as described by Kuzmanovic and Rigoux 2017 8. Equations (6) and (7) formalize PR as a score between 1 and 0, with 1 indicating equal risk perception (e.g. minimal difference or relative PR) for oneself and someone else. A relative PR = 0 indicates that the participant's own risk is maximally different (or minimally equal) from the risk for someone else. Note that the PR score expresses personal relevance relative to the average, irrespective of whether participants over- or underestimated their risk. For example, if a participant's risk perception for a given event (90%) was very different from someone else's risk (10%), the distance measure is negative and large (distance = 10 - 90 = -80). Equation (7) then gives a relative PR value of 0.089, which is close to 0 and indicates minimal matching (maximal difference) of the risk perception for oneself and someone else, irrespective of the valence of this relative risk perception (i.e., whether it is a negative overestimation or a positive underestimation).
Similarly, the PR value is close to zero if the initial estimate of the participant was extremely low (10%) relative to someone else's risk (90%; distance = 90 - 10 = 80). In this case, equation (6) gives a similarly low PR value (1 - 80/89, approximately 0.10), indicating maximal relative relevance (or minimal matching of E1 and eBR). In contrast, if a participant perceives his/her risk to be close to someone else's (e.g., distance = 40 - 30 = 10), the PR score will be close to 1 (e.g., 0.74). This score then, irrespective of the valence of the distance measure, indicates personal relevance equal to that, on average, in society.
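The distance and personal relevance scores can be sketched as follows. Several points are assumptions rather than statements from the text: the distance is taken as eBR - E1, as implied by the worked examples; the numerator of equation 7 is read as (1 - distance), which reproduces the value of about 0.089 given for the first example; and the case E1 = eBR, which the equations do not cover, is set to 1. Function names are hypothetical.

```python
def distance(e1: float, ebr: float) -> float:
    """Assumed form of equation 5: risk estimated for someone else minus own risk."""
    return ebr - e1


def personal_relevance(e1: float, ebr: float) -> float:
    """Relative personal relevance (equations 6 and 7), all values in percent."""
    d = distance(e1, ebr)
    if e1 < ebr:
        return 1 - d / (ebr - 1)            # equation 6
    if e1 > ebr:
        return 1 - (1 - d) / (99 - ebr)     # equation 7, numerator read as 1 - distance
    return 1.0  # assumption: identical estimates count as maximal matching


# Own risk 90%, someone else's risk 10%: PR close to 0.
pr_far = personal_relevance(e1=90, ebr=10)
# Own risk 30%, someone else's risk 40%: PR close to 1.
pr_near = personal_relevance(e1=30, ebr=40)
print(round(pr_far, 3), round(pr_near, 2))
```

The two calls correspond to the first and third worked examples in the text; the denominators assume that estimates stay strictly inside the 1 to 99% range, so no division by zero occurs.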
The original RL-like model of belief updating validated by Kuzmanovic and Rigoux included personal relevance as a weighting factor of the influence of estimation error on belief updating. However, in our data, we found that a simpler model without personal relevance best explained the observed belief-updating behaviour of participants (see model comparisons). That is why we solely explored group and testing timepoint effects on personal relevance reported in eTable 10.

Psychometric properties of the belief-updating task: Test-retest reliability
We performed a test-retest correlational analysis in both healthy controls and TRD patients to check whether the belief-updating task reliably assesses belief-updating biases. Specifically, Pearson's correlation coefficients tested whether the participants who presented a good news/bad news belief-updating bias (controlled for estimation error magnitude) at baseline also presented the bias at later testing timepoints. We expected this correlation to be smaller in the TRD patients when comparing the baseline to the post-ketamine testing timepoints, because the patients were expected to show a weak bias at baseline, consistent with previous studies on depressed patients 4. In contrast, given our hypothesis about the effects of ketamine, the correlation should become significant following ketamine treatment, similar to that of the healthy controls.
Consistent with these hypotheses, belief-updating biases at T0 and T1 correlated positively and significantly for healthy volunteers (Pearson's r = 0.47, p = 0.0075). For the TRD patients, the belief-updating biases at the two post-ketamine treatment testing timepoints also correlated significantly (r = 0.4, p = 0.05; r = 0.6, p = 0.0004). However, as expected and consistent with the main findings, the weak belief-updating biases at baseline did not correlate with the much stronger belief-updating biases measured at the two testing timepoints following ketamine treatment (T0 to T1: r = 0.1, p = 0.6; T0 to T2: r = 0.3, p = 0.2).

Group and testing time effects on other variables of belief updating
To test for potential differences across testing timepoints, good and bad news trials, and groups, the following belief-updating variables were fitted by linear mixed effects models according to equation 8:
The model included the following fixed effects: estimation error valence (coded -1 for bad news and 1 for good news), group (coded 1 for healthy controls and -1 for TRD patients), testing timepoint (coded -1 for baseline and 1 for 4 h after a first ketamine infusion for the TRD patients or the 2nd assessment for healthy controls), absolute estimation error magnitude (|EE|; omitted from models fitting the estimation error magnitude as the dependent variable (DV)), age, and level of education. The models further included a fixed effect for the three-way interaction 'group by time by valence' to test whether the effects changed as a function of whether participants over- or underestimated their likelihood (i.e., estimation error valence), sequential testing (i.e., testing timepoint), and participant group (i.e., TRD versus healthy controls). At the random level, the model also included an intercept per participant to control for inter-individual differences in the DVs, as well as random slopes for the effects of estimation error valence and magnitude.

Differences in initial belief estimates between groups and testing timepoints and good/bad news trials
We checked whether the ketamine effect observed on belief updating was driven by differences between TRD patients and controls in the initial belief estimates by fitting a linear mixed effects model according to equation 9:
(9) E1 ~ 1 + group + time + EEvalence + age + education + group:time:EEvalence
The results are reported in eTable 8 and show that the effects of ketamine cannot be explained by differences in the first estimate between TRD patients and healthy controls, as indicated by a non-significant three-way interaction of group by time by estimation error valence for the initial belief estimates (E1 T0 vs T1: ß = -0.75, t(216) = -1.68, p = 0.09, 95%CI [-1.6; 0.12], eTable 8).

Differences in estimation error magnitude between groups, testing timepoints, and good/bad news trials
The model detected a main effect of testing timepoint (ß = -1.13, t(217) = -3.5, p = 0.0006, 95%CI [-1.7; -0.49]). The estimation error magnitude decreased the more often participants performed the task. No other main effects of group, estimation error valence, or interactions were significant, which suggests that the magnitude of estimation errors did not differ between participant groups (TRD patients vs healthy controls) or good and bad news trials (eTable 11).

Differences in confidence in the base rates between groups, testing timepoints, and good/bad news trials
The model detected no significant main effects or interactions on confidence in the base rate (eTable 12).
For the analyses reported in the main manuscript, paradoxical trials were excluded from the update measure. These were trials in which participant estimates increased despite good news or decreased despite bad news. It is not clear how to interpret these trials. They could be error trials due to fatigue or a confirmation bias. We included a particularly symptomatic population (high MADRS and high resistance score) and the experimenters observed
The model relies on a generic reinforcement learning (RL) algorithm that assumes belief updating to be proportional to the size of the EEs, which are themselves weighted by the learning rate (LR) (equation 10). The learning rate determines how much beliefs are updated as a function of the size of the EEs. Consistent with previous work, and to test for the good news/bad news bias in belief updating, learning rates were estimated separately for good and bad news trials 9,11 according to equations 11 and 12:
(11) LRgood = Alpha + Asymmetry
(12) LRbad = Alpha - Asymmetry
The alpha parameter accounted for the tendency to learn from estimation errors independently of their valence. The asymmetry parameter indicates how much updating is biased by the valence (good/bad) of the estimation error. In more detail, the model estimated optimal alpha and asymmetry parameters per participant.
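The valence-dependent learning rate can be sketched in a few lines. The sketch assumes that the predicted update is simply the learning rate times the estimation error, and that bad news trials use the learning rate alpha minus asymmetry; the function names are hypothetical, and the parameter values are illustrative rather than fitted estimates.

```python
def learning_rate(alpha: float, asymmetry: float, ee: float) -> float:
    """Learning rate for a trial, depending on the sign of the estimation error."""
    if ee > 0:
        return alpha + asymmetry  # good news trial
    return alpha - asymmetry      # bad news trial (assumed form)


def predicted_update(alpha: float, asymmetry: float, ee: float) -> float:
    """Assumed update rule: UPD = LR * EE (zero if there is no estimation error)."""
    if ee == 0:
        return 0.0
    return learning_rate(alpha, asymmetry, ee) * ee


# Illustrative parameter values, not fitted estimates.
ALPHA, ASYMMETRY = 0.6, 0.1

good_news_update = predicted_update(ALPHA, ASYMMETRY, ee=20.0)
bad_news_update = predicted_update(ALPHA, ASYMMETRY, ee=-20.0)
print(good_news_update, bad_news_update)
```

With a positive asymmetry, the same absolute estimation error produces a larger absolute update after good news than after bad news, which is the optimistic bias the model is meant to capture.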
The model was implemented using the VBA toolbox, which uses a variational Bayes approach to approximate Bayesian inference about parameter estimates and model comparisons. Information was shared across all trials of a given participant, but not across participants or testing timepoints. The priors for alpha and asymmetry were unbounded and untransformed. The mean of the prior distribution was set to 1 for alpha and to zero for asymmetry. The model identified meaningful parameters from simulated data (see parameter recovery below).

Model fitting details
The model was not hierarchical, so the sample size of each group and timepoint did not change the values of the parameter estimates. Following 8,9, the model was fitted to each participant separately using Bayesian variational inference, yielding a posterior distribution over the parameters (summarized by the mean and variance of each parameter) and a free-energy approximation of the model evidence, following 10.
The individual free-energy approximations (i.e., per participant) were then entered into a random-effects Bayesian model comparison to determine (a) the probability that each participant was best described by one of eight versions of the model (described in the Methods section) and (b) the frequency of each model version in the population, in order to identify the model version that dominated within the population above chance. This approach formally controls for trial-to-trial variations in estimated base rates, initial estimates, and estimation error magnitude between the good news and bad news conditions, groups, and timepoints. Moreover, learning rates were calculated from the optimal alpha and asymmetry components that were estimated across all trials of a given participant. The learning rates were calculated either as the sum alpha + asymmetry for good news trials (EE > 0) or as the difference alpha - asymmetry for bad news trials (EE < 0). This means that information for the alpha and asymmetry estimates was shared between all trials of a given participant, but the two parameters entered the learning rates differently as a function of the estimation error sign. Information was not shared between participants, groups, or timepoints.

Model simulations
We checked whether the model plausibly captured asymmetric belief updating by creating surrogate data that simulated the behaviour of participants during the belief-updating task. The following parameters were used for the simulations: total trial number T = 40, alpha, asymmetry, and the number of simulations Nrep = 110 per parameter setting. The alpha and asymmetry parameters were defined based on values that allowed us to best explore the parameter space qualitatively and quantitatively for the competing models. Belief-updating data were simulated for four different models: M1, with alpha = 0.6 and asymmetry = 0.1; M2, with alpha = 0.6 and asymmetry = off; M3, with alpha = off and asymmetry = 0.1; and M4, with alpha = off and asymmetry = off. M1 and M3 made similar qualitative predictions, although they differed quantitatively (eFigure 2). Notably, both models predicted greater belief updating following positive than following negative estimation errors, but the predictions of M1, which took into account non-zero values for alpha and asymmetry, differed in size from those of M3, which took into account only non-zero values for asymmetry and switched alpha off. In contrast, M2 and M4 predicted similar belief updating after positive and negative estimation errors.
However, M2, which set alpha to non-zero and switched the asymmetry parameter off, predicted less overall belief updating than M4, which predicted that belief updating is proportional to the estimation error and personal relevance, without any weighting by a learning rate.

Model comparisons
Bayesian model comparisons showed that a model that assumed asymmetrical learning from estimation errors best described the data for all three testing timepoints in the TRD patients and for the 1st assessment in the healthy controls (TRD patients T0: Ef
In accordance with the model comparisons, the average alpha parameters were significantly smaller than 1 at all testing timepoints, indicating that belief updating was not solely driven by the estimation error magnitude in either TRD patients (T0: alpha = 0.51 ± 0.03, t(25) = -14.3, p = 1.5e-13; T1: alpha = 0.43 ± 0.03, t(25) = -18.1, p = 7.1e-16; T2: alpha = 0.45 ± 0.04 0.07, t(25) = -14.8, p = 6.6e-14) or healthy controls (T0: alpha = 0.4 ± 0.04, t(29) = -13.7, p = 2.9e-14; T1: alpha = 0.4 ± 0.04, t(29) = -13.3, p = 6.8e-14). No differences were observed in the alpha parameter between testing timepoints for the TRD patients or controls. The absence of a difference in the alpha parameter between testing timepoints and groups indicates that the differences observed in the average learning rates before and after ketamine treatment were not driven by a deficit in learning from estimation errors (i.e., in updating beliefs).

Decomposition of learning rates II: the asymmetry parameter
On the other hand, the asymmetry (A) parameter was on average significantly greater than zero at baseline (T0: A = 0.08 ± 0.02, t(29) = 3.9, p = 4.6e-04), and remained significantly greater than zero at the 2nd assessment timepoint for the healthy volunteers

Parameter recovery
We conducted a parameter recovery analysis to check whether the alpha and asymmetry parameters of the winning RL model were identifiable and described the data better than any other set of parameters. We first simulated belief updates using the RL model and the VBA_simulate function of the VBA toolbox. Alpha and asymmetry values to generate belief updating were randomly sampled from a Gaussian distribution, with a mean mu = 0.7 and a precision sigma = 0.2 for alpha and mu = 0.06 and sigma = 0.1 for asymmetry. Further input to the model consisted of the estimation error magnitude (EE), which was randomly sampled from a Gaussian distribution, with a mean mu = 14 and sigma = 0.95 for EE.
Random numbers were generated using the VBA_random function of the VBA toolbox. The total trial number was set to 40. The valence of the estimation errors was pseudo-randomly distributed to obtain 20 positive (good news) estimation error trials and 20 negative (bad news) estimation error trials. After simulating the data, the model was inverted to obtain the fit values for alpha and asymmetry using the VBA_StateSpaceModel function of the VBA toolbox. This process was repeated 30 times using new generating values for alpha and asymmetry. Finally, the fit and generating parameters were compared using Pearson's correlations.
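The logic of this recovery procedure can be sketched in plain Python. This is not the VBA toolbox pipeline: the variational Bayes inversion is replaced here by ordinary least squares, Gaussian observation noise with a standard deviation of 1 is assumed, sigma is treated as a standard deviation, and the update rule UPD = LR x EE with LRbad = alpha - asymmetry is assumed. All function and variable names are hypothetical.

```python
import math
import random

N_TRIALS = 40    # 20 good news and 20 bad news trials, as in the text
N_REPS = 30      # number of generating parameter sets, as in the text
NOISE_SD = 1.0   # assumption: observation noise on the simulated updates


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate(alpha, asymmetry, rng):
    """Simulate (UPD, EE) pairs; trial order is irrelevant for this model."""
    trials = []
    for i in range(N_TRIALS):
        sign = 1 if i < N_TRIALS // 2 else -1
        ee = sign * abs(rng.gauss(14, 0.95))
        lr = alpha + asymmetry if ee > 0 else alpha - asymmetry
        trials.append((lr * ee + rng.gauss(0, NOISE_SD), ee))
    return trials


def slope_through_origin(rows):
    """Least-squares estimate of LR in UPD = LR * EE."""
    return sum(u * e for u, e in rows) / sum(e * e for _, e in rows)


def fit(trials):
    """Recover alpha and asymmetry from separate good/bad news learning rates."""
    lr_good = slope_through_origin([(u, e) for u, e in trials if e > 0])
    lr_bad = slope_through_origin([(u, e) for u, e in trials if e < 0])
    return (lr_good + lr_bad) / 2, (lr_good - lr_bad) / 2


rng = random.Random(0)  # fixed seed for reproducibility
true_alpha = [rng.gauss(0.7, 0.2) for _ in range(N_REPS)]
true_asym = [rng.gauss(0.06, 0.1) for _ in range(N_REPS)]
fitted = [fit(simulate(a, s, rng)) for a, s in zip(true_alpha, true_asym)]

r_alpha = pearson_r(true_alpha, [f[0] for f in fitted])
r_asym = pearson_r(true_asym, [f[1] for f in fitted])
print(round(r_alpha, 2), round(r_asym, 2))
```

High correlations between generating and fitted values indicate that the two parameters are identifiable under these assumptions; the exact values depend on the assumed noise level and will differ from those obtained with the toolbox.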
For both alpha and asymmetry, the fit and generating values co-varied significantly (r_alpha = 0.95, p = 1.5e-16; r_asymmetry = 0.93, p = 5.9e-04) (eFigure 5). This result indicates that the belief updates generated by the RL model with known parameters can be fit with that model to recover the parameters.
The model tested the following serial regression paths, controlling for the effects of the previous path at each regression: (1) path c: y = cx + ey (2) path a: m = ax + em
MSM: Maudsley Staging Method; MADRS: Montgomery-Asberg Depression Rating Scale; SSRI: selective serotonin reuptake inhibitor; SNRI: serotonin and noradrenaline reuptake inhibitor; sem: standard error of the mean. The years of current episode were defined as the duration of the current depressive episode. The age at onset of illness was defined as the age (years) at diagnosis with MDD. Disease severity was defined according to the MADRS score: moderate (20 to 34), severe (35 to 60). Resistance severity was defined according to the MSM score: moderate (7 to 10), severe (11 to 15).
The number of excluded trials did not differ significantly between groups or testing timepoints (paired and two-sample, two-tailed t-tests). Paradoxical updating was defined by a higher second estimate (E2) relative to the first estimate (E1) (E1 < E2), despite good news. Specifically, in these trials, the base rate (BR) indicated that participants had overestimated their likelihood of experiencing an adverse life event (E1 > BR). Paradoxical updating also occurred when the E2 of participants further decreased (E1 > E2) despite bad news (E1 < BR), that is, when the base rate (BR) indicated that they had underestimated their likelihood of experiencing an adverse life event.
Ef: estimated model frequency, which reflects how frequent each model is in each cohort; pxp: exceedance probability, which reflects the probability of that model being predominant in the cohort above chance.
BIC: Bayesian information criterion; AIC: Akaike information criterion