    2 Comments for this article
    A COMMENT ON “Quantifying the Association Between Psychotherapy Content and Clinical Outcomes Using Deep Learning”
    Marten Kilberg, MscPsy, MscEE | None
    I have several concerns about the scientific approach in this study. For example, a simple factor analysis would probably have provided more new information than training a deep learning algorithm on 24 predefined, selected categories. In general, the way an AI system is trained determines how it performs (Brownlee, 2017).

    The authors use the word “objective” regarding their analysis, but the categories were chosen by the authors themselves, quote: "We defined a total of 24 feature categories, informed by the CBT competences framework and the Revised Cognitive Therapy Scale" (examples of categories are “Set_Goal” vs “Hello”). This choice can be motivated by different types of rationale, but it considerably limits the possible scientific conclusions. The design imposes several restrictions and limits conclusions to a relative comparison between the 24 selected categories (and nothing more), yet in their conclusions the authors claim support for the CBT model and for specific factors, quote: "The findings support the key principles underlying CBT as a treatment...". A more basic interpretation of the results might be “the more the therapist addresses the problem the client seeks help for, the better the outcome”. There could be other valid interpretations, but it is wise to keep it simple and not overgeneralize. The authors also draw conclusions about common vs specific factors without defining or discussing them, and without setting up the design in a way that would support any conclusions regarding this classification of factors.

    Furthermore, there are some fundamental limits to objectivity in this type of research due to the lack of triple-blind presentation, and results in psychology must always be interpreted to be meaningful. In psychotherapy research, this interpretation takes place in at least three places: in therapists, in clients, and in researchers.

    Other concerns: Has researcher allegiance to a particular intervention been taken into account? Is the focus on falsifying hypotheses or on proving them? There is no discussion of the limitations of the PHQ-9 and GAD-7 measures or of the interrater reliability (“κ = 0.54”). There is no discussion of treatment integrity, which has been questioned (Wampold & Imel, 2015), and no discussion of the limitations of multivariable regression.
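The κ = 0.54 mentioned above is presumably Cohen's kappa, the usual chance-corrected agreement statistic for two raters; this excerpt does not say which variant was reported, so that is an assumption. A minimal Python sketch of κ = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance from each rater's label frequencies. The labels below are invented for illustration (True = "utterance belongs to a given category").

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labelled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of items both raters labelled identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: sum over labels of the product of each rater's marginal frequency.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1:
        return 1.0  # both raters used one identical label throughout; kappa is undefined, report 1
    return (p_o - p_e) / (1 - p_e)

# Hypothetical tags from two raters for 10 utterances.
rater_a = [True, True, True, True, True, False, False, False, False, False]
rater_b = [True, True, True, True, False, False, False, False, False, True]
print(round(cohens_kappa(rater_a, rater_b), 3))  # 0.6
```

Here the raters agree on 80% of utterances (p_o = 0.8), but half of that agreement is expected by chance (p_e = 0.5), so κ is only 0.6. This is why kappa values read lower than raw percentage agreement.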

    While AI and deep learning could be very valuable for digesting massive amounts of information, they could also mislead and, in the worst case, be misused. If we want to improve outcomes in psychotherapy, we need to be careful and thorough in our research. An alternative hypothesis for the sometimes-reported stagnation in psychotherapy outcomes (Miller, 2019) might be: not asking the right questions, and a lack of critical evaluation of research methods.

    Marten Kilberg, MSc Psychology, Gothenburg University; MSc Electrical Engineering, Chalmers University of Technology. ilfroggoya...se

    Brownlee, J. (2017, July 13). What is the Difference Between Test and Validation Datasets? Retrieved 12 October 2017.
    Miller, S. D. (2019, February 1). Time for a New Paradigm? Psychotherapy Outcomes Stagnant for 40 years. Retrieved from https://www.scottdmiller.com/time-for-a-new-paradigm-psychotherapy-outcomes-stagnant-for-40-years/
    Wampold, B. E., & Imel, Z. (2015). The great psychotherapy debate: The evidence for what makes psychotherapy work. New York and London: Routledge.
    Response to “A Comment on ‘Quantifying the Association Between Psychotherapy Content and Clinical Outcomes Using Deep Learning’”
    Michael Ewbank, PhD | Ieso Digital Health
    We would like to thank Mr Kilberg for the interest he has shown in our work. Although Mr Kilberg expresses a number of concerns, it is unclear to us how many of these issues would specifically explain the results of our study. In addition, we believe that some of his concerns may reflect a misunderstanding of our work. For example, he suggests that a factor analysis would have provided more new information than training a deep learning model; however, factor analysis is a method of data reduction and can only be performed if one has a set of variables to include. Here, we had access to an unstructured data set of ~90,000 therapy session transcripts. Deep learning provides the most practicable method of imposing structure on these data. While future work could use factor analysis to reveal the latent variables underlying our chosen categories, obtaining measures of these categories was only possible using deep learning. It is also important to clarify that deep learning was not used to directly predict outcomes; rather, the outputs of the model were included as predictor variables in a logistic regression.
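The two-stage design described in the response (a classifier tags therapist utterances, and its outputs then enter a logistic regression as predictors) can be sketched as follows. This is a minimal illustration, not the study's code: the category names, the 0.5 decision threshold, and the choice of raw counts summed over a patient's sessions are all assumptions, and the classifier itself is replaced by hard-coded per-utterance probabilities. Because each category is thresholded independently, one utterance can receive several tags or none.

```python
# Hypothetical subset of category names; the study defined 24 categories.
CATEGORIES = ["change_methods", "set_goal", "non_therapy"]

def session_feature_counts(utterance_probs, threshold=0.5):
    """Count therapist utterances tagged with each category in one session.

    utterance_probs: one dict per utterance, mapping category -> model probability.
    Each category is thresholded independently (multi-label), so one utterance
    can count towards several categories or none.
    """
    counts = {category: 0 for category in CATEGORIES}
    for probs in utterance_probs:
        for category in CATEGORIES:
            if probs.get(category, 0.0) >= threshold:
                counts[category] += 1
    return counts

def patient_predictors(sessions, threshold=0.5):
    """Sum session-level counts over all of a patient's sessions (assumed aggregation)."""
    totals = {category: 0 for category in CATEGORIES}
    for session in sessions:
        for category, n in session_feature_counts(session, threshold).items():
            totals[category] += n
    return totals

# Stand-in for classifier output: 2 sessions with 2 therapist utterances each.
sessions = [
    [{"change_methods": 0.9, "set_goal": 0.7}, {"non_therapy": 0.8}],
    [{"change_methods": 0.6}, {"change_methods": 0.3, "set_goal": 0.55}],
]
print(patient_predictors(sessions))
# {'change_methods': 2, 'set_goal': 2, 'non_therapy': 1}
```

Each patient's totals would then form one row of predictors in the regression, with the binary outcome (for example, reliable improvement) as the response variable.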

    Mr Kilberg is correct in stating that our conclusions are limited to a comparison between the 24 chosen feature categories. However, any study must ultimately constrain the set of variables used. Given this limitation, the categories chosen were based on the most well-developed observational coding schemes that currently exist (Blackburn et al., 2001; Roth & Pilling, 2008). We believe that a positive association between the quantity of CBT change methods and reliable improvement provides support for the key principles underlying CBT. Moreover, the results do not support Mr Kilberg’s proposal that “the more active the therapist the better the outcome”, as sessions in which a therapist is more “active” in using non-therapy related content, for example, are associated with worse outcomes.

    Issues of triple blinding and researcher allegiance are undoubtedly important considerations when interpreting the results of randomised controlled trials; however, we should clarify that data in this study came from a real-world clinical setting. Moreover, researchers tagging transcripts were blind to the outcome of each case, and the deep learning model was also blind to outcomes. For practical reasons, it is not possible for a research paper to include a discussion of the limitations of all measures and methods used. However, readers are free to explore the literature regarding the validity of these instruments and approaches.

    We agree with Mr Kilberg that careful and thorough research is needed to improve outcomes in psychotherapy. However, we feel that stagnation of outcomes is unlikely to be due to “not asking the right questions”, but more likely due to most psychotherapy studies not having (and being unlikely to have) the resources required to have sufficient statistical power (Bell, Marcus, & Goodlad, 2013). We believe the application of deep learning to large clinical data sets can overcome this limitation and enable a data-driven understanding of mental health treatments that can accelerate improvement in outcomes.
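To illustrate why modest effects demand large samples, here is the standard normal-approximation formula for the number of patients per arm needed to compare two proportions (for example, rates of reliable improvement) with a two-sided test. The 50% vs 55% and 50% vs 60% rates, α = 0.05, and 80% power are hypothetical values chosen for illustration; this does not reproduce any calculation from the study or from Bell et al. (2013).

```python
import math
from statistics import NormalDist

def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Approximate sample size per arm to detect p1 vs p2 with a two-sided z-test."""
    if not (0 < p1 < 1 and 0 < p2 < 1) or p1 == p2:
        raise ValueError("need two different proportions strictly between 0 and 1")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value, ~1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)            # ~0.84 for 80% power
    p_bar = (p1 + p2) / 2
    root = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
            + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2)))
    return math.ceil(root ** 2 / (p1 - p2) ** 2)

print(n_per_group(0.50, 0.55))  # hypothetical 5-point difference in improvement rates
print(n_per_group(0.50, 0.60))  # hypothetical 10-point difference
```

A 5-percentage-point difference needs a little over 1,500 patients per arm, while a 10-point difference needs roughly a quarter of that, because the required sample size scales with 1/(p1 - p2)².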

    Michael P. Ewbank, PhD
    Ana Catarino, PhD
    Andrew D. Blackwell, PhD
    Clinical Science Laboratory, Ieso Digital Health, Cambridge, UK.

    Blackburn I, et al. Behav Cogn Psychother. 2001;29:431-446. doi:10.1017/S1352465801004040
    Bell EC, et al. J Consult Clin Psychol. 2013;81(4):722-736. doi:10.1037/a0033004
    Roth AD, Pilling S. Behav Cogn Psychother. 2008;36(February):129-147. doi:10.1017/S1352465808004141
    CONFLICT OF INTEREST: All authors are employees of Ieso Digital Health
    Original Investigation
    August 22, 2019

    Quantifying the Association Between Psychotherapy Content and Clinical Outcomes Using Deep Learning

    Author Affiliations
    • Clinical Science Laboratory, Ieso Digital Health, Cambridge, England
    JAMA Psychiatry. 2020;77(1):35-43. doi:10.1001/jamapsychiatry.2019.2664
    Key Points

    Question  What aspects of psychotherapy content are significantly associated with clinical outcomes?

    Findings  In this quality improvement study, a deep learning model was trained to automatically categorize therapist utterances from approximately 90 000 hours of internet-enabled cognitive behavior therapy (CBT). Increased quantities of CBT change methods were positively associated with reliable improvement in patient symptoms, and the quantity of nontherapy-related content showed a negative association.

    Meaning  The findings support the key principles underlying CBT as a treatment and demonstrate that applying deep learning to large clinical data sets can provide valuable insights into the effectiveness of psychotherapy.


    Importance  Compared with the treatment of physical conditions, the quality of care of mental health disorders remains poor and the rate of improvement in treatment is slow, a primary reason being the lack of objective and systematic methods for measuring the delivery of psychotherapy.

    Objective  To use a deep learning model applied to a large-scale clinical data set of cognitive behavioral therapy (CBT) session transcripts to generate a quantifiable measure of treatment delivered and to determine the association between the quantity of each aspect of therapy delivered and clinical outcomes.

    Design, Setting, and Participants  All data were obtained from patients receiving internet-enabled CBT for the treatment of a mental health disorder between June 2012 and March 2018 in England. Cognitive behavioral therapy was delivered in a secure online therapy room via instant synchronous messaging. The initial sample comprised a total of 17 572 patients (90 934 therapy session transcripts). Patients self-referred or were referred by a primary health care worker directly to the service.

    Exposures  All patients received National Institute for Health and Care Excellence–approved disorder-specific CBT treatment protocols delivered by a qualified CBT therapist.

    Main Outcomes and Measures  Clinical outcomes were measured in terms of reliable improvement in patient symptoms and treatment engagement. Reliable improvement was calculated based on 2 severity measures, the Patient Health Questionnaire (PHQ-9)²¹ and the Generalized Anxiety Disorder 7-item scale (GAD-7),²² corresponding to depressive and anxiety symptoms, respectively, and completed by the patient at initial assessment and before every therapy session (see eMethods in the Supplement for details).
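A minimal sketch of how reliable improvement on these two measures could be scored. The thresholds (a PHQ-9 change of at least 6 points, a GAD-7 change of at least 4) and the rule that neither measure may show reliable deterioration follow the convention used in England's IAPT services; the study's exact definition is in its eMethods and is not reproduced here, so treat these values, and the comparison of first and last scores, as assumptions.

```python
# Assumed reliable-change thresholds (IAPT convention); the study's own definition may differ.
PHQ9_THRESHOLD = 6   # PHQ-9 total score range: 0-27
GAD7_THRESHOLD = 4   # GAD-7 total score range: 0-21

def reliable_improvement(phq9_first, phq9_last, gad7_first, gad7_last):
    """True if either score fell by at least its threshold and neither rose by its threshold.

    'first' is assumed to be the initial-assessment score and 'last' the final
    available score before the end of treatment.
    """
    for score, top in ((phq9_first, 27), (phq9_last, 27), (gad7_first, 21), (gad7_last, 21)):
        if not 0 <= score <= top:
            raise ValueError(f"score {score} outside valid range 0-{top}")
    phq9_change = phq9_last - phq9_first  # negative values mean fewer symptoms
    gad7_change = gad7_last - gad7_first
    improved = phq9_change <= -PHQ9_THRESHOLD or gad7_change <= -GAD7_THRESHOLD
    deteriorated = phq9_change >= PHQ9_THRESHOLD or gad7_change >= GAD7_THRESHOLD
    return improved and not deteriorated

print(reliable_improvement(18, 10, 12, 11))  # True: PHQ-9 fell by 8
print(reliable_improvement(18, 14, 12, 9))   # False: neither drop reaches its threshold
print(reliable_improvement(18, 10, 5, 10))   # False: GAD-7 rose by 5 (reliable deterioration)
```

Requiring a drop larger than a fixed threshold, rather than any drop, is meant to separate genuine change from measurement noise on each questionnaire.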

    Results  Treatment sessions from a total of 14 899 patients (10 882 women) aged between 18 and 94 years (median age, 34.8 years) were included in the final analysis. We trained a deep learning model to automatically categorize therapist utterances into 1 or more of 24 feature categories. The trained model was applied to our data set to obtain quantifiable measures of each feature of treatment delivered. A logistic regression revealed that increased quantities of a number of session features, including change methods (cognitive and behavioral techniques used in CBT), were associated with greater odds of reliable improvement in patient symptoms (odds ratio, 1.11; 95% CI, 1.06-1.17) and patient engagement (odds ratio, 1.20; 95% CI, 1.12-1.27). The quantity of nontherapy-related content was associated with reduced odds of symptom improvement (odds ratio, 0.89; 95% CI, 0.85-0.92) and patient engagement (odds ratio, 0.88; 95% CI, 0.84-0.92).
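Odds ratios from a logistic regression are exponentiated coefficients, OR = exp(β), and a Wald 95% CI is exp(β ± 1.96·SE). The sketch below assumes the reported intervals are Wald intervals (this excerpt does not say) and backs out β and SE from the change-methods result for symptom improvement. Each OR describes the change in odds per unit of the predictor as the study scaled it, which this excerpt does not specify.

```python
import math

Z_95 = 1.959964  # two-sided 95% standard normal quantile

def log_scale_from_or(odds_ratio, ci_low, ci_high, z=Z_95):
    """Back out beta = log(OR) and an approximate SE from an OR and its Wald CI."""
    beta = math.log(odds_ratio)
    # A Wald CI is symmetric on the log-odds scale, with total width 2 * z * SE.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z)
    return beta, se

def or_with_ci(beta, se, z=Z_95):
    """Turn a log-odds coefficient and its SE back into (OR, lower, upper)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

beta, se = log_scale_from_or(1.11, 1.06, 1.17)      # change methods, symptom improvement
print(round(beta, 3), round(se, 3))                 # 0.104 0.025
print([round(x, 2) for x in or_with_ci(beta, se)])  # [1.11, 1.06, 1.17]
```

An OR of 1.11 corresponds to 11% higher odds of reliable improvement per unit increase, whereas the nontherapy-content OR of 0.89 corresponds to a negative β, that is, lower odds. Recomputing the interval from β and SE reproduces the published bounds to two decimal places.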

    Conclusions and Relevance  This work demonstrates an association between clinical outcomes in psychotherapy and the content of therapist utterances. These findings support the principle that CBT change methods help produce improvements in patients’ presenting symptoms. The application of deep learning to large clinical data sets can provide valuable insights into psychotherapy, informing the development of new treatments and helping standardize clinical practice.