Prediction of Suicide Attempts Using Clinician Assessment, Patient Self-report, and Electronic Health Records

Key Points

Question: What is the best method to predict which patients presenting to the emergency department will make a suicide attempt within 1 and 6 months after the visit?

Findings: This prognostic study of 1818 patients found that prediction of suicide attempts in the 1 month and 6 months after a patient visited an emergency department was significantly improved using machine learning models applied to data from a brief patient self-report scale, especially when supplemented with data from patients' electronic health records and/or clinicians' assessments.

Meaning: This study suggests that clinicians can improve their ability to identify patients at high risk of suicide by using data from a brief patient self-report scale and electronic health records.


I. Super Learner
Super Learner is an ensemble machine learning approach that uses cross-validation (CV) to select a weighted combination of predicted outcome scores across a collection of candidate algorithms (learners), yielding a combination that, according to a pre-specified criterion, performs at least as well as the best component algorithm. R package: SuperLearner (van der Laan, Polley, & Hubbard, 2007).

II. Learners in the Super Learner library

A. Generalized linear models
Maximum likelihood estimation with a flexible link function. R package: stats (Nelder & Wedderburn, 1972).

B. Elastic Net
Elastic net is a regularization method that mitigates the problem of overlap among predictors by explicitly penalizing over-fitting with a composite penalty λ[MPP × P_lasso + (1 − MPP) × P_ridge], where MPP is a mixing parameter with values between 0 and 1 that controls the relative weighting of the lasso penalty (P_lasso) and the ridge penalty (P_ridge), and λ controls the total amount of penalization. The ridge penalty handles multicollinearity by shrinking all coefficients smoothly toward 0 but retains all variables in the model. The lasso penalty allows simultaneous coefficient shrinkage and variable selection, tending to select at most one predictor from each strongly correlated set, but at the expense of unstable estimates in the presence of high multicollinearity. Combining the ridge and lasso penalties yields more stable and accurate estimates than either penalty alone while maintaining model parsimony. R package: glmnet (Friedman, Hastie, & Tibshirani, 2010). Hyperparameters: MPP = (0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1).

C. Splines

C1. Adaptive splines
Adaptive spline regression flexibly captures both linear and piece-wise non-linear associations, as well as interactions among these associations, by connecting linear segments (splines) of varying slopes and smoothing them to create piece-wise curves (basis functions). The final fit is built using a stepwise procedure that selects the optimal combination of basis functions. R package: earth (Milborrow, Hastie, Tibshirani, Miller, & Lumley, 2016). Hyperparameters: degree = (1, 3, 5).

C2. Adaptive polynomial splines
Adaptive polynomial splines are like adaptive splines but differ in the order in which basis functions (e.g., linear versus nonlinear) are added to build the final model. R package: polspline (Kooperberg, 2015).

D. Decision trees: bagging (Random Forest)
Independent variables are partitioned (based on contiguous values) and stacked to build short decision trees that are combined (ensembled) to create an aggregate "forest". Random forest builds numerous trees in bootstrapped samples and generates an aggregate tree by averaging across trees, thereby reducing over-fitting. R package: ranger (Wright & Ziegler, 2017).

[CLINICIAN ASSESSMENT]
(2) To your knowledge, has this patient made a suicide attempt in the past week?

NO YES
(3) Based on your clinical judgment and all you know of this patient, if untreated, what is the likelihood that this patient will make a suicide attempt in the next 1 month?*
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%

(4) Based on your clinical judgment and all you know of this patient, if untreated, what is the likelihood that this patient will make a suicide attempt in the next 6 months?*
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%

Thank you for participating in this study. You will be asked to:
• Complete a brief categorization task
• Answer a few questions about yourself
• Complete a brief color-naming task
• Answer a few more questions about yourself

Please let the researcher know if you have ANY questions.
Press the "Next" button to begin.

[IAT INSTRUCTIONS]
In this task, you will be presented with a set of words to classify into groups. This will take about 5 minutes. Here are the categories and items in this task:

CKPT: If A9 = "temporarily laid off" / "looking for work" / "retired" / "disabled" / "other" → GO TO A9a. All others GO TO A10.

A9a. How long has it been since you worked?
Within past 6 months 7 months to 11 months 1 year to 5 years More than 5 years

A10. What is your best estimate of your [if married: "and your spouse's"] total income from all sources, before taxes, in 2014?
A11. Please answer "C" for this question.

B1.
Have you had problems with the following issues that lasted more than 1 month and were so bad that they caused problems for you at work/school, with your family, or friends? If so, when was the last time?

C3c. When was the last time you had a plan to kill yourself?
Past week 2-4 weeks ago 1-3 months ago 4-6 months ago 7-12 months ago

C4.
Did you ever make a suicide attempt (that is, purposefully hurt yourself with at least some intent to die)?

IF C4d0i = "Overdose of medications" or "Overdose of illegal drugs" or "Poisoning with a household substance or gas" → GO TO C4d1
IF C4d0i = "Hanging" → GO TO C4d2
IF C4d0i = "Suffocation (for example, plastic bag over head)" or "Drowning" → GO TO C4d3
IF C4d0i = "Cutting or stabbing" → GO TO C4d4
IF C4d0i = "Gunshot" → GO TO C4d5
IF C4d0i = "Jumping from a high place" or "Motor vehicle crash" → GO TO C4d6
IF C4d0i = "Any other method" → GO TO C4d7

C4d1. What was the most severe kind of injury you had as a result of that suicide attempt? (Please check one)

Fully awake and alert
Slowed speech and movement but responsive to questions / minimal medical problems or treatment
Severely reduced awareness, or some injury (e.g., mouth burns)
Hospitalization - vital signs severely affected or passed out
Passed out - major medical problems such as kidney failure, needed blood transfusion
Unknown

Minor bruises only - no treatment necessary
Sprains or minor injuries - no bone or tendon damage; no internal bleeding or brain damage
Fractured arms or legs - needed a cast but recovered completely
Major bone and/or tendon damage in multiple areas and internal bleeding
Major damage in skull, neck, or spinal cord; paralysis expected
Unknown

C4d7. What was the most severe kind of injury you had as a result of that suicide attempt?

____________________ [text]
C5. Did you ever do something to hurt yourself on purpose, but without wanting to die (for example, cutting yourself, hitting yourself, or burning yourself)?

Yes
No → GO TO C6v
C5a. About how old were you the very first time you did something to hurt yourself on purpose, but without wanting to die?

______ Years old [NUMERIC KEYPAD] [Constraint: ≤ current age provided in A1]
C5b. About how many times in your life did you do something to hurt yourself on purpose, but without wanting to die?

C5c. When was the last time you did something to hurt yourself on purpose, but without wanting to die?
Past week 2-4 weeks ago 1-3 months ago 4-6 months ago 7-12 months ago More than 1 year ago

C6v. Did you ever make a suicide attempt (that is, purposefully hurt yourself with at least some intent to die)?

Yes
No → GO TO C6

C6cv. When was the last time you made a suicide attempt?
Past week 2-4 weeks ago 1-3 months ago 4-6 months ago 7-12 months ago More than 1 year ago

C6.
Please answer "C" for this question.

D1.
The next questions are about the mental health of your biological parents (not stepparents). We want to know about problems that ever occurred, so answer even for people who are no longer alive. Did they ever:

Yes No Don't know
a. have times lasting 2 weeks or longer when they were so depressed that they couldn't concentrate, felt worthless, or felt their life was not worth living?
b. Please answer "Don't Know" for this question.
c. have manic episodes lasting several days or longer when they were excited, full of energy, and did dangerous or embarrassing things? (Do not include times due to using drugs or alcohol.)
d. have anxiety attacks when they suddenly felt terrified for no good reason and would either shake, sweat, or have other physical symptoms?
e. have anger attacks when they suddenly lost control and "blew up" for no good reason, either yelling, breaking things, or hurting people?
f. have long periods of time where they were more agitated, anxious, or worried than other people, so that they couldn't relax, couldn't concentrate, or couldn't function normally?

The SuperLearner ensemble methodology, also known as stacking (Wolpert 1992), is a form of supervised learning in which multiple ways of predicting an outcome variable are evaluated and combined (van der Laan, Polley, and Hubbard 2007; Polley, Rose, and van der Laan 2011). Each way of predicting an outcome variable is known as an estimator or learner, and consists of up to four components:
1. Estimation algorithm: a prediction method that estimates ("learns") a mapping f(·) from the predictor variables (X) to the outcome variable (Y).
2. Hyperparameter configuration: the set of tuning settings for an estimation algorithm that must be pre-specified rather than learned from the data.
3. Feature selection: optional identification of a subset of predictors to be provided to the estimation algorithm, or simply all available features.
4. Feature transformations: optional transformations of the original predictor space, such as dimensionality reduction, the addition of interaction terms, imputation of missing values, or the calculation of basis functions.
One estimator might be logistic regression with no further customization. Another might be random forest configured to estimate 1,000 trees (a hyperparameter), provided with only those predictors whose Pearson correlation with the outcome has a p-value of 0.2 or less (feature selection). Another might be ordinary least squares (OLS) provided with all predictors, after all two-way interactions and squared terms have been added to the predictor list (feature transformations).
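The three example estimators above can be written down as plain data, one record per estimator, separating the four components. This is only an illustrative sketch: the field names and structure are hypothetical and are not the SuperLearner package's actual API.

```python
# Hypothetical specifications for the three example estimators, mirroring
# the four components (algorithm, hyperparameters, feature selection,
# feature transformation). Names and structure are illustrative only.
estimators = [
    {
        "name": "logistic_regression",
        "algorithm": "glm",          # estimation algorithm
        "hyperparameters": {},       # nothing tuned
        "feature_selection": "all",  # use every predictor
        "feature_transform": None,   # no transformation
    },
    {
        "name": "random_forest",
        "algorithm": "ranger",
        "hyperparameters": {"num_trees": 1000},
        "feature_selection": "pearson_corr_pvalue <= 0.2",
        "feature_transform": None,
    },
    {
        "name": "ols_interactions",
        "algorithm": "ols",
        "hyperparameters": {},
        "feature_selection": "all",
        "feature_transform": "two-way interactions + squared terms",
    },
]

print([spec["name"] for spec in estimators])
```

Writing estimators down this way makes explicit that the same algorithm can appear multiple times in the library with different hyperparameters or feature sets, each counting as a distinct estimator.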
Estimators are evaluated through cross-validation, which entails partitioning the analyzed dataset into distinct subsets known as folds. All folds except one are combined into a training set, and each estimator is provided with the training set to estimate the mapping f(·) from the predictors (X) to the outcome (Y). The estimator's learned function is then applied to the remaining fold, known as the test set, and evaluated for its accuracy using a pre-specified loss function such as mean-squared error, negative log-likelihood loss, or 1 − AUC. Evaluating performance on a held-out test set, which was not used to estimate the estimator's parameters, is important for identifying any overfitting. Each fold typically serves as the test set once, and the performance estimates are averaged to determine the cross-validated loss for each estimator.
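The cross-validation loop described above can be sketched in a few lines. This is a minimal, generic illustration (not the SuperLearner package's implementation), assuming an estimator is given as a `fit`/`predict` pair of functions:

```python
from statistics import mean

def kfold_indices(n, k):
    """Partition indices 0..n-1 into k folds (round-robin assignment)."""
    return [list(range(i, n, k)) for i in range(k)]

def cv_loss(X, y, fit, predict, loss, k=5):
    """Cross-validated loss for a single estimator.

    fit(X_train, y_train) -> model; predict(model, X_test) -> predictions;
    loss(y_true, y_pred) -> float. Each fold serves as the held-out test
    set once, and the fold losses are averaged.
    """
    fold_losses = []
    for test_idx in kfold_indices(len(y), k):
        train_idx = [i for i in range(len(y)) if i not in set(test_idx)]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        preds = predict(model, [X[i] for i in test_idx])
        fold_losses.append(loss([y[i] for i in test_idx], preds))
    return mean(fold_losses)

# Toy estimator: always predict the training-set mean of the outcome.
mean_fit = lambda X, y: mean(y)
mean_predict = lambda model, X: [model] * len(X)
mse = lambda y_true, y_pred: mean((a - b) ** 2 for a, b in zip(y_true, y_pred))

score = cv_loss(list(range(10)), [1.0] * 10, mean_fit, mean_predict, mse, k=5)
print(score)  # 0.0 for a constant outcome
```

Because the model for each fold is fit without ever seeing that fold's observations, the averaged loss estimates out-of-sample rather than in-sample performance.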
In the simplest case, the estimator with the lowest cross-validated loss is chosen. This is known as the cross-validation selector or discrete SuperLearner, and has been proven to perform asymptotically as well as a selection strategy based on understanding the true data distribution (oracle inequality) (van der Laan and Dudoit 2003). The implication is that there is little danger in using cross-validation to choose the best-performing estimator among a set of varied prediction strategies. Rather than only trying our personal favorite method or borrowing a recommendation from the literature, we can empirically validate multiple methods and allow the cross-validation procedure to report which method has been most successful in minimizing our loss function on the dataset at hand.
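The discrete SuperLearner reduces to an argmin over cross-validated losses. A minimal sketch, with hypothetical loss values for illustration:

```python
def discrete_super_learner(cv_losses):
    """Cross-validation selector: return the name of the estimator with
    the lowest cross-validated loss."""
    return min(cv_losses, key=cv_losses.get)

# Hypothetical cross-validated mean losses, one per candidate estimator.
losses = {"glm": 0.241, "elastic_net": 0.229, "random_forest": 0.236}
print(discrete_super_learner(losses))  # elastic_net
```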
Choosing a single estimator may leave valuable performance on the table. Instead, it may be advantageous to combine the predictions of multiple estimators, possibly improving on the bias-variance tradeoff of any single estimator. Super Learning, or stacked ensembling, leverages the cross-validation approach described above to identify an optimal combination of individual estimators that minimizes the chosen loss function (e.g., mean-squared error). It does so by taking the test set data, established when each cross-validation fold is used for evaluation of the estimators, and "stacking" (appending) those test sets into a combined dataset with the same number of observations as the original data. In this stacked dataset, the predicted value of each algorithm becomes a predictor (column), a form of coordinate transformation from the original predictor space, and we call this the "Z" matrix. A metalearner algorithm is then applied to the Z matrix, which learns a function g(·) that maps the test set predictions of each estimator to the outcome variable (Y).
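The construction of the Z matrix can be sketched as follows. This is a generic illustration (not the SuperLearner package's code), again assuming each estimator is a `fit`/`predict` pair: every observation's row of Z holds held-out predictions from models that never trained on that observation.

```python
def build_z_matrix(X, y, estimators, folds):
    """Assemble the stacked "Z" matrix of held-out predictions: one row
    per observation (same order as y), one column per estimator."""
    n = len(y)
    Z = [[None] * len(estimators) for _ in range(n)]
    for test_idx in folds:
        train_idx = [i for i in range(n) if i not in set(test_idx)]
        X_tr, y_tr = [X[i] for i in train_idx], [y[i] for i in train_idx]
        for j, (fit, predict) in enumerate(estimators):
            model = fit(X_tr, y_tr)
            preds = predict(model, [X[i] for i in test_idx])
            for i, p in zip(test_idx, preds):
                Z[i][j] = p  # prediction from a model that never saw row i
    return Z

# Two toy estimators: training-set mean, and a constant-zero predictor.
mean_est = (lambda X, y: sum(y) / len(y), lambda m, X: [m] * len(X))
zero_est = (lambda X, y: None, lambda m, X: [0.0] * len(X))

X, y = list(range(6)), [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
Z = build_z_matrix(X, y, [mean_est, zero_est], folds=[[0, 1, 2], [3, 4, 5]])
print(Z[0])  # [5.0, 0.0]: the training mean of y over rows 3-5 is 5.0
```

The metalearner is then fit with Z as its predictor matrix and the original y as its outcome.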
The most common metalearner algorithm is a convex combination of the columns of Z. In that case, it is a simple convex optimization problem to identify the set of non-negative weights that can be applied to the Z matrix to minimize the chosen loss function for predicting Y. This approach is often implemented by using non-negative least squares to estimate non-negative but otherwise unbounded weights and then rescaling those weights to sum to 1. Convex weights are beneficial for a number of reasons: their minimal data-adaptivity reduces the risk of overfitting, they ensure that the ensemble prediction falls within the convex hull of the original estimators' predictions, and they induce sparsity (i.e., one or more estimators may receive a weight of 0, simplifying the ensemble).
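The rescaling step and the convex-hull property can be illustrated directly. A minimal sketch, taking the non-negative weights as given (in practice they would come from non-negative least squares):

```python
def convex_ensemble(Z, weights):
    """Combine estimator predictions (rows of Z) with convex weights.

    weights: non-negative but otherwise unbounded (e.g. NNLS output);
    they are rescaled here to sum to 1, so each ensemble prediction lies
    within the range of that row's individual estimator predictions.
    """
    total = sum(weights)
    w = [v / total for v in weights]  # rescale onto the simplex
    preds = [sum(wj * zij for wj, zij in zip(w, row)) for row in Z]
    return w, preds

Z = [[0.2, 0.6], [0.9, 0.1]]               # two observations, two estimators
w, preds = convex_ensemble(Z, [3.0, 1.0])  # raw non-negative weights
print(w)  # [0.75, 0.25]

# Each ensemble prediction stays inside the convex hull of its row.
assert all(min(row) <= p <= max(row) for row, p in zip(Z, preds))
```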
More complex metalearners might be used as an alternative to the convex combination, such as a random forest or highly adaptive lasso (Benkeser and van der Laan 2016). They risk overfitting to the Z matrix, but if their complexity can be appropriately controlled their incorporation of interaction terms holds the promise of identifying regions of the estimator space (Z) where certain estimators are more accurate than others. As a result, they may be able to achieve even higher predictive performance (LeDell and Poirier 2020).
Once the metalearner has been trained, each constituent estimator is optionally retrained on the full dataset as the final step. This gives each estimator a slight performance boost from not holding out a rotated test set, as was done during the earlier cross-validation. Alternatively, it is possible to skip this final retraining of the constituent estimators and instead use each version of the estimators trained on the separate training sets. For example, with 10-fold cross-validation, there would be a copy of each estimator trained on each of the 10 versions of the training set. The predictions of the copies of a given estimator would be averaged before going into the metalearner to yield the ensemble prediction.
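The averaging alternative amounts to one line per estimator at prediction time. A minimal sketch, using trivial constant models to stand in for the K trained copies:

```python
from statistics import mean

def averaged_prediction(models, predict, x_new):
    """Skip the final full-data refit: average the predictions of the K
    estimator copies, one trained per cross-validation training set."""
    return mean(predict(m, x_new) for m in models)

# Toy constant models standing in for copies from 3-fold cross-validation.
copies = [0.5, 1.0, 1.5]
pred = averaged_prediction(copies, lambda m, x: m, x_new=None)
print(pred)  # 1.0
```

This averaged value is what would be fed into the metalearner as that estimator's column of the prediction.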
In this work, we use the SuperLearner R package (Polley et al. 2019), although an alternative implementation, sl3, is under development (Coyle et al. 2021). Additional details on the SuperLearner algorithm and best practices are available (Naimi and Balzer 2018; Kennedy 2017; Polley and van der Laan 2010).