Newborn Cry Acoustics in the Assessment of Neonatal Opioid Withdrawal Syndrome Using Machine Learning

Key Points
Question: Can newborn cry acoustics serve as an objective biobehavioral marker of neonatal opioid withdrawal syndrome (NOWS)?
Findings: In this cohort study of 65 neonates with and without exposure to opioids, supervised machine learning methods identified a set of cry acoustic parameters that accurately predicted which infants received pharmacological treatment for NOWS, with an area under the curve of 0.90, accuracy of 0.85, sensitivity of 0.89, and specificity of 0.83.
Meaning: These results suggest that acoustic cry analysis using machine learning has potential as a measure of opioid withdrawal in neonates.


Introduction
Opioid use during pregnancy has reached epidemic proportions, with a 242% increase in the past 10 years, 1 resulting in increases in the prevalence of neonatal opioid withdrawal syndrome (NOWS), also known as neonatal abstinence syndrome. 2,3 NOWS is a substantial public health problem, increasing from 1.6 per 1000 births in 2004 to 8.8 per 1000 births in 2016, 4 with associated hospital charges increasing from $190 million in 2000 to $720 million in 2009. Clinical signs of NOWS reflect opioid receptor expression 5 and include dysfunction of the central, autonomic, respiratory, and gastrointestinal systems (eg, tremors, excessive and/or high-pitched crying, and breathing and intestinal problems). 6 Focused attention on NOWS has highlighted the assessment and management of this condition, with implications ranging from immediate use of pharmacological treatment to subsequent development 7,8 and survival. 9 Absent biological markers, the treatment of NOWS is based on observer-rated scales to evaluate NOWS severity and the need for pharmacological treatment. Until recently, the gold standard for NOWS assessment has been the Finnegan Neonatal Abstinence Scoring Tool (FNAST). 10 The FNAST includes 21 items (score range, 0-46, with higher scores indicating more severe withdrawal symptoms) and was developed as a clinical aid to assess NOWS severity and guide pharmacological treatment. Although the FNAST, including various modifications, is used in 95% of institutions in the US, 11-13 the literature is replete with frustration and dissatisfaction with the FNAST, including the length, 14,15 subjectivity, 16 validity, and reliability 17 of the tool and the need to disturb infants to perform formal assessments. 18 There are also concerns that the FNAST overestimates the need for pharmacological treatment by including signs that may not be clinically meaningful, resulting in increased length of hospital stay and hospital costs.
18 The Eat, Sleep, Console (ESC) care tool 18,19 was developed as an alternative to the FNAST, with the goal of reducing pharmacological treatment. The ESC focuses on nonpharmacological strategies as first-line treatment for infants with NOWS. If the infant meets criteria for sleeping, eating, and consoling, pharmacological treatment is not initiated or escalated. Early reports from researchers using the ESC tool have been favorable in terms of decreasing the use of pharmacological intervention and reducing length of stay; however, most studies have been retrospective, few psychometric data are available, and there has been no evidence that one approach is better than the other. 20 Crying is a distinctive component in both the FNAST and ESC tools but is one of the most inadequately measured symptoms on the FNAST 21 because the health care professional has to judge whether the cry is high pitched (not defined) and/or whether the infant is inconsolable.
Inconsolability is also 1 of the 3 ESC criteria. A recent expert panel tasked with developing a standardized clinical definition of NOWS identified excessive crying as 1 of 5 clinical indicators used to support diagnosis. 22 Crying may be important to the assessment of NOWS because infant cry characteristics reflect opioid receptor expression through the involvement of the brain stem (cranial nerves IX-XII), affecting the vocal tract, respiration, and gut. 23 In addition, variation in cry acoustics (defined as the physical properties of sound) has been associated with gene expression related to stress response systems. 24 These pathways determine not only the regulatory behavior of cries, such as consolability, but also the acoustic characteristics of the cry (eg, pitch), many of which cannot be detected by human perception.

The purpose of the present cohort study was to evaluate the feasibility of using newborn cry acoustics as an objective biobehavioral marker of NOWS. We applied a machine learning approach to assess the accuracy with which acoustic cry characteristics predict the receipt of pharmacological treatment.

Study Design
This prospective controlled cohort study examined the association between acoustic cry characteristics and receipt of pharmacological treatment.

Measures
Demographic and medical information was collected from electronic medical records. Neonates exposed to opioids were monitored for 5 days for signs of NOWS using the FNAST administered every 3 hours by trained nursing staff. Pharmacological treatment (morphine) was initiated when the FNAST score was 8 or higher on 2 consecutive assessments or 12 or higher on a single assessment.
Neonates who did not meet criteria for pharmacological treatment by 5 days were discharged from the hospital. Healthy neonates were observed during routine handling (eg, diaper changes, bathing, or feeding).
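The treatment threshold described above can be written as a small check. This is a minimal sketch under stated assumptions, not the study's clinical protocol or software: the function name `meets_treatment_criteria` is hypothetical, and scores are assumed to arrive as a chronological list of integer FNAST totals.

```python
from typing import Sequence


def meets_treatment_criteria(scores: Sequence[int]) -> bool:
    """Return True if a chronological FNAST series meets the stated rule:
    a score of 12 or higher once, or 8 or higher on 2 consecutive assessments."""
    previous_was_high = False  # was the preceding assessment 8 or higher?
    for score in scores:
        if score >= 12:
            return True
        if score >= 8:
            if previous_was_high:
                return True
            previous_was_high = True
        else:
            previous_was_high = False
    return False


# Illustrative series (invented values, not study data).
print(meets_treatment_criteria([5, 8, 7, 9]))  # isolated scores of 8 or higher
print(meets_treatment_criteria([6, 8, 9]))     # two consecutive scores of 8 or higher
print(meets_treatment_criteria([3, 12]))       # single score of 12 or higher
```

Tracking only the previous assessment is enough here because the rule looks back a single step.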
Crying was recorded by attaching a digital audio recorder with an omnidirectional microphone to the side of the infant's crib at a standardized location oriented toward the infant's mouth. For neonates exposed to opioids, recording lasted from the beginning of the 5-day observation period until the infant was discharged; during this period, spontaneous and elicited cry episodes were recorded. For healthy neonates not exposed to opioids, crying was recorded during routine handling.
Episodes of cry vocalizations were identified from the audio recordings; start time of the episode was used to determine whether pharmacological treatment had begun. Recordings were screened for sounds that could interfere with the acoustic analysis (ie, adult talking or environmental noises). Two research assistants, blinded to the prenatal drug history of the infant, were trained to identify cries appropriate for acoustic analysis and achieved 89% agreement.
A total of 177 neonates were recruited in the study, and cry recordings were processed for 118 neonates, 31 of whom produced cry recordings that were inappropriate for acoustic analysis (eFigure in the Supplement). Another 10 neonates were excluded from the study because they had no usable cries before treatment was initiated, and an additional 12 were removed because of missing data.
The remaining 65 neonates included 19 infants who received pharmacological treatment for NOWS and whose mothers were receiving medication-assisted treatment (MAT), 7 infants whose mothers were receiving MAT and who did not develop NOWS, and 39 infants with no prenatal exposure to illicit substances.
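As a quick consistency check on the participant flow, the counts reported above can be tallied. The sketch below uses only numbers stated in the text; the dictionary keys are descriptive labels chosen for illustration.

```python
# Neonates with processed cry recordings, as reported in the text.
processed = 118

# Exclusions reported in the text (labels are illustrative).
exclusions = {
    "recordings inappropriate for acoustic analysis": 31,
    "no usable cries before treatment": 10,
    "missing data": 12,
}

# Final analytic groups reported in the text.
groups = {
    "MAT exposure, treated for NOWS": 19,
    "MAT exposure, no NOWS": 7,
    "no prenatal illicit substance exposure": 39,
}

analyzed = processed - sum(exclusions.values())
print(analyzed, sum(groups.values()))
```

Both totals come to 65, matching the analytic sample size stated in the text.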
Two acoustic analysis systems were used for the cry analysis: (1) a proprietary analysis software developed at Brown University (Brown analyzer) designed to perform cry analysis in infants 25 and (2) Soundgen, 26 an open-source analyzer. Using 2 analyzers provided the opportunity to identify common acoustic parameters across analyzers (Figure 1). Both analyzers were run in 2 phases. First, a cepstral-based acoustic analysis was used to extract acoustic parameters in 12.5-millisecond frames. Second, the acoustic parameters were organized and summarized into cry utterances. Cry utterances are cries that occur during the expiratory phase of respiration and include both short (<500 milliseconds) and long (≥500 milliseconds) utterances.
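The framing and utterance-length conventions above can be illustrated with a few helper functions. This is a sketch only, not either analyzer's implementation: the function names are hypothetical, and frames are assumed to be non-overlapping with any trailing partial frame dropped, which the text does not specify.

```python
from typing import Dict, Iterable

FRAME_MS = 12.5         # frame length stated in the text
SHORT_LIMIT_MS = 500.0  # utterances shorter than this are "short"


def frame_count(duration_ms: float) -> int:
    """Number of whole 12.5-ms frames in an utterance (assumes no overlap)."""
    return int(duration_ms // FRAME_MS)


def utterance_class(duration_ms: float) -> str:
    """Classify an utterance as short (<500 ms) or long (>=500 ms)."""
    return "short" if duration_ms < SHORT_LIMIT_MS else "long"


def count_by_class(durations_ms: Iterable[float]) -> Dict[str, int]:
    """Tally short and long utterances for one infant."""
    counts = {"short": 0, "long": 0}
    for duration in durations_ms:
        counts[utterance_class(duration)] += 1
    return counts


# Invented durations for illustration only.
print(frame_count(500.0))
print(count_by_class([120.0, 500.0, 870.0, 300.0]))
```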
The Brown analyzer produced 61 acoustic characteristics per utterance and identified 29 155 utterances (20 477 short and 8678 long); the Soundgen analyzer produced 55 acoustic characteristics per utterance and identified 14 061 utterances (6509 short and 7552 long). Short and long utterances were analyzed separately. For each newborn or acoustic measure, we computed means and rates of missing data. A total count of utterances was computed for each infant. A filtered subset of acoustic characteristics was generated by removing variables with high rates (>60%) of missing data, near zero variance, 27 or excessive intercorrelation (r > .75), 27 consistent with guidelines for machine learning feature filtering. 28
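The three filtering criteria above can be sketched in plain Python. This is an illustration, not the study's R code: a simple variance floor stands in for the near-zero-variance test, a greedy first-come rule stands in for the correlation filter, and the feature names, the `min_variance` value, and the example data are invented.

```python
from math import sqrt
from typing import Dict, List, Optional

Column = List[Optional[float]]  # None marks a missing value


def missing_rate(col: Column) -> float:
    return sum(v is None for v in col) / len(col)


def variance(col: Column) -> float:
    vals = [v for v in col if v is not None]
    if len(vals) < 2:
        return 0.0
    m = sum(vals) / len(vals)
    return sum((v - m) ** 2 for v in vals) / (len(vals) - 1)


def pearson(a: Column, b: Column) -> float:
    """Pearson r on pairwise-complete observations (0.0 if undefined)."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if len(pairs) < 2:
        return 0.0
    mx = sum(x for x, _ in pairs) / len(pairs)
    my = sum(y for _, y in pairs) / len(pairs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return 0.0 if sxx == 0 or syy == 0 else sxy / sqrt(sxx * syy)


def filter_features(data: Dict[str, Column], max_missing: float = 0.60,
                    min_variance: float = 1e-8, max_abs_r: float = 0.75) -> List[str]:
    """Keep features passing the missingness, variance, and correlation checks."""
    kept: List[str] = []
    for name, col in data.items():
        if missing_rate(col) > max_missing:
            continue  # too much missing data
        if variance(col) < min_variance:
            continue  # (near) zero variance
        if any(abs(pearson(col, data[k])) > max_abs_r for k in kept):
            continue  # too correlated with a feature already kept
        kept.append(name)
    return kept


example = {
    "f0_mean": [400.0, 450.0, 500.0, 420.0, 480.0],
    "f0_mean_copy": [401.0, 451.0, 502.0, 419.0, 481.0],
    "mostly_missing": [None, None, None, None, 1.0],
    "constant": [1.0, 1.0, 1.0, 1.0, 1.0],
    "energy": [0.2, 0.9, 0.1, 0.8, 0.3],
}
print(filter_features(example))
```

In this toy input only `f0_mean` and `energy` survive: one feature is dropped for each of the three criteria.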

Statistical Analysis
We used cross-validated supervised machine learning methods in which random forest algorithms were trained to predict receipt of pharmacological treatment for NOWS using cry acoustics, and the accuracy of the algorithm was evaluated by generating a prediction on an unseen (or test) set. Leave-one-out cross-validation (LOOCV) was used to evaluate whether model predictions generalized to unseen cases. With this method, the test-train split (ie, the data split into a testing and training set) is repeated for every child in the data set such that the modeling procedure is performed for all except 1 child, and the held-out child is used to evaluate whether model predictions generalize to unseen cases 29 (Figure 2).
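The leave-one-out loop can be sketched as follows. To keep the example self-contained, a nearest-centroid classifier stands in for the random forest used in the study; it is a deliberately simpler model, and the function names and toy data are assumptions for illustration.

```python
from typing import Dict, List


def fit_centroids(X: List[List[float]], y: List[int]) -> Dict[int, List[float]]:
    """Stand-in model: the mean feature vector of each class."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for i, value in enumerate(row):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids: Dict[int, List[float]], row: List[float]) -> int:
    """Assign the class whose centroid is nearest (squared Euclidean distance)."""
    def dist2(center: List[float]) -> float:
        return sum((a - b) ** 2 for a, b in zip(row, center))
    return min(centroids, key=lambda label: dist2(centroids[label]))


def loocv_predictions(X: List[List[float]], y: List[int]) -> List[int]:
    """Hold out each case once, train on the rest, predict the held-out case."""
    predictions = []
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        # Any data-driven step (eg, feature selection) belongs inside this
        # loop so that the held-out case never influences it.
        model = fit_centroids(train_X, train_y)
        predictions.append(predict(model, X[i]))
    return predictions


# Toy data: one feature, two well-separated classes.
X = [[0.0], [0.1], [0.2], [1.0], [1.1], [1.2]]
y = [0, 0, 0, 1, 1, 1]
print(loocv_predictions(X, y))
```

With one prediction per held-out case, the predictions from all iterations can be pooled to compute accuracy estimates.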

JAMA Network Open | Pediatrics
An ensemble feature selection method 30 was used to select acoustic cry variables where, within a single iteration of LOOCV, the Boruta algorithm 31 was repeated 100 times, and variables selected by Boruta in more than 5 of the 100 repetitions were retained for subsequent classification. Boruta is a performant 32 algorithm that uses random forest to identify a subset of important predictors for a given outcome. Including feature selection within cross-validation allows for feature stability to be estimated, with 100% stability indicating that a given acoustic feature was always retained during feature selection. The relative contribution of each variable to the final prediction is measured with mean importance estimates. The mean of the predicted probabilities obtained from each model (Brown vs Soundgen and short vs long utterances) was used to generate a combined prediction of NOWS (Figure 1). Binary predictions were generated by finding the point on the receiver operating characteristic (ROC) curve exhibiting the shortest distance to perfect prediction. 33 Standard diagnostic accuracy estimates were reported. 34

Model predictions were examined in 3 ways. First, we computed the accuracy of models comparing neonates treated for NOWS with a control group comprising neonates with mothers receiving MAT who did not develop NOWS and healthy neonates who were not exposed to opioids.
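The cutpoint rule described above (the ROC point closest to perfect prediction) can be sketched as a brute-force search over candidate thresholds. This is an illustration rather than the cutpointr implementation: distance is assumed to be Euclidean in the sensitivity-specificity plane, and the probabilities and labels below are invented.

```python
from math import hypot
from typing import List, Tuple


def closest_to_perfect_cutpoint(probs: List[float],
                                labels: List[int]) -> Tuple[float, float, float]:
    """Return (threshold, sensitivity, specificity) for the threshold whose
    ROC point lies closest to sensitivity = 1 and specificity = 1.
    A case is predicted positive when its probability is >= threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    best = None
    for threshold in sorted(set(probs)):
        tp = sum(p >= threshold and y == 1 for p, y in zip(probs, labels))
        tn = sum(p < threshold and y == 0 for p, y in zip(probs, labels))
        sensitivity = tp / positives
        specificity = tn / negatives
        distance = hypot(1 - sensitivity, 1 - specificity)
        if best is None or distance < best[0]:
            best = (distance, threshold, sensitivity, specificity)
    return best[1], best[2], best[3]


# Invented predicted probabilities and true labels (1 = treated for NOWS).
probs = [0.1, 0.2, 0.35, 0.4, 0.6, 0.8, 0.9]
labels = [0, 0, 0, 1, 0, 1, 1]
print(closest_to_perfect_cutpoint(probs, labels))
```

For this toy input the search selects a threshold of 0.4, where all 3 positive cases and 3 of 4 negative cases are classified correctly.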
Next, we recomputed the accuracy estimates by comparing neonates who developed NOWS vs neonates with mothers receiving MAT who did not develop NOWS, then neonates who developed NOWS vs neonates with no prenatal illicit substance exposure.In this way, we were able to evaluate whether the results were biased toward either of the groups without NOWS.
In summary, we evaluated predictions of a single stacked model (ie, a meta-learner) consisting of 4 random forest classifiers (Figure 1) trained on outputs from different analyzers (Soundgen vs Brown) and utterances of different lengths (short vs long). Notably, feature selection preceded training of our random forest classifier and involved the use of the Boruta algorithm on a reduced set of acoustic features (devoid of features with near zero variance, high missing data rates, or excessive intercorrelation) (Figure 2). All analyses were performed using R software, version 4 (R Foundation for Statistical Computing). 35 Primary analyses used the Boruta, 31 caret, 27 ranger, 36 epiR, 37 and cutpointr 33 packages for R.
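The combination step of the stacked model, in which the predicted probabilities of the 4 classifiers are averaged, can be sketched in a few lines. The model keys and probability values below are hypothetical; only the averaging rule comes from the text.

```python
from statistics import mean
from typing import Dict


def stacked_probability(model_probs: Dict[str, float]) -> float:
    """Combine per-model predicted probabilities by taking their mean."""
    return mean(list(model_probs.values()))


# Invented probabilities for one infant from the 4 analyzer-by-length models.
one_infant = {
    "brown_short": 0.9,
    "brown_long": 0.7,
    "soundgen_short": 0.8,
    "soundgen_long": 0.6,
}
print(round(stacked_probability(one_infant), 2))
```

An unweighted mean treats the 4 models as equally informative; a weighted combination would be a different design and is not described in the text.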

Descriptive Data
The

Candidate Variable Selection
The ensemble Boruta algorithms identified 16 variables from each of the Brown and Soundgen analyzers (8 variables from the short-utterance model and 8 variables from the long-utterance model per analyzer), for a total of 32 variables (Figure 3). The important variables were those with both high stability and high importance. For the Brown analyzer, 10 of the 16 variables had greater than 90% stability and similar importance estimates and were considered important variables. For the Soundgen analyzer, 8 of the 16 variables had greater than 90% stability, 7 of which had similar importance estimates (the exception being mean amplitude envelope frequency) and were considered important variables. Of the 10 important variables from the Brown analyzer, 3 were related to high pitch (hyperpitch), 2 were related to the first formant frequency, 2 were related to energy, 2 were related to the number of cry utterances, and 1 was related to fricatives. From the Soundgen analyzer, among 7 important variables, 2 were related to number of utterances, 1 was related to the first formant frequency, 1 was related to the second formant frequency, 1 was related to amplitude (energy), 1 was related to spectral entropy, and 1 was related to spectral novelty.

Figure 3 (panel: short-utterance model; axis: stability, %). Mean importance estimates represent the relative contribution of a given variable to the final prediction across iterations of the cross-validation procedure. Variable stability was estimated by examining how frequently a given variable was retained for analysis during the feature selection step; for stability estimates, 100% indicates that a given variable was retained during all iterations of the cross-validation procedure. Both high importance and high stability are desirable.
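The retention rule (selected in more than 5 of 100 repeated Boruta runs) and the stability estimate (percentage of cross-validation iterations in which a variable was retained) can be sketched as simple counting. The Boruta runs themselves are not reproduced here; the function names and the sets of selected features are invented for illustration.

```python
from collections import Counter
from typing import Dict, List, Set


def retained_features(selection_runs: List[Set[str]], min_hits: int = 5) -> Set[str]:
    """Features selected in more than `min_hits` of the repeated runs."""
    counts = Counter(f for run in selection_runs for f in set(run))
    return {feature for feature, hits in counts.items() if hits > min_hits}


def stability_percent(retained_per_iteration: List[Set[str]]) -> Dict[str, float]:
    """Percentage of cross-validation iterations that retained each feature."""
    n = len(retained_per_iteration)
    counts = Counter(f for kept in retained_per_iteration for f in kept)
    return {feature: 100.0 * hits / n for feature, hits in counts.items()}


# Invented example: 100 repeated runs inside one cross-validation iteration.
runs = [{"f1_mean", "energy"}] * 6 + [{"f1_mean"}] * 94
print(sorted(retained_features(runs)))

# Invented example: retained sets from 4 cross-validation iterations.
iterations = [{"f1_mean", "energy"}, {"f1_mean"}, {"f1_mean", "energy"}, {"f1_mean"}]
print(stability_percent(iterations))
```

In the toy input `f1_mean` has 100% stability and `energy` has 50% stability.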

Sensitivity Analyses
Predictions of the random forest classifiers remained accurate when excluding neonates with no prenatal illicit substance exposure (n = 39) from the ROC curve and confusion matrix (AUC, 0.85

Discussion
The assessment and management of NOWS has been problematic, 16,38 in part because extant assessments rely on the subjective observations of health care professionals. 10,39 The findings of this cohort study suggest that acoustic cry analysis has potential as an objective measure of opioid withdrawal in the neonate. Cross-validated random forest models trained using newborn cry acoustics were able to generate accurate predictions (AUC, 0.90; accuracy, 0.85; sensitivity, 0.89; specificity, 0.83). These results inform the feasibility of developing a NOWS algorithm based on cry acoustics that could improve the treatment and outcome of these infants.
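The reported accuracy, sensitivity, and specificity can be related to a 2 x 2 confusion matrix. The cell counts below are an assumption, not values taken from the text: they were chosen because, with 19 treated infants and 46 comparison infants, they reproduce the reported rounded estimates.

```python
from typing import Dict


def diagnostic_metrics(tp: int, fn: int, tn: int, fp: int) -> Dict[str, float]:
    """Standard diagnostic accuracy estimates from confusion matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


# Assumed counts (not reported in the text): 19 treated, 46 comparison infants.
metrics = diagnostic_metrics(tp=17, fn=2, tn=38, fp=8)
rounded = {name: round(value, 2) for name, value in metrics.items()}
print(rounded)
```

Under these assumed counts, sensitivity is 17/19, specificity is 38/46, and accuracy is 55/65, which round to 0.89, 0.83, and 0.85, respectively.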
Acoustic characteristics of the infant's cry have been associated with prenatal opioid exposure 40,41 and other medical conditions, including asphyxia, hyperbilirubinemia, trisomy anomalies, sudden infant death syndrome, 42 prenatal drug exposure, 41 autism, 43 preterm birth, 44 and other conditions, 45,46 but these studies typically examined a limited subset of acoustic characteristics (most notably, pitch).Pitch is our perception of the fundamental frequency caused by vibration of the vocal folds and has received the most attention because it has intuitive appeal and because variation in pitch can be discriminated by listeners.However, as a single parameter associated with numerous conditions, it has limited diagnostic utility when used in isolation.
Moreover, the reliability of listener ratings of pitch is low, and actual variation in fundamental frequency is high in relation to prenatal opioid exposure, even when measured through acoustic analysis. 40,41,47 It is more likely that multidimensional aspects of cry acoustics have a role in the characterization of NOWS and other conditions, as suggested by a recent meta-analysis of cry characteristics and neurological dysfunction. 45 Of the 17 important variables identified in our analysis, 7 were measures of frequencies in the vocal tract, including the fundamental frequency (pitch). Other frequencies included the first and second formants, which are frequency peaks or resonances as the cry sound travels upward through the vocal tract. Changes in the first formant frequency have been found in infants who died of sudden infant death syndrome, 48 preterm infants, 49 infants with hyperbilirubinemia, 50 and infants with prenatal substance exposure. 41,51,52 Changes in the second formant frequency have been reported in infants with prenatal substance exposure. 41,52 Formant frequencies are determined by the shape and cross-sectional area of the vocal tract affected by cranial nerve activity.

Figure note (receiver operating characteristic and precision-recall curves). The triangle symbol indicates the point that was closest to perfect prediction (ie, values of 1 for sensitivity and 1 for specificity). In the precision-recall curve, precision is equivalent to positive predictive value, and recall is equivalent to sensitivity. AUC indicates area under the curve; ROC, receiver operating characteristic.
Four of the important variables were the number of utterances (both short and long); these variables would generally correspond to the amount of crying, reflecting autonomic (including vagal) control. Cry utterance findings have been observed in preterm infants 48 and infants with prenatal substance exposure. 41,53 Two related measures of amplitude or energy, representing the extent of acoustic information in different ranges in the spectral domain and influencing loudness, are also under autonomic control. These measures have been associated with preterm birth 48,49 and prenatal substance exposure. 41,48,51,53 Fricatives are sounds caused by turbulent forced breath due to constriction of the vocal tract, which is affected by cranial nerve innervation of the vocal tract.
Spectral entropy is a measure of the uniformity or quality of the signal. Spectral novelty is the identification of new or unknown data that a machine learning system is not aware of during training. 54 Changes in measures of fricatives, spectral entropy, and spectral novelty have not been reported in infants at high risk of developing NOWS.
Our work was made possible by the combination of 2 major advances: state-of-the-art signal processing algorithms to quantify acoustic cry characteristics and machine learning methods that can handle large data sets, including the array of acoustical information generated by the cry analyzers, and can model nonlinear associations via the underlying tree-based algorithm. In addition, the use of model stacking allowed us to use multiple analyzers and validate acoustic parameters across various platforms.
The field of NOWS is somewhat disorganized due to inconsistency regarding how NOWS is assessed and managed, lack of a standard treatment protocol, and a movement toward nonpharmacological interventions. However, common to these assessment tools (the FNAST, ESC, or consensus clinical definition) is that they include crying as a major feature, but crying is not measured objectively. Our findings suggest that the cries of infants with NOWS have distinct characteristics, constituting a cry signature that includes 17 variables, most of which cannot be detected by human perception. Acoustic cry analysis reflects the pathophysiological features of withdrawal, with implications for the brain stem, autonomic nervous system, and gut, and could serve as an objective biobehavioral marker of NOWS.

Strengths and Limitations
This study has several strengths and limitations. The study involved the collection of more than 775 hours of audio recordings from neonates. Although we analyzed more than 2.5 hours of cry recordings, the number of neonates was small and confined to a single hospital. For this reason, we combined the 7 neonates exposed to opioids who did not meet criteria for treatment with healthy neonates who were not exposed to opioids into the same control group. Replication of these results in a larger multisite sample will be important. A larger sample would also allow for model probability calibration, thus producing an interpretable probability for the user (vs a binary prediction). The cries of the infants with NOWS that were collected with a recorder placed in the isolette resulted in substantial missing data and labor-intensive efforts to prepare cries for acoustic analysis. Cries of neonates exposed to opioids were recorded continuously, whereas cries of healthy neonates were recorded periodically; future work would benefit from fully standardizing the recording methods. It is also important to acknowledge that this approach was guided by analytic techniques rather than theory. We can only speculate about the mechanisms by which acoustic cry characteristics measure opioid withdrawal, and data-dependent approaches are sensitive to sample idiosyncrasies.

We used extensive internal validation, and the fact that the Brown and Soundgen analyzers identified similar acoustic cry characteristics is a favorable finding, although external validation will be necessary to address whether results generalize beyond the present sample.
Our findings are based on evaluating whether cry acoustics corresponded to a measure (the FNAST) that has known limitations. However, the fact that cry acoustics and FNAST scores are associated suggests that the FNAST (and, by implication, other approaches such as the ESC or consensus report) has identified crying as an important dimension of neonatal opioid withdrawal but lacks objective measurement.

The second analyzer was an open-source analyzer developed for a broad range of applications, including analysis of human nonverbal vocalization (Soundgen 26 package for R software, version 4 [R Foundation for Statistical Computing]). Using multiple analyzers via model stacking (ie, training models in parallel and combining their outputs) allows for a more comprehensive investigation of acoustic parameters.

Figure 1. Model Stacking Diagram

Statistical tests were 2-tailed, with α = .05 set as the significance threshold. Random forest classifiers used default hyperparameters from the ranger package (except for tree count [10 000 trees]).

Figure 2. Flowchart of Data Reduction Strategy

Figure 3. Acoustic Feature Importance and Stability Plots

The study was approved by the institutional review board of the Women & Infants Hospital of Rhode Island. Written informed consent was obtained; mothers provided consent on behalf of their infants after reviewing the consent form in the presence of trained research staff. Mothers consented prenatally, after delivery, or any time before the neonate would have met criteria for a diagnosis of neonatal abstinence syndrome; consent took place after hospital staff confirmed that mothers were capable of providing informed consent. This study followed the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) reporting guideline for cohort studies.