Development and Validation of a Deep Learning Method to Predict Cerebral Palsy From Spontaneous Movements in Infants at High Risk

Key Points
Question: What is the external validity of a deep learning–based method to predict cerebral palsy (CP) based on infants’ spontaneous movements at 9 to 18 weeks’ corrected age?
Findings: In this prognostic study of 557 infants with a high risk of perinatal brain injury, a deep learning–based method for early prediction of CP had sensitivity of 71%, specificity of 94%, positive predictive value of 68%, and negative predictive value of 95%. Prognosis of CP based on the deep learning–based method was associated with later functional level and CP subtype in children with CP.
Meaning: This study’s findings suggest that deep learning–based assessments could support early detection of CP in infants at high risk.

The present study is related to five previously published papers:
Støen et al. 1: This study did not assess a Machine Learning-based CP prediction but was a study of GMA and its predictive accuracy for CP. The present study used video recordings, GMA classifications, and CP outcomes of the infant sample from Norway and the United States collected by Støen et al. 1
Adde et al. 2: This study evaluated a simple statistical method for Conventional Machine Learning-based CP prediction without assessing its external validity. The Machine Learning method used was entirely different from the method presented in the present study. The present study used video recordings, GMA classifications, and CP outcomes of the infant sample from Norway collected by Adde et al. 2
Pascal et al. 3: This study did not assess a Machine Learning-based CP prediction but assessed the prediction of CP using GMA. The present study used video recordings, GMA classifications, and CP outcomes of the infant sample from Belgium collected by Pascal et al. 3
Aker et al. 4: This study did not assess a Machine Learning-based CP prediction but assessed CP prediction using GMA. The present study used video recordings, GMA classifications, and CP outcomes of the infant sample from India collected by Aker et al. 4
Ihlen et al. 5: Both the present study and the study by Ihlen et al. 5 used the video recordings, GMA classifications, and CP outcomes of the infant sample from Norway and the United States collected by Støen et al. 1 However, Ihlen et al. 5 evaluated a semiautomated Conventional Machine Learning method for CP prediction, in contrast to the fully automated Deep Learning method of the present study. Ihlen et al. 5 also did not assess the external validity of the Conventional Machine Learning method.
In the following section, we describe stepwise how the Deep Learning-based prediction model was developed:
Step 1: The skeleton sequence was resampled to 30 Hz and a 5-point temporal median filter was applied to each skeletal coordinate time series. Subsequently, the skeleton sequence was centered on the median mid pelvis location and normalized by two times the trunk length of the infant (i.e., the median distance from upper chest to mid pelvis).
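The preprocessing in Step 1 can be illustrated with a minimal NumPy sketch. It is not the authors' code: the linear interpolation used for resampling, the edge padding of the median filter, and the keypoint indices `UPPER_CHEST` and `MID_PELVIS` are assumptions made for illustration, since the text does not specify them.

```python
import numpy as np

# Hypothetical indices: the text does not give the ordering of the keypoints.
UPPER_CHEST, MID_PELVIS = 0, 1


def resample(seq, fps_in, fps_out=30.0):
    """Resample a (T, V, 2) skeleton sequence to fps_out by linear interpolation (assumed method)."""
    t_in = np.arange(seq.shape[0]) / fps_in
    t_out = np.arange(0.0, t_in[-1] + 1e-9, 1.0 / fps_out)
    flat = seq.reshape(seq.shape[0], -1)
    cols = [np.interp(t_out, t_in, flat[:, k]) for k in range(flat.shape[1])]
    return np.stack(cols, axis=1).reshape(len(t_out), *seq.shape[1:])


def median_filter(seq, size=5):
    """Temporal median filter over `size` frames for every coordinate series.

    Edges are padded by repeating the first and last frame (an assumption).
    """
    pad = size // 2
    padded = np.pad(seq, ((pad, pad), (0, 0), (0, 0)), mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, size, axis=0)
    return np.median(windows, axis=-1)


def normalize(seq):
    """Center on the median mid pelvis location and divide by two trunk lengths."""
    pelvis = np.median(seq[:, MID_PELVIS], axis=0)
    trunk = np.median(np.linalg.norm(seq[:, UPPER_CHEST] - seq[:, MID_PELVIS], axis=-1))
    return (seq - pelvis) / (2.0 * trunk)


def preprocess(seq, fps_in):
    """Resample, filter, and normalize a raw (T, V, 2) skeleton sequence."""
    return normalize(median_filter(resample(seq, fps_in)))
```

After `preprocess`, the median mid pelvis location sits at the origin and the median trunk length equals 0.5, so infants of different size and camera distance share a common scale.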
Step 2: The skeleton sequence was divided into 5-second windows comprising $T$ ($= 5\,\mathrm{s} \cdot 30\,\mathrm{s}^{-1} = 150$) time steps of $N$ ($= 19$) body keypoints (i.e., joints), each with $C$ ($= 2$) coordinates (i.e., $x_{n,t}$ and $y_{n,t}$ for keypoint $n$ at time step $t$). In each 5-second window, the infant skeletons were rotated spatially for vertical alignment of upper chest and mid pelvis in the first time step.
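A minimal sketch of the windowing and rotation in Step 2 follows. The keypoint indices, the choice of the +y axis as "vertical", and rotation about the coordinate origin are assumptions for illustration. The `stride` parameter is likewise hypothetical: a stride of 150 frames gives non-overlapping windows, and 75 frames corresponds to a 2.5-second overlap at 30 Hz.

```python
import numpy as np

# Hypothetical indices: the text does not give the ordering of the keypoints.
UPPER_CHEST, MID_PELVIS = 0, 1


def split_windows(seq, length=150, stride=150):
    """Cut a (T, V, 2) sequence into windows of `length` frames.

    Expects at least `length` frames; trailing frames that do not fill a
    whole window are dropped.
    """
    starts = range(0, seq.shape[0] - length + 1, stride)
    return np.stack([seq[s:s + length] for s in starts])


def align_vertical(window):
    """Rotate a (T, V, 2) window so that, in its first frame, the vector from
    mid pelvis to upper chest points along +y (assumed meaning of 'vertical')."""
    trunk = window[0, UPPER_CHEST] - window[0, MID_PELVIS]
    angle = np.arctan2(trunk[0], trunk[1])  # angle of the trunk away from +y
    c, s = np.cos(angle), np.sin(angle)
    rot = np.array([[c, -s], [s, c]])       # counterclockwise rotation by `angle`
    return window @ rot.T                   # same rotation applied to every frame
```

The rotation is computed once from the first frame and applied to the whole window, so movement within the window is preserved while the starting orientation is standardized.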
Step 3: Biomechanical properties, i.e., position $p_{n,t}$ and velocity $v_{n,t}$ of each skeletal keypoint $n$, and distance from the neighboring body keypoint $m$, $b_{n,t} = p_{n,t} - p_{m,t}$, were defined for each time step $t$ in a 5-second window and used as input variables to the Deep Learning-based CP prediction model.
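The three input variables of Step 3 can be sketched as below. The text does not define how velocity was computed or which keypoint counts as the neighbor, so the frame-to-frame finite difference and the `parents` lookup table are assumptions introduced only for illustration.

```python
import numpy as np


def features(window, parents, fps=30.0):
    """Stack position, velocity, and neighbor-distance features for one window.

    window  : (T, V, 2) array of keypoint coordinates.
    parents : length-V integer sequence; parents[i] is the neighboring keypoint
              of keypoint i (hypothetical table; a root keypoint may point to
              itself, which gives it a zero distance vector).
    Returns an array of shape (3, T, V, 2).
    """
    parents = np.asarray(parents)
    position = window
    # Velocity as a backward finite difference scaled to units per second
    # (assumed definition); the first frame has no predecessor and is set to 0.
    velocity = np.zeros_like(window)
    velocity[1:] = (window[1:] - window[:-1]) * fps
    # Vector from the neighboring keypoint to each keypoint.
    bone = window - window[:, parents]
    return np.stack([position, velocity, bone])
```

For a 5-second window of 19 keypoints the result has shape (3, 150, 19, 2), one slice per input variable.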
Step 4: The processing of the input variables was performed by an ensemble of Graph Convolutional Networks (GCNs) (i.e., artificial expert instances), whose overall architecture is illustrated in eFigure 1. The configurations of the input branches, main branch, and pooling layer in eFigure 1 and their general properties were determined by K-Best Search 6 with the search space summarized in eTable 3. The performance of GCN architectures was evaluated by the area under the receiver operating characteristic curve (AUC) on internal validation folds of the dataset. All GCNs were optimized using He initialization 7 , Stochastic Gradient Descent with a learning rate of $5 \cdot 10^{-4}$, Nesterov momentum of 0.9, and a batch size of 32 on an NVIDIA Tesla V100 GPU. Five-second windows were randomly sampled from the skeleton sequence and data augmentation with scaling (0.7 -1.
Step 5: The seven versions of the 10 obtained GCN models in eTable 4 constituted the 70 artificial expert instances. The artificial expert instances were utilized to yield CP predictions according to eFigure 1 on unseen skeleton sequences (14 697 five-second windows) of the test set with 2.5 seconds overlap between each 5-second window.
The internal validity was evaluated using 7-fold cross-validation, with sensitivity fixed at the level of GMA (i.e., 70.0%). All values are provided in percentages, along with 95% confidence intervals. Abbreviations: TP, true positives; FP, false positives; TN, true negatives; FN, false negatives; PPV, positive predictive value; NPV, negative predictive value.
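For readers less familiar with the reported measures, the sketch below shows how sensitivity, specificity, PPV, and NPV follow from TP, FP, TN, and FN, and one possible way to pick a decision threshold that reaches a target sensitivity. The threshold rule is an assumption for illustration; the text does not describe how the operating point was selected. All function names are hypothetical.

```python
import math


def threshold_for_sensitivity(scores, labels, target):
    """Return the highest threshold at which sensitivity reaches `target`
    (a fraction), classifying score >= threshold as CP. Assumed rule."""
    positives = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    needed = max(math.ceil(target * len(positives) - 1e-9), 1)
    return positives[needed - 1]


def confusion_counts(scores, labels, threshold):
    """Count TP, FP, TN, FN for binary labels (1 = CP, 0 = no CP)."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    return tp, fp, tn, fn


def metrics(tp, fp, tn, fn):
    """Return the four measures in percentages."""
    return {
        "sensitivity": 100.0 * tp / (tp + fn),
        "specificity": 100.0 * tn / (tn + fp),
        "ppv": 100.0 * tp / (tp + fp),
        "npv": 100.0 * tn / (tn + fn),
    }
```

With 10 children with CP and a target of 0.70, the threshold is the seventh-highest score among them, so at least 7 of the 10 are classified as CP.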