Individual Risk Prediction for Sight-Threatening Retinopathy of Prematurity Using Birth Characteristics

This cohort study creates and validates an easy-to-use prediction model using only birth characteristics and describes a continuous hazard function for retinopathy of prematurity treatment.

R etinopathy of prematurity (ROP) is a potentially blinding disease, and screening programs for detecting sightthreatening ROP needing treatment have been established worldwide. 1,2 Infants with lower gestational age (GA) have a higher risk of sight-threatening ROP; in Sweden, the recommendation is to screen infants with GA less than 31 weeks and severely ill infants if older. Data are registered in the Swedish National Registry for Retinopathy of Prematurity (SWEDROP). Between 2008 and 2015, only 5.7% of screened infants in Sweden were treated for ROP. 3 Screening includes retinal examinations by specially trained ophthalmologists and is often stressful for the infant 4 ; without risk prediction, some infants may not be screened and treated at the appropriate time. Individualized risk estimates would allow for optimization of timing and frequency of the screening processes from the health care and economics perspectives. Improving the timing of screening visits could avoid unnecessary examinations of low-risk infants and optimize identification of those at high risk.
Risk and severity of ROP vary by prenatal and postnatal factors, 5 including poor prenatal and postnatal weight gain. For this reason, the prediction algorithm WINROP (weight, insulinlike growth factor 1, neonatal, ROP), which is based on accumulated postnatal weight gain, has been validated and broadly used. [6][7][8][9] Similar tools based on longitudinal postnatal weight gain also have been developed. [10][11][12][13] The objectives of this study were to create, then to internally and externally validate, and to describe the clinical implications of a prediction model for individual momentary and cumulative risks of ROP treatment based on birth characteristics alone, including infants born at GA less than 31 weeks.

Study Population
Infants born between January 1, 2007, and August 7, 2018, at GA less than 31 weeks and with completed ROP screening registered in SWEDROP 14 were included as part of the Swedish Neonatal Quality Register, 15 started in 2007, which has approximately 97% coverage and contains perinatal data, screening outcomes, and treatment information. 3 All data are registered through standardized protocols, in most settings by a trained pediatric ophthalmologist who has performed the screening examination. A validation of 85 randomly selected infants screened in 2018 showed 100% correctly reported values for variables used in this study. This retrospective cohort study was approved by the ethics committee at Uppsala University, Uppsala, Sweden, who also waived written informed consent because all the data were deidentified.

Model Development Group
In total, data for 8784 infants born between January 1, 2007, and October 31, 2017, were retrieved from SWEDROP for the prediction model development. Of those, data for 1372 of 8784 infants (15.6%) were excluded for having GA at least 31 weeks at birth, and 126 of 8784 infants (1.4%) were excluded for missing data. This left 7286 of 8784 infants (82.9%) eligible for the model development group. Of those, 6947 of 7286 infants (95.3%) had GA 24 to 30 weeks ( Figure 1).

Validation Groups
The group used for temporal validation consisted of infants born between November 1, 2017, and August 7, 2018, and registered in SWEDROP. Among infants born at GA 24 to 30 weeks, 308 of 323 (95.4%) were eligible and served as the validation temporal group (Figure 1).
The validation US group included 1485 of 1535 eligible infants (96.7%) born at GA 24 to 30 weeks from 12 US centers between 2005 and 2010 ( Figure 1). 16 The validation European group included 329 of 354 eligible infants (92.9%) born at GA 24 to 30 weeks from Freiburg, Germany, with retrospective screening data collected between 2011 and 2017 ( Figure 1). 17

Study Procedures
The estimation of GA was based on fetal ultrasonographic results. The chronological (postnatal) age, postmenstrual age, and GA are defined according to the American Academy of Pediatrics' issued policy. 18 An SD score (SDS) of expected reference weight (birth weight SDS [BWSDS]) was calculated based on GA, sex, and birth weight for all healthy singletons born at GA at least 24 weeks between 1990 and 1999 in Sweden and registered in the Medical Birth Register (800 000 healthy infants of approximately 1 million born). 19 Hence, BWSDS was not calculated for infants born at GA less than 24 weeks because of a lack of reference for this extremely preterm population. Infants born at GA less than 24 weeks are at high risk of severe ROP requiring treatment, partly owing to a larger proportion of avascular retinal area at birth, 20 and prediction models are not as useful in this cohort. Therefore, a simpler prediction model was developed for this group and is presented along with the results in eAppendix 1 (which references eFigures 9-11 and eTables 7 and 8) in the Supplement. Small for GA 19 was defined as BWSDS less than −2.

Study Outcome
The prediction model was developed to estimate risk for treatment of sight-threatening ROP. The International Classification of Retinopathy of Prematurity 21 and Early Treatment for Retinopathy of Prematurity (ETROP) 22 criteria for treatment were used.

Statistical Analysis
General Methodology Number and percentage are given for categorical variables; for continuous variables, the mean, SD, median, range, and interquartile range are provided, where applicable. For comparison between 2 groups, we used the Fisher exact test for dichotomous variables, Mantel-Haenszel χ 2 trend test for ordered categorical variables, and Mann-Whitney test for continuous variables. The Jonckheere-Terpstra test was applied for identifying trends between ordered categorical and continuous variables. The crude week-specific risk of ROP treatment (number of infants with the event divided by number of infants at risk) was analyzed based on postnatal age and postmenstrual age (GA plus postnatal age) by GA at birth. The modeling process consisted of (1) prediction model development, (2) internal and external validation, and (3) clinical implication. 23 The prediction model for ROP treatment, called DIGIROP-Birth (Digital ROP), was developed using Poisson regression for time-varying data, from which we obtained a continuous hazard function, h(t), describing momentary risk for ROP treatment. 24,25 From the hazard function, the survival function S(t) = e -∫0 t h (u) du and its complement, the cumulative risk function F(t) = 1 − S(t), were estimated. The 95% CI for F(t) was obtained via repeated sampling (1000 samples) of the model parameters from a multivariate normal distribution using a covariance matrix estimated by the Poisson regression models. Parameter estimates, SEs, and hazard ratios (HRs) with 95% CIs are presented. The predictive ability of the continuous cumulative risks was checked and was found to be similarly high after postnatal age 15 weeks (eFigure 1 in the Supplement). Given this information and the knowledge about the studied hazard function, the cumulative risks of ever needing ROP treatment during 20 postnatal weeks were used for interpretation.
All tests above were 2-tailed and conducted at the .05 significance level, with no adjustments for multiple comparisons. All analyses were performed using SAS statistical software, version 9.4 (SAS Institute Inc).

DIGIROP-Birth Prediction Model for GA 24 to 30 Weeks Development and Validation
Based on the crude risks for ROP treatment over time stratified by GA, we found that postnatal age was the most appropriate time axis. The final model for GA 24 to 30 weeks included the following: piecewise linear current postnatal age (break points, 8 and 12 weeks), piecewise linear continuous GA given in weeks and days (break point, 27 weeks), sex, piecewise linear BWSDS (break point, −1 SDS), postnatal age × piecewise linear GA interaction, sex × GA interaction, and postnatal age × piecewise linear BWSDS interaction. The break points for the variables were selected based on graphical review of univariable hazard functions. The final model was built by gradually expanding the models, starting only with postnatal age and further keeping interactions with P < .10.  Internal, temporal, and geographical external validations were performed. The model fit and adaptation were described by the area under the receiver operating characteristic curve (hereinafter referred to as AUC) overall, by calendar periods, and by race/ethnicity. We performed cross-validation and evaluated calibration plots; calculated sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV); and compared DIGIROP-Birth with 4 other published prediction models (CHOP-ROP [Children's Hospital of Philadelphia-ROP], 11 OMA-ROP [Omaha-ROP], 12 WINROP [weight, insulinlike growth factor 1, neonatal, ROP], 7 and CO-ROP [Colorado-ROP] 13 ) using GA, birth weight, and different weight gain variables in the algorithms, as described in more detail in eAppendix 2 in the Supplement.

Study Population
Birth characteristics for the whole SWEDROP cohort, model development group, and validation temporal group, as well as by maximum ROP stage, are listed in eTable 1 in the Supplement. Among 7609 patients, 4155 (54.6%) were boys, the mean (SD) GA was 28.1 (2.1) weeks, and the mean (SD) birth weight was 1119 [353] g. Of those born at GA at least 24 weeks, 1510 of 7255 (20.8%) were small for GA. In total, 354 of 7609 (4.7%) were born at GA less than 24 weeks, and 2806 of 7609 (36.9%) were born at GA 24 to less than 28 weeks. Birth characteristics were numerically balanced between the model development group and the validation temporal group. Birth characteristics for the validation US group and the validation European group are listed in eTable 2 in the Supplement. Momentary Individual Risk of ROP Treatment for GA Less Than 31 Weeks Figure 2A and B show crude week-specific risk of ROP treatment for the SWEDROP population. Table 1 lists the observed timing for ROP treatment applying postnatal age and postmenstrual age as time axes. The ROP treatment risk peaked at postnatal week 12 regardless of GA at birth, but no specific pattern by GA was seen for postmenstrual age.

ROP Treatment Incidence in Screened Infants
From the Poisson regression model based on the total SWEDROP population, including postnatal age and adjusting for GA, the risk for ROP treatment increased by 54% (HR, 1.54; 95% CI, 1.39-1.70) per week from postnatal weeks 8 through 12. Afterward, it decreased by 30% (HR, 0.70; 95% CI, 0.67-0.74) per week ( Figure 2C and eTable 4 in the Supplement).
Cumulative Individual Risk of ROP Treatment for GA 24 to 30 Weeks Table 2 summarizes the final DIGIROP-Birth model for ROP treatment in infants born at GA 24 to 30 weeks. The estimated cumulative risks were 60.0% and 35.1%, respectively, for a girl with BWSDS −3 and 0 born at GA 24 weeks and were 27.8% and 14.2%, respectively, if she was born at GA 25 weeks (Figure 3 and eFigure 2 and eTable 5 in the Supplement). Corresponding figures for a boy with the same background data were 57.7% and 33.4%, respectively, and 32.5% and 16.9%, respectively. Greater decreasing risk was observed for girls than for boys with increas-

Clinical Practice Implications of DIGIROP-Birth for GA 24 to 30 Weeks
Based on ROP treatment timing in the SWEDROP cohort (2007-2018) and the DIGIROP-Birth model, we compared the results with the current US recommendations, based on studies that are more than 20 years old, 2 for postnatal age and postmenstrual age at initial examination ( Table 1). The maximum age for estimated risk less than 0.001 corresponds well to the observed minimum age for ROP treatment, except for GA 24 weeks, for which a somewhat higher risk at a younger age was estimated. Recommending that the initial examination should start 1 week before the earliest observed ROP treatment per GA week in our cohort would potentially have avoided 14 867 of 135 061 visits (11.0%), assuming 1 visit per week. For GA of at least 27 weeks, with a ROP treatment incidence of 33 of 5398 (0.6%), the difference between the US recommendations and this study resulted in 14 066 of 93 052 examinations (15.1%) potentially being avoided.

Discussion
We have created and validated the DIGIROP-Birth prediction model, available free of charge online 26 based on 6947 infants born at GA 24 to 30 weeks, estimating the individual momentary and cumulative risks for ROP treatment. The model using only available data at birth but more advanced statistical meth-  ods was at least as accurate as 4 of the ROP prediction models now in use based on longitudinal weight measurements, which are not always readily available to ophthalmologists.
Surprisingly, the momentary risk of ROP treatment peaked at 12 weeks' postnatal age regardless of GA at birth, while no specific pattern was observed for postmenstrual age. This observation is particularly interesting because the ETROP study 27 found that the progression of prethreshold ROP was highly associated with postmenstrual age, similar to the finding in the CRYO-ROP (Cryotherapy for ROP) study 28 15 years earlier. However, it should be emphasized that infants included in the CRYO-ROP study were born at higher GA, and no GA-specific hazard functions were studied for ROP outcome. Other Swedish studies 29,30 have reported that lower GA at birth is associated with lower GA at treatment, but the momentary risk in relation to postnatal age and postmenstrual age was not analyzed. Recently, in a large North American cohort, the timing of ROP treatment was presented only in relation to postmenstrual age and not postnatal age. 31 The identification of a peak risk at 12 postnatal weeks in infants with GA less than 31 weeks might be clinically useful because it was recently shown that inadequate screening or treatment was identified in 11 of 17 cases with blindness from ROP (64.7%). 32 Hence, clinicians and parents could be alerted during this period to ensure that timely screening occurs to reduce the risk of blindness.
National patient registries are valuable sources for estimation of treatment risks. Herein, the DIGIROP-Birth model was compared with a validation US group and a validation European group and showed high predictive ability and generalizability both for individuals with the same and with different reported race/ethnicity.
The ROP prediction models may also be used to reduce screening frequency in infants at low risk. The latest US policy statement for ROP screening 2 was issued in 2018. The recommendations for the timing of the first examination were based on the CRYO-ROP study 28 published in 1991 and the LIGHT-ROP (Light Reduction in ROP) study 33 published in 1998. In those periods, fewer extremely preterm infants survived, more mature infants were treated, and treatment criteria were different from those used today. Based on the results of our study, if the initial examination was performed 1 week before the earliest observed postnatal age at ROP treatment, 14 867 of 135 061 stressful early examinations (11.0%) could be avoided (assuming 1 examination per week) compared with US recommendations. 2 For GA of at least 27 weeks, with a ROP treatment incidence herein less than 1%, 14 066 of 93 052 examinations (15.1%) could have been avoided while capturing all cases of ROP treatment (100% sensitivity). Notably, reaching 100% sensitivity in such models of real-life, large data sets is accompanied by low specificity. Based on approximately as large a cohort as in our study, the updated CHOP-ROP 11 model, which uses longitudinal weight data and birth data, achieved 11.2% specificity for 100% sensitivity and 36.4% specificity for 98.5% sensitivity; DIGIROP-Birth (using only readily obtained birth data) showed 19.0% specificity for 100% sensitivity and 53.8% specificity for 99.0% sensitivity.

Strengths and Limitations
The strengths of our study include the unique and complete cohort of preterm infants born in Sweden between January 2007 and August 2018. Also, our statistical model includes 3 basic measurements (GA, sex, and birth weight). The postnatal age for ROP treatment or censoring (discontinued followup) is included in the hazard function estimation but is not required as an input variable. Hence, the input data are simple, facilitating their general use, even though the method is more advanced, taking into account the underlying hazard function and the important interactions that contribute to adjustment of heterogeneity, which is novel in ROP research. The DIGIROP-Birth has shown strong predictive ability in internal, temporal, and geographical external validations. If found not acceptable in future validations among a population, a subgroup-specific model designed for optimal predictions in that population might be developed using our methods. Finally, DIGIROP-Birth has been shown to be equal to or better than 4 other ROP prediction models and is accessible online. 26 Our study has some limitations. One limitation is the use of registry retrospective data. However, the registry showed high coverage and successful validation of data for 85 randomly selected infants screened in 2018. In addition, infants born at GA less than 24 weeks could not be included in the prediction model because of the lack of a reference algorithm for birth weight, preventing BWSDS calculations. Given the small sample size, only a simple model could be developed for these infants, resulting in low predictive ability. Close monitoring of such infants is mandatory irrespective of calculated risk, making prediction models less important for this group.

Conclusions
We created and validated the DIGIROP-Birth model, an individualized early prediction model for infants with GA 24 to 30 weeks, which estimates momentary and cumulative risks for receiving ROP treatment based on simple birth characteristics. A surprising finding was that postnatal age was the best predictive variable for the temporal risk of ROP treatment. The DIGIROP-Birth model is an accessible online application that appears to be generalizable and to have at least as good test statistics as other models that require longitudinal neonatal data, which are not always readily available to ophthalmologists.