The Use of Postnatal Weight Gain Algorithms to Predict Severe or Type 1 Retinopathy of Prematurity

Key Points
Question: Do postnatal weight gain–based algorithms have the potential to identify infants with type 1 retinopathy of prematurity (ROP) or severe ROP?
Findings: This systematic review and meta-analysis that included 61 studies (>37 000 infants) found that weight gain–based algorithms have adequate sensitivity, ranging from 0.89 to 1.00, and negative likelihood ratios (<0.2). However, specificity and positive likelihood ratios were inadequate.
Meaning: This study suggests that weight gain–based algorithms have adequate sensitivity and negative likelihood ratios and provide reasonable certainty that type 1 ROP or severe ROP is unlikely to develop (ie, the algorithm is useful for ruling out the disease).


Introduction
Retinopathy of prematurity (ROP) is a disease of pathologic neovascularization affecting preterm infants. In early postnatal life, hyperoxia leads to suppression of vascular growth factors (phase 1).
Subsequently, as retinal hypoxia sets in, there is an upsurge of vascular growth factors leading to unregulated vasoproliferation (phase 2). 1 ROP either regresses spontaneously or continues to advance, and it can progress to retinal detachment and blindness if not detected and treated early. 2 Infants with lower gestational age and lower birth weight have a higher risk of developing ROP.
Currently, at-risk infants are screened using repeated eye examinations (binocular indirect ophthalmoscopy [BIO]) starting at approximately 30 to 32 weeks' postmenstrual age and continuing until the retinal vasculature is fully mature (approximately 40 weeks' postmenstrual age). 3 However, fewer than 10% of screened infants need treatment for ROP. 2
Animal experiments have demonstrated the importance of nutrition and insulinlike growth factor 1 (IGF-1) in retinal vascular development. Insufficient activation of endothelial growth factor by IGF-1 can alter the development of the retinal vasculature. 4 In the clinical setting, low postnatal weight gain is considered a surrogate marker for slower-than-expected increases in serum IGF-1 levels. 5 Based on this hypothesis, risk prediction models such as WINROP (Weight, IGF-1, Neonatal Retinopathy of Prematurity), 6 G-ROP (Postnatal Growth and Retinopathy of Prematurity), 7 PINT (Premature Infants in Need of Transfusion) ROP, 8 CHOP (Children's Hospital of Philadelphia) ROP, 9 ROPScore, 10 and CO-ROP (Colorado Retinopathy of Prematurity) 11 have been evaluated to see whether they can predict the development of significant ROP. These models have the potential advantage of reducing the number of BIO examinations. However, to our knowledge, these models have not been implemented in clinical practice because of their limited generalizability. 3
The rationale for this systematic review was to assess whether postnatal weight gain-based algorithms have the potential to predict the development of type 1 or severe ROP. This review gains further importance given the current COVID-19 pandemic, which has led to the curtailment of health services, including ROP screening programs, owing to the limited availability of ophthalmologists and mobile screening teams. 12 The aim of this systematic review was to synthesize evidence by pooling the diagnostic accuracy indices for postnatal weight gain-based algorithms for predicting type 1 or severe ROP in preterm infants.

Design and Registration
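This systematic review and meta-analysis followed the Preferred Reporting Items for a Systematic Review and Meta-analysis of Diagnostic Test Accuracy Studies (PRISMA-DTA) guidelines. 14 The study protocol was registered in PROSPERO (CRD42020172874). 15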

Search Strategy
A systematic search of the PubMed, MEDLINE, Embase, and Cochrane Library databases was performed to identify studies published between January 2000 and August 2021. PubMed was searched using the standard terminology (eMethods in the Supplement). Similar terminology was used to search the other databases. We also searched ClinicalTrials.gov and the grey literature (OpenGrey, Google Scholar, and MedNar). No language restrictions were applied. The reference lists of all publications were searched manually for additional studies.

Data Extraction
Two reviewers (S.A. and S.D.) independently collected data from the included studies. Study data were further verified by another reviewer (S.R.) who had not taken part in the initial data collection. A total of 61 studies were included in the final meta-analysis. The following data were collected from each study: year(s) the study was conducted, country, gestational age at birth, birth weight, weekly weight gain, total sample size, follow-up rates, tool used to diagnose ROP (indirect ophthalmoscopy or wide-field digital retinal imaging), prospective or retrospective design, high-income or low- to middle-income country (World Bank list of economies), 16 and diagnostic indices (true positives, false positives, true negatives, and false negatives). The process of study selection is shown in the study flow diagram (eFigure 1 in the Supplement).

Criteria for Considering Studies for This Review
The outcomes examined were the sensitivity and specificity of postnatal weight gain-based algorithms to predict type 1 or severe ROP. 2 Studies that used type 1 ROP or severe ROP as the target disease were included.
Both prospective and retrospective studies that met the following criteria were included: (1) retinopathy screening for preterm infants and (2) the ability of weight gain-based screening algorithms to predict type 1 or severe ROP. The following standard definitions were accepted for type 1 ROP 2 : zone I (any stage ROP with plus disease), zone I (stage 3 with or without plus disease), or zone II (stage 2 or 3 ROP with plus disease).
Severe ROP was defined as any ROP in zone I, stage 2 ROP in zone II with plus disease, or any stage 3 ROP. The studies included described the diagnostic ability of weight gain-based algorithms to predict type 1 or severe ROP in comparison with the findings by ophthalmologists either by BIO or by wide-field digital retinal imaging, which were considered the reference standards. 3 Two reviewers (S.A. and S.D.) independently decided on the eligibility of studies for inclusion in the systematic review. Differences in opinion were resolved by discussion among all reviewers.
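The two target-disease definitions above can be written as simple predicates. The following is a minimal sketch that encodes the definitions exactly as worded in this section. The function names, the integer coding of zone and stage, and the use of stage 0 to mean "no ROP" are illustrative assumptions and are not taken from any of the included studies.

```python
def is_type1_rop(zone: int, stage: int, plus: bool) -> bool:
    """Type 1 ROP as worded in this section (stage 0 = no ROP, an assumption)."""
    if zone == 1:
        # Zone I: any stage with plus disease, or stage 3 with or without plus.
        return (stage >= 1 and plus) or stage == 3
    if zone == 2:
        # Zone II: stage 2 or 3 with plus disease.
        return stage in (2, 3) and plus
    return False


def is_severe_rop(zone: int, stage: int, plus: bool) -> bool:
    """Severe ROP as worded in this section (stage 0 = no ROP, an assumption)."""
    if stage < 1:
        return False
    return (
        zone == 1                                # any ROP in zone I
        or (zone == 2 and stage == 2 and plus)   # zone II, stage 2, plus disease
        or stage == 3                            # any stage 3 ROP
    )


# Hypothetical example: zone II, stage 3, no plus disease.
example = (2, 3, False)
print(is_type1_rop(*example), is_severe_rop(*example))
```

Under these wordings, the hypothetical example counts as severe ROP but not as type 1 ROP, which illustrates why the two target conditions are not interchangeable.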

Quality Assessment
The quality assessment of included studies consisted of the following 4 domains according to the revised version of the Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2): patient selection, index test, reference standard, and flow and timing. 17 In this review, index test refers to the "alarm" in any of the postnatal weight gain-based algorithms for type 1 or severe ROP. The included studies were assessed for risk of bias in each domain and for applicability concerns in the first 3 domains.
A receiver operating characteristic (ROC) curve was generated to display the results of individual studies. The following cutoffs were used for interpretation of the area under the ROC curve: low (0.5-0.7), moderate (0.71-0.9), or high (>0.9) accuracy. 19 The diagnostic odds ratio (DOR) is the ratio of the odds of a positive test result in those with the disease relative to the odds of a positive test result in those without the disease. The value of the DOR ranges from 0 to infinity, with higher values indicating better discriminatory test performance. A likelihood ratio of approximately 1 means that the test result neither rules in nor rules out the condition. A positive likelihood ratio (PLR) above 1 indicates increased evidence of disease; the further the value is above 1, the greater the likelihood of disease. A negative likelihood ratio (NLR) below 0.1 is very strong evidence to rule out a disease. 20
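To make the indices above concrete, the following minimal sketch computes sensitivity, specificity, PLR, NLR, and DOR from the 4 cells of a single study's 2 × 2 table, and applies the area under the ROC curve cutoffs stated above. The class name, function name, and example counts are hypothetical and are not drawn from any included study; the pooled estimates in this review come from a meta-analytic model, which this sketch does not reproduce.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts for one study: algorithm alarm vs reference-standard diagnosis."""
    tp: int  # alarm, disease present
    fp: int  # alarm, disease absent
    fn: int  # no alarm, disease present
    tn: int  # no alarm, disease absent

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def plr(self) -> float:
        # Positive likelihood ratio: sensitivity / (1 - specificity).
        return self.sensitivity / (1 - self.specificity)

    @property
    def nlr(self) -> float:
        # Negative likelihood ratio: (1 - sensitivity) / specificity.
        return (1 - self.sensitivity) / self.specificity

    @property
    def dor(self) -> float:
        # Diagnostic odds ratio; undefined (ZeroDivisionError) if fp or fn is 0.
        return (self.tp * self.tn) / (self.fp * self.fn)


def auc_category(auc: float) -> str:
    """Map an area under the ROC curve to the accuracy bands used in the text."""
    if auc < 0.5:
        raise ValueError("cutoffs in the text start at 0.5")
    if auc > 0.9:
        return "high"
    if auc > 0.7:
        return "moderate"
    return "low"


# Hypothetical counts chosen only for illustration.
study = TwoByTwo(tp=45, fp=300, fn=5, tn=150)
print(study.sensitivity, study.specificity, study.plr, study.nlr, study.dor)
```

In this sketch the DOR equals PLR divided by NLR, which follows directly from the definitions. A study with no false negatives (sensitivity of 1.00) has an undefined DOR in this simple form, which is one reason pooled DOR estimates can have very wide confidence intervals.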

Additional Analysis
Practice is uneven 21 with regard to oxygen saturation targeting and blood transfusion thresholds, and the incidence of fetal growth restriction is higher, in low- and middle-income countries; these factors could have an impact on the performance of these algorithms. Hence, sensitivity analyses were conducted separately for high-income and low- and middle-income countries when possible.

Selected Studies
The electronic database search yielded 779 titles and abstracts, and 3 additional studies were identified through other sources. After removal of duplicates, 160 articles were screened based on the title and abstract, and 85 articles were selected for full-text review. Of these, 61 were included in the final analysis. The 61 studies (>37 000 infants) included WINROP (n = 36), 6,22-55 G-ROP (n = 9), 7,42,56-61 PINT ROP (n = 1), 8 CHOP ROP (n = 6), 9,30,55,62-64 ROPScore (n = 5), 10,30,55,65,66 and CO-ROP (n = 4). 11,67-69 Studies that used a different cutoff score for ROPScore were not included in the analysis. 64,70 The general characteristics of the studies included in the systematic review are reported in eTables 1 to 6 in the Supplement. The methodological quality and applicability of the included studies were assessed according to the QUADAS-2 guidelines. 17 All 61 studies evaluated one of the weight gain-based algorithms, and there was minimal risk of bias. However, the reference standards used in the studies were either BIO or wide-field digital retinal imaging.
For low- to middle-income countries, the pooled estimate from 12 studies (n = 2957) for the sensitivity of the WINROP algorithm was 0.85 (95% CI, 0.78-0.90), and the pooled estimate for the
The pooled DOR of the G-ROP algorithm to predict type 1 or severe ROP was 3523 (95% CI, 4-3 155 457). The summary PLR and NLR for the index test fell in the left lower quadrant of the likelihood matrix (Figure 3B).

Discussion
The incidence of blindness after ROP is increasing globally. Early identification and treatment of ROP are of paramount importance to prevent irreversible blindness. Current screening protocols that require frequent retinal examinations place an enormous workload on the health care system. The current COVID-19 pandemic has further strained the health care system owing to staff shortages, bed shortages, and limited availability of transport. In addition, there is a risk of COVID-19 transmission to preterm infants undergoing screening for ROP because of close contact with screening staff. The Royal College of Ophthalmologists has issued a statement in this regard that aims to overcome these challenges by rationalizing the ROP screening criteria through the use of evidence-based, weight gain-based algorithms. 12 Weight gain-based screening models, such as WINROP, G-ROP, PINT ROP, CHOP ROP, ROPScore, and CO-ROP, 6-11 have been evaluated in various studies to identify infants who should be referred for retinal examination. However, the 2018 policy statement from the American Academy of Pediatrics stated that the use of such weight gain-based algorithms alone is not justified based on the current literature. 3 Our systematic review increases the evidence base in this area by including more than 37 000 preterm infants from 61 studies in diverse settings across the world.
The general interpretation of the results of diagnostic accuracy studies is as follows: PLRs of more than 10 or NLRs of less than 0.1 generate large and often conclusive changes in the posttest probability, PLRs from more than 5 to 10 or NLRs from 0.1 to 0.2 generate moderate shifts in posttest probability, PLRs from more than 2 to 5 or NLRs from more than 0.2 to 0.5 generate small changes in posttest probability, and PLRs from more than 1 to 2 or NLRs from more than 0.5 to 1 alter posttest probability to a very small degree. 71,72 High sensitivity corresponds to a high negative predictive value and a low NLR and is the ideal property of a "rule-out" test. 73,74 Because all of the algorithms that we evaluated had an NLR of less than 0.2 and a high sensitivity of 0.89 to 1.00, they are useful to rule out type 1 ROP if the algorithm had "no alarm." Given that the sensitivity of G-ROP was better than that of WINROP and the sample size was larger, G-ROP may be more suitable as a rule-out test compared with WINROP. However, G-ROP has not been widely evaluated outside high-income countries. High specificity corresponds to a high positive predictive value and a high PLR and is the ideal property of a "rule-in" test. 73,74 Because all of the algorithms had a very low specificity and a very low PLR of 1.5 to 2.5, they may not be suitable for ruling in the disease.
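The effect of a likelihood ratio on posttest probability can be illustrated with the standard odds form of Bayes theorem (posttest odds = pretest odds × likelihood ratio). The sketch below is illustrative only: the function names are hypothetical, and the 10% pretest probability is an assumption chosen for the example, loosely motivated by the statement that fewer than 10% of screened infants need treatment. The likelihood ratio bands are those listed in the paragraph above.

```python
def posttest_probability(pretest: float, lr: float) -> float:
    """Convert a pretest probability to a posttest probability via odds."""
    pretest_odds = pretest / (1 - pretest)
    posttest_odds = pretest_odds * lr
    return posttest_odds / (1 + posttest_odds)


def lr_shift(lr: float) -> str:
    """Size of the shift in posttest probability, using the bands in the text."""
    if lr >= 1:  # positive likelihood ratio bands
        if lr > 10:
            return "large"
        if lr > 5:
            return "moderate"
        if lr > 2:
            return "small"
        return "very small"
    # negative likelihood ratio bands
    if lr < 0.1:
        return "large"
    if lr <= 0.2:
        return "moderate"
    if lr <= 0.5:
        return "small"
    return "very small"


PRETEST = 0.10  # assumed pretest probability, for illustration only

# "No alarm" with an NLR of 0.19 (the pooled WINROP value reported in this review).
print(lr_shift(0.19), round(posttest_probability(PRETEST, 0.19), 3))

# "Alarm" with a PLR of 2.5 (upper end of the range reported in this review).
print(lr_shift(2.5), round(posttest_probability(PRETEST, 2.5), 3))
```

Under the assumed 10% pretest probability, a "no alarm" result with an NLR of 0.19 lowers the probability of disease to roughly 2%, whereas an "alarm" with a PLR of 2.5 raises it only to roughly 22%. This asymmetry is the numerical reason the algorithms are better suited to ruling out than to ruling in the disease.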
Although the overall sensitivity (0.89) and NLR (0.19) for WINROP were reasonably adequate, the sensitivity was significantly lower than that reported in the validation studies performed in the early 2010s. The sensitivity was 1.00 in a Swedish cohort 6 of 353 infants and a Boston cohort 22 of 318 infants. A multicenter study in the US and Canada showed that the sensitivity of the WINROP algorithm was 0.986. 24 The sensitivity values were lower in studies conducted in low- to middle-income countries.
The publication by Lundgren et al 35 in 2018 showed an association between the decrease in sensitivity and the change in oxygen target ranges that occurred in 2010. This could explain the lower sensitivity in low- to middle-income countries, where there is uneven practice with regard to oxygen targeting in preterm infants. 44-55
The G-ROP model was originally developed from a retrospective cohort of 7483 infants from 29 North American centers that showed a sensitivity of 1.00 to identify type 1 ROP. 7 This finding was further validated in a prospective cohort of 3981 infants in North America with a similar sensitivity of 1.00. 57 When validated outside the North American cohort, the sensitivity was 1.00 in a small cohort of infants from Japan, 56
There are a significant number of infants in low- and middle-income countries who develop type 1 ROP and who are more mature (>28 weeks) and heavier (>1050 g). 75 Hence, validating the G-ROP algorithm in these groups of infants will give additional information about the performance of the

Strengths and Limitations
This study has some strengths. To our knowledge, this is the first systematic review including a meta-analysis to evaluate the diagnostic accuracy of postnatal weight gain-based algorithms with a sample size of more than 37 000 infants. The study sample included infants from both high-income and low- to middle-income countries, and additional analyses were performed in these subgroups.
Furthermore, a standardized tool (ie, QUADAS-2) was used for the quality assessment. In addition, this systematic review and meta-analysis followed the recent PRISMA-DTA guidelines for transparent reporting.
This study also has some limitations. The weight gain-based algorithms rely strongly on accurate measurement of weight, which is very challenging in a neonatal intensive care setting and could potentially affect the performance of the algorithms. Apart from the index test, there was considerable heterogeneity across the studies. This heterogeneity was partly explained by factors such as geographical background, workflow, and the choice of a reference standard, which in itself is known to have interobserver variability.

Conclusions
This systematic review and meta-analysis suggests that weight gain-based algorithms have adequate sensitivity and negative likelihood ratios to provide reasonable certainty in ruling out type 1 or severe ROP. Given the implications of missing even a single case of severe ROP, algorithms with a very high sensitivity (close to 100%) and a low negative likelihood ratio (close to zero) need to be chosen. These algorithms have the potential to reduce the number of unnecessary examinations for infants at lower risk of severe ROP. Future studies should endeavor to incorporate additional clinical parameters (eg, oxygen use and sepsis), which could potentially improve the diagnostic indices of these algorithms.