Developmental screening tests, even those meeting standards for screening test accuracy, produce false-positive results for 15% to 30% of children. This is thought to produce unnecessary referrals for diagnostic testing or special services and increase the cost of screening programs.
To explore whether children who pass screening tests differ in important ways from those who do not and to determine whether children overreferred for testing benefit from the scrutiny of diagnostic testing and treatment planning.
Subjects were a national sample of 512 parents and their children (age range of the children, 7 months to 8 years) who participated in validation studies of various screening tests. Psychological examiners adhering to standardized directions obtained informed consent and administered at least 2 developmental screening measures (the Brigance Screens, the Battelle Developmental Inventory Screening Test, the Denver-II, and the Parents' Evaluations of Developmental Status) and a concurrent battery of diagnostic measures, including tests of intelligence, language, and academic achievement (for children aged 2½ years and older). The performance on diagnostic measures of children who failed screening but were not found to have a disability (false positives) was compared with that of children who passed screening and did not have a disability on diagnostic testing (true negatives).
Children with false-positive scores performed significantly (P<.001) lower on diagnostic measures than did children with true-negative scores. The false-positive group had scores in adaptive behavior, language, intelligence, and academic achievement that were 9 to 14 points lower than the scores of those in the true-negative group. When viewing the likelihood of scoring below the 25th percentile on diagnostic measures, children with false-positive scores had a relative risk of 2.6 in adaptive behavior (95% confidence interval [CI], 1.67-4.21), 3.1 in language skills (95% CI, 1.90-5.20), 6.7 on intelligence tests (95% CI, 3.28-13.50), and 4.9 on academic measures (95% CI, 2.61-9.28). Overall, 151 (70%) of the children with false-positive results scored below the 25th percentile on 1 or more diagnostic measures (the point at which most children have difficulty benefiting from typical classroom instruction) in contrast with 64 (29%) of the children with true-negative scores (odds ratio, 5.6; 95% CI, 3.73-8.49). Children with false-positive scores were also more likely to be nonwhite and to have parents who had not graduated from high school. Performance differences between children with true-negative scores and children with false-positive scores continued to be significant (P<.001) even after adjusting for sociodemographic differences between groups.
Children overreferred for diagnostic testing by developmental screens perform substantially lower than children with true-negative scores on measures of intelligence, language, and academic achievement—the 3 best predictors of school success. These children also carry more psychosocial risk factors, such as limited parental education and minority status. Thus, children with false-positive screening results are an at-risk group for whom diagnostic testing may not be an unnecessary expense but rather a beneficial and needed service that can help focus intervention efforts. Although such testing will not indicate a need for special education placement, it can be useful in identifying children's needs for other programs known to improve language, cognitive, and academic skills, such as Head Start, Title I services, tutoring, private speech-language therapy, and quality day care.
Screening tests, even those that meet standards for developmental screening test accuracy, produce failing scores for 15% to 30% of children who, on diagnostic testing, are not found to have disabilities.1,2 Such false-positive results are thought to substantially increase the cost of screening.3 Indeed, some researchers4 suggest that screening programs should be discontinued when false-positive rates are high. Because false-positive medical screens have been associated with lingering parental anxiety,5,6 troubling doubts are cast on the viability of universal developmental screening efforts. Concerns about the costs, mistakes, and effects of screening are implicated in the limited use of developmental measures among physicians, in opposition to recommendations for routine use of standardized tools by the American Academy of Pediatrics' Committee on Children with Disabilities.7-9
Such gloomy conclusions about the value of screening illuminate critical questions for research. Are children overreferred by developmental screening tests actually normal or do they differ in important ways from children who pass screening tests? If so, are diagnostic workups on children with false-positive scores truly unnecessary or can they contribute meaningfully to patient care? These questions are addressed in the present study by reanalyzing existing data from screening test validation studies.
Subjects were a national sample of 512 parents and their children (age range of the children, 7 months to 8 years; mean, 52.7 months; SD, 19.78 months) participating in validation studies of various screening tests.2,10-15 Subjects were 61% white, 23% African American, and 16% Hispanic or other ethnicity; 53% were male. Parents averaged 13 completed grades of school, and 14% had not graduated from high school. In comparing these characteristics with US population variables, the sample approximated national representativeness for minorities, levels of parental education, and family socioeconomic status.16
Sites included 4 day care centers (104 children) and 4 public school systems, including school-based Head Start and Even Start programs (408 children). Sites were selected to represent the main geographic regions of the United States: north (Plymouth, Mass), central (Denver, Colo, which is within 400 km of the geographic center of the United States), south (Tampa, Fla, and Nashville, Tenn), and west (Carson City, Nev). Within each site, schools and programs were selected if they had a mix of children from various socioeconomic backgrounds, as determined by proportions participating in the federal free and reduced cost lunch program or by the presence or absence of federal day care subsidies.
At each site, psychological examiners or teachers recruited families largely by sending children home with consent forms and study materials. Nine families failed to return consent forms and were excluded from the study. Examiners were graduate-level school psychologists or educational diagnosticians skilled in test administration. Adhering to standardized directions, examiners administered developmental screening measures, ie, the Brigance Screens (n = 408),17 the Battelle Developmental Inventory Screening Test (n = 103),18 the Denver-II (n = 103),19 and/or the Parents' Evaluations of Developmental Status (n = 511).20 Table 1 provides a brief description of each tool. All children except 1 were administered at least 2 developmental screening tests. Examiners were blinded to the goals of the study and to the results of 2 of the 4 screens (Parents' Evaluations of Developmental Status and Brigance Screens) because both were undergoing standardization and produced no normative scores.
Examiners also administered and scored in a standardized manner a battery of diagnostic measures. Screening tests and diagnostic measures were administered within 1 week and were given in alternating order. Diagnostic measures included tests of intelligence, language, and academic achievement (for children aged 2½ years and older). Test selection varied across and within studies based on children's ages, and instruments are listed in Table 1.
Criteria drawn from the Individuals With Disabilities Education Act for placement in early childhood and public school special education programs were applied to performance on diagnostic measures. These criteria were used because they are functional, reflect observable difficulties in benefiting from age-appropriate instruction, and are linked to receipt of actual services. Table 2 shows the criteria used to determine the presence of disabilities.
The performance on diagnostic measures of children who failed screening but were not found to have a disability (false positives) was compared with that of children who passed screenings and did not have a disability on diagnostic testing (true negatives). t Tests, χ2 tests, analysis of covariance, and relative risk estimates were used to compare differences between groups.
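The grouping that underlies this comparison can be sketched in a few lines of Python. This is an illustrative sketch only, not the study's analysis code: the function name, the boolean inputs, and the sample records are hypothetical. It assumes the rule used in the Results, in which a child counts as failing screening when 1 or more screens are failed.

```python
from collections import Counter


def classify(failed_any_screen: bool, has_disability: bool) -> str:
    """Cross a screening outcome with the diagnostic outcome.

    failed_any_screen: True if the child failed 1 or more screens.
    has_disability: True if diagnostic testing met disability criteria.
    """
    if failed_any_screen:
        return "true positive" if has_disability else "false positive"
    return "false negative" if has_disability else "true negative"


# Hypothetical records (failed_any_screen, has_disability), for illustration only.
children = [
    (True, False),   # failed a screen, no disability
    (False, False),  # passed all screens, no disability
    (True, True),    # failed a screen, disability found
    (False, True),   # passed all screens, disability found
    (True, False),
]

counts = Counter(classify(failed, disabled) for failed, disabled in children)

# Only these two groups enter the comparison described above.
comparison_groups = {
    label: counts[label] for label in ("false positive", "true negative")
}
```

Children in the other two groups (true positives and false negatives) have disabilities on diagnostic testing and are therefore set aside from the comparison.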
Of the 511 children administered at least 2 screening tests, 216 (42%) had false-positive results (they failed 1 or more of the screens but were not found to have disabilities) and 219 (43%) had true-negative results (they passed all screens and were not found to have disabilities). Of the remaining 76, 44 (9%) had true-positive results (they failed 1 or more screens and had disabilities) and 32 (6%) had false-negative results (they passed all screens but had disabilities). The false-positive and false-negative rates are higher than those found in most screening test studies because they reflect the combined errors across all screens. Table 3 shows the rates for each screen. In comparing only those with 1 or more false-positive scores with those with true-negative scores, children with false-positive scores performed significantly lower on diagnostic measures of adaptive behavior (t428 = 5.56; P<.001), language (t429 = 6.96; P<.001), intelligence (t434 = 9.47; P<.001), and academic achievement (t348 = 7.57; P<.001). The quotients produced for each measure, ie, standard scores with means of 100 and SDs of 15, averaged 9 to 14 points lower for children with false-positive scores than for children with true-negative scores (>7 points difference represents half of an SD and an ecologically significant difference in classroom performance). When viewing the likelihood of scoring below the 25th percentile on diagnostic measures (the cutoff used for placement in remedial reading and math programs under Title I), children with false-positive scores had a relative risk of 2.6 in adaptive behavior (95% confidence interval [CI], 1.67-4.21) (42% vs 19% in the true-negative group), 3.1 in language skills (95% CI, 1.90-5.20) (32% vs 11% in the true-negative group), 6.7 on intelligence tests (95% CI, 3.28-13.50) (26% vs 9% in the true-negative group), and 4.9 on academic measures (95% CI, 2.61-9.28) (35% vs 9% in the true-negative group). 
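To put the standard-score figures in context, the following Python sketch converts between points, SD units, and percentiles on a scale with a mean of 100 and an SD of 15. It assumes quotients are normally distributed, which the text does not state, so the percentile cutoff it yields is approximate; the variable names are illustrative.

```python
from statistics import NormalDist

# Standard-score scale described in the text: mean 100, SD 15.
# Normality is an assumption made for this sketch.
scale = NormalDist(mu=100, sigma=15)

# Standard score at the 25th percentile under that assumption (about 90).
cutoff_25th = scale.inv_cdf(0.25)

# Half of an SD, the threshold the text treats as a meaningful difference.
half_sd = scale.stdev / 2

# The reported 9- to 14-point group differences expressed in SD units.
gap_in_sd = [points / scale.stdev for points in (9, 14)]
```

Under these assumptions, the reported group differences of 9 to 14 points correspond to roughly 0.6 to 0.9 SD, above the half-SD threshold of 7.5 points.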
Overall, 70% (n = 151) of the children with false-positive results scored below the 25th percentile on 1 or more diagnostic measures, in contrast with 29% (n = 64) of the children with true-negative scores (odds ratio, 5.6; 95% CI, 3.73-8.49). Table 4 shows performance on diagnostic measures across all 4 groups.
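The overall odds ratio can be rechecked from the counts given above. The sketch below uses the group sizes from the Results (216 children with false-positive results and 219 with true-negative results) and a Wald interval on the log odds ratio. The method used to compute the published interval is not stated, so the Wald approach is an assumption, and the function name is hypothetical.

```python
import math


def odds_ratio_wald_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio and Wald confidence interval for a 2x2 table.

    a, b: group 1 with / without the outcome
    c, d: group 2 with / without the outcome
    z:    normal quantile (1.96 for a 95% interval)
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Counts from the text: children scoring below the 25th percentile
# on 1 or more diagnostic measures, and the size of each group.
fp_below, fp_total = 151, 216
tn_below, tn_total = 64, 219

odds_ratio, lower, upper = odds_ratio_wald_ci(
    fp_below, fp_total - fp_below,
    tn_below, tn_total - tn_below,
)
print(f"odds ratio {odds_ratio:.1f}; 95% CI, {lower:.2f}-{upper:.2f}")
```

With these counts, the result agrees with the reported odds ratio of 5.6 and its confidence interval of 3.73 to 8.49 after rounding.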
Do children with false-positive scores differ in other ways from children with true-negative scores? To test this, children with false-positive scores were compared on sociodemographic variables with children with true-negative scores. Children with false-positive scores were an average of 4 months older than those with true-negative scores (t434 = 2.57; P>.01), were more likely to be nonwhite (χ2 = 22.35; P<.001) (47% vs 25% in the true-negative group), and were more likely to have parents who had not graduated from high school (χ2 = 5.09; P<.02) (16% vs 9% in the true-negative group). Even after adjusting (via analysis of covariance) for sociodemographic differences between groups, performance on adaptive behavior, language, intelligence, and academic achievement of children with false-positive scores continued to be significantly lower than that of children with true-negative scores (F5,334 = 29.41, 22.43, 21.47, and 12.56, respectively; P<.001).
Children overreferred by developmental screens perform substantially lower than children with true-negative scores on measures of intelligence, language, and academic achievement—the 3 best predictors of school success. Overreferred children also carry more psychosocial risk factors, such as limited parental education and minority status.21 Thus, children with false-positive screening results are clearly an at-risk and underperforming group for whom diagnostic testing appears less an unnecessary expense and more a potentially beneficial service to the extent that testing is linked to needed intervention. Although such testing will not indicate a need for special education placement, it can be useful in determining educational objectives, individualizing instruction, and identifying programs known to improve language, cognitive, and academic skills, such as Head Start, Title I services, tutoring, private speech-language therapy, and quality day care. Table 5 provides a case study of the value of diagnostic testing with a child who had false-positive results on screening.
Because of the scarcity of diagnostic resources and the economic constraints of health care and education, it is worth noting that when physicians make referrals for diagnostic testing (eg, to developmental evaluation centers or to public schools), there are usually mechanisms in place that cull children who are likely to have false-positive results. For example, the public schools, under section 504 of the Rehabilitation Act of 1973, must devise needed modifications in the classroom and then evaluate their effect on student performance before considering full developmental assessments. Similarly, under the Individuals With Disabilities Education Act part C (services for children from birth to the age of 3 years) and in most developmental evaluation centers, intake workers usually identify and refer likely at-risk but not disabled children to programs that do not require diagnostic eligibility (parenting classes, behavior modification training, quality preschool programs, and Head Start). Part C funding also provides for monitoring before providing diagnostic testing. Such approaches help ensure that precious diagnostic resources are allocated parsimoniously while still enabling at-risk children with false-positive scores on screening tests to receive needed, albeit limited, services. Thus, the knowledge that developmental screens produce many false-positive results (relative to medical screens) need not deter physicians from administering tests and making prompt referrals for children who perform poorly. Even so, high false-positive rates carry costs, and in light of economic constraints in health care and education, it behooves professionals to use measures with as few false-positive results (and false-negative results) as possible.3
The results of this study suggest that the lingering anxiety reportedly experienced by many parents of children who receive false-positive scores on screening may reflect real rather than imagined problems. Given that their children are likely to have lower scores on diagnostic tests and to carry more psychosocial risk factors, parental concern appears to be appropriate rather than problematic.
A limitation of the study was its cross-sectional design. Future research should focus longitudinally on the outcomes of children with false-positive screening results to determine whether their lower scores in adaptive behavior, intelligence, academics, and language predict emerging disabilities or rather continued below-average performance. Studies of differences in the intensity of intervention for children with false-positive scores might help elucidate the levels and types of service that ensure optimal outcomes. It would be helpful if screening tests themselves more readily identified potential false-positive results. The Brigance Screens and the Parents' Evaluations of Developmental Status, for example, have optional scoring criteria that can identify children with likely false-positive results and target them not for evaluations but rather to receive information handouts, advice, referrals to noncriterion-based services, and watchful waiting. Research is needed on the ability of other screening tests to perform this helpful function. Finally, replication studies should examine the contribution of age differences. Because increased age is associated with increased rates of disabilities, age differences in the false-positive and true-negative groups may have contributed to the findings. On the other hand, a difference of 4 months is small and may have little practical significance.
Given their lower performance in critical developmental areas and the presence of psychosocial risk factors, children with false-positive scores clearly need unique attention from their primary health care giver. When these children are identified, clinicians should make use of diagnostic test results to actively promote optimal development,22 monitor progress, and make needed referrals. To do this, physicians need to become familiar with community resources, such as parenting classes, Head Start, Early Head Start, quality preschool and day care programs, services under the Individuals With Disabilities Education Act, private therapies, tutoring, summer school, literacy interventions, and other helpful services.
Accepted for publication August 25, 2000.
Corresponding author: Frances Page Glascoe, PhD, Department of Pediatrics, College of Medicine, The Pennsylvania State University, 25 Bragg Dr, East Berlin, PA 17316 (e-mail: Frances.P.Glascoe@Vanderbilt.edu).
Glascoe FP. Are Overreferrals on Developmental Screening Tests Really a Problem? Arch Pediatr Adolesc Med. 2001;155(1):54-59. doi:10.1001/archpedi.155.1.54