December 7, 2009

Formal Production Features of Infant and Toddler DVDs

Author Affiliations: Georgetown University, Washington, DC (Ms Goodrich and Dr Calvert); and Otterbein College, Westerville, Ohio (Dr Pempek).

Arch Pediatr Adolesc Med. 2009;163(12):1151-1156. doi:10.1001/archpediatrics.2009.201

Objective  To describe how DVDs designed for very young children are constructed, focusing on the formal production features used to present the program content.

Design  Descriptive study of the concentrations of perceptually salient, nonsalient, and reflective formal features.

Participants  Fifty-nine DVDs designed for children younger than 3 years.

Main Exposure  The presence and absence of specific formal features.

Outcome Measures  Concentrations of reflective (singing, rhyming, camera zooms, and moderate character action), perceptually salient (rapid pacing, fast action, camera cuts, sound effects, character vocalizations, and visual special effects), and nonsalient (low-action sequences, narration, and dialogue by men, women, or children) formal features.

Results  Programs were composed of high concentrations of perceptually salient features, such as rapid pace and camera cuts, which are difficult even for older children to understand. Reflective features, which provide opportunities to rehearse content, were relatively rare. Character action was typically nonsalient. The DVDs used speech only 24% of the time and failed to selectively use speakers, such as choosing a child over an adult for dialogue and narration, which garners slightly older children's visual attention.

Conclusions  Producers who claim that their programs are educational should pay more attention to how they transmit content. Most programs directed at infants and toddlers rely on perceptually salient features like rapid pacing and camera cuts, which may elicit attention and interest but are most likely very difficult for a young audience to understand.

An explosion of DVD products directed at infants and toddlers has taken place with little understanding of whether any kind of meaningful learning can occur in this audience.1 Despite a recommendation from the American Academy of Pediatrics2 that children younger than 2 years should not be exposed to screen media, about half of children younger than 1 year are watching television or DVDs.3 Research with kindergartners demonstrated that comprehension requires children to decode the symbolic system, known as formal production features, used to present content.4 In this study, we analyze the kinds of formal features in DVDs directed at infants and toddlers.

Formal features are audio-visual production features that structure, mark, and represent content.5 Using Berlyne’s6 collative properties of perceptually salient, attention-getting stimuli (eg, movement, contrast, change, incongruity, and novelty), Huston and Wright5 classified some formal features as perceptually salient and others as nonsalient. Perceptually salient features include rapid pace (ie, frequent scene and character changes), fast action, frequent camera cuts, sound effects, character vocalizations, visual special effects, and prominent foreground music. By contrast, nonsalient features include dialogue, narration, slow pace, low action, and background music. In an analysis of formal features in television programs directed at preschool and grade-school children, Huston and colleagues7 extracted a third kind of feature: reflective. Reflective features, which present content in a way that provides children time to think about, review, and process the information, include singing, moderate character action at the speed of a walk, and camera zooms.

Huston and Wright5 proposed the exploration-to-search model in which young children were predicted to attend to content initially based on its association with perceptually salient qualities. With age and experience, habituation to perceptually salient qualities was expected to take place, and attention and interest were predicted to shift to the informative plot elements of a program, particularly dialogue.

Research has documented the attention-getting power of perceptually salient over nonsalient formal features. Audio features, such as sound effects, character vocalizations, and foreground music, were particularly effective at eliciting attention from viewers who were not looking at the screen.4,8 Young viewers are also differentially attentive to characters, preferring to look when children rather than adults are onscreen.9

The links among features, attention, and comprehension revealed fewer developmental differences than predicted by the exploration-to-search model. Children from kindergarten through middle childhood attended to perceptually salient audio techniques, like character vocalizations, which then improved their comprehension of the contiguously presented content.4 Wright and colleagues10 found that fast-paced programs were marginally more attention-worthy compared with slow-paced programs for children of kindergarten or first-grade age but not third- or fourth-grade age. Rapidly paced programs, however, were very difficult for younger children to process, as they had to integrate frequent changes in time, place, and characters to understand the content. Stories had slower paces than magazine programs, the latter containing discrete vignettes that are not organized around an overriding plot.

Research also documented that preschool-aged children could understand content associated with reflective features. For instance, content presented by a camera zoom, which takes children from a whole to a part perspective, was better understood than when the same content was presented by a camera cut, which required children to fill in the gap in visual perspective.11 Singing, which provides a rehearsal mechanism, assists children's verbatim memory of content12 as do rhymes,13 though deeper comprehension of the meaning of the passage is notably absent. Moderate character action, classified as both a salient and a reflective feature, provided a way for children to think about content in a visual, iconic mode that was associated with improved comprehension.4

These early studies took place when programs were not available for infants and toddlers, yet this very young age group is the one that should be most affected by perceptual salience. Indeed, the use of certain production techniques may improve or disrupt early learning from videos. Eye-tracking research suggests that the skills to use cuts to guide visual attention and to integrate information across cuts are not present in 1-year-old infants but increase with age, suggesting that such skills may need to be learned through experience with viewing and/or gained through developmental processes.14 Sound effects can help infants remember content, but adding music to a production disrupts infants' memory, presumably because it overloads their limited cognitive processing system.15

Based on what we know about preschool programs and the qualities that make content attention-worthy and understandable for preschoolers and infants, the current study describes the formal features in DVDs directed at infants and toddlers. We expected a well-designed program for a very young child to consist of the following:

  • Reflective features, such as singing, rhyming, moderate character action, and camera zooms, that assist learning rather than perceptually salient features, like rapid pacing and camera cuts, that are difficult to understand.4,7,10-14

  • Perceptually salient audio features like sound effects and vocalizations to cue attention to dialogue and narration.4,9

  • Fewer low-action than moderate-action sequences, since kindergarteners are less attentive when characters do not move.4

  • More narration and dialogue by children than by women or men, since children are more attentive to on-screen children.4,9

  • Greater use of reflective features, and more judicious use of perceptually salient and informative features, in DVDs that had educational consultants than in those without consultants.



An Internet search was conducted from the fall of 2007 through the spring of 2008 to compile a comprehensive list of DVDs designed for children younger than 3 years, including those in Garrison and Christakis' sample.1 All companies found in our search were included in our sample. Whenever a company produced a series, 2 titles were randomly chosen to represent that brand.

Fifty-nine DVDs (31 brands) were examined (mean length, 38.12 minutes; standard deviation [SD], 15.58 minutes). All were in a magazine format, composed of discrete vignettes that were not connected in any meaningful way. Thirty-one DVDs had a live format, 7 were animated, and 21 had a mixed format. The packaging typically included educational claims (mean, 12.6 per video) (Deborah L. Linebarger, PhD, et al, unpublished data, 2009). Seventy-three percent of the DVD titles, such as Baby Einstein, implied that the product was educational. The age range for videos was 0 to 72 months. Thirty-three videos had specific target ages (mean, 25.15 months; mean, 6.91 months for beginning use; mean, 43.63 months for stopping use); 16 had a targeted minimum age but no maximum age (eg, ≥3 months; mean minimum age, 11.56 months); 4 used only general categories or a combination of general and age categories (eg, babies and toddlers; infancy to age 4); and 4 videos did not specify any age.

Formal features coding system

The formal features coding system created to score television programs designed for older children7 was updated and modified to assess infant- and toddler-directed media. Table 1 provides definitions of all features coded.

Table 1. Taxonomy and Definitions of Formal Features

In the original coding system, action focused on the amount of movement on the screen and required the presence of a character to perform an action. Object action (eg, toys) was added owing to the high frequency with which no characters were present in this sample. The levels of character or object action were no characters or objects on screen (nonsalient); stationary characters or objects (nonsalient); stationary vigorous character or object movement, such as waving arms (nonsalient); moderate character or object movement through space at the speed of a walk (reflective); and rapid character or object movement through space, such as running (high salience). Action was scored continuously, prioritizing the highest level of action occurring at any point. Character action took precedence over object action because children understand human actions better than object actions.16
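The action hierarchy described above can be sketched in code. This is only an illustration under stated assumptions, not the coding software used in the study: the names `ActionLevel`, `FEATURE_CLASS`, and `code_action` are hypothetical, and the sketch assumes that "prioritizing the highest level" means taking the maximum level observed at a given moment and that character action is coded whenever any character is on screen.

```python
from enum import IntEnum


class ActionLevel(IntEnum):
    """Five action levels from the text, ordered low to high (ranks are assumed)."""
    NONE_ON_SCREEN = 0       # no characters or objects on screen
    STATIONARY = 1           # stationary characters or objects
    STATIONARY_VIGOROUS = 2  # movement in place, such as waving arms
    MODERATE = 3             # movement through space at the speed of a walk
    RAPID = 4                # movement through space, such as running


# Feature class assigned to each level in the coding system.
FEATURE_CLASS = {
    ActionLevel.NONE_ON_SCREEN: "nonsalient",
    ActionLevel.STATIONARY: "nonsalient",
    ActionLevel.STATIONARY_VIGOROUS: "nonsalient",
    ActionLevel.MODERATE: "reflective",
    ActionLevel.RAPID: "perceptually salient",
}


def code_action(character_levels, object_levels):
    """Return (source, level, feature class) for one moment of video.

    character_levels / object_levels list the ActionLevel of every character
    or object on screen. Character action takes precedence over object
    action, and the highest level present is the one recorded.
    """
    if character_levels:
        source, level = "character", max(character_levels)
    elif object_levels:
        source, level = "object", max(object_levels)
    else:
        source, level = "none", ActionLevel.NONE_ON_SCREEN
    return source, level, FEATURE_CLASS[level]
```

Under these assumptions, a walking character shown next to a rapidly moving toy would be coded as moderate character action (reflective), because the character takes precedence over the object.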

Pace consisted of scene changes and character/object changes. Object changes were added to the infant and toddler coding system. For scene changes, which were measured discretely, each scene was scored as being new or familiar. Character and object changes marked the appearance or exit of characters or objects within a scene. Rapid rates of scene and character changes are considered perceptually salient.

Visual features involved camera-editing techniques, including cuts (perceptually salient), zooms (reflective), and visual special effects (perceptually salient). Auditory features were scored in 3 passes: (1) singing (reflective), rhyming (a new reflective feature added for this sample), and vocalizations (perceptually salient); (2) dialogue (nonsalient); and (3) foreground music (perceptually salient), background music (nonsalient), sound effects (perceptually salient), and narration (nonsalient). Visual and audio features involved both continuous and discrete measures.


DVDs were converted into MPEG (Moving Picture Experts Group) files for use with the coding software, The Observer XT. Each formal feature pass contained several subcategories, which were assigned to corresponding keys on the computer keyboard. When a given key was pressed, The Observer XT recorded which particular subcategory occurred and kept track of either when it occurred (for point events) or for how long (for state events).
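The distinction between point events and state events maps onto the two kinds of summary measures used in this study: rates per minute and proportions of time. A minimal sketch follows, assuming a simplified event log of timestamps in seconds; the function names and data layout are hypothetical and do not represent the export format of The Observer XT.

```python
def rate_per_minute(point_event_times, duration_s):
    """Point events (eg, camera cuts): occurrences per minute of video."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    return len(point_event_times) / (duration_s / 60.0)


def proportion_of_time(state_intervals, duration_s):
    """State events (eg, singing): share of the video covered by the
    (start, stop) intervals, assumed not to overlap."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    covered = sum(stop - start for start, stop in state_intervals)
    return covered / duration_s


# Hypothetical 2-minute clip: 6 cuts, and singing during two 12-second spans.
cuts = [5, 20, 50, 80, 110, 115]
singing = [(0, 12), (60, 72)]
cut_rate = rate_per_minute(cuts, 120)          # 3.0 cuts per minute
singing_share = proportion_of_time(singing, 120)  # 0.2 of the clip
```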

Interobserver reliability

There was a primary coder and a secondary reliability coder for each pass. The secondary coder scored 20% of the sample. The κ values were 0.76 for action, 0.80 for pace, 0.75 for visual features, 0.77 for the first auditory pass, 0.87 for the second auditory pass, and 0.80 for the third auditory pass.
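For readers unfamiliar with κ, the statistic corrects raw agreement between 2 coders for the agreement expected by chance. The sketch below assumes that the reported values are Cohen κ and that agreement was assessed on paired categorical codes (for example, one code per sampled moment); the text states neither detail explicitly, and the function name and example data are hypothetical.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen kappa for two equal-length sequences of categorical codes."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(coder_a)
    # Observed agreement: share of units on which the coders agree.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: product of each coder's marginal proportions.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both coders use one category")
    return (observed - expected) / (1 - expected)


# Hypothetical codes for 10 sampled moments of the visual features pass.
primary = ["cut", "cut", "cut", "cut", "zoom", "zoom", "zoom",
           "effect", "effect", "effect"]
secondary = ["cut", "cut", "cut", "zoom", "zoom", "zoom", "zoom",
             "effect", "effect", "cut"]
kappa = cohens_kappa(primary, secondary)  # observed 0.80, chance 0.34
```

In this made-up example the coders agree on 8 of 10 units, but κ is about 0.70 because 34% agreement would be expected by chance alone.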


Analysis overview

Descriptive statistics (Table 2) and Pearson product moment correlations were calculated to assess concentrations of reflective, perceptually salient, and nonsalient formal features. Analysis of variance and t tests were run to compare differences in mean levels of concentrations of certain features such as pace, action, dialogue, narration, and music. For post hoc analyses, a conservative P value (P < .001) was used to control for type I errors. We compared features in DVDs when educational consultants were present or absent, which yielded no significant differences.
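Because each DVD contributes a value for every feature, comparisons between features are within-subjects (paired). A minimal sketch of a paired t statistic is shown here, on the assumption that the t tests reported in this article are of this form; the function name and the example proportions are hypothetical, and no P value is computed because that would require the t distribution.

```python
import math
from statistics import mean, stdev


def paired_t(x, y):
    """Paired t statistic and degrees of freedom for matched samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two matched samples with at least 2 pairs")
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    # Mean difference divided by its standard error.
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


# Hypothetical proportions of time for two features in 4 DVDs.
feature_a = [0.30, 0.25, 0.40, 0.35]
feature_b = [0.10, 0.15, 0.20, 0.05]
t_stat, df = paired_t(feature_a, feature_b)
```

With 59 DVDs, the degrees of freedom are 59 − 1 = 58, which matches the t58 notation used in the Results. For a within-subjects analysis of variance with 3 levels, the corresponding degrees of freedom are 3 − 1 = 2 and 2 × 58 = 116, matching the reported F2,116.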

Table 2. Frequency of Perceptually Salient, Nonsalient, and Reflective Formal Features

Pace and action macro features

Contrary to prediction, the infant and toddler DVDs were rapidly paced. As seen in Table 2, a mean rate of 3.21 scene changes occurred per minute. The rate of character change was 4.42 changes per minute, well above the rate of 2.26 character changes per minute reported in programs designed for older children,10 and the object change rate found in infant DVDs was an additional 1.96 changes per minute. Approximately 78% of the sample had at least 1 scene change per minute, with only 32% falling below the rate of 1.46 scene changes per minute reported for slow-paced magazine format programs designed for a kindergarten and grade-school audience.10 We also compared our infant and toddler DVDs with current educational DVDs and programs designed for preschoolers, including those from the Public Broadcasting Service Ready to Learn initiative. The infant DVDs were more rapidly paced than the preschool educational DVDs or television programs (Maureen Ryan, BA, unpublished data, 2009).

The DVDs used significantly more character action than object action (t58 = 9.12, P < .001). A within-subjects analysis of variance conducted with category of character action (low, moderate, and rapid) as the independent variable and proportion of time as the dependent variable yielded a main effect for category of character action (F2,116 = 394.61, P < .001). As expected, post hoc t tests indicated that the reflective feature of moderate character action, about the speed of a walk, was used to present content more so than perceptually salient rapid action at the speed of a run (t58 = 2.78, P = .007). However, low-action sequences occurred significantly more often than moderate character action (t58 = 20.16, P < .001) (Table 2 and Table 3).

Table 3. Means and Confidence Intervals for Significant Action Effects

A within-subjects analysis of variance performed on the categories of object action also yielded a significant main effect (F2,116 = 73.59, P < .001). Low object action sequences occurred significantly more often than moderate action (t58 = 8.65, P < .001), but moderate object action did not occur more than rapid object action (Table 2 and Table 3).

Visual and auditory micro features

The infant and toddler DVDs relied more on perceptually salient visual features than on reflective ones; camera cuts were highly concentrated in the DVDs (mean, 6.06 per minute; range, 0.07-39.45 per minute) (Table 2). Three DVDs had a rate of more than 20 cuts per minute. Visual special effects occurred at a rate of 3.35 per minute. By contrast, camera zooms that provide focus and the opportunity for reflection on the content were notably absent, occurring only 5% of the time on average.

We expected considerable amounts of singing and rhyming to enhance the comprehensibility of the content. Contrary to prediction, 32.2% of the videos had no singing and 60% contained no rhymes. As seen in Table 2, singing occurred on average only 13% of the time. Rhyming was virtually absent in most DVDs but highly variable where it occurred. The average frequency of rhymes for the entire sample was 6.54 per video. Among the 24 DVDs that contained rhymes, the number ranged from 1 to 107.

Among nonsalient auditory features, dialogue occurred on average only 9% of the time, and narration occurred on average 15% of the time. Based on prior research, we predicted higher proportions of child dialogue and narration than adult dialogue and narration. That prediction was not supported. More nonsalient background music occurred than perceptually salient foreground music (t58 = −3.28, P = .002).

Correlational analyses

To examine concentrations of formal features, we correlated perceptually salient features, reflective features, and nonsalient features. The presence of reflective features was not associated with other reflective features on the screen. Instead, singing was correlated with a number of perceptually salient features, namely the rates of camera cuts (r = 0.38, P = .003), new scenes (r = 0.29, P = .03), familiar scenes (r = 0.39, P = .002), and character changes (r = 0.46, P < .001) and with nonsalient character action (r = 0.42, P = .001). Rhyming was also correlated with camera cuts (r = 0.26, P = .04) and nonsalient features of adult male dialogue (r = 0.57, P < .001) and adult male narration (r = 0.41, P = .001). Moderate character action was negatively associated with the perceptually salient feature of object changes (r = −0.40, P = .002). Camera zooms were correlated with character changes (r = 0.31, P = .02), nonsalient action (r = 0.26, P = .05), and adult female narration (r = 0.41, P = .001).

Both male and female adult dialogue, nonsalient features, were positively correlated with the perceptually salient feature of camera cuts (r = 0.41, P = .001 with adult male dialogue; and r = 0.38, P = .003 with adult female dialogue), but also with the reflective feature of singing (r = 0.27, P = .04 with adult male dialogue; and r = 0.35, P = .007 with adult female dialogue). Perceptually salient audio features that can alert children to important content were expected to occur with dialogue and narration in infant- and toddler-directed media. That prediction was partially supported. Sound effects were correlated with child (r = 0.33, P = .01) and adult male (r = 0.41, P = .001) dialogue, and vocalizations were correlated with adult female dialogue (r = 0.65, P < .001). Perceptually salient audio features never highlighted narration. Rapid character movement was associated with camera cuts (r = 0.26, P = .05), sound effects (r = 0.27, P = .04), and character changes (r = 0.40, P = .002). Sound effects were also associated with rapid camera cuts (r = 0.40, P = .002).
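The correlations above are Pearson product moment coefficients computed across the 59 DVDs. A minimal sketch of the calculation follows, with the usual conversion of r to a t statistic on n − 2 degrees of freedom; the function names are hypothetical, and the sketch stops short of a P value because that requires the t distribution.

```python
import math


def pearson_r(x, y):
    """Pearson product moment correlation for two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length (at least 2)")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def r_to_t(r, n):
    """t statistic and degrees of freedom for testing r against zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r), n - 2


# A coefficient of 0.38 across 59 DVDs gives t of about 3.10 on 57 df.
t_stat, df = r_to_t(0.38, 59)
```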


The purpose of this study was to describe the concentrations of formal features being used in infant- and toddler-directed DVDs. Based on prior research, we predicted that well-designed infant and toddler videos would use high concentrations of reflective features as well as restricted and selective use of perceptually salient and nonsalient features. Using these guidelines, we found the DVDs to be poorly designed, with little attention paid to previous knowledge about how children process the symbolic formal features used to carry the content.

In prior research about feature constellations in programs designed for kindergarten and grade-school children, programs that used a format with live characters had lower concentrations of perceptually salient formal features than animated programs did.7 Our sample was almost all a live format, yet these DVDs relied heavily on perceptually salient formal features to present content. In particular, frequent camera cuts and rapid pacing, which require children to fill in gaps in time and space, were the norm in these programs. Although prior research documents that children attend during rapid pace10 and camera cuts,9 young children also have difficulty when trying to understand content presented with these features.10,11,14 Anderson,17 who was a key consultant in creating the successful educational preschool television program Blue's Clues, reports that no more than 3 camera cuts were used for a 30-minute episode to make the content comprehensible. By contrast, the infant and toddler DVD sample had an average of 6 cuts per minute. Moreover, the infant DVDs were more rapidly paced than the stimuli used for kindergarten and grade-school children in prior research and current educational preschool television programs and DVDs (M. Ryan, unpublished data, 2009).10

We expected considerable use of singing, rhyming, moderate character action, and long camera zooms because reflective features provide focus and emphasis on content.7 No singing occurred in almost one-third of the videos and 60% had no rhyming. Camera zooms occurred only 5% of the time. Singing and rhyming provide ways for children to rehear and rehearse content, which can make that information memorable.12,13 Similarly, camera zooms move from a whole to a part, providing a focus for what children should look at on the screen, which assists their learning.11 Moderate character action, which tends to be understandable to kindergarteners, did occur more often than rapid action, but not as much as low-action sequences, in which characters virtually stand still and convey little visual information to support the verbal content.4 The failure to use reflective features judiciously translates into programs that are difficult for older children, let alone very young children, to understand.

Important program information is often carried through dialogue (speakers are on screen) and narration (speakers are off screen) that tends to occur in low-action sequences. Our findings indicate a heavy reliance on low-action sequences in infant DVDs, which tend not to garner the attention of slightly older kindergarteners.4 Although infancy is a sensitive period to learn language, speech and narration only occurred about 24% of the time. Moreover, the preference of preschool-aged children to attend more to children than to adult dialogue and narration4 was ignored in DVDs designed for infants and toddlers, with women, men, and children equally likely to speak.

Audio features such as sound effects and character vocalizations that can create attentional orienting responses to important content4,8,15 were correlated with dialogue, but they did not highlight narration. These DVDs also used more nonsalient background music than perceptually salient foreground music. Even so, given infants' problems in processing content presented with music, it is unclear whether combining speech with music, which typically occurred with background music, could also overwhelm their developing information-processing systems.

It is striking that objects were often the focus of the DVDs. Because infants learn the intended behaviors of a person trying unsuccessfully to put an object together on a video but not the same exact behavior of a machine,16 the use of object action in the infant and toddler sample could potentially impede social and cognitive learning.18

The main limitation of this research involves our sparse knowledge about what very young children understand when they view videos, though the work with infant imitation after exposure to brief experimental stimuli is promising.15,19 Some infant and toddler DVDs are also better made than others. We compared formal features in DVDs that had educational advisors vs those without and found no differences in production practices.

In conclusion, prior research has documented the explosion of DVDs marketed to very young children.1 Our results suggest that these videos rely heavily on the use of perceptually salient production features that may get infants to look but are probably very difficult for them to understand. Moreover, these productions do not take sufficient advantage of reflective features that could potentially help very young children understand content. Based on our knowledge of children's comprehension of television, we find that most DVDs created for very young children are poorly designed.

Correspondence: Sandra L. Calvert, PhD, Georgetown University, Children's Digital Media Center, Department of Psychology, 309 White Gravenor, 37th & O Streets NW, Washington, DC 20057 (calverts@georgetown.edu).

Accepted for Publication: June 18, 2009.

Author Contributions: Study concept and design: Calvert. Acquisition of data: Goodrich, Pempek, and Calvert. Analysis and interpretation of data: Goodrich, Pempek, and Calvert. Drafting of the manuscript: Goodrich, Pempek, and Calvert. Critical revision of the manuscript for important intellectual content: Goodrich, Pempek, and Calvert. Statistical analysis: Goodrich, Pempek, and Calvert. Obtained funding: Calvert. Administrative, technical, and material support: Pempek and Calvert. Study supervision: Pempek and Calvert.

Financial Disclosure: None reported.

Funding/Support: This research was supported by grant 0623871 from the National Science Foundation to Dr Calvert. The contents of this document were also developed in part under a cooperative agreement between the US Department of Education, the Corporation for Public Broadcasting, and the Public Broadcasting Service for the Ready to Learn Initiative (PR No. U295A050003).

Disclaimer: This content does not necessarily represent the policy of the Department of Education and should not be assumed to be endorsed by the federal government.

Additional Contributions: Jessica Walsh, BA, Georgia Papatheodorou, Christina Baker, MA, Laura Larson, BA, Marta Perez, Yevdokiya Yermolayeva, BA, and Katrina Pariera, MA, assisted as coders for this research.

1. Garrison MM, Christakis DA. A Teacher in the Living Room? Educational Media for Babies, Toddlers, and Preschoolers. Menlo Park, CA: The Kaiser Family Foundation; 2005.
2. American Academy of Pediatrics. Media education. Pediatrics. 1999;104(2, pt 1):341-343.
3. Rideout V, Hamel E. The Media Family: Electronic Media in the Lives of Infants, Toddlers, Preschoolers and Their Parents. Menlo Park, CA: Kaiser Family Foundation; 2006.
4. Calvert SL, Huston AC, Watkins BA, Wright JC. The relation between selective attention to television forms and children's comprehension of content. Child Dev. 1982;53(3):601-610.
5. Huston AC, Wright JC. Children's processing of television: the informative functions of formal features. In: Bryant J, Anderson DR, eds. Children's Understanding of Television: Research on Attention and Comprehension. New York, NY: Academic Press Inc; 1983:35-68.
6. Berlyne DE. Conflict, Arousal, and Curiosity. New York, NY: McGraw Hill; 1960.
7. Huston AC, Wright JC, Wartella EA, et al. Communication more than content: formal features of children's television programs. J Commun. 1981;31:32-48.
8. Calvert SL, Scott MC. Sound effects for children's temporal integration of fast-paced television content. J Broadcast Electron Media. 1989;33(3):233-246.
9. Schmitt KL, Anderson DR, Collins PA. Form and content: looking at visual features of television. Dev Psychol. 1999;35(4):1156-1167.
10. Wright JC, Huston AC, Ross R, et al. Pace and continuity of television programs: effects on children's attention and comprehension. Dev Psychol. 1984;20(4):653-666.
11. Salomon G, Cohen D. Television formats, mastery of mental skills and the acquisition of knowledge. J Educ Psychol. 1977;69(5):612-619.
12. Calvert SL. Impact of televised songs on children's and young adults' memory of educational content. Media Psychol. 2001;3(4):325-342.
13. Johnson JL, Hayes DS. Preschool children's retention of rhyming and nonrhyming text: paraphrase and rote recitation measures. J Appl Dev Psychol. 1987;8:317-327.
14. Kirkorian HL, Anderson DR, Keen R. Looking at Sesame Street: age differences in eye movements during video viewing. Poster presented at: The Biannual International Conference on Infant Studies; March 27-29, 2008; Vancouver, Canada.
15. Barr R, Linebarger D. Effects of sound effects and music on imitation from television during infancy. Paper presented at: The American Psychological Society; May 22-25, 2008; Chicago, IL.
16. Meltzoff AN. Imitation as a mechanism of social cognition: origins of empathy, theory of mind, and the representation of action. In: Goswami U, ed. Blackwell Handbook of Childhood Cognitive Development. Malden, MA: Blackwell; 2002:6-25.
17. Anderson DR, Kirkorian H. Looking at Sesame Street: age differences in eye movements during video viewing. Paper presented at: The Children's Digital Media Center Advisory Board Meeting; May 29, 2009; Washington, DC.
18. Fenstermacher S, Brey E, Salerno K, et al. Educational content in infant-directed media content. Paper presented at: The International Communication Association; May 21-25, 2009; Chicago, IL.
19. Barr RF. Attention and learning from media during infancy and early childhood. In: Calvert SL, Wilson BJ, eds. The Handbook of Children, Media, and Development. Malden, MA: Blackwell Publishing; 2008:143-165.