Figure 1. Screening form.
Figure 2. Linear regression graphs. A, Graph demonstrating fundamental frequency in boys. Notice significant changes at ages 12 and 16 years. B, Graph demonstrating fundamental frequency in girls. Notice significant changes at ages 11 and 14 years.
Maturo S, Hill C, Bunting G, Ballif C, Maurer R, Hartnick C. Establishment of a Normative Pediatric Acoustic Database. Arch Otolaryngol Head Neck Surg. 2012;138(10):956-961. doi:10.1001/2013.jamaoto.104
Author Affiliations: Department of Otolaryngology, San Antonio Military Medical Center, San Antonio, Texas (Dr Maturo); Uniformed Services University of the Health Sciences, Bethesda, Maryland (Dr Maturo); Department of Otology and Laryngology (Drs Hill and Hartnick) and Voice and Speech Laboratory (Mr Bunting and Ms Ballif), Massachusetts Eye and Ear Infirmary, Harvard Medical School, Boston; and Center for the Clinical Investigation, Brigham and Women's Hospital, and Harvard Catalyst, The Harvard Clinical and Translational Science Center (Ms Maurer), Boston.
Objectives To establish a normative pediatric acoustic database and to analyze the acoustic characteristics of the age groups studied.
Design Prospective gathering of acoustic parameters on healthy children aged 4 to 18 years.
Setting An outpatient pediatric otolaryngology clinic.
Patients A total of 335 children (165 girls and 170 boys) were evaluated.
Main Outcome Measures Normative values were obtained for the acoustic parameters studied.
Results Discrete fundamental frequency changes occurred at ages 11 and 14 years in girls and ages 12 and 16 years in boys. Values for jitter percentage, shimmer percentage, and noise to harmonic ratio fell within the normative thresholds of adult values.
Conclusions This is the largest pediatric computerized voice analysis database in the English language. This database has been designed to develop an age- and sex-based growth chart to track the developing pediatric voice as it changes with maturation. A distinct vocal profile of girls and boys is evident, with key changes noted at critical periods of development and with significant differences in fundamental frequency between and within sexes. A comprehensive database can be used to aid future voice therapy and phonosurgical strategies and provide the foundation for future studies into the development of the pediatric voice as it matures into adulthood.
Speech and voice development in children has a significant impact on social and educational maturation. The inability to communicate affects the psychological and emotional well-being of children and their families.1,2 The prevalence of voice disorders among children ranges between 3% and 10%.3,4 Studies have shown a higher incidence in boys (7.5%) than in girls (4.6%).4,5
Knowledge of the anatomic development of the pediatric vocal structures and how these changes in anatomy affect the acoustic and aerodynamic qualities is critical to the relatively new discipline of pediatric laryngology. Although it has been established that the vocal folds lengthen with age and that the ratio of the cartilaginous to membranous vocal fold changes with maturation, it remains unclear how much, if at all, the change in the microstructure of the vocal fold lamina propria is reflected by the different acoustic and aerodynamic measurements.6 Many of the theories of vocal mechanics have been transferred from adult studies with little focused research on the developmental changes occurring in the first 2 decades of life.
The tools for diagnosis and treatment of childhood voice disorders include self-reported and proxy quality-of-life questionnaires, clinician's auditory perception, indirect or direct laryngoscopy, and computer voice analysis. Subjective perceptual evaluation makes comparison difficult, while laryngoscopy in children can be a challenge. Normative voice data for adults are well documented, but they are not transferable to a younger population.7 Objective computer voice analysis is easily obtainable, yet comprehensive normative childhood values are lacking.
The goal of this research was to establish a normative voice database of primarily English-speaking children between the ages of 4 and 18 years. We believe that a comprehensive normative database will provide an objective means to identify and optimally treat childhood voice disorders. In addition, we hope to provide some insight into when childhood acoustic and aerodynamic voice properties change. Finally, by identifying the age ranges when these changes occur, we hope to provide a foundation for future anatomic laryngeal models that may help correlate changes in the acoustic and aerodynamic vocal properties with developmental anatomic maturation.
This study was approved by the institutional review board of the Massachusetts Eye and Ear Infirmary. Patients between the ages of 4 and 18 years were recruited from an outpatient pediatric otolaryngology clinic. A “yes” answer to any question on the screening form (Figure 1) excluded the child from the study. Children also had to have normal hearing as reported by parent or guardian; if there was concern for hearing abnormalities, audiograms were evaluated. Informed consent was obtained, including from children older than 12 years. Each parent completed a Pediatric Voice-Related Quality-of-Life (PVRQOL) questionnaire. The PVRQOL is a validated instrument that can be used to identify children with abnormal voices.8,9
Voice recordings were made in a quiet room using a Dell (Dell Inc) OptiPlex 960 personal computer (Microsoft Windows XP Professional Version 2002) with an Intel Core 2 Duo CPU (3.1 GHz, 1.94 GB of RAM [random access memory]). Background noise levels were not measured, but the room was soundproofed. Children were fitted with a headset-mounted microphone that was placed 3 cm from the right oral commissure at approximately a 45° angle. The Multi-Dimensional Voice Program (MDVP) Model 5105 software option for the Computerized Speech Laboratory Model 4500 (KayPENTAX) was used to analyze the more widely used clinical variables: fundamental frequency, jitter percentage, shimmer percentage, and noise to harmonic ratio. Subjects were asked to sustain the vowel “ah” at a comfortable pitch and volume using a normal speaking voice for more than 4 seconds. After 3 rounds of practice, the fourth production of “ah” was recorded. The recording began approximately 1 second after the child started, and the recording was stopped prior to the end of phonation. Thus, the middle 3.5 seconds of the voiced segment was selected. This segment was chosen because the beginning and final segments of voice samples are the most unstable, owing to the aerodynamic and muscular parameters present during vocal onset and offset.10
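The segment selection described above can be illustrated with a short sketch. It assumes that the whole phonation was digitized and trimmed afterward, which differs from the study, where the recording itself started about 1 second into phonation and stopped before its end. The function name `stable_segment`, its parameters, and the sample rate are hypothetical and are not part of the MDVP software.

```python
def stable_segment(samples, sample_rate, onset_skip_s=1.0, keep_s=3.5):
    """Skip the unstable onset and keep a fixed-length window.

    samples      -- sequence of audio samples for one sustained "ah"
    sample_rate  -- samples per second (assumed value, not from the study)
    onset_skip_s -- seconds discarded at vocal onset
    keep_s       -- seconds retained for analysis
    """
    start = int(round(onset_skip_s * sample_rate))
    stop = start + int(round(keep_s * sample_rate))
    if stop > len(samples):
        # The phonation must outlast the window so that the offset is excluded.
        raise ValueError("recording is too short for the requested window")
    return samples[start:stop]


# Hypothetical 5-second phonation sampled at 44.1 kHz (silence as placeholder).
sample_rate = 44100
phonation = [0.0] * (5 * sample_rate)
segment = stable_segment(phonation, sample_rate)
```

Raising an error for short recordings, rather than returning a truncated window, keeps vocal offset from leaking into the analyzed segment.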
The Consensus Auditory-Perceptual Evaluation of Voice (CAPE-V) rating form was used by a senior speech pathologist to evaluate randomly selected MDVP recordings to ensure that normal voices were being tested. The CAPE-V form is a tool developed by the American Speech-Language Hearing Association for auditory perceptual assessment of voice. Its primary purpose is to describe the severity of auditory-perceptual attributes of a voice problem in a way that can be communicated among clinicians.
Descriptive statistics were calculated by sex and age group. Selected data were plotted using bar graphs for visual inspection. Group comparisons were performed using either t tests or Wilcoxon rank sum tests. A simple linear regression graph of fundamental frequency was created for each sex, but these graphs were not accurate fits over the entire age range. A nonparametric smoother (locally weighted scatterplot smoothing) was applied to the data to determine where the breaks occurred. On the basis of comparisons of the model standard error and visual fit, the best-fit piecewise regression model was created. All statistical analyses were conducted using SAS v9.2 statistical software (SAS Institute Inc).
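The idea of comparing piecewise regression models by their standard error can be sketched as follows. This is not the SAS analysis used in the study: it assumes a continuous piecewise-linear model built from hinge terms, replaces the smoothing step with a plain grid search over candidate break ages, and uses synthetic values rather than study data. The function names `fit_piecewise` and `best_two_breaks` are hypothetical.

```python
import itertools

import numpy as np


def fit_piecewise(age, f0, breaks):
    """Least-squares fit of a continuous piecewise-linear model.

    Each breakpoint b adds a hinge column max(0, age - b), so the slope
    may change at b while the fitted line stays connected.
    Returns the coefficients and the residual standard error.
    """
    age = np.asarray(age, dtype=float)
    f0 = np.asarray(f0, dtype=float)
    columns = [np.ones_like(age), age]
    columns += [np.maximum(0.0, age - b) for b in breaks]
    X = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(X, f0, rcond=None)
    resid = f0 - X @ coef
    dof = len(age) - X.shape[1]
    std_err = float(np.sqrt(resid @ resid / dof))
    return coef, std_err


def best_two_breaks(age, f0, candidates):
    """Try every ordered pair of candidate breakpoints and keep the pair
    with the smallest residual standard error."""
    best = None
    for b1, b2 in itertools.combinations(sorted(candidates), 2):
        _, se = fit_piecewise(age, f0, (b1, b2))
        if best is None or se < best[1]:
            best = ((b1, b2), se)
    return best


# Synthetic, noise-free illustration (NOT study data): a plateau, a
# decline between ages 12 and 16 years, then a second plateau.
ages = np.arange(4, 19)
f0 = np.interp(ages, [4, 12, 16, 18], [250.0, 250.0, 120.0, 120.0])
breaks, se = best_two_breaks(ages, f0, candidates=range(6, 18))
print(breaks)
```

With real, noisy data, the lowest standard error alone can favor spurious breaks, which is presumably why the model choice described above also relied on visual fit.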
A total of 335 children (165 girls and 170 boys) were evaluated. Each child had a normal voice, as reported by the child and parent or guardian. The most common presentations to the clinic were for evaluation of adenotonsillar hypertrophy (n = 96 [29%]), ear infections (n = 32 [10%]), and nasal obstruction/adenoid hypertrophy (n = 26 [8%]).
Thirty-one children (9%) were siblings of patients being evaluated in the clinic. Ten percent (n = 34 [20 boys and 14 girls]) of voice recordings were randomly analyzed with the CAPE-V form and found to be normal or with only minimal irregularities consistent with normal voices. Results for the selected measurements and the PVRQOL survey are depicted in the Table.
From the piecewise linear regressions, we found statistically significant changes in fundamental frequency occurring at ages 12 and 16 years in boys (P < .001) and at ages 11 and 14 years in girls (P < .001) (Figure 2). When comparing differences between sexes, a significant (P < .005) difference was seen in mean fundamental frequency at age 11 between boys and girls. The overall mean jitter percentage for girls (1.24%) was statistically significantly lower (P < .004) than for boys (1.51%). A similar finding was present with shimmer percentage (P < .001), with an overall mean value of 2.81% for girls vs 3.21% for boys.
Fundamental frequency is the vibratory frequency of the vocal fold or the number of glottic cycles per second. The range for normative women is 165 to 255 Hz, with a mean of 217 Hz, while for men the range is 85 to 155 Hz, with a mean of 116 Hz.7 Our data confirm previous studies describing a decreasing fundamental frequency with increasing age in both boys and girls.11-13 Our analysis demonstrates the attainment of the normal adult female fundamental frequency occurring around age 14 years, with a transition period beginning around age 11 years. In boys, the adult normal frequency is reached around age 16 years, with a transition period beginning around age 12 years. In boys a much larger frequency decline is seen over the transition period than in girls. Our fundamental frequency results are consistent with previous reports demonstrating that vocal mutation occurs at an age younger than 14 years, but our data further illustrate that this may in fact occur 2 to 3 years earlier in both boys and girls.7
Previously reported normative data for the CSL system in primarily English-speaking children quote a mean fundamental frequency of 279 Hz for all children excluding boys older than age 12 years, leading to the conclusion that girls of any age and boys younger than 12 years have the same vocal profiles.11 Our results show more age-specific mean fundamental frequencies and suggest relative points in time when voice changes occur. For example, we found a statistically significant difference between 11-year-old boys and girls. Along these lines, the mean fundamental frequency in girls between the ages of 8 and 12 years is approximately 253 Hz, while for boys of the same age, the mean fundamental frequency is lower at 231 Hz. Our data reinforce previous smaller studies demonstrating the true variability in child fundamental frequency, where the average child may not change much from year to year.11,14-17 For example, there were no statistically significant differences within sexes among 12- and 13-year-old children. Yet our gathering of a wide age range of children on a single voice analysis system highlights that around ages 11 and 14 years in girls and ages 12 and 16 years in boys, there are significant changes within sexes.
Although we have shown the approximate ages when changes of fundamental frequency occur between sexes, there remains the question of what causes these changes. It is well established that the vocal folds undergo histologic changes from birth to adulthood, where the vocal fold matures from a monolayer to a trilayer structure.18-21 It is assumed that girls and boys undergo the same vocal fold maturation from a histologic standpoint in terms of developing a 3-layered structure; yet it is still unclear what causes the changes in frequency seen between boys and girls prior to adulthood. Most likely, differences occur because of a combination of factors such as variation in vocal fold length and sex-specific morphologic and metabolic changes dependent on hormonal influences.8,11,17-21 We are hopeful that new technological advances, such as noninvasive high-resolution imaging, will help answer this question, which has until now depended on postmortem tissue specimens that are difficult to obtain in the pediatric population.
“Jitter percentage” measures the percentage of variation in cycle-to-cycle wave frequency, while “shimmer percentage” measures the percentage of variation in cycle-to-cycle wave amplitude. Low jitter and shimmer percentage values are associated with an ability to maintain periodic vibration, while an increase is associated with a hoarse voice. Noise to harmonic ratio is an index relating the noise component to the harmonic component of the acoustic wave. Increased noise is due to turbulent airflow produced around the glottal opening during phonation, suggesting a voice abnormality.22 Our noise to harmonic ratio data were consistent with previous pediatric reports.11,13 Noise to harmonic ratio appeared to be consistent among all age groups studied. Our results fall within the quoted CSL adult normative threshold value of 0.19. Our overall jitter and shimmer percentage values were similar to those previously reported.11,13,22 In the study by Tavares et al,13 4- to 11-year-old children showed no significant jitter percentage differences, with a normal value under 1.8%. They also demonstrated no age or sex differences in shimmer percentage, but the overall value was higher, with a normal value being between 4% and 5%.13 Ferrand et al,22 in an analysis of 80 children younger than 10 years, did not show any differences in jitter and shimmer percentages among boys and girls, with findings similar to those reported by Nicollas et al.12 Our data show a statistically significant overall mean difference in jitter and shimmer percentages between boys and girls. This is most likely owing to the wide age range of boys and girls studied, over which mean fundamental frequency differs significantly.
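A minimal sketch may make these cycle-to-cycle measures concrete. It assumes one common textbook formulation (mean absolute difference between consecutive cycles, divided by the mean across cycles); the MDVP algorithms are proprietary and may differ. The function names and the four-cycle input values are hypothetical and are not study data.

```python
def perturbation_percent(values):
    """Mean absolute difference between consecutive cycles, expressed as
    a percentage of the mean value across all cycles.

    Applied to cycle periods, this gives a jitter percentage; applied to
    cycle peak amplitudes, it gives a shimmer percentage.
    """
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_value = sum(values) / len(values)
    return 100.0 * mean_diff / mean_value


def mean_f0_hz(periods_s):
    """Fundamental frequency as glottic cycles per second."""
    return len(periods_s) / sum(periods_s)


# Hypothetical measurements for four consecutive glottic cycles.
periods_s = [0.0040, 0.0041, 0.0039, 0.0040]  # cycle durations in seconds
amplitudes = [1.00, 1.02, 0.98, 1.00]         # peak amplitudes, arbitrary units

jitter = perturbation_percent(periods_s)
shimmer = perturbation_percent(amplitudes)
f0 = mean_f0_hz(periods_s)
```

Because both measures are normalized by the mean, they are unitless percentages, which is what allows comparison across voices with very different fundamental frequencies.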
Previous studies of adults have reported that jitter and shimmer values for women are less than those for men, which may be because of higher motor unit firing rates associated with higher fundamental frequency or vocalis muscle differences in men and women.17,23 There is considerable variability in previous studies on whether children have similar or higher jitter or shimmer values than adults.24,25 Our data suggest that children and adults have similar values, since normative adult thresholds defined by the CSL system include values of 1.04% and 3.81% for jitter and shimmer percentage, respectively.
One limitation of this study is the possibility that we recorded children with laryngeal pathologic conditions. We believed that proposing flexible fiberoptic laryngoscopy to study patients and their parents would prohibit the gathering of our data in a timely manner and add significant expense. Flexible fiberoptic laryngoscopy can be a temporarily anxiety-provoking invasive examination, and it is unlikely that we would have been able to accrue a large sample size if we insisted on laryngeal examination. We did not believe that direct visualization of the glottis was necessary to rule out laryngeal pathology if the child and parent had no voice complaints. Furthermore, we believed that using the validated PVRQOL survey and the CAPE-V form would help ensure that we were in fact testing normal voices. The PVRQOL results are consistent with reported normative values.8,9 Inherent limitations of the study were related to young age and cooperation. Subjects were encouraged to perform the voice tasks, but no child was forced to complete any task against their will. Finally, it is difficult to ascertain the impact that various otolaryngologic diagnoses have on voice analysis measurement. We realize that tonsil and adenoid hypertrophy could have an effect on the voice analysis, yet given that many healthy, normal children have adenotonsillar hypertrophy, it is difficult to exclude this segment entirely.
It should also be noted that we used a distinct proprietary computerized system. Although this system is widely used among speech and voice researchers, one must use caution when comparing findings with historical data. The various proprietary programs have different algorithms that may not provide comparable data to other systems. Also, the normative values quoted by these systems are based on adults only. Finally, caution should be applied when comparing our data with the system-defined adult normative data because there may have been differences in data-gathering techniques.
In conclusion, we have reported, to date, the largest normative voice database among English-speaking children. Our data demonstrate that there is no “typical” fundamental voice for a given age, but that there are discrete periods when changes are seen in both boys and girls. We hope that these data can be used as a reference as more speech, voice, and hearing research studies are conducted on children. Furthermore, we hope that these data can serve as a functional basis for developmental investigations into the maturing pediatric larynx.
Correspondence: Christopher Hartnick, MD, Department of Otology and Laryngology, Massachusetts Eye and Ear Infirmary, Harvard Medical School, 243 Charles St, Boston, MA 02114 (Christopher_Hartnick@meei.harvard.edu).
Submitted for Publication: November 13, 2011; final revision received March 29, 2012; accepted August 8, 2012.
Published Online: September 17, 2012. doi:10.1001/2013.jamaoto.104
Author Contributions: Drs Maturo and Hartnick had full access to all the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis. Study concept and design: Maturo, Bunting, Ballif, and Hartnick. Acquisition of data: Maturo and Hill. Analysis and interpretation of data: Maturo, Bunting, Maurer, and Hartnick. Drafting of the manuscript: Maturo, Bunting, Ballif, and Hartnick. Critical revision of the manuscript for important intellectual content: Maturo, Hill, Bunting, Maurer, and Hartnick. Statistical analysis: Maturo, Maurer, and Hartnick. Obtained funding: Hartnick. Administrative, technical, and material support: Maturo, Hill, Bunting, and Ballif. Study supervision: Maturo, Bunting, and Hartnick.
Financial Disclosure: None reported.
Previous Presentation: This article was presented at the American Society of Pediatric Otolaryngology 2012 Annual Meeting; April 21, 2012; San Diego, California.