Consonant-vowel-consonant word scores for 15 subjects using the Advanced Bionics Corporation Hi-Focus electrode array with positioner, CII Bionic Ear behind-the-ear speech processor, and Hi-Resolution sound processing strategy (CII) and 15 subjects using the Cochlear Corporation Nucleus 24 electrode array, ESPrit 3G behind-the-ear speech processor, and Advanced Combination Encoder speech coding strategy (3G). Each circle indicates the score for 1 subject.
Test performance for subjects using CII and 3G devices as a function of type of test (see legend to Figure 1 for description of devices). Asterisk indicates a significant difference in performance. Error bars indicate ±1 SD. HINT indicates Hearing In Noise Test sentences; CUNY, City University of New York sentences; AzBio, AzBio sentences; and SNR, signal-to-noise ratio.
Test performance for subjects using CII (A) and 3G (B) devices as a function of type of test (see legend to Figure 1 for description of devices). Asterisk indicates a significant difference in performance. Error bars indicate ±1 SD.
Difference in percent correct scores for sentences at +10 dB signal-to-noise ratio and at 54 dB SPL (sound pressure level) in quiet in subjects using CII (A) and 3G (B) devices (see legend to Figure 1 for description of devices). Individual bars indicate a single subject's difference score.
Mean scores for the measure of robustness, ie, the averaged performance in difficult listening situations (low sound pressure level and in noise) divided by performance in quiet and multiplied by 100 for subjects fit with the CII and 3G devices (see legend to Figure 1 for description of devices). The asterisk indicates a significant difference in performance.
Spahr AJ, Dorman MF. Performance of Subjects Fit With the Advanced Bionics CII and Nucleus 3G Cochlear Implant Devices. Arch Otolaryngol Head Neck Surg. 2004;130(5):624-628. doi:10.1001/archotol.130.5.624
To determine if subjects who used different cochlear implant devices and who were matched on consonant-vowel-consonant (CNC) identification in quiet would show differences in performance on speech-based tests of spectral and temporal resolution, speech understanding in noise, or speech understanding at low sound levels.
The performance of 15 subjects fit with the CII Bionic Ear System (CII Bionic Ear behind-the-ear speech processor with the Hi-Resolution sound processing strategy; Advanced Bionics Corporation) was compared with the performance of 15 subjects fit with the Nucleus 24 electrode array and ESPrit 3G behind-the-ear speech processor with the Advanced Combination Encoder speech coding strategy (Cochlear Corporation).
Thirty adults with late-onset deafness and above-average speech perception abilities who used cochlear implants.
Main Outcome Measures
Vowel recognition, consonant recognition, sentences in quiet (74, 64, and 54 dB SPL [sound pressure level]) and in noise (+10 and +5 dB SNR [signal-to-noise ratio]), voice discrimination, and melody recognition.
Group differences in performance were significant in 4 conditions: vowel identification, difficult sentence material at +5 dB and +10 dB SNR, and a measure that quantified performance in noise and low input levels relative to performance in quiet.
We have identified tasks on which there are between-group differences in performance for subjects matched on CNC word scores in quiet. We suspect that the differences in performance are due to differences in signal processing. Our next goal is to uncover the signal processing attributes of the speech processors that are responsible for the differences in performance.
The cochlear implants available in the United States differ in terms of signal processing strategies and in the hardware that implements those strategies. It is likely that some of these differences in hardware and software lead to differences in subject performance. Herein we report the outcomes of an experiment in which the perception of speech, voice, and music was assessed (1) in subjects who used the Advanced Bionics Corporation (Sylmar, Calif) Hi-Focus electrode array with positioner, CII Bionic Ear behind-the-ear speech processor, and Hi-Resolution sound processing strategy (CII subjects) and (2) in subjects who used the Cochlear Corporation (Englewood, Colo) Nucleus 24 electrode array, ESPrit 3G behind-the-ear speech processor (3G), and Advanced Combination Encoder (ACE) speech coding strategy (3G subjects). Our aim was to find aspects of speech understanding that are better conveyed by one speech processing scheme than the other and to use that information to design better speech processors.
To achieve this goal, we matched CII and 3G subjects on single-syllable word scores in quiet (consonant-vowel-consonant [CNC] words). We then tested these subjects on material that targeted detailed spectral resolution (vowel recognition and place of consonant articulation), detailed resolution of low frequencies (voice discrimination and melody recognition), sentence understanding in noise, and sentence understanding at low signal levels. At issue was whether devices that provided equal levels of speech understanding in quiet would provide different levels of speech understanding for specific frequency-based or temporally based tasks or for speech understanding in more difficult listening environments.
It is relevant to point out differences in our design and the usual design for a device comparison experiment, ie, that of a prospective randomized clinical trial. In a randomized clinical trial the issue is whether one device provides, on average, a higher level of performance than another device. A common outcome measure is the mean speech perception score for the population samples. When using this research design, biographic variables, such as duration of deafness, preimplant speech understanding, and duration of device use, are relevant because they are correlated with performance on tests of speech understanding.
In our experimental design, subjects who used the CII and 3G processors were matched on performance in quiet, ie, on CNC word score, and for age. The CNC word test has proved to be sensitive to performance level and serves as a reasonable criterion for matching subjects. Age is a relevant matching factor because there is evidence that age has a large effect on speech perception in noise.1 Other biographic variables known to affect performance are less likely to be an issue in this design because their influence is expressed through performance level, and performance level was our primary matching criterion. That said, matching on age and CNC word scores yielded groups that did not differ, on average, in terms of duration of deafness, preimplant hearing, preimplant speech understanding, and duration of experience with electrical hearing.
By matching subjects on a very difficult test of speech understanding in quiet, ie, CNC words, we biased our experiment against finding any difference in performance on other test materials. However, we did find differences in performance and those outcomes are described below.
This study was approved by the institutional review board at Arizona State University. All subjects gave informed consent prior to participation.
Fifteen subjects who used the CII processor and 15 subjects who used the 3G processor were selected for this study. All CII subjects were implanted with the Hi-Focus electrode array with positioner and used the CII Bionic Ear behind-the-ear speech processor with the Hi-Resolution sound processing strategy. The number of active electrodes varied from 14 to 16. The pulse rates varied from 725 to 5155 pulses per second (pps) per channel (mean [SD], 211 ). Six subjects were programmed in continuous interleaved sampler (CIS) mode. Nine were programmed in paired CIS mode.
All of the 3G subjects were implanted with the Nucleus 24 electrode array and used an ACE encoding scheme with either an 8- or 12-of-20 channel-picking scheme. The pulse rates varied between 500 and 1200 pps (mean [SD], 913 ). All subjects used the ESPrit 3G behind-the-ear speech processor.
All subjects were asked to use the program they most commonly used and were not allowed to change programs, or device settings, during the course of the experiments. Subjects were recruited by letter from clinics in the United States and Canada. Eleven US clinics and 3 Canadian clinics provided subjects for this project. All subjects were flown to Arizona for testing. All subjects were tested by a single examiner using one set of test equipment.
The 2 groups of 15 subjects each were created from a larger sample of 59 subjects. The first matching variable was CNC score. As shown in Figure 1, scores were matched within 4 percentage points (2 words on a 50-item CNC list). The mean score was 66% correct for both groups. The range of CNC word scores was 46% to 86% correct for CII subjects and 46% to 88% correct for 3G subjects. The second matching variable was age. Because we had a larger sample of 3G subjects than CII subjects, we found the closest age matches for the CII subjects from the larger group of 3G subjects. The mean (SD) age of subjects in the 2 groups did not differ significantly (CII: 56.1 [12.2] years; range, 36.3-83.4 years; 3G: 56.7 [13.5] years; range, 28.9-75.7 years). Although we did not explicitly match on other biographic variables, there were no differences between the groups in the mean (SD) duration of experience with electrical stimulation (CII: 1.46 [0.38] years; range, 1.0-2.5 years; 3G: 2.03 [1.53] years; range, 0.2-5.0 years), duration of deafness (CII: 17.5 [19.7] years; range, 0.2-53.9 years; 3G: 14.1 [10.4] years; range, 1.1-32.8 years), preimplant pure-tone thresholds for the implanted ear, averaged over 0.5, 1, and 2 kHz (CII: 100.8 [9.24] dB HL [hearing level]; range, 83-113 dB HL; 3G: 100.3 [16.24] dB HL; range, 60-120 dB HL), and the preimplant level of speech understanding (CNC words) (CII: 9.79 [11.6] words; range, 0-36 words; 3G: 4.86 [4.75] words; range, 0-12 words).
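The matching step described above can be sketched in code. This is a minimal illustration only: the text does not give the exact algorithm, so the greedy strategy, the `Subject` record, and the function name `match_groups` are assumptions, and the subject values in the usage lines are invented placeholders rather than study data.

```python
from dataclasses import dataclass


@dataclass
class Subject:
    sid: str    # hypothetical subject identifier
    cnc: float  # CNC word score in quiet, percent correct
    age: float  # age in years


def match_groups(small, large, max_cnc_gap=4.0):
    """Greedy sketch: for each subject in the smaller group, take the unused
    subject from the larger group whose CNC score is within max_cnc_gap
    percentage points and whose age is closest."""
    pairs = []
    used = set()
    for s in small:
        candidates = [
            c for c in large
            if c.sid not in used and abs(c.cnc - s.cnc) <= max_cnc_gap
        ]
        if not candidates:
            continue  # no acceptable CNC match; this subject stays unmatched
        best = min(candidates, key=lambda c: abs(c.age - s.age))
        used.add(best.sid)
        pairs.append((s, best))
    return pairs


# Invented example values, for illustration only.
cii = [Subject("c1", 66, 56), Subject("c2", 46, 40)]
g3 = [Subject("g1", 68, 70), Subject("g2", 64, 57),
      Subject("g3", 48, 41), Subject("g4", 80, 56)]
for a, b in match_groups(cii, g3):
    print(a.sid, "matched with", b.sid)
```

A greedy pass is order dependent, so a different subject order can produce different pairs; it is used here only because it is the simplest way to show the two criteria (CNC score first, then age) working together.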
To ensure that subjects from one group did not have more experience with the test materials than subjects from the other group, a new battery of tests was created.
AzBio Sentences. Five talkers (2 male and 3 female) recorded 500 sentences ranging in length from 6 to 10 words. The talkers were instructed to speak in a conversational fashion and were instructed to avoid the "clear speech" mode used for the Hearing In Noise Test (HINT) sentences and the exaggerated "clear speech" mode used for the City University of New York (CUNY) sentences. All sentences were normalized to be of approximately equal sound pressure level (SPL). The sentences then were processed using a 5-channel cochlear implant simulation and presented to 10 normal-hearing subjects for identification. Mean percent correct scores were calculated for each of the 500 sentences. Based upon those intelligibility scores, lists of 40 sentences each were constructed using sentences produced by 4 talkers (2 male and 2 female). The mean intelligibility of the lists (for normal-hearing subjects listening to the 5-channel simulation) was 89% correct. The lists differed in intelligibility by less than 1%.
One 40-sentence list was used for each of the following conditions: sentences in quiet at 74, 64, and 54 dB SPL and sentences in noise at +10 and +5 dB SNR (signal-to-noise ratio). Subjects were asked to repeat back the sentences and were encouraged to guess when unsure. All individual sentences were scored as words correct and an overall percent correct score was computed.
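As an illustration of words-correct scoring, the sketch below computes a percent correct score for a list of sentences. The scoring rules are assumptions, since the text does not spell them out: matching is case-insensitive, punctuation is ignored, word order is ignored, and the overall score pools words across the whole list. The function names are hypothetical.

```python
import string
from collections import Counter


def _words(text):
    """Lowercase the text, strip punctuation, and split into words."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return cleaned.split()


def words_correct(target, response):
    """Return (words repeated correctly, words in the target sentence)."""
    target_counts = Counter(_words(target))
    response_counts = Counter(_words(response))
    correct = sum(min(n, response_counts[w]) for w, n in target_counts.items())
    return correct, sum(target_counts.values())


def percent_correct(trials):
    """Pool words over all (target, response) pairs and return percent correct."""
    correct = total = 0
    for target, response in trials:
        c, n = words_correct(target, response)
        correct += c
        total += n
    return 100.0 * correct / total


# Invented example: 3 of 4 words on the first sentence, 0 of 3 on the second.
trials = [("The dog ran home.", "the dog went home"), ("She likes tea", "")]
print(round(percent_correct(trials), 1))
```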
Talker Discrimination.2 Subjects were asked to discriminate between male and female voices and between same-sex voices. The stimuli were drawn from a digital database developed at the Speech Research Laboratory at Indiana University, Bloomington. A total of 108 words produced by 5 males and 5 females were selected. Subjects were presented with pairs of words. Within each condition, half of the pairings were produced by the same talker and half were produced by different talkers. The words in the pairings always differed, eg, one male talker might say "ball" and the other male talker might say "brush." Across the different talker pairs, each talker was paired with every other talker an equal number of times. Participants responded "same" or "different" by pressing 1 of 2 buttons. Responses were scored as the percent of correct responses. Chance is 50% correct on this task.
Melody Recognition. Each subject selected 5 familiar melodies from a list of 33 simple melodies. Each melody consisted of 16 equal-duration notes and was synthesized with MIDI software that used samples of a grand piano.3 The frequencies of the notes ranged from 277 to 622 Hz. The average note was concert A (440 Hz) ± 1 semitone. The melodies were created without distinctive rhythmic information. Following familiarization with the melodies, the melodies were put into a randomized test sequence and subjects were asked to identify them. After each melody was presented, subjects responded by pressing a button from a list containing the 5 preselected melodies. Chance is 20% correct on this task.
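To relate the stated frequency range to musical intervals, the sketch below converts between frequency and semitone distance from concert A. It assumes 12-tone equal temperament with A = 440 Hz; the text gives only the frequencies, so the tuning system and the function names are assumptions.

```python
import math

A4_HZ = 440.0  # concert A, as stated in the text


def note_frequency(semitones_from_a4):
    """Frequency of the note a given number of equal-tempered semitones from A4."""
    return A4_HZ * 2 ** (semitones_from_a4 / 12)


def semitones_from_a4(freq_hz):
    """Distance in equal-tempered semitones between a frequency and A4."""
    return 12 * math.log2(freq_hz / A4_HZ)


low = round(semitones_from_a4(277))   # 8 semitones below A4
high = round(semitones_from_a4(622))  # 6 semitones above A4
print(low, high, high - low)
print(round(note_frequency(low), 2), round(note_frequency(high), 2))
```

Under that assumption, the reported 277 to 622 Hz range corresponds to about 8 semitones below and 6 semitones above concert A, a span of 14 semitones.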
Vowel Recognition Without Duration Cues. Thirteen vowels were created with KLATT software4 in /bVt/ format (bait, Bart, bat, beet, Bert, bet, bit, bite, boat, boot, bought, bout, but). The vowel presentations were brief (90 milliseconds) and of equal duration so that vowel length would not be a cue to identity.5 There were 5 repetitions of each stimulus. The order of the items was randomized in the test list.
Consonants in /e/ Environment. Twenty consonants were recorded in /eCe/ format, eg, "a bay," "a day," "a gay." A single male talker made 5 productions of each token. The pitch and vocalic portion of each token were intentionally varied. The order of items was randomized in the test list.
Standard Test Material. (1) CNC words: All subjects were tested with the same 50-item CNC word list in our laboratory; (2) HINT: all 250 HINT sentences were presented in quiet in random order6; (3) CUNY: CUNY7 sentences were presented in quiet and at +10 dB SNR. Two lists (24 sentences) were used in each condition. All standard test materials were taken from commercial CD recordings or from the original sources (HINT sentences).
Noise. The noise was 4-talker babble from an Auditec CD. The noise started 100 milliseconds before the onset of the signal and ended 100 milliseconds after the end of the signal.
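The noise timing and level relationship can be sketched as follows. This is an illustration under stated assumptions, not the procedure used in the study: SNR is taken to be the ratio of root-mean-square (RMS) levels of the speech and the noise segment, signals are plain lists of samples, and the function names are hypothetical.

```python
import math


def rms(samples):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(v * v for v in samples) / len(samples))


def mix_at_snr(speech, noise, snr_db, fs, pad_ms=100.0):
    """Scale noise to the target SNR and add speech so that the noise leads
    and trails the speech by pad_ms milliseconds.

    Returns (mixed samples, gain applied to the noise)."""
    pad = int(round(fs * pad_ms / 1000.0))
    total = len(speech) + 2 * pad
    if len(noise) < total:
        raise ValueError("noise recording is too short for this sentence")
    noise_seg = noise[:total]
    # Choose the gain so that rms(speech) / rms(scaled noise) = 10^(snr_db/20).
    gain = rms(speech) / (rms(noise_seg) * 10 ** (snr_db / 20.0))
    mixed = [gain * n for n in noise_seg]
    for i, s in enumerate(speech):
        mixed[pad + i] += s  # speech starts pad samples after noise onset
    return mixed, gain


# Synthetic placeholder signals at a 1 kHz sampling rate, for illustration only.
speech = [1.0, -1.0] * 50
noise = [0.5, -0.5] * 150
mixed, gain = mix_at_snr(speech, noise, snr_db=10.0, fs=1000)
print(len(mixed), round(gain, 3))
```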
New Measures of Performance in Difficult Listening Conditions. Many subjects achieve high levels of performance on sentences in quiet but perform less well in difficult listening conditions, eg, in noise or when speech is presented at a low level. We created 2 measures to quantify performance in these difficult listening conditions. One measure, termed balance, is the difference in scores in noise (AzBio sentences at 74 dB SPL with +10 dB SNR) and at a low input level (54 dB SPL). A small difference score indicates that a subject is able to understand speech with similar accuracy in the 2 difficult listening conditions, ie, performance is balanced in the 2 conditions. A large difference score indicates that a subject understands speech much better in 1 of the 2 conditions.
To provide a measure of performance in the 2 difficult listening conditions (AzBio sentences at +10 dB SNR and AzBio sentences at 54 dB SPL) relative to performance in quiet, scores for the difficult conditions were averaged and then divided by performance in quiet. This number was multiplied by 100 to create a score for a measure we term robustness. A high score on the robustness index indicates little drop in performance between the 74 dB, quiet condition and the 2 difficult listening environments. A low score indicates a large drop in performance between the 74 dB, quiet condition and 1 or both of the difficult listening environments.
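The two derived measures can be written out as a short sketch. The function names and the example scores are hypothetical; the sign convention for balance (score in noise minus score at the low input level) is also an assumption, since the text defines balance only as a difference.

```python
def balance(noise_score, low_level_score):
    """Difference between percent correct in noise (+10 dB SNR) and at the
    low input level (54 dB SPL). Values near 0 indicate balanced performance."""
    return noise_score - low_level_score


def robustness(quiet_score, noise_score, low_level_score):
    """Mean of the two difficult-condition scores divided by the score in
    quiet (74 dB SPL), multiplied by 100."""
    if quiet_score <= 0:
        raise ValueError("score in quiet must be positive")
    difficult_mean = (noise_score + low_level_score) / 2.0
    return 100.0 * difficult_mean / quiet_score


# Invented example scores (percent correct), not study data.
print(balance(50.0, 70.0))           # -20.0
print(robustness(80.0, 50.0, 70.0))  # 75.0
```

In this invented example the listener retains 75% of the quiet score on average across the two difficult conditions, while the balance value of -20 shows that noise is the harder of the two conditions.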
As shown in Figure 2, CII subjects recognized sentences at +5 dB SNR with higher accuracy than 3G subjects (CII mean [SD], 36.7% [19.5%]; 3G mean [SD], 14.4% [12.2%]; F1,23 = 11.0, P = .003, power = .89). At +10 dB SNR, CII subjects recognized sentences with higher accuracy than 3G subjects (CII mean [SD], 51.6% [23.9%]; 3G mean [SD], 32.9% [20.3%]; F1,28 = 5.33, P = .03, power = .61). As shown in Figure 3, CII subjects recognized brief synthetic vowels with higher accuracy than 3G subjects (CII mean [SD], 67.8% [16.4%]; 3G mean [SD], 50.4% [13.3%]; F1,28 = 10.15, P = .004, power = .87).
As shown in Figure 4, CII subjects tended to have similar scores in noise and in the low signal level condition, ie, the majority of scores differed by 20 percentage points or less. In contrast, 3G subjects generally evidenced a greater difference between conditions—for 4 of the 15 subjects the scores differed by greater than 50 percentage points. In most cases performance in noise was poorer than at a low signal level. These outcomes can be summarized in the following way: The CII subjects showed a more "balanced" performance in the 2 difficult listening situations.
The scores in Figure 4 do not take into account the level of performance in quiet. The measure of robustness, shown in Figure 5, takes this into account. On this measure, CII subjects scored higher than 3G subjects (CII mean [SD], 75.2% [17.0%]; 3G mean [SD], 60.8% [12.6%]; F1,28 = 6.90, P = .01, power = .72).
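As a consistency check, an F ratio for a 2-group comparison can be recomputed from the reported means and SDs. The sketch below assumes a one-way analysis of variance with 15 subjects per group, which is consistent with the 1 and 28 degrees of freedom reported but is not stated explicitly in the text; the function name is hypothetical.

```python
def f_from_summary(m1, sd1, n1, m2, sd2, n2):
    """One-way ANOVA F ratio for 2 groups from means, SDs, and group sizes.

    Returns (F, within-group degrees of freedom); between-group df is 1."""
    grand_mean = (n1 * m1 + n2 * m2) / (n1 + n2)
    ss_between = n1 * (m1 - grand_mean) ** 2 + n2 * (m2 - grand_mean) ** 2
    ss_within = (n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2
    df_within = n1 + n2 - 2
    return ss_between / (ss_within / df_within), df_within


# Reported means and SDs (percent correct or index score), CII vs 3G.
reported = {
    "sentences at +10 dB SNR": (51.6, 23.9, 32.9, 20.3),
    "brief vowels": (67.8, 16.4, 50.4, 13.3),
    "robustness": (75.2, 17.0, 60.8, 12.6),
}
for label, (m1, sd1, m2, sd2) in reported.items():
    f_value, df = f_from_summary(m1, sd1, 15, m2, sd2, 15)
    print(f"{label}: F(1,{df}) = {f_value:.2f}")
```

Under these assumptions the recomputed value for sentences at +10 dB SNR agrees with the reported 5.33, and the values for brief vowels and robustness fall within about 0.05 of the reported 10.15 and 6.90, a gap that is consistent with rounding in the published means and SDs. The +5 dB SNR comparison is omitted because its 23 within-group degrees of freedom imply group sizes that the text does not give.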
For all other conditions, ie, consonant place, manner and voicing, sentences in quiet, voice discrimination, and melody recognition, the differences in levels of performance between groups were not significant.
For most tasks, the group mean scores for the 2 groups did not differ significantly. However, group mean scores did differ significantly on 3 tasks and 1 measure: recognition of brief vowels, recognition of relatively difficult sentences at +10 and +5 dB SNR, and the measure of robustness. We believe that the differences in performance are due, most likely, to differences in signal processing. Other accounts, eg, an unrepresentative sample of patients in one group, or failure to match patients on a variable that affects performance in noise to a different extent than performance in quiet, seem, to us, less likely.
At this early point in our research, we have achieved our first goal—the identification of tasks on which there are between-group differences in performance. The reasons, in terms of processor design or strategy implementation, for the between-group differences are not clear. We will know a great deal more when we have tested subjects who use the MedEl Combi 40+ cochlear implant with TEMPO+ behind-the-ear speech processor (MedEl Corporation, Durham, NC). From a signal processing standpoint this system functions more like the CII device than the 3G device. If differences in performance are attributable to differences in signal processing, then individuals using the TEMPO+ device should behave in a fashion similar to CII subjects.
Finally, our results speak only to those who score in the upper half of the population of individuals who receive implants, ie, those with 45% CNC scores or better. We chose not to include subjects with test CNC scores lower than 45% in our analyses because when tested in noise with difficult sentence materials, the subjects' scores "fell to the floor" and we lost the ability to accurately measure the drop in performance.
Corresponding author and reprints: Michael F. Dorman, PhD, Department of Speech and Hearing Sciences, Arizona State University, Tempe, AZ 85287-0102 (e-mail: firstname.lastname@example.org).
Submitted for publication September 3, 2003; final revision received February 6, 2004; accepted February 10, 2004.
This study was presented at the Ninth Symposium on Cochlear Implants in Children; April 25, 2003; Washington, DC.