Figure 1. Perceived overall voice quality (A), breathiness (B), roughness (C), and brokenness (D), before and after treatment with botulinum toxin type A (Botox), compared with age- and sex-matched healthy control subjects. Posttreatment response varied markedly as a function of pretreatment severity and specific attributes. ADSD indicates adductor spasmodic dysphonia; and VAS, visual analog scaling.
Figure 2. Perceived overall speech fluency (A), dysfluent syllables (B), tension struggle (C), and spasms (D), before and after treatment with botulinum toxin type A (Botox), compared with age- and sex-matched healthy control subjects. Posttreatment response varied markedly as a function of pretreatment severity and specific attributes. ADSD indicates adductor spasmodic dysphonia; and VAS, visual analog scaling.
Cannito MP, Woodson GE, Murry T, Bender B. Perceptual Analyses of Spasmodic Dysphonia Before and After Treatment. Arch Otolaryngol Head Neck Surg. 2004;130(12):1393-1399. doi:10.1001/archotol.130.12.1393
Objective
To evaluate expert listeners’ perceptions of voice and fluency in persons with adductor spasmodic dysphonia (ADSD) before and after treatment with botulinum toxin type A (Botox), as a function of initial severity of the disorder (while controlling for patients’ age at injection).
Design
Simple before-and-after trial with blinded randomized listener judgments.
Setting
Ambulatory care clinic at a single medical center.
Patients
Forty-two consecutive patients with ADSD who underwent examination, with a 3- to 6-week follow-up, after initial botulinum toxin type A injection. There were also 42 age- and sex-matched healthy control subjects.
Intervention
Injections of botulinum toxin type A into the thyroarytenoid muscle(s).
Main Outcome Measures
Computer-implemented visual analog scaling judgments of voice quality and speech fluency made by expert listeners under psychoacoustically controlled conditions.
Results
Response to botulinum toxin type A varied markedly as a function of pretreatment severity of ADSD. Patients with more severe initial symptoms exhibited greater magnitudes of improvement. Patients with mild dysphonia did not exhibit pretreatment to posttreatment change. Following treatment, voice and fluency remained significantly (P<.05) poorer in ADSD than in healthy speakers. Older patients exhibited less improvement than younger patients when the effect of initial severity was statistically controlled.
Conclusions
Voice quality and fluency improved for most patients following treatment, but older patients and those with milder dysphonia exhibited the least optimal responses to the procedure. Patients who were profoundly impaired demonstrated the greatest amount of improvement. Computer-implemented visual analog scaling provided a reliable clinical tool for determining treatment-related changes in those with ADSD.
Adductor spasmodic dysphonia (ADSD) is an uncommon voice disorder characterized by intermittent breaks in voicing associated with overadduction of the vocal folds, in a context of strained-strangled voice quality, and dysfluent effortful speech production. Adductor spasmodic dysphonia is described as a focal laryngeal dystonia, and extrapyramidal system dysfunction is frequently inferred, although immediate causation or localization of neuropathological features in most cases remains unknown. Adductor spasmodic dysphonia has proved to be particularly resistant to behavioral voice therapy. Despite various surgical options (eg, sectioning of a recurrent laryngeal nerve), injection of the true vocal fold(s) with botulinum toxin type A (Botox) is generally regarded as the preferred method of treatment.1 Botulinum toxin type A injection weakens neuromuscular contraction of the vocal folds by interfering with release of acetylcholine at the neuromuscular junction. This induced partial paralysis reduces the effect of adductor spasms on voice production and, following a brief period of intense breathiness, benefit may last up to 3 months, after which the effects of the toxin gradually subside and reinjection is necessary.2
The efficacy of botulinum toxin type A injection for ADSD is well established.3,4 Recent studies, however, have demonstrated that during postinjection periods when benefit is considered optimal, acoustic indexes continued to significantly differentiate ADSD from control speakers.5,6 This was especially true of phonatory aperiodicity in sustained vowels and connected speech. Although acoustic analysis has been useful in quantifying primary characteristics and response to treatment, perceptual assessment by trained clinical observers also has been an important and long-standing method for elucidating the disorder. The relationship between acoustic measurements and perceived voice qualities is neither obvious nor straightforward; yet, how speech sounds to normal listeners remains an important functional metric for initial characterization of a speech disorder and for determining the benefit of treatments.
Various perceptual procedures, documenting improvement, have included blinded dichotomous judgments by experienced clinicians for randomized pretreatment-posttreatment speech sample pairs7 and 7-point ordinal severity ratings.4,8 Adams et al9 used direct magnitude estimation to demonstrate decreased voice spasms and increased breathiness in sustained vowels following treatment. Langeveld et al5 recently used visual analog scaling (VAS) to document the optimal response of connected speech to botulinum toxin injections in patients with ADSD. These studies,5,9 however, did not report statistical difference testing for perceptual data between ADSD subjects and healthy controls, nor did they examine the role of pretreatment severity or aging as potential influences on treatment outcome.
Two studies have examined the influence of initial severity of ADSD on treatment outcome, based on patient self-ratings, but their findings were equivocal.7,10 Ford et al7 reported a significant positive relationship between initial severity and outcome, with the patients with more severe dysphonia exhibiting better response to botulinum toxin type A injection. Lundy et al10 reported that patients with mild to moderate ADSD exhibited significantly better outcomes than those initially rated as having severe ADSD. Reasons for these disparate findings are unclear. Given that ADSD patient self-ratings significantly overestimate blinded judgments of trained clinical observers3,11 and that untrained listeners are less reliable than trained clinicians,12 a more objective approach to perceptual analysis is warranted. Interestingly, both studies7,10 also reported a negative relationship of age with treatment outcome, congruent with expectations of poorer voice quality and fluency in healthy older speakers.13,14 However, age-matched control group comparisons were not included, and potential interactions between aging and symptom severity were not statistically controlled.
This study applies a new computer-implemented method for VAS, using expert listeners as judges, to evaluate the response of ADSD voice and fluency to botulinum toxin type A injection as a function of pretreatment severity while controlling for the effects of aging. The advantages of VAS are that it offers higher resolution and tends to be more reliable than traditional ordinal scales.15 We also compared pretreatment and posttreatment voice and fluency, across differing severity levels of ADSD, with that of age- and sex-matched healthy controls, to provide an appropriate reference against which to evaluate posttreatment outcomes. While dysfluency (in addition to voice quality) has long been considered an important aspect of ADSD,16 systematic analyses of dysfluency in response to botulinum toxin type A have been lacking, although VAS judgments of dysfluency are strong predictors of clinician ratings of overall symptom severity.17
The patients were 42 English-speaking adults with ADSD, ranging in age from 22 to 79 years (Table 1). An otolaryngologist (G.E.W.) and a speech pathologist (T.M.), following flexible endoscopy and voice examination, diagnosed all participants as having ADSD. Assessment confirmed the presence of abnormal spasms and hyperfunction of the laryngeal musculature during speech in the absence of other structural laryngeal pathological features. Patients were judged to be free of laryngeal tremor and did not exhibit movement disorders elsewhere in the body. They had no history of botulinum toxin type A treatment or laryngeal surgery, and exhibited ADSD symptoms for at least 1 year before the study. All patients underwent electromyographic-guided transcutaneous injection of botulinum toxin type A into either the left or both vocal folds, with dosages varying from 1.25 to 30 U.
Clinical severity before botulinum toxin type A injection was determined by consensus of 2 speech pathologists (M.P.C. and B.B.) who listened to recorded speech samples. A 5-point ordinal scale was used (0 indicated absence of dysphonia; 1, mild dysphonia; 2, moderate dysphonia; 3, severe dysphonia; and 4, profound dysphonia).6,17 Patients were assigned to subgroups for analysis according to the severity ratings.
Speech samples consisted of readings of “The Rainbow Passage”18 as part of a routine assessment battery, recorded in a sound-treated room. Patients were recorded within 2 weeks before their first botulinum toxin type A injection and again from approximately 3 to 6 weeks following the injection. This period was selected because it spans a time when botulinum toxin type A–induced breathiness and other adverse effects diminish and patients report peak benefits following initial injection.19,20 Forty-two healthy English speakers, matched to the ADSD patients by age and sex, also were recorded to serve as controls. Thus, each severity subgroup of ADSD speakers was compared with its own set of matched healthy controls. Control speakers achieved a score of 0 on the severity scale.
Listeners consisted of 12 adults who composed 2 expert panels of certified speech-language pathologists, each with more than 5 years’ clinical experience, specializing in disorders of either voice (n = 6) or fluency (n = 6). Different listeners participated in each panel, which excluded the study authors. All listeners passed a hearing screening before the experimental session. Study data were collected under institutional review board approval.
Speech samples were recorded using a head-mounted microphone (Sony ICM 50) and an audiocassette recorder (Nakamichi CR5A), then digitized using a computer speech laboratory (model 4300B; Kay Elemetrics CSL) at a sampling rate of 20 kHz. A total of 126 paragraph readings (42 speakers [pretreatment and posttreatment] and 42 matched controls) served as stimuli for perceptual analyses. Custom VAS software was used to present stimuli and record listeners’ responses. A control terminal located outside a sound booth output the speech signals from a 16-bit digital sound card to a programmable attenuator (Tucker Davis model PA4; Tucker-Davis Technologies, Gainesville, Fla) and preamplifier (Crown D-75; Crown International, Elkhart, Ind), and presented them in the sound booth via a loudspeaker positioned about 1 m from the listener. Speech signal intensity was adjusted during playback to achieve an average level for conversational speech.21
Each listener was tested in the sound booth, seated before a response terminal. The response interface consisted of 4 vertically oriented histograms of 100-mm height arrayed on a monitor screen, labeled with the voice or fluency attributes to be scaled (Table 2). The end points of the histograms also were labeled (eg, extremely good/extremely poor or absent/pervasive). Listeners indicated their scaling judgments by positioning a mouse-driven cursor at the desired level of each histogram. Higher values corresponded to more normal (good) performance. The program automatically converted the visual output to a numerical value ranging from 0 to 100, and returned this value to a file on the control computer. Listeners were given printed task instructions and definitions, and were instructed to judge each speech sample in terms of its magnitude of deviation from their own internal standard of normalcy. Preliminary training enabled listeners to become facile with the interface, gave them a common frame of reference in terms of the range of vocal behaviors to be scaled, and offset potential end effects by encouraging use of the full range of the scale. Listeners were blinded to the speech sample conditions and heard the speech stimuli in randomized order.
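The conversion from cursor position to a 0 to 100 score described above can be sketched as follows. This is only an illustration, not the study's custom software: the function name `vas_score`, the pixel coordinates, the linear mapping, and the assumption that the top of the bar is the "good" end are all hypothetical.

```python
def vas_score(cursor_y: float, bar_top_y: float, bar_bottom_y: float) -> float:
    """Map a cursor position on a vertical VAS bar to a 0-100 score.

    Assumes screen coordinates in which y grows downward, so the top of
    the bar (smallest y) is treated as the "good" end and scores 100.
    """
    if bar_bottom_y <= bar_top_y:
        raise ValueError("bar_bottom_y must be greater than bar_top_y")
    # Clamp clicks that land slightly above or below the bar.
    y = min(max(cursor_y, bar_top_y), bar_bottom_y)
    # Fraction of the bar height, measured up from the bottom end point.
    fraction = (bar_bottom_y - y) / (bar_bottom_y - bar_top_y)
    return 100.0 * fraction


# Hypothetical bar drawn from y = 50 (top) to y = 450 (bottom).
print(vas_score(150, 50, 450))  # 75.0
```

Because the score depends only on the relative position along the bar, the same mapping applies whatever the on-screen pixel height of the 100-mm bar.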
Intralistener reliability was evaluated by randomly replaying 18 speech samples from the treatment conditions (6 pretreatment, 6 posttreatment, and 6 control) during an experimental run. Interlistener reliability was evaluated by comparing all of the scaling values obtained from the 6 listeners for each attribute in each experiment. Observed reliability coefficients were high and positive (P<.001) for all voice and fluency attributes (Table 3).
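As an illustration of how intralistener reliability could be quantified from the replayed samples, the sketch below correlates a listener's first and repeated scores. The text does not state which reliability coefficient was reported in Table 3, so the use of a Pearson correlation here is an assumption, and the scores are invented for the example.

```python
from math import sqrt


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and squared deviations about the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Invented VAS scores: one listener's first pass and replay of 6 samples.
first_pass = [82.0, 35.0, 60.0, 91.0, 48.0, 20.0]
replay = [80.0, 40.0, 58.0, 95.0, 45.0, 25.0]
print(round(pearson_r(first_pass, replay), 3))
```

The same function could be applied to pairs of listeners for an interlistener comparison, although a multi-rater index may be more appropriate for a panel of 6.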
The mean of 6 listeners’ scaling judgments for a given speech sample on a specific attribute served as the observational unit (dependent variable) in the analysis. This yielded 504 observations per listening experiment (42 speech samples × 3 treatment conditions × 4 scaling attributes). To evaluate the statistical significance of the effects of treatment (pretreatment, posttreatment, and control) as a function of patient severity, a 4-way analysis of covariance with repeated measures was used. Severity was included as a between-subjects factor. The 4 speech attributes nested within each of 2 listening experiments (Table 2) also composed within-subjects factors. Patient age at injection was the covariate. Tukey Honestly Significant Difference tests were used for post hoc means comparisons. An overall significance level of P<.05 (95% confidence interval) was used.
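The observational unit and the observation count described above can be made concrete with a short sketch. The function names and the listener scores are hypothetical; the attribute labels are those of the voice experiment.

```python
from statistics import mean

N_SPEAKERS = 42
CONDITIONS = ("pretreatment", "posttreatment", "control")
ATTRIBUTES = ("overall voice quality", "breathiness", "roughness", "brokenness")


def observational_unit(listener_scores: list[float]) -> float:
    """Average one sample's scores on one attribute across the panel."""
    if not listener_scores:
        raise ValueError("at least one listener score is required")
    return mean(listener_scores)


def observations_per_experiment(n_speakers: int, n_conditions: int,
                                n_attributes: int) -> int:
    """Number of panel-mean observations in one listening experiment."""
    return n_speakers * n_conditions * n_attributes


# Invented scores from a 6-listener panel for one sample and one attribute.
print(observational_unit([70.0, 65.0, 72.0, 68.0, 75.0, 70.0]))  # 70.0
print(observations_per_experiment(N_SPEAKERS, len(CONDITIONS), len(ATTRIBUTES)))  # 504
```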
Statistically significant main effects were obtained for listening experiment, treatment condition, and initial severity (P<.001 for all). The covariate of age was also significant (P<.02), and was significantly related to the main effect of treatment (P = .01).
There also was a significant 4-way interaction of experiment × attribute × treatment × severity (P<.001) that accounted for 72% of the total variance. Post hoc means comparisons, therefore, focused on this interaction (Table 4 and Table 5). Each ADSD severity group was compared with itself (pretreatment vs posttreatment) and with its specific subset of age- and sex-matched healthy controls. Control groups did not differ significantly from each other on any of the speech attributes (P>.10 for all). Four-way interaction plots for the voice and fluency attributes (Figure 1 and Figure 2, respectively) show control groups clustered tightly (overlapping) in the upper right corner of each panel. Adductor spasmodic dysphonia severity groups demonstrated a wide range of values before and after treatment. The magnitudes of posttreatment improvement increased markedly as pretreatment severity increased.
Before injection, all of the ADSD severity groups differed significantly from their matched controls on the 4 voice attributes. Following injection, this was also the case, with the exception of the mild ADSD subgroup, which did not differ from the matched controls on brokenness. The mild subgroup did not improve significantly after treatment on any voice attribute, but became significantly more breathy. Significant improvement following injection was observed for the moderate subgroup on the attributes of roughness and brokenness, while the severe and profound ADSD subgroups improved significantly on overall voice quality, roughness, and brokenness, but not on breathiness.
Before injection, the 4 severity subgroups were perceived to have less overall fluency and more spasms and tension struggle than their matched controls. Following injection, the mild, severe, and profound subgroups remained significantly less fluent than controls for these attributes (the moderate subgroup did not differ from controls on overall fluency). Significant pretreatment to posttreatment improvement was observed for overall fluency, spasms, and tension struggle for all but the mild severity subgroup. For dysfluent syllables, only the profound group improved significantly after treatment.
To evaluate the relationship between aging and the effect of treatment, a composite score was obtained by calculating the mean of the 8 voice and fluency variables for each participant in each condition. Pretreatment to posttreatment change scores were then computed for the ADSD speakers. A significant multiple linear regression (P = .006) was observed between these difference scores (dependent variable) and the predictor variables of severity and age. Severity explained 36% of the variance in preinjection to postinjection change, while age accounted for an additional 11%. The milder the initial severity, the smaller the clinical change; likewise, the older the ADSD speaker, the less the improvement. More important, 9 ADSD speakers became slightly worse on the composite measure after injection; 5 of these individuals were older than 70 years.
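The composite and change scores used in this analysis can be illustrated as follows. This is a minimal sketch with invented values and hypothetical function names; it does not reproduce the regression itself. Because higher VAS values correspond to more normal performance, a positive change score indicates improvement.

```python
from statistics import mean


def composite_score(attribute_scores: dict[str, float]) -> float:
    """Mean of the 8 voice and fluency attribute scores for one condition."""
    if len(attribute_scores) != 8:
        raise ValueError("expected scores for exactly 8 attributes")
    return mean(attribute_scores.values())


def change_score(pre: dict[str, float], post: dict[str, float]) -> float:
    """Posttreatment minus pretreatment composite (positive = improvement)."""
    return composite_score(post) - composite_score(pre)


# Invented panel-mean scores for one hypothetical patient.
pre = {
    "overall voice quality": 30.0, "breathiness": 60.0,
    "roughness": 35.0, "brokenness": 25.0,
    "overall fluency": 30.0, "dysfluent syllables": 70.0,
    "tension struggle": 30.0, "spasms": 40.0,
}
post = {
    "overall voice quality": 60.0, "breathiness": 55.0,
    "roughness": 65.0, "brokenness": 60.0,
    "overall fluency": 65.0, "dysfluent syllables": 80.0,
    "tension struggle": 65.0, "spasms": 70.0,
}
print(change_score(pre, post))  # 25.0
```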
The present findings are similar to those of perceptual studies of ADSD, in which overall severity of dysphonia, strained-strangled voice, and aperiodicity were observed to be prominent features of untreated ADSD and to diminish following botulinum toxin intervention, whereas breathiness increased.8,9 These findings also concur with acoustic studies in which aperiodicity and voice breaks diminished following botulinum toxin (Botox) injection, yet the ADSD voice continued to differ from the voice of healthy controls.5,6,8 Present outcomes, however, varied markedly as a function of initial severity and specific voice and fluency attributes. Notably, on 5 of 8 perceived attributes, all ADSD severity groups differed significantly from their matched healthy controls both before and after treatment. The mean overall voice quality in ADSD patients, for example, remained 42% of the scale below normal levels. Posttreatment deficits of this magnitude obtained from expert listeners under carefully controlled experimental conditions underscore the limitations of the treatment. Botulinum toxin type A treatment did not result in a normal voice; in fact, there remained substantial roughness and breathiness after injection for most of the patients in the study.
Similar to Ford et al,7 in this study, greater pretreatment severity predicted greater magnitude of posttreatment response. Only the mild ADSD group did not exhibit statistically significant improvement on any attribute. For them, breathiness worsened by 12% of the scale at 3 to 6 weeks following injection. In contrast, Lundy et al10 reported that patients with milder dysphonia had the best postinjection outcomes. That study, however, did not report pretreatment values. The outcome measure was absolute posttreatment level of performance rather than magnitude of pretreatment to posttreatment change. By that criterion, the mild group in the present study may also be described as having the best outcome. Such a conclusion is unwarranted in view of the findings (ie, no improvement or worsening of performance from preinjection to postinjection) for mild ADSD. In treatment outcome research, it is important to distinguish between amount of improvement and absolute posttreatment performance levels, because it is the combined information that enables a complete evaluation of effectiveness. Failure of the mild group to exhibit significant improvement cannot be explained away based on ceiling effects, because these speakers remained significantly impaired relative to their controls on 5 of the unchanged measures (in addition to breathiness). On overall voice quality, for example, following injection, the mild ADSD mean was more than 10 SDs below the healthy control group’s mean.
Apart from mild ADSD, botulinum toxin type A treatment substantially enhanced perceptions of voice and fluency for most patients, especially those with more severe dysphonia. The positive response to botulinum toxin type A of the profound group was markedly greater than that of any other group. Adductor spasmodic dysphonia is an extremely heterogeneous diagnosis.22 Rosenfield23 has argued that ADSD may result from various causes and that idiopathic cases do not necessarily represent a neurological diagnosis of focal dystonia. It seems plausible that the profound extreme of the severity continuum may represent true neurogenic laryngeal dystonia, making such patients more optimal candidates for the procedure. The persistence of residual glottal squeezing (brokenness, roughness, and dysfluency), despite improvement following botulinum toxin type A injection, suggests the need for continued exploration of alternative or adjunctive treatment modalities, such as voice therapy combined with botulinum toxin type A.24
Breathiness occurred before injection in the present ADSD speakers and persisted well beyond 2 weeks’ postinjection. Some breathiness may be present in those with untreated ADSD12,25 as a result of adductor-abductor incoordination or compensatory widening of the posterior glottis in an effort to maintain airflow across the hyperadducted vocal folds. Langeveld et al5 also reported preinjection breathiness without significant posttreatment change. Increased breathiness postinjection, observed only in the mild ADSD group, was likely an effect of the toxin-induced vocal fold paresis and may have been perceptually more salient in this subgroup because of the comparatively limited occurrence of other ADSD features. Because these were initial injections, it is plausible that reduction of dosage in subsequent injections may ameliorate the breathiness effect.
Age at injection also significantly influenced response to treatment, with older subjects experiencing less benefit. Vocal performance is known to decline as a function of normal aging due to changes in the viscoelastic properties of the vocal folds and other laryngeal biomechanical changes.14 The present findings support previous reports based on patient self-ratings,7,10 and extend the age effect to fluency and voice quality. The clinical implications are clear: older ADSD patients exhibit less perceptible improvement than younger ones, making them less optimal candidates for botulinum toxin type A treatment.
Fluency experts perceived all ADSD severity groups to have significantly more overall dysfluency than matched healthy controls before botulinum toxin type A injection, corroborating the findings of Cannito et al.17 They also found fluency to improve significantly following botulinum toxin type A injection (except in those with mild ADSD), but to remain significantly poorer than normal. Interestingly, stutterlike dysfluent syllables were perceived only in the severe and profound groups. Aronson et al16 hypothesized that these are a reaction to the laryngeal spasms rather than a primary deficit. If true, dysfluencies should decrease with vocal spasms following treatment. In the present study, this was the case for profound ADSD, but the severe group did not improve.
The present findings highlight the heterogeneity inherent in ADSD. Pretreatment severity of ADSD is a major source of heterogeneity that affected treatment outcome and should be controlled in future studies. The influence of aging should also be carefully considered. The present study examined changes following the first injection. It will be important to systematically evaluate the perceived effects of subsequent botulinum toxin type A injections on connected speech in those with ADSD. The computerized VAS method used in this study has been shown to be reliable and sensitive, and should prove useful for these applications.
Correspondence: Michael P. Cannito, PhD, School of Audiology and Speech Pathology, The University of Memphis, 807 Jefferson Ave, Memphis, TN 38105.
Submitted for Publication: December 1, 2003; final revision received May 7, 2004; accepted August 19, 2004.
Funding/Support: This study was supported by grant 1-R15-DC/OD02299-01A1 from the National Institute on Deafness and Other Communication Disorders, National Institutes of Health, Bethesda, Md (Dr Cannito).
Acknowledgment: We thank Edward Brainerd, MS, for the development of the VAS computer software; Gerald Studebaker, PhD, for guidance on the setup and calibration of the instrumentation for the listening experiments; Corrine Ethington, PhD, for statistical consultation; and the 12 anonymous voice and fluency experts who participated in this study.