Skirko JR, Weaver EM, Perkins J, Kinter S, Sie KCY. Modification and Evaluation of a Velopharyngeal Insufficiency Quality-of-Life Instrument. Arch Otolaryngol Head Neck Surg. 2012;138(10):929-935. doi:10.1001/2013.jamaoto.122
Author Affiliations: Department of Otolaryngology–Head and Neck Surgery, University of Washington (Drs Skirko and Weaver), and Division of Pediatric Otolaryngology (Drs Perkins and Sie) and Childhood Communication Center (Ms Kinter), Seattle Children's Hospital, Seattle, Washington.
Objectives To modify the existing 45-item Velopharyngeal Insufficiency (VPI) Quality-of-Life (QOL) instrument (VPIQL), to assess the modified instrument for reliability, and to provide further validation.
Design Validation convenience sample from a previously conducted pilot study.
Setting Two academic tertiary referral medical centers.
Participants Deidentified data were used from 29 patients with VPI and 29 control patients aged 5 to 17 years and their parents.
Main Outcome Measures Patients and parents completed the VPIQL and a generic pediatric QOL instrument (Pediatric Quality of Life Inventory, Version 4.0 [PedsQL4.0]). Twenty-two items were removed from the VPIQL for ceiling effects, floor effects, and redundancy to produce the modified instrument: the VPI Effects on Life Outcomes instrument (VELO). The VELO was tested for internal consistency (Cronbach α), discriminant validity (t test with control patients), and concurrent validity (Pearson correlation with the PedsQL4.0). These analyses were also completed for the parents.
Results The 45-item VPIQL was reduced to the 23-item VELO, which had excellent internal consistency (Cronbach α, .96 for parents and .95 for patients with VPI). The VELO also discriminated well between the patients with VPI and the control patients, with a mean (SD) score that was significantly lower (worse) for patients with VPI (67.6 [23.9]) than for control patients (97.0 [5.2]) (P < .001). The VELO total score was significantly correlated with the PedsQL4.0 (r = 0.73) among the patients with VPI. Similar results were seen in parent responses.
Conclusions The VELO is a 23-item QOL instrument that was designed to measure and follow QOL in patients with VPI, with less burden than the original VPIQL. The VELO demonstrates internal consistency, discriminant validity, and concurrent validity with the PedsQL4.0.
Health-related quality of life (QOL) refers to the judgment of value that is placed on a patient's health-related experiences. Quality-of-life instruments can be categorized as generic or condition specific. Generic QOL instruments are able to capture QOL differences in children with a wide variety of difficulties. Condition-specific QOL instruments are tailored to measure how the condition affects children's QOL and are better able to detect changes in QOL that are important to patients.1 Velopharyngeal insufficiency (VPI) is a condition that affects speech, swallowing, and many psychosocial aspects of a child's life in a way that is different from other conditions. Children with VPI report a lower (or worse) QOL than peers without VPI.2 Generic QOL instruments may not be sensitive to these differences. Accurately measuring QOL in children with VPI is an area in need of further research.
One condition-specific measure that has been developed for children with VPI is the Velopharyngeal Insufficiency Quality-of-Life instrument (VPIQL), which was developed to capture the many ways that VPI affects children's lives. It was developed from focus groups composed of patients with VPI and their parents, with input from otolaryngologists and speech and language pathologists who had extensive experience caring for these children (K.C.Y.S., oral communication, December 2008). Developing content in this way is a crucial step in developing a QOL instrument and gives the VPIQL content validity.3 The development and initial limited validation produced an instrument with 48 items (or questions) that were organized into 7 domains. While the VPIQL was developed for this population with tailored content, its length, with 48 items, may render the instrument too burdensome for routine use. An ideal instrument would balance 2 competing interests: it would be short enough to minimize patient and family burden, while being long enough to fully capture all of the items relevant to VPI-specific QOL. The goals of this study were to condense the VPIQL and to evaluate the resulting shortened instrument in terms of reliability, discriminant validity, concurrent validity, and construct validity.
Focus groups were conducted to develop a list of items (questions) to measure the way that VPI affects children's lives. Focus group participants included patients with VPI, their parents, and a moderator (pediatric otolaryngologist, pediatric otolaryngology fellow, or speech and language pathologist). The content was recorded during the focus groups, and the moderator ensured that everyone's thoughts could be expressed. Individual focus groups were conducted until the group was not adding new items, and new focus groups were repeated until most items discussed were repeated twice (thematic saturation). This approach resulted in 3 focus groups, after which a national panel of clinicians who manage VPI reviewed the content in 2003 and additional items were added.2 This process produced a list of 48 items that were organized into 7 domains, including speech limitation, swallowing problems, situational difficulty, emotional impact, perception by others, activity limitation, and caregiver impact.
This study used data from the pilot study of the VPIQL, which was previously described.2 Briefly, patients aged 5 to 17 years with VPI diagnosed by an otolaryngologist or speech pathologist were recruited at 1 of 2 centers. Additional study participants were recruited from a retrospective review of administrative data sets. Potential study patients were identified by the International Classification of Diseases, Ninth Revision, code for VPI (750.29). Medical records were reviewed to ensure that inclusion criteria were met. A total of 29 patients with VPI were enrolled after informed consent was obtained. The VPI group had a mean age of 8.7 years (range, 5-15 years) and included 15 boys and 14 girls (Table 1). To test discriminant validity, 29 control patients without VPI and their parents were also randomly enrolled from the clinical practices. Additional inclusion criteria (for patients with VPI and control patients) included being a native speaker of the English language. Exclusion criteria for control patients included a previously diagnosed speech or language disorder or prior pharyngeal or laryngeal surgery. Deidentified data from this pilot study were collected after institutional review board approval was obtained from the University of Washington, Seattle, the University of Utah, Salt Lake City, and the University of Wisconsin, Madison.
The VPIQL and the Pediatric Quality of Life Inventory, Version 4.0 (PedsQL4.0), were completed by both patients and parents, with parents assisting their children as necessary. The questionnaires were completed at 1 time point.
The VPIQL is a 48-item VPI-specific QOL instrument with 7 domains, each including 3 to 10 items. The domains include speech limitation (9 items), swallowing problems (3 items), situational difficulty (10 items), emotional impact (9 items), perception by others (7 items), activity limitation (5 items), and caregiver impact (5 items). The domains can be thought of as different dimensions or elements of health-related QOL.4 The caregiver impact domain is included only in the parent version. Respondents are asked, “In the past 4 weeks, how much of a problem has your child had with [. . .]?” Items are presented with a response format of a 5-point Likert-type scale ranging from never to almost always. The instrument score is the average of all items and is converted to a 0 to 100-point scale, with 0 representing worse QOL. The domain scores are the average of all items in the domain, similarly converted to a 0 to 100-point scale. The VPIQL was previously shown to have discriminant validity, with lower QOL among patients with VPI than among control patients, and parents were shown to be adequate proxies for children's responses, using the data presented herein.2
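The scoring rule described above can be sketched as follows. This is a minimal illustration, not the authors' scoring code: it assumes PedsQL-style coding in which each response is stored as 0 (never) through 4 (almost always) and is reverse-scored linearly so that 0 maps to 100 and 4 maps to 0. The source states only that item scores are averaged and converted to a 0 to 100-point scale with 0 representing worse QOL, so the numeric coding, the handling of unanswered items, and the function names are assumptions.

```python
from statistics import mean

def item_to_scale(response: int) -> float:
    """Map a 0-4 Likert response to 0-100 (assumed linear reverse scoring)."""
    if response not in range(5):
        raise ValueError("response must be an integer from 0 to 4")
    return 100.0 - 25.0 * response

def instrument_score(responses):
    """Average the transformed items.

    None marks an unanswered item and is skipped (an assumption; the
    source does not describe missing-data handling).
    """
    scored = [item_to_scale(r) for r in responses if r is not None]
    if not scored:
        raise ValueError("no answered items")
    return mean(scored)
```

A domain score would be obtained the same way by passing only the responses that belong to that domain.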
The VPIQL was modeled after the PedsQL4.0, which is a 23-item validated generic pediatric QOL instrument.5,6 The items are organized into 4 domains (physical functioning, emotional functioning, social functioning, and school functioning). Because the VPIQL was modeled after the PedsQL4.0, they have similar prompts and Likert scale response formats. The PedsQL4.0 is also scored on a 0 to 100-point scale, with 0 representing worse QOL.
The item reduction analyses were conducted using responses from patients with VPI and their parents, with a number of analyses to identify redundant and poorly functioning questions. The statistical attributes of the VPIQL items were analyzed to identify large floor or ceiling effects. Items were marked for potential elimination if the endorsement frequency (proportion answering never) was greater than 50% or if the item-total correlation was less than 0.70. To identify potentially redundant items, the remaining items were tested for item-item correlation greater than 0.80. Internal consistency with Cronbach α was also calculated with the removal of each of the remaining items. There was no significant increase in α (no significant improvement in reliability without a given item), so no additional items were marked for elimination.
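The screening rules in this paragraph can be illustrated with a short sketch. It assumes responses are coded so that never is 0, uses the thresholds as stated in the text as default parameters, and computes the item-total correlation against the sum of the other items (a "corrected" item-total correlation, which is an assumption because the source does not say which form was used). The function names and the reason labels are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation; returns 0.0 if either variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)

def flag_items(rows, never=0, max_never=0.50, min_item_total=0.70):
    """rows: one list of item responses per respondent.

    Returns {item index: [reasons]} for items marked for potential elimination.
    """
    flagged = {}
    for j in range(len(rows[0])):
        item = [row[j] for row in rows]
        rest = [sum(row) - row[j] for row in rows]  # total of the other items
        reasons = []
        if sum(1 for v in item if v == never) / len(item) > max_never:
            reasons.append("endorsement frequency")
        if pearson(item, rest) < min_item_total:
            reasons.append("low item-total correlation")
        if reasons:
            flagged[j] = reasons
    return flagged

def redundant_pairs(rows, items, threshold=0.80):
    """Pairs of remaining items whose item-item correlation exceeds threshold."""
    cols = {j: [row[j] for row in rows] for j in items}
    return [(a, b) for i, a in enumerate(items) for b in items[i + 1:]
            if pearson(cols[a], cols[b]) > threshold]
```

In this sketch the flags only nominate items; as in the study, the decision to remove an item would still rest with the clinician panel.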
Each of the items marked for potential elimination was reviewed by a panel of clinicians managing VPI (2 pediatric otolaryngologists and 1 speech and language pathologist), and items were removed only if consensus was obtained. The panel reviewed each item to ensure that the content of any item being removed was still captured in the remaining items. For items marked for potential elimination because of item-item correlation, the items were reviewed by the panel to ensure that they contained related content.
The VPIQL was reviewed for readability to identify problematic items and wording. Readability was assessed by determining the Flesch-Kincaid Grade Level for the instrument, domains, and individual items. The Flesch-Kincaid Grade Level is a formula that is used to provide an estimate of the average number of years in school that are required to understand a piece of written material.7 Items above the third-grade level in the youth version of the instrument and the sixth-grade level in the parent version were reviewed for potential rewording. Each item was reviewed to determine whether it contained individual words above the third- and sixth-grade levels for the youth and parent versions, respectively, using a standardized vocabulary list.8 Potential changes to the instrument were reviewed by the panel to obtain consensus.
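For reference, the Flesch-Kincaid Grade Level can be computed from three counts. The sketch below uses the standard published coefficients (0.39, 11.8, and 15.59) and takes the word, sentence, and syllable counts as inputs, because automated syllable counting is heuristic and the source does not say which tool was used. The function name is hypothetical.

```python
def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level from raw counts of a text passage."""
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
```

Because the second term grows with syllables per word, replacing a three-syllable word such as difficult with a one-syllable word such as hard lowers the grade level of a short item noticeably.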
Reliability and validation testing was conducted on the modified instrument. The reliability of an instrument is the degree to which repeated iterations of the instrument yield the same result.1 In this study, it was assessed by internal consistency testing using the Cronbach α.9 The Cronbach α was calculated for the reduced instrument and domains for all patients with VPI and then for subgroups of patients with VPI aged 5 to 9 years and 10 to 15 years. A Cronbach α of greater than 0.70 was considered acceptable.3
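As a worked illustration of the internal consistency statistic, the sketch below computes the Cronbach α from a respondents-by-items table using the usual formula, α = k/(k − 1) × (1 − Σ item variances / variance of the total score), where k is the number of items. It also shows the "α if item deleted" check used during item reduction. The function names are hypothetical, and complete responses (no missing items) are assumed.

```python
from statistics import variance

def cronbach_alpha(rows):
    """rows: one list of item scores per respondent (no missing values)."""
    k = len(rows[0])
    if k < 2:
        raise ValueError("need at least 2 items")
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    total_var = variance([sum(row) for row in rows])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

def alpha_if_deleted(rows):
    """Cronbach alpha recomputed with each item removed in turn."""
    k = len(rows[0])
    return [cronbach_alpha([row[:j] + row[j + 1:] for row in rows])
            for j in range(k)]
```

Under the criterion stated above, a value returned by `cronbach_alpha` greater than 0.70 would be considered acceptable.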
Validity testing, in general, assesses the extent to which the instrument is measuring what it purports to measure.10 There are a number of specific methods of validation, and using these methods can be thought of as accumulating evidence to support an instrument's validity. This study used a variety of analyses for validation. Discriminant validity tests an instrument's ability to detect a difference in QOL between patients with VPI and control patients. The primary analysis tested for a difference between mean total scores with a t test, and the secondary analysis tested for a difference between mean domain scores. P < .05 was considered statistically significant. Only discriminant validity used data from control patients and parents. Concurrent validity seeks to correlate the QOL instrument to another QOL instrument that has already undergone rigorous validation. It was assessed by calculating the Pearson correlation between the modified instrument total score and the PedsQL4.0, with a correlation of greater than 0.50 considered sufficient because it accounts for at least 25% of the variance in the modified instrument score. A correlation that is too high, eg, above 0.9, would suggest that the new instrument adds little information over the existing instrument. To further validate domain scores, the secondary analysis included correlation between domains, including VPIQL–emotional impact with PedsQL4.0–emotional functioning; VPIQL–perception by others with PedsQL4.0–school functioning; and both VPIQL–situational difficulty and VPIQL–perception by others with PedsQL4.0–social functioning.
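The two quantities in this paragraph can be illustrated from summary statistics alone. The sketch below computes a two-sample t statistic with the Welch (unequal variance) form, which is an assumption because the source does not state which t test variant was used, and the share of variance implied by a correlation coefficient (r squared). The function names are hypothetical.

```python
from math import sqrt

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch t statistic and approximate degrees of freedom from summary data."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2
    t = (mean1 - mean2) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

def shared_variance(r: float) -> float:
    """Proportion of variance accounted for by a Pearson correlation r."""
    return r * r

# Patient-reported VELO totals from the abstract: VPI 67.6 (23.9), control 97.0 (5.2)
t_stat, dof = welch_t(67.6, 23.9, 29, 97.0, 5.2, 29)
```

With these reported values the statistic comes out at about −6.5 on roughly 31 degrees of freedom, which is in line with the reported P < .001, although it is not a reproduction of the authors' analysis. `shared_variance(0.5)` returns 0.25, the 25% threshold cited above.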
Establishing construct validity involves a process of hypothesis testing of theorized associations.11 Principal factor analysis was conducted for construct validation on both VPI patient responses and parent responses. Principal factor analysis clusters items that are statistically related. More specifically, it is a method of identifying the underlying structure of the variance in item responses. The underlying statistical structure often suggests content domains of related items. The analysis produces factors (or latent variables) around which the item responses vary. Factors are sequentially analyzed and retained in the model (explaining less variance with each additional factor) until the latent variables do not significantly add to the model. The resultant factor loadings can be interpreted as the correlation of the QOL item to the underlying factor. Orthogonal varimax rotation was conducted, keeping factors with eigenvalues of greater than 1.0. A scree plot of eigenvalues was reviewed to ensure that the appropriate number of factors was retained in the final model. The factor loadings of items after orthogonal varimax rotation were compared with the proposed (hypothesized) content domains, and factor loadings of greater than 0.5 were considered relevant.12
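The factor retention step can be sketched as a rule applied to eigenvalues, consistent with the eigenvalue of 1.07 reported for the fifth parent factor in the results. The example assumes the eigenvalues have already been extracted and that they sum to the total variance (as they do for a full correlation matrix); the eigenvalues shown are made-up illustrative numbers, not study data, and the function name is hypothetical.

```python
def retain_factors(eigenvalues, cutoff=1.0):
    """Keep factors whose eigenvalue exceeds cutoff.

    Returns (number retained, proportion of total variance they explain).
    """
    ordered = sorted(eigenvalues, reverse=True)
    kept = [ev for ev in ordered if ev > cutoff]
    return len(kept), sum(kept) / sum(ordered)

# Hypothetical eigenvalues for a 6-item example (illustration only)
n_kept, explained = retain_factors([3.0, 1.4, 0.8, 0.4, 0.3, 0.1])
```

A cutoff rule of this kind is only a starting point, which is why a scree plot and the interpretability of each factor are also reviewed before the final number of factors is chosen.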
Parental response as a proxy for VPI patient response was assessed by testing the difference between the parent-reported total score and the VPI patient total score using the paired t test. This analysis was repeated for each domain score. To test the interrater reliability (comparing parents and patients), we calculated the intraclass correlation coefficient (ICC) for the total score as well as for each domain. An ICC of greater than 0.5 indicates at least moderate agreement. Because the parent-patient interrater reliability might be different for younger patients than for older patients, we divided the patients with VPI into those 9 years or younger and those 10 years and older.
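One way to compute an ICC for parent-patient pairs is sketched below. It uses the one-way random-effects, single-rater form, ICC = (MSB − MSW) / (MSB + (k − 1) × MSW), where MSB and MSW are the between-subject and within-subject mean squares and k is the number of raters per subject (here 2). The source does not report which ICC form was used, so this choice and the function name are assumptions.

```python
def icc_oneway(pairs):
    """One-way random-effects ICC for a list of per-subject rating tuples."""
    n, k = len(pairs), len(pairs[0])
    if n < 2 or k < 2:
        raise ValueError("need at least 2 subjects and 2 ratings per subject")
    grand = sum(sum(p) for p in pairs) / (n * k)
    means = [sum(p) / k for p in pairs]
    msb = k * sum((m - grand) ** 2 for m in means) / (n - 1)
    msw = sum((x - m) ** 2 for p, m in zip(pairs, means) for x in p) / (n * (k - 1))
    denom = msb + (k - 1) * msw
    if denom == 0:
        raise ValueError("all ratings are identical; ICC is undefined")
    return (msb - msw) / denom
```

Each tuple would hold the parent-reported and patient-reported scores for one child, for example `icc_oneway([(60, 65), (80, 78), (45, 50)])`. Unlike a paired t test, which looks for a systematic shift between raters, this statistic summarizes how closely the paired scores agree.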
The item reduction process identified 23 items for potential elimination. After review by the panel, 22 items were eliminated, which resulted in a 23-item instrument for patients with VPI and a 26-item instrument for parents. One item was retained to allow 3 items in the swallowing domain despite low item-total correlation. Many of the items were marked for potential elimination by multiple techniques. The overall composition of the modified VPIQL instrument, the VPI Effects on Life Outcomes (VELO), is the same as the original version, with questionnaires being administered to both children and their parents. The VELO instrument has 6 domains, including speech limitation (7 items), swallowing problems (3 items), situational difficulty (5 items), emotional impact (4 items), perception by others (4 items), and caregiver impact (3 items, answered only by parents). The domain of “Activity Limitation” was eliminated, with all 5 of its items removed from the instrument. In addition to the poor performance of the individual items, the initial subscale had a Cronbach α of .48 for parents and .69 for patients. The content of the individual items in this domain was retained in the remaining items. The instrument's initial prompt, response format, and scoring were not changed.
In the youth version that was administered to children, several words above the third-grade reading level, including nasal, depressed, abnormal, and perception, were identified. Items were edited to avoid these and other problematic words. The words difficult and difficulty appeared in several items and significantly increased the reading level for these items. The word difficult was rated at a third-grade reading level,8 but difficult and difficulty were changed to hard and trouble in the youth version to improve readability. One item was also identified that did not clearly match the response format. This item also had marginal performance on internal consistency testing and was correlated with several other items at the 0.75 level as well. It was removed after panel consensus. After modifications, the youth version had a Flesch-Kincaid Grade Level of 2.5, with 4 items having grade 4 or higher, and the parent version had a Flesch-Kincaid Grade Level of 3.7, with 5 items having grade 7 or higher. All parent items with grade 7 or higher had the word difficult or difficulty.
The Cronbach α for the modified instrument total score was .96 for parents and .95 for patients with VPI, and each domain had a Cronbach α greater than .70 (Table 2). The VELO also had adequate internal consistency for each age group, with a Cronbach α of .96 for patients aged 5 to 9 years and .95 for patients aged 10 to 15 years.
The parent-reported mean (SD) VELO score was significantly lower for patients with VPI than for control patients (61.4 [21.4] vs 98.1 [4.0]; P < .001). Similarly, the VPI patient-reported mean (SD) VELO score was significantly lower than that of the control patients (67.6 [23.9] vs 97.0 [5.2]; P < .001). Lower scores indicate a worse QOL. Each of the VELO domains also had discriminant validity (P < .01) (Table 3).
Both parent-reported and patient-reported VELO total scores were significantly correlated with the PedsQL4.0 scores (Pearson correlation coefficient, r = 0.78, P < .001, and r = 0.73, P < .001, respectively). Secondary analysis of hypothesized domain correlations showed that VELO–emotional impact and PedsQL4.0–emotional functioning were sufficiently correlated for parent reports (r = 0.59) but not for patient reports (r = 0.42). Similarly, VELO–perception by others and PedsQL4.0–school functioning were sufficiently correlated for parent reports (r = 0.52) but not for patient reports (r = 0.45); VELO–situational difficulty was sufficiently correlated with PedsQL4.0–social functioning (r = 0.56 and r = 0.55 for parent reports and patient reports, respectively) as was VELO–perception by others (r = 0.63 and r = 0.68).
Factor analysis of VPI patient responses resulted in a 4-factor solution that explained 77.5% of the variance in VELO responses. The parents' responses initially resulted in a 5-factor solution, with the fifth factor having an eigenvalue of 1.07 and with 1 item loading on this factor. A 4-factor solution was chosen, as the fifth factor was associated with only 1 item. The 4-factor solution of parent responses explained 75.1% of the variance. The factor loading after varimax rotation largely followed hypothesized domains (Table 4), although items from several domains loaded on the same factor. Among patients with VPI and parents, speech limitation items loaded on several factors, although factor 2 in the VPI patient responses had fairly high loading for all items except speech question 7. Swallowing problems items loaded on the same factor and were associated with several of the speech limitation items for both groups. Among patients with VPI, the situational difficulty items and the emotional impact items loaded highly on the same factor (factor 1), while among parents, emotional impact items and perception by others items loaded on the same factor (factor 1). Caregiver impact items loaded highly on the same factor as situational difficulty. Overall, the items largely loaded on the hypothesized domains, with the speech limitation items loading on several different factors. Among parent responses, factor 1 represents emotional impact and perception by others; factor 2 represents situational difficulty and caregiver impact; factor 3 represents swallowing problems; and factor 4 represents speech limitations. Among patient responses, factor 1 represents situational difficulty and emotional impact; factor 2 represents speech limitation and swallowing problems; and factors 3 and 4 represent perception by others and situational difficulty.
Oblique rotations were also attempted in case the underlying factors were correlated, which did not significantly change the interpretation of the factor loadings.
Parent ratings of their child's QOL are analogous to those of a second rater, and we compared parent ratings with patient ratings with a test of interrater reliability using the ICC. Parents reported a lower (worse) mean VELO total score (61) than patients with VPI (68; P = .05), which was driven largely by the 2 domains of speech limitation and situational difficulty. Despite this difference in mean scores, the parent proxy report is reasonable, with an ICC greater than 0.6 for the total score as well as for the domains. To ensure that there was not a difference between the proxy reliability of younger and older patients with VPI, the ICC was calculated for those up to 9 years old and for those 10 years and older. The ICC may be smaller for the older group but was less than 0.6 only for the domains of emotional impact and perception by others. The sample size in these subgroups may limit the interpretation of the age-specific ICC.
This study provided an important step in the refinement of a QOL instrument for evaluating children with VPI. Most previous studies related to VPI have used postoperative perceptual speech analysis (by speech and language pathologists) or closure of the velopharyngeal orifice by endoscopic examination as their primary surgical outcome. There is a paucity of patient-reported outcomes of validated condition-specific functional status or QOL. Aside from the VPIQL, the Pediatric Voice Outcome Survey (PVOS) has been used in a small study of patients with VPI (n = 12) and was found to be responsive to changes in QOL after surgical correction.13 The PVOS is a 4-item instrument that was modified from the adult version and validated in a general pediatric otolaryngology patient population.14,15
While the PVOS has the advantage of low patient time burden, with just 4 items, it likely does not measure many of the issues that are important to children with VPI. Conversely, the 48-item VPIQL is too long for routine use. The VPIQL was therefore shortened while preserving content validity. The item reduction analysis was conducted to reduce the patient burden (from 45 items to 23 items for patients with VPI and from 48 items to 26 items for parents), while maintaining important concepts and content. With the elimination of poorly performing items, the domains initially established were largely retained.
Ensuring the readability of an instrument is an important and recommended step3 that is sometimes overlooked when a new instrument is being developed. In addition to improving the readability of the instrument, the process of review and panel discussion helps to ensure thorough and thoughtful review of every item for content and wording. The modifications to the VPIQL (48-item instrument) to produce the new VELO will hopefully improve the functioning of the instrument in future studies.
The internal consistency testing with the Cronbach α shows that the instrument, as well as all of the domains, appears internally reliable. The original 48-item instrument had an overall α of .97 for both patients with VPI and parents, indicating redundancy. The Cronbach α for the total instrument may still indicate redundancy, but the current length is necessary to achieve adequate content. Because repeated measures will be necessary for future longitudinal studies, test-retest reliability should be conducted in future studies. The initial study of the 48-item VPIQL used 1 time point, so test-retest reliability could not be conducted. Future test-retest reliability will ensure that item scores (and domain scores) are stable enough to analyze changes in QOL. The internal consistency testing done in this study is an important first step in reliability testing.
The modified instrument (VELO) retained its ability to detect differences in QOL between patients with and without VPI (discriminant validity). The instrument total score retained discriminant validity, as did all of the domains. Also, the VELO was shown to have concurrent validity (correlation with a previously validated instrument) with the generic pediatric QOL instrument, which helps to show that the VELO is measuring QOL, though in a way more specific to VPI. Condition-specific QOL measures have been shown to be better able than generic instruments to detect change (responsiveness) in QOL, which is an important goal for this instrument. Pretreatment and posttreatment longitudinal measurements were not collected in this sample, so responsiveness testing with these data was not possible. Future responsiveness testing will be important to determine whether the VELO will be useful for outcomes studies.
The factor analysis conducted in this study provides some first steps toward construct validation. Construct validity seeks to confirm hypothesized correlations related to the responses. Factor analysis is a statistical tool that analyzes the underlying association among a group of variables.16 When used with a priori hypotheses, it allows content validation of an instrument's domains, showing that item responses are correlated along the hypothesized domains. If an instrument measured only 1 domain, the hypothesis would be that all items would load on 1 factor. In our analysis, the factor loadings largely followed the hypothesized content domains, although some of the domains showed overlap in the underlying factor. The domains of situational difficulty, emotional impact, and perception by others may all draw from an underlying domain of psychosocial difficulty. Adequate sample size for factor analysis is typically described as 10 times the number of items,3 which would be at least 230 respondents for the 23-item patient version compared with the 29 patients with VPI analyzed here, so these results should be interpreted with caution and need to be repeated in future studies. When factor analysis is conducted in future studies, a larger sample size will be essential to further understanding of the underlying associations.
Criterion-related validation (validation against a “gold standard” measure) is also necessary with this instrument. While no true criterion standard exists for VPI, perceptual speech analysis is the most widely accepted and used measure in the diagnosis of VPI,17,18 and validation against this measure should be conducted. We did not have access to the perceptual speech analysis results for this cohort.
This analysis supports parent proxy assessment of VPI-specific QOL. Parents report worse QOL related to speech limitation and situational difficulty (Table 5 and Table 6), which might reflect different emotional reactions by patients and parents when the patients are faced with these difficulties. Despite the lower reported VELO score among parents, the interrater reliability for parents and patients with VPI is adequate (Table 6). These data support the initial research and discussion of parent proxy for patients with VPI in the initial 48-item VPIQL study.2
Understanding and measuring QOL are important for understanding and advancing the treatment of children with VPI. Having a rigorously tested and refined instrument is necessary to measure patient-centered outcomes. The VELO has been refined to reduce the time burden on participants and to improve readability, while maintaining content validity. Future studies should be conducted to test this instrument further and are currently under way by our group. This work will provide a foundation for future investigations of the impact of VPI and treatment outcomes with a focus on a patient-centered measure.
Correspondence: Jonathan R. Skirko, MD, MHPA, MPH, Department of Otolaryngology–Head and Neck Surgery, University of Washington, 1959 Pacific St NE, PO Box 356515, Seattle, WA 98195.
Submitted for Publication: April 24, 2012; accepted May 14, 2012.
Author Contributions: Dr Skirko had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis. Study concept and design: Skirko, Weaver, Kinter, and Sie. Acquisition of data: Perkins and Kinter. Analysis and interpretation of data: Skirko, Weaver, and Sie. Drafting of the manuscript: Skirko and Kinter. Critical revision of the manuscript for important intellectual content: Skirko, Weaver, Perkins, Kinter, and Sie. Statistical analysis: Skirko and Weaver. Obtained funding: Weaver. Administrative, technical, and material support: Weaver, Perkins, Kinter, and Sie. Study supervision: Weaver, Perkins, and Sie.
Financial Disclosure: None reported.
Previous Presentation: This article was presented at the American Society of Pediatric Otolaryngology 2012 Annual Meeting; April 22, 2012; San Diego, California.
Additional Contributions: Susan Thibeault, PhD (University of Wisconsin), and colleagues assisted in providing the deidentified data from the initial pilot study. Dr Thibeault also assisted with the modification of the VPIQL. Carole Hooven, PhD (University of Washington), assisted with the development of this project to modify the 48-item VPIQL as well as with the item reduction and readability portions of the project.