The Health Utilities Index Mark 3 (HUI3) was the only instrument able to discriminate based on site. EQ-5D indicates EuroQol Questionnaire; SG, standard gamble; TTO, time trade-off; and VAS, visual analog scale. Error bars indicate SD.
aP = .02.
Because the scores are not normally distributed, the assumption of linear regression was not met. Therefore, we used the Spearman rank order correlation coefficient (ρ) to test whether the utility scores monotonically increase over time. Circles represent individual scores. EQ-5D indicates EuroQol Questionnaire; HUI3, Health Utilities Index Mark 3; SG, standard gamble; TTO, time trade-off; and VAS, visual analog scale.
Noel CW, Lee DJ, Kong Q, Xu W, Simpson C, Brown D, Gilbert RW, Gullane PJ, Irish JC, Huang SH, O’Sullivan B, Goldstein DP, de Almeida JR. Comparison of Health State Utility Measures in Patients With Head and Neck Cancer. JAMA Otolaryngol Head Neck Surg. 2015;141(8):696-703. doi:10.1001/jamaoto.2015.1314
Accurate measurement of health state utilities (HU) is the cornerstone for cost-utility analyses and the valuation of quality of life for given health states. Current indirect methods of HU derivation lack face validity for patients with head and neck cancer. The appropriateness of these measures compared with direct methods, such as the standard gamble (SG), time trade-off (TTO), and visual analog scale (VAS), has not been assessed in this patient population.
To assess the convergent and construct validities of 5 different HU derivation methods in patients with head and neck cancer.
Design, Setting, and Participants
In a cross-sectional study, we recruited 100 consecutive patients with squamous cell carcinoma of the upper aerodigestive tract treated in the outpatient surgical oncology clinics of the Princess Margaret Cancer Centre from August 1 through October 31, 2014. We enrolled patients with a minimum of 3 months of follow-up after completion of treatment and no evidence of recurrent or metastatic disease. Participants completed SG, TTO, and VAS exercises, the EuroQoL instrument (EQ-5D), and the Health Utilities Index Mark 3 (HUI3) questionnaire. Data analysis was performed November 1 through December 15, 2014.
Head and neck cancer and HU measures.
Main Outcomes and Measures
We assessed convergent validity of the 5 HU instruments through Spearman rank order correlation assessment. We determined construct validity through a priori hypotheses relating HU scores with clinical indexes of disease severity.
The SG and TTO measures generated higher mean (SD) utility scores (0.91 [0.17] and 0.94 [0.14], respectively) than the VAS, EQ-5D, and HUI3 (0.76 [0.19], 0.82 [0.18], and 0.75 [0.25], respectively) (P < .001). The maximum score of 1.0 was reported in 60 of 99 cases (61%) for the SG and 75 of 99 cases (76%) for the TTO (a significant ceiling effect), in contrast to 5 of 99 cases (5%) for the VAS, 29 of 99 cases (29%) for the EQ-5D, and 6 of 99 cases (6%) for the HUI3. The VAS showed strong correlations with the EQ-5D (ρ = 0.63 [P < .001]) and HUI3 (ρ = 0.50 [P < .001]), and the HUI3 strongly correlated with the EQ-5D (ρ = 0.67 [P < .001]), whereas the SG and TTO generally correlated poorly with other HU measures (ρ range, 0.19-0.29) and with one another (ρ = 0.21 [P < .001]). The VAS, EQ-5D, and HUI3 were able to discriminate between participants who underwent salvage surgery compared with those who underwent primary surgery (mean [SD] utility scores, 0.48 [0.13] vs 0.76 [0.20] [P = .006], 0.62 [0.17] vs 0.83 [0.19] [P = .004], and 0.37 [0.29] vs 0.78 [0.22] [P = .004], respectively). Mean EQ-5D utility scores monotonically increased over time since completion of treatment (ρ = 0.26 [P = .01]). The HUI3 yielded lower utility values for participants with laryngeal cancer (mean [SD], 0.59 [0.29]). The SG and TTO measures frequently generated utility scores that contradicted our hypothesized expectations.
Conclusions and Relevance
Indirect HU measures may be more reflective of the health status of patients with head and neck cancer than direct measures. Current instruments lack face validity for attributes germane to this population.
The treatment of squamous cell carcinoma of the head and neck often requires multimodal therapy, with implications for resources and quality of life (QOL).1 Treatment-related toxic effects often lead to considerable morbidity, such as functional deficits in speech and swallowing.2 With the recent movement toward value-based care and the inclusion of resource and economic outcomes in clinical trials, appropriate utility measures for this population are needed.
Cost-utility analyses are based on the premise that an incremental (or decremental) cost of any new modality of treatment must be weighed against the incremental (or decremental) health benefit that the new modality offers. Health state utilities (HU) measure the effectiveness of any given treatment modality by adjusting the life expectancy of these patients by their QOL as a result of the treatment. The value that one ascribes to a particular health state typically varies from 0 (death) to 1 (perfect health). Elicitation of utilities has traditionally used direct methods, such as the time trade-off (TTO), standard gamble (SG), and visual analog scale (VAS).3-5 Indirect methods of generating HU involve standardized questionnaires, such as the EuroQoL Questionnaire (EQ-5D)6 and the Health Utilities Index Mark 3 (HUI3).7
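As a minimal sketch of the arithmetic behind this premise, the following Python snippet weights life-years by a utility to obtain quality-adjusted life-years and divides an incremental cost by the incremental benefit. The function names `qalys` and `incremental_cost_per_qaly` and all cost figures are hypothetical illustrations, not part of the study.

```python
def qalys(utility: float, years: float) -> float:
    """Quality-adjusted life-years: time in a health state weighted by its utility."""
    if not 0.0 <= years:
        raise ValueError("years must be non-negative")
    return utility * years


def incremental_cost_per_qaly(cost_new: float, cost_old: float,
                              qaly_new: float, qaly_old: float) -> float:
    """Incremental cost divided by incremental QALYs (hypothetical helper)."""
    gain = qaly_new - qaly_old
    if gain == 0:
        raise ValueError("no QALY difference; the ratio is undefined")
    return (cost_new - cost_old) / gain


# The same 10 life-years valued with two different utility estimates.
high = qalys(0.91, 10)   # utility of the size produced by a direct method
low = qalys(0.75, 10)    # utility of the size produced by an indirect method
print(round(high, 2), round(low, 2))

# Hypothetical costs: the choice of utility changes the cost per QALY gained.
print(round(incremental_cost_per_qaly(20_000, 10_000, 8.0, 7.5)))
```

Because the utility multiplies every life-year, a systematic difference between instruments carries straight through to the cost-utility ratio.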
At present, whether HU is best elicited directly or indirectly remains controversial. From an economics perspective, the SG and TTO are the most accepted methods because they are based implicitly on utility theory and involve an inherent gamble or trade-off. However, indirect methods are used more frequently owing to their ease of administration and stronger correlation with other measures of health states.8
Regardless of the method used, the generated HU must be reliable and capable of discriminating between different disease severities. Investigators have documented that the direct methods of estimating HU often yield higher health ratings than the indirect methods.9 Moreover, a recent systematic review10 showed that cost-effectiveness analysis within the realm of head and neck cancer frequently yields conflicting findings and that the scarcity of comparative QOL data poses a unique challenge. Although other disciplines have made large efforts to assess the validity of various utility instruments, comparable literature in head and neck cancer is limited.11-14 A recent study, however, suggests that innumeracy in patients with head and neck cancer may result in inaccurate HU scores.15
At present, several large economic analyses are being conducted that use the aforementioned utility instruments, despite the fact that their consistency has never been fully evaluated.10 Therefore, the aims of our study were (1) to determine the convergent validity of the utility values generated by 5 different HU instruments for patients who have been treated for squamous cell carcinomas of the head and neck and (2) to evaluate the construct validity of these instruments using a priori hypotheses that relate utilities with clinical variables believed to affect one’s health state.
From August 1 through October 31, 2014, patients with squamous cell carcinoma of the upper aerodigestive tract were recruited consecutively from outpatient surgical oncology clinics at the Princess Margaret Cancer Centre in Toronto, Ontario, Canada. All participants underwent treatment of their primary disease with surgery or radiotherapy from 3 months to 3 years previously. Participants with metastatic or recurrent disease, non-English speakers, and those with known cognitive conditions were excluded from the study. The study received research ethics board approval from the University Health Network, and we obtained informed written consent from all the participants.
Trained interviewers (C.W.N., D.J.L.) administered a 20-minute survey to eligible participants. The interviewers encouraged the participants to seek clarification as needed. The survey consisted of a demographic and socioeconomic questionnaire, the indirect HU questionnaires (EQ-5D and HUI3), and exercises for direct HU derivation (SG, TTO, and VAS).
The SG offers participants a gamble between 2 alternatives. In the first alternative, participants have the option of staying in their current health state for the rest of their lives. In the second alternative, participants have the option of taking an imaginary pill that has a chance of rendering perfect health, but with a risk for death.16 Participants were asked to decide between the 2 alternatives. The risk for death with the imaginary pill was varied in an iterative fashion until participants were indifferent between the 2 alternatives. The HU on the SG is the lowest accepted indifference probability. Options were presented verbally and on paper to aid participant comprehension.
The TTO involves participants imagining the following 2 hypothetical life courses: (1) remain in their current health state for 10 years or (2) accept a shorter life span in perfect health.17 Options were presented verbally and on paper for comprehension. Participants were instructed to indicate the longest amount of time they would be willing to trade off to have perfect health. The HU was calculated as the proportion of years in perfect health.
As part of the EQ-5D, participants were asked to place a mark on a feeling thermometer from 0 (worst imaginable health state) to 100 (best imaginable health state). The rating on the VAS was then converted to a score from 0 to 1.
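The three direct exercises described above each reduce to a short calculation, sketched here in Python. The sketch assumes the conventional scoring in which the SG utility is 1 minus the risk of death at the indifference point and the TTO utility is the fraction of the 10-year horizon retained in perfect health. The function names are hypothetical, and the exact scoring script used in the study is not given in the text.

```python
def sg_utility(risk_of_death_at_indifference: float) -> float:
    """Standard gamble: chance of perfect health at the indifference point.

    Assumes utility = 1 - risk of death (conventional scoring).
    """
    if not 0.0 <= risk_of_death_at_indifference <= 1.0:
        raise ValueError("risk must be a probability between 0 and 1")
    return 1.0 - risk_of_death_at_indifference


def tto_utility(years_traded: float, horizon_years: float = 10.0) -> float:
    """Time trade-off: proportion of the horizon kept in perfect health."""
    if not 0.0 <= years_traded <= horizon_years:
        raise ValueError("years traded must lie within the horizon")
    return (horizon_years - years_traded) / horizon_years


def vas_utility(thermometer_rating: float) -> float:
    """Visual analog scale: rescale the 0-100 thermometer to 0-1."""
    if not 0.0 <= thermometer_rating <= 100.0:
        raise ValueError("rating must lie between 0 and 100")
    return thermometer_rating / 100.0


# A participant who accepts no risk and trades no time scores 1.0 on both
# the SG and the TTO, regardless of the thermometer rating.
print(sg_utility(0.0), tto_utility(0.0), vas_utility(76))
```

Under this scoring, any participant unwilling to accept risk or give up time receives the maximum utility, whatever the underlying health state.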
All participants also completed the EQ-5D, which consists of items pertaining to 5 attributes (mobility, self-care, usual activities, pain and/or discomfort, and depression and/or anxiety). Participants were asked to select 1 of 5 levels for each attribute.18
All participants completed the HUI3. This instrument consists of 8 attributes (vision, hearing, speech, ambulation, dexterity, emotion, cognition, and pain and/or discomfort) with 5 or 6 levels per attribute on an ordinal continuum, which allows for the grading of 972 000 discrete health states. Scores are assigned to each attribute and are combined using the following formula,19 where u indicates each attribute:
Utility = [1.371 (u1 × u2 × u3 × u4 × u5 × u6 × u7 × u8)] − 0.371.
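The formula above can be sketched in Python as follows. The single-attribute scores u1 through u8 come from the published HUI3 scoring tables, which are not reproduced here, so the inputs in the example are hypothetical placeholders. The function name `hui3_utility` is also an illustration.

```python
from math import prod


def hui3_utility(attribute_scores):
    """Combine 8 single-attribute scores with the multiplicative formula
    Utility = 1.371 * (u1 * u2 * ... * u8) - 0.371.
    """
    if len(attribute_scores) != 8:
        raise ValueError("HUI3 expects exactly 8 attribute scores")
    return 1.371 * prod(attribute_scores) - 0.371


# Perfect function on every attribute gives a utility of 1.
print(round(hui3_utility([1.0] * 8), 3))

# Hypothetical scores with a decrement on one attribute only.
print(round(hui3_utility([1.0, 1.0, 0.8, 1.0, 1.0, 1.0, 1.0, 1.0]), 3))

# Five attributes with 6 levels and three with 5 levels is one split that is
# consistent with the 972 000 health states quoted above (an assumption about
# the split; the count itself is from the text).
assert 6**5 * 5**3 == 972_000
```

Because the attribute scores are multiplied, a product near 0 pushes the utility below 0, which is consistent with the negative minimum reported for the HUI3 in this cohort.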
We abstracted demographic, pathologic, and treatment-related details from the participants’ medical records. Clinical measures collected include tumor site, tumor stage, treatment modality, time since completion of treatment, and the presence of a feeding and/or a tracheotomy tube.
We performed data analysis from November 1 through December 15, 2014. Descriptive statistics were provided. Based on the Kolmogorov-Smirnov test,20 none of the 5 HU scores were normally distributed. As a consequence, nonparametric tests were used for the correlative analysis. We used Spearman rank order correlation coefficients (ρ) to determine correlations between utilities and the Kruskal-Wallis test to compare utility scores in subgroups. The former test examined associations between continuous variables. Coefficients of greater than 0.60, 0.40 to 0.59, 0.21 to 0.39, and 0.20 or less were considered a strong, a moderate, a weak, and no correlation, respectively.
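The correlation step can be illustrated with a small, dependency-free Python sketch: Spearman ρ is computed as the Pearson correlation of average ranks (so tied utility scores, which are common at the ceiling, are handled), and the result is labeled with the cutoffs stated above. The study itself used SAS and R; the function names here are hypothetical, and treating a coefficient of exactly 0.60 as strong is an assumption, because the stated bands leave that value unassigned.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank order correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / (vx * vy) ** 0.5


def correlation_strength(rho):
    """Label a coefficient using the bands stated in the text."""
    r = round(abs(rho), 2)
    if r >= 0.60:  # assumption: exactly 0.60 counts as strong
        return "strong"
    if r >= 0.40:
        return "moderate"
    if r >= 0.21:
        return "weak"
    return "none"


# Hypothetical paired utility scores for 6 participants.
eq5d = [1.0, 0.8, 0.7, 1.0, 0.5, 0.9]
hui3 = [0.9, 0.7, 0.8, 1.0, 0.3, 0.6]
rho = spearman_rho(eq5d, hui3)
print(round(rho, 2), correlation_strength(rho))
```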
The minimal clinically important difference for each HU instrument was calculated through a distribution-based method. We defined the threshold of discrimination for each HU instrument as half of the standard deviation for the generated HU score.21
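The half-standard-deviation rule can be written as a one-line calculation. The sketch below uses the sample standard deviation from the Python standard library; the function name `half_sd_threshold` and the example scores are hypothetical.

```python
from statistics import stdev


def half_sd_threshold(scores):
    """Distribution-based threshold: half of the sample standard deviation."""
    if len(scores) < 2:
        raise ValueError("need at least 2 scores to estimate a standard deviation")
    return 0.5 * stdev(scores)


# Hypothetical utility scores from one instrument.
scores = [0.5, 0.7, 0.9]
print(round(half_sd_threshold(scores), 2))
```

As a check against the reported figures, half of the SDs of 0.14 (TTO) and 0.18 (EQ-5D) gives 0.07 and 0.09, which matches the thresholds reported for those instruments in the Results.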
The construct validity of the 5 HU instruments was assessed by relating utility scores with clinical indexes of disease that are known to have a negative effect on one’s health state and/or QOL.22 We hypothesized a priori that salvage surgery, the addition of chemotherapy, the presence of a tracheotomy and/or a gastrostomy feeding tube, an advanced tumor stage (III or IV), the site of the primary tumor (larynx compared with the oropharynx and oral cavity), and less time since completion of treatment would be associated with lower HU scores.22-24 Statistical analysis was conducted using SAS (version 9.3; SAS Institute Inc) and R (R Foundation).
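The subgroup comparisons behind these hypotheses rely on the Kruskal-Wallis test. The following dependency-free Python sketch computes the tie-corrected H statistic and, for the two-group case only, a large-sample P value from the chi-square distribution with 1 degree of freedom. It is an illustration under those assumptions rather than the SAS or R code used in the study, and the function names and example scores are hypothetical.

```python
from itertools import groupby
from math import erfc, sqrt


def kruskal_wallis_h(*groups):
    """Tie-corrected Kruskal-Wallis H statistic for 2 or more groups."""
    if len(groups) < 2 or any(len(g) == 0 for g in groups):
        raise ValueError("need at least 2 non-empty groups")
    # Pool all observations, remembering which group each came from.
    pooled = sorted((v, gi) for gi, g in enumerate(groups) for v in g)
    n_total = len(pooled)
    rank_sums = [0.0] * len(groups)
    tie_term = 0.0
    pos = 0  # number of observations already ranked
    for _, run in groupby(pooled, key=lambda item: item[0]):
        run = list(run)
        t = len(run)
        mean_rank = pos + (t + 1) / 2  # average of ranks pos+1 .. pos+t
        for _, gi in run:
            rank_sums[gi] += mean_rank
        tie_term += t**3 - t
        pos += t
    h = (12.0 / (n_total * (n_total + 1))
         * sum(r * r / len(g) for r, g in zip(rank_sums, groups))
         - 3.0 * (n_total + 1))
    correction = 1.0 - tie_term / (n_total**3 - n_total)
    if correction == 0:
        raise ValueError("all observations are tied")
    return h / correction


def p_value_two_groups(h):
    """Upper tail of chi-square with 1 df; valid only for exactly 2 groups."""
    return erfc(sqrt(max(h, 0.0) / 2.0))


# Hypothetical utility scores: salvage vs primary surgery.
salvage = [0.35, 0.40, 0.55]
primary = [0.70, 0.80, 0.95]
h = kruskal_wallis_h(salvage, primary)
print(round(h, 3), round(p_value_two_groups(h), 3))
```

The chi-square approximation is rough for small subgroups, so exact or permutation P values may be preferable when a group contains only a few participants.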
One hundred twenty-six patients who met the inclusion criteria were invited to participate. Of those approached, 100 (79.4%) completed the survey. Seventy-five participants were male (75%) and 76 participants were white (76%). The mean age was 61 (range, 53-69) years. Sixty-seven participants (67%) had primary tumors confined to the oral cavity. Forty-five tumors (45%) were of an advanced stage (III or IV). Most participants (54 [54%]) underwent primary surgery, followed by surgery with adjuvant radiotherapy (21 participants [21%]) and primary radiotherapy (12 [12%]). Further information pertaining to the demographics and clinical characteristics of this population can be found in Table 1.
All participants completed the EQ-5D and the HUI3. One participant declined to complete the SG and TTO owing to personal religious beliefs. All 5 HU measures were found to be nonnormally distributed using Kolmogorov-Smirnov test statistics (TTO, 0.43 [P < .001]; SG, 0.33 [P < .001]; VAS, 0.16 [P = .009]; EQ-5D, 0.16 [P = .009]; HUI3, 0.16 [P = .01]).
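To illustrate what the Kolmogorov-Smirnov statistic measures, the sketch below computes the largest gap between a sample's empirical distribution and a normal distribution. Fitting the normal distribution with the sample mean and SD is an assumption made for this example, because the text does not state how the reference distribution was specified. No P value is computed, since it depends on that choice. The function names and scores are hypothetical.

```python
from math import erf, sqrt
from statistics import mean, stdev


def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))


def ks_statistic_vs_normal(scores):
    """Largest vertical distance between the empirical CDF and a fitted normal CDF."""
    xs = sorted(scores)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least 2 scores")
    mu, sigma = mean(xs), stdev(xs)
    if sigma == 0:
        raise ValueError("all scores are identical")
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = normal_cdf(x, mu, sigma)
        # Compare the fitted CDF with the empirical CDF just after and just before x.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


# Hypothetical samples: one piled up at the ceiling, one evenly spread.
ceiling_heavy = [1.0] * 8 + [0.5, 0.2]
evenly_spread = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
print(round(ks_statistic_vs_normal(ceiling_heavy), 2))
print(round(ks_statistic_vs_normal(evenly_spread), 2))
```

A sample concentrated at 1.0 produces a much larger statistic than an evenly spread one, which mirrors the pattern in which the TTO and SG show the largest statistics.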
The mean (SD) HU scores derived from the SG and TTO were similar at 0.91 (0.17 [range, 0.2-1.0]) and 0.94 (0.14 [range, 0.3-1.0]), respectively. A score of 1.0, equivalent to full health, was reported in 60 of 99 cases (61%) for the SG and in 75 of 99 cases (76%) for the TTO. Conversely, the remaining HU instruments gave the following lower mean (SD) HU scores with wider ranges: VAS, 0.76 (0.19 [range, 0.2-1.0]); EQ-5D, 0.82 (0.18 [range, −0.07 to 1.0]); and HUI3, 0.75 (0.25 [range, −0.06 to 1.0]). Only 5 participants (5%) reported a score of full health for the VAS; 29 participants (29%), for the EQ-5D; and 6 participants (6%), for the HUI3. Within participants, HU derived from the VAS, EQ-5D, and HUI3 were significantly lower than HU derived from the SG and TTO (P < .001). Using a distribution-based method, the minimal clinically important difference for each HU instrument was 0.08 for the SG, 0.07 for the TTO, 0.10 for the VAS, 0.09 for the EQ-5D, and 0.12 for the HUI3.
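The ceiling proportions above are simple counts, as the following Python sketch shows. The function name `ceiling_rate` is hypothetical, and the score lists are constructed only to reproduce the reported counts of 60 and 75 maximum scores among 99 respondents, not the individual participant data.

```python
def ceiling_rate(scores, ceiling=1.0, tol=1e-9):
    """Fraction of scores at the maximum possible value."""
    if not scores:
        raise ValueError("no scores supplied")
    at_ceiling = sum(1 for s in scores if s >= ceiling - tol)
    return at_ceiling / len(scores)


# Constructed lists matching the reported counts (values below 1.0 are placeholders).
sg_scores = [1.0] * 60 + [0.8] * 39
tto_scores = [1.0] * 75 + [0.8] * 24
print(round(100 * ceiling_rate(sg_scores)))   # 61
print(round(100 * ceiling_rate(tto_scores)))  # 76
```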
We found only a moderate correlation among the HU scores derived through the 5 different measures. The EQ-5D and the HUI3 were strongly correlated with one another (ρ = 0.67 [P < .001]) (Table 2). The VAS showed strong correlations with the EQ-5D (ρ = 0.63 [P < .001]) and HUI3 (ρ = 0.50 [P < .001]). The SG and TTO demonstrated weak correlations with each other (ρ = 0.21 [P < .001]) and with other HU instruments (ρ range, 0.19-0.29).
Table 3 demonstrates HU by disease severity using the different utility measures.22-25 The HU scores for salvage surgery using the VAS, EQ-5D, and HUI3 were significantly lower compared with primary surgery (0.48 vs 0.76 [P = .006], 0.62 vs 0.83 [P = .004], and 0.37 vs 0.78 [P = .004], respectively). Similarly, the use of chemotherapy was associated with lower HU scores than no chemotherapy for the VAS (0.66 vs 0.77 [P = .03]) and EQ-5D (0.76 vs 0.83 [P = .04]), although the latter did not represent a meaningful clinical difference. When the SG was used, more advanced oral cavity tumors (T3 and T4) had lower HU scores than did smaller tumors (0.87 vs 0.95 [P = .007]). None of the HU instruments identified any significant differences in participants who had a tracheotomy and/or a feeding tube present compared with those who did not. When stratified by the site of the primary tumor, the HUI3 yielded lower utility values for participants with laryngeal cancer (0.59) compared with those with oropharyngeal or oral cavity cancers (0.76 and 0.78, respectively [P < .01]) (Figure 1). Utility scores monotonically increased with time since completion of treatment using the EQ-5D (ρ = 0.26 [P = .01]) and HUI3 (ρ = 0.19 [P = .06]), although the latter was not statistically significant (Figure 2). The SG and TTO frequently generated HU scores that contradicted most of our hypothesized expectations.
In the present study, we evaluated the performance of 5 different HU instruments. We showed that HU derived from the VAS, EQ-5D, and HUI3 are significantly lower than those derived from the SG and TTO, a finding that is consistent with those for other cancers and chronic disease processes.9 The tendency for the VAS to produce lower HU scores than the SG and TTO is well accepted, as is the observation that indirect methods typically yield lower HU values than direct methods.5,26 The reason for the latter observation may be explained by methodologic differences between the 2 measures. Direct methods capture the values that patients assign to their own health state. In contrast, indirect methods are derived from algorithms that attribute a utility score to one’s health state based on the values of the general public. This distinction is meaningful because investigators have demonstrated that patients in general ascribe a higher QOL to their own condition than would be ascribed by the general population.27,28
Direct methods also imply an element of risk-taking behavior. For risk-averse participants, direct methods such as the SG or TTO may not be ideal methods of deriving utility scores. In this instance, the risk of dying from the use of a pill or the notion of a TTO may seem unrealistic for the participants to consider accurately, especially given that all participants were in disease remission and had promising chances of long-term survival. In future studies, inclusion of more moderate health states, such as different levels of functional disability in direct measures, may be more useful. This method is referred to as chaining and has been implemented to determine HU for patients in other disciplines.3,4,29-31
Another corollary to the risk-averse nature of the participants is a ceiling effect that we observed in the HU scores derived by SG and TTO. The maximum score of 1.0 was reported in 61% of cases for SG and 76% of cases for TTO, in contrast to 5% of cases for VAS, 29% of cases for EQ-5D, and 6% of cases for HUI3. Given that SG and TTO produced so many ceiling values, we are not surprised that these instruments were not able to discriminate between different health states in a population of patients with head and neck cancer.
A noticeable drawback of the SG and TTO methods was the poor correlation with other utility measures. The VAS, EQ-5D, and HUI3 showed stronger correlation with one another than with the SG and TTO. The poor correlation with these methods and others indicates that perhaps the SG and TTO methods lack convergent validity and that these exercises may not be ideal for patients with head and neck cancer.
In the present study, we failed to confirm construct validity with all of the various constructs. Although the type of treatment (salvage vs primary surgery), addition of chemotherapy, tumor site, and time since completion of treatment seemed to differentiate participant groups using some utility measures, other clinical variables, such as tumor classification and presence of a tracheotomy or feeding tube, did not show a strong correlation with the utility instruments. This finding may be attributable to the fact that health status may correlate better with other demographic or psychosocial factors. Furthermore, the clinical indexes chosen may not be representative of what a participant takes into account for directly determining health status. Of the 5 measures, the EQ-5D and HUI3 seemed to discriminate between various disease severities more consistently than the other measures. In most instances, the mean utility scores of the EQ-5D and HUI3 followed our hypothesized expectations. This result is in contrast to our finding for the SG and TTO, which routinely contradicted our hypotheses.
Using distribution-based methods, we concluded that the minimal clinically important difference for our 5 HU instruments varied from 7% to 12%. This result was comparable to the range of 5% to 10% for patient-reported outcomes in a population with laryngeal cancer.32 In all but 1 instance, the statistically significant variation in HU scores also represented a clinically meaningful difference. Future work would benefit from anchor-based methods, which were not in the scope of our current investigation.
Of the indirect measures, the HUI3 seems to have better face and content validity for patients with head and neck cancer. This multidimensional HU derivation method captures several functional impairments, such as speech, hearing (in the case of participants who receive chemotherapy), and mobility. Participants with laryngeal cancer, for example, demonstrated lower utility scores than those with oropharyngeal cancer or oral cavity cancer using the HUI3, perhaps owing to the fact that this instrument has a speech dimension.33 However, several other attributes that are particularly germane to the population with head and neck cancer, such as swallowing function, neck mobility, and physical deformity, are noticeably absent from these instruments, perhaps suggesting the need for a disease-specific instrument that includes these attributes.
This study has a number of limitations. First, our sample size was relatively small, which could account for the fact that although several of our measures approached significance, many fell just below the threshold of P < .05. Second, our cohort was heavily skewed to patients with cancer of the oral cavity and who underwent surgery, which may affect the generalizability of our results to other populations with head and neck cancer. Third, all questionnaires were administered in the same order to all of our study participants, which may have biased our comparison measures through fatigue. Nevertheless, administration order has been found to have a marginal effect on health-related QOL measurements.34,35 Furthermore, our study is cross-sectional, and future studies would benefit from baseline and longitudinal follow-up. Finally, given the suboptimal construct validity demonstrated by all 5 HU instruments, correlations with demographic, psychosocial, or other QOL questionnaires, such as those with a swallowing domain, may be indicated in future studies.36 Although anchoring the scores to a QOL questionnaire specific to head and neck cancer was beyond the scope of this investigation, doing so would help to delineate which functional aspects of head and neck cancer affect an individual HU state. Given that the existing HU measures lack dimensions for swallowing, neck and shoulder function, and physical deformity, a new HU instrument with these dimensions may be warranted for patients with head and neck cancers.
The marked differences between the HU values generated by the SG, TTO, VAS, EQ-5D, and HUI3 have important health policy implications. To our knowledge, this study is the first to demonstrate that SG and TTO generate significantly higher mean HU scores than the VAS, EQ-5D, and HUI3 in a population with head and neck cancer. In addition, our findings also demonstrate that the VAS, EQ-5D, and HUI3 showed stronger correlation with one another than with the SG and TTO. Despite limitations in face validity, indirect methods for utility derivation (HUI3 and EQ-5D) also seem to be more capable of discriminating utility differences between subsets of patients with head and neck cancer and correlate well with each other when compared with direct methods (SG and TTO). As a result, indirect measures may be more appropriate for cost-utility determinations in prospective head and neck cancer trials. Furthermore, the lack of face validity of existing instruments suggests a need for more comprehensive HU derivation questionnaires with attributes germane to the population with head and neck cancer.
Submitted for Publication: March 5, 2015; final revision received May 20, 2015; accepted June 2, 2015.
Corresponding Author: John R. de Almeida, MD, MSc, Department of Otolaryngology–Head and Neck Surgery, Princess Margaret Hospital–University Health Network, 610 University Ave, Room 3-955, Toronto, ON M5G 2C4, Canada (email@example.com).
Published Online: July 23, 2015. doi:10.1001/jamaoto.2015.1314.
Author Contributions: Mr Noel and Dr de Almeida had full access to all of the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis.
Study concept and design: Noel, Irish, Goldstein, de Almeida.
Acquisition, analysis, or interpretation of data: Noel, Lee, Kong, Xu, Simpson, Brown, Gilbert, Gullane, Huang, O’Sullivan, Goldstein.
Drafting of the manuscript: Noel, Lee, Simpson, Huang, de Almeida.
Critical revision of the manuscript for important intellectual content: Kong, Xu, Brown, Gilbert, Gullane, Irish, Huang, O’Sullivan, Goldstein, de Almeida.
Statistical analysis: Noel, Kong, Xu, Huang, de Almeida.
Administrative, technical, or material support: Noel, Lee, Simpson, O’Sullivan, de Almeida.
Study supervision: Xu, Brown, Gilbert, Gullane, Irish, de Almeida.
Conflict of Interest Disclosures: None reported.