Funk GF, Karnell LH, Smith RB, Christensen AJ. Clinical Significance of Health Status Assessment Measures in Head and Neck Cancer: What Do Quality-of-Life Scores Mean? Arch Otolaryngol Head Neck Surg. 2004;130(7):825-829. doi:10.1001/archotol.130.7.825
To determine the magnitude of clinically significant differences in domain scores for a quality-of-life questionnaire specific to head and neck cancer; and to demonstrate a clinically relevant method of presenting head and neck cancer–specific quality-of-life data using cutoff scores and clinical anchors.
Anchor-based and distribution-based techniques for determining clinically significant differences in health-related quality-of-life scores were used.
University-based tertiary care hospital.
A total of 421 patients with head and neck cancer enrolled in a longitudinal outcomes project.
Main Outcome Measures
The Head and Neck Cancer Inventory; clinical anchor health status in the domains of speech, eating, and social disruption; and distribution-based clinically significant score differences.
Clinical anchor health states representing incremental levels of dysfunction were significantly correlated with domain scores for eating, speech, and social disruption. The anchor-based clinically important difference magnitudes were consistent with the values obtained using distribution-based techniques. For mean domain scores (minimum, 0; maximum, 100), differences of approximately 4, 10, and 14 or greater represented small, intermediate, and large clinically significant differences, respectively. Stratifying mean domain scores into low (0-30), intermediate (31-69), and high (70-100) categories allowed presentation of the health-related quality-of-life data in a clinically relevant format.
This study provides benchmarks for small, intermediate, and large clinically significant changes in scores and demonstrates the presentation of health-related quality-of-life data in a clinically useful format.
An increased awareness of the need to incorporate health-related quality-of-life (HRQOL) information into the outcome assessment of patients with head and neck cancer (HNC) has been demonstrated by the number of recent publications addressing this issue and by a recent international conference devoted to the measurement and reporting of quality of life in HNC, held in McLean, Va, in October 2002. HRQOL outcomes are particularly important in HNC management because the various treatment modalities may offer equivalent survival but potentially different HRQOL outcomes.1 A variety of validated, HNC-specific quality-of-life surveys are currently in use.2 As the use of these instruments in clinical practice and research increases, and as HRQOL scores are more frequently reported as clinical outcomes in the literature, clinicians and researchers will need to be able to interpret these scores in a clinically useful way. For that to happen, 2 fundamental conditions must be met. First, data must be presented in formats that are clinically meaningful. The comparison of mean HRQOL scores representing abstract domains of health functioning rarely imparts a clinically useful message. Second, the magnitude of clinically significant intragroup changes or intergroup differences in HRQOL scores needs to be determined and reported. The simple reporting of a difference in HRQOL scores between 2 groups of patients rarely gives the clinician information that can be acted on.
A considerable amount of research has addressed methods by which clinical significance can be derived from HRQOL survey scores.3 In general, these methods fall into 2 groups: (1) anchor-based and (2) distribution-based. Anchor-based methods link HRQOL scores to clinical anchor health states (eg, the average HRQOL score for patients with a tracheostomy is determined). Anchor-based methods also frequently involve determining the difference in HRQOL scores between patients in different anchor health states (eg, the difference in HRQOL scores between patients taking a full diet vs a soft diet). For a clinical anchor to be used in the evaluation of HRQOL scores, 2 criteria must be met: (1) the clinical anchor must be associated with the HRQOL score and (2) the anchor health state must be clinically understandable or relevant to those interested in interpreting the HRQOL scores.4
Distribution-based or standardized approaches are based primarily on statistical distributions rather than linking scores to an external anchor. The most commonly used of these is the effect size (ES),4 which is defined as the difference score divided by the standard deviation of the baseline scores: ES = [(mean HRQOL score 2 − mean HRQOL score 1)/standard deviation of baseline HRQOL scores for the population of interest]. Cohen5 derived benchmarks for the magnitude of the ES, with an ES of 0.2 being "small," an ES of 0.5 being "moderate," and an ES of 0.8 or greater being "large." These benchmark values have been empirically confirmed in clinical studies.4,6,7 However, it is recognized that these benchmarks are approximations and, when used, should be evaluated with other independent measures of effect size.4
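The ES definition and Cohen's benchmarks above can be illustrated with a short sketch. It assumes the sample standard deviation (n − 1 denominator) of the baseline scores, compares absolute ES values so that improvement and decline are judged alike, and uses a hypothetical "below small" label for values under 0.2, which Cohen's scheme does not name; the function names are illustrative only.

```python
from statistics import mean, stdev


def effect_size(scores_1, scores_2, baseline_scores):
    """ES = (mean of scores_2 - mean of scores_1) / SD of baseline scores.

    Assumption: the sample standard deviation of the baseline scores is used.
    """
    return (mean(scores_2) - mean(scores_1)) / stdev(baseline_scores)


def cohen_label(es):
    """Map |ES| onto Cohen's benchmarks: 0.2 small, 0.5 moderate, 0.8+ large.

    "below small" is a hypothetical label for |ES| < 0.2.
    """
    magnitude = abs(es)
    if magnitude >= 0.8:
        return "large"
    if magnitude >= 0.5:
        return "moderate"
    if magnitude >= 0.2:
        return "small"
    return "below small"


# Hypothetical 0-100 domain scores (not data from this study).
baseline = [40, 50, 60, 70, 80]
follow_up = [48, 58, 68, 78, 88]
es = effect_size(baseline, follow_up, baseline)
print(round(es, 3), cohen_label(es))
```

Here the 8-point mean gain divided by a baseline SD of about 15.8 gives an ES of roughly 0.51, which falls in Cohen's "moderate" band.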
In this study, anchor-based and distribution-based methods were used to determine the magnitude of small, intermediate, and large clinically significant differences in domain scores for the Head and Neck Cancer Inventory (HNCI), a validated HNC-specific survey that addresses speech, eating, aesthetics, and social disruption in patients with HNC. To demonstrate a method of presenting HRQOL data in a clinically relevant format, HNCI domain scores were grouped as high, intermediate, and low and presented with the corresponding clinical anchor health states.
Data used in this study were obtained from all patients in the Outcomes Assessment Study (OAS) who filled out the HNCI between August 1997 and August 2002 and whose participation in the project was complete (because of having finished 12 months or having withdrawn or died). The 30-item HNCI has a 1 to 5 ordinal response scale, which is transformed to a 0 to 100 range for ease of interpreting the results. (A detailed description of the validation and scoring of this survey has been previously reported.8 The HNCI is available from the authors.) In addition to the HNCI, OAS data used in this study included clinical variables defining the patients' type of communication, diet, employment status, and level of pain as well as information about survival. In accordance with the guidelines of the University of Iowa College of Medicine Human Subjects Investigational Review Board, informed consent is obtained by having participants sign the informed consent document.
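The exact HNCI scoring rules are given in the earlier validation report and are not reproduced here. As a hedged sketch only, the following assumes a linear rescaling of each 1 to 5 response onto 0 to 100, items keyed so that 5 is the most favorable response, and a domain score equal to the mean of its rescaled items; `rescale_item` and `domain_score` are hypothetical names.

```python
def rescale_item(response: int) -> float:
    """Linearly map a 1-5 ordinal response onto 0-100 (assumed transformation)."""
    if response not in (1, 2, 3, 4, 5):
        raise ValueError(f"response must be an integer from 1 to 5, got {response!r}")
    return (response - 1) / 4 * 100


def domain_score(responses):
    """Hypothetical domain score: mean of the rescaled item responses."""
    if not responses:
        raise ValueError("at least one item response is required")
    return sum(rescale_item(r) for r in responses) / len(responses)


print(rescale_item(1), rescale_item(3), rescale_item(5))  # 0.0 50.0 100.0
print(domain_score([5, 4, 3]))  # 75.0
```

If some HNCI items are reverse-keyed, they would need to be flipped (6 − response) before rescaling; that detail is not stated in this article.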
Clinical anchor states corresponding to the domains of the HNCI of speech (method of communication), eating (diet descriptor), and social disruption (pain, employment, and 90-day survival) were defined on the basis of data collected within the OAS. In addition, an overall quality-of-life question (item 30 on the HNCI) was used as an anchor for the total score. The anchor states are listed in Table 1. Within the speech domain, laryngeal indicated speech using some or all of a functioning larynx; alaryngeal included tracheoesophageal fistula, esophageal, and electrolarynx communication; and written meant that the patient communicated only by writing. Within the eating domain, unrestricted meant that the patients ate whatever they liked and had dentition or dentures; full edentulous meant that the patient had no diet restrictions but did not have dentition or dentures; soft indicated a puree diet; and liquid/minimal oral intake designated patients who took liquids only or remained dependent on a feeding tube despite being able to take some nutrition by mouth. The designation NPO indicated that the patients took nothing by mouth. The anchor states for pain within the social disruption domain were verbal descriptors given by the patients to define the state of pain they were in, and these ranged through none, mild, discomforting, distressing, horrible, and excruciating. Distressing, horrible, and excruciating were aggregated because of the relatively small numbers of patients in the individual groups. Responses to the overall quality-of-life question ranged from 1 (very poor) to 5 (excellent). The top 2 responses (4 and 5) and the bottom 2 responses (1 and 2) were aggregated. Responses for this item were scaled to a score of 0 to 100. The survival anchor within the social disruption domain represented the status of the patient during the 90 days after completion of an HNCI survey. Only 6- and 12-month survey administrations were included in the evaluation of survival status.
The HNCI domain of aesthetics had no easily identifiable clinical anchor state for comparison.
Data collected before treatment and 3, 6, and 12 months after treatment are included for all eligible patients. Mean HNCI domain scores for the patients within each anchor state were calculated with 95% confidence intervals. The concepts of pain and employment were included within the social disruption domain, and therefore mean social disruption domain scores were used with both pain and employment anchor states as well as the survival anchor state. Differences between the mean HNCI domain scores for the clinical anchor states appropriate to each domain were calculated (Table 1).
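The per-anchor summary described above (mean domain score with a 95% confidence interval, plus the difference between successive anchor states) can be sketched as follows. The article does not state how its intervals were computed, so this sketch assumes a normal approximation (mean ± 1.96 × SD/√n); a t-based interval would be slightly wider for small groups. The state names follow the speech anchors in the text, but the scores are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

Z_95 = 1.96  # assumption: normal-approximation 95% interval


def mean_ci(scores):
    """Return (mean, lower, upper) using mean +/- 1.96 * SD / sqrt(n)."""
    m = mean(scores)
    half_width = Z_95 * stdev(scores) / sqrt(len(scores))
    return m, m - half_width, m + half_width


def anchor_summary(scores_by_state, ordered_states):
    """Summarize each anchor state, ordered from least to most desirable.

    Each row is (state, mean, lower, upper, difference from previous state);
    the difference is None for the first (least desirable) state.
    """
    rows = []
    previous_mean = None
    for state in ordered_states:
        m, lower, upper = mean_ci(scores_by_state[state])
        diff = None if previous_mean is None else m - previous_mean
        rows.append((state, m, lower, upper, diff))
        previous_mean = m
    return rows


# Hypothetical speech-domain scores grouped by communication anchor.
speech = {
    "written": [20, 30, 40],
    "alaryngeal": [45, 50, 55],
    "laryngeal": [70, 80, 90],
}
for row in anchor_summary(speech, ["written", "alaryngeal", "laryngeal"]):
    print(row)
```

Because differences are taken between successive states, they add up along the ordering, which matches the additive reading of Table 1 used later in the Results.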
On the basis of an evaluation of the mean domain scores corresponding to the different anchor states, domain score results for the patients in OAS at their most recent evaluation were grouped as high (70-100), intermediate (31-69), or low (0-30). The high group cutoff was determined to include primarily the most desirable anchor health states, and the low group cutoff was determined to include primarily the least desirable anchor health states. The percentage of patients within each of these 3 groups in a particular anchor health state is presented as a means of providing clinical significance to the domain scores (Table 2).
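A minimal sketch of the Table 2 format: each score is assigned to the low (0-30), intermediate (31-69), or high (70-100) group, and the percentage of patients in each anchor state is computed within each group. The cutoffs come from the text; treating non-integer scores that fall between bands (eg, 30.5 or 69.5) as intermediate is an assumption, and the records below are hypothetical.

```python
from collections import Counter, defaultdict


def score_category(score: float) -> str:
    """Group a 0-100 domain score using the study's cutoffs.

    Assumption: scores above 30 and below 70 (including non-integers) are intermediate.
    """
    if score >= 70:
        return "high"
    if score > 30:
        return "intermediate"
    return "low"


def percent_by_category(records):
    """records: iterable of (domain_score, anchor_state) pairs.

    Returns {category: {anchor_state: percent of patients in that category}}.
    """
    counts = defaultdict(Counter)
    for score, state in records:
        counts[score_category(score)][state] += 1
    result = {}
    for category, state_counts in counts.items():
        total = sum(state_counts.values())
        result[category] = {s: 100 * n / total for s, n in state_counts.items()}
    return result


# Hypothetical (speech score, communication anchor) pairs.
records = [(85, "laryngeal"), (72, "laryngeal"), (90, "laryngeal"),
           (75, "alaryngeal"), (20, "written")]
print(percent_by_category(records))
```

Percentages are computed within each score group (for example, the share of high scorers who use laryngeal speech), which is the direction in which the Results quote Table 2.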
The clinical variables used as anchors should demonstrate an association with their corresponding HNCI scores. To determine this association, the anchor states were assigned ordinal values (1, 2, 3, etc) ranging from the least desirable to the most desirable state. Then a Spearman correlation was used to calculate the association between these ordinal anchor states and their corresponding HNCI domain scores.
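The ordinal coding and Spearman correlation described above can be sketched with the standard library alone: anchor states are mapped to ordinal codes from least to most desirable, both variables are converted to ranks (tied values receive their average rank), and the Pearson correlation of the ranks is taken. The `SPEECH_ORDER` mapping follows the speech anchors in the text; the scores are hypothetical. This sketch returns only the coefficient; a P value would need a separate test (SciPy's `scipy.stats.spearmanr` returns both).

```python
from math import sqrt

# Ordinal codes for the speech anchors, least to most desirable.
SPEECH_ORDER = {"written": 1, "alaryngeal": 2, "laryngeal": 3}


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the (tie-averaged) ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical patients: communication anchor and speech domain score.
anchors = ["written", "alaryngeal", "laryngeal", "laryngeal"]
scores = [20, 50, 70, 90]
rho = spearman([SPEECH_ORDER[a] for a in anchors], scores)
print(round(rho, 4))
```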
A distribution-based approach was used to calculate the magnitude of small, intermediate, and large domain score differences. On the basis of the benchmark ESs (0.2, small; 0.5, intermediate; and 0.8 or greater, large), small, intermediate, and large differences in HNCI domain scores were calculated by the following equation: Domain Score Difference = (ES)(SD). In this equation, SD is the standard deviation of the pretreatment domain score. Domain score differences as calculated with the above equation were compared with the observed differences in mean HNCI domain scores between the different clinical anchor states.4,5
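The equation Domain Score Difference = (ES)(SD) can be sketched as a one-step calculation over the pretreatment scores of a domain. The sketch assumes the sample standard deviation; the input scores are hypothetical, so the resulting thresholds are not the values reported in Table 4.

```python
from statistics import stdev

# Benchmark effect sizes used in this study.
BENCHMARK_ES = {"small": 0.2, "intermediate": 0.5, "large": 0.8}


def difference_thresholds(pretreatment_scores):
    """Domain score difference = ES x SD of the pretreatment domain scores.

    Assumption: SD is the sample standard deviation (n - 1 denominator).
    """
    sd = stdev(pretreatment_scores)
    return {label: es * sd for label, es in BENCHMARK_ES.items()}


# Hypothetical pretreatment scores for one domain (SD is about 15.8).
thresholds = difference_thresholds([40, 50, 60, 70, 80])
print({label: round(value, 2) for label, value in thresholds.items()})
```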
Table 1 gives the mean HNCI domain scores for the selected clinical anchor states, the 95% confidence intervals around these means, and the difference in mean HNCI domain scores between the anchor states within each domain. The speech scores were linked with type of communication and the eating scores with type of diet. Because the social disruption domain reflects the degree to which HNC affects patients' routine life functions and physical pain, its scores were linked with employment status, level of reported pain, and survival status. Finally, the overall HNCI score (representing a summation of all 4 domain scores) was linked with an HNCI item in which patients rated their overall quality of life.
The mean domain scores stratified well within their relevant anchor states, with no overlap between the 95% confidence intervals of the mean scores for the different anchor states. This stratification ranged from fairly high scores for the clinical anchor states that represent relatively normal functioning (eg, laryngeal speech, unrestricted diet, no pain) to increasingly lower scores for the clinical anchor states that represent progressively worse functioning (eg, written communication, NPO status, distressing pain). These mean domain scores, which represent the average scores of all patients within the various clinical states, can be used to interpret a patient's individual domain score. For example, patients with a score of 50 in the speech domain are not necessarily alaryngeal speakers, but their score can be interpreted as indicating that they rated their speech functioning (and their satisfaction with that functioning) on the same level as patients who use alaryngeal speech.
The distribution of high, intermediate, and low scores on the most recent 6- or 12-month survey completed by the 421 patients in the OAS is given in Table 2. The high, intermediate, and low categories offer an efficient and straightforward method for interpreting patients' outcomes on the basis of their HNCI domain scores. This format for the presentation of HRQOL data provides clinically relevant information regarding the likely clinical status of patients on the basis of HRQOL scores (Table 2). For example, 97.0% of patients with a mean speech domain score of 70 or greater had laryngeal speech, and 88.3% of patients with an eating domain score of 70 or greater had an unrestricted diet. By contrast, 50.1% (26.9% + 23.2%) of patients with an eating score below 30 were classified as NPO or took only liquids or minimal oral intake with supplemental tube feeding. Approximately 80% of patients with a social disruption domain score of 70 or greater were employed, and approximately 80% with a social disruption domain score below 30 were involuntarily unemployed. Approximately 1 of 4 patients with a social disruption domain score below 30 did not survive 3 months.
Table 3 gives the correlations between the domain scores or the total 29-item HNCI score and the ordinally ranked anchor states. All calculated correlation coefficients were statistically significant (P<.001), and their magnitude ranged from a low of 0.381 for the communication anchors with the speech domain scores to a high of 0.766 for the diet anchors with the eating domain scores. The magnitude of these correlation coefficients suggests a relatively strong association between the ordinally ranked anchors and the corresponding domain scores, satisfying the criterion that the selected anchors be related to the HRQOL scores under investigation.
Small, intermediate, and large domain score differences were calculated by means of the distribution-based standardized effect size benchmarks (Table 4). Domain score differences of approximately 4, 10, and 14 or greater were found for small, intermediate, and large clinically significant differences, respectively. These distribution-based benchmark values are in agreement with the anchor-derived mean difference scores in Table 1. All of the differences are in the anticipated direction, going from a less desirable to a more desirable anchor state. Within the speech domain, the difference between written and alaryngeal communication represented a large difference, and the difference between alaryngeal and laryngeal communication an intermediate difference. Within the eating domain, the differences between NPO and liquids or minimal oral intake, and between liquids or minimal oral intake and a soft diet, were relatively small. The difference between a soft diet and a full edentulous diet represented an intermediate difference. The difference between a full edentulous diet and an unrestricted diet represented a large difference. These clinical differences are additive. For example, the difference between NPO and a soft diet is 9.02 + 11.53 = 20.55, representing a large clinical difference. Stepwise differences between the anchor states for pain were all intermediate, and the difference from distressing/horrible/excruciating pain to none was 33.25, approximately 3 times the size of a large change for a social disruption domain score. The difference between employed and involuntarily unemployed represented a large clinically significant change in the social disruption domain.
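The additive reading of stepwise anchor differences can be sketched as follows. The sketch sums successive differences and classifies the total against the approximate pooled thresholds of 4, 10, and 14 reported in this study; the domain-specific values in Table 4 may differ, and the "below small" label for differences under 4 is a hypothetical addition. The inputs are the two eating-domain steps quoted above (NPO to liquids/minimal oral intake, then to a soft diet).

```python
# Approximate pooled thresholds reported in this study (domain-specific values may differ).
SMALL, INTERMEDIATE, LARGE = 4, 10, 14


def classify_difference(diff: float) -> str:
    """Classify an absolute domain score difference; "below small" is a hypothetical label."""
    d = abs(diff)
    if d >= LARGE:
        return "large"
    if d >= INTERMEDIATE:
        return "intermediate"
    if d >= SMALL:
        return "small"
    return "below small"


def cumulative_difference(stepwise_diffs):
    """Stepwise differences between adjacent anchor states add along the ordering."""
    return sum(stepwise_diffs)


# Eating domain: NPO -> liquids/minimal oral intake -> soft diet (values from the text).
total = cumulative_difference([9.02, 11.53])
print(round(total, 2), classify_difference(total))  # 20.55 large
```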
There has been a tremendous proliferation of surveys and instruments used to evaluate HRQOL in patients with HNC during the last 10 years.2 A number of these validated instruments are being used in clinical research projects involving patients with HNC throughout the world. HRQOL information is not only recognized as crucial to the comprehensive evaluation of alternative treatment methods in HNC management but is also required in many clinical trials.
For these HRQOL data to ultimately be useful as an outcome measure on which clinical decisions can be based, several conditions must be met. First, the clinical significance of observed intragroup changes or intergroup differences in reported scores must be clearly defined. Second, HRQOL data pertaining to patients with HNC must be presented with frames of reference that are familiar and easily understood by clinicians. If these 2 conditions are met, an increased familiarity with the instruments, the data derived from them, and the clinical meaning of scores should facilitate increased use.9
In the evaluation of reported HRQOL scores, statistical significance does not guarantee clinical significance to the patient. Likewise, the absence of a statistically significant change for a group of patients after an intervention or over time does not mean an absence of clinically significant change for all of the patients within the evaluated group. HRQOL data are patient derived, and although statistical rigor is crucial in data presentation, the clinical significance of reported HRQOL results should be a key focus.9-13
Clinical significance may be defined in several ways. Within the area of clinical assessment, interest centers on determining the magnitude of a clinically important difference between measured HRQOL scores. A frequent benchmark by which differences in scores are evaluated is the minimal important difference. A generally accepted definition of the minimal important difference is "the smallest difference in score that the patient perceives as important, either beneficial or harmful, and which would lead a clinician to consider a change in the patient's management."10 The minimal important difference may take on several different meanings depending on the context in which it is used. It may be used to reflect the minimal difference in health that is subjectively important to the patient, the change that would mandate a change in medical management, or the change considered sufficient to substantiate intragroup longitudinal change or intergroup difference in a clinical trial.7 In this article, we have focused on the magnitude of clinically important differences.
In addition to a clinical understanding of HRQOL scores and of the difference in scores that represents a clinically significant difference in health functioning, clinicians need HRQOL results to be reported in ways that are useful in making patient management decisions and in advising patients about anticipated treatment outcomes. There is no universally agreed-on best method for presenting HRQOL data. However, simply reporting a mean HRQOL score for a group of patients as an outcome measure is likely to be unsatisfactory in many instances. A frequently useful approach is to stratify HRQOL scores, then link the HRQOL scores to health states that are familiar to clinicians and easily understood by patients. This is an extension of the use of clinical anchors in determining clinically significant differences between scores (eg, patients with a score of 80 or above on an HRQOL survey have a 90% chance of living at least 1 year). This approach serves 2 purposes: (1) the relatively abstract HRQOL score is linked to 1 or more familiar clinical health states and (2) clinicians develop an understanding and clinical picture of the typical patient with a score of 80 or greater on this survey, which enhances the general utility of the survey.3,12
We have demonstrated how 2 different techniques may be used separately and together to obtain information about the clinical significance of observed changes or differences in scores on an HNC-specific HRQOL survey. Using familiar clinical anchors allowed the rather abstract domain scores to be put into a frame of reference that was easily understood. This resulted in a much clearer understanding of the level of patient functioning or magnitude of HRQOL represented by a unit change or difference in domain scores. These data alone were quite useful, providing substantial insight into the HNCI scores that have been collected during the past 5 years. This anchor-based technique of defining clinically important difference values for HRQOL data is well described and is frequently used alone as a method for enhancing the clinical utility of HRQOL data.3
The use of distribution-based methods allowed benchmark differences in scores to be defined in more general, statistical terms.5,8,9 In general, the magnitude of clinically significant differences defined by anchor-based and distribution-based methods should agree, and we found this to be true in our results. It is important to recognize that, although they are derived from the standard deviation, the small, intermediate, and large clinically significant difference benchmarks used in this article do not have an associated test of statistical significance. Cohen5 originally proposed these benchmarks for use in the evaluation of differences in psychological tests. However, they have been empirically evaluated in a number of studies and found to correspond fairly well with independently measured values for small, intermediate, and large clinically significant differences.4,6,7,9
The mean domain scores for each of the anchor states were used as a guide to define the cutoffs between high, intermediate, and low scores in Table 2. Presenting the HNCI data in ordinal groups that are associated with familiar anchor health states not only assists in presenting a clinically relevant picture of patient outcomes but also provides information not available when only mean HRQOL scores are reported. For example, the distribution of high, intermediate, and low scores readily shows the percentage of patients who report unacceptably poor outcomes or favorable outcomes. In addition, the health states most likely represented by a particular score are evident. Such information cannot be obtained by simply looking at a mean score. In the ideal circumstance, familiarity with selected HNC-specific instruments would be developed to such a level that a simple number would impart a clinical picture in the same way that an Apgar score immediately provides a clinical picture of an infant to an obstetrician.
Tremendous advances have been made in the area of HRQOL evaluation in patients with HNC. If the field of HRQOL evaluation in patients with HNC is going to continue to evolve, data must be presented in a manner that clearly defines the clinical significance and clinical relevance of the scores.
Correspondence: Gerry F. Funk, MD, Department of Otolaryngology–Head and Neck Surgery, Pomerantz Family Pavilion, Room 21200, University of Iowa Hospitals and Clinics, 200 Hawkins Dr, Iowa City, IA 52242-1093.
Submitted for publication April 16, 2003; final revision received November 10, 2003; accepted February 18, 2004.
This study was presented at the American Head and Neck Society meeting; May 5, 2003; Nashville, Tenn.