Questionnaires vs Interviews for the Assessment of Global Functional Outcomes After Traumatic Brain Injury

This cohort study compares global functional outcome ratings of the Glasgow Outcome Scale–Extended administered as a structured interview vs a questionnaire to patients with traumatic brain injury.


Introduction
A common way of assessing outcomes in clinical trials of acute traumatic brain injury (TBI) is through a clinician rating scale, particularly the Glasgow Outcome Scale (GOS) or the Glasgow Outcome Scale–Extended (GOSE). 1 A structured interview has become a standard method for obtaining GOSE ratings, and the GOSE is a core recommended outcome in the Common Data Elements for TBI. 2 A questionnaire version of the GOSE, completed by the patient or a caregiver, has been used as an end point in several multicenter trials of acute TBI. 3-6 The questionnaire format avoids investigator bias in studies, such as surgical trials, in which masking is impractical. Questionnaires also offer pragmatic advantages in overall cost and can make large-scale clinical trials of TBI feasible when industry sponsorship is lacking. 7 Although concerns have been raised about low follow-up rates, 8 GOSE outcomes at 6 months were obtained for 97% of patients enrolled in the Eurotherm3235 Trial. 3 In practice, studies have typically followed up nonresponders by telephone interview or another type of contact that can be organized centrally. These studies thus ultimately combined ratings derived from questionnaires and interviews in their primary end point.
Previous work comparing GOSE interviews and questionnaires 9 has been small in scale and did not indicate whether the 2 formats differed in the information collected or whether an interview offered added value compared with a questionnaire. Interviewing might be expected to be superior because it allows flexible questioning in borderline cases, allows the reliability of respondents to be evaluated, and, when inconsistencies arise, allows a judgment to be made about the overall rating.
Areas likely to need judgment include the influence of preexisting disability or extracranial injury. 10 If interviews are superior, their ratings should have better validity, for example, by identifying dependency more precisely or by discounting preexisting disability. Interview ratings should then be more strongly correlated with measures of injury severity than ratings from questionnaires completed by patients and caregivers. Questionnaire ratings, in contrast, might be expected to be more subjective and more strongly correlated with patient-reported outcomes.
The Collaborative European NeuroTrauma Effectiveness Research in TBI (CENTER-TBI) project 11 used a flexible data-collection approach to maximize follow-up. The GOSE was administered as either an interview or a questionnaire. To allow comparison of methods, the study design encouraged CENTER-TBI investigators to collect both versions of the GOSE when possible. In addition, patient-reported outcomes were used to assess health-related quality of life, mental health, and TBI symptoms. The information available to investigators at the time of the interview could include completed questionnaires. Thus, the comparison made in the current study concerns whether interviewing added value to the GOSE assessment and increased its validity. We compared agreement of the assessments in 3 areas: (1) overall ratings, (2) individual sections of the GOSE, and (3) correlations with baseline factors and patient-reported outcomes. We also studied the use of judgment by interviewers in assigning an overall rating.

Methods
This cohort study used data from patients enrolled in the CENTER-TBI project from December 2014 to December 2017. Data were analyzed from December 2020 to April 2021. Ethical approval was obtained for each project site according to national and local procedures. A detailed ethics statement is given on the project website. 12 This study followed the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) reporting guideline.

Participants
The CENTER-TBI project included 4509 patients from 65 sites across 18 countries. 13 Inclusion criteria were a diagnosis of TBI and clinical indication for a computed tomography scan, being seen in a hospital within 24 hours of the injury, and availability of informed consent (written consent was obtained at the earliest opportunity, but some patients may have been enrolled initially with oral consent). Patients were excluded if they had a severe preexisting neurological disorder that would confound outcome assessments. Additional inclusion criteria for the current analyses were an age of 16 years or older, survival at 6 months after injury, a complete and scorable GOSE interview and questionnaire for the participant at 3 months or 6 months after injury, and completion of the GOSE interview and questionnaire within 3 weeks of one another.

Measures
Demographic information was recorded at the time of recruitment along with information on the cause of injury and preexisting systemic disease based on the American Society of Anesthesiologists Physical Status Classification System. 14 Injury severity was assessed using early computed tomography imaging, 15 the Abbreviated Injury Scale score (scores range from 1 to 6, with higher scores indicating more severe injury) and Injury Severity Score (scores range from 1 to 75, with higher scores indicating more severe injury), 16 and a baseline Glasgow Coma Scale score (scores range from 3 to 15, with higher scores indicating less severe injury). 17,18
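The Injury Severity Score mentioned above is derived from Abbreviated Injury Scale severities by a standard rule: take the highest AIS severity in each of 6 body regions, square the 3 largest, and sum them, with any AIS 6 injury setting the score to 75. The Python sketch below illustrates only that rule; the input format of (region, severity) pairs and the region names are assumptions made here for illustration, not CENTER-TBI data structures.

```python
# Minimal sketch of the Injury Severity Score (ISS) rule.
# Assumption: each injury is given as a (body_region, AIS severity) pair,
# using hypothetical names for the 6 ISS body regions.

ISS_REGIONS = {"head_neck", "face", "chest", "abdomen", "extremities", "external"}


def injury_severity_score(injuries):
    """Return the ISS (1-75) for a non-empty list of (region, AIS) pairs."""
    if not injuries:
        raise ValueError("at least one injury is required")
    worst = {}  # highest AIS severity seen in each body region
    for region, ais in injuries:
        if region not in ISS_REGIONS:
            raise ValueError(f"unknown ISS body region: {region}")
        if not 1 <= ais <= 6:
            raise ValueError(f"AIS severity must be 1-6, got {ais}")
        worst[region] = max(worst.get(region, 0), ais)
    # Any AIS 6 injury sets the ISS to its maximum of 75.
    if 6 in worst.values():
        return 75
    # Otherwise, sum the squares of the 3 highest regional AIS severities.
    top_three = sorted(worst.values(), reverse=True)[:3]
    return sum(a * a for a in top_three)


print(injury_severity_score([("head_neck", 4), ("chest", 3), ("extremities", 2), ("face", 1)]))
# 4**2 + 3**2 + 2**2 = 29
```

Only the worst injury per region counts, which is why a second, less severe head injury does not change the score.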

Global Outcome
The GOSE interview 19 was administered at each site either in person or by telephone. Investigators attended training and were provided with a study manual that included advice on supplementary questions for borderline situations, problem cases, and scoring. 10 Interviewers were instructed to include disability associated with all aspects of the injury, including extracranial injury, in the rating. The weighted κ statistic (κw) for test-retest agreement is 0.92. 20 The GOSE questionnaire 9 consists of 14 questions in 7 sections that parallel the interview (eAppendix in Supplement 1). Questions are designed to be appropriate for an adult patient or caregiver. The Flesch Reading Ease score for the text is 72, and the Flesch-Kincaid Grade Level is 6.5. The response choices include an option to indicate that limitations are present but are not attributable to head injury; responses using this option are not included in the scoring. Because responsiveness cannot practically be assessed with a questionnaire, the categories of "lower severe disability" and "vegetative state" are collapsed. The κw for test-retest agreement is 0.98. 9
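The GOSE assigns the overall rating from the lowest outcome category indicated across its sections. Combined with the 2 questionnaire rules described above (limitations marked as not attributable to head injury are ignored, and vegetative state is merged with lower severe disability), this can be sketched as follows. This is a simplified, hypothetical illustration: it assumes each section's answer has already been translated into the highest GOSE category compatible with it (the `SectionAnswer` structure is invented here), and it is not the official scoring procedure for either format.

```python
from dataclasses import dataclass

# GOSE categories, from death (1) to upper good recovery (8).
GOSE_LABELS = {
    1: "Dead",
    2: "Vegetative state",
    3: "Lower severe disability",
    4: "Upper severe disability",
    5: "Lower moderate disability",
    6: "Upper moderate disability",
    7: "Lower good recovery",
    8: "Upper good recovery",
}


@dataclass
class SectionAnswer:
    """Hypothetical pre-processed answer for one GOSE section."""
    section: str
    highest_category: int  # best GOSE category compatible with the answer
    due_to_head_injury: bool = True  # False if the respondent marked the limitation as unrelated


def overall_rating(answers, questionnaire=False):
    """Simplified overall rating: the lowest category indicated by any counted section."""
    counted = [a.highest_category for a in answers if a.due_to_head_injury]
    rating = min(counted, default=8)  # no attributable limitation -> upper good recovery
    if questionnaire and rating in (2, 3):
        # Responsiveness cannot be assessed by questionnaire, so 2 and 3 are merged.
        return "Vegetative state or lower severe disability"
    return GOSE_LABELS[rating]


answers = [
    SectionAnswer("work", 6),
    SectionAnswer("social_leisure", 7),
    SectionAnswer("shopping", 4, due_to_head_injury=False),  # excluded from scoring
]
print(overall_rating(answers, questionnaire=True))  # Upper moderate disability
```

The "lowest category wins" rule is what makes borderline answers in a single section decisive for the overall rating.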

Health-Related Quality of Life
The 36-Item Short Form Health Survey, version 2 (SF-36v2) 21 is a patient-reported outcome that has been used for many health conditions. The instrument has 8 subscales and 2 summary scores, the Mental Component Summary and the Physical Component Summary. Scores are transformed to T-scores (mean [SD], 50 [10]), with higher scores indicating better outcomes.
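Norm-based T-scores of the kind described for the SF-36v2 are a linear rescaling of a score against a reference population. A minimal sketch, assuming the reference mean and SD are supplied by the user (the numbers in the example are placeholders, not published SF-36v2 norms); the component summary scores additionally weight the 8 standardized subscales, which is not shown here.

```python
def t_score(raw_score, norm_mean, norm_sd):
    """Linear T-score: the reference population is rescaled to mean 50, SD 10."""
    if norm_sd <= 0:
        raise ValueError("reference SD must be positive")
    z = (raw_score - norm_mean) / norm_sd  # standardize against the reference population
    return 50 + 10 * z


# Placeholder reference values, not published SF-36v2 norms.
print(t_score(60, norm_mean=70, norm_sd=20))  # 45.0
```

A T-score of 45 therefore means half an SD below the reference mean, regardless of the subscale's raw range.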
The Quality of Life after Brain Injury Scale 22 is a TBI-specific measure of health-related quality of life comprising 37 items in 6 domains relevant for brain injury. Scores range from 0 to 100, with higher scores indicating better health-related quality of life.

Mental Health
The Patient Health Questionnaire-9 is a self-report instrument of 9 items assessing depression severity. 23 Scores range from 0 to 27, with higher scores indicating greater depression. The Generalized Anxiety Disorder-7 is a 7-item self-report instrument for the severity of anxiety symptoms. 24 Scores range from 0 to 21, with higher scores indicating greater anxiety.

TBI Symptoms
The Rivermead Post Concussion Symptoms Questionnaire is a self-report instrument consisting of 16 symptoms typical after mild or moderate TBI. 25 Scores range from 0 to 64, with higher scores indicating a greater burden of symptoms. Comparisons with this questionnaire were restricted to patients with a baseline Glasgow Coma Scale score of 9 to 15 (ie, mild or moderate injury), consistent with the context of use of this instrument. When translations were not available from the publishers, all instruments used were translated into local languages using a process of linguistic validation. 26
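Two steps in this paragraph are mechanical: summing the 16 Rivermead items (each rated 0 to 4, which gives the 0 to 64 range) and restricting comparisons to patients with a baseline Glasgow Coma Scale score of 9 to 15. A short sketch, assuming a hypothetical record layout with `id`, `baseline_gcs`, and `rpq` fields; it uses a plain sum, although some scoring conventions recode ratings of 1 ("no more of a problem than before the injury") to 0.

```python
def rpq_total(item_ratings):
    """Plain sum of the 16 Rivermead items (each 0-4), giving a total of 0-64."""
    if len(item_ratings) != 16:
        raise ValueError("the RPQ has 16 items")
    if any(not 0 <= r <= 4 for r in item_ratings):
        raise ValueError("each item is rated 0-4")
    return sum(item_ratings)


def rpq_analysis_set(patients):
    """Keep patients with baseline GCS 9-15 and a completed RPQ (hypothetical fields)."""
    return {
        p["id"]: rpq_total(p["rpq"])
        for p in patients
        if 9 <= p["baseline_gcs"] <= 15 and p.get("rpq") is not None
    }


patients = [
    {"id": "A", "baseline_gcs": 14, "rpq": [2] * 16},
    {"id": "B", "baseline_gcs": 6, "rpq": [1] * 16},  # severe TBI: excluded
    {"id": "C", "baseline_gcs": 10, "rpq": None},  # no RPQ: excluded
]
print(rpq_analysis_set(patients))  # {'A': 32}
```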

Data Collection Procedure
Patients were enrolled from December 2014 to December 2017. Follow-up was scheduled at 3 and 6 months. The 3-month assessment was conducted either in person or by a postal questionnaire and telephone interview. The 6-month assessment was planned as an in-person meeting that included the GOSE interview; questionnaires could be completed at the time of follow-up or returned by post.
To allow sites the flexibility to maximize follow-up, the use of both interviews and questionnaires was not mandated, but investigators were encouraged to collect both versions if possible.

Statistical Analysis
Data were downloaded on November 22, 2019, from the Neurobot database, version 2.1 (International Neuroinformatics Coordinating Facility). Analyses were conducted from December 2020 to April 2021 using IBM SPSS, version 25 (IBM). Demographic and clinical characteristics were described using frequencies and percentages for categorical variables and medians and IQRs for continuous data.

Agreement Between Instruments
Preinjury and postinjury items in each of the 7 subsections of the GOSE were used to code whether a problem or limitation was recorded that had not been present before the injury. The strength of agreement for these 2 × 2 comparisons was evaluated using the κ statistic 27 (≤0.20, poor; 0.21-0.40, fair; 0.41-0.60, moderate; 0.61-0.80, good; and 0.81-1.00, very good). 28 Differences in limitations recorded in each section were evaluated by the McNemar test. Agreement between overall ratings was assessed using κw; quadratic weights penalize extreme disagreements between ratings more heavily than slight disagreements. 29 The Wilcoxon signed rank test was used to test for differences between GOSE scores from the 2 formats, with r as the measure of effect size. We also compared the questionnaire and interview formats when ratings were dichotomized between upper severe disability and lower moderate disability, a common cut point for unfavorable vs favorable outcomes.
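The agreement statistics named above can be computed directly from paired ratings. The following plain-Python sketch is an illustration, not the SPSS procedures used in the study: Cohen's κ for a 2 × 2 table, the verbal bands quoted above, a continuity-corrected McNemar test on the discordant cells (one common variant; an exact binomial version is another), quadratically weighted κ for the ordinal overall ratings, and the unfavorable vs favorable split at upper severe disability (GOSE 4) vs lower moderate disability (GOSE 5). Function names and example counts are hypothetical.

```python
import math
from collections import Counter


def cohen_kappa_2x2(both, interview_only, questionnaire_only, neither):
    """Cohen's kappa for one GOSE section coded as problem present/absent by each format."""
    n = both + interview_only + questionnaire_only + neither
    p_observed = (both + neither) / n
    interview_yes = both + interview_only
    questionnaire_yes = both + questionnaire_only
    p_expected = (interview_yes * questionnaire_yes
                  + (n - interview_yes) * (n - questionnaire_yes)) / n ** 2
    return (p_observed - p_expected) / (1 - p_expected)


def kappa_band(kappa):
    """Verbal label using the bands quoted in the text."""
    for upper, label in [(0.20, "poor"), (0.40, "fair"), (0.60, "moderate"), (0.80, "good")]:
        if kappa <= upper:
            return label
    return "very good"


def mcnemar_test(interview_only, questionnaire_only):
    """Continuity-corrected McNemar chi-square (1 df) on the discordant cells."""
    discordant = interview_only + questionnaire_only
    if discordant == 0:
        return 0.0, 1.0
    stat = max(0, abs(interview_only - questionnaire_only) - 1) ** 2 / discordant
    p_value = math.erfc(math.sqrt(stat / 2))  # upper tail of chi-square with 1 df
    return stat, p_value


def quadratic_weighted_kappa(ratings_a, ratings_b, categories):
    """Weighted kappa with quadratic disagreement weights (i - j) ** 2."""
    index = {cat: i for i, cat in enumerate(categories)}
    n = len(ratings_a)
    observed = Counter((index[a], index[b]) for a, b in zip(ratings_a, ratings_b))
    margin_a = Counter(index[a] for a in ratings_a)
    margin_b = Counter(index[b] for b in ratings_b)
    k = len(categories)
    obs_dis = sum((i - j) ** 2 * observed[(i, j)] for i in range(k) for j in range(k)) / n
    exp_dis = sum((i - j) ** 2 * margin_a[i] * margin_b[j]
                  for i in range(k) for j in range(k)) / n ** 2
    if exp_dis == 0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return 1 - obs_dis / exp_dis


def unfavorable(gose):
    """Dichotomize: GOSE 1-4 (up to upper severe disability) is unfavorable."""
    return gose <= 4


# Hypothetical section counts and paired overall ratings (categories 3-8,
# since the questionnaire merges vegetative state into category 3 here).
kappa = cohen_kappa_2x2(both=40, interview_only=10, questionnaire_only=5, neither=45)
print(round(kappa, 2), kappa_band(kappa))  # 0.7 good
print(mcnemar_test(10, 5))
interview = [3, 4, 5, 6, 7, 8, 8, 5]
questionnaire = [3, 5, 5, 6, 6, 8, 7, 4]
print(quadratic_weighted_kappa(interview, questionnaire, categories=range(3, 9)))
print(sum(map(unfavorable, interview)), sum(map(unfavorable, questionnaire)))
```

Writing weighted κ as 1 minus the ratio of observed to chance-expected squared disagreement makes the quadratic penalty explicit: a 2-category disagreement costs 4 times as much as a 1-category one.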
To indicate the extent to which interviewers used personal judgment in assigning overall ratings, we identified departures from interview scoring rules. We scored the interviews centrally according to the standard procedure for the assessment and calculated the difference between the interviewer rating and the central score. Variation in these differences across GOSE outcome categories (assigned by central scoring) was assessed using χ² tests.
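The central rescoring comparison amounts to classifying each interview as rated lower than, the same as, or higher than its central score, cross-tabulating that direction by central GOSE category, and testing the table for independence. A plain-Python sketch of those steps, assuming hypothetical (interviewer, central) rating pairs; it returns the Pearson χ² statistic and its degrees of freedom, and a p-value can then be obtained from a χ² distribution (for example, with `scipy.stats.chi2.sf(stat, dof)`).

```python
from collections import Counter


def departure(interviewer_rating, central_rating):
    """Direction in which the interviewer's rating departs from central scoring."""
    diff = interviewer_rating - central_rating
    if diff > 0:
        return "higher"
    if diff < 0:
        return "lower"
    return "same"


def departure_table(pairs, categories, directions=("lower", "same", "higher")):
    """Rows: central GOSE category; columns: direction of departure."""
    counts = Counter((central, departure(interviewer, central)) for interviewer, central in pairs)
    return [[counts[(cat, d)] for d in directions] for cat in categories]


def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table.
    Rows or columns with a zero total must be removed first (expected count 0)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return stat, dof


# Hypothetical (interviewer, central) rating pairs; counts this small are for illustration only.
pairs = [(5, 4), (4, 4), (4, 4), (3, 4), (6, 6), (6, 6), (7, 6), (8, 8), (8, 8)]
table = departure_table(pairs, categories=[4, 6, 8])
print(table)  # [[1, 2, 1], [0, 2, 1], [0, 2, 0]]
print(chi_square_independence(table))
```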

Comparative Validity
Spearman correlations were calculated between the GOSE and baseline factors typically included in prognostic models (ie, age, Glasgow Coma Scale score, pupillary reactivity, Injury Severity Score, Abbreviated Injury Scale score, and extracranial injury) and patient-reported outcome measures (ie,

Results
Of 3691 individuals aged 16 years or older who were alive and eligible for follow-up 6

Ratings for Sections of the GOSE
Levels of agreement in individual sections of the interview and questionnaire are shown in Table 3 (

Comparative Validity
Correlations between variables are shown in Table 4. Correlations of the GOSE interview and questionnaire outcomes with baseline variables were strongest for the Glasgow Coma Scale score

Overall Ratings
Overall, the GOSE scores from interviews and questionnaires were in good agreement. The GOSE consists of a hierarchy of broad categories, and thus, many individuals are unambiguously assessed as being in a particular category. Cases assessed near borderlines were associated with more uncertainty. 10 Disagreement between the interview and questionnaire scores by 1 category was common and suggests that borderlines may be an important factor in differences between the 2 approaches. Some cases of TBI represent a challenge for global outcome assessment and may lead to large discrepancies. These challenging cases include those in individuals with preexisting limitations, which can mask any post-TBI changes. 10 Consistent with this, more disagreement between the questionnaire and interview occurred in the context of preexisting systemic disease in the present study. However, large discrepancies between the 2 formats were found in only 5.8% of cases at 3 months and 4.0% at 6 months.
Guidelines for interviewing allow the assessor to use judgment to move the rating to a higher or lower category than indicated by the responses recorded. 9 We found that interviewers were using such discretion, particularly for individuals whose outcomes could be regarded as unfavorable. Consistent with this, when outcomes were dichotomized at the cut point between upper severe disability and lower moderate disability, the interviews at 3 months yielded fewer ratings of an unfavorable outcome than did the questionnaires. A judgment to assign a higher category may have been made because later parts of the interview were inconsistent with dependency (eg, the person was back at work). However, it may also indicate bias on the part of interviewers toward assigning particular outcomes.

Subsections of the GOSE
Levels of agreement on individual sections of the GOSE were highest for objective aspects of functioning such as independence in activities of daily living and lowest for subjective aspects such as TBI symptoms and personal relationships. Judgments about ability to return to participation in work and social and leisure activities can be hypothetical when the individual is still recovering, and the differences between interview and questionnaire outcomes seemed to decrease as the time since the TBI increased. These may be areas where the interview allowed finer judgment, particularly in the first few months after injury.
Of note, we found that symptoms that interfere with daily life were less likely to be recorded on the questionnaire than by interviewers. The findings suggest that respondents may find it difficult to judge the association between TBI-related symptoms and daily functioning and may even be unaware of symptoms that are relevant. 31 These observations are consistent with literature on mild TBI, 32,33 in which symptom reports can vary substantially depending on the method of data collection. Some researchers have found that self-report questionnaires yield more reports of symptoms than do interviews, 32 whereas other researchers have found the opposite. 33 Concerns have been raised about the use of patient reports in TBI studies, particularly that patient reports may suggest an overly optimistic perspective of recovery owing to lack of awareness. 34 However, we did not find this to be true overall. To clarify issues of informant reliability, it would be useful to conduct further research in which self-awareness was examined directly.

Comparative Validity
The associations found between the GOSE ratings and other factors were as expected from previous research. 35-37 A key novel finding of this study was the similarity in the strength of the correlations between each of the 2 assessment methods and other variables. The expectation that interview ratings would correlate more strongly with baseline factors, and questionnaire scores more strongly with patient-reported outcomes, was not borne out. The concordance in the associations implies that the core information about global outcomes after TBI collected by interviews and by questionnaires was similar. This conclusion is consistent with the good alignment that has been reported between prognostic models based on questionnaire or interview outcomes. 38
This study found no definitive advantage of interviewing. In a single-center study with a limited number of data collectors, interviews might be expected to be superior to questionnaires for the reasons already stated. However, in a large-scale study such as the CENTER-TBI project, the involvement of multiple interviewers may introduce additional variability in the assessment, so the potential advantage of interviews may be offset by interrater differences. To benefit from interviewing, multicenter studies may need to address interrater differences systematically, for example, through regular and repeated training and central monitoring of individual assessments. 39,40
Investigators may wish to use the 2 formats (questionnaires and interviews) as alternative methods of data collection. The results of the current study support combining ratings from these separate sources, with the caveat that some additional variation may be introduced. The findings also indicate that amendments to scoring may help to align the methods further. Investigators using questionnaires may also consider using interviews to obtain additional information about specific individuals. For example, individuals who need assistance in only 1 area (eg, home, shopping, or travel) are potentially on the borderline for independence and might be followed up by interview.
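The borderline criterion suggested in this paragraph could be operationalized as a simple flag on questionnaire responses. A minimal sketch, assuming hypothetical Boolean fields for needing assistance at home, with shopping, and with travel; the exactly-1-area criterion follows the example in the text, and any real triage rule would need validation.

```python
# Hypothetical questionnaire fields: True if the respondent reports needing assistance.
ASSISTANCE_AREAS = ("home", "shopping", "travel")


def borderline_for_independence(response):
    """Flag responses reporting assistance in exactly 1 of the 3 areas."""
    areas_needing_help = [area for area in ASSISTANCE_AREAS if response.get(area, False)]
    return len(areas_needing_help) == 1


responses = {
    "P1": {"home": False, "shopping": True, "travel": False},
    "P2": {"home": True, "shopping": True, "travel": True},
    "P3": {"home": False, "shopping": False, "travel": False},
}
to_interview = [pid for pid, r in responses.items() if borderline_for_independence(r)]
print(to_interview)  # ['P1']
```

Respondents needing help in all 3 areas are clearly dependent, and those needing none are clearly independent, so only the single-area cases are routed to an interview.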
The GOSE questionnaire was originally designed for postal administration and is readily adaptable for use as an online instrument or a smartphone app. The GOSE interview has been applied in conditions other than TBI, including stroke, cardiac arrest, and multiple trauma. 41-43 With appropriate rewording, the instrument could potentially be used to assess the long-term neurological consequences of other conditions that have an acute onset and before-and-after states, including sepsis and other illnesses that may manifest long-term neurological symptoms. Validation would be necessary and could open the way for large-scale data collection for a variety of conditions for which chronic neurological outcomes are underresearched.

Limitations
This study has limitations. The comparison of interviews with questionnaires was a planned analysis of the CENTER-TBI project but did not use an experimental design. Investigators were not masked to baseline factors or to scores on other outcome assessments. Furthermore, systematic comparisons between different modes of data collection (ie, telephone vs in-person interviews or patients vs other informants) were not possible because of either limited numbers of cases or confounding with outcome distributions. Prospective studies are needed to compare modes of data collection. The study also excluded patients with severe preexisting neurological conditions, which limits the generalizability of the findings to these patients; interviews may be better able to disentangle the association between complex preinjury conditions and TBI outcomes.

Conclusions
In this cohort study, GOSE ratings of outcomes for TBI that were obtained from questionnaires and interviews had good overall agreement. We found some disagreement between GOSE categories as well as differences in the percentages of post-TBI problems recorded by each method. We also found that interviewers used judgment in their overall ratings. However, any differences in the ratings did not translate into differences in the validity of the assessments. In this large-scale, multicenter study, interviews did not seem to offer substantial advantages compared with central scoring of information collected directly from patients and caregivers using a questionnaire. These findings support the use of questionnaires in studies in which this form of contact may offer substantial practical advantages compared with interviews.