Abeare CA, Messa I, Zuccato BG, Merker B, Erdodi L. Prevalence of Invalid Performance on Baseline Testing for Sport-Related Concussion by Age and Validity Indicator. JAMA Neurol. 2018;75(6):697–703. doi:10.1001/jamaneurol.2018.0031
Key Points
Question
What is the prevalence of invalid neurocognitive performance at baseline testing for the management of sport-related concussion, and does it vary by age and validity indicator?
Findings
In this cross-sectional study of 7897 participants who completed baseline neurocognitive testing for the management of sport-related concussion, 56% failed at least 1 of 4 published validity indicators. Base rates of failure varied considerably across age groups, from 84% in those aged 10 years to 29% in those aged 21 years.
Meaning
Base rates of failure were surprisingly high overall, suggesting a need for a critical examination of performance validity assessment practices on baseline testing in concussion management programs.
Abstract
Importance
Estimated base rates of invalid performance on baseline testing (base rates of failure) for the management of sport-related concussion range from 6.1% to 40.0%, depending on the validity indicator used. The instability of this key measure complicates the clinical interpretation of test results and could undermine the utility of baseline testing.
Objective
To determine the prevalence of invalid performance on baseline testing and to assess whether the prevalence varies as a function of age and validity indicator.
Design, Setting, and Participants
This retrospective, cross-sectional study included data collected between January 1, 2012, and December 31, 2016, at a clinical referral center in the Midwestern United States. Participants included 7897 consecutively tested male and female athletes (in approximately equal proportions) aged 10 to 21 years who completed baseline neurocognitive testing for the purpose of concussion management.
Exposures
Baseline assessment was conducted with the Immediate Postconcussion Assessment and Cognitive Testing (ImPACT), a computerized neurocognitive test designed for assessment of concussion.
Main Outcomes and Measures
Base rates of failure on published ImPACT validity indicators were compared within and across age groups. Hypotheses were developed after data collection but prior to analyses.
Results
Of the 7897 study participants, 4086 (51.7%) were male, mean (SD) age was 14.71 (1.78) years, 7820 (99.0%) were primarily English speaking, and the mean (SD) educational level was 8.79 (1.68) years. The base rate of failure ranged from 6.4% to 47.6% across individual indicators. Most of the sample (55.7%) failed at least 1 of 4 validity indicators. The base rate of failure varied considerably across age groups (117 of 140 [83.6%] for those aged 10 years to 14 of 48 [29.2%] for those aged 21 years), representing a risk ratio of 2.86 (95% CI, 2.60-3.16; P < .001).
Conclusions and Relevance
Base rates of failure were surprisingly high overall and varied widely depending on the specific validity indicator and the age of the examinee. The strong association with age, with 3 of 4 participants aged 10 to 12 years failing validity indicators, suggests that the clinical interpretation and utility of baseline testing in this age group are questionable. These findings underscore the need for close scrutiny of performance validity indicators on baseline testing across age groups.
Introduction
In the United States, an estimated 1.1 million to 1.9 million sport- and recreation-related concussions occur annually in children aged 18 years or younger.1 This burden, combined with concerns about microstructural damage in the white matter tracts2 of the brain and the potential long-term effects of multiple concussions,3 has brought increasing attention to concussion management as a public health concern. Concussion is heterogeneous in clinical presentation and natural history,4 and there is no reliable biomarker or other medical test to identify it.5 Together, these factors force examiners to diagnose concussion through clinical assessment of medical history, balance testing, neurocognitive functioning, and postconcussion symptoms.
In an attempt to improve diagnostic accuracy for concussion, baseline neurocognitive testing was introduced in the 1980s to measure preinjury neurocognitive functioning, against which postinjury functioning could be compared.6 Traditional neuropsychological assessment estimates the premorbid level of functioning and compares test results with normative data. Baseline testing was implemented instead to reduce the error inherent in estimating premorbid functioning. Theoretically, this practice increases the validity of decision making because individuals serve as their own healthy controls.7 However, the utility of baseline testing compared with the use of normative data in postinjury evaluations has long been debated.8,9 The most recent Consensus Statement on Concussion in Sport4 concluded that baseline testing can be useful in the management of concussion but that there is not enough evidence to suggest that it should be mandatory.
During the past decade, computerized neurocognitive testing has largely replaced traditional paper-and-pencil methods and has become widespread at all levels of sport10 because it is easy to administer and score. This allows athletes to be tested in groups and lowers the barriers to use by a wider range of health care professionals.11 This expansion of the user base underscores the need to monitor the validity of assessment practices, a goal to which computerized baseline testing could contribute. However, computer-administered evaluations also introduce new concerns about the validity of the data they produce.
Performance validity is defined as the extent to which test-taking behavior provides an accurate reflection of the underlying cognitive ability that the instrument was designed to measure. Performance validity tests are objective measures designed to identify response sets that are unlikely to accurately reflect the true ability of the test taker. Valid performance is a basic assumption of neuropsychological testing and a prerequisite for valid and useful clinical decision making.12 Noncredible responding in neurologically intact populations can be attributed to inattentiveness, poor task comprehension, lack of incentive to perform well, or incentive to perform poorly. The influence of these factors varies as a function of age. Children and adolescents are more likely to have invalid performance associated with inattentiveness, poor task comprehension, and a lack of appreciation for the importance of performing their best,13,14 whereas adults have been shown to also be influenced by external incentives.15-18
The base rate of failure (BRF), also known as the base rate of invalid performance, varies as a function of metric, population, and situational variables. Although research on the BRF during baseline testing is limited, a systematic review by Gaudet and Weyandt19 focused specifically on the Immediate Postconcussion Assessment and Cognitive Testing (ImPACT),20 the most widely used computerized neurocognitive test for the management of concussion. An estimated 75% of National Collegiate Athletic Association member schools use the ImPACT for baseline assessments.21 In the studies reviewed, the BRF ranged from 2.7%22 to 27.9%23 (weighted mean, 6.1%).19 These numbers are likely underestimates because they are based on embedded validity indicators (EVIs) with relatively low sensitivity. These BRFs also tend to be lower than those found in neurologically intact young adults who participate in academic research (BRF range, 18.3%-36.7%).24
Although children and adolescents typically pass performance validity tests designed for and normed on adults, elevated BRFs are commonly reported in the pediatric population and are typically attributed to the stage of cognitive development.13 The BRF for children in clinical settings is highly variable, ranging from 0% to 70% (weighted mean, 15.5%).13 Within studies, young children tend to have higher BRFs,25,26 prompting some researchers to suggest that examinees younger than 10 years should be exempt from performance validity testing.27 Other studies found a more complex association between age and BRF that depended on the choice of performance validity test and the cutoff used.25,26,28 Associations with age have also been found in athletes, with younger athletes (aged 10-12 years) having higher BRFs on baseline testing than older athletes (aged 13-18 years).14
Four EVIs have been published for the ImPACT. Two are included in the ImPACT Clinical Manual,29 and the other 2 were developed by independent research teams. In addition to the default ImPACT EVI, which automatically flags invalid performance, the ImPACT Clinical Manual29 provides a second EVI, “Red Flags,” as a more liberal index of suboptimal performance. Two alternative EVIs, introduced by Schatz and Glatts16 and Higgins et al,30 are based on experimental malingering paradigms. These newer EVIs produce higher BRFs than the ImPACT EVIs (Table 1).
The combination of the experimental and observational evidence suggests that true BRFs could be as high as 40% in high school and collegiate athletes and even higher in younger athletes. Group-based computerized testing may be associated with a higher BRF compared with 1-on-1 evaluations because of the lack of close monitoring of test-taking behavior;31 however, other factors, such as adherence to standardized administration and the quality of supervision, may be better predictors of BRF than group size.32 The lack of consensus regarding the most appropriate EVI for determining the validity of ImPACT scores, in combination with wide discrepancies in BRFs across samples, indicators, and research designs, necessitates a direct comparison of the available EVIs in a naturalistic setting. The present study was designed to compare the 4 existing ImPACT-based EVIs across age groups in a large sample of athletes undergoing baseline testing.
The default ImPACT EVI was expected to produce a BRF comparable to the 6.1% reported by Gaudet and Weyandt,19 and the ImPACT Red Flags EVI was expected to have a BRF between 20% and 30%, as reported in previous studies.23,33 Based on previous reports, we predicted that the BRF would be highest for the EVIs introduced by Schatz and Glatts16 and Higgins et al.30 However, the paucity of research on these 2 EVIs in naturalistic settings precluded a more specific prediction. We hypothesized that the BRF on all 4 EVIs would vary across age groups, with a higher BRF in younger athletes. We attributed this partly to the EVIs' reliance on raw scores that are not corrected for developmental changes in the underlying cognitive ability. We also predicted that BRFs would increase as athletes became acculturated to more competitive levels of sport, such as the collegiate level, where the incentive to underperform would be greatest.
Methods
Participants included 7897 consecutively evaluated athletes aged 10 to 21 years who completed baseline neurocognitive testing for the management of concussion. Most of the participants were English speaking (7820 [99.0%]) and right-handed (6848 [86.7%]). The most commonly reported preexisting diagnoses were attention-deficit/hyperactivity disorder (869 participants [11.0%]), dyslexia (145 [1.8%]), and autism (27 [0.3%]). The most commonly played sport was football (1652 [20.9%]), followed by soccer (1252 [15.9%]), volleyball (770 [9.8%]), basketball (733 [9.3%]), hockey (726 [9.2%]), and field hockey (680 [8.6%]).
Data were collected during baseline testing for concussion between January 1, 2012, and December 31, 2016, through the concussion management program of a large hospital in the Midwestern United States. The online version of the ImPACT,34 version 2.1 (ImPACT Applications Inc), was administered to all participants. Testing lasted approximately 45 minutes and was conducted in groups of approximately 20 athletes at schools, at community centers, or in a hospital-based setting. Examinees were overseen by either an athletic trainer or a licensed clinical neuropsychologist. This study was approved by the institutional research ethics boards of Henry Ford Health System and the University of Windsor, which also waived the need for participant informed consent because of the retrospective use of deidentified data. Ethical guidelines regulating research with human participants were followed throughout the study.
The ImPACT is a computerized neurocognitive test designed for baseline and postconcussion assessment. The test yields 5 performance-based cognitive indices (reaction time, visual motor speed, impulse control, visual memory, and verbal memory) and also includes an inventory of postconcussion symptoms. One-year test-retest reliabilities vary from low (intraclass correlation coefficient = 0.22) to high (0.85).35,36 Convergent validity has been established against traditional neuropsychological measures.37 The ImPACT has demonstrated high sensitivity (0.82) and specificity (0.89) for concussion, with an overall classification accuracy of 86%.38
Data were analyzed with IBM SPSS, version 24.0 (IBM Corporation). No data were missing for any variables. For each of the 4 EVIs, the BRF was calculated for the entire sample and separately for each age group (10 through 21 years). A cumulative BRF, defined as the percentage of athletes who failed at least 1 of the 4 EVIs, was calculated in the same way. The risk ratio (RR) of cumulative failure between the 10-year-old group and the 21-year-old group was calculated to summarize the association between age and BRF. Mean ImPACT composite scores were calculated for each age group.
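The BRF definitions above can be illustrated with a short sketch. This is not the SPSS procedure the authors used. The sketch computes a per-EVI BRF, the cumulative BRF (failure on at least 1 of the 4 EVIs), BRFs by age group, and a simple risk ratio. It assumes each athlete's record carries one pass/fail flag per EVI. The EVI keys, the `Athlete` class, the `record` helper, and the toy records are all hypothetical, and the actual ImPACT decision rules are not reproduced.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical keys for the 4 EVIs; the real ImPACT decision rules are not modeled.
EVIS = ("impact_default", "red_flags", "higgins", "schatz_glatts")


@dataclass
class Athlete:
    age: int
    failed: dict  # EVI key -> True if that EVI flagged the protocol as invalid


def brf(athletes, evi=None):
    """Percentage failing one EVI, or at least 1 of the 4 EVIs when evi is None."""
    if not athletes:
        raise ValueError("cannot compute a BRF for an empty group")
    if evi is None:
        n_failed = sum(any(a.failed[e] for e in EVIS) for a in athletes)
    else:
        n_failed = sum(a.failed[evi] for a in athletes)
    return 100.0 * n_failed / len(athletes)


def brf_by_age(athletes, evi=None):
    """BRF computed separately within each age group, keyed by age."""
    groups = defaultdict(list)
    for a in athletes:
        groups[a.age].append(a)
    return {age: brf(groups[age], evi) for age in sorted(groups)}


def risk_ratio(failed_a, n_a, failed_b, n_b):
    """Proportion failing in group A divided by proportion failing in group B."""
    return (failed_a / n_a) / (failed_b / n_b)


def record(age, *failed_evis):
    """Build a hypothetical record that fails only the listed EVIs."""
    return Athlete(age, {e: e in failed_evis for e in EVIS})


# Hypothetical toy records (not study data).
athletes = [
    record(10, "red_flags", "schatz_glatts"),
    record(10),
    record(12, "impact_default"),
    record(21),
]

print(brf(athletes, "red_flags"))  # 25.0
print(brf(athletes))               # 50.0 (cumulative BRF)
print(brf_by_age(athletes))        # {10: 50.0, 12: 100.0, 21: 0.0}
# Reported cumulative failures: 117 of 140 (age 10) vs 14 of 48 (age 21).
print(round(risk_ratio(117, 140, 14, 48), 2))
```

With the reported counts, the ratio of proportions is about 2.87, which agrees with the published RR of 2.86 to within rounding.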
Results
Of the 7897 study participants, 4086 (51.7%) were male, the mean (SD) age was 14.71 (1.78) years, 7820 (99.0%) were primarily English speaking, and the mean (SD) educational level was 8.79 (1.68) years. Across the sample of 7897 participants, the BRF was 6.4% (505 participants) for the default ImPACT EVI, 31.8% (2509) for the ImPACT Red Flags, 34.9% (2759) for the Higgins et al30 logistic regression equation, and 47.6% (3757) for the Schatz and Glatts16 EVI. The cumulative BRF was 55.7% (4400). Examination of the BRF by age demonstrated a strong age association (Figure). The age with the highest cumulative BRF was the 10-year-old group at 83.6% (117 of 140 participants), whereas the 21-year-old group had the lowest at 29.2% (14 of 48 participants) (RR, 2.86; 95% CI, 2.60-3.16; P < .001) (Table 1). Table 2 provides ImPACT mean raw composite scores and percentiles by age.
Discussion
Baseline cognitive testing is common practice in concussion management programs. It was introduced to enhance the clinical utility of postinjury ImPACT data for making return-to-play decisions for athletes. However, given reports of high and fluctuating BRFs across validity indicators and cutoffs, the validity of baseline data has become a source of concern. To evaluate this concern empirically and to characterize it across development, we compared the BRF on the 4 previously published EVIs for the ImPACT. As expected, the default ImPACT EVI produced the lowest BRF (6.4%), comparable with the 6.1% rate found by Gaudet and Weyandt.19 The ImPACT Red Flags identified 31.8% of the sample as invalid, and the Higgins et al30 logistic regression equation identified 34.9%. The EVI developed by Schatz and Glatts16 had the highest BRF, at 47.6%. The cumulative BRF (individuals with ≥1 EVI failure) was 55.7%.
Consistent with our prediction, the youngest athletes (aged 10-11 years) had the highest BRF across all 4 EVIs. Age and invalid performance had a strong negative association; older athletes were less likely to fail EVIs. However, contrary to our prediction, the collegiate age groups did not show an increase in BRF, with the isolated exception of the 19-year-old group. The BRF continued to decrease across ages 20 and 21.
The literature on performance validity at baseline focuses largely on deliberate suppression of performance (ie, “sandbagging”) in high school and collegiate athletes. Invalid performance, however, is multifactorial. It includes age-related fluctuations in attention and comprehension characteristic of the stage of cognitive development,13 as well as a varying ability to appreciate the importance of demonstrating one’s true ability level during cognitive testing. The youngest athletes are most likely to be negatively affected by such confounding variables, which could explain their high BRF. Notably, these factors are likely to be independent of conscious downward manipulation of test scores.27,28 Nevertheless, they place examinees at higher cumulative risk of failing EVIs. This is particularly so because EVIs are distributed throughout the test battery, whereas free-standing performance validity tests typically assess validity at discrete points in time.
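The cumulative-risk argument can be made concrete with a simple probability model. For illustration only, assume that each indicator flags an examinee independently, with its own probability p1, p2, and so on. The chance of failing at least one indicator is then 1 − (1 − p1)(1 − p2)…(1 − pk), which grows with every indicator added. The independence assumption is ours and is unrealistic for the ImPACT, because some EVIs share subtest scores. The helper name `prob_any_failure` is hypothetical.

```python
import math


def prob_any_failure(rates):
    """P(failing >= 1 indicator), assuming independent indicators with the given rates."""
    if any(not 0.0 <= p <= 1.0 for p in rates):
        raise ValueError("each rate must lie in [0, 1]")
    # Probability of passing every indicator is the product of the pass rates.
    return 1.0 - math.prod(1.0 - p for p in rates)


# Hypothetical per-indicator rate of 10%: more checkpoints, more chances to fail one.
for k in (1, 2, 4):
    print(k, round(prob_any_failure([0.10] * k), 3))

# Single-EVI BRFs reported in this study, plugged into the independence model.
reported = [0.064, 0.318, 0.349, 0.476]
print(round(prob_any_failure(reported), 3))
```

Under independence, the 4 reported single-EVI BRFs would imply a cumulative BRF of roughly 78%, which is well above the observed 55.7%. A gap in this direction is what one would expect if failures on different EVIs tend to co-occur, as they might when indicators share subtest scores.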
The BRFs varied considerably across EVIs. The default ImPACT EVI and ImPACT Red Flags rely on some of the same subtest scores but use different cutoffs: 2 SDs below the mean for the default ImPACT EVI and 1.5 SDs below the mean for ImPACT Red Flags. The 2 newest EVIs, on the other hand, were calibrated empirically using experimental malingering paradigms in high school athletes and nonathlete college students. Research suggests that performance validity tests designed for adults can be applied to children.13 Even so, extending adult cutoffs downward may ignore important developmental trajectories in cognitive ability and increase the likelihood of false-positive errors in younger age groups.
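A simple model shows why the cutoff alone changes the BRF, and why a reference mean that sits above a younger group's mean inflates failures. Consider a single score that is normally distributed in valid performers. This sketch rests on our own assumptions: normality, equal SDs across groups, and one score per cutoff. The actual EVIs combine several subtests, so these per-score rates are not predictions of the observed BRFs. The function name `expected_flag_rate` and its `shift_sd` parameter are hypothetical.

```python
from statistics import NormalDist


def expected_flag_rate(cutoff_sd, shift_sd=0.0):
    """Share of valid performers scoring below (reference mean - cutoff_sd * SD).

    shift_sd: how far the group's own mean lies below the reference mean, in
    reference SD units (0 = group matches the reference). Assumes normal scores
    with the same SD as the reference sample.
    """
    return NormalDist(mu=-shift_sd, sigma=1.0).cdf(-cutoff_sd)


print(f"{expected_flag_rate(2.0):.1%}")                # 2-SD cutoff, matched group
print(f"{expected_flag_rate(1.5):.1%}")                # 1.5-SD cutoff, matched group
print(f"{expected_flag_rate(2.0, shift_sd=0.5):.1%}")  # 2-SD cutoff, group 0.5 SD lower
```

A group whose mean is only 0.5 SD below the reference is flagged by the stricter 2-SD rule as often as a matched group is flagged by the 1.5-SD rule, roughly 7% rather than about 2%.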
Experimental malingering studies risk overfitting their detection models to an atypical (ie, artificially exaggerated) manifestation of invalid responding. Namely, these EVIs were calibrated to detect the most egregious forms of invalid responding in examinees who had little incentive to avoid detection. This limits their generalizability to real-life situations with a potentially substantial (perceived) reward for successful malingering.24 This suggests that (1) these EVIs may not apply to athletes below the high school level and (2) the current BRF in collegiate athletes, and possibly high school athletes, may be underestimated, as these EVIs may not detect more subtle or sophisticated forms of noncredible responding.
The age association may partly be an artifact of norming practices within the ImPACT. Specifically, given the strong association between age and BRF, the use of wide age bands in the normative sample puts the youngest child within an age band at higher risk of failing the EVIs than the oldest child within the same band. Consistent with this explanation, the greatest changes in the slope of BRF occur at ages 12 and 18 years, which mark the upper limits of their respective age bands. Further evidence comes from the pattern of percentile scores across ImPACT scales (Table 2), which are intended to measure relative standing in cognitive ability. These scores are consistently lower for the 10- and 11-year-old groups. These youngest examinees are statistically at higher risk of failing 1 of the EVIs, suggesting that the relative contributions of ability and test-taking effort cannot be separated at the low end of the age distribution. The particularly high BRF for 10- and 11-year-olds may be resolved by the recently published ImPACT Pediatric,39 which is intended for children aged 5 to 11 years; however, this hypothesis is yet to be tested empirically. The association between age and BRF may also reflect a cohort effect between the current sample and the cohort from which the normative data were gathered more than 17 years ago.
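The age-band explanation can be illustrated with a toy normative band. The sketch makes several hypothetical assumptions:

- Mean raw scores rise linearly across ages 10, 11, and 12.
- The within-age SD is constant.
- Each age contributes equally to the band norms.
- A single score is flagged when it falls 2 SDs below the pooled band mean.

Under these assumptions, the youngest members of the band are flagged far more often than the oldest. All numbers below are invented for illustration and are not ImPACT normative data. The helper `band_flag_rates` is hypothetical.

```python
import math
from statistics import NormalDist, fmean, pvariance


def band_flag_rates(age_means, within_sd, cutoff_sd=2.0):
    """Flag rate at each age when the cutoff comes from norms pooled over the band.

    age_means: hypothetical mean raw score at each age in the band
    within_sd: hypothetical SD of raw scores within a single age
    Assumes normal scores and equal numbers of examinees per age in the norms.
    """
    means = list(age_means.values())
    band_mean = fmean(means)
    # Pooled SD = within-age spread plus spread of the age means around the band mean.
    band_sd = math.sqrt(within_sd ** 2 + pvariance(means))
    cutoff = band_mean - cutoff_sd * band_sd
    return {age: NormalDist(m, within_sd).cdf(cutoff) for age, m in age_means.items()}


rates = band_flag_rates({10: 40.0, 11: 45.0, 12: 50.0}, within_sd=10.0)
for age, rate in rates.items():
    print(age, f"{rate:.1%}")
```

In this toy band, a 10-year-old with fully valid effort is more than 10 times as likely to be flagged as a 12-year-old. The difference arises purely from where each age sits within the band.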
In the absence of data on independent, well-established performance validity tests, the true meaning of EVI failures and of the unusually high BRF in the youngest athletes is ultimately unknown. At the descriptive level, the age and instrumentation artifacts (ie, varying BRF as a function of examinee age and EVI cutoff) are compelling and warrant follow-up investigations. The very high BRFs in 10- to 12-year-old children present a serious challenge for clinical interpretation and demand a sensible explanation, but the available evidence precludes any definitive conclusion. Both extremes must be considered. The BRFs could represent false-positive errors (ie, undeveloped cognitive skills that were mistaken for invalid responding by validity cutoffs designed for adults). Alternatively, they could represent true-positives (ie, most young athletes produce invalid data). Future research should examine this issue. To the extent that the high BRFs represent false-positives, existing ImPACT EVIs should be recalibrated by applying more conservative cutoffs for failure to account for developmental factors. If the high BRFs represent true-positives, the majority of baseline data would be meaningless, calling into question the utility of the practice in this age group.
These findings have several practical implications. The high BRF in the youngest athletes and the unexpectedly lower BRF in collegiate-age athletes necessitate age-specific approaches to the management of concussion. Given that more than 4 of 5 children aged 10 years failed EVIs, the clinical interpretation and utility of baseline testing in this age range are highly questionable. Until the high BRF in young children is better understood, this group could be exempted from group-administered, computerized baseline testing. This finding also raises concerns about the validity of postinjury test results in young children, suggesting that their postinjury data should be interpreted with caution. Baseline assessment of athletes aged 13 to 18 years should include procedures aimed at reducing BRF, such as strict adherence to standardized administration practices. Specific attention should be paid to extratest factors that may increase the BRF, for example by limiting group size, providing close supervision, and minimizing distractions during testing. For collegiate-age athletes, determinations about the validity of baseline test scores should use the cumulative EVI because individual cutoffs may be insensitive in this age group.
Providing BRF as a function of age helps clinicians contextualize the performance of athletes under their care. Knowing how common a given EVI failure is can guide diagnostic decision making because BRF foreshadows classification accuracy.40 However, the high overall BRF signals a potential confound in the measurement model. In the absence of objective, well-validated criterion measures, the degree to which the high BRF reflects false-positive errors or truly invalid response sets has yet to be determined. The EVIs should therefore be validated in the standard way, against other well-established performance validity tests within a neuropsychological battery administered in the traditional 1-on-1 fashion. This approach has the potential to isolate situational artifacts within the ImPACT (computerized group administration of a battery designed for a specific purpose). Depending on the findings, EVIs may need to be recalibrated to account for examinee age and assessment context. These recommendations, combined with the complex nature of the clinical diagnosis of concussion, highlight the need for concussion programs to include members with advanced knowledge of psychometric testing and performance validity assessment in addition to knowledge of concussion management.
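The claim that BRF foreshadows classification accuracy follows from Bayes' theorem. For an indicator with fixed sensitivity and specificity, the probability that a failure is genuine (positive predictive value) depends on how common invalid performance truly is. The sketch also applies the Rogan-Gladen correction, a standard way to back out true prevalence from an observed failure rate when sensitivity and specificity are known. The sensitivity (0.80) and specificity (0.90) used here are hypothetical, as are the helper names `predictive_values` and `rogan_gladen`. The actual accuracy of the ImPACT EVIs against independent criterion measures is unknown.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for an indicator at a given true prevalence of invalid performance."""
    tp = sensitivity * prevalence
    fp = (1.0 - specificity) * (1.0 - prevalence)
    fn = (1.0 - sensitivity) * prevalence
    tn = specificity * (1.0 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)


def rogan_gladen(observed_rate, sensitivity, specificity):
    """Estimate true prevalence from an observed failure rate (clipped to [0, 1])."""
    denom = sensitivity + specificity - 1.0
    if denom <= 0:
        raise ValueError("sensitivity + specificity must exceed 1")
    return min(1.0, max(0.0, (observed_rate - (1.0 - specificity)) / denom))


SENS, SPEC = 0.80, 0.90  # hypothetical values, not published ImPACT EVI accuracy

for prevalence in (0.05, 0.30, 0.60):
    ppv, npv = predictive_values(SENS, SPEC, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.2f}, NPV {npv:.2f}")

# Hypothetical observed failure rate of 30%, interpreted under the accuracy above.
print(round(rogan_gladen(0.30, SENS, SPEC), 3))
```

With these invented values, fewer than a third of failures would be genuine at a 5% prevalence, and an observed 30% failure rate would correspond to a true prevalence of about 29%. Small changes in the assumed specificity move these figures substantially. This is why the false-positive question cannot be settled without criterion measures.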
Conclusions
Results converge on a number of conclusions:

1. The high BRF, particularly for younger athletes, poses significant concerns regarding the validity of the baseline ImPACT data.
2. The factors that may contribute to invalid performance vary by age. Adopting age-appropriate test administration strategies may therefore lower BRF (eg, item content and test instructions that match the examinee’s stage of cognitive development, individual administration, improved supervision, and strict adherence to standardized administration).
3. Clinicians should routinely consider performance validity, as measured by all 4 EVIs, as well as the age-specific BRF when making return-to-play decisions based on postinjury evaluations.
Accepted for Publication: December 8, 2017.
Corresponding Author: Christopher A. Abeare, PhD, Department of Psychology, University of Windsor, 401 Sunset Ave, Windsor, ON N9B 3P4, Canada (firstname.lastname@example.org).
Published Online: March 12, 2018. doi:10.1001/jamaneurol.2018.0031
Author Contributions: Drs Abeare and Merker had full access to all of the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis.
Study concept and design: Abeare, Messa, Zuccato, Erdodi.
Acquisition, analysis, or interpretation of data: Abeare, Zuccato, Merker, Erdodi.
Drafting of the manuscript: Abeare, Messa, Zuccato, Erdodi.
Critical revision of the manuscript for important intellectual content: All authors.
Statistical analysis: Abeare, Zuccato, Erdodi.
Administrative, technical, or material support: Abeare, Zuccato, Merker, Erdodi.
Study supervision: Abeare, Merker.
Conflict of Interest Disclosures: None reported.