Validation of Medicare Rehabilitation Functional Assessments in Routine Care

Key Points
Question: Are functional assessments in rehabilitation settings valid measures?
Findings: In this cross-sectional study of 1036 Medicare beneficiaries, the correlation of functional assessments from rehabilitation services with National Health and Aging Trends Study assessments was 0.63 when assessments were performed no more than 90 days apart and 0.66 when performed no more than 30 days apart. Differences in assessment scores were generally small; however, assessments from the National Health and Aging Trends Study tended to be lower than those from rehabilitation services in a small sample of older adults with less function.
Meaning: These findings suggest that Medicare rehabilitation functional assessments may be sufficiently associated with functional outcomes for use in some research applications.


Introduction
Function is an important patient-centered outcome, but it is difficult to assess on a national scale.
Medicare beneficiaries undergo functional assessments while receiving rehabilitation services from inpatient rehabilitation facilities (IRFs), skilled nursing facilities (SNFs), and home health agencies (HHAs). The assessments are based on the individual's ability to perform specific activities, as measured by rehabilitation staff. While the purpose of the assessments is to estimate resource needs and determine payment amounts, these data are also available for use in research studies. However, the validity of the assessments performed in the rehabilitation setting is not known.
The assessment instruments used in rehabilitation settings are the Functional Independence Measure (FIM), Minimum Data Set, and the Outcome and Assessment Information Set (eAppendix and eTable 1 in the Supplement). These instruments overlap in the activities and scales used to rate functioning. Prior studies support the reliability and validity of these instruments. [1][2][3][4][5][6][7] However, prior studies were generally performed in small samples of patients and with a limited number of evaluators who received specific training. To our knowledge, nationally representative validation of routine care assessments has not been performed. Because of the lack of large-scale validation, concerns have been raised about the appropriateness of using the data for research purposes. 3,7 These concerns are based on the large number of staff who perform the assessments, the lack of required standardized training on evaluation and scoring, and a potential bias because of facility performance and associated payments.
The validity of Medicare rehabilitation assessments can be explored by comparing them with criterion-standard functional assessments with known reliability and validity, 8,9 such as those obtained as part of the National Health and Aging Trends Study (NHATS), a large population-based survey of health and functional ability trends among Medicare beneficiaries aged 65 years and older. 10 The NHATS is also linked to Medicare, creating the opportunity to compare functional assessments from rehabilitation settings with research-based functional assessments. The advantages of this comparison are that the data are drawn from a large, nationally representative sample in routine practice, masked to any knowledge of a subsequent comparison with a criterion standard that is complete and rigorous. Disadvantages of this comparison are the variable time interval between Medicare and NHATS assessments as well as differences in the instruments used by the various rehabilitation settings.
In this study, we aimed to compare the functional assessments performed in routine care Medicare rehabilitation settings with the NHATS assessments considered to be the criterion standard. To enable the comparison, we crosswalked overlapping functional instruments and limited the comparisons to NHATS assessments performed no more than 90 days after an assessment in a rehabilitation setting. If the correlation and agreement between the assessments are satisfactory, they would support the use of Medicare rehabilitation functional assessments in some research contexts.

Study Design and Setting
The design was a cross-sectional validation study using retrospective data comparing functional assessments in Medicare rehabilitation settings with similar criterion-standard assessments from the NHATS. The setting was Medicare rehabilitation facilities (IRFs and SNFs), and older adults' homes for HHA and NHATS assessments. The study was approved by the University of Michigan institutional review board, and written informed consent was obtained from NHATS participants at the time of enrollment in NHATS. This study followed the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) reporting guideline.

Data Sources
We used the 2011 to 2015 NHATS linked with Medicare files. The data were linked by the Centers for Medicare & Medicaid Services. Rehabilitation functional assessment data from IRFs, SNFs, or HHAs were obtained from Medicare functional assessment files. The NHATS is a population-based survey, with oversampling of the oldest population and African American participants. Trained staff perform annual in-person data collection from participants regarding their physical and cognitive function, social environment, and participation in daily activities. Detailed methods of the NHATS have been published previously. 8,11

Study Population
The inclusion criteria were NHATS participants with IRF, SNF, or HHA discharge rehabilitation claims no more than 90 days before the NHATS assessment. We excluded assessments when the participant was in a nursing home at the time of NHATS assessment because self-care activities are not obtained in this setting. We also excluded assessments when a hospitalization occurred in the period between the rehabilitation assessment and the NHATS assessment because we anticipated that the hospitalization could lead to a change in function that would not be captured by the assessments. When an individual had more than 1 eligible assessment, we used the assessment closest in time to the subsequent NHATS assessment, regardless of setting. Race/ethnicity was self-reported using options defined by NHATS.
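The selection rules above can be sketched in a few lines of Python. This is a minimal illustration, not the study's actual code: the `RehabAssessment` record, its field names, and the `hospitalized_before_nhats` flag are hypothetical stand-ins for whatever the linked claims files provide.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

MAX_GAP_DAYS = 90  # window stated in the inclusion criteria


@dataclass
class RehabAssessment:
    setting: str                     # "IRF", "SNF", or "HHA"
    discharge_date: date             # date of the discharge assessment
    hospitalized_before_nhats: bool  # hypothetical flag: interim hospitalization


def select_assessment(
    candidates: List[RehabAssessment],
    nhats_date: date,
    in_nursing_home: bool,
) -> Optional[RehabAssessment]:
    """Return the eligible rehabilitation assessment closest to the NHATS date."""
    if in_nursing_home:
        # Self-care items are not collected by NHATS in nursing homes.
        return None
    eligible = [
        a for a in candidates
        if 0 <= (nhats_date - a.discharge_date).days <= MAX_GAP_DAYS
        and not a.hospitalized_before_nhats
    ]
    if not eligible:
        return None
    # The latest eligible discharge is the one closest to the NHATS date,
    # regardless of setting.
    return max(eligible, key=lambda a: a.discharge_date)
```

Because every eligible assessment precedes the NHATS date, picking the latest discharge date is equivalent to picking the smallest gap.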

Variables
There are different but overlapping functional assessment instruments for each rehabilitation setting, ie, the FIM, the Minimum Data Set, and the Outcome and Assessment Information Set (eTable 1 in the Supplement). We identified 6 functional capacity instrument components based on similar domains and similar scoring scales in both NHATS and the rehabilitation assessments. The overlapping components were eating, toilet hygiene, bathing, dressing, bed transfers, and mobility or walking. For each component, the scores indicated whether help (from a device or another person) was required to perform the activity and the extent of any help required, ranging from completely independent to completely dependent.
The questions used in the instruments were directly comparable for eating, toilet hygiene, and bathing. For the other domains, there were some differences in both the specific domains and how they were queried. Despite these differences, we included these domains because we concluded it was likely that the elements were sufficiently similar to capture the key underlying constructs. For bed transfers, the NHATS questions asked about transitions from bed, whereas the FIM question asked about transitions from bed, chairs, or wheelchairs. The dressing activity variable in NHATS does not specify upper or lower body, whereas the FIM uses separate variables for upper and lower body dressing. We used the most dependent FIM dressing score for the comparison, as done in previous research. 12 For mobility and walking functional capacity, we used the NHATS mobility outside questions and the FIM's locomotion assessment. Item scoring by rehabilitation setting appears in eTable 2, eTable 3, and eTable 4 in the Supplement. Each FIM item is scored from 1 to 7, with higher scores representing greater function. Based on our 6 FIM components, the overall functional score ranged from 6 to 42.
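As a hedged illustration of the FIM-side scoring just described, the sketch below collapses the two FIM dressing items into one by taking the most dependent (lowest) score and then sums the six components. The function and parameter names are assumptions made for this example; the item-level crosswalks themselves are in the Supplement and are not reproduced here.

```python
FIM_MIN, FIM_MAX = 1, 7  # each FIM item is scored from 1 (dependent) to 7 (independent)


def overall_function_score(
    eating: int,
    toileting: int,
    bathing: int,
    dressing_upper: int,
    dressing_lower: int,
    bed_transfer: int,
    locomotion: int,
) -> int:
    """Sum the six overlapping components on the FIM scale."""
    # The most dependent dressing score is the lower of the two FIM items.
    dressing = min(dressing_upper, dressing_lower)
    items = [eating, toileting, bathing, dressing, bed_transfer, locomotion]
    for score in items:
        if not FIM_MIN <= score <= FIM_MAX:
            raise ValueError(f"item score {score} is outside {FIM_MIN}-{FIM_MAX}")
    return sum(items)
```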
For rehabilitation setting assessments, staff score each item based on observed abilities during the previous 1 to 7 days. For NHATS, trained interviewers ask participants to self-report the level of function for each activity over the last month. 8,10,13 For each item, individuals are first asked if they require a device to perform the activity. Next, individuals are asked if anyone has helped them with the activity in the last month. Individuals who report help are then asked how often they performed the activity by themselves and without help. Variables in NHATS were converted to a 7-point scale to match the FIM scale (eTable 2, eTable 3, and eTable 4 in the Supplement).
JAMA Network Open | Statistics and Research Methods

To calculate the days between the rehabilitation assessment and the NHATS assessment, we used the date of the rehabilitation assessment and the fifteenth day of the month for the NHATS assessment. The fifteenth day of the month was used for the NHATS date because NHATS only reports the month and year of the assessment.
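The date convention described here is simple enough to show directly. The following is a small sketch, with a hypothetical function name, of how the interval could be computed when only the NHATS month and year are known.

```python
from datetime import date

NHATS_ASSUMED_DAY = 15  # NHATS reports only month and year, so the 15th is imputed


def days_between(rehab_date: date, nhats_year: int, nhats_month: int) -> int:
    """Days from the rehabilitation assessment to the imputed NHATS date."""
    nhats_date = date(nhats_year, nhats_month, NHATS_ASSUMED_DAY)
    return (nhats_date - rehab_date).days
```

For example, a rehabilitation assessment dated May 20, 2013, paired with an NHATS assessment recorded as June 2013 gives `days_between(date(2013, 5, 20), 2013, 6)`, which evaluates to 26.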

Primary and Secondary Outcomes
The primary outcome was the overall function score. This was calculated by summing the scores for the 6 individual activities in each rehabilitation setting and NHATS. Secondary outcomes were the individual components of the overall function score.

Statistical Analysis
We used descriptive statistics to summarize the population, days between assessments, and the function scores. We compared the correlation of the overall function score between Medicare functional assessments and NHATS assessments using the Pearson correlation coefficient for eligible assessments within 90 days and separately for assessments within 30 and 15 days. We used linear regression to examine the association of the rehabilitation facility score with the NHATS score, the intercept, and the squared value of the correlation coefficient (ie, R²) for the rehabilitation setting and NHATS assessment, adjusting for days between the assessments. Next, we added a variable of rehabilitation facility setting to assess differences explained by rehabilitation setting. We calculated the difference in assessment scores (NHATS score − rehabilitation service score) and used Bland-Altman difference against the mean plots to assess agreement between the 2 assessments and to visually inspect for variation in the differences in scores across the range of mean scores. 14 Items with missing data were infrequent (ie, <1% per item) and therefore excluded. All analyses were performed using Stata version 15.1 (StataCorp) and SAS statistical software version 9.4 (SAS Institute). Data analyses were performed from June 2019 to November 2019. Statistical significance was set at P < .05, and all tests were 2-tailed.
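The two agreement statistics named above can be sketched with the Python standard library. This is an illustrative reimplementation, not the Stata or SAS code used in the study; the function names are assumptions, and the band of 2 SDs around the mean difference is chosen to match the threshold reported in the Results.

```python
import math
from statistics import mean, stdev
from typing import Dict, List, Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def bland_altman(
    nhats: Sequence[float], rehab: Sequence[float], k: float = 2.0
) -> Dict[str, object]:
    """Bias, agreement band, and plot points for a Bland-Altman analysis."""
    # Difference is defined as NHATS score minus rehabilitation score.
    diffs = [n - r for n, r in zip(nhats, rehab)]
    means = [(n + r) / 2 for n, r in zip(nhats, rehab)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    points: List[Tuple[float, float]] = list(zip(means, diffs))
    return {
        "bias": bias,
        "lower": bias - k * sd,
        "upper": bias + k * sd,
        "points": points,  # (mean of the pair, difference) for plotting
    }
```

Plotting `points` with horizontal lines at `bias`, `lower`, and `upper` gives the difference-against-mean plot; the squared value of `pearson_r` corresponds to R² in an unadjusted simple regression.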

Study Population
From 2011 to 2015, we identified 6436 NHATS assessments that matched Medicare rehabilitation setting functional assessments. After excluding NHATS assessments that occurred more than 90 days after the rehabilitation assessment, assessments with an interim hospitalization, and multiple assessments per individual, our final study population included 1036 individuals (eFigure 1 in the Supplement). Characteristics of the study population are presented in

Correlation of Summary Assessments
The mean (SD) rehabilitation service functional score was 27.5 (7.  In the linear regression model adjusting for days between assessments, the rehabilitation assessments had a strong association with the NHATS assessments (β = 0.87; 95% CI, 0.80 to 0.93).

Days since rehabilitation setting assessment was associated with a small increase in the NHATS assessment score (β = 0.02; 95% CI, 0.00 to 0.04). The model intercept was 5.73 (95% CI, 3.77 to 7.70), and the overall R² was 0.40 (Table 2). In the model that added rehabilitation setting, the association of the rehabilitation assessment with the NHATS assessments increased (β = 1.00; 95% CI, 0.93 to 1.08). The SNF setting assessment was associated with an increase in the disability score compared with the HHA setting (β = 4.02; 95% CI, 2.72 to 5.31), the intercept decreased to 0.72 (95% CI, −1.79 to 3.24), and the R² increased slightly to 0.42.
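To make the first model concrete, the reported point estimates can be written out as a prediction equation. The sketch below uses only the rounded coefficients quoted in the text (intercept 5.73, β = 0.87 for the rehabilitation score, β = 0.02 per day); it ignores the confidence intervals and the setting-adjusted model, so it should be read as an illustration rather than a validated crosswalk. The function name is an assumption.

```python
# Rounded point estimates from the model adjusting for days between assessments.
INTERCEPT = 5.73
BETA_REHAB = 0.87  # per point of rehabilitation function score
BETA_DAYS = 0.02   # per day since the rehabilitation assessment


def expected_nhats_score(rehab_score: float, days_since: float) -> float:
    """Expected NHATS score implied by the reported linear model."""
    return INTERCEPT + BETA_REHAB * rehab_score + BETA_DAYS * days_since
```

Under these estimates, a rehabilitation score of 30 assessed on the same day corresponds to an expected NHATS score of about 31.8, and each additional 30 days between assessments adds about 0.6 points.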

Agreement of Summary Assessments
The mean (SD) difference of the rehabilitation service scores (NHATS score − rehabilitation score) was 2.96 (7.91). Plots of the differences in the scores are displayed in Figure 2. The histogram of differences is consistent with a normal distribution (Figure 2A). The Bland-Altman plot of the differences in scores compared with the mean of the score showed that only 59 of 1036 individuals (5.7%) had a difference in function scores that was more than 2 SDs of the mean difference (Figure 2B). Scores on NHATS were slightly higher than rehabilitation service scores at the high end of mean function scores. A relatively small number of individuals had low mean function scores (156 of 1036 individuals [15.1%] with mean scores ≤20). Among individuals with mean scores of 20 or less, there was a low frequency of individuals with NHATS scores greater than rehabilitation scores, particularly from the HHA setting, which had 64 individuals in this range (Figure 2; eFigure 2 in the Supplement). We did not observe a pattern of variation in differences by days between assessments (Figure 2B).
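As a worked check on the figures above, the agreement band implied by the reported mean and SD of the differences can be computed directly. This assumes the band is the mean difference plus or minus 2 SDs, which is how the 2-SD threshold in the text is read here; the variable names are illustrative.

```python
MEAN_DIFF = 2.96  # reported mean of (NHATS score - rehabilitation score)
SD_DIFF = 7.91    # reported SD of the differences
K = 2             # number of SDs used for the band (assumed from the text)

lower_limit = MEAN_DIFF - K * SD_DIFF
upper_limit = MEAN_DIFF + K * SD_DIFF

# Share of the 1036 individuals reported to fall outside the band.
outside_share = 59 / 1036

print(f"band: {lower_limit:.2f} to {upper_limit:.2f}")
print(f"outside: {outside_share:.1%}")
```

The band works out to roughly −12.9 to 18.8 points on the summed scale, and the 5.7% falling outside it is close to the share expected beyond 2 SDs under a normal distribution.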

Correlation of Individual Components
Mean scores and correlations of the 6 individual disability components by assessment time difference are displayed in Table 3

Discussion
Our study of more than 1000 individuals found that functional assessments in rehabilitation settings are correlated with criterion-standard research assessments. The assessments also had good overall agreement. These findings provide important new evidence to support the use of routine care assessments as a functional outcome measure in some contexts. Given that these measures can be linked to Medicare claims, the range of potential applications of these measures is broad.
The correlation of the 2 assessments performed within 90 days was in the range of what is considered moderate by some and substantial by others. [15][16][17] The magnitude of correlation suggests that the 2 assessments measure a similar underlying construct of function, which ranges from an independent to a dependent status. We also found that the correlation of the assessments increased slightly as the time between assessments narrowed from 90 days to 30 days. This finding supports construct validity, given that we expected function to change somewhat after rehabilitation, with some people continuing to improve while others experience some worsening.
The analysis of the differences of the scores found a small bias for higher NHATS scores overall and a normal distribution of differences with few outliers. The slightly higher scores in NHATS could be in part because of a difference in the scaling of items in NHATS compared with SNF and HHA, considering that the NHATS scale for each functional item was up to 7 points for individuals who are independent and do not use devices, whereas the functional scales were mostly truncated at 6 points for SNF and HHA. 12 Therefore, a fully independent person who does not use devices would score a 42 in NHATS (6 items × 7 points) but a 36 in SNF or HHA (6 items × 6 points). Another potential reason for differences in scores is that the NHATS functional scores were obtained by self-report whereas function in rehabilitation services was rated by staff observing the patient. We were not able to determine with our data which approach was a more accurate overall measure of function.
While the overall NHATS scores were slightly higher than rehabilitation scores, we found that most individuals with lower mean function scores had NHATS assessments lower than rehabilitation assessments, particularly when compared with HHA assessments. The reason for this finding is not certain. First, only 156 individuals (64 from HHA) had lower mean function, so it is possible that the differences at the lower mean range were random selection error. Other possibilities are that individuals with lower mean scores may have had conditions that rendered them more susceptible to functional worsening after rehabilitation (eg, degenerative disorders) or conditions associated with lower self-perception of independence (eg, depression) compared with independent staff ratings.
Future studies are needed to evaluate how reasons for rehabilitation and comorbidities influence agreement in the scales.
Functional assessments, a highly patient-centered measure, are challenging to incorporate into longitudinal studies. The Medicare data make it feasible to better understand function and the factors associated with it among older adults. The assessments could be used in observational studies linked to Medicare or potentially as a clinical trial outcome. Based on our findings, it would be appropriate to use data from rehabilitation settings in large studies like ours with either a similar distribution of function or populations with moderate to high function. We recommend caution in using these data in populations with low function until we have a better understanding of why rehabilitation scores are typically higher than NHATS scores in individuals with lower mean function scores.

Strengths and Limitations
The primary strengths of our study were rehabilitation assessments from routine care, a large sample, and a rigorous criterion standard. Including data from all 3 rehabilitation settings was another strength. Correlations for each setting were in the moderate range. By including all settings, we were able to derive an equation that can be used to standardize functional scores across sites.
Our study has important limitations in addition to those already noted. We were only able to select FIM components that overlap in topics and scoring scales with the NHATS assessment. We had relatively few assessments from IRF after applying our exclusions. Because NHATS dates are limited to month and year and we imputed the fifteenth day of the month, our difference in days could be off by up to approximately 15 days. Our analysis was limited to discharge assessments from rehabilitation services. Therefore, we were not able to make conclusions about the validity of admission and interim assessments.
It is important to note that, as of fiscal year 2020, the FIM assessments are no longer performed in IRFs. The assessment was replaced by the quality indicators in the Quality Reporting Program (QRP). Based on the considerable overlap of FIM and the QRP items, the FIM was dropped to reduce administrative burden. Although a separate validation of the NHATS with QRP-graded function would be preferred, it seems likely that QRP measures will also generally correlate with criterion-standard measures of function given the similarity of FIM and QRP items. The major difference in FIM and QRP items is in the scaling of the items. Compared with the 7-point scales on the FIM, the QRP has 6-point scales because it does not consider the use of a device. The QRP categories also have a slight difference in what is considered supervision and the percentage of effort a helper provides to perform the activity. The QRP items are specifically used as quality metrics and therefore bias in scoring is also a concern, as it was with the FIM.

Conclusions
This study extends our knowledge of the validity of rehabilitation functional assessments in the routine care setting. Our results demonstrated that these data were correlated with and agreed with a rigorous criterion-standard functional assessment. These findings suggest that rehabilitation facility assessments provide sufficiently accurate estimates of function for application in some research contexts.