Assessment of the Medicare Advantage Risk Adjustment Model for Measuring Veterans Affairs Hospital Performance

IMPORTANCE Policymakers and consumers are eager to compare hospitals on performance metrics, such as surgical complications or unplanned readmissions, measured from administrative data. Fair comparisons depend on risk adjustment algorithms that control for differences in case mix.

OBJECTIVE To examine whether the Medicare Advantage risk adjustment system version 21 (V21) adequately risk adjusts performance metrics for Veterans Affairs (VA) hospitals.

DESIGN, SETTING, AND PARTICIPANTS This cohort analysis of administrative data from all 5.5 million veterans who received VA care or VA-purchased care in 2012 was performed from September 8, 2015, to October 22, 2018. Data analysis was performed from January 22, 2016, to October 22, 2018.

EXPOSURES A patient's risk as measured by the V21 model.

MAIN OUTCOMES AND MEASURES The main outcome was total cost, and the key independent variable was the V21 risk score.

RESULTS Of the 5 472 629 VA patients (mean [SD] age, 63.0 [16.1] years; 5 118 908 [93.5%] male), the V21 model identified 694 706 as having a mental health or substance use condition. In contrast, a separate classification system for psychiatric comorbidities identified another 1 266 938 patients with a mental health condition. The V21 model missed depression not otherwise specified (396 062 [31.3%]), posttraumatic stress disorder (345 338 [27.3%]), and anxiety (129 808 [10.2%]). Overall, the V21 model underestimated the cost of care by $2314 (6.7%) for every person with a mental health diagnosis.

CONCLUSIONS AND RELEVANCE The findings suggest that current aspirations to engender competition by comparing hospital systems may not be appropriate or fair for safety-net hospitals, including the VA hospitals, which treat patients with complex psychiatric illness. Without better risk scores, which are technically possible, outcome comparisons may mislead consumers and policymakers and possibly aggravate inequities in access for such vulnerable populations.
JAMA Network Open. 2018;1(8):e185993. doi:10.1001/jamanetworkopen.2018.5993

The evolution of Hospital Compare is consistent with efforts to increase transparency and competition. 2 For the VA hospitals, this push coincides with the passage of the $55 billion VA Mission Act, which supports veterans' ability to choose where they get care. Although it seems reasonable to suggest that greater transparency and any ensuing competition will help patients, including veterans, some researchers have suggested that the VA hospitals do not compare well with commercial hospitals and that the VA hospitals should expand their role as purchasers. 3 However, the Commission on Care, among others, concluded that the VA hospitals work well but need modernization so that they can be a learning health care system, as envisioned by the Institute of Medicine. 4,5 Whether increasing transparency through hospital comparisons will motivate socially beneficial competition is unclear. The CMS publishes performance metrics on Hospital Compare, but the risk adjustment algorithms underlying these metrics are often unclear. The recent literature has questioned whether existing risk adjustment algorithms, including those used by the CMS to pay Medicare Advantage (MA) plans, accurately adjust for mental health comorbidities. For example, Montz and colleagues 6 used commercial claim data from the Truven Health Analytics database to examine adjustment methods and payments to health plans. They found that the CMS risk adjustment algorithm missed 80% of individuals with a mental health or substance use diagnosis, leading to a systematic underpayment to plans for these individuals. 6 Shrestha and colleagues 7 followed up on this work by testing 21 algorithms for measuring mental health and substance use.
They found notable variation in model performance but that substantial gains of as much as 10% were possible when analyzing commercial claims. Whether these findings translate to other hospitals that have a higher prevalence of patients with mental health and substance use problems is unknown.
We examined the applicability of using the Medicare risk adjustment model for comparing VA hospitals. We focused on the VA because it is a large safety-net institution that is under pressure to compare its hospitals with non-VA hospitals with the expectation that greater transparency will lead to improvements in access, quality, and cost. The importance of appropriate risk adjustment is highlighted by a recent Agency for Healthcare Research and Quality report, 8 which found that veterans who receive care in the VA system are sicker than veterans who receive care elsewhere.
However, whether existing risk adjustment models can level the playing field is unclear. In this study, we computed risk-adjusted costs for all VA patients in 2012 and then examined predicted costs for different subgroups, including patients with a diagnosis of diabetes, a mental health condition, or dementia. We used the CMS MA risk adjustment system version 21 (V21) because it is publicly available and has been used to adjust metrics published on CMS' Hospital Compare website. In addition, it allowed us to examine whether technical improvements to the risk models were sufficient to overcome the deficiencies in the V21 model.

Study Population and Data
This study, performed from September 8, 2015, to October 22, 2018, included all 5.5 million veterans who received VA inpatient or outpatient care in 2012. We excluded patients who used the VA only for medications and had no other VA use. We also excluded veterans who received care exclusively through other insurance programs. Veterans older than 65 years are selective in their use of VA and Medicare services. 9 To avoid biased cost data, we included all VA and Medicare costs. For all participants, we obtained their VA and Medicare Part A, B, and D data. We excluded MA claims, which were not available, but noted that many veterans are enrolled in both VA and MA plans. 10 The data included demographic information and International Classification of Diseases, Ninth Revision (ICD-9) diagnosis codes from inpatient and outpatient use. For VA costs, we used the VA Health Economics Resource Center (HERC) mean cost data for ambulatory care and inpatient care and VA managerial cost accounting data for pharmacy costs. We added payments from VA-purchased care as reported in the Fee Basis system. Annualized HERC and VA managerial cost accounting costs are similar. 11

Measures
For all VA patients, we obtained demographic information from the VA enrollment files. For each patient, we computed their risk score using the V21 model. For patients who spent less than 90 days in skilled nursing or long-term care, we used the V21 community score. For patients who spent more than 90 days in a skilled nursing or long-term care facility, we used the institutionalized V21 score. We included all diagnostic codes from both VA administrative data and Medicare claims data from the prior year (2011). Because many veterans also receive care from Medicare, 9 the inclusion of diagnosis codes from Medicare claims data allowed us to capture the risk profiles of veterans who used both systems.
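The 90-day rule above can be sketched as a small helper. The function name is hypothetical (the actual V21 software computes the scores themselves), and the handling of exactly 90 days is an assumption, since the study states only "less than" and "more than" 90 days:

```python
def v21_score_type(days_in_facility: int) -> str:
    """Select which V21 score applies, per the study's 90-day rule.

    Patients with 90 or fewer days in skilled nursing or long-term care
    get the community score; those with more get the institutionalized
    score. (The boundary at exactly 90 days is an assumption.)
    """
    return "institutionalized" if days_in_facility > 90 else "community"

print(v21_score_type(30))    # community
print(v21_score_type(120))   # institutionalized
```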
The V21 model creates 83 hierarchical condition categories (HCCs), including 4 for mental health and substance use (HCC54 drug/alcohol psychosis, HCC55 drug/alcohol dependence, HCC57 schizophrenia, and HCC58 major depressive, bipolar, and paranoid disorders). Later versions of the HCC model retain the same 4 mental health HCCs that were used in V21. 12 We also measured mental health comorbidities using the Psychiatric Case Mix System (PsyCMS) 13 ; specific ICD-9 and ICD-10 coding for the PsyCMS can be found online. 14,15 We computed the total cost of care for all veterans who used VA care in 2012. This total included all VA costs and payments by Medicare Parts A, B, and D. We included VA and Medicare costs to understand the full cost of care for these patients; analyzing only VA costs might bias the results by focusing on existing distortions in the marketplace.

Statistical Analysis
Data analysis was performed from January 22, 2016, to October 22, 2018. We regressed total costs on patients' V21 risk scores. We used a linear model because the MA payment formula uses a linear additive model, and we estimated it using ordinary least squares. 16 Using the regression estimates, we calculated predicted costs for all patients and compared predicted costs with actual costs by decile of predicted costs. This goodness-of-fit test showed how well the risk adjustment model fit the data across the cost distribution.
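As a rough illustration of this approach, the sketch below fits an OLS model of cost on a risk score and compares mean predicted with mean actual costs by decile of predicted cost. The data are simulated, and all coefficients and magnitudes are illustrative stand-ins, not the study's:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the cohort: risk scores and annual costs.
# (Illustrative data only; the study used actual V21 scores and
# combined VA/Medicare costs.)
n = 10_000
risk = rng.gamma(shape=2.0, scale=0.5, size=n)       # V21-style risk scores
cost = 9000 * risk + rng.normal(0, 4000, size=n)     # noisy linear cost

# OLS of total cost on the risk score (mirrors the MA linear payment model).
X = np.column_stack([np.ones(n), risk])
beta, *_ = np.linalg.lstsq(X, cost, rcond=None)
pred = X @ beta

# Goodness of fit by decile of predicted cost: mean predicted vs mean actual.
decile = (np.argsort(np.argsort(pred)) * 10) // n    # ranks mapped to 0..9
for d in range(10):
    m = decile == d
    print(d + 1, round(float(pred[m].mean())), round(float(cost[m].mean())))
```

A well-calibrated model would show predicted and actual means tracking each other in every decile; the study's Tables 3 and 4 report this comparison for the real data.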
To explore whether the V21 risk adjustment could be improved, in a second set of regression models we included indicators for 47 mental health conditions as measured by the PsyCMS. 13 This grouping was developed to measure mental health and substance use in risk adjustment. We examined goodness of fit for all VA patients and for 3 subgroups: patients with diabetes, patients with a mental health diagnosis, and patients with dementia. The main comparisons of interest were how patients with a mental health diagnosis, as measured by the V21 model, compared with all VA patients and with those with diabetes. We chose diabetes because it is a common chronic condition that results in considerable costs. Dementia was included because it often requires custodial care, which the VA provides and Medicare does not cover. This comparison offers insights into whether risk adjustment models built on Medicare data are sufficient for comparing VA hospitals, which provide a different scope of services. We performed sensitivity analyses using generalized linear models (log link and a γ distribution) and a square root transformed ordinary least squares model. All analyses used a 2-sided test with P < .05 considered to be statistically significant.

Results

Overall, the V21 model underestimated costs for patients with low costs and overestimated costs for patients with above-average costs, except for the top decile (Table 3). However, when the sample was separated by diagnosis, the V21 model fit the diabetes population well across most of the deciles. For mental health, the V21 model underestimated costs in every decile (Table 4). Overall, this resulted in an underestimate of $2314 (6.7%) per person with a mental health diagnosis.

Improving the Model Fit for Mental Health
The Figure gives the mean difference between the predicted costs and actual costs by decile. A perfect fit across the deciles would be a horizontal line at zero. Adding the 47 PsyCMS condition categories improved the model fit for patients with a mental health condition, but the data showed that measurement issues remain, suggesting continued room for improvement. Model fit statistics varied across models, but the substantive results were not sensitive to the choice of analytical model. The R2 was 0.12 in the ordinary least squares model, which is consistent with reported fit statistics for the V21 model. 16 Inclusion of the 47 psychiatric condition categories improved the R2 to 0.14. In the sensitivity analysis, the best-fitting model was the square root transformed model, which had an R2 of 0.19 with the V21 model and an R2 of 0.22 with the V21 model augmented with the PsyCMS groups.

Table 4 also gives the cost estimates for patients with dementia (n = 157 907). We used the institutionalized risk score for individuals who spent more than 90 days in skilled nursing or long-term care facilities. In this group, the V21 model underestimated costs by $1841 in the lowest cost decile, and this difference increased with each subsequent decile. In the highest cost decile, the difference between the expected and actual cost was $12 813.
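The augmentation step can be illustrated in miniature: add indicator columns to the design matrix and compare R2 values from nested OLS fits. The data and the 5 toy indicators below are simulated stand-ins for the 47 PsyCMS categories, not the study's data:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated cohort: a continuous risk score plus binary comorbidity flags
# (stand-ins for the PsyCMS mental health indicators).
n = 5_000
risk = rng.gamma(2.0, 0.5, size=n)
mh = rng.integers(0, 2, size=(n, 5)).astype(float)   # 5 toy indicators
cost = (9000 * risk
        + mh @ np.array([2000.0, 1500.0, 3000.0, 800.0, 2500.0])
        + rng.normal(0, 5000, size=n))

def r2(X, y):
    """R^2 from an OLS fit of y on X (X already includes an intercept)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid.var() / y.var()

ones = np.ones((n, 1))
base = np.column_stack([ones, risk])            # risk score only
augmented = np.column_stack([ones, risk, mh])   # risk score + indicators

print(round(r2(base, cost), 3), round(r2(augmented, cost), 3))
```

Because the base design matrix is nested in the augmented one, the in-sample R2 can only rise when the indicators are added; whether the gain is material, as the study found for the mental health subgroup, is an empirical question.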

Discussion
Policymakers and consumers are eager to compare hospitals. Working to meet this demand, CMS provides a website that enables people to compare hospitals, including VA and DoD hospitals, on different performance metrics. Many comparisons focus on medical-surgical care, but it is possible to compare nursing homes, and CMS is rolling out additional comparisons, such as hospice. A motivating factor behind these websites is that greater transparency and more information will create incentives to improve quality of care by engendering competition. A critical assumption is that the risk adjustment algorithms used by Hospital Compare are sufficient to enable fair comparisons across performance metrics (eg, surgical complications, unplanned readmissions, or costs).

Limitations
This study relies on data with different coding practices by VA and non-VA hospitals. One question is whether poor coding in the VA could have led to the results. The VA facilities receive capitated payments for each patient, and the practitioners are salaried; therefore, there are few incentives to code meticulously. In contrast, physicians in private practice, especially those with MA patients, have incentives that reward detailed coding. 37 The question of bias attributable to poor coding in the VA hinges on whether VA practitioners are more likely to undercode mental health or physical health comorbidities. An article by Yoon and Chow 38 suggests that VA practitioners are more likely to undercode mental health than other conditions. Thus, if mental health comorbidities are uniformly undercoded, our analysis is biased toward the null, and these results are likely a conservative estimate.
Another limitation of this study is that we only tested the model fit for the V21 model. Other approaches that may work for disadvantaged populations include template matching, 39 stratification, and peer comparisons, 40 but their feasibility and practicality need to be tested.
Commercially available risk adjustment algorithms may do a better job fitting VA data, but this would only further underscore the need to be careful when choosing a risk-adjusting algorithm because not all of them are useful for comparing health care systems.
The results generalize to the hospitals in the VA health care system. Variation is often seen across VA hospitals, and it is likely that individual VA hospitals differ in terms of the percentage of patients with mental health comorbidities, which could affect their ratings in Hospital Compare. It is