Use of Data-Driven Methods to Predict Long-term Patterns of Health Care Spending for Medicare Patients

Key Points
Question: What are the long-term spending patterns by Medicare beneficiaries, and do baseline patient factors that are potentially modifiable predict these patterns?
Findings: In this cohort study using a data-driven approach to classifying Medicare beneficiaries by their spending over 2 years, 5 patterns were identified and could be predicted, including those with consistent spending levels and others with spending that increased progressively. The most influential potentially modifiable factors were number of medications, number of office visits, and mean medication adherence.
Meaning: These findings suggest that spending by Medicare beneficiaries falls into 5 distinct groups and could be accurately predicted; this approach could be adapted by organizations to target interventions.


Introduction
With health care spending now accounting for almost 18% of the US gross domestic product, identifying individuals who may benefit from interventions to address potentially avoidable spending has become a central priority for health insurers and health care professionals. 1 Current approaches generally focus on prediction or intervention for patients who may have escalating costs on the basis of a single composite value of total spending over short time periods. 2,3 However, many patients experience substantial increases or decreases in spending not captured by these approaches. [4][5][6][7][8][9] For example, Tamang et al 10 identified low-spending patients in 1 year whose costs bloomed in the subsequent year. These prior studies were conducted over a 1-year period, yet there may also be dynamic patterns of spending over longer periods that have implications both for whom to target for intervention and when to do so. 1,12 For example, patients with the same clinical conditions who are hospitalized early during a 12-month period may differ meaningfully from those hospitalized later, although both could be identified as having rising costs. 13,14 If these different spending patterns could be predicted using routinely collected data, then the ability to proactively differentiate patients with increasing or decreasing spending could help target interventions to those who are at greatest need of improved health or cost containment. 15 The predictive accuracy of spending may also be higher when evaluating a long-term, compared with a short-term, time horizon, as seen for other outcomes. 16 Accordingly, we sought to classify patients according to their spending patterns over a 2-year period and to evaluate the ability to predict these spending groups using patient characteristics that are potentially modifiable.

Methods
This cohort study was approved by the institutional review board of Brigham and Women's Hospital and was granted a waiver of informed patient consent because the data are secondary routinely collected data. This study follows the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) reporting guideline.

Setting and Study Design
This study used administrative claims data from a 1-million-member sample of Medicare fee-for-service beneficiaries; the original sample included approximately 20 000 beneficiaries in a nationwide quality improvement program and approximately 980 000 randomly selected patients nationally. 17 We restricted the cohort to the randomly selected patients and used their paid Medicare Parts A, B, and D patient-level files containing all procedures, physician encounters, hospitalizations, and filled outpatient prescriptions, including amounts paid by the insurer and patient. These data were linked to eligibility data including age, race/ethnicity, gender, and geographic location of residence. Aggregate zip code-level data on median income and educational attainment were obtained by linking with 2010 US Census data.
To be included, patients had to be aged 65 years or older and maintain continuous eligibility from January 1, 2011, to December 31, 2013. The cohort entry date was defined as January 1, 2012, to provide 1 year of baseline data (year 0) and 2 years of follow-up data (year 1 and year 2) (eFigure 1 in the Supplement).

Costs
We measured total monthly health care spending over a 2-year period for each patient by summing the allowed amounts on all inpatient, outpatient, and prescription drug claims. Monthly costs were generated by summing the costs in each month and were standardized by dividing the summed costs by the number of days in that month and then multiplying the result by 30. Costs were then logarithmically transformed to normalize their distribution, after adding $0.01, as frequently done. 9,18 Costs were inflated using the Medical Care Component of the Consumer Price Index to 2013 dollars when necessary.
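As an illustration of the standardization and transformation described above, the following minimal Python sketch converts a summed monthly cost to a 30-day equivalent and then log-transforms it after adding $0.01. The function name is hypothetical, and the use of the natural logarithm is an assumption, because the base of the logarithm is not stated here; inflation adjustment is not shown.

```python
import calendar
import math


def standardized_log_cost(total_cost: float, year: int, month: int) -> float:
    """Return the log of a 30-day-standardized monthly cost.

    total_cost: sum of allowed amounts on all claims in the given month.
    The natural log is an assumption; the text only says "logarithmically
    transformed".
    """
    days_in_month = calendar.monthrange(year, month)[1]
    # Standardize to a 30-day month: divide by actual days, multiply by 30.
    cost_per_30_days = total_cost / days_in_month * 30
    # Add $0.01 before the log so that months with zero spending are defined.
    return math.log(cost_per_30_days + 0.01)
```

For example, $310 of spending in January (31 days) and $290 in February 2012 (29 days) both standardize to $300 per 30 days before transformation.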

Predictors
Using data from Medicare enrollment files and claims, we defined 37 clinically relevant baseline characteristics that were potential predictors of future spending (eTable 1 in the Supplement). These baseline variables were measured during the 12 months prior to the 2-year period during which cost outcomes were evaluated (eFigure 1 in the Supplement). These variables were based on characteristics used in cost modeling in claims data in the peer-reviewed literature and from the quality-cost theoretical framework. 6,10,11,15 These sets of predictors have also been shown to have accuracy equivalent to that of proprietary risk-adjustment methods in predicting 1-year spending. 11 Sociodemographic characteristics included age, race/ethnicity, gender, and community-level variables based on the member's zip code of residence, including median household income and educational attainment. Clinical comorbidities were measured using International Classification of Diseases, Ninth Revision codes (eAppendix and eTable 1 in the Supplement). Each patient's number of unique prescriptions by generic name (ie, therapeutic complexity), physician office visits, emergency department visits, hospitalizations, unique physicians visited, unique pharmacies used, benefits' generosity 19 (copayments and deductibles or total net payments), and baseline year total costs were also measured. Adherence to long-term medication classes (eg, β-blockers) was measured in the baseline year. 11 For each class, we created a supply diary beginning with the first fill for each class in the baseline year. This diary linked all observed fills based on dispensing date and days' supply; switching was allowed within each class (eg, β-blockers). From this, we calculated the proportion of days covered (PDC) for each class that the patient filled and took the mean across classes to yield 1 mean PDC. 20,21
We categorized each predictor by whether it was potentially modifiable, defined by whether it could theoretically be addressed in interventions and by classifications in prior literature. 22,23 For example, number of unique physicians could be potentially modifiable, while race/ethnicity is not. In total, we classified 10 predictors as potentially modifiable (Table 1).
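The supply-diary calculation of mean PDC can be sketched as follows. This is a simplified illustration, not the study's implementation: the function names are hypothetical, all fills are assumed to be dated within the baseline year, and overlapping fills are assumed to cover each calendar day only once (no carry-forward of early refills), a detail the text does not specify.

```python
from datetime import date, timedelta


def pdc_for_class(fills, period_end):
    """Proportion of days covered for one medication class.

    fills: list of (dispensing_date, days_supply) tuples for the class,
           assumed to fall within the baseline year.
    period_end: last day of the baseline year.
    The diary starts at the first fill, as described in the text.
    """
    start = min(dispensed for dispensed, _ in fills)
    covered = set()
    for dispensed, days_supply in fills:
        for offset in range(days_supply):
            day = dispensed + timedelta(days=offset)
            if day <= period_end:
                covered.add(day)  # overlapping days are counted once
    total_days = (period_end - start).days + 1
    return len(covered) / total_days


def mean_pdc(fills_by_class, period_end):
    """Mean PDC across the classes the patient filled; None if no fills."""
    values = [pdc_for_class(f, period_end) for f in fills_by_class.values() if f]
    return sum(values) / len(values) if values else None
```

A patient with full coverage for one class and half coverage for another would receive a mean PDC of 0.75 under this sketch.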

Data-Driven Approach to Modeling Long-term Costs
We used trajectory modeling to empirically classify spending during follow-up. One advantage of this approach is that it allows the data to define the cost outcomes, rather than using arbitrarily selected thresholds. 24 It also considers changes in spending over time, rather than aggregating costs over a set time. 25 To define spending patterns, we used the previously described SAS procedure Proc Traj, a free add-on. [24][25][26] In brief, group-based trajectory models are an application of finite mixture modeling that identify clusters of individuals with similar outcome patterns over time. 24 This modeling approach analyzes longitudinal data by fitting a semiparametric (discrete) mixture model, estimating each individual's probability of membership in each group, and assigning them to the group according to their highest probability. We modeled longitudinal cost trajectories using calendar month as the time variable, costs in each month, order equal to 4, and a censored-normal distribution (linear between minimum and maximum values). 11,24,26 The models were estimated using a forward classifying approach using 2 to 7 groups, each time investigating model fit using the bayesian information criterion (BIC), whereby a lower BIC indicates a better model fit. 24 The number of groups investigated was capped at 7 on the basis of groupings observed in prior work. 11 In addition to considering BIC, other key considerations in selecting the best-fitting trajectory were the ability to visually interpret separate groups, minimum membership probabilities in each group, and having 5% or more of the sample in each group. [26][27][28]
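The assignment and group-size steps described above can be illustrated with a short Python sketch. It does not reproduce Proc Traj or the mixture-model estimation; it assumes that each patient's posterior membership probabilities have already been estimated, and the function names are hypothetical.

```python
from collections import Counter


def assign_groups(posteriors):
    """Assign each patient to the group with the highest posterior probability.

    posteriors: list of per-patient lists, one probability per trajectory group
    (assumed to come from an already fitted trajectory model).
    """
    return [max(range(len(p)), key=p.__getitem__) for p in posteriors]


def meets_size_rule(assignments, n_groups, min_share=0.05):
    """Check that every group holds at least min_share of the sample."""
    counts = Counter(assignments)
    n = len(assignments)
    return all(counts.get(g, 0) / n >= min_share for g in range(n_groups))
```

In practice, a check like `meets_size_rule` would be weighed alongside BIC, visual interpretability, and membership probabilities when comparing candidate models with 2 to 7 groups.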

Statistical Analysis
After selecting the best-fitting number of trajectories, we assessed the ability to predict membership in each 2-year trajectory group using boosted logistic regression, a nonparametric machine learning method. The boosted algorithm is considered one of the best data-mining approaches for prediction problems. 16,29 Specifically, the algorithm creates a prediction model by building numerous small regression trees that together provide highly accurate classification. 30 The boosting algorithm has several built-in protections from model overfitting, provides automatic variable selection, and describes the relative influence of predictors. 31 It also considers all possible interaction terms between potential predictors. We used the gbm package in R with 5-fold cross-validation to identify the optimal number of trees and applied standard default values for tuning parameters to identify the optimal model. 16 For each trajectory group, we estimated 2 separate models. The first included all 37 baseline predictors (model 1) and the second included only the 10 baseline predictors that were considered a priori to be potentially modifiable (model 2). Because of the ability of boosted regression to handle missing data, an indicator of long-term medication use and mean PDC were both included as variables for model 1, and mean PDC was included alone as a variable for model 2.
To avoid overoptimism bias, we used internal split-sample validation by randomly dividing the full cohort into 2 halves as an initial derivation sample and a validation sample for all models. 32 We evaluated each model through discrimination measures. 33 Discrimination, the model's ability to distinguish between patients who do and do not experience the outcome, was measured by the C-statistic, which ranges from 0.5 (noninformative model) to 1.0 (perfect prediction). 34,35 For clinical context, we explored the association between potentially modifiable baseline characteristics and membership in a rising-cost trajectory compared with other trajectory groups that had similar spending at baseline. Specifically, we used multivariable logistic regression that included each potentially modifiable variable to compare membership in the rising-cost trajectory vs the other groups. This approach provides insight into baseline factors that may help distinguish patients who become costly later (ie, at least a year later) and potential levers for interventions. We also explored the relative influence of each potentially modifiable predictor from model 2.
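For readers less familiar with the C-statistic, a minimal Python sketch of its pairwise definition follows: the proportion of (event, nonevent) patient pairs in which the patient with the event received the higher predicted score, with ties counted as one-half. The function name is hypothetical, and this quadratic-time version is for illustration only; it is not the software used in the study.

```python
def c_statistic(scores, outcomes):
    """Concordance (C) statistic for a binary outcome.

    scores: predicted probabilities or risk scores, one per patient.
    outcomes: 1 if the patient experienced the outcome, otherwise 0.
    """
    events = [s for s, y in zip(scores, outcomes) if y == 1]
    nonevents = [s for s, y in zip(scores, outcomes) if y == 0]
    if not events or not nonevents:
        raise ValueError("need at least one event and one nonevent")
    concordant = 0.0
    for e in events:
        for n in nonevents:
            if e > n:
                concordant += 1.0  # event patient ranked higher
            elif e == n:
                concordant += 0.5  # ties count as half
    return concordant / (len(events) * len(nonevents))
```

A model that assigns every patient the same score yields 0.5 under this definition, matching the noninformative bound described above.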
We also evaluated the ability to predict patients who experience rising costs in year 2, defined using a decile-threshold approach (ie, those who were in the lower 90% of spending in year 1 and then in the top 10% of spending in year 2 10), and patients who in trajectory modeling were estimated as belonging to a rising-cost trajectory. For this approach, we estimated each outcome with 2 additional models with boosted regression. Model 3 used all baseline predictors, and model 4 used the potentially modifiable predictors. This approach helps provide insight into whether these spending increases could be accurately predicted using baseline information that is less temporally proximate to the spending changes, which could ultimately inform intervention design and allow more time for interventions to be implemented.
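The decile-threshold definition of rising costs can be sketched in Python as follows. The function names are hypothetical, and the handling of ties and of cohort sizes not divisible by 10 (here, the top floor(n/10) patients by rank) is an assumption not specified in the text.

```python
def top_decile_flags(costs):
    """Flag the patients in the top 10% of spending for one year.

    Ties at the cutoff are broken by input order (an assumption).
    """
    n_top = len(costs) // 10
    order = sorted(range(len(costs)), key=lambda i: costs[i], reverse=True)
    flags = [False] * len(costs)
    for i in order[:n_top]:
        flags[i] = True
    return flags


def rising_cost_flags(year1_costs, year2_costs):
    """Flag patients in the lower 90% in year 1 and the top 10% in year 2."""
    top_year1 = top_decile_flags(year1_costs)
    top_year2 = top_decile_flags(year2_costs)
    return [(not a) and b for a, b in zip(top_year1, top_year2)]
```

Under this definition, a patient who is already in the top decile in year 1 is never flagged as rising, regardless of year 2 spending.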
We conducted several sensitivity analyses. Although our primary analysis included zip code sociodemographic characteristics, we also included patients' region of residence based on enrollment files as a predictor in model 1. Then, we included adherence to each class separately as predictors in models 1 and 2. Finally, we repeated measurements and analyses in a subsequent year (ie, 2012-2014) to confirm generalizability (eAppendix in the Supplement).
All analyses except for the boosted regression were performed using SAS version 9.4 (SAS Institute); the boosting algorithm was performed using R version 3.4.1 (The R Project for Statistical Computing). Statistical analysis was performed from August 2018 to December 2019.

Study Population and Characteristics
Our cohort consisted of 329 476 patients (eTable 2 in the Supplement). Their mean (SD) age was 76.0 (7.2) years, and 190 346 (57.8%) were women. A 5-group trajectory model best described the 2-year spending patterns (Figure); Table 1. Table 2 shows the results of the main prediction models in the validation sample.

Figure. The mean observed spending levels using 5-group trajectory modeling in the full sample are plotted. The percentages in the key refer to the number of patients who belong to each trajectory group out of the full cohort (bayesian information criterion for this model: 21704747).

Discussion
Using a data-driven approach to classify 2-year health spending for Medicare beneficiaries, we observed 5 distinct spending patterns. Membership in these groups could be accurately predicted, even when using a simple set of potentially modifiable characteristics from claims data. These results suggest that this approach could potentially help inform the design, application, and timing of interventions.
Prior efforts to predict health care spending have generally focused on a single composite value, such as total yearly costs or a threshold-based measure, such as being in the top 5% of spending, both of which collapse an entire year's spending into a static variable. These approaches have had modest accuracy; C-statistics for threshold-based outcomes have generally ranged from 0.6 to 0.8. 2,5,36,37 Two recently published approaches offer other cluster-based solutions to elucidate subgroups of high-cost patients with some notable successes. 38,39 However, these were not applied to evaluate changes in spending, outcomes over more than 1 year, or to elucidate patients with rising costs. 38,39 They also focused on Medicare Advantage populations, which can differ from fee-for-service beneficiaries. 40,41 Patients may have dynamic patterns of spending over longer periods of time that can be potentially meaningful, with implications for whom to target for outreach as well as when and perhaps how to do so. 1,12 For example, Tamang et al 10 identified low-spending patients in 1 year whose costs bloomed in the subsequent year using thresholds. When applied to our data, the ability to predict these patients using baseline data alone was modest. Using a data-driven approach, we observed a similarly sized group whose costs later increased that could be predicted somewhat better. One possible explanation could be that the 2-year time horizon itself as an outcome helped discriminate between groups. The ability to proactively differentiate between patients with rising or falling spending patterns using distally measured variables could better target interventions to those who are at greatest need. If successful, using these longer time horizons could allow more time for the implementation of potential interventions. 42 Focusing interventions on patients with rising costs has some theoretical advantages, even though predictive ability was modest.
First, the size of the group identified in this study was modest (ie, 7.5%). Of course, it still may be infeasible to intervene upon a group this large, and not all costs may be preventable. Identifying additional segmentation may be necessary, and the use of this approach may be just a starting point. Regardless, better prediction could help target interventions to those at greater need, and targeting has been shown to result in better population-level outcomes. 43 When considering potential interventions, a prediction rule comprising the most influential potentially modifiable variables could be applied to better target patients. We observed several clinically actionable characteristics, such as therapeutic complexity (ie, number of medications or office visits), depression, medication adherence, and tobacco use that could be levers for interventions. Filling fewer medications and having fewer office visits were also predictors of the rising-cost trajectory, suggesting that patients may not be getting sufficient care to prevent future escalation of health problems. 22 This information could also be used for intervention design to improve care.
Many health care organizations, insurers, researchers, and policy makers use claims data to identify patients for interventions. Therefore, the ability to better leverage these routinely collected data for cost predictions and interventions with a variety of more nuanced cost-modeling methods holds wide potential. Moreover, using data-driven approaches to classify longer-term spending may hold promise compared with threshold-based approaches alone.

Limitations
Several limitations warrant mention. First, we examined trajectories from January to December; patients with incomplete enrollment or other policy start and end dates may differ. Because of differences in how outcomes are categorized, model performance of predicting a cost trajectory (binary outcome) cannot be directly compared with predicting total costs (continuous outcome) or patients defined by the rising-cost decile-threshold approach. The variables included in prediction models may also not be exhaustive, and although we used validated algorithms, they may be insufficiently sensitive. Trajectory modeling also provides predicted group membership; individual members may be assigned to their closest trajectory, but there could be within-group heterogeneity.
The high-cost group was large, possibly because of how the model was specified (ie, log costs); one could potentially apply trajectories to identify subgroups within that group for further segmentation.
Although group distribution did not differ on the basis of geographical region, the costs themselves were not adjusted for region; similarly, moving could have impacted relative changes in spending, but this was beyond the scope of this study. Furthermore, these results may not be generalizable to other payment systems, such as non-fee-for-service Medicare, Medicaid, or commercially insured beneficiaries. Although these other beneficiaries may have different spending levels, prior work has