Use of Data-Driven Methods to Predict Long-term Patterns of Health Care Spending for Medicare Patients | Health Care Economics, Insurance, Payment | JAMA Network Open | JAMA Network
Figure.  2-Year Spending Patterns Using Trajectory Modeling

The mean observed spending levels using 5-group trajectory modeling in the full sample are plotted. The percentages in the key give the share of patients in the full cohort who belong to each trajectory group (bayesian information criterion for this model: 21704747).

Table 1.  Patient Characteristics by Spending Trajectory
Table 2.  Ability of Models to Predict 2-Year Spending Trajectory Groups
Table 3.  Association Between Potentially Modifiable Factors and Membership in the Rising-Cost Spending Trajectory (Group 3) vs Other Trajectory Groups
1. Martin AB, Hartman M, Washington B, Catlin A; National Health Expenditure Accounts Team. National health spending: faster growth in 2015 as coverage expands and utilization increases. Health Aff (Millwood). 2017;36(1):166-176. doi:10.1377/hlthaff.2016.1330
2. Kuo RN, Dong YH, Liu JP, Chang CH, Shau WY, Lai MS. Predicting healthcare utilization using a pharmacy-based metric with the WHO’s Anatomic Therapeutic Chemical algorithm. Med Care. 2011;49(11):1031-1039. doi:10.1097/MLR.0b013e31822ebe11
3. Perkins AJ, Kroenke K, Unützer J, et al. Common comorbidity scales were similar in their ability to predict health care costs and mortality. J Clin Epidemiol. 2004;57(10):1040-1048. doi:10.1016/j.jclinepi.2004.03.002
4. Sales AE, Liu CF, Sloan KL, et al. Predicting costs of care using a pharmacy-based measure risk adjustment in a veteran population. Med Care. 2003;41(6):753-760. doi:10.1097/01.MLR.0000069502.75914.DD
5. Fishman PA, Goodman MJ, Hornbrook MC, Meenan RT, Bachman DJ, O’Keeffe Rosetti MC. Risk adjustment using automated ambulatory pharmacy data: the RxRisk model. Med Care. 2003;41(1):84-99. doi:10.1097/00005650-200301000-00011
6. Powers CA, Meyer CM, Roebuck MC, Vaziri B. Predictive modeling of total healthcare costs using pharmacy claims data: a comparison of alternative econometric cost modeling techniques. Med Care. 2005;43(11):1065-1072. doi:10.1097/01.mlr.0000182408.54390.00
7. Forrest CB, Lemke KW, Bodycombe DP, Weiner JP. Medication, diagnostic, and cost information as predictors of high-risk patients in need of care management. Am J Manag Care. 2009;15(1):41-48.
8. Yarger S, Rascati K, Lawson K, Barner J, Leslie R. Analysis of predictive value of four risk models in Medicaid recipients with chronic obstructive pulmonary disease in Texas. Clin Ther. 2008;30(Spec No):1051-1057. doi:10.1016/j.clinthera.2008.06.001
9. Mihaylova B, Briggs A, O’Hagan A, Thompson SG. Review of statistical methods for analysing healthcare resources and costs. Health Econ. 2011;20(8):897-916. doi:10.1002/hec.1653
10. Tamang S, Milstein A, Sørensen HT, et al. Predicting patient ‘cost blooms’ in Denmark: a longitudinal population-based study. BMJ Open. 2017;7(1):e011580. doi:10.1136/bmjopen-2016-011580
11. Lauffenburger JC, Franklin JM, Krumme AA, et al. Longitudinal patterns of spending enhance the ability to predict costly patients: a novel approach to identify patients for cost containment. Med Care. 2017;55(1):64-73. doi:10.1097/MLR.0000000000000623
12. Druss BG, Marcus SC, Olfson M, Tanielian T, Elinson L, Pincus HA. Comparing the national economic burden of five chronic conditions. Health Aff (Millwood). 2001;20(6):233-241. doi:10.1377/hlthaff.20.6.233
13. Ziaeian B, Fonarow GC. The prevention of hospital readmissions in heart failure. Prog Cardiovasc Dis. 2016;58(4):379-385. doi:10.1016/j.pcad.2015.09.004
14. Barnett ML, Hsu J, McWilliams JM. Patient characteristics and differences in hospital readmission rates. JAMA Intern Med. 2015;175(11):1803-1812. doi:10.1001/jamainternmed.2015.4660
15. Nuckols TK, Escarce JJ, Asch SM. The effects of quality of care on costs: a conceptual framework. Milbank Q. 2013;91(2):316-353. doi:10.1111/milq.12015
16. Franklin JM, Shrank WH, Lii J, et al. Observing versus predicting: initial patterns of filling predict long-term adherence more accurately than high-dimensional modeling techniques. Health Serv Res. 2016;51(1):220-239. doi:10.1111/1475-6773.12310
17. Krumme AA, Glynn RJ, Schneeweiss S, et al. Medication synchronization programs improve adherence to cardiovascular medications and health care use. Health Aff (Millwood). 2018;37(1):125-133. doi:10.1377/hlthaff.2017.0881
18. Austin PC, Ghali WA, Tu JV. A comparison of several regression models for analysing cost of CABG surgery. Stat Med. 2003;22(17):2799-2815. doi:10.1002/sim.1442
19. Artz MB, Hadsall RS, Schondelmeyer SW. Impact of generosity level of outpatient prescription drug coverage on prescription drug events and expenditure among older persons. Am J Public Health. 2002;92(8):1257-1263. doi:10.2105/AJPH.92.8.1257
20. Benner JS, Glynn RJ, Mogun H, Neumann PJ, Weinstein MC, Avorn J. Long-term persistence in use of statin therapy in elderly patients. JAMA. 2002;288(4):455-461. doi:10.1001/jama.288.4.455
21. Choudhry NK, Shrank WH, Levin RL, et al. Measuring concurrent adherence to multiple related medications. Am J Manag Care. 2009;15(7):457-464.
22. Goetzel RZ, Pei X, Tabrizi MJ, et al. Ten modifiable health risk factors are linked to more than one-fifth of employer-employee health care spending. Health Aff (Millwood). 2012;31(11):2474-2484. doi:10.1377/hlthaff.2011.0819
23. Yusuf S, Hawken S, Ounpuu S, et al; INTERHEART Study Investigators. Effect of potentially modifiable risk factors associated with myocardial infarction in 52 countries (the INTERHEART study): case-control study. Lancet. 2004;364(9438):937-952. doi:10.1016/S0140-6736(04)17018-9
24. Jones BL, Nagin DS. Advances in group-based trajectory modeling and a SAS procedure for estimating them. Sociol Methods Res. 2007;35(4):542-571. doi:10.1177/0049124106292364
25. Franklin JM, Shrank WH, Pakes J, et al. Group-based trajectory models: a new approach to classifying and predicting long-term medication adherence. Med Care. 2013;51(9):789-796. doi:10.1097/MLR.0b013e3182984c1f
26. Jones BL, Nagin DS, Roeder K. A SAS procedure based on mixture models for estimating developmental trajectories. Sociol Methods Res. 2001;29:374-393. doi:10.1177/0049124101029003005
27. Li Y, Zhou H, Cai B, et al. Group-based trajectory modeling to assess adherence to biologics among patients with psoriasis. Clinicoecon Outcomes Res. 2014;6:197-208. doi:10.2147/CEOR.S59339
28. Franklin JM, Krumme AA, Tong AY, et al. Association between trajectories of statin adherence and subsequent cardiovascular events. Pharmacoepidemiol Drug Saf. 2015;24(10):1105-1113. doi:10.1002/pds.3787
29. Koh HC, Tan G. Data mining applications in healthcare. J Healthc Inf Manag. 2005;19(2):64-72.
30. Robinson JW. Regression tree boosting to adjust health care cost predictions for diagnostic mix. Health Serv Res. 2008;43(2):755-772. doi:10.1111/j.1475-6773.2007.00761.x
31. Varian HR. Big data: new tricks for econometrics. J Econ Perspect. 2014;28(2):3-28. doi:10.1257/jep.28.2.3
32. Steyerberg EW, Harrell FE Jr, Borsboom GJ, Eijkemans MJ, Vergouwe Y, Habbema JD. Internal validation of predictive models: efficiency of some procedures for logistic regression analysis. J Clin Epidemiol. 2001;54(8):774-781. doi:10.1016/S0895-4356(01)00341-9
33. Waljee AK, Higgins PD, Singal AG. A primer on predictive models. Clin Transl Gastroenterol. 2014;5:e44. doi:10.1038/ctg.2013.19
34. Steyerberg EW, Vickers AJ, Cook NR, et al. Assessing the performance of prediction models: a framework for traditional and novel measures. Epidemiology. 2010;21(1):128-138. doi:10.1097/EDE.0b013e3181c30fb2
35. Cook NR. Use and misuse of the receiver operating characteristic curve in risk prediction. Circulation. 2007;115(7):928-935. doi:10.1161/CIRCULATIONAHA.106.672402
36. Liu CF, Sales AE, Sharp ND, et al. Case-mix adjusting performance measures in a veteran population: pharmacy- and diagnosis-based approaches. Health Serv Res. 2003;38(5):1319-1337. doi:10.1111/1475-6773.00179
37. Zhao Y, Ash AS, Ellis RP, et al. Predicting pharmacy costs and other medical costs using diagnoses and drug claims. Med Care. 2005;43(1):34-43.
38. Yan J, Linn KA, Powers BW, et al. Applying machine learning algorithms to segment high-cost patient populations. J Gen Intern Med. 2019;34(2):211-217. doi:10.1007/s11606-018-4760-8
39. Powers BW, Yan J, Zhu J, et al. Subgroups of high-cost Medicare Advantage patients: an observational study. J Gen Intern Med. 2019;34(2):218-225. doi:10.1007/s11606-018-4759-1
40. Powell SK. Choosing Medicare Advantage plans versus traditional fee-for-service: is this change the tipping point? Prof Case Manag. 2019;24(1):1-3. doi:10.1097/NCM.0000000000000338
41. Raetzman SO, Hines AL, Barrett ML, Karaca Z. Hospital stays in Medicare Advantage Plans versus the traditional Medicare fee-for-service program, 2013: statistical brief #198. Published December 2015. Accessed August 5, 2019. https://www.hcup-us.ahrq.gov/reports/statbriefs/sb198-Hospital-Stays-Medicare-Advantage-Versus-Traditional-Medicare.jsp
42. Stadhouders N, Kruse F, Tanke M, Koolman X, Jeurissen P. Effective healthcare cost-containment policies: a systematic review. Health Policy. 2019;123(1):71-79. doi:10.1016/j.healthpol.2018.10.015
43. Lauffenburger JC, Lewey J, Jan S, et al. Effectiveness of targeted insulin-adherence interventions for glycemic control using predictive analytics among patients with type 2 diabetes: a randomized clinical trial. JAMA Netw Open. 2019;2(3):e190657. doi:10.1001/jamanetworkopen.2019.0657
    Original Investigation
    Health Policy
    October 19, 2020

    Use of Data-Driven Methods to Predict Long-term Patterns of Health Care Spending for Medicare Patients

    Author Affiliations
    • 1 Center for Healthcare Delivery Sciences, Department of Medicine, Brigham and Women’s Hospital, Boston, Massachusetts
    • 2 Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts
    JAMA Netw Open. 2020;3(10):e2020291. doi:10.1001/jamanetworkopen.2020.20291
    Key Points

    Question  What are the long-term spending patterns by Medicare beneficiaries, and do baseline patient factors that are potentially modifiable predict these patterns?

    Findings  In this cohort study using a data-driven approach to classifying Medicare beneficiaries by their spending over 2 years, 5 patterns were identified and could be predicted, including those with consistent spending levels and others with spending that increased progressively. The most influential potentially modifiable factors were number of medications, number of office visits, and mean medication adherence.

    Meaning  These findings suggest that spending by Medicare beneficiaries falls into 5 distinct groups and could be accurately predicted; this approach could be adapted by organizations to target interventions.

    Abstract

    Importance  Current approaches to predicting health care costs generally rely on a single composite value of spending and focus on short time horizons. By contrast, examining patients’ spending patterns using dynamic measures applied over longer periods may better identify patients with different spending and help target interventions to those with the greatest need.

    Objective  To classify patients by their long-term, dynamic health care spending patterns using a data-driven approach and assess the ability to predict spending patterns, particularly using characteristics that are potentially modifiable through intervention.

    Design, Setting, and Participants  This retrospective cohort study used a random nationwide sample of Medicare fee-for-service administrative claims data to identify beneficiaries aged 65 years or older with continuous eligibility from 2011 to 2013. Statistical analysis was performed from August 2018 to December 2019.

    Main Outcomes and Measures  Group-based trajectory modeling was applied to the claims data to classify the Medicare beneficiaries by their total health care spending patterns over a 2-year period. The ability to predict membership in each trajectory spending group was assessed using generalized boosted regression, a data mining approach to model building and prediction, with split-sample validation. Models were estimated using (1) prior-year predictors and (2) prior-year predictors potentially modifiable through intervention measured in the claims data. These models were evaluated using validated C-statistics. The relative influence of individual predictors in the models was evaluated.

    Results  Among the 329 476 beneficiaries, the mean (SD) age was 76.0 (7.2) years and 190 346 (57.8%) were female. This final 5-group model included a minimal-user group (group 1, 37 572 individuals [11.4%]), a low-cost group (group 2, 48 575 individuals [14.7%]), a rising-cost group (group 3, 24 736 individuals [7.5%]), a moderate-cost group (group 4, 83 338 individuals [25.3%]), and a high-cost group (group 5, 135 255 individuals [41.2%]). Potentially modifiable characteristics strongly predicted these patterns (C-statistics range: 0.68-0.94). For groups with progressively increasing spending in particular, the most influential factors were number of medications (relative influence: 29.2), number of office visits (relative influence: 30.3), and mean medication adherence (relative influence: 33.6).

    Conclusions and Relevance  Using a data-driven approach, distinct spending patterns were identified with high accuracy. The potentially modifiable predictors of membership in the rising-cost group represent important levers for early interventions that may prevent later spending increases. This approach could be adapted by organizations to target quality improvement interventions, particularly because numerous health care organizations are increasingly using these routinely collected data.

    Introduction

    With health care spending now accounting for almost 18% of the US gross domestic product, identifying individuals who may benefit from interventions to address potentially avoidable spending has become a central priority for health insurers and health care professionals.1 Current approaches generally focus on prediction or intervention for patients who may have escalating costs on the basis of a single composite value of total spending over short time periods.2,3

    However, many patients experience substantial increases or decreases in spending not captured by these approaches.4-9 For example, Tamang et al10 identified a definable group of low-spending patients in 1 year whose costs bloomed (ie, they became high-spending individuals) in the subsequent year in Denmark. Similarly, Lauffenburger et al11 observed 7 distinct, dynamic patterns of spending over a 1-year period in commercially insured beneficiaries, including individuals whose costs increased rapidly toward the end of the year and another group of high-cost individuals for whom spending decreased.

    These prior studies were conducted over a 1-year period, yet there may also be dynamic patterns of spending over longer periods that may have implications both for whom to target for intervention and when to do so.1,12 For example, patients with the same clinical conditions who are hospitalized early during a 12-month period may differ meaningfully from those hospitalized later, although both could be identified as having rising costs.13,14 If these different spending patterns could be predicted using routinely collected data, then proactively differentiating patients with increasing or decreasing spending patterns could help target interventions to those in greatest need of improved health or cost containment.15 Spending may also be predicted more accurately over a long-term than over a short-term time horizon, as seen for other outcomes.16 Accordingly, we sought to classify patients according to their spending patterns over a 2-year period and to evaluate the ability to predict these spending groups using patient characteristics that are potentially modifiable.

    Methods

    This cohort study was approved by the institutional review board of Brigham and Women’s Hospital and was granted a waiver of informed patient consent because the data are secondary routinely collected data. This study follows the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) reporting guideline.

    Setting and Study Design

    This study used administrative claims data from a 1-million-member sample of Medicare fee-for-service beneficiaries; the original sample included approximately 20 000 beneficiaries in a nationwide quality improvement program and approximately 980 000 randomly selected patients nationally.17 We restricted the cohort to the randomly selected patients and used their paid Medicare Parts A, B, and D patient-level files containing all procedures, physician encounters, hospitalizations, and filled outpatient prescriptions, including amounts paid by the insurer and patient. These data were linked to eligibility data including age, race/ethnicity, gender, and geographic location of residence. Aggregate zip code level data on median income and educational attainment were obtained by linking with 2010 US Census data.

    To be included, patients had to be aged 65 years or older and maintain continuous eligibility from January 1, 2011, to December 31, 2013. The cohort entry date was defined as January 1, 2012, to provide 1 year of baseline data (year 0) and 2 years of follow-up data (year 1 and year 2) (eFigure 1 in the Supplement).

    Costs

    We measured total monthly health care spending over a 2-year period for each patient by summing the allowed amounts on all inpatient, outpatient, and prescription drug claims. Monthly costs were generated by summing the costs in each month and were standardized by dividing the summed costs by the number of days in that month and then multiplying the result by 30. Costs were then logarithmically transformed to normalize their distribution, after adding $0.01, as frequently done.9,18 Costs were inflated using the Medical Care Component of the Consumer Price Index to 2013 dollars when necessary.
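
    As a minimal sketch of the standardization and transformation described above, the following Python works through the arithmetic for a single patient-month. The function names are hypothetical, and the natural logarithm is an assumption (the base is not stated); the study's own cost processing was done in SAS.

```python
import calendar
import math


def thirty_day_cost(summed_cost: float, year: int, month: int) -> float:
    """Rescale a calendar month's summed cost to a standard 30-day month."""
    days_in_month = calendar.monthrange(year, month)[1]
    return summed_cost / days_in_month * 30


def log_cost(standardized_cost: float, offset: float = 0.01) -> float:
    """Log-transform a cost after adding $0.01 so that $0 months stay defined.

    Assumption: natural logarithm.
    """
    return math.log(standardized_cost + offset)


# Example: $3100 of allowed amounts in January 2012 (31 days)
january = thirty_day_cost(3100.0, 2012, 1)
january_log = log_cost(january)
```

    Rescaling to 30 days keeps short and long months comparable, and the small offset keeps months with no spending in the model instead of dropping them.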

    Predictors

    Using data from Medicare enrollment files and claims, we defined 37 clinically relevant baseline characteristics that were potential predictors of future spending (eTable 1 in the Supplement). These baseline variables were measured during the 12 months prior to the 2-year period during which cost outcomes were evaluated (eFigure 1 in the Supplement). These variables were based on characteristics used in cost modeling in claims data in the peer-reviewed literature and from the quality-cost theoretical framework.6,10,11,15 These sets of predictors have also been shown to predict 1-year spending as accurately as proprietary risk-adjustment methods.11

    Sociodemographic characteristics included age, race/ethnicity, gender, and community-level variables based on member’s zip code of residence, including median household income and educational attainment. Clinical comorbidities were measured using International Classification of Diseases, Ninth Revision codes (eAppendix and eTable 1 in the Supplement). Each patient’s number of unique prescriptions by generic name (ie, therapeutic complexity), physician office visits, emergency department visits, hospitalizations, unique physicians visited, unique pharmacies used, benefits’ generosity19 (copayments and deductibles or total net payments), and baseline year total costs were also measured. Adherence to long-term medication classes (eg, β-blockers) was measured in the baseline year.11 For each class, we created a supply diary beginning with the first fill for each class in the baseline year. This diary linked all observed fills based on dispensing date and days’ supply; switching was allowed within each class (eg, β-blockers). From this, we calculated the proportion of days covered (PDC) as a mean across classes that the patient filled to yield 1 mean PDC.20,21
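
    The supply diary and mean PDC can be sketched as follows. This is an illustration under stated assumptions, not the study's code: early refills are assumed to be carried forward so that overlapping supply is not double counted, the denominator is assumed to run from the first fill to the end of the baseline year, and all names are hypothetical.

```python
from datetime import date, timedelta


def pdc_for_class(fills, period_end):
    """Proportion of days covered for one medication class.

    fills: list of (dispensing_date, days_supply); any drug in the class counts,
    which is how within-class switching is allowed.
    """
    fills = sorted(fills)
    start = fills[0][0]                       # diary begins at the first fill
    total_days = (period_end - start).days + 1
    day_after_end = period_end + timedelta(days=1)
    covered_days = 0
    next_uncovered = start                    # first day with no supply yet
    for dispensed, days_supply in fills:
        # Assumption: supply from an early refill starts when the prior supply ends.
        begin = max(dispensed, next_uncovered)
        supply_end = begin + timedelta(days=days_supply)
        counted_end = min(supply_end, day_after_end)
        if counted_end > begin:
            covered_days += (counted_end - begin).days
        next_uncovered = max(next_uncovered, supply_end)
    return covered_days / total_days


def mean_pdc(fills_by_class, period_end):
    """Mean PDC across the classes the patient filled at least once."""
    values = [pdc_for_class(f, period_end) for f in fills_by_class.values() if f]
    return sum(values) / len(values)


baseline_end = date(2011, 12, 31)
example = {
    "beta_blocker": [(date(2011, 1, 1), 365)],   # covered all year
    "statin": [(date(2011, 12, 2), 15)],         # 15 of 30 remaining days
}
patient_mean_pdc = mean_pdc(example, baseline_end)
```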

    We categorized each predictor by whether it was potentially modifiable, defined by whether it could theoretically be addressed in interventions and by classifications in prior literature.22,23 For example, number of unique physicians could be potentially modifiable, while race/ethnicity is not. In total, we classified 10 predictors as potentially modifiable (Table 1).

    Data-Driven Approach to Modeling Long-term Costs

    We used trajectory modeling to empirically classify spending during follow-up. One advantage is that it allows the data to define the cost outcomes, rather than using arbitrarily selected thresholds.24 It also considers changes in spending over time, rather than aggregating costs over a set time.25 To define spending patterns, we used the previously described SAS procedure Proc Traj, a free add-on.24-26 In brief, group-based trajectory models are an application of finite mixture modeling that identify clusters of individuals with similar outcome patterns over time.24 This modeling approach analyzes longitudinal data by fitting a semiparametric (discrete) mixture model, estimating each individual’s probability of membership in each group, and assigning them to the group according to their highest probability. We modeled longitudinal cost trajectories using calendar month as the time variable, costs in each month, order equal to 4, and a censored-normal distribution (linear between minimum and maximum values).11,24,26

    The models were estimated using a forward classifying approach using 2 to 7 groups, each time investigating model fit using the bayesian information criterion (BIC), whereby a lower BIC indicates better model fit.24 The number of groups investigated was capped at 7 on the basis of groupings observed in prior work.11 In addition to considering BIC, other key considerations in selecting the best-fitting trajectory were the ability to visually interpret separate groups, minimum membership probabilities in each group, and having 5% or more of the sample in each group.26-28
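
    The assignment step and two of the selection checks described above can be sketched in Python. This is a minimal illustration that assumes the posterior membership probabilities have already been estimated elsewhere (for example, exported from Proc Traj); the function names and toy data are hypothetical.

```python
def assign_groups(posteriors):
    """Assign each patient to the group with the highest posterior probability.

    posteriors: dict of patient_id -> list of probabilities, one per group.
    Groups are numbered from 1.
    """
    return {
        pid: max(range(len(probs)), key=probs.__getitem__) + 1
        for pid, probs in posteriors.items()
    }


def group_summary(posteriors, min_share=0.05):
    """Per-group share of the sample and mean membership probability.

    Groups that receive no patients do not appear in the result.
    """
    assignment = assign_groups(posteriors)
    n_patients = len(assignment)
    summary = {}
    for group in sorted(set(assignment.values())):
        members = [pid for pid, g in assignment.items() if g == group]
        share = len(members) / n_patients
        mean_probability = sum(posteriors[pid][group - 1] for pid in members) / len(members)
        summary[group] = {
            "share": share,
            "mean_probability": mean_probability,
            "meets_min_share": share >= min_share,  # 5% or more of the sample
        }
    return summary


toy_posteriors = {
    "a": [0.75, 0.25],
    "b": [0.25, 0.75],
    "c": [1.0, 0.0],
    "d": [0.875, 0.125],
}
toy_summary = group_summary(toy_posteriors)
```

    Checks of this kind sit alongside BIC and visual interpretability; they do not replace them.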

    Statistical Analysis

    After selecting the best-fitting number of trajectories, we assessed the ability to predict membership in each 2-year trajectory group using boosted logistic regression, a nonparametric machine learning method. The boosted algorithm is considered one of the best data-mining approaches for prediction problems.16,29 Specifically, the algorithm creates a prediction model by building numerous small regression trees that together provide highly accurate classification.30 The boosting algorithm has several built-in protections from model overfitting, provides automatic variable selection, and describes the relative influence of predictors.31 It also considers all possible interaction terms between potential predictors. We used the gbm package in R with 5-fold cross-validation to identify the optimal number of trees and applied standard default values for tuning parameters to identify the optimal model.16

    For each trajectory group, we estimated 2 separate models. The first included all 37 baseline predictors (model 1) and the second included only the 10 baseline predictors that were considered a priori to be potentially modifiable (model 2). Because of the ability of boosted regression to handle missing data, an indicator of long-term medication use and mean PDC were both included as variables for model 1, and mean PDC was included alone as a variable for model 2.

    To avoid overoptimism bias, we used internal split-sample validation by randomly dividing the full cohort into 2 halves as an initial derivation sample and a validation sample for all models.32 We evaluated each model through discrimination measures.33 Discrimination, the model’s ability to distinguish between patients who do and do not experience the outcome, was measured by the C-statistic, which ranges from 0.5 (noninformative model) to 1.0 (perfect prediction).34,35
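
    For readers less familiar with these steps, the sketch below shows a random split into derivation and validation halves and a pairwise computation of the C-statistic. It is an illustrative, standard-library implementation with hypothetical names; the pairwise loop is quadratic and only suited to small examples, and it is not the code used in the study.

```python
import random


def split_sample(ids, seed=0):
    """Randomly divide ids into a derivation half and a validation half."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def c_statistic(labels, scores):
    """Share of (member, non-member) pairs in which the member scores higher.

    labels: 1 if the patient belongs to the trajectory group, else 0.
    scores: predicted probabilities of membership. Ties count as half.
    """
    members = [s for y, s in zip(labels, scores) if y == 1]
    others = [s for y, s in zip(labels, scores) if y == 0]
    if not members or not others:
        raise ValueError("need at least one member and one non-member")
    concordant = 0.0
    for m in members:
        for o in others:
            if m > o:
                concordant += 1.0
            elif m == o:
                concordant += 0.5
    return concordant / (len(members) * len(others))


derivation_ids, validation_ids = split_sample(range(10), seed=1)
example_c = c_statistic([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

    In the example, 3 of the 4 member and non-member pairs are ordered correctly, so the C-statistic is 0.75.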

    For clinical context, we explored the association between potentially modifiable baseline characteristics and membership in a rising-cost trajectory compared with other trajectory groups that had similar spending at baseline. Specifically, we used multivariable logistic regression, including each potentially modifiable variable, to compare membership in the rising-cost trajectory vs other groups. This approach provides insight into baseline factors that may help distinguish patients who become costly later (ie, at least a year later) and potential levers for interventions. We also explored the relative influence of each potentially modifiable predictor from model 2.

    We also evaluated the ability to predict patients who experience rising costs in year 2 defined using a decile-threshold approach (ie, those who were in the lower 90% of spending in year 1 and then in the top 10% of spending in year 2, following Tamang et al10) and patients who in trajectory modeling were estimated as belonging to a rising-cost trajectory. For this approach, we estimated each outcome with 2 additional models with boosted regression. Model 3 used all baseline predictors, and model 4 used the potentially modifiable predictors. This approach helps provide insight into whether these spending increases could be accurately predicted using baseline information measured further in advance of the spending changes, which could ultimately inform intervention design and allow more time for interventions to be implemented.
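
    The decile-threshold definition can be illustrated with a short Python sketch. The nearest-rank cutoff and the treatment of ties (a cost equal to the cutoff is not counted as top-decile) are assumptions made for this example, and the function names and data are hypothetical.

```python
def top_share_ids(costs, top_pct=10):
    """Patients whose cost is strictly above the (100 - top_pct)th percentile.

    costs: dict of patient_id -> total spending in one year.
    Assumption: nearest-rank percentile, computed with integer arithmetic.
    """
    if not costs:
        return set()
    ordered = sorted(costs.values())
    rank = ((100 - top_pct) * len(ordered) + 99) // 100   # ceiling division
    cutoff = ordered[rank - 1]
    return {pid for pid, cost in costs.items() if cost > cutoff}


def rising_cost_ids(year1_costs, year2_costs, top_pct=10):
    """Patients in the lower 90% of spending in year 1 and the top 10% in year 2."""
    shared = year1_costs.keys() & year2_costs.keys()
    year1 = {pid: year1_costs[pid] for pid in shared}
    year2 = {pid: year2_costs[pid] for pid in shared}
    return top_share_ids(year2, top_pct) - top_share_ids(year1, top_pct)


# Ten hypothetical patients: p0 spends little in year 1 and the most in year 2.
year1_example = {f"p{i}": 100 * (i + 1) for i in range(10)}
year2_example = dict(year1_example)
year2_example["p0"] = 5000
rising_example = rising_cost_ids(year1_example, year2_example)
```

    Here only p0 is flagged; p9 is in the top decile in year 1 and so cannot count as rising under this definition.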

    We conducted several sensitivity analyses. Although our primary analysis included zip code sociodemographic characteristics, we also included patients’ region of residence based on enrollment files as a predictor in model 1. Then, we included adherence to each class separately as predictors in models 1 and 2. Finally, we repeated measurements and analyses in a subsequent year (ie, 2012-2014) to confirm generalizability (eAppendix in the Supplement).

    All analyses except for the boosted regression were performed using SAS version 9.4 (SAS Institute); the boosting algorithm was performed using R version 3.4.1 (The R Project for Statistical Computing). Statistical analysis was performed from August 2018 to December 2019.

    Results
    Study Population and Characteristics

    Our cohort consisted of 329 476 patients (eTable 2 in the Supplement). Their mean (SD) age was 76.0 (7.2) years, and 190 346 (57.8%) were women. A 5-group trajectory model best described the 2-year spending patterns (Figure); the model on the log scale is shown in eFigure 2 in the Supplement. The probabilities of group membership are in eTable 3 in the Supplement. Trajectories with alternative numbers of groups and corresponding BICs are shown in eFigure 3 in the Supplement; models with more groups had marginal improvements and were less interpretable.

    This final 5-group model included a minimal-user group (group 1, 37 572 individuals [11.4%]), a low-cost group (group 2, 48 575 individuals [14.7%]), a rising-cost group (group 3, 24 736 individuals [7.5%]), a moderate-cost group (group 4, 83 338 individuals [25.3%]), and a high-cost group (group 5, 135 255 individuals [41.2%]). Baseline characteristics for each group are shown in Table 1.

    Cost Prediction

    Table 2 shows the results of the main prediction models in the validation sample. Four of the 5 trajectory groups for 2-year spending could be accurately predicted using all baseline predictors, namely the minimal-user (C-statistic: 0.951), low-cost (C-statistic: 0.810), rising-cost (C-statistic: 0.764), and high-cost (C-statistic: 0.899) groups. Using potentially modifiable predictors alone, overall predictive ability remained moderate to strong, with the exception of the moderate-cost group (eg, C-statistic: 0.684).

    Table 3 shows potentially modifiable prior-year predictors of being in a rising-cost trajectory compared with the other 3 groups with similar spending in the prior baseline year (mean, $1500-$8000 in year 0). In particular, using more medications (odds ratio [OR]: 0.81; 95% CI, 0.79-0.84) and having more office visits (OR: 0.98; 95% CI, 0.97-0.99) were associated with lower odds of being in the rising-cost trajectory. Seeing more physicians (OR: 1.04; 95% CI, 1.02-1.06) and using tobacco (OR: 1.10; 95% CI, 1.02-1.20) were independently associated with higher odds of rising-cost membership. eFigure 4 in the Supplement shows the relative influence plots for each group incorporating only potentially modifiable characteristics (model 2). The plot for predicting the rising-cost group in particular indicates that the most predictive potentially modifiable factors were mean medication adherence (relative influence: 33.6), number of office visits (relative influence: 30.3), and number of medications (relative influence: 29.2).

    The results from the models predicting rising costs using a decile-threshold–based method and the trajectory group method are shown in eTable 4 in the Supplement. Patients identified by the decile-threshold–based approach had higher mean total 2-year costs ($39 737) than those identified by the trajectory approach ($23 670). The ability to predict decile-threshold–based rising costs (model 4 C-statistic: 0.643) was lower than that for trajectory-based rising costs (model 4 C-statistic: 0.753).

    Sensitivity analyses incorporating region of residence and medication adherence by class are shown in eTables 5 and 6 in the Supplement. Notably, trajectory group membership was fairly similar across regions, and including these predictors did not meaningfully change C-statistics. Replication in a subsequent year of data resulted in similar patterns and sizes of group membership (eFigure 5 in the Supplement) as well as ability to predict those groups (eTable 7 in the Supplement).

    Discussion

    Using a data-driven approach to classify 2-year health spending for Medicare beneficiaries, we observed 5 distinct spending patterns. Membership in these groups could be accurately predicted, even when using a simple set of potentially modifiable characteristics from claims data. These results suggest that this approach could potentially help inform the design, application, and timing of interventions.

    Prior efforts to predict health care spending have generally focused on a single composite value, such as total yearly costs, or a threshold-based measure, such as being in the top 5% of spending, both of which collapse an entire year’s spending into a static variable. These approaches have had modest accuracy, with C-statistics generally ranging from 0.6 to 0.8 for threshold-based outcomes.2,5,36,37 Two recently published approaches offer other cluster-based solutions to elucidate subgroups of high-cost patients with some notable successes.38,39 However, these were not applied to evaluate changes in spending, to examine outcomes over more than 1 year, or to identify patients with rising costs.38,39 They also focused on Medicare Advantage populations, which can differ from fee-for-service beneficiaries.40,41

    Patients may have dynamic patterns of spending over longer periods of time that can be potentially meaningful, with implications for whom to target for outreach and intervention as well as when and perhaps how to do so.1,12 For example, Tamang et al10 identified low-spending patients in 1 year whose costs bloomed in the subsequent year using thresholds. When applied to our data, the ability to predict these patients using baseline data alone was modest. Using a data-driven approach, we observed a similarly sized group whose costs later increased that could be predicted somewhat better. One possible explanation could be that the 2-year time horizon itself as an outcome helped discriminate between groups. The ability to proactively differentiate between patients with rising or falling spending patterns using distally measured variables could better target interventions to those who are at greatest need. If successful, using these longer time horizons could allow more time for the implementation of potential interventions.42

    Focusing interventions on patients with rising costs has some theoretical advantages, even though predictive ability was modest. For one, the group identified in this study was of modest size (ie, 7.5%). Of course, it still may be infeasible to intervene upon a group this large, and not all costs may be preventable. Identifying additional segmentation may be necessary, and the use of this approach may be just a starting point. Regardless, better prediction could help target interventions to those at greater need, and targeting has been shown to result in better population-level outcomes.43

    When considering potential interventions, a prediction rule comprising the most influential potentially modifiable variables could be applied to better target patients. We observed several clinically actionable characteristics, such as therapeutic complexity (ie, number of medications or office visits), depression, medication adherence, and tobacco use that could be levers for interventions. Filling fewer medications and having fewer office visits were also predictors of the rising-cost trajectory, suggesting that patients may not be getting sufficient care to prevent future escalation of health problems.22 This information could also be used for intervention design to improve care.

    Many health care organizations, insurers, researchers, and policy makers use claims data to identify patients for interventions. Therefore, the ability to better leverage these routinely collected data for cost predictions and interventions with a variety of more nuanced cost-modeling methods holds wide potential. Moreover, using data-driven approaches to classify longer-term spending may hold promise compared with threshold-based approaches alone.

    Limitations

    Several limitations warrant mention. First, we examined trajectories from January to December; patients with incomplete enrollment or other policy start and end dates may differ. Because of differences in how outcomes are categorized, model performance of predicting a cost trajectory (binary outcome) cannot be directly compared with predicting total costs (continuous outcome) or patients defined by the rising-cost decile-threshold approach. The variables included in prediction models may also not be exhaustive, and although we used validated algorithms, they may be insufficiently sensitive. Trajectory modeling also provides predicted group membership; individual members may be assigned to their closest trajectory, but there could be within-group heterogeneity. The high-cost group was large, possibly because of how the model was specified (ie, log costs); one could potentially apply trajectories to identify subgroups within that group for further segmentation. Although group distribution did not differ on the basis of geographical region, the costs themselves were not adjusted for region; similarly, moving could have impacted relative changes in spending, but this was beyond the scope of this study. Furthermore, these results may not be generalizable to other payment systems, such as non–fee-for-service Medicare, Medicaid, or commercially insured beneficiaries. Although these other beneficiaries may have different spending levels, prior work has suggested similar patterns.11 Regardless, the same groups or predictive ability may not apply to other types of beneficiaries, and the results should be studied further to confirm reproducibility.

    Conclusions

    Using trajectory modeling to examine a 2-year time horizon improved the understanding of dynamic patterns, including the identification of a group of patients with progressively increasing costs and a group of patients with consistently high spending. This approach could be potentially adapted by health care organizations to improve cost-containment efforts.

    Article Information

    Accepted for Publication: August 4, 2020.

    Published: October 19, 2020. doi:10.1001/jamanetworkopen.2020.20291

    Open Access: This is an open access article distributed under the terms of the CC-BY License. © 2020 Lauffenburger JC et al. JAMA Network Open.

    Corresponding Author: Julie C. Lauffenburger, PharmD, PhD, Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, 1620 Tremont St, Ste 3030, Boston, MA 02120 (jlauffenburger@bwh.harvard.edu).

    Author Contributions: Dr Lauffenburger had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.

    Concept and design: Lauffenburger, Choudhry.

    Acquisition, analysis, or interpretation of data: All authors.

    Drafting of the manuscript: Lauffenburger, Mahesri.

    Critical revision of the manuscript for important intellectual content: Mahesri, Choudhry.

    Statistical analysis: Lauffenburger, Mahesri.

    Obtained funding: Lauffenburger.

    Supervision: Lauffenburger, Choudhry.

    Conflict of Interest Disclosures: Dr Choudhry reported receiving unrestricted research funding from Sanofi, AstraZeneca, and Medisafe Inc payable to Brigham and Women’s Hospital. No other disclosures were reported.

    Funding/Support: This work was supported by an unrestricted investigator-initiated grant from the National Institute for Health Care Management to Brigham and Women’s Hospital. Dr Lauffenburger was also supported in part by a National Institutes of Health career development grant (K01 HL 141538). Dr Choudhry was also supported in part by a National Institutes of Health center grant (P30AG064199).

    Role of the Funder/Sponsor: The funders had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.

    References
    1. Martin AB, Hartman M, Washington B, Catlin A; National Health Expenditure Accounts Team. National health spending: faster growth in 2015 as coverage expands and utilization increases. Health Aff (Millwood). 2017;36(1):166-176. doi:10.1377/hlthaff.2016.1330
    2. Kuo RN, Dong YH, Liu JP, Chang CH, Shau WY, Lai MS. Predicting healthcare utilization using a pharmacy-based metric with the WHO’s Anatomic Therapeutic Chemical algorithm. Med Care. 2011;49(11):1031-1039. doi:10.1097/MLR.0b013e31822ebe11
    3. Perkins AJ, Kroenke K, Unützer J, et al. Common comorbidity scales were similar in their ability to predict health care costs and mortality. J Clin Epidemiol. 2004;57(10):1040-1048. doi:10.1016/j.jclinepi.2004.03.002
    4. Sales AE, Liu CF, Sloan KL, et al. Predicting costs of care using a pharmacy-based measure risk adjustment in a veteran population. Med Care. 2003;41(6):753-760. doi:10.1097/01.MLR.0000069502.75914.DD
    5. Fishman PA, Goodman MJ, Hornbrook MC, Meenan RT, Bachman DJ, O’Keeffe Rosetti MC. Risk adjustment using automated ambulatory pharmacy data: the RxRisk model. Med Care. 2003;41(1):84-99. doi:10.1097/00005650-200301000-00011
    6. Powers CA, Meyer CM, Roebuck MC, Vaziri B. Predictive modeling of total healthcare costs using pharmacy claims data: a comparison of alternative econometric cost modeling techniques. Med Care. 2005;43(11):1065-1072. doi:10.1097/01.mlr.0000182408.54390.00
    7. Forrest CB, Lemke KW, Bodycombe DP, Weiner JP. Medication, diagnostic, and cost information as predictors of high-risk patients in need of care management. Am J Manag Care. 2009;15(1):41-48.
    8. Yarger S, Rascati K, Lawson K, Barner J, Leslie R. Analysis of predictive value of four risk models in Medicaid recipients with chronic obstructive pulmonary disease in Texas. Clin Ther. 2008;30(Spec No):1051-1057. doi:10.1016/j.clinthera.2008.06.001
    9. Mihaylova B, Briggs A, O’Hagan A, Thompson SG. Review of statistical methods for analysing healthcare resources and costs. Health Econ. 2011;20(8):897-916. doi:10.1002/hec.1653
    10. Tamang S, Milstein A, Sørensen HT, et al. Predicting patient ‘cost blooms’ in Denmark: a longitudinal population-based study. BMJ Open. 2017;7(1):e011580. doi:10.1136/bmjopen-2016-011580
    11. Lauffenburger JC, Franklin JM, Krumme AA, et al. Longitudinal patterns of spending enhance the ability to predict costly patients: a novel approach to identify patients for cost containment. Med Care. 2017;55(1):64-73. doi:10.1097/MLR.0000000000000623
    12. Druss BG, Marcus SC, Olfson M, Tanielian T, Elinson L, Pincus HA. Comparing the national economic burden of five chronic conditions. Health Aff (Millwood). 2001;20(6):233-241. doi:10.1377/hlthaff.20.6.233
    13. Ziaeian B, Fonarow GC. The prevention of hospital readmissions in heart failure. Prog Cardiovasc Dis. 2016;58(4):379-385. doi:10.1016/j.pcad.2015.09.004
    14. Barnett ML, Hsu J, McWilliams JM. Patient characteristics and differences in hospital readmission rates. JAMA Intern Med. 2015;175(11):1803-1812. doi:10.1001/jamainternmed.2015.4660
    15. Nuckols TK, Escarce JJ, Asch SM. The effects of quality of care on costs: a conceptual framework. Milbank Q. 2013;91(2):316-353. doi:10.1111/milq.12015
    16. Franklin JM, Shrank WH, Lii J, et al. Observing versus predicting: initial patterns of filling predict long-term adherence more accurately than high-dimensional modeling techniques. Health Serv Res. 2016;51(1):220-239. doi:10.1111/1475-6773.12310
    17. Krumme AA, Glynn RJ, Schneeweiss S, et al. Medication synchronization programs improve adherence to cardiovascular medications and health care use. Health Aff (Millwood). 2018;37(1):125-133. doi:10.1377/hlthaff.2017.0881
    18. Austin PC, Ghali WA, Tu JV. A comparison of several regression models for analysing cost of CABG surgery. Stat Med. 2003;22(17):2799-2815. doi:10.1002/sim.1442
    19. Artz MB, Hadsall RS, Schondelmeyer SW. Impact of generosity level of outpatient prescription drug coverage on prescription drug events and expenditure among older persons. Am J Public Health. 2002;92(8):1257-1263. doi:10.2105/AJPH.92.8.1257
    20. Benner JS, Glynn RJ, Mogun H, Neumann PJ, Weinstein MC, Avorn J. Long-term persistence in use of statin therapy in elderly patients. JAMA. 2002;288(4):455-461. doi:10.1001/jama.288.4.455
    21. Choudhry NK, Shrank WH, Levin RL, et al. Measuring concurrent adherence to multiple related medications. Am J Manag Care. 2009;15(7):457-464.
    22. Goetzel RZ, Pei X, Tabrizi MJ, et al. Ten modifiable health risk factors are linked to more than one-fifth of employer-employee health care spending. Health Aff (Millwood). 2012;31(11):2474-2484. doi:10.1377/hlthaff.2011.0819
    23. Yusuf S, Hawken S, Ounpuu S, et al; INTERHEART Study Investigators. Effect of potentially modifiable risk factors associated with myocardial infarction in 52 countries (the INTERHEART study): case-control study. Lancet. 2004;364(9438):937-952. doi:10.1016/S0140-6736(04)17018-9
    24. Jones BL, Nagin DS. Advances in group-based trajectory modeling and a SAS procedure for estimating them. Sociol Methods Res. 2007;35(4):542-571. doi:10.1177/0049124106292364
    25. Franklin JM, Shrank WH, Pakes J, et al. Group-based trajectory models: a new approach to classifying and predicting long-term medication adherence. Med Care. 2013;51(9):789-796. doi:10.1097/MLR.0b013e3182984c1f
    26. Jones BL, Nagin DS, Roeder K. A SAS procedure based on mixture models for estimating developmental trajectories. Sociol Methods Res. 2001;29:374-393. doi:10.1177/0049124101029003005
    27. Li Y, Zhou H, Cai B, et al. Group-based trajectory modeling to assess adherence to biologics among patients with psoriasis. Clinicoecon Outcomes Res. 2014;6:197-208. doi:10.2147/CEOR.S59339
    28. Franklin JM, Krumme AA, Tong AY, et al. Association between trajectories of statin adherence and subsequent cardiovascular events. Pharmacoepidemiol Drug Saf. 2015;24(10):1105-1113. doi:10.1002/pds.3787
    29. Koh HC, Tan G. Data mining applications in healthcare. J Healthc Inf Manag. 2005;19(2):64-72.
    30. Robinson JW. Regression tree boosting to adjust health care cost predictions for diagnostic mix. Health Serv Res. 2008;43(2):755-772. doi:10.1111/j.1475-6773.2007.00761.x
    31. Varian HR. Big data: new tricks for econometrics. J Econ Perspect. 2014;28(2):3-28. doi:10.1257/jep.28.2.3
    32. Steyerberg EW, Harrell FE Jr, Borsboom GJ, Eijkemans MJ, Vergouwe Y, Habbema JD. Internal validation of predictive models: efficiency of some procedures for logistic regression analysis. J Clin Epidemiol. 2001;54(8):774-781. doi:10.1016/S0895-4356(01)00341-9
    33. Waljee AK, Higgins PD, Singal AG. A primer on predictive models. Clin Transl Gastroenterol. 2014;5:e44. doi:10.1038/ctg.2013.19
    34. Steyerberg EW, Vickers AJ, Cook NR, et al. Assessing the performance of prediction models: a framework for traditional and novel measures. Epidemiology. 2010;21(1):128-138. doi:10.1097/EDE.0b013e3181c30fb2
    35. Cook NR. Use and misuse of the receiver operating characteristic curve in risk prediction. Circulation. 2007;115(7):928-935. doi:10.1161/CIRCULATIONAHA.106.672402
    36. Liu CF, Sales AE, Sharp ND, et al. Case-mix adjusting performance measures in a veteran population: pharmacy- and diagnosis-based approaches. Health Serv Res. 2003;38(5):1319-1337. doi:10.1111/1475-6773.00179
    37. Zhao Y, Ash AS, Ellis RP, et al. Predicting pharmacy costs and other medical costs using diagnoses and drug claims. Med Care. 2005;43(1):34-43.
    38. Yan J, Linn KA, Powers BW, et al. Applying machine learning algorithms to segment high-cost patient populations. J Gen Intern Med. 2019;34(2):211-217. doi:10.1007/s11606-018-4760-8
    39. Powers BW, Yan J, Zhu J, et al. Subgroups of high-cost Medicare Advantage patients: an observational study. J Gen Intern Med. 2019;34(2):218-225. doi:10.1007/s11606-018-4759-1
    40. Powell SK. Choosing Medicare Advantage plans versus traditional fee-for-service: is this change the tipping point? Prof Case Manag. 2019;24(1):1-3. doi:10.1097/NCM.0000000000000338
    41. Raetzman SO, Hines AL, Barrett ML, Karaca Z. Hospital stays in Medicare Advantage Plans versus the traditional Medicare fee-for-service program, 2013: statistical brief #198. Published December 2015. Accessed August 5, 2019. https://www.hcup-us.ahrq.gov/reports/statbriefs/sb198-Hospital-Stays-Medicare-Advantage-Versus-Traditional-Medicare.jsp
    42. Stadhouders N, Kruse F, Tanke M, Koolman X, Jeurissen P. Effective healthcare cost-containment policies: a systematic review. Health Policy. 2019;123(1):71-79. doi:10.1016/j.healthpol.2018.10.015
    43. Lauffenburger JC, Lewey J, Jan S, et al. Effectiveness of targeted insulin-adherence interventions for glycemic control using predictive analytics among patients with type 2 diabetes: a randomized clinical trial. JAMA Netw Open. 2019;2(3):e190657. doi:10.1001/jamanetworkopen.2019.0657