A Novel Machine Learning Algorithm for Creating Risk-Adjusted Payment Formulas

Key Points
Question: Can a machine learning algorithm be used to produce risk adjustment models that respect clinical logic, address upcoding incentives, and predict costs better, especially for uncommon diseases, than the US Department of Health and Human Services (HHS) 2020 Affordable Care Act Marketplace hierarchical condition category (HCC) model?
Findings: In this economic evaluation, the Diagnostic Cost Group (DCG) machine learning algorithm used clinician-specified hierarchies to predict top-coded total annual health care spending. The DCG algorithm achieved a higher R² value despite excluding vague and gameable diagnoses and dramatically reduced HHS-HCC underpayments for rare conditions.
Meaning: In this study, the DCG algorithm addressed gaming concerns and predicted costs better than the HHS-HCC model.

Previous work estimated DXI formula (1) using the same 59 million enrollee sample and found 2,282 statistically significant parameters with no evidence of overfitting. Three problems were identified in using DXIs for payment, benchmarking, or performance assessment. First, 373 of the DXI and CCSR parameters (12.2%) in the top-coded model of concurrent spending were negative, which is unattractive for practical payment models because it predicts negative spending for many individuals and lacks face validity. Second, the heavily parameterized (low parsimony) models were unattractive for re-estimation on smaller samples that lack the power to estimate coefficients for relatively rare DXIs. Results for ML algorithms popular in the literature 1,3 indicate that more sparsely parameterized models are often superior in smaller samples. Third, an additive DXI formula does not address coding incentives and will reward coding proliferation by increasing payments whenever more diagnoses are added, even when the newly added codes are already implied by diagnoses already present. For example, coding "Diabetes, Unspecified Type" should not be recognized when a more specific diagnosis such as "Diabetes, Type 1" is also available.

The DXI Diagnostic Cost Group (DCG) Algorithm
The DCG algorithm presented here differs from the original Ash et al. 4 DCG formulation and its CMS implementation 5 in 2000 in five respects. First, we flexibly screen out DXIs considered vague or highly gameable, as captured by their ATI scores. Second, we allow for multiple hierarchies for each DXI. Third, we cluster diagnostic items according to the similarity of their regression coefficients rather than their average costs. Fourth, we use specified statistical criteria for grouping DXIs into DCGs. 6 Fifth, the estimation is algorithmic, not manual, enabling ML models to be efficiently estimated for diverse outcomes in new samples for a variety of purposes in a reasonable amount of time. Note that DXIs or CCSRs that were perfectly collinear with sets of other variables were assigned an ATI score of 6 and hence excluded automatically.
Because age*gender variables enter additively in the base model and also needed to be constrained to be non-negative, two hierarchies were also created for female and male enrollees and included in the estimation algorithm analogously to other DXIs. The final DCG model specification can be written compactly as in (2). Within each hierarchy h we create DCG groups of DXIs, indexed by g, where the highest coefficient DCG within h is g = 1, with higher indexed DCGs having smaller coefficients. Within each hierarchy a person in DCG g' cannot also be assigned to DCG g" when g' < g".
For ease of interpretation, DCGs are given informative names reflecting the ICD-10 disease chapter and hierarchy, and are numbered sequentially, with 1 being the highest cost DCG.
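The within-hierarchy assignment rule can be illustrated with a minimal sketch. The DXI labels, the mapping table, and the function name below are hypothetical placeholders, not items from the DXI classification; the sketch only assumes that each grouped DXI maps to one hierarchy and one DCG index, with index 1 being the highest coefficient DCG.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical mapping: DXI label -> (hierarchy, DCG index).
# Index 1 is the highest-coefficient DCG within its hierarchy.
DXI_TO_DCG: Dict[str, Tuple[str, int]] = {
    "DXI_A": ("CIR", 1),
    "DXI_B": ("CIR", 3),
    "DXI_C": ("END", 2),
}


def assign_dcgs(person_dxis: Iterable[str],
                dxi_to_dcg: Dict[str, Tuple[str, int]] = DXI_TO_DCG) -> Dict[str, int]:
    """Return at most one DCG per hierarchy: the lowest-indexed one present."""
    assigned: Dict[str, int] = {}
    for dxi in person_dxis:
        if dxi not in dxi_to_dcg:
            continue  # DXI was screened out or never grouped into a DCG
        hier, g = dxi_to_dcg[dxi]
        # A person in DCG g' is not also placed in DCG g" when g' < g".
        if hier not in assigned or g < assigned[hier]:
            assigned[hier] = g
    return assigned


example = assign_dcgs(["DXI_A", "DXI_B", "DXI_C"])
```

In this toy case the enrollee has two circulatory DXIs but is flagged only for the higher coefficient circulatory DCG, while the unrelated endocrine DCG still accumulates additively.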
The DCG model is estimated by iteratively choosing sets of DXIs to assign to high coefficient DCGs before choosing lower coefficient DCGs, both overall and within each HIER, with iterations continuing until the stopping rules are satisfied. At the first iteration, only DXIs with incremental costs above $50,000 are eligible to be assigned to DCGs, a lower bound that is successively lowered as lower coefficient DCGs are identified. No DXI can end up in more than one HIER, negative coefficients are not allowed, and within a hierarchy a given person can be assigned only one DCG. Iterations continue creating DCGs until no further DCGs satisfying the stopping rules can be created.
(This required 14 iterations in the Base case specification.) The DCG ML algorithm then continues with a second type of iteration intended to eliminate statistically insignificant, negative, and non-monotonic DCGs within each HIER group. This second type of iteration proceeds very quickly because, once the full set of cross products of all DCGs and the dependent variables is created, these steps can be done by simply imposing constraints on the DCG coefficients. Specifically, the model first performs a backwards stepwise weighted least squares (WLS) regression that drops variables with p values greater than 0.0001. The model then constrains any negative DCG coefficients β_hg to be zero. Finally, if within any hierarchy h, β_hg' < β_hg" when g' < g", the coefficients on these two DCGs are restricted to be the same. These three steps were repeated until all three desirable features of the coefficients were satisfied (four iterations in the Base model specification). Because imposing monotonicity has no meaning in strictly additive models and requires extra processing, we did not impose monotonicity on all models used for sensitivity analysis, but instead imposed it once at the end of all estimation for our Base model specification.
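The non-negativity and monotonicity constraints can be illustrated on a vector of coefficients for one hierarchy. This is a simplified sketch, not the estimation procedure itself: it omits the backwards stepwise step, does not re-estimate the regression, and assumes that two violating coefficients are equalized at their weighted mean (a pool-adjacent-violators shortcut). The function name and the use of enrollee counts as weights are assumptions made for illustration.

```python
from typing import List, Tuple


def enforce_constraints(betas: List[float], weights: List[float]) -> List[float]:
    """Return non-negative, non-increasing coefficients for one hierarchy.

    betas[0] belongs to DCG 1, which should carry the largest coefficient.
    weights are positive (for example, enrollee counts per DCG).
    """
    # Constrain negative coefficients to zero.
    clamped = [max(b, 0.0) for b in betas]

    # Pool adjacent violators: each block is (value, total weight, DCG count).
    blocks: List[Tuple[float, float, int]] = []
    for value, weight in zip(clamped, weights):
        blocks.append((value, weight, 1))
        # Merge while a higher-indexed block exceeds the one before it.
        while len(blocks) > 1 and blocks[-2][0] < blocks[-1][0]:
            v2, w2, n2 = blocks.pop()
            v1, w1, n1 = blocks.pop()
            pooled = (v1 * w1 + v2 * w2) / (w1 + w2)
            blocks.append((pooled, w1 + w2, n1 + n2))

    # Expand blocks back to one coefficient per DCG.
    result: List[float] = []
    for value, _, count in blocks:
        result.extend([value] * count)
    return result


adjusted = enforce_constraints([30000.0, 12000.0, 15000.0, -500.0],
                               [100.0, 300.0, 100.0, 50.0])
```

Here DCG 2 and DCG 3 violate the ordering and are set to a common value, and the negative coefficient on DCG 4 is set to zero. In a constrained WLS fit the common value would come from re-estimation rather than from a weighted mean.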

Stopping Rules for DCG Groups
The number of DCGs created within each hierarchy is not specified a priori, but instead is controlled by six modeling parameters. They (with their Base case settings) are:
1. The minimum sample size for each DCG (2,000).
2. The maximum percent difference allowed between the current weighted average coefficient in a DCG and the next coefficient considered (30%).
3. The statistical significance required to assign a DXI to a DCG (p<0.001).
4. Whether to assign DXIs with negative coefficients to DCGs (no).
5. The initial floor for the DCG average ($50,000).
6. The decrease in the floor in each iteration ($10,000).
Each of these six parameters was varied for sensitivity analysis.
Once all of the above stopping rules were satisfied, all remaining DXIs were dropped, and a backwards stepwise regression with an even tighter inclusion criterion (p<0.0001) was estimated to exclude less significant DCGs and negative coefficient DCGs. This final step was repeated until all included DCGs had non-negative coefficients.
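The six modeling parameters and the falling floor can be sketched as follows. All class, field, and function names are hypothetical, and two simplifications are assumed: the percent difference is measured relative to the current DCG average, and the coefficients are treated as fixed inputs, whereas the algorithm described above re-estimates the model between iterations.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class StoppingRules:
    """Base case settings of the six modeling parameters."""
    min_dcg_sample_size: int = 2_000
    max_pct_diff: float = 0.30
    p_threshold: float = 0.001
    allow_negative: bool = False
    initial_floor: float = 50_000.0
    floor_step: float = 10_000.0


def can_join(coef: float, p_value: float, dcg_mean: float,
             rules: StoppingRules = StoppingRules()) -> bool:
    """Check whether a candidate DXI may join the DCG currently being built."""
    if p_value >= rules.p_threshold:
        return False  # not significant enough
    if coef < 0 and not rules.allow_negative:
        return False  # negative coefficients are excluded
    if dcg_mean <= 0:
        return False  # percent difference is undefined
    return abs(dcg_mean - coef) / dcg_mean <= rules.max_pct_diff


def dcg_large_enough(n_enrollees: int,
                     rules: StoppingRules = StoppingRules()) -> bool:
    """Check the minimum sample size for a DCG."""
    return n_enrollees >= rules.min_dcg_sample_size


def floor_tiers(coefs: Dict[str, float],
                rules: StoppingRules = StoppingRules()) -> List[List[str]]:
    """Group DXIs into tiers as the eligibility floor drops each iteration."""
    remaining = {k: v for k, v in coefs.items() if v > 0}
    tiers: List[List[str]] = []
    floor = rules.initial_floor
    while remaining and floor >= 0:
        eligible = sorted((k for k, v in remaining.items() if v >= floor),
                          key=lambda k: -remaining[k])
        if eligible:
            tiers.append(eligible)
            for k in eligible:
                del remaining[k]
        floor -= rules.floor_step
    return tiers


tiers = floor_tiers({"a": 72_000.0, "b": 55_000.0, "c": 41_000.0,
                     "d": 8_000.0, "e": -300.0})
```

Keeping the settings in one immutable object makes the sensitivity analysis described above easy to express, since each variant is a copy with one field changed.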
When an enrollee has a DXI that is assigned to a DCG in any hierarchy, all of their other DXIs in that hierarchy are reset to zero. The same reset is applied in any other hierarchies to which the assigned DXI belongs. An implication of this process is that sample sizes for some DXIs are very low when they are added to a DCG or excluded from the model. The inclusion of very rare diagnostic information in model predictions is unique to our algorithm and not a feature of any other payment algorithm with which we are familiar.

eFigure 2 (continued). DCG Model Coefficients for Sets of Circulatory (CIR) DXIs, DCG Version 1.1
Notes: WLS is weighted least squares; CCSR is the Clinical Classifications Software Refined model; DXI is the Diagnostic Items model; DCG is the Diagnostic Cost Group; and CIR is the Circulatory chapter. Plot whiskers correspond to 95% confidence intervals. Chapter assignment is based on the hierarchy mappings.

eFigure 3. DCG Model Coefficients for Sets of Endocrine, Nutritional, and Metabolic (END) DXIs, DCG Version 1.1
Notes: WLS is weighted least squares; CCSR is the Clinical Classifications Software Refined model; DXI is the Diagnostic Items model; DCG is the Diagnostic Cost Group; and END is the Endocrine, nutritional, and metabolic chapter. Plot whiskers correspond to 95% confidence intervals. Chapter assignment is based on the hierarchy mappings.

Principle 1: Diagnostic categories should be clinically meaningful.
Principle 2: Diagnostic categories should predict medical (including drug) expenditures.
Principle 3: Diagnostic categories that will affect payments should have adequate sample sizes to permit accurate and stable estimates of expenditures.
Principle 4: In creating an individual's clinical profile, hierarchies should be used to characterize the person's illness level within each disease process, while the effects of unrelated disease processes accumulate.
Principle 5: The diagnostic classification should encourage specific coding.
Principle 6: The diagnostic classification should not reward coding proliferation.
Principle 7: Providers should not be penalized for recording additional diagnoses (monotonicity).
Principle 8: The classification system should be internally consistent (transitive).
Principle 9: The diagnostic classification should assign all ICD-10-CM codes (exhaustive classification).
Principle 10: Discretionary diagnostic categories should be excluded from payment models.
Two new principles were added in this project:
Principle 11: Models should do well even on sets of rare diagnoses.
Principle 12: Parsimonious models with fewer parameters are preferred.

eTable 2. Scale Used by Clinicians to Assign Appropriateness to Include Scores
We instructed the clinical review panels to use the following definitions for assigning ATI scores to each DXI.
0 => No concerns about using for payment
1 => Trivial concerns …
2 => Minor concerns …
3 => Meaningful concerns …
4 => Serious concerns …
5 => Major concerns: avoid using for payment
Panelists were informed that 4 might be a plausible threshold for exclusion from the model.
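Screening on ATI scores amounts to a simple filter, sketched below. The DXI labels and the function name are hypothetical, and the sketch assumes that a DXI is excluded when its score is at or above the chosen threshold; the source states only that 4 might be a plausible threshold, so the direction of the comparison at the boundary is an assumption.

```python
from typing import Dict, List


def screen_dxis(ati_scores: Dict[str, int], threshold: int = 4) -> List[str]:
    """Return the DXIs retained for modeling, sorted by label.

    Assumption: a DXI is excluded when its ATI score is >= threshold.
    """
    return sorted(dxi for dxi, score in ati_scores.items() if score < threshold)


# Hypothetical scores; 6 marks a DXI excluded automatically for collinearity.
kept = screen_dxis({"DXI_W": 3, "DXI_X": 0, "DXI_Y": 4, "DXI_Z": 6})
```

Making the threshold a parameter lets the exclusion rule be varied without changing the scores assigned by the review panels.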