The Effect of Including Benchmark Prevalence Data of Common Imaging Findings in Spine Image Reports on Health Care Utilization Among Adults Undergoing Spine Imaging

Key Points
Question: What is the impact of including benchmark prevalence data of common findings in reports of spinal imaging ordered by primary care clinicians?
Findings: In this randomized clinical trial that included 250,401 adults, no overall decrease in subsequent spine-related health care utilization after the intervention was observed. However, there was a significant decrease in opioid prescriptions at 1 year in the intervention group compared with the control group.
Meaning: The findings of this study suggest that including epidemiological benchmarks on spinal imaging reports has little impact on subsequent spine-related utilization overall but may reduce subsequent opioid prescriptions.

Algorithm finalized for electronic medical record extraction and tested at all sites: This will continue the work started as UH2 Milestone #4. Each site will require a customized algorithm, hence the need for site-specific development and testing.
Protocol paper submitted for publication: We will prepare a manuscript describing our study protocol and procedures.

Year 3
Randomized intervention implemented at 80% of clinical sites: Planned staggered implementation using a stepped wedge design will require close monitoring of progress.
Medical record extraction complete for 12-month outcomes on randomization waves 1-2: Data extraction will be ongoing for the duration of the project for the 12- and 24-month time points.
Comparison of abstraction methods for radiology reports (natural language processing vs. Amazon Mechanical Turk)

Year 4
Intervention implementation completed: All clinical sites randomized to intervention by this time.
Comments (e.g., population characteristics, potential problems using data): displays the site-specific strata definitions and sizes. In total, we will randomize 110 clinics, with 1,824 PCPs as units of observation within those clinics. Note that we have chosen to use site-specific definitions of clinic size, with the goal of balancing clinic size within each site. In addition, by balancing randomization on size we ensure comparable time on control and intervention for each clinic size stratum.
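One way to implement size-balanced assignment within each site is to randomize clinics to crossover waves separately within each site-by-size stratum. The sketch below is illustrative only. It assumes five waves and uses made-up clinic identifiers. The `assign_waves` helper and its round-robin scheme are hypothetical and are not the study's actual randomization procedure.

```python
import random
from collections import Counter, defaultdict

def assign_waves(clinics, n_waves, seed=2013):
    """Assign clinics to stepped-wedge crossover waves, stratified by (site, size).

    clinics: iterable of (clinic_id, site, size_category) tuples.
    Returns a dict mapping clinic_id -> wave number in 1..n_waves.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for clinic_id, site, size in clinics:
        strata[(site, size)].append(clinic_id)

    assignment = {}
    for key in sorted(strata):
        ids = sorted(strata[key])
        rng.shuffle(ids)                # random order within the stratum
        start = rng.randrange(n_waves)  # random starting wave so remainders spread out
        for i, clinic_id in enumerate(ids):
            # Round-robin: wave counts within a stratum differ by at most one.
            assignment[clinic_id] = (start + i) % n_waves + 1
    return assignment

# Hypothetical clinics: two sites, three size categories, 10 clinics per stratum.
clinics = [(f"{site}-{size}-{i}", site, size)
           for site in ("SiteA", "SiteB")
           for size in ("small", "medium", "large")
           for i in range(10)]
waves = assign_waves(clinics, n_waves=5)
per_stratum = Counter((site, size, waves[cid]) for cid, site, size in clinics)
print(sorted(set(per_stratum.values())))  # [2]: every wave gets 2 clinics per stratum
```

Because each stratum contributes equally to every wave, clinics of each size spend comparable time under control and intervention.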

LIRE literature search strategy for Intervention Text
In the original project application, we assumed 128 clinics and 1,898 PCPs would participate in the LIRE project. After input from the Collaboratory Biostatistics Core, we excluded all clinics with a single PCP (n=18) from the primary study and statistical analysis; only clinics with 2 or more PCPs will be included.
Primary Outcome: We have devoted substantial effort towards developing and refining the primary outcome measure: a summary back-specific relative value unit (RVU). The back-specific RVU is a composite measure of spine intervention intensity that combines the overall intensity of resource utilization for back pain care into a single metric.
To develop the composite RVU measure, we used data from our large cohort of patients with back pain who comprise the Back pain Outcomes using Longitudinal Data (BOLD) Project, an Agency for Healthcare Research and Quality (AHRQ)-funded study. During our work with the BOLD Project, we developed algorithms to abstract electronic medical record (EMR) data across three health systems (two of which overlap with LIRE): Kaiser Northern California, Henry Ford Health System, and Harvard Vanguard/Harvard Pilgrim. For the 5,239 BOLD cohort participants, we obtained extensive EMR data on pharmacy records, healthcare utilization (CPT codes), diagnoses and provider visits (ICD-9 codes), and inpatient hospitalizations.
Using the Medicare Physician Fee Schedule (http://www.cms.gov/) we generated and tested a mapping algorithm to assign more than 10,000 unique CPT codes to RVUs. A sample of RVUs from the 2012 CMS file is shown in Table 2. Using the BOLD cohort EMR data, we developed and tested an algorithm for aggregating individual RVUs across procedures over a time interval for a given patient, as well as across primary care providers or clinics.
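Below is a minimal sketch of the aggregation step. It assumes that events arrive as (date, CPT code) pairs and that the 12-month window starts the day after the index image. `RVU_MAP`, `patient_rvu`, and `mean_rvu_by` are hypothetical names. The CPT codes are real, but the RVU values in the lookup are placeholders, not values from the CMS file or Table 2.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean

# Hypothetical CPT -> RVU lookup. RVU values are made-up placeholders,
# not figures from the CMS Physician Fee Schedule.
RVU_MAP = {
    "72148": 10.0,  # MRI, lumbar spine, without contrast
    "72100": 1.0,   # radiograph, lumbosacral spine, 2 or 3 views
    "99213": 2.0,   # established-patient office visit
    "97110": 0.8,   # therapeutic exercise
}

def patient_rvu(events, index_date, window_days=365, rvu_map=RVU_MAP):
    """Sum RVUs of CPT-coded events dated after index_date, up to window_days later."""
    end = index_date + timedelta(days=window_days)
    return sum(rvu_map.get(cpt, 0.0)  # unmapped codes contribute 0
               for event_date, cpt in events
               if index_date < event_date <= end)

def mean_rvu_by(patients, level):
    """Average patient-level RVU totals within each unit named by `level` (provider or clinic)."""
    groups = defaultdict(list)
    for p in patients:
        groups[p[level]].append(patient_rvu(p["events"], p["index_date"]))
    return {unit: mean(vals) for unit, vals in groups.items()}

patients = [
    {"clinic": "A", "provider": "A1", "index_date": date(2013, 1, 10),
     "events": [(date(2013, 2, 1), "99213"), (date(2013, 3, 1), "72148"),
                (date(2014, 6, 1), "99213")]},  # last event is outside 12 months
    {"clinic": "A", "provider": "A2", "index_date": date(2013, 5, 1),
     "events": [(date(2013, 5, 1), "72100"),    # the index image itself: excluded
                (date(2013, 6, 15), "97110")]},
    {"clinic": "B", "provider": "B1", "index_date": date(2013, 2, 1),
     "events": [(date(2013, 2, 20), "99213"), (date(2013, 2, 20), "00000")]},
]
print(mean_rvu_by(patients, "clinic"))
print(mean_rvu_by(patients, "provider"))
```

The per-patient totals are a simple sum. Whether clinic summaries should weight patients equally, as done here, is a separate analytic choice.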
To obtain a spine-related summary RVU from CPT and ICD-9 codes, we used an existing algorithm. We are currently preparing a manuscript describing this development work, as well as a manuscript that directly informs our LIRE UH3 proposal. Using BOLD cohort data, we identified a subset of patients who had an early lumbar image (MRI/CT or plain film) following an office visit for back pain. Our BOLD cohort manuscript (in preparation) compares the one-year cumulative RVU of early-imaged patients with that of carefully matched BOLD cohort controls who did not have an early lumbar image. Preliminary results indicate a substantial downstream increase in healthcare utilization for patients who received an early image compared with propensity score matched controls. Patients who underwent a lumbar MRI or CT had a mean one-year RVU of 150 ± 410, versus 120 ± 450 for those who had an early plain film, versus 43 ± 120 for carefully matched controls. Mapping the relative increases in utilization of nearly 80 and 110 RVUs for the plain film and advanced imaging modalities, respectively, to the example codes shown in Table 2, we see that imaged patients undergo substantially more procedures. Our expectation for the UH3 project is that the insertion of normative prevalence data into lumbar imaging reports will reduce subsequent inappropriate healthcare utilization.
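The approximate increases quoted above are the differences between each imaged group's mean one-year RVU and the matched-control mean of 43 RVUs:

```latex
\Delta_{\text{plain film}} = 120 - 43 = 77 \approx 80 \text{ RVUs}, \qquad
\Delta_{\text{MRI/CT}} = 150 - 43 = 107 \approx 110 \text{ RVUs}
```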

Secondary Outcomes:
In addition to back-specific RVU, important secondary outcomes will be obtained and derived using electronic medical record data pulls and include: an indicator of opioid prescriptions written within 30 and 90 days after the index image (Aim 1b); subsequent cross-sectional re-imaging within 90 days and 12 months (Aim 1c); and medical costs (Aim 1d). In the BOLD project, we developed mapping algorithms based upon the United States Food and Drug Administration National Drug Codes (NDC) 5 that generate an indicator of whether an individual pharmacy record is an opioid analgesic. Similarly, we have enumerated and categorized a listing of CPT codes that indicate cross-sectional lumbar imaging (CT or MRI).
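The window-based indicators for Aims 1b and 1c can be sketched as a single helper. This is an illustration under several assumptions:

- The code sets are placeholders. The opioid identifiers are invented, and only two lumbar CT/MRI CPT codes are listed.
- 12 months is treated as 365 days.
- The index date is excluded from re-imaging.

`event_in_window` is a hypothetical name.

```python
from datetime import date

# Hypothetical code lists. A real analysis would use the NDC-based opioid mapping
# and the full enumerated list of cross-sectional lumbar imaging CPT codes.
OPIOID_NDCS = {"opioid-ndc-1", "opioid-ndc-2"}  # invented placeholder identifiers
LUMBAR_CT_MRI_CPTS = {"72131", "72148"}         # lumbar CT and MRI without contrast (subset)

def event_in_window(events, codes, index_date, first_day, last_day):
    """Return 1 if any event whose code is in `codes` falls first_day..last_day
    days (inclusive) after index_date, else 0."""
    return int(any(code in codes and first_day <= (d - index_date).days <= last_day
                   for d, code in events))

index = date(2013, 3, 1)
pharmacy = [(date(2013, 3, 20), "other-ndc"), (date(2013, 4, 25), "opioid-ndc-1")]
imaging = [(date(2013, 3, 1), "72148"), (date(2013, 9, 1), "72148")]

opioid_30 = event_in_window(pharmacy, OPIOID_NDCS, index, 0, 30)
opioid_90 = event_in_window(pharmacy, OPIOID_NDCS, index, 0, 90)
# Re-imaging windows start at day 1 so the index MRI itself is not counted.
reimage_90 = event_in_window(imaging, LUMBAR_CT_MRI_CPTS, index, 1, 90)
reimage_365 = event_in_window(imaging, LUMBAR_CT_MRI_CPTS, index, 1, 365)
print(opioid_30, opioid_90, reimage_90, reimage_365)  # 0 1 0 1
```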

General Analytic Strategy:
To evaluate the effectiveness of inserting epidemiologic evidence into an imaging report, we will use longitudinal regression methods such as linear mixed effects models (LMMs) or generalized linear mixed models (GLMMs) for all primary and secondary outcome measures. Mixed models provide an efficient method for analysis of longitudinal or multilevel data and will be the basis of our primary analysis approach. However, model-based standard errors from LMMs or GLMMs are valid only when the covariance structure is correctly specified, so we will use robust standard errors for our primary analysis. In effect, we adopt a "working" correlation structure through the specification of flexible multilevel models (LMM or GLMM) but rely for inference on robust standard errors, clustered on the clinic, that do not require the covariance model to be correct. Secondary analyses will use generalized estimating equations (GEE) with a simple exchangeable working correlation at the clinic level to determine whether conclusions appear sensitive to model specification.
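To illustrate clinic-clustered robust (sandwich) standard errors, the sketch below fits ordinary least squares and sums score contributions by clinic. For a Gaussian outcome, OLS is equivalent to GEE with an independence working correlation.

This is a simplification. It fits neither the LMM nor the exchangeable GEE described above; libraries such as statsmodels provide GEE with an exchangeable structure. The simulated data, five crossover waves, and effect sizes are all assumptions.

```python
import numpy as np

def cluster_robust_ols(X, y, clusters):
    """OLS estimates with a CR0 sandwich covariance clustered on `clusters` (clinic IDs)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    clusters = np.asarray(clusters)
    bread = np.linalg.inv(X.T @ X)
    beta = bread @ (X.T @ y)
    resid = y - X @ beta
    meat = np.zeros((X.shape[1], X.shape[1]))
    for c in np.unique(clusters):
        rows = clusters == c
        score = X[rows].T @ resid[rows]  # summed score contribution of one clinic
        meat += np.outer(score, score)
    cov = bread @ meat @ bread
    return beta, np.sqrt(np.diag(cov))

# Hypothetical stepped-wedge data: 110 clinics, periods 0..5, crossover at periods 1..5.
rng = np.random.default_rng(2013)
n_clinics, n_periods, n_per_cell = 110, 6, 15
clinic = np.repeat(np.arange(n_clinics), n_periods * n_per_cell)
period = np.tile(np.repeat(np.arange(n_periods), n_per_cell), n_clinics)
crossover = 1 + np.arange(n_clinics) % 5
status = (period >= crossover[clinic]).astype(float)
clinic_effect = rng.normal(0.0, 20.0, n_clinics)[clinic]  # clinic random intercepts
y = 60.0 + 2.0 * period - 10.0 * status + clinic_effect + rng.normal(0.0, 40.0, clinic.size)

X = np.column_stack([np.ones_like(y), period, status])
beta, se = cluster_robust_ols(X, y, clinic)
print(f"intervention effect {beta[2]:.2f} (robust SE {se[2]:.2f})")
```

The point estimate stays unbiased because status is independent of the clinic intercepts by design. Clustering the scores by clinic accounts for the within-clinic correlation that OLS ignores.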
In each analysis we will also consider a 'washout period' covering the three months before the intervention is activated at a clinic, as determined by the randomization schedule. The rationale for a washout period is to reduce or eliminate within-provider cross-contamination of patient outcomes and utilization in the transition period between control and intervention. Including a washout period reduces the risk that a patient initially treated in the control period returns to their primary care provider for subsequent care after that provider has been exposed to the intervention through other patients. This reduces the potential bias in the estimated intervention effect due to within-provider cross-contamination of outcomes.
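Below is a minimal sketch of how index images might be classified relative to a clinic's activation date. It assumes the three-month washout is approximated as 90 days and that washout images are flagged for exclusion rather than reassigned. `exposure_period` is a hypothetical helper.

```python
from datetime import date, timedelta

def exposure_period(index_date, activation_date, washout_days=90):
    """Classify an index image relative to its clinic's intervention activation date.

    Images in the `washout_days` before activation are flagged 'washout' so that
    they can be excluded (three months approximated here as 90 days).
    """
    if index_date >= activation_date:
        return "intervention"
    if index_date >= activation_date - timedelta(days=washout_days):
        return "washout"
    return "control"

activation = date(2014, 4, 1)
for d in (date(2013, 12, 31), date(2014, 1, 1), date(2014, 3, 31), date(2014, 4, 1)):
    print(d, exposure_period(d, activation))
# 2013-12-31 control
# 2014-01-01 washout
# 2014-03-31 washout
# 2014-04-01 intervention
```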
Primary Analysis: The primary longitudinal model for back pain-specific RVUs will use a time-varying intervention status indicator $Status_{kt}$ (0 = control, 1 = intervention, for clinic $k$ at time $t$). Use of the time-dependent intervention status indicator permits both within-clinic contrasts that inform intervention effects (post- versus pre-intervention) and contrasts across clinics with different intervention statuses within each time period.

The regression model will adopt the functional form given below. It includes fixed effects for:

- time (linear);
- age (18-39, 40-59, 60+, using two dummy variables);
- imaging modality type (plain film, CT, MRI, using two dummy variables);
- clinic size (small, medium, large, using two dummy variables);
- site (Group Health Cooperative, Henry Ford, Kaiser Permanente, Mayo Clinic, using three dummy variables).

It also includes random effects for provider, clinic, and intervention status:

$$Y_{ijk} = \underbrace{\beta_0 + \beta_1 Time_t + \beta_2^T Age_{ijk} + \beta_3^T Modality_{ijk} + \beta_4^T Size_k + \beta_5^T Site_k + \lambda_0 Status_{kt}}_{\text{mean model}} + \underbrace{b_{k,0} + b_{k,1} Status_{kt}}_{\text{clinic random effects}} + \underbrace{\alpha_{jk,0}}_{\text{provider random effects}} + \epsilon_{ijk}$$

We will collect the outcome measure $Y_{ijk}$ on patient $i$ ($i = 1, 2, \ldots, n_j$) under primary care provider $j$ ($j = 1, 2, \ldots, n_k$) enrolled in time period $t$ ($t = 0, 1, 2, \ldots, 5$) in order to evaluate the overall effect of the intervention at the level of clinic $k$ ($k = 1, 2, \ldots, 110$). Note that we will collect a single outcome measure for each subject: the total utilization (RVU) over the 12 months after the index imaging event.

Because the true random effects structure may contain additional elements (see below), we will use a robust standard error to test the null hypothesis that $\lambda_0 = 0$. For example, in SAS PROC MIXED the "empirical" option yields robust standard errors. Alternatively, a jackknife at the clinic level (e.g., if using R and lmer) provides a robust standard error estimate that is simple to compute.
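The clinic-level jackknife mentioned above can be sketched as a delete-one-clinic loop. For brevity, the refit uses OLS on time and status rather than an LMM fitted with lmer. The simulated data, five crossover waves, and effect sizes are hypothetical.

```python
import numpy as np

def status_coefficient(X, y):
    """OLS coefficient on the last column of X (the intervention indicator)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[-1]

def clinic_jackknife_se(X, y, clinics):
    """Delete-one-clinic jackknife standard error of the status coefficient."""
    ids = np.unique(clinics)
    n = ids.size
    estimates = np.array([status_coefficient(X[clinics != c], y[clinics != c])
                          for c in ids])
    return np.sqrt((n - 1) / n * np.sum((estimates - estimates.mean()) ** 2))

# Hypothetical data: 110 clinics, periods t = 0..5, clinics crossing over at periods 1..5.
rng = np.random.default_rng(42)
n_clinics, n_periods, n_per_cell = 110, 6, 10
clinic = np.repeat(np.arange(n_clinics), n_periods * n_per_cell)
period = np.tile(np.repeat(np.arange(n_periods), n_per_cell), n_clinics)
status = (period >= 1 + clinic % 5).astype(float)
y = (60.0 + 2.0 * period - 10.0 * status
     + rng.normal(0.0, 20.0, n_clinics)[clinic]  # clinic random intercepts
     + rng.normal(0.0, 40.0, clinic.size))       # patient-level noise
X = np.column_stack([np.ones_like(y), period, status])

estimate = status_coefficient(X, y)
se = clinic_jackknife_se(X, y, clinic)
print(f"lambda_0 estimate {estimate:.2f}, jackknife SE {se:.2f}")
```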

Key Model Parameters: The primary parameter of interest is $\lambda_0$, which represents the average effect of the intervention adjusting for temporal trends ($Time_t$), clinic characteristics ($Site_k$, $Size_k$), and individual covariates ($Age_{ijk}$, $Modality_{ijk}$). To interpret the random effects structure, we focus on clinic-level means with covariate effects removed:

- the adjusted mean at clinic $k$ for times prior to intervention is $\beta_0 + b_{k,0}$;
- the adjusted mean at clinic $k$ for times after the start of the intervention is $\beta_0 + b_{k,0} + \lambda_0 + b_{k,1}$.

For clinic-specific means we average over both providers ($\alpha_{jk,0}$) and patients ($\epsilon_{ijk}$). Using this representation, $\beta_0$ is the pre-intervention adjusted overall mean outcome averaged across all clinics. $b_{k,0}$ is the difference between the pre-intervention (baseline) mean for clinic $k$ and that adjusted overall mean. The variance $\mathrm{var}(b_{k,0})$ measures the variation in the baseline mean outcome across clinics.

The change in the adjusted mean outcome for clinic $k$ is

$$(\text{post-intervention adjusted mean}) - (\text{pre-intervention adjusted mean}) = (\beta_0 + b_{k,0} + \lambda_0 + b_{k,1}) - (\beta_0 + b_{k,0}) = \lambda_0 + b_{k,1}.$$

Here $\lambda_0$ represents the average intervention effect across all clinics, and $b_{k,1}$ represents the difference between the intervention effect for clinic $k$ and that average. The variance $\mathrm{var}(b_{k,1})$ measures the variation across clinics in the change associated with the intervention, that is, the heterogeneity of the intervention effect.
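A small simulation can make this decomposition concrete. The parameter values below are hypothetical. Under them, each clinic's adjusted change equals $\lambda_0 + b_{k,1}$ exactly, because $b_{k,0}$ cancels. As a result, the average change across clinics tracks $\lambda_0$ and the spread of the changes tracks $\mathrm{var}(b_{k,1})$.

```python
import random
import statistics

def adjusted_clinic_means(beta0, lam0, b_k0, b_k1):
    """Covariate-adjusted pre- and post-intervention means for clinic k."""
    pre = beta0 + b_k0
    post = beta0 + b_k0 + lam0 + b_k1
    return pre, post

# Hypothetical parameter values chosen only for illustration.
beta0, lam0 = 45.0, -8.0
sd_b0, sd_b1 = 20.0, 5.0  # square roots of var(b_k0) and var(b_k1)
rng = random.Random(7)

changes, b1_draws = [], []
for _ in range(110):  # one draw per clinic
    b_k0 = rng.gauss(0.0, sd_b0)
    b_k1 = rng.gauss(0.0, sd_b1)
    pre, post = adjusted_clinic_means(beta0, lam0, b_k0, b_k1)
    changes.append(post - pre)  # equals lam0 + b_k1; b_k0 cancels
    b1_draws.append(b_k1)

print(f"mean change {statistics.mean(changes):.2f} (lambda_0 = {lam0})")
print(f"variance of changes {statistics.variance(changes):.1f} (var(b_k1) = {sd_b1 ** 2})")
```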
Our primary regression model acknowledges the fundamental multilevel structure of individual-level data collected in health care systems, with patients nested within providers and providers nested within clinics. The basic intervention contrast is the pre-post change associated with the initiation of intervention for each clinic. Even so, we do not propose using clinic-level summary measures for inference, because weighting patients and providers is not simple when cluster sizes are heterogeneous (e.g., PCPs per clinic and patients per PCP). A proper multilevel model allows for optimal weighting based on the estimated variance components (e.g., Gauss-Markov). It yields both an efficient summary of the overall intervention effect and an estimate of the variability in the magnitude of effect across clinics. However, we will not rely on the covariance model being correct for statistical inference and will use a robust (empirical) standard error. With more than 100 clusters (clinics), we expect valid inference and proper test size. We therefore do not anticipate needing a correction such as the jackknife 7 (recommended when the number of clusters is small).
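The weighting issue can be seen in the simplest two-level case: patients within clinics, ignoring the provider level (a simplifying assumption). Under a random-intercept model, a clinic mean based on $n_k$ patients has variance $\sigma_b^2 + \sigma_e^2 / n_k$, where $\sigma_b^2$ and $\sigma_e^2$ are illustrative labels for the between-clinic and within-clinic variances. The Gauss-Markov (GLS) weight is the inverse of this variance.

The sketch uses hypothetical clinic sizes and variance components. It shows the weights moving between patient-proportional weighting and equal weighting as the variance components change.

```python
def clinic_weights(sizes, var_between, var_within):
    """Normalized inverse-variance (GLS) weights for clinic means under a two-level
    random-intercept model, where var(mean_k) = var_between + var_within / n_k."""
    raw = [1.0 / (var_between + var_within / n) for n in sizes]
    total = sum(raw)
    return [w / total for w in raw]

sizes = [5, 20, 80]  # hypothetical numbers of patients per clinic
print(clinic_weights(sizes, var_between=0.0, var_within=1.0))  # proportional to n_k
print(clinic_weights(sizes, var_between=1.0, var_within=0.0))  # equal weights
print(clinic_weights(sizes, var_between=0.1, var_within=1.0))  # in between
```

A fitted mixed model performs this weighting implicitly, using the estimated variance components.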
In our analysis we effectively assume that individual patients are nested within a single provider. In practice, however, a patient may change providers during the follow-up year over which the primary outcome is captured, whereas our basic mixed model covariance structure will simply use the assigned primary provider at the time of the index image. We therefore do not rely on model-based standard errors, since the assumed covariance structure may not match the true within-clinic covariance structure. Instead, we will use robust standard errors clustered at the clinic level, which remain valid when the covariance model is not correctly specified. Our inference is therefore valid even if changes in patient provider make the assumed covariance structure incorrect. Furthermore, a key secondary analysis of the primary outcome will use GEE clustered only at the clinic level, for which provider-level linkages are neither used nor needed.

Secondary Analyses of Primary Outcome:
We will conduct additional secondary analyses that evaluate the sensitivity of the multilevel model to the assumed basic random effects structure. The primary model includes multilevel random intercepts and a random effect for the clinic-level intervention. We will expand the random effects structure to also permit random slopes on time for both clinics and providers. Given the relatively short duration of follow-up, with only six (6) total measurement times, we do not expect strong heterogeneity across providers or clinics in cluster-specific temporal trends.

Figure 2 shows an example of hypothetical data series for two clinics (assuming aggregation of providers to a clinic summary). It illustrates both the staggering of the crossover time and the potential to observe clinic-specific intervention effects. The figure also illustrates that separating random effects of time (linear) from random effects of intervention would be difficult, since time and intervention status are correlated given the unidirectional crossover from control to intervention.

In addition, we will use GEE, whose inference is robust to misspecification of the covariance model, and can therefore produce valid point estimates and confidence intervals without relying on correct covariance specification. Details of model choice and comparison of alternative models for longitudinal cluster-level crossover trials are presented in French and Heagerty (2008), and comparison of alternative approaches is recommended.
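The correlation between time and intervention status can be quantified for a hypothetical schedule. Assume 110 clinics split evenly into five waves that cross over at periods 1 through 5, observed over periods 0 to 5. This wave structure is an assumption, not the study's actual schedule. Under it, the Pearson correlation between $Time_t$ and $Status_{kt}$ across clinic-periods is about 0.68.

```python
def stepped_wedge_cells(n_clinics=110, n_periods=6, n_waves=5):
    """(Time_t, Status_kt) for every clinic-period; clinics in wave w cross over at period w."""
    cells = []
    for k in range(n_clinics):
        crossover = 1 + k % n_waves
        for t in range(n_periods):
            cells.append((t, 1 if t >= crossover else 0))
    return cells

def pearson(xs, ys):
    """Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

times, statuses = zip(*stepped_wedge_cells())
r = pearson(times, statuses)
print(round(r, 3))  # 0.683
```

Because status is largely a step function of time, a model with both random time slopes and random intervention effects has limited information to tell them apart.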

Models for time and intervention effect:
Our primary analysis adopts a linear adjustment for calendar time in order to remove any large-scale temporal trends that may bias estimates of intervention effects. However, our basic regression model assumes a common (adjusted) mean for all times after the initiation of intervention. In practice there may be a delay in the impact of intervention, so alternative models will be considered that incorporate a delayed and/or gradual effect of intervention. For example, the basic coding of the time-dependent variable $Status_{kt}$ takes the value 0 pre-intervention and the value 1 post-intervention. A delay in the impact of intervention can be accommodated using alternatives such as: 0 pre-intervention; 0.5 for quarter 1 after intervention; and 1 for all other post-intervention quarters. Such a modified model would allow the full impact of the intervention to require two quarters of exposure. We will conduct secondary analyses to explore alternative models for the accumulation or delay of the intervention effect.
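The two codings of $Status_{kt}$ described above can be written as a small function. Quarters are indexed relative to activation: 1 is the first post-intervention quarter, and 0 or less is pre-intervention. This indexing is an assumption about how periods would be counted.

```python
def status_code(quarter, delayed=False):
    """Intervention status for a clinic-quarter.

    quarter <= 0: pre-intervention; quarter 1: first quarter after activation; etc.
    With delayed=True, the first post-intervention quarter is coded 0.5.
    """
    if quarter <= 0:
        return 0.0
    if delayed and quarter == 1:
        return 0.5
    return 1.0

print([status_code(q) for q in range(-1, 4)])                # [0.0, 0.0, 1.0, 1.0, 1.0]
print([status_code(q, delayed=True) for q in range(-1, 4)])  # [0.0, 0.0, 0.5, 1.0, 1.0]
```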

Secondary Outcome Analysis:
We will also analyze the impact of the intervention on the rate of opioid prescription using generalized linear mixed models (GLMMs). For Aim 1b, let $Y_{ijk} = 1$ if opioids were prescribed within a given timeframe (e.g., 30 days or 90 days) to patient $i$ ($i = 1, 2, \ldots, n_j$) seen by primary care provider $j$ ($j = 1, 2, \ldots, n_k$) within clinic $k$ ($k = 1, 2, \ldots, 110$). Analysis for this outcome will use a logistic mixed model:

$$\mathrm{logit}(p_{ijk}) = \underbrace{\beta_0 + \beta_1 Time_t + \beta_2^T Age_{ijk} + \beta_3^T Modality_{ijk} + \beta_4^T Size_k + \beta_5^T Site_k + \lambda_0 Status_{kt}}_{\text{mean model}} + \underbrace{b_{k,0} + b_{k,1} Status_{kt}}_{\text{clinic random effects}} + \underbrace{\alpha_{jk,0}}_{\text{provider random effects}}$$

where $p_{ijk}$ denotes the probability that $Y_{ijk} = 1$. Our secondary outcome analysis parallels the primary analysis and will be based on a natural multilevel mixed model, with an additional robust secondary analysis provided by GEE. For Aim 1c we will use $Y_{ijk} = 1$ if CT or MR imaging occurs within a specified timeframe (e.g., 90 days or 12 months) after the index imaging event.
Medical costs (Aim 1d): Spine-related costs of care will be estimated using two approaches. First, we will use the spine-related RVU calculated in Aim 1a and estimate clinic-level spine-intervention expenditures using the annual Medicare-determined payment amount per RVU (e.g., CY2013 = $34.023 per RVU) (reference: http://www.cms.gov/Outreach-and-Education/Medicare-Learning-Network-LN/MLNProducts/downloads/medcrephysfeeschedfctsht.pdf). Second, as a proxy for costs of spine care, we will use a standard set of reimbursement amounts, i.e., CMS-based payments. We will estimate clinic-level spine-related aggregate expenditures by applying CPT-based payment amounts to specific spine-intervention events (e.g., imaging, office visits, procedures, other).

We will present monthly and annual means, medians, and ranges of clinic-level cost estimates before and after implementing the epidemiological intervention. We will assess the level of right-skewness in the expenditure estimates and use t-tests to compare arithmetic means of clinic-level expenditures. In the case of considerable skewness, we will test for differences in logarithmically transformed mean clinic-level expenditures (before and after implementing the intervention). We will also describe categories of prescriptions ordered, when available in the electronic medical records for a health system, and estimate costs for prescribed spine-related medications.

Analysis for Aim 2: The hypothesis of Aim 2 is that there will be a differential effect of the intervention according to the imaging modality used. To test this hypothesis, we will analyze patient-level data according to the appropriate LMM or GLMM given above, but including the interactions between the $Modality_{ijk}$ indicators (modeled using two indicator variables coding CT and MR, with plain film as the reference) and $Status_{kt}$.
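Below is a sketch of the first costing approach and the log-scale comparison, using made-up clinic-level RVU totals. The protocol does not specify the form of the t-test. A paired t statistic on log costs is used here as an assumption, since the same clinics are observed before and after the intervention.

```python
import math
import statistics

DOLLARS_PER_RVU = 34.023  # CY2013 Medicare payment per RVU, as quoted in the protocol text

def rvu_to_dollars(rvu):
    """Convert a spine-related RVU total to an approximate Medicare payment in dollars."""
    return rvu * DOLLARS_PER_RVU

def paired_log_t(pre, post):
    """Paired t statistic for log(pre) - log(post) across the same clinics."""
    diffs = [math.log(a) - math.log(b) for a, b in zip(pre, post)]
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))

# Hypothetical clinic-level mean annual RVUs before and after the intervention.
pre_rvu = [30.0, 35.0, 26.0, 150.0]
post_rvu = [24.0, 28.0, 21.0, 118.0]
pre_cost = [rvu_to_dollars(r) for r in pre_rvu]
post_cost = [rvu_to_dollars(r) for r in post_rvu]
t_stat = paired_log_t(pre_cost, post_cost)
print(f"pre costs {[round(c, 2) for c in pre_cost]}, paired log-scale t = {t_stat:.1f}")
```

Because log(c × a) - log(c × b) = log(a) - log(b), the log-scale comparison is unchanged by the constant conversion factor. The dollar conversion matters for the arithmetic-mean summaries rather than for the log-scale test.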
A test of the interaction terms (a 2-degree-of-freedom Wald test) will be used to test the null hypothesis that the effect of the intervention does not vary according to the imaging modality.
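For two interaction coefficients, the Wald statistic is $W = \hat\theta^T \hat V^{-1} \hat\theta$. It is compared against a chi-square distribution with 2 degrees of freedom, whose survival function is $\exp(-W/2)$.

The estimates and covariance below are hypothetical. In practice they would come from the fitted model's (robust) covariance matrix. `wald_2df` is an illustrative helper.

```python
import math

def wald_2df(theta, cov):
    """Wald statistic W = theta' V^{-1} theta for two coefficients, with its chi-square(2) p-value.

    theta: (CT x Status, MR x Status) interaction estimates.
    cov:   their 2x2 covariance matrix as ((v11, v12), (v21, v22)).
    """
    (v11, v12), (v21, v22) = cov
    det = v11 * v22 - v12 * v21
    i11, i12, i21, i22 = v22 / det, -v12 / det, -v21 / det, v11 / det  # entries of V^{-1}
    t1, t2 = theta
    w = t1 * (i11 * t1 + i12 * t2) + t2 * (i21 * t1 + i22 * t2)
    p = math.exp(-w / 2.0)  # chi-square survival function with 2 df
    return w, p

# Hypothetical interaction estimates and covariance, for illustration only.
w, p = wald_2df(theta=(-6.0, -9.0), cov=((16.0, 4.0), (4.0, 25.0)))
print(f"W = {w:.2f}, p = {p:.3f}")  # W = 4.59, p = 0.101
```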
Analysis for Aim 3: The hypothesis of Aim 3 is that there will be a differential effect of the intervention according to the findings reported in the imaging report. We will use an additional variable, $ImageFinding_{ijk}$, that takes the value 1 if a significant imaging finding is present and 0 otherwise (see the protocol for details of the variable specification). We will use a Wald test of the null hypothesis that the interaction between $ImageFinding_{ijk}$ and $Status_{kt}$ is zero.
Power Calculations: Our UH2 efforts with respect to sample size and statistical power focused on two key items. First, an important aim of our UH2 Working Group 2 was to obtain an accurate clinic and provider count for each health system. We will now randomize n=110 clinics (1,824 PCPs), which is slightly lower than the n=128 clinics (1,898 PCPs) assumed in our initial project application. However, the majority of the clinics that were dropped were those with only one PCP and therefore would not have contributed much information to the analysis. Second, in our Working Group 3 we sought to develop and characterize a composite RVU summary to be used as the primary outcome measure in this study. In our UH2 project application we discussed statistical power in the context of an important secondary outcome measure, a reduction in subsequent opioid prescription rates. We now present statistical power for the primary outcome measure using data from the BOLD Registry to inform key design parameter estimates.
To our knowledge, no off-the-shelf calculators exist that would adequately characterize statistical power for a stepped wedge cluster randomized trial with a varying number of sampling units between clusters. We therefore used simulation methods to generate and analyze data that closely mimic the design characteristics we anticipate for this study. With a simulation approach, we were able to include estimates of both patient- and clinic-level variability and to implement the proposed primary analysis methods: random intercept linear mixed effects models for RVU outcomes, and generalized linear mixed models for opioid prescription rates. All simulations