Comparative Effectiveness of Microdecompression Alone vs Decompression Plus Instrumented Fusion in Lumbar Degenerative Spondylolisthesis

This comparative effectiveness study evaluates whether the effectiveness of microdecompression alone is noninferior to decompression with instrumented fusion in a real-world setting among patients with lumbar degenerative spondylolisthesis.


Introduction
Degenerative spondylolisthesis is a forward slip of one vertebra relative to the vertebra below, occurring in a spondylotic and narrowed spinal segment (ie, lumbar spinal stenosis). 1 Typical symptoms are low back pain and radiating pain into the buttocks and the legs, especially when standing and walking. The standard surgical procedure has been to decompress the stenosis. 2 In the early 1990s, 2 landmark studies 3,4 recommended additional fusion surgery. In the following decades, the rate and complexity of fusion procedures increased dramatically. 5,6 The fusion rate in the United States more than doubled from 2005 to 2014, and degenerative spondylolisthesis accounted for most fusions. 7 In 2015, the hospital costs for elective lumbar degenerative fusions exceeded $10 billion, the highest aggregate costs of any surgical procedure in the United States. 8 Adding instrumented fusion to decompression has been supported by 1 randomized clinical trial (RCT) 9 and clinical guidelines and reviews. [10][11][12][13] Another RCT, 14 registry studies, 15,16 and systematic reviews 17,18 have recommended decompression alone.
The role of fusion surgery is controversial, [19][20][21] and a large surgical practice variation between hospitals is reported. In 2011 to 2013, approximately 50% of patients with degenerative spondylolisthesis in Norway and Sweden underwent fusion procedures 22 compared with 90% to 95% in other countries. 6,7,22,23 Differences in industry encouragement and lucrative financial reimbursement might explain some of the differences in practice. 24,25 Only a few small-sample studies [26][27][28][29] have evaluated the performance of less invasive methods of decompression alone, preserving potentially stabilizing structures of the spine. In this study from the Norwegian Registry for Spine Surgery (NORSpine), we hypothesized that in real-world daily clinical practice, microdecompression alone works as well as decompression with instrumented fusion.

Study Setting, Data Collection, and Patient Selection
The reporting and interpretation of this comparative effectiveness study followed the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) 30 recommendations and the International Society for Pharmacoeconomics and Outcomes Research (ISPOR) reporting guideline for cohort studies. 31 Relative effectiveness was studied using prospectively collected data from NORSpine, a national comprehensive registry for quality control and research. According to NORSpine's annual report for 2015, the coverage rate for lumbar spine surgery was 93% at the hospital level and 63% at the individual level. The registry receives no funding from industry. At hospital admission (baseline), the patients completed questionnaires, which included patient-reported outcome measures and questions about demographics and lifestyle. The surgeons recorded surgical parameters such as diagnosis, treatment, and occurrence of complications. At the 3- and 12-month follow-up, NORSpine's central unit sent questionnaires by mail to the patients without the involvement of the surgical units.
Written informed consent was obtained from the participants preoperatively, and the Norwegian Committee for Medical and Health Research Ethics Central approved the study.
A total of 1376 patients undergoing surgical procedures for lumbar spinal stenosis with degenerative spondylolisthesis from September 19, 2007, to December 21, 2015, at 35 orthopedic and neurosurgical departments were screened for eligibility. Patients were excluded if they had undergone a previous procedure at the index level(s), a procedure in more than 2 levels, or a procedure with an interspinous device or with an anterior approach. Patients were included regardless of missing or incomplete follow-up data.
The primary and secondary outcomes, the criterion for noninferiority, and the statistical methods were defined before statistical analysis. 32 Data were analyzed from March 20 to October 30, 2018.

Treatment Groups
Patients who underwent microdecompression alone had preservation of the midline (ie, the spinous process and the interspinous ligaments), and one of the following techniques was used: (1) unilateral laminotomy, (2) bilateral laminotomy, or (3) unilateral laminotomy and crossover decompression.
Magnifying devices (microscopes or loupes) were used. Patients who underwent instrumented fusion had a decompression with or without preservation of the midline structures and with or without visual enhancement and additional posterior pedicle screw instrumentation with or without an intervertebral cage.

Outcome Measures
The Oswestry Disability Index (ODI), version 2.0 33,34 is a pain-related disability score of 10 items ranging from 0 (no impairment) to 100 (maximum impairment). The primary outcome was a reduction from baseline of 30% or greater at the 12-month follow-up (ie, a clinically important improvement). 35,36 A patient achieving this amount of improvement was defined as a responder.
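The responder definition above can be illustrated with a minimal Python sketch. It assumes that the 30% reduction is measured relative to each patient's own baseline ODI score; the function names are hypothetical and are not taken from the registry's software.

```python
def odi_percent_reduction(baseline: float, follow_up: float) -> float:
    """Percentage reduction in ODI from baseline (positive = improvement)."""
    if baseline <= 0:
        raise ValueError("baseline ODI must be greater than 0")
    return 100.0 * (baseline - follow_up) / baseline


def is_responder(baseline: float, follow_up: float, threshold: float = 30.0) -> bool:
    """True if the ODI reduction from baseline is at least `threshold` percent."""
    return odi_percent_reduction(baseline, follow_up) >= threshold
```

For example, a patient whose ODI falls from 40 at baseline to 28 at 12 months has a 30% reduction and would count as a responder under this reading, whereas a fall from 40 to 30 (25%) would not.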
Secondary outcome measures included the following:
1. The mean change scores and the mean 12-month follow-up scores for the ODI and the Numeric Rating Scale (NRS), which ranges from 0 (no pain) to 10 (worst pain imaginable) for leg pain and for back pain experienced in the last week;
2. The Global Perceived Effect instrument 37 with 7 response alternatives, including completely recovered, much improved, slightly improved, unchanged, slightly worse, much worse, and worse than ever, that were trichotomized into substantially improved (completely recovered and much improved), little or no change (slightly improved, unchanged, and slightly worse), and substantially deteriorated (much worse and worse than ever);
3. Duration of surgery and hospital stay;
4. The rate of perioperative complications and adverse events registered on the surgeon form; and
5. The rate of complications and adverse events reported by the patients at the 3-month follow-up.
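The trichotomization of the Global Perceived Effect responses described in item 2 can be written as a simple lookup. The sketch below is illustrative only; the dictionary and function names are hypothetical, and the response labels are taken as worded in the text.

```python
# Mapping of the 7 Global Perceived Effect responses to the 3 analysis categories.
GPE_TRICHOTOMY = {
    "completely recovered": "substantially improved",
    "much improved": "substantially improved",
    "slightly improved": "little or no change",
    "unchanged": "little or no change",
    "slightly worse": "little or no change",
    "much worse": "substantially deteriorated",
    "worse than ever": "substantially deteriorated",
}


def trichotomize_gpe(response: str) -> str:
    """Return the 3-level category for a 7-level response (case-insensitive)."""
    return GPE_TRICHOTOMY[response.strip().lower()]
```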

Statistical Analysis
To make the distribution of observed baseline patient characteristics in the 2 treatment groups as similar as possible, propensity score matching was performed. 38 A propensity score, derived from a logistic regression model, is defined as a patient's baseline probability for receiving decompression plus instrumented fusion, conditional on prespecified plausible confounders (age, sex, American Society of Anesthesiologists grade, body mass index, smoking, ODI, NRS leg pain score, NRS back pain score, the European Quality of Life-5 Dimensions questionnaire, the presence of foraminal stenosis, degenerative disc disease, predominating back pain, number of levels undergoing surgery, and neurological palsy). We used the technique of 1:1 matching without replacement, forming paired cases of microdecompression alone and decompression plus fusion, which had a difference in propensity scores of less than 0.2 standard deviations of the logit of the propensity score. 38 The null hypothesis was that the proportion of patients with a clinically important improvement (the responder rate) was at least 15 percentage points lower in the microdecompression group than in the fusion group. The null hypothesis was tested by forming a 2-sided 95% CI for the between-group difference in responder rate and would be rejected if the lower bound of the CI was greater than −15%. A noninferiority margin of −15% was assumed to reflect the advantage of performing microdecompression alone instead of the more extensive and expensive instrumentation. 39,40 This margin is consistent with analysis in other relevant studies. [39][40][41] The margin corresponds to a number needed to treat of 7 patients (100/15 [6.67]), 42 that is, if 7 patients or more need instrumented fusion to achieve 1 additional responder, the cheaper, safer, and less comprehensive method of microdecompression alone should be considered good enough (ie, noninferior).
Level and change in the ODI and NRS leg and back pain scores were estimated by multisample latent growth curve (LGC) models, with full information maximum likelihood 43 under the assumption of missing at random. Owing to nonlinearity, the models were specified as a latent difference score model, including changes from baseline to 3 months, from 3 to 12 months, and the 12-month follow-up (intercept level). The proportion of each type of procedure varied between departments.
Patient data were nested within hospital departments and could show clustering effects within units.
However, multilevel analyses showed low intraclass correlation values for the ODI of 0.023 (baseline) to 0.042 (12 months), for NRS leg pain of 0.026 (baseline) to 0.067 (3 months), and for NRS back pain of 0.013 (3 months) to 0.066 (12 months). The estimated design effect, taking cluster size and intraclass correlation into account, showed the highest value to be 2.00 (leg pain at 3 months), which is borderline for nonignorable clustering. 44 However, multilevel models including random slope variance at the hospital department level showed no department differences in change scores in the 2 intervals.
The intercept variance was found to be statistically significant for NRS leg pain (σ² = 0.49; P = .03) but not for the ODI (σ² = 13.52; P = .13) or NRS back pain (σ² = 0.12; P = .10). Due to this level of clustering and the focus on observations within patients, single-level LGC models were estimated with robust standard errors corrected for clustering with the robust maximum likelihood estimator. 44 For secondary outcomes, comparisons of treatment groups and corresponding estimates of P values and 95% CI were based on 2-sided tests for superiority. SPSS, version 24 (IBM Corporation) was used for descriptive statistics, analyses of continuous variables with 2-sided t tests or Mann-Whitney tests depending on the distribution of data, analyses of binary variables with Fisher mid-P value and Newcombe hybrid score CIs, 45 and propensity score matching. The LGC analysis was performed with Mplus 8 (Muthén & Muthén). 46 P < .05 indicated statistical significance.
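As an illustration of how a design effect combines cluster size and intraclass correlation, the sketch below uses the commonly cited formula 1 + (m − 1) × ICC, where m is the average cluster size. The text does not state which formula or cluster size was used, so both the formula choice and the cluster size of 16 in the example are assumptions made only for illustration.

```python
def design_effect(icc: float, mean_cluster_size: float) -> float:
    """Design effect as 1 + (m - 1) * ICC for average cluster size m."""
    return 1.0 + (mean_cluster_size - 1.0) * icc


# Hypothetical example: the reported ICC of 0.067 (NRS leg pain at 3 months)
# combined with an assumed average of 16 patients per department.
example = design_effect(0.067, 16)
```

Under these assumed inputs the design effect is close to 2, the level described above as borderline for nonignorable clustering.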

Missing Data
A loss to follow-up of about 20% was anticipated. 15 LGC analysis under the missing data at random assumption was performed for the matched cohort.
Multiple imputation 49 was used with baseline patient characteristics; several clinical, surgical, and radiological parameters; and outcome variables at baseline and follow-up to generate 70 data sets with complete follow-up scores for the ODI and NRS leg pain and back pain. This procedure is recommended if missing data at random may only be partly assumed. 50 Including such covariates may increase the probability for missing data at random and reduce the probability of missingness not at random. 50

Sample Size
For the primary outcome, choosing a 15% noninferiority margin, a type 1 error of 0.05, and power of 0.90 gave a total sample size of 394. An expected 75% follow-up 15
Figure 1 shows the enrollment of participants. Baseline parameters are given in Table 1, and surgical parameters are shown in eTable 1 in the Supplement.
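The total sample size of 394 stated under Sample Size can be approximated with the standard normal-approximation formula for a noninferiority comparison of 2 proportions. The sketch below is not the authors' calculation: it assumes an expected responder rate of 70% in both groups (a value not given in the text) and treats the type 1 error of 0.05 as 2-sided. The function and parameter names are hypothetical.

```python
import math
from statistics import NormalDist


def noninferiority_total_n(p_expected, margin, alpha_two_sided=0.05, power=0.90):
    """Total sample size (2 equal groups) for a noninferiority test of
    2 proportions, assuming the same true responder rate in both groups."""
    z_alpha = NormalDist().inv_cdf(1 - alpha_two_sided / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = 2 * p_expected * (1 - p_expected)
    n_per_group = math.ceil((z_alpha + z_beta) ** 2 * variance / margin ** 2)
    return 2 * n_per_group


# Assumed 70% responder rate in both groups; 15-percentage-point margin.
total_n = noninferiority_total_n(0.70, 0.15)
```

Under these assumptions the formula gives 197 patients per group, that is, 394 in total, which agrees with the figure reported in the text.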

Unmatched Cohort
Follow-up scores of the ODI and NRS leg pain and back pain are shown in eTable 2 in the Supplement.

Primary Outcome
The proportion of patients with a clinically important improvement in the ODI at the 12-month follow-up was 150 of 219 (68%) in the microdecompression group and 155 of 215 (72%) in the instrumented fusion group. The lower bound of the 95% CI (−12% to 5%) for the between-group difference of −4% did not cross the −15% limit of noninferiority (Figure 2). An absolute difference of 4% corresponds to a number needed to treat of 25 patients (95% CI, 8 to ∞).
Other registered complications are listed in Table 3.
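The noninferiority comparison for the primary outcome can be reproduced approximately from the reported counts. The sketch below computes a Newcombe hybrid score interval for the difference between 2 proportions, the interval type named in the Statistical Analysis section for binary variables; whether the authors used exactly this computation for the primary outcome is an assumption, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def wilson_interval(successes, n, z):
    """Wilson score interval for a single proportion."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


def newcombe_difference_ci(x1, n1, x2, n2, level=0.95):
    """Difference p1 - p2 with a Newcombe hybrid score interval."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    p1, p2 = x1 / n1, x2 / n2
    l1, u1 = wilson_interval(x1, n1, z)
    l2, u2 = wilson_interval(x2, n2, z)
    diff = p1 - p2
    lower = diff - math.sqrt((p1 - l1) ** 2 + (u2 - p2) ** 2)
    upper = diff + math.sqrt((u1 - p1) ** 2 + (p2 - l2) ** 2)
    return diff, lower, upper


# Responders: 150 of 219 (microdecompression) vs 155 of 215 (fusion).
diff, lower, upper = newcombe_difference_ci(150, 219, 155, 215)
noninferior = lower > -0.15  # lower bound must lie above the -15% margin
```

With the reported counts, this gives a difference of about −4% with an interval of about −12% to 5%, so the lower bound lies above the −15% margin, consistent with the result stated in the text.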

JAMA Network Open | Orthopedics
eTable 4 in the Supplement shows 12-month follow-up results for the LGC models subsequent to multiple imputation of missing data. At 12 months, there were no differences in the ODI and NRS back pain between the groups, whereas NRS leg pain was statistically significantly higher for the microdecompression group than for the instrumented fusion group (mean [SD], 3.5 [3.0] and 2.8

Discussion
Microdecompression alone was noninferior to decompression plus instrumented fusion. The result of the primary outcome was supported by the patients' global perceived effects and by analyses of the mean ODI scores both before and after imputation of missing data. Furthermore, microdecompression alone was associated with considerably shorter duration of surgery and hospital stay and somewhat fewer surgeon-reported perioperative complications. Patients treated with instrumented fusion had slightly less leg and back pain and fewer patient-reported superficial wound infections.
Other unmatched observational studies [26][27][28][29] have found no difference in outcomes between microdecompression alone and decompression plus instrumented fusion. Unlike our study, these studies did not reveal any between-group differences in outcome scores for leg or back pain.
Following 2 simultaneously published RCTs, 9,14 the role of fusion has been debated. [19][20][21] 52,53 In the RCT by Ghogawala et al, 9
Randomized clinical trials are the criterion standard for studying treatment efficacy, but their generalizability has been questioned owing to strictly recruited patients and clinicians and enforced treatment allocation. 54 This study was designed to reflect real-world relative effectiveness between carefully matched groups. The aim was to study patients recruited in daily clinical practice at several different hospitals and the treatments chosen according to surgeon and patient preferences. 54,55 Our study provides knowledge about how treatments work in the more pragmatic delivery of health care. 54,56,57 Based on the present study as well as previous pragmatic studies 14,16 and considering the large clinical practice variation, 6,22,23 the high rate of instrumented fusion seems unreasonable. In accordance with a priori expectations and former studies, 23,58 our findings of shorter operation times and hospital stays indicate that microdecompression alone is associated with acceptable clinical results at lower costs. Although instrumented fusion was associated with somewhat more pain reduction, the high number needed to achieve 1 additional responder and the somewhat higher perioperative complication rate showed disadvantages of instrumentation. Altogether, we consider the noninferior clinical effectiveness and the potential health-economic benefits of microdecompression alone to outweigh the procedure's potential inferiority. However, this study does not provide evidence that microdecompression alone should be the preferred method for all patients. Adding fusion to decompression may still be a good treatment option for subgroups. Owing to a lack of evidence for defining such subgroups, future research should endeavor to identify variables associated with the best treatment for each individual. 39

Limitations
The diagnoses of spinal stenosis and degenerative spondylolisthesis are based on the surgeons' evaluation of radiographs, the radiological descriptions, and clinical signs and symptoms. We have not retrospectively checked whether all established diagnostic criteria 59 were fulfilled. Incorrect diagnoses may therefore have been undetected. Furthermore, data on reoperations and data beyond the 12-month follow-up are lacking. Some studies have found lower reoperation rates when a decompression has been supported by fusion, 9,60 whereas other studies have demonstrated similar 14 and even higher 27 reoperation rates after an additional fusion. For a mixed population undergoing spinal surgical procedures, the clinical outcomes at the 12-month follow-up seem to be the same as the 2-year outcomes 61,62 and stable even at the 5-year follow-up. 26,63,64 Finally, it is important to recognize that, unlike an RCT, this study was not able to detect treatment-related differences in efficacy. Although the propensity score matching equalized the baseline scores regarding the observed parameters, the distribution of unobserved parameters (eg, radiological parameters and potential differences associated with patients' expectations owing to given information) might have been unbalanced. A risk of residual bias does therefore still exist.

Conclusions
This study found that microdecompression alone seems to be not appreciably worse than decompression and instrumented fusion for treatment of degenerative spondylolisthesis. We cautiously suggest the less extensive and less expensive treatment as the primary surgical choice for most patients with lumbar degenerative spondylolisthesis.