For the purposes of coverage policy, a meaningful benefit in human subjects must be demonstrated. This MAC considers a clinically meaningful benefit to be a change in symptoms evident to the patient as he or she goes about daily life. As such, while lumbar artificial discs may preserve range of motion, range of motion preservation does not itself make a treatment reasonable and necessary. Only studies designed to assess clinical outcomes or composite outcomes that included at least 1 clinical measure were considered relevant to this coverage determination. Studies that strictly assessed biomechanics, radiographic changes, or other strictly physiologic effects were not evaluated and are not considered relevant to coverage determination.
A number of artificial discs are available in the United States with different indications, and much of the literature is device-specific, so this summary will categorize the evidence by device. As will be reviewed below, lumbar disc replacement appears to be regarded as an alternative treatment approach to fusion, which in itself is not considered in this LCD.
CHARITÉ® and INMOTION®
The first artificial disc to be FDA approved for use in the lumbar spine was the CHARITÉ® artificial disc. CHARITÉ® was subsequently replaced by the INMOTION® artificial disc, which was approved under the CHARITÉ® Artificial Disc Registration. CHARITÉ® received an FDA premarket approval decision on 10/26/2004 but was officially withdrawn on 1/5/2012 (FDA CHARITÉ® 1). While the CHARITÉ® and INMOTION® artificial disc systems are currently unavailable for routine clinical use in the United States, the clinical research done on this device family is reviewed for its relevance to artificial disc replacement in general.
The premarket approval clinical study for the CHARITÉ® artificial disc in the United States was a non-inferiority randomized study comparing CHARITÉ® to anterior lumbar interbody fusion with a BAK cage in patients with single-level DDD, the results of which are available on the FDA’s website (FDA CHARITÉ® 2). Specifically, the study enrolled subjects between the ages of 18 and 60 years with single-level DDD between L4 and S1 who had no prior spine surgery except prior discectomy, laminotomy, laminectomy, or nucleolysis at the same level and had not improved with 6 months of conservative management. Subjects were required to have a diagnosis of DDD with back and/or leg pain with radiographic findings consistent with disc degeneration but without evidence of nerve root compromise or significant central or lateral recess stenosis. Subjects were also required to have an Oswestry Disability Index (ODI) score of at least 30/50 and a Visual Analog Scale (VAS) score of at least 40 mm. Patients were excluded from enrollment if they had DDD at more than 1 level, greater than 3 mm spondylolisthesis, morbid obesity, compression or burst trauma to L4-S1, pregnancy, active infection, psychosocial disorders, autoimmune disease, recent investigational drug or device use, spinal tumor, isthmic spondylolisthesis, or significant lumbar scoliosis. Clinical effectiveness was established at 24 months post-operatively with a primary composite endpoint consisting of ODI improvement, device failures, major complications, and neurological deterioration. The device sponsor recommended a 25% ODI improvement to meet the threshold of success, and the FDA requested a 15 point improvement in ODI (a more stringent standard).
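The two ODI success definitions can be made concrete with a minimal Python sketch. This is an illustration only, not the trial's analysis code. The `PatientOutcome` record, its field names, and the example patient are hypothetical, and scores are taken to be in whatever point units the trial used. The sketch also assumes, as is typical for composite endpoints of this kind, that overall success requires success on every component.

```python
from dataclasses import dataclass

@dataclass
class PatientOutcome:
    # Hypothetical record; field names are illustrative, not from the FDA report.
    odi_baseline: float        # ODI before surgery
    odi_24_months: float       # ODI at 24 months, same units as baseline
    device_failure: bool
    major_complication: bool
    neuro_deterioration: bool

def odi_success_relative(p, fraction=0.25):
    """Sponsor-proposed rule: ODI improves by at least 25% of baseline."""
    return (p.odi_baseline - p.odi_24_months) >= fraction * p.odi_baseline

def odi_success_absolute(p, points=15.0):
    """FDA-requested rule: ODI improves by at least 15 points."""
    return (p.odi_baseline - p.odi_24_months) >= points

def composite_success(p, odi_rule):
    """Assumed rule: success on every component is required."""
    return (odi_rule(p)
            and not p.device_failure
            and not p.major_complication
            and not p.neuro_deterioration)

# Hypothetical patient: an 11 point improvement from a baseline of 40 (27.5%).
patient = PatientOutcome(40.0, 29.0, False, False, False)
sponsor_result = composite_success(patient, odi_success_relative)
fda_result = composite_success(patient, odi_success_absolute)
print(sponsor_result, fda_result)
```

In this hypothetical case the patient counts as a success under the relative rule but not under the absolute rule, which is the sense in which the 15 point requirement can be the stricter of the two.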
A total of 304 patients were studied, with 99 assigned to receive fusion and 205 assigned to receive CHARITÉ® implantation. Follow-up data at 24 months was collected on 81 of the patients randomized to fusion and 184 of the patients randomized to CHARITÉ® implantation. On the FDA-requested primary endpoints, the overall success was 54% for the fusion group and 58% for the CHARITÉ® group. The group treated with CHARITÉ® had results that met both the manufacturer’s proposed and the FDA’s requested overall outcome requirements for non-inferiority of CHARITÉ® as compared with fusion using a 90% confidence interval. This study was a non-inferiority study and as such was not planned to show a difference in overall success. However, based on numbers provided in the FDA report, there appears to be no statistically significant difference in overall composite success on either the manufacturer’s proposed or the FDA’s requested outcomes (p = 0.334 and p = 0.592 respectively). In looking at the individual elements of the composite endpoint, there were very few differences between the 2 treatment groups in device failure success, major complications success, and neurologic status. With regard to success as measured by ODI improvement, 71% of the CHARITÉ® group and 62% of the fusion group met the manufacturer’s proposed definition of success. For the FDA’s requested definition of success in ODI reduction, 64% of the CHARITÉ® group and 58% of the fusion group achieved successful reductions in ODI score. Though statistical significance testing is not reported for ODI change specifically in the FDA report, a Fisher exact test on this data shows that the difference is not significant for either outcome (p = 0.156 and p = 0.412 respectively).
Results from this study have also been reported in the peer-reviewed literature, in a manuscript that reported only the manufacturer-proposed definition of success and did not report results for the FDA-defined endpoint of ODI success (Blumenthal, 2005). This manuscript reports the data for all of the patients enrolled in the study and appears to assign patients lost to follow-up to the non-response group. It reports a statistically significant difference in outcomes on the composite measure of success (p < 0.0001), but our own analysis of its data using Fisher’s exact test (the reported analytic method of choice for categorical outcomes) shows a non-significant finding (p = 0.087). The differences in mean ODI and VAS scores at 2 years were reported as not statistically significant.
Longer term follow-up data was reported from the initial FDA trial for some patients at 5 years in a later paper (Guyer, 2009). This later report presented follow-up data from 58% of the originally studied cohort of patients, coming from 8 of the 14 initial study sites. Results are reported regarding the number of subjects who achieved the FDA requested composite outcome as well as each individual component of the composite outcome. In the fusion group 51.2% of subjects met the criteria of overall success at 5 years post-operatively compared with 57.8% of those who received CHARITÉ® implantation, which met the threshold for non-inferiority. Significance testing for the difference between groups is not reported, but using a 2-sided Fisher’s exact test and ignoring patients who were not followed up, the difference between groups is not significant (p = 0.576). For ODI improvement in particular, 65% of subjects met the FDA requested threshold in the control group compared with 68% in the CHARITÉ® group, a nonsignificant difference (p = 0.8443). The editors of the journal also placed a comment in the manuscript indicating that the BAK cage used in the fusion control group was a largely abandoned fusion device at the time of study publication.
ProDisc®
ProDisc®-L received an FDA premarket approval decision on 8/14/2006 with indications as follows:
“This device is indicated for spinal arthroplasty in skeletally mature patients with DDD at 1 level from L3-S1. DDD is defined as discogenic back pain with degeneration of the disc confirmed by patient history and radiographic studies. These DDD patients should have no more than Grade I spondylolisthesis at the involved level. Patients receiving the ProDisc®-L Total Disc Replacement should have failed at least 6 months of conservative treatment prior to implantation of the ProDisc® Total Disc Replacement” (FDA ProDisc® 1).
The premarket approval clinical study for ProDisc®-L was a non-inferiority study comparing ProDisc®-L to circumferential fusion in patients with single-level DDD, the results of which are available on the FDA’s website (FDA ProDisc® 2). Specifically, the study enrolled subjects between the ages of 18 and 60 years with single-level DDD between L3 and S1 who had no prior spinal fusions and had not improved with 6 months of conservative management. The diagnosis of DDD was established based on the presence of back and/or leg pain with radiographic findings consistent with disc degeneration. Subjects were also required to have an ODI score of at least 20/50 and to be “psychosocially, mentally and physically able to fully comply.” Patients were excluded from enrollment if they had DDD at more than 1 level, too small vertebral endplates, allergies to 1 of the prosthetic disc materials, compromised vertebral bodies, facet joint disease, lytic spondylolisthesis, greater than grade I spondylolisthesis, spinal stenosis, disease of bone quality, morbid obesity, pregnancy, active infection, use of medications that interfere with bone healing, autoimmune disease, systemic viral illness, or active malignancy. Clinical effectiveness was established with a primary composite endpoint consisting of ODI improvement, no need for reoperation, Short Form (SF)-36 improvement, neurological status, and radiographic success 24 months following surgery.
A total of 242 patients were randomized, with 80 assigned to receive fusion and 162 assigned to receive ProDisc®-L implantation. Data was also reported for an additional 50 subjects who received ProDisc®-L outside of randomization, but this non-randomized group was not considered in the effectiveness comparisons. Follow-up data at 24 months was collected on 71 of the patients randomized to fusion and 149 of the patients randomized to ProDisc®-L implantation. On the FDA-requested primary endpoints, the overall success was 40.8% for the fusion group and 53.4% for the ProDisc®-L group. The group treated with ProDisc®-L had results that met the FDA’s requirements for non-inferiority of ProDisc®-L as compared with fusion. This study was not planned to show a statistically significant difference in overall success, though the FDA summary report indicates that, using a 1-sided Fisher’s exact test, a difference was found between the ProDisc®-L and control groups with p = 0.0438 on the FDA requested definition of success (a 2-sided test value is not reported, but the result would not have been significant with a 2-sided test). The difference between the 2 groups was statistically significant (p = 0.0053) with the applicant-proposed success criteria and would have been significant even with the use of 2-sided testing.
Results of the initial FDA approval study have also been published in a separate manuscript in the peer-reviewed literature (Zigler, 2007).
Some of the subjects included in the initial FDA approval study were followed for 3 more years (a total of 5 years of post-operative follow-up) as part of a post-approval study. Results were published on the FDA website (FDA ProDisc® 1) and in the peer-reviewed literature (Zigler, 2012). Follow-up was 85.1% in the ProDisc®-L group and 74.7% in the fusion group. Composite statistical success was not significantly different between the 2 groups at 5 years (p = 0.7438). Secondary outcomes including ODI, SF-36, and VAS Pain were not statistically significantly different between the 2 groups at 5 years.
In addition to the FDA premarket approval study comparing ProDisc®-L to fusion, a study comparing surgical treatment with the ProDisc® II to conservative management in patients with lower back pain was published by authors with no reported competing interests (Hellum, 2011). Patients were considered for inclusion if they had DDD only at the lower spinal levels and were excluded if they had symptoms of nerve root impingement. The primary outcome was mean change in ODI at 1 and 2 years post-operatively, with a pre-specified difference of 10 points being considered the minimum clinically meaningful difference. A total of 86 patients were randomized to each group. Both groups improved: mean ODI in the rehabilitation group fell from 42.8% at baseline to 33.0% at 1 year and 30.0% at 2 years, and mean ODI in the disc arthroplasty group fell from 41.8% at baseline to 22.3% at 1 year and 21.2% at 2 years. The between-group difference in outcomes was reported as statistically significant (p < 0.001 and p = 0.001 at 1 and 2 years respectively), favoring disc arthroplasty. However, the between-group difference at 2 years was reported as 8.4 points on the ODI, which was below the pre-specified threshold for a clinically important difference of 10 points.
activL® Artificial Disc
activL® received a premarket approval decision on 6/11/2015.
Per FDA website: “This device is indicated for reconstruction of the disc at 1 level (L4-L5 or L5-S1) following single-level discectomy in skeletally mature patients with symptomatic DDD with no more than grade I spondylolisthesis at the involved level. DDD is defined as discogenic back pain with degeneration of the disc confirmed by patient history, physical examination, and radiographic studies. The activL® artificial disc is implanted using an anterior retroperitoneal approach. Patients receiving the activL® artificial disc should have failed at least 6 months of nonoperative treatment prior to implantation of the device” (FDA activL® 1).
The premarket approval clinical study for activL® was a non-inferiority study comparing activL® to a control group who received disc arthroplasty with 1 of the other 2 disc replacement devices available in the United States, CHARITÉ® or ProDisc®-L, in patients with single-level DDD, the results of which are available on the FDA’s website (FDA activL® 2). Specifically, the study enrolled subjects between the ages of 18 and 60 years with single-level DDD between L4 and S1 who had no prior spine surgery except prior microdiscectomy, laminotomy, hemilaminectomy, nucleoplasty, or intradiscal electro-thermal annuloplasty and had not improved with 6 months of conservative management. Subjects were required to have a diagnosis of DDD with back pain with radiographic findings consistent with disc degeneration but without leg pain suggestive of radiculopathy, myelopathy, or significant sagittal stenosis < 8 mm. Subjects were also required to have an ODI score of at least 40/50. An abbreviated list of additional exclusionary criteria is as follows: morbid obesity, active infection, psychosocial disorders, autoimmune disease, recent investigational drug or device use, spinal tumor, isthmic spondylolisthesis, significant lumbar scoliosis, viral infection precluding surgery, insulin-dependent diabetes, prior intra-abdominal inflammation, prior nephrectomy, or history of thromboembolic disease. Clinical effectiveness was established at 24 months post-operatively with a primary composite endpoint consisting of ODI improvement (number who achieved a 15 point improvement), neurological status, radiographic range of motion status, device status, and no serious device related adverse events. A second composite clinical endpoint was also requested by the FDA, which did not include range of motion. It appears that this was not a pre-specified analytic decision. Data were also reported at 12 months, 3 years, and 4 years, in addition to numerous secondary endpoints. 
This study was designed to demonstrate non-inferiority, but if non-inferiority was demonstrated there was a pre-specified plan to test for superiority of activL® over the control group’s treatment.
A total of 376 patients were enrolled and proceeded to surgery, 52 of whom were not randomized, leaving a total of 324 randomized subjects, 218 to activL® and 106 to the control group. Complete follow-up data at 24 months was available for 223 subjects, but in reporting results, subjects for whom follow-up data was unavailable were considered failures. Data was also reported for patients who were not randomized, but this data is not considered in the primary efficacy analysis. On the originally specified composite measure, including range of motion, 42.2% of subjects in the activL® group and 28.3% of subjects in the control group were found to achieve success. activL® was both non-inferior to (p < 0.0001) and superior to (p = 0.02) control. Much of the difference between the 2 groups on the composite outcome appeared to be due to range of motion differences, leading to the FDA’s request for analysis with a composite endpoint not including this component. On the composite measure of overall success excluding range of motion, 61.9% of subjects in the activL® group and 52.8% of subjects in the control group were found to achieve success. The activL® group had non-inferior (p = 0.0004) but not superior (p = 0.1485) outcomes compared with the control group. In each of the individual components of the composite endpoint, only range of motion was significantly different (p = 0.0065). Success on ODI (specified as a 15 point improvement) was achieved in 75.2% of the activL® group and 66.0% of the control group, a non-significant difference (p = 0.0874). While it was not part of the pre-specified primary outcome, and completeness of follow-up data was limited, results are reported for 3 and 4 years of post-operative follow-up for the 2 composite outcome measures (with and without range of motion) treating losses to follow-up as treatment failures. 
Using a 2-sided Fisher exact test, differences between the 2 treatment groups were not significant at either time point for either composite outcome.
The results of the FDA approval study for activL® have also been published in the peer-reviewed literature (Garcia, 2015). We are unaware of additional longer term follow-up data in other subsequently published studies.
Other Studies
A study comparing lumbar total disc replacement, using multiple disc replacement systems, with fusion has also been published, with follow-up at 2 years (Berg, 2009) and at 5 years (Sköld, 2013). In this study 80 patients were randomized to receive 1 of 3 prosthetic discs, and 72 patients were randomized to receive fusion. The 3 prosthetic discs used were the CHARITÉ®, ProDisc®, and Maverick™ artificial discs. The fusion methods used were posterior lumbar interbody fusion or posterolateral fusion. Results were not further stratified by disc or fusion method. The primary outcome measured was global assessment of symptoms rated on a 5 point scale from “worse” to “totally pain free.” However, a specific threshold does not appear to have been selected for defining treatment success in this study. Prior research has used any level of improvement on the global assessment as a threshold for defining a clinically important difference (Hägg, 2003).
At 2 years, 30% of the disc replacement group and 15% of the fusion group were totally pain free, a statistically significant difference (p = 0.031). However, the proportions of patients who achieved other levels of improvement did not differ significantly between groups at 2 years. It appears that 70 of the 80 subjects in the disc replacement group had some improvement, and 62 of the 72 subjects in the fusion group had some improvement. Therefore, if the threshold of any level of improvement is used to define treatment success, then from the data reported, the difference in success was not statistically significant at 2 years post-operatively (p = 0.815). Data from 5 year outcomes shows a statistically significant difference in the proportion of patients who were totally pain free: 38% versus 15% for the disc replacement and fusion groups respectively (p = 0.002). However, if a threshold of any level of improvement is used as a marker of treatment success, then 71 of the 80 disc replacement subjects had success, and 62 of 72 fusion patients had success, a difference that is not statistically significant (p = 0.634).
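The "any improvement" comparisons can be rechecked from the counts given in the text (70 of 80 versus 62 of 72 at 2 years, and 71 of 80 versus 62 of 72 at 5 years). The following is a minimal pure-Python sketch of a 2-sided Fisher exact test, offered for illustration only. The function name is ours, and it is an assumption that this convention (summing the probabilities of all tables no more likely than the observed one) matches the calculation behind the p-values quoted above.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """2-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows are treatment groups; columns are success / no success.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2

    def weight(x):
        # Unnormalized hypergeometric weight of the table whose
        # top-left cell equals x, with all margins held fixed.
        return comb(row1, x) * comb(row2, col1 - x)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    observed = weight(a)
    # Exact integer comparison avoids floating-point ties.
    extreme = sum(weight(x) for x in range(lowest, highest + 1)
                  if weight(x) <= observed)
    return extreme / comb(total, col1)

# Counts taken from the text: improved / not improved in each arm.
p_2yr = fisher_exact_two_sided(70, 10, 62, 10)   # 70 of 80 vs 62 of 72
p_5yr = fisher_exact_two_sided(71, 9, 62, 10)    # 71 of 80 vs 62 of 72
print(round(p_2yr, 3), round(p_5yr, 3))
```

Under these assumptions both values should land close to the p-values quoted in the text, and both are well above the conventional 0.05 threshold.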
A number of meta-analyses of lumbar total disc replacement have also been performed with varying results.
A meta-analysis compared lumbar total disc replacement to fusion for the treatment of lower back pain with 2 and 5 year post-operative results (Yajun, 2010). This meta-analysis included a total of 5 randomized controlled trials. The meta-analysis included only 1- and 2-level disease and included CHARITÉ®, ProDisc®, Maverick™, and FlexiCore® devices. Anterior and posterior fusion approaches were included. This analysis found a statistically significant difference favoring disc replacement over fusion at 2 years. There was a mean difference of 5 points on the VAS and a mean difference of 4 points on the ODI 2 years post-operatively. The study authors described this difference as clinically insignificant. There was no statistically significant difference 5 years post-operatively.
A Cochrane review and meta-analysis examined lumbar total disc replacement for the treatment of lower back pain (Jacobs, 2013). A total of 7 randomized controlled trials had results pooled. The trials included CHARITÉ®, ProDisc®, Maverick™, and FlexiCore® devices in subjects with both single and 2-level disease. Treatments used in the control group included fusion and rehabilitation. Included studies reported results for up to 2 years post-operatively. There was a mean difference of roughly 5 points on the VAS and a mean difference of roughly 4 points on the ODI between disc replacement and control treatment. The authors specified clinically important differences of 15 on the VAS and 10 on the ODI. Using these criteria for clinical importance, the statistically significant findings favoring disc replacement were concluded by the authors to be clinically insignificant.
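The distinction between statistical and clinical significance drawn here reduces to a simple threshold comparison. The short Python sketch below illustrates it using the approximate pooled differences and the author-specified thresholds quoted above. The variable and function names are ours, and the values are the rounded figures from the text rather than the review's exact estimates.

```python
# Thresholds for clinical importance specified by the review authors,
# and the approximate pooled mean differences quoted in the text.
clinically_important = {"VAS": 15.0, "ODI": 10.0}
pooled_difference = {"VAS": 5.0, "ODI": 4.0}

def meets_threshold(measure):
    """True if the pooled difference reaches the clinical-importance threshold."""
    return abs(pooled_difference[measure]) >= clinically_important[measure]

results = {m: meets_threshold(m) for m in clinically_important}
print(results)  # {'VAS': False, 'ODI': False}
```

Neither pooled difference reaches its threshold, which is consistent with the authors' conclusion that the findings, although statistically significant, were not clinically important.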
Another meta-analysis, combining results of 6 randomized controlled trials (Wei, 2013), compared total disc replacement to fusion surgery in the treatment of lower back pain using 2 year post-operative outcomes. The meta-analysis contained studies concerning both single-level and 2-level disease and included CHARITÉ®, ProDisc®, and Maverick™ devices. Fusion techniques included both anterior and posterior approaches. The meta-analysis used random effects models and found results favoring disc replacement on the VAS of pain as well as the ODI. The authors concluded that wider adoption of artificial disc replacement would be appropriate.