Protons 101: Was the MIRAGE MRI Prostate Cancer Trial Positive?
And if so, how does that compare to the proton Esophageal cancer trial?
Take home: The MIRAGE trial has been largely lauded as a great positive trial showing technology can reduce toxicity in prostate cancer. Likely, you’ll get a great ad by ViewRay if you open up the trial publication claiming “NO GI Toxicity” - so it has certainly earned hype. But if I had prostate cancer, for me personally, I don’t believe I would travel for that machine. In contrast, the Esophageal trial comparing protons to photons was largely received as a negative trial, but if I had esophageal cancer - especially if I were strong enough and the cancer was early enough to have surgery - I would travel for protons. This article attempts to explain my apparent non-consensus assessment of those two trials.
First off, both trials are great accomplishments within the field of radiation oncology - kudos to the authors, designers, and staff that created the concepts and carried them over the line. Excellent work and I love that we are validating technology! Sincerely - excellent work to both teams!
MRI MIRAGE TRIAL:
First let’s look at the MIRAGE data (ref 1), looking at a few details beyond the headlines.
Trial Summary:
My simple version: Can MRI technology allow us to have less acute toxicity if we use it to decrease margins from 4mm with CT to 2mm with MRI in the setting of SBRT for prostate cancer? They looked at acute Gr2 GU toxicity with short follow up. Secondary metrics are acute Gr2 GI toxicity, IPSS (patient-reported bladder toxicity) and EPIC-26 (patient-reported bowel toxicity) outcomes. There is no mention of cancer control metrics within the primary analysis of the trial.
Patient Characteristics:
Patient characteristics are pretty balanced - a few more high and very high risk patients in the CT arm and consequently more ADT utilization in the CT arm (I’m a bit surprised to see a 12% difference in that category for a randomized trial, but you can’t infinitely balance arms). I’ve included 2 reference articles showing how that difference in ADT utilization can contribute to a difference in bowel toxicity (see refs). There is also a slight imbalance, with a bit more GI comorbidity in the CT arm (I point these three differences out, as they ultimately are more different than the dosimetry between the two approaches). So a couple of items slightly favor the MRI arm for GI toxicity risk in my view.
Dosimetry differences between the two approaches:
As you see below, I can find no dosimetric reason we should see a difference in bowel toxicity. In fact, from this table, rectal and anal canal doses were higher in the MRI arm than in the CT-guidance arm. Maybe there is some “integral dose” / volume argument - but realistically, if bowel toxicity is different with no difference in dosimetry metrics, we still don’t understand what creates bowel toxicity very well.
This leaves us with very minor differences in bladder dose at high doses - 3.7cc vs. 1.9cc (both are really quite low. If you don’t live in a prostate dominated world, just know the bladder differences are subtle).
So there is a little less bladder treated to high dose and the overall treated volume is smaller with MRI (both make sense - the margins on the PTV were less). Finally, anal canal dose is higher with MRI despite a smaller volume (I’m not clear why - perhaps machine delivery differences but not clear to me - I could not find a comment in the paper). Those are the dosimetric differences.
Outcomes:
Here is the take home slide showing the measured improvement. In simple terms, bowel differences were generally stronger than bladder differences based on p values.
On the GU side, the differences in the trial were primarily in urinary frequency and urinary retention. (There was one Gr3 complication and it happened in the CT arm but to me that’s clearly too few events to make a clear statement). Gr2 toxicity in prostate cancer is almost always starting an Rx - Flomax or something similar (see ref 2 for the precise definitions). I dug looking for the answer in the paper and don’t see catheter listed so I bet nearly all CTCAE differences are differences in prescribing a temporary medication for obstructive symptoms.
At the same time we see that rectal spacer and increasing the volume of target from primary site to pelvic lymph nodes did NOT affect outcomes from what was presented.
Note: they only present the GU analysis. I assume this is due to it being the primary objective, but in both “Key Points” and the “Results” of the abstract, both GI and GU toxicity reductions are emphasized. To me, that emphasis then requires the same MVA (Multivariate Analysis) for GI toxicity - especially with rectal spacer and pelvic lymph node radiation relating more to that aspect of toxicity.
On subset analysis, GU toxicity differences were seen in the smaller prostate group only (<50cc). Mathematically this is the subset where the same 2mm margin reduction removes a larger fraction of the treated volume, so it ends up being a bigger relative advantage. In the larger prostates (>50cc), outcomes were the same.
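The geometry behind that statement can be sketched with a quick calculation. This is only an illustration under loud assumptions: the prostate is treated as a perfect sphere, the margins are applied uniformly in all directions, and the function names and the 30cc / 50cc / 80cc example sizes are mine, not from the trial.

```python
import math

def sphere_radius_mm(volume_cc):
    """Radius (mm) of a sphere with the given volume; 1 cc = 1000 mm^3."""
    return (3 * volume_cc * 1000 / (4 * math.pi)) ** (1 / 3)

def ptv_volume_cc(prostate_cc, margin_mm):
    """Volume (cc) of a spherical prostate expanded uniformly by margin_mm.

    Assumption: spherical gland and isotropic margin, for illustration only.
    """
    radius = sphere_radius_mm(prostate_cc) + margin_mm
    return 4 / 3 * math.pi * radius ** 3 / 1000

def relative_ptv_reduction(prostate_cc, wide_mm=4.0, tight_mm=2.0):
    """Fraction of the wide-margin PTV removed by switching to the tight margin."""
    wide = ptv_volume_cc(prostate_cc, wide_mm)
    tight = ptv_volume_cc(prostate_cc, tight_mm)
    return (wide - tight) / wide

# Hypothetical gland sizes, chosen only to bracket the 50cc cut point.
for size_cc in (30, 50, 80):
    saved = relative_ptv_reduction(size_cc)
    print(f"{size_cc}cc prostate: 4mm -> 2mm removes {saved:.1%} of the PTV")
```

Under these assumptions the fraction of treated volume removed shrinks as the gland grows (roughly 24% for a 30cc sphere versus roughly 18% for an 80cc sphere), which fits the direction of the subset finding. Real prostates are not spheres, so treat the numbers as a rough guide only.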
My summary:
The trial shows a little less toxicity with margin reductions. From the paper data, I think it shows a little less Gr2 toxicity for bladder. I’m personally less convinced of the bowel data presented (but in my clinical experience with SBRT it wouldn’t shock me that smaller treatment volume is slightly better).
If we believe that bladder treatment of <2cc at 39Gy is the difference, it is surprising that rectal spacing gel and pelvic treatment differences can’t be measured. Further the bowel dosimetry wasn’t better which, I think, should at least bring up questions as to the reproducibility of the bowel toxicity findings.
Finally, I personally really dislike not having any tumor outcome in any trial that potentially can be held up as “important”. I’ve argued against this since the early robotic surgery trials looking only at toxicity and I dislike it here. Decreasing margins does have some, at a minimum, potential for increased recurrence and it should be, at least, a secondary outcome.
But my personal take? I like pushing aggressively for lower toxicity. I think this shows the technology probably helps a little. If I lived in the backyard of an MRI linac, I’d likely choose that program over a typical linac platform. I think the treatment team involved is top tier and delivering top-tier world leading, super high quality treatment. That said, I’d be more inclined to travel for the MD’s expertise than the machine.
At the end of the day, it appears to be a small incremental step forward in a site with minimal treatment related toxicity. I think it was a positive trial and important for radiation oncology to show an outcome difference via technology. But I believe in technology. I moved my practice for access to proton therapy for head and neck cancer trying to push for less toxicity - so by default, I’m a believer that we can do better. So let’s now look at the Esophageal RCT and compare.
Esophageal Randomized Phase IIB trial: Protons vs. Photons
Trial Summary:
The basic question: Protons have a dosimetric advantage over IMRT. Can this improve outcomes by reducing toxicity in patients with esophageal cancer treated with concurrent chemotherapy radiation (about half of whom went on to surgery)? (ref 3)
The two primary metrics were total toxicity burden (TTB) and progression free survival (PFS). TTB is a cumulative toxicity scale created at MD Anderson. The goal is reasonable - measure cumulative toxicity over time between two treatments - but it is not a common, well-known metric. And although elegant and seemingly well designed, I believe it ends up being a weakness and an argument against the trial outcome. In the trial both modalities deliver the same dose, so the likely expectation was to see equivalent PFS with less TTB. Perhaps, if the toxicities directly resulted in death, there might be some hint of an OS difference, but the trial was not designed or powered for that endpoint, and PFS should be unchanged.
So unlike the MIRAGE trial, the technology advantage is not utilized to increase dose or reduce margins, but rather simply to attempt to demonstrate a reduction in toxicity within a well established standard of care therapy where toxicity is relatively high.
Patient Characteristics:
145 assigned - 107 evaluable. Arms seem well balanced - extensive Table 1 of patient characteristics shows only Zubrod performance status was different in favor of the IMRT patients (total, IMRT, PBT arms with p=0.02).
Quickly addressing the drop from 145 randomized to 107 evaluated. Main issues were insurance denial on the PBT arm and requested PBT in the IMRT arm as shown below. Not ideal but data openly presented and expected in any proton therapy trial today and again, an extensive Table 1 shows well balanced arms.
Dosimetry Differences:
Pretty simple here. Coverage is as good while dose to normal structures is less. Basically a restatement of what protons can provide over IMRT.
Outcomes:
The upside:
Total toxicity was markedly reduced in the PBT arm, even with 80% of the proton plans being delivered with passive scattering techniques. IMRT had 2.3 times the total toxicity score of PBT. In the ~50% of patients undergoing surgery, post-operative complication scores were 7.6 times greater in the IMRT arm.
Here is the complete TTB list - this is stuff, that in large measure, you do not want. This comes from cross referencing the document describing the TTB methodology (ref 4).
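To make the idea of a cumulative, weighted toxicity score concrete, here is a minimal sketch of how a score of this general kind could be tallied. It is not the published TTB method: the event names, grades, weights, and patients below are all hypothetical and made up for illustration (see ref 4 for the actual methodology).

```python
# Hypothetical severity weights keyed by (event, grade).
# These numbers are invented for illustration; they are NOT the published TTB weights.
HYPOTHETICAL_WEIGHTS = {
    ("pneumonia", 2): 20,
    ("pneumonia", 3): 40,
    ("afib", 2): 10,
    ("afib", 3): 25,
    ("anastomotic_leak", 3): 50,
}

def toxicity_burden(events, weights=HYPOTHETICAL_WEIGHTS):
    """Sum the weights of every (event, grade) pair one patient experienced."""
    return sum(weights[event] for event in events)

def mean_burden(patients, weights=HYPOTHETICAL_WEIGHTS):
    """Average the per-patient burden across one arm."""
    return sum(toxicity_burden(p, weights) for p in patients) / len(patients)

# Two made-up arms of three patients each; an empty list means no scored events.
arm_a = [[("pneumonia", 3), ("afib", 2)], [("anastomotic_leak", 3)], []]
arm_b = [[("afib", 2)], [], [("pneumonia", 2)]]

ratio = mean_burden(arm_a) / mean_burden(arm_b)
print(f"arm A mean burden: {mean_burden(arm_a):.1f}")
print(f"arm B mean burden: {mean_burden(arm_b):.1f}")
print(f"ratio A / B: {ratio:.1f}")
```

The point of the sketch is that a single ratio between arms compresses many different events into one number, and that number depends on the weights chosen. That is why the reader has to look at the underlying event list as well as the headline figure.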
Below is the primary visual for the paper. It is both good and bad in my assessment. First, you have to take a moment to really look at it - that is bad in today’s world. The red and orange are worse TTB outcomes, and those outcomes are found only in the IMRT arm. (But you could set the color key differently for this non-verified metric and, while the trend would persist, the coloring might show less of a difference).
In the superior figure (A), you see the breakdown of two arms with an additional comparison of surgical patients vs. non-surgical patients. The biggest difference is in the surgically treated arms - compare the top grouping (Surgical IMRT) to grouping 3 (Surgical Proton).
In the lower figure (B), we see labels like PNA (pneumonia) and AFIB (atrial fibrillation). The key to the abbreviations is the table 1 included above from the TTB reference article. This 2nd figure again shows greater toxicity in the IMRT arm. Pretty clear graphics, especially if this has been your life’s research for the better part of a decade. But as an outsider, having to pull together reference papers etc, it is less intuitive and therefore, I think, less impressive than a simple metric that you can read and appreciate in one sentence - like “Gr3 toxicities were greater by xxx amount.”
Here are two items from the supplemental data PDF that, to me, are simpler. The first shows that TTB is worse with IMRT. (IMRT toxicity risk is clearly pushed much farther to the right). It is quite clear these treatments do not result in the same outcomes, albeit the figure is derived from math on top of a rather complex baseline metric.
The graph below shows Gr4 lymphopenia comparisons, illustrating the type of Gr4 toxicity differences at various time points. I’d just repeat: this is NOT the “start a script of Flomax” kind of Gr2 GU toxicity.
To me, this portion of data looks very strong. It is strongest in those undergoing surgery, which is a group that will have the best long-term outcome potential and so, I see it as very impressive data. I don’t love the presentation of the trial due to complexity, but protons (even with passive scatter used in 80%) outperformed IMRT at a world class leading institution for IMRT concurrent esophageal cancer treatment.
The downside of the data:
Patient Quality of life data - I’ll summarize, nothing different. (I’ll need to come back to pluses and minuses of QoL data in a future post)
PFS and OS - no benefit (but consistent with design).
And therein is the problem. Paragraphs for the upside explanation and literally “no difference” for the downside explanation.
My summary:
The deeper I look, the more this trial does demonstrate a very significant win for proton therapy. Restating the results, in patients undergoing surgery, the risks of anastomotic leaks, ARDS, pulmonary embolism, reintubation, and stroke were 7.6 times more severe if treated with IMRT instead of PBT.
(Read that again - it is a massive difference. It does include a weighting metric you must agree with, but even if you don’t think the weighting is perfect, complications are still worlds apart. )
And the toxicity differences are in FAR more important items than the MIRAGE trial. One cannot and should not compare the post-operative complication rate difference in this trial to Gr2 prostate cancer toxicity. Unfortunately some of the outcome differences in this trial, from my perspective, got buried in the presentation and publication.
I was in the room during the data presentation and it didn’t feel like “a big win”. In fact, it felt like a negative trial. But looking back and looking much deeper, I don’t think that is the case, but in today’s world 95% will never look again. First impressions matter.
The problem was / is this: at first glance, QoL, PFS, and OS are all the same. So very easy statements of “no difference” wash away a benefit seen in a new, rather complex toxicity metric. TTB is a metric that probably takes any well educated reader 30 minutes (minimum), across multiple references, to have any chance to really interpret for themselves. No trial is perfect and in hindsight, the choice of TTB as a primary metric appears to weigh heavily on the assessment of outcome in this trial.
Substack is saying I’ve rambled long enough today. :) A closer look at MRI MIRAGE and rectal spacing data is next.
REFERENCES:
MRI MIRAGE Trial:
https://jamanetwork.com/journals/jamaoncology/fullarticle/2800541
CTCAE v4.03: Criteria used in MIRAGE Trial
https://www.eortc.be/services/doc/ctc/ctcae_4.03_2010-06-14_quickreference_5x7.pdf
Esophageal Cancer Randomized Trial:
https://ascopubs.org/doi/10.1200/JCO.19.02503
Bayesian / Total Toxicity Burden Design Publication:
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4809549/
Two References linking ADT to bowel toxicity:
Sanda MG, Dunn RL, Michalski J, Sandler HM, Northouse L, Hembroff L, et al. Quality of life and satisfaction with outcome among prostate-cancer survivors. N Engl J Med. (2008) 358:1250–61. 10.1056/NEJMoa074311
Stensvold A, Dahl AA, Brennhovd B, Småstuen MC, Fosså SD, Lilleby W, et al. Bother problems in prostate cancer patients after curative treatment. Urol Oncol. (2013) 31:1067–78. 10.1016/j.urolonc.2011.12.020
www.protons101.com, home of the original Protons 101 website.
Content for the Protons101 blog written by Mark Storey MD.