RTOG 0815: Dose Escalation and ADT. Questions answered?

A more modern trial that answers so many questions "definitively", or does it?

May 21, 2023

Let me begin by saying this is a study to know - an important part of our literature - it is that well done. Kudos to all involved! Today we’ll look at some components from perhaps a different perspective. This discussion is likely a bit different than the more common narrative and as usual, I might very well be wrong and the narrative supported by many might very well be the more valid assessment. But maybe this perspective might make you consider additional paths or ask different questions - and that - in and of itself, would be reason enough to write.

RTOG 0815

It is released as two articles - the cancer outcomes and a paired patient reported toxicity outcome paper.

Here are the highlights from the abstracts:

Cancer Outcomes: RESULTS
Median follow-up was 6.3 years. Two hundred nineteen deaths occurred, 119 in arm 1 and 100 in arm 2. Five-year OS estimates were 90% versus 91%, respectively (hazard ratio [HR], 0.85; 95% CI, 0.65 to 1.11]; P = .22). STAD resulted in reduced PSA failure (HR, 0.52; P <.001), DM (HR, 0.25; P <.001), PCSM (HR, 0.10; P = .007), and salvage therapy use (HR, 0.62; P = .025). Other-cause deaths were not significantly different (P = .56). Acute grade ≥3 adverse events (AEs) occurred in 2% of patients in arm 1 and in 12% for arm 2 (P <.001). Cumulative incidence of late grade ≥3 AEs was 14% in arm 1 and 15% in arm 2 (P = .29).
Cancer Outcomes: CONCLUSION
STAD did not improve OS rates for men with IRPC treated with dose-escalated RT. Improvements in metastases rates, prostate cancer deaths, and PSA failures should be weighed against the risk of adverse events and the impact of STAD on quality of life.

Toxicity: CONCLUSION
Compared with dose-escalated RT alone, adding TAS demonstrated clinically meaningful declines only in EPIC hormonal and sexual domains. However, even these PRO differences were transient, and there were no clinically meaningful differences between arms by 1 year.

Today’s Commentary:

Issue one:
When is dose escalation not dose escalation?

I get it - they put “dose escalation” into the title of the study back in 2008 and they even carry it forward as “High-Dose RT” on the header of each page. But is it today? It is 7920 cGy (79.2 Gy) in 44 fractions of 1.8 Gy - roughly equivalent to 74-75 Gy in 2 Gy fractions. And yes, I’m fully aware that 11% of men did have a brachytherapy boost (so around 170 total men), but medicine is hard. And finding “true” findings in trials is amazingly difficult even with great trial structures - look at the trouble we have simply reproducing any trial result more broadly across medicine (ref 2,3,4). So, to say that we clearly know things about brachy (a true high dose approach) when 9/10ths of this data is relatively low dose radiation (by today’s standards), is not something I would choose to hang my hat upon - at a minimum, it makes a difficult job even harder.
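For readers who want to check the 74-75 Gy figure: the post does not say how it was derived, but a standard linear-quadratic conversion to 2 Gy fractions (EQD2) lands in that range. The sketch below is only an illustration. The `eqd2` helper name and the alpha/beta values (1.5 Gy, a commonly quoted prostate value, and 3 Gy for comparison) are assumptions, not numbers from the trial.

```python
def eqd2(total_dose_gy: float, n_fractions: int, alpha_beta_gy: float) -> float:
    """Equivalent dose in 2 Gy fractions under the linear-quadratic model.

    EQD2 = D * (d + alpha/beta) / (2 + alpha/beta), where d is dose per fraction.
    The alpha/beta ratio is an assumed input, not something the trial reports.
    """
    dose_per_fraction = total_dose_gy / n_fractions
    return total_dose_gy * (dose_per_fraction + alpha_beta_gy) / (2.0 + alpha_beta_gy)


# RTOG 0815 external beam schedule: 79.2 Gy in 44 fractions (1.8 Gy per fraction)
for alpha_beta in (1.5, 3.0):  # assumed values for illustration
    print(f"alpha/beta = {alpha_beta} Gy -> EQD2 = {eqd2(79.2, 44, alpha_beta):.1f} Gy")
# alpha/beta = 1.5 Gy -> EQD2 = 74.7 Gy
# alpha/beta = 3.0 Gy -> EQD2 = 76.0 Gy
```

With a low assumed alpha/beta the schedule sits at the 74-75 Gy quoted above; a higher assumed value nudges it to about 76 Gy, so the conclusion is not very sensitive to that choice.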

And 74-75Gy is relatively low dose for disease beyond low risk and maybe some favorable intermediate risk. Remember, FLAME-SIB “low dose” arm is nearly 5% more dose than 90% of men in this trial and it was demonstrated to be clearly inferior to the SIB dose escalation arm with respect to biochemical failure. And yet here in the paper and online we run titles of “dose escalation”. Is it higher than the old days of 66-70 - like when I trained? - Yep - and yep, I’m that old. So relatively speaking is it “dose escalation”? Absolutely. But for 90% of the cohort, it is now a proven inferior dosing for control in higher risk disease.

Note: if you are a resident or student of the field, you need to have broad context - Who are you reading? What is their slant? That is honestly why I post things on something labeled - Protons 101 and have a tag line on dose - hopefully that helps you gain context. Especially today, I believe understanding the author’s background and focus is critical to accurate filing of information.

And on today’s topic, my read of the common narrative is that my thinking on dose lies beyond the broad consensus. Rather than have you hunt around, I brought it here. Below, I believe, is a more representative stance from an ASTRO website editorial - it is from 2021 regarding what we learned from RTOG 0815 based on earlier data releases (ref 5). Some numbers changed and some p values flipped but realistically, I’d say the gist of the paper remains unchanged. (Editorial; ref 6) - (it is not the entire editorial but the pieces are unedited and I tried to keep context - it appears, on my read, to oppose a few articles on this site on some level)

Multiple experts have reported based on retrospective data that the use of dose-escalation, especially with dose-escalation with a brachytherapy boost, obviates the need for ADT in intermediate-risk disease. To this effect, national guides have followed their recommendation to allow omission of ADT when a brachytherapy boost is used for men with intermediate-risk disease.
Dose-escalation could either be solely from external beam RT (EBRT) to 79.2 Gy or via brachytherapy boost (11.4% of enrolled patients). Very importantly, the trial stratified by number of intermediate-risk factors (a close proxy to FIR versus UIR), ACE-27 comorbidity status, and use of a brachytherapy boost Thus, RTOG 0815 was designed to answer the three unproven opinions listed above.
Despite the short follow-up, there was a 5% absolute improvement in OS and a 15% relative improvement in OS favoring RT plus ST-ADT (HR 0.85, 95% CI 0.65-1.11). Notably, these results are consistent with EORTC 22991, which recently reported long-term (12-year) outcomes. This trial showed that the addition of short-term ADT improved event-free survival, irrespective of lower versus higher doses of RT. There was a meaningful 6.5% absolute improvement in DM-free survival at 12 years (p=0.065).
While some unfamiliar with appropriate statistical methodology might state the p-value was not significant in the brachytherapy boost subset and conclude there was not benefit of ADT in these men, this is invalid. Due to the trial being only powered for the overall cohort size, one must show through a significant treatment interaction that a given subgroup has a different relative benefit. With a clear negative treatment interaction this means that there is similar relative benefit irrespective of dose intensification method. This provides the strongest level of evidence to date for the benefit of ADT across these patient populations.
RTOG 0815 definitively proved wrong many experts who have stated that dose-escalation obviates the benefit of ST-ADT in intermediate-risk prostate cancer. ADT benefits men similarly whether treated with dose-escalated EBRT and those treated with a brachytherapy boost.
NCCN and other guidelines should be updated to reflect that RT+ADT should be recommended based on level 1 evidence, irrespective of dose-escalation method. Additionally, we need to reassess guidelines that state that all FIR patients do not derive benefit from short-term ADT, as it is clear some do.

A pretty strong opinion on the topic. Always good to consider all perspectives. At least now, it is easy to compare and contrast the perspectives for yourself.

If you learn anything, consider sharing this or the broader site. Thanks for the support!

A Hazard Ratio Discussion:

First, I really like the write up of the trial. The abstract focuses on hazard ratios but they do a very nice job in supplying both the hazard ratios and the Kaplan-Meier curves for a variety of endpoints - numbers at risk, fails, censored - YAY!

Hazard ratios are a common modern trend, and a new twist since I left residency. It’s probably “better math” but like anything can be misused. For me, I like absolute benefit to help me decide whether a patient needs ADT. I tend to use a hazard ratio less often.

Do I think that ADT has some benefit from low risk to very high risk disease? Yes. Is the hazard ratio approximately the same? Seems to be in our data. Do I agree with ADT for some favorable risk patients - today, I do not. Maybe that changes down the road but it won’t be based on 10 yr old data (we come back to this below in Two sides of the same coin).

The point I try to emphasize here is to focus on doing the very best radiation and achieving the highest outcomes possible without ADT and then the relative discussion of risks / benefits of the ADT toxicity will shift towards less use. If we compromise the primary treatment - radiation - to more moderate levels, then ADT has a larger absolute benefit.

Here is my example - not perfect, but not a bad quick example - it uses PSA Failure from the trial, which has the most events and therefore the largest absolute numbers:

RTOG 0815: PSA Failure curve with additional hazard ratio projections demonstrating a shrinking level of return.

Above, the blue KM (Kaplan-Meier) curve is no ADT, the red KM curve adds short term - 6 month ADT (from our study). It produces a downward shift in the slope of the line at an expense of one year of toxicity. This is not a transient effect where we are delaying recurrence by a year or so, it appears to be a real change in slope.

But if we can push the blue KM line to the red KM line with better radiation - which I believe TODAY is clearly supported in the literature - producing a 5 yr rate of disease free survival >90%, then the effect of ADT - keeping the same hazard ratio - produces the lower “new” black line I added (with 11.7 beside it - those numbers are the approximate angles / slope of the line).

Or in an even more favorable scenario, SBRT truly achieves 95% biochemical disease free survival across low, intermediate, and high risk disease as per the prospective meta-analysis that I’ve discussed at length previously here. This outcome would then make the black line our new baseline. The red line shifting risk lower then represents the same hazard ratio applied to that excellent radiotherapy-alone endpoint.

Note: I don’t think data supports 95% biochemical disease free survival for all comers with an SBRT approach based on published literature today - PACE will be a good measuring stick - just an example using what “we” most commonly cite.

And here are the resultant effects in absolute values - 14% failure falls to 8%. In my proposed approach 7.5% turns to 4.5%. If you use 95% as your baseline, the failure rate with ADT falls from around 4.5% to 2.7%. So here, depending upon the baseline outcome, the benefit falls from a 6 percentage point difference in biochemical disease free survival to a 1.8 percentage point difference. The number needed to treat to stop one biochemical recurrence increases from about 17 to about 56. And the toxicity for that patient cohort remains stable despite less than one-third the benefit. I believe we have pushed the balance point clearly in favor of less ADT usage.
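The arithmetic in the paragraph above is easy to reproduce. This is just a quick sketch using the failure percentages quoted in the text (read off the curves, so approximate); the scenario labels and the `absolute_benefit` helper are my own illustrative names. Note that the ratio of crude failure percentages is only a rough stand-in for a hazard ratio.

```python
# Approximate PSA-failure percentages quoted in the text: (without ADT, with ADT)
scenarios = {
    "trial as run": (14.0, 8.0),
    "better radiation baseline": (7.5, 4.5),
    "95% bDFS baseline": (4.5, 2.7),
}


def absolute_benefit(no_adt_pct: float, adt_pct: float) -> tuple[float, float, float]:
    """Return (absolute reduction in percentage points, relative ratio, NNT)."""
    reduction = no_adt_pct - adt_pct
    ratio = adt_pct / no_adt_pct      # crude ratio of failure percentages
    nnt = 100.0 / reduction           # men treated per recurrence avoided
    return reduction, ratio, nnt


for label, (no_adt, adt) in scenarios.items():
    reduction, ratio, nnt = absolute_benefit(no_adt, adt)
    print(f"{label}: -{reduction:.1f} points, ratio {ratio:.2f}, NNT {nnt:.1f}")
```

The relative ratio stays near 0.6 in every scenario, while the absolute reduction shrinks from 6.0 to 1.8 points and the number needed to treat climbs from roughly 17 to roughly 55 to 56. That is the whole argument in three lines of output: same relative effect, much smaller absolute payoff as the baseline improves.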

It is not that ADT isn’t “beneficial” or that the hazard ratio changes. I really don’t know if people argued that or not - I assume they were closer to my stance, but again, I don’t know what others think and I didn’t follow this debate that closely for the past decade. But unlike the editorial which seems to imply a pretty stark difference in the two perspectives (on my read): “RTOG 0815 definitively proved wrong many experts who have stated that dose-escalation obviates the benefit of ST-ADT in intermediate-risk prostate cancer.”, I’m still not so sure. Even after quite a bit of time considering the topic, I don’t believe they are mutually exclusive as I’ll discuss below. But first issue two.

Issue two:
When don’t I trust patient quality of life surveys?

Quality of life surveys are valuable. They add a different perspective. They are not - just like any other metric - foolproof and I believe they can be one of the more inconsistent items that we consider. Don’t leave in disgust - hear me out - I just said they are valuable and yes we need them and they can ballpark things - sometimes better than others, but they ain’t perfect.

A far-fetched analogy:

Let’s say you have a great metric for “bed comfort” - you know, how good is your bed that you sleep in and are you happy with it. It has been verified and is reproducible enough to make the cut and be implemented in your new study.

We now take two people with this “validated” survey. Both have 10 year old beds. One buys a new bed and the other has just been sentenced to serve a 6 month jail term.

(and yes, I get the irony that in this scenario ADT is equated to jail time - it made me laugh)

Survey 1: baseline - both are ok - bed isn’t as good as it was 10 yrs ago, but both rate the bed in a similar manner consistent with our validated survey.

Survey 2: one week post the changes - I’d be shocked if the new bed didn’t crush the jail cot. This is what we study and what we validate - approximately equal starting populations and then discerning changes from that baseline.

Survey 3: 12 months later. Both are back at home and have returned to normal daily patterns. Now, despite one person having a new bed that is now 1 year old and the other having the 11 yr old bed, I’m less certain of my expectations. Would you be surprised to see little or no difference in the surveys? I would not. I assume the new bed is nicer than the old bed but perspectives matter and any bed at home is far nicer than the jail cot - or at least I assume.

Your “validated” survey was from a balanced baseline. It is not validated for completely different paths with different histories and experiences. Sure the questions are the same, but the recovery from very different points back to some “new” baseline is very often not part of the validation and really how can it be?

My prior dosimetrist and facility manager served in the military - he was on the border of North and South Korea for I think about 2 years. He spoke of terrible conditions - tent life, jungle mosquitos, foot rot etc. etc. And at work, on a busy frustrating day, he could often be heard saying “It ain’t Korea.”

Going through hell makes summer more tolerable. Perspectives matter.

Data Driven Example 1:

Maybe the surveys do work as we want and maybe there really is “no clinical difference” but labs say otherwise. Here is some data from Memorial Sloan Kettering on testosterone recovery post ADT (ref 7). This seems more reasonable.

Just look at the blue line - faster recovery with shorter ADT use. At 1 yr, half have good recovery. At 2 yrs, over 20% still don’t have full recovery to where they were with short course ADT.

You can believe that the patient reported outcome surveys are fully complete representations of a person’s quality of life and deem that beyond one year all is “basically equal”, stating “there were no clinically meaningful differences between arms by 1 year”. Or, as I tend to do, be a bit more conservative and look beyond the headlines across larger collections of data and consider where this new data point should be filed in your mental filing cabinet of our literature.

At best, 6 months of ADT is a 1 yr massive quality of life impact - six months of treatment and around 6 months to recover. Based on my clinical experience and based on measurable labs, I think it is a bit more than that - at least for some men. Is it magnitudes more similar at one year compared to 3 months? Absolutely, but I would caution that massive changes in one’s perspective are difficult to accurately assess. We are amazingly adaptable creatures.

Data Driven Example 2:

The second, to me, clear example that demonstrates just how much we downplay these toxicities is the CTCAE table of results paper, where only 17% of men on ADT have ANY sexual / reproductive function toxicity within 90 days of ADT. Even long-term half the men had zero toxicity. From my clinical perspective, both are laughably poor representations of the effect of ADT on sexual toxicity. For some odd reason, we seem to have inherent bias to downplay these issues and I think you see clear evidence for that in this publication - both in the data and write-up.

Therefore my wording in the conclusion would have been more reflective of the toxicity difference that we KNOW exists across the entire timeline of active treatment. In a way to look more broadly at data we have rather than to use so many words like “even”, “only” and our favorite “transient” to emphasize the recovery of (my term) most men.

When I read this conclusion, I consider the narrative of the editorial - presented well before the release of this paper and then look to decide whether this common narrative played into the evaluation of the data. Or I think more precisely, to what extent did that narrative play into the composition of the headline conclusions. I think it is, at least, something readers should consider.

Two Sides of One Coin:

Here’s how I reconcile the two perspectives: the one in the editorial vs. the one which is labeled as “definitively proved wrong.”

There appears to be a hazard ratio improvement across all disease with ADT. I think of it this way - there are low risk cases that we simply misclassify - missing the high risk features or something. Nothing is perfect. Therefore there are some cases - either for missed known or currently unknown factors - that need more than just local treatment. Or perhaps there are cases that need ADT to modify the radiation impact for some cancers. But simply, some lie within a box of unknowns.

And this leads to one focus of work - an attempt to use genetics or AI histology or something different, to solve the genetic / metastatic / ADT radiation sensitizing component to the issue. And this path likely will demonstrate some real value. Across cancer, with improvements in technology, we are constantly improving our precision along a similar path / approach - be it better imaging or updates to our staging models or new prognostic tests - we continue to generate progress via this approach.

The counter to this argument is to use radiation specific technology to push on dose and right sizing the target - essentially to shrink the size of the black box of unknowns. Stated differently, I’d like to make the work in finding the genetic difference as difficult as possible by using brute force to push the number at risk for failure as low as possible. If 30 people in 100 fail, the cost / benefit ratio for any prognostic test is magnitudes easier to achieve than if 5 in 100 fail. Will it make the other approach obsolete? No. But we can make it far less impactful. I think of it as an argument in favor of our field and in favor of the value we provide.

One approach leans on radiation a bit more, one leans towards tech that is generally beyond our field a bit more.

That’s it - two approaches. I think both are required for us to push forward. As I discussed previously, I just want to see the pendulum swing back towards the middle - I think we are too far away from “brute force”. And yes, I consider SBRT in lung cancer to be a wonderful example of “brute force” - brute force with crazy good technology to push dose and elevate cure rates by delivering great dose to a precise target. Yep. I like doing more of that.

Why on Substack and not longer Social Media Posts?

And I’ll just add this is why these articles are not short. These are complex topics - far too long for social media and instead of trying to discuss topics there - in the PVP arena of “social media medicine” - I think it is better addressed here, in a place where I can take weeks to unroll a topic if that is what is needed. Substack is a writing platform - it makes this work fun and far less effort and grants a better opportunity to unpack complex topics.

Where I differ with the Editorial:

Ultimately, where I disagree with the editorial is in tone and language. I would shift away from 75 Gy being considered dose-escalation - I believe data demonstrates we are beyond that today. That choice of wording, from my perspective, represents part of a broader narrative of accepting a de-emphasis regarding dose escalation and allowing a shift to address these issues via additional systemic approaches or developing prognostic panels.

And secondly I don’t agree with the tone where the editorial appears to present that one perspective is correct and the other scientists are wrong. That it has been “definitively” proven to a point - that only with a misunderstanding of math can you arrive at a different result. I’d like to see us move past that type of language in science.

Consider for a moment that in 5 years, we have a new immunotherapy targeting a specific receptor that is, in fact, persistently downregulated, with short-term ADT use making that option less effective after the use of any ADT. The systems of the body are amazingly complex and interactions can be in the thousands.

After 25 years in oncology, the most certain scenario is that there are no absolutes in medicine - if you see me writing something akin to that level of “certainty”, please comment - I need to correct it :)

My Summary:

I think it is more just two perspectives. Two different paths to a common goal. But I would encourage us to really consider, not only the benefit of ADT but the toxicity - and not just the headlines from the patient reported outcomes section.

Here is the CTCAE table from the primary paper which I haven’t seen get much publicity: Grade 3 and higher toxicity moves from a crude 2.3% to 11.8%. (17 men increased to 86 men)

So an increase of reported Gr3 or higher toxicity of 69 men. In contrast the DM rate reduction was 28 men to 7 men. And as a radiation oncologist, I think a lot of those failures, we’ll now salvage with one week or less of treatment delaying (at least) progression to systemic treatment and likely minimizing any prostate cancer specific mortality difference.
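To put the two sets of numbers above side by side, here is a small sketch using only the counts and crude percentages quoted in the text. The `harm_vs_benefit` helper is a hypothetical name for illustration. A grade 3 or higher adverse event and a distant metastasis are obviously not equivalent events, so this is a comparison of raw counts, not a formal risk-benefit analysis.

```python
def harm_vs_benefit(tox_no_adt: int, tox_adt: int, dm_no_adt: int, dm_adt: int) -> dict:
    """Compare extra grade >=3 toxicity events with distant metastases avoided."""
    extra_toxicity = tox_adt - tox_no_adt
    dm_avoided = dm_no_adt - dm_adt
    return {
        "extra_toxicity": extra_toxicity,
        "dm_avoided": dm_avoided,
        "toxicity_per_dm_avoided": extra_toxicity / dm_avoided,
    }


# Counts quoted in the text: 17 -> 86 men with Gr3+ toxicity, 28 -> 7 men with DM
result = harm_vs_benefit(17, 86, 28, 7)
print(result["extra_toxicity"], result["dm_avoided"])   # 69 21
print(f"{result['toxicity_per_dm_avoided']:.1f}")       # 3.3

# Crude Gr3+ rates quoted in the text: 2.3% -> 11.8%
rate_increase_points = 11.8 - 2.3
number_needed_to_harm = 100.0 / rate_increase_points
print(f"{number_needed_to_harm:.1f}")                   # 10.5
```

On these crude numbers, roughly three additional men have a reported grade 3 or higher event for each distant metastasis avoided, and about one additional such event occurs for every 10 to 11 men given ADT.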

In the end, I’m hopeful that we continue to press on both levers to improve outcomes. I really hope we hold the toxicity of ADT and the costs of genetic / AI prognostic tools to the same high value oriented standards we hold for our own primary treatment approach. I continue to believe that radiation oncology represents massive value. And here, I’ll try to present different looks at our data - largely from a position of belief in the tremendous value we provide in oncology.

In the end, you can decide - hopefully based on thoughtfulness and thoroughness in your own review of the data. And together, I think there is amazing opportunity for us to continue to push our field forward.

I added one final reference below - a large meta-analysis. I didn’t want anyone to think I missed it. If interested in this topic, in general, I see it as supportive of RTOG 0815, the hazard ratio argument, and from my perspective, it supports the two sides of the same coin argument. As always, I encourage you to read it and decide what questions we should be asking moving forward.

REFERENCES:

1. Current RTOG 0815 Links (Outcomes, then PRO):
https://ascopubs.org/doi/abs/10.1200/JCO.22.02390
https://ascopubs.org/doi/abs/10.1200/JCO.22.02389
2. Reproducibility in Science
https://www.ahajournals.org/doi/10.1161/CIRCRESAHA.114.303819
3. Raise standards for preclinical cancer research
https://www.nature.com/articles/483531a
4. Exposure to US Cancer Drugs With Lack of Confirmed Benefit After US Food and Drug Administration Accelerated Approval
doi:10.1001/jamaoncol.2022.7770
5. Dose Escalated Radiotherapy Alone or in Combination With Short-Term Androgen Suppression for Intermediate Risk Prostate Cancer: Outcomes From the NRG Oncology/RTOG 0815 Randomized Trial
https://www.redjournal.org/article/S0360-3016(21)00909-3/fulltext
6. ASTRO Editorial:
https://rb.gy/im7l6
7. Testosterone Recovery Profiles After Cessation of Androgen Deprivation Therapy for Prostate Cancer
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7546513/
8. Androgen deprivation therapy use and duration with definitive radiotherapy for localised prostate cancer: an individual patient data meta-analysis
https://www.thelancet.com/journals/lanonc/article/PIIS1470-2045(21)00705-1/fulltext

Gaurav Shukla

Nice essay. As a brachytherapist I was hoping to see more patients on the trial getting brachy boost so the question of dose could be better answered.

Mack Roach says 4 mo ADT is plenty. With relugolix and quick T recovery, could that confer some absolute benefit with less toxicity?

Protons 101