Intermediate Risk Prostate Cancer: How good is good enough?

With recent PACE-B results, are genomic or AI approaches even needed?

Oct 29, 2023

The impact of PACE-B is broad. Hopefully this story drives that point home.

The other day I was in a multi-specialty tumor board and a pretty typical patient was presented: PSA <10, Gleason 3+4 disease, with a palpable nodule - let’s just say T2b. But the Decipher returned “high risk,” so they were adding in ADT.

But should we? In fact, now that we have achieved such a high cure rate in this subset, should we order additional tests at all? Today, we’ll look at the math and see if there is really any reason to complicate the workup and add these additional tests.

If we moved from 85% to 96% in about 10-15 years based on simply hitting the target better, let’s just assume today we’re nearly at 96%-97% - not unreasonable. So the entire opportunity is some fraction of a group that is itself less than 5% of patients.

And, yes, I know. I’ve argued time and again we have swung too far AWAY from optimizing results, but I’ve also argued many times for a hazard rate argument. And now that PACE-B is out at greater than 95% control from disease relapse at 5 years, I’m going to take the hazard rate side of the coin.

As always, you read and decide. And if you learn things or even just consider a different perspective, please support and subscribe to receive new articles.

With that case as an example, let’s look at some of these other tests and consider the basic question: what is their potential benefit?

Before we begin, I’m not making the argument for patients who do not meet PACE B criteria. We’re talking about candidates clearly within the framework of PACE B data. I’ll come back to higher risk disease later in a different discussion.

So the example case: it is clearly well within PACE B criteria. In fact, 44% of PACE B was T2c disease. I upped the clinical stage to demonstrate that this is technically (in America, per NCCN) an Unfavorable Intermediate Risk Cancer - GG2 with T2b - so two intermediate risk factors, but still well within PACE B.

(Re-read: T2c cohort size in PACE-B is 44%. I think this speaks to the fact that the US market treats more favorable disease than most other countries. T2c is really quite rare in my clinic as an example and I think if you read / hear the work out of Tata Memorial, their patient cohorts in India are far more advanced disease than we see. Very important context to remember when you read / consider global literature.)

Expectations are then clearly north of 95% for this patient with radiation alone. And as I wrote, results continue to improve. So realistically, treated today in a high volume program, I’d guess results of 96%-97%. Really crazy to type that number, but multi-institutional prospective data from ~2016-2017 is north of 95%. And yes, we are better today than we were then. Remember, in the prior 15 years the bar appears to have moved from the low 80s to 95% for this group of patients.

So let’s say the failure risk for the first 5 years is 3.5%. If you want to jump out to 10 years, it will likely fall in a 5.5% range - not quite doubling, given this strong initial five year data. Good reference points are the brachytherapy series with very low PSA nadirs - they hit 95% and then are actually quite flat for many years (Ref1 - if PACE B published kinetics, one could fine tune this estimate, but that is our ballpark).

The entirety of the potential benefit lies within that failing cohort. The total group that you MIGHT help is the roughly 6% who fail by 10 years. The ONLY argument for a higher failure rate is that this patient is somehow “higher” risk than the average PACE B patient, and really that argument just doesn’t work - 30% PSA>10, 44% T2c, 80% 3+4.

If that is our first puzzle piece, let’s grab a few others:

We’ll grab data from Decipher and Artera. I pick those because I think they are supported with data as well as any. I use Decipher on occasion and I’ve written extensively on Artera. First Decipher:

Decipher Validation: RTOG 0126 - Abstract

I generally just go to the website and pull up the advertisement references - a good list of what they think is important. Here we have 5 to choose from. I went through them and I’ll just pick one of the stronger ones - a validation of Decipher in RTOG 0126 with big names: Spratt, Michalski, Berlin among others. We won’t even nitpick anything within the paper; we’ll just take the hazard ratios as presented and apply that data.

As a refresher - here are RTOG 0126 Results - context is always important:

The 8-year cumulative rates of distant metastases were 4% for the 79.2-Gy arm and 6% for the 70.2-Gy arm (HR, 0.65; 95% CI, 0.42-1.01; P = .05). The ASTRO and Phoenix biochemical failure rates at 5 and 8 years were 31% and 20% with 79.2 Gy and 47% and 35% with 70.2 Gy, respectively (both P < .001; ASTRO: HR, 0.59; 95% CI, 0.50-0.70; Phoenix: HR, 0.54; 95% CI, 0.44-0.65).

RTOG 0126 Outcomes: Phoenix failure by dose.

Notice that globally the study demonstrates about 1 in 5 failures develop metastatic disease (similar in both arms). And further, even with 35% failing, only 6% developed metastatic disease in the 70.2Gy arm. And yes, about 84% were Gleason 7 so a pretty good comparative group.
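A quick way to check that “1 in 5” figure is to divide the quoted metastasis rate by the quoted failure rate in each arm. The sketch below is only that: it assumes, as the paragraph above does, that 20% and 35% are the Phoenix failure rates for the two arms, and it compares cumulative rates rather than following individual patients. The variable names are illustrative, not from the trial.

```python
# Figures quoted above from RTOG 0126 (percent of patients in each arm).
# Assumption: 20% and 35% are the Phoenix biochemical failure rates.
arms = {
    "79.2 Gy": {"phoenix_failure": 20.0, "distant_mets": 4.0},
    "70.2 Gy": {"phoenix_failure": 35.0, "distant_mets": 6.0},
}

# Crude ratio: metastases per biochemical failure in each arm.
ratios = {
    name: arm["distant_mets"] / arm["phoenix_failure"]
    for name, arm in arms.items()
}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f} metastases per failure (about 1 in {1 / ratio:.0f})")
```

The higher dose arm lands right at 1 in 5 and the lower dose arm a little under that, so 1 in 5 is a reasonable, slightly generous, working number.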

Ok: now the validation from this shows:

The hazard ratio for the test to predict distant metastatic disease was 1.28 (1.06-1.54).

Again, this is an analysis of 215 patients from within the initial trial of 1532, presenting the hazard ratio of Decipher “high” risk vs. “low” risk via a dichotomous look at a continuous variable. Exactly how reproducible that hazard ratio might be could be the topic of multiple articles. But here, today, we’ll give it the “full” power of 1.28 per the paper.

Ok that is one puzzle piece.

Let’s jump to Artera.ai. Again, we’ll go quickly. This one is validated against RTOG 9408, and if you want far more nuance beyond this quick summary, I have two articles that represent a far deeper dive:

Artera AI Discussion: PART 1, PART 2.

Yep, I spent several weeks on this evaluation and my subsequent write up of the product trying to consider how powerful and valid this approach might be.

Again, good to review context of RTOG 9408 outcomes:

RTOG 9408 trial outcomes - biochemical failure

Using this AI approach, 2/3rds of the men see no benefit to ADT while 1 in 3 men see a benefit to ADT. (But remember, 1 in 3 are failing.) Here are the model outcomes for DM risk stratified by this predictive model.

Ok, so now we have three pieces to the puzzle - two tests you can order in your clinic (one genomic, one AI-based) and the reality of outcomes from a randomized, MODERN prostate cancer trial.

Let’s do some puzzlin’, i.e., basic math:

Remember, at this point I’m giving each test basically the data they promote as published. I wrote two articles specific to the Artera approach if you are interested in the nuance, but today we’ll stay simple and just give them the full benefit of the doubt.

Each test calls about 1 in 3 men with intermediate risk disease “high risk” - Decipher might be closer to 30% but they are in a 1 in 3 ballpark.

So if you take all your intermediate risk patients and just order these extra tests, let’s say that 30 men return with an indication to intensify treatment, i.e., 30 men per 100 will now be on ADT. (Rounding down benefits the NNT analysis in favor of the test.)

What benefit does your patient population experience - potentially?

That is the math question - we’ll break it down several ways.

We have about 5.5% of men at risk of failure by 10 years. Historical data says about 1 in 5 failures will be metastatic in nature. That means around 1.1% of men in PACE B will have metastatic disease out at 10 years.
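For anyone who wants to follow the arithmetic, here is a minimal sketch of that baseline estimate. Every input is one of the ballpark guesses stated in this article (3.5% failure at 5 years, 5.5% at 10 years, 1 in 5 failures metastatic), not a published PACE B result, and the variable names are just for illustration.

```python
# Ballpark inputs from the article (estimates, not published PACE B kinetics).
failure_5yr = 0.035        # assumed failure risk over the first 5 years
failure_10yr = 0.055       # assumed failure risk out at 10 years
mets_per_failure = 1 / 5   # historical share of failures that are metastatic

# Share of all treated men expected to have metastatic disease at 10 years.
mets_10yr = failure_10yr * mets_per_failure

# How much the failure estimate grows between year 5 and year 10.
growth = failure_10yr / failure_5yr

print(f"10-year metastatic risk: {mets_10yr:.1%}")
print(f"Per 1,000 men treated: about {mets_10yr * 1000:.0f}")
print(f"10-year failure estimate is {growth:.2f}x the 5-year estimate")
```

That is roughly 11 men per 1,000, which is the entire pool any add-on test can hope to improve.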

We’ll start with Decipher and just keep things very simple. Applying the 1.28 HR, we can drop the 1.1 to roughly 0.85 - so an effective improvement of about 0.25 per 100 men tested, or per 30 treated with ADT.

So test 400, give ADT to 120, to benefit 1.
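Here is the same Decipher calculation as a short sketch. It copies the crude shortcut used above (divide the baseline risk by the hazard ratio) and should not be read as a proper survival analysis. The 1.1% baseline and the 30% “high risk” share are this article’s ballpark assumptions, and the variable names are illustrative.

```python
# Inputs from the article; all are rough estimates.
baseline_mets = 0.011      # assumed 10-year metastatic risk (1.1%)
hazard_ratio = 1.28        # Decipher figure quoted from the RTOG 0126 validation
flagged_fraction = 0.30    # assumed share of tested men called "high risk"

# Crude shortcut used in the text: divide the baseline risk by the hazard ratio.
reduced_mets = baseline_mets / hazard_ratio
absolute_gain = baseline_mets - reduced_mets

# Men tested, and men given ADT, for one man to potentially benefit.
tests_per_benefit = 1 / absolute_gain
adt_per_benefit = tests_per_benefit * flagged_fraction

print(f"Absolute gain: {absolute_gain * 100:.2f} per 100 men tested")
print(f"Test about {tests_per_benefit:.0f}, give ADT to about {adt_per_benefit:.0f}, to benefit 1")
```

Run without rounding, this comes out near 416 tested and 125 on ADT; rounding the gain to 0.25 per 100 gives the 400 and 120 quoted above. Either way the ballpark is the same.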

What about Artera? It produces a HR of 0.33. Again, for today we’ll just use the marketed result at face value and be very crude with the math. 1.1% falls to 0.36%. A gain of 0.74%.

Test 135, give ADT to 45, to benefit 1.
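The Artera version of the arithmetic can be sketched the same way. Here the 0.33 is applied by straight multiplication to the whole 1.1% baseline, which is the deliberately crude and generous reading used above, and 1 in 3 men are assumed to be flagged for ADT. The function name and its parameters are hypothetical helpers for this example only.

```python
def crude_numbers_needed(baseline_risk, hazard_ratio, flagged_fraction):
    """Return (tests, ADT courses) per man who might benefit.

    Crude shortcut: the hazard ratio is multiplied straight onto the
    baseline risk for the whole tested population.
    """
    absolute_gain = baseline_risk - baseline_risk * hazard_ratio
    tests = 1 / absolute_gain
    return tests, tests * flagged_fraction


# Article's ballpark inputs: 1.1% baseline, HR 0.33, 1 in 3 men flagged.
artera_tests, artera_adt = crude_numbers_needed(0.011, 0.33, 1 / 3)

print(f"Test about {artera_tests:.0f}, give ADT to about {artera_adt:.0f}, to benefit 1")
```

Swapping in a different baseline risk or flagged share shows how sensitive these numbers are to the starting assumptions.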

Hmmm… Yes, those are ballparks, but let’s think of some other ways to get to a realistic answer.

Let’s say we take the 3.5% failure rate and double it to 7% (a clear over estimate but I’m trying to give these things a chance.)

7%, and 1 in 5 develop metastatic disease - 1.4%. Ok, let’s be generous and round up to 2% just ‘cause. So say 2 men per 100 in PACE B MIGHT develop metastatic disease - remember, this is a generous “double rounded up” estimate.

So if you take the stronger hazard ratio of 0.33 from Artera and apply it to a 2% metastatic rate more precisely, you might parse that cohort into two cohorts of ~2.5% and ~0.8%. (The HR is ~0.33 while maintaining an average risk of about 2%.) So generously we have a 1.7% effect on metastatic rate.

This gives us testing 100, ADT for 33 to get 1.7% potential benefit. Still about 60 tests with 20 ADT courses for 1 to benefit. And this is amazingly generous.
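The generous scenario reduces to simple per-100 counting, sketched below. The 2.5% and 0.8% cohort risks and the 33 ADT courses per 100 tested are the article’s rounded, deliberately optimistic figures, and the variable names are illustrative.

```python
# Generous scenario from the article, expressed per 100 men tested.
men_tested = 100
adt_courses = 33          # about 1 in 3 flagged for ADT
risk_without = 2.5        # assumed metastatic risk (%) in the higher cohort
risk_with = 0.8           # assumed metastatic risk (%) in the lower cohort

# Potential benefit, in men per 100 tested.
benefit_per_100 = risk_without - risk_with

# Scale down to the numbers needed for one man to benefit.
tests_for_one = men_tested / benefit_per_100
adt_for_one = adt_courses / benefit_per_100

print(f"Benefit: {benefit_per_100:.1f} men per 100 tested")
print(f"About {tests_for_one:.0f} tests and {adt_for_one:.0f} ADT courses for 1 to benefit")
```

Even with a doubled and rounded-up starting risk, the answer stays in the same neighborhood as the earlier estimates.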

Any way you slice it, this appears to be rather small odds of benefit, even using what are really the optimistic evaluations of the tests. To me, it is far more likely a poor use of resources, resulting in a gross over-intensification of treatment in a population that likely just doesn’t need extra stuff done. And if you pair that with the simple acknowledgement from the Artera paper that only 1 in 3 men with Unfavorable Intermediate Risk benefit from ADT, it makes sense that it is a fraction of that in more favorable risk disease and, again, a fraction of that with diminishing failures.

And I didn’t even discuss salvage options - SBRT to nodes, or newer imaging that will certainly find recurrences at lower volumes of disease, with patients having better and better salvage options. In seven years it is likely that data will support salvage with 1-3 fractions of radiation for the majority of the 1%-2% of men who fail distantly. The realistic answer is very likely that we need to treat somewhere between 40 and 100 men with ADT to benefit ~1 person, and the difference between the two arms is largely the FUTURE INITIATION of ADT. Sure, you might avoid one failure (a bad outcome for the patient, if only due to mental stress), but when failure means less over time, it really becomes a small difference if you back up and consider the extra ADT treatment given (the range above was 20 to 120 NNT, with testing required at roughly 3x that figure).

This kind of math doesn’t make sense to me.

As always: author of one, editorial review board of that same one. If I made a gross misstep, please comment or reach out.

I wrote this summary back in May - in reference to the release of RTOG 0815.

RTOG 0815: Dose Escalation and ADT. Questions answered?

Mark Storey MD

May 21, 2023


Here is an excerpt from that article that seems quite pertinent:

Two Sides of One Coin:

Here’s how I reconcile the two perspectives: the one in the editorial vs. the one which is labeled as “definitively proved wrong.”

There appears to be hazard rate improvement across all disease with ADT. I think of it this way - there are low risk cases that we simply misclassify, missing the high risk features or something. Nothing is perfect. Therefore there are some cases that, because of missed known factors or currently unknown factors, need more than just local treatment. Or perhaps there are cases that need ADT to modify the radiation impact for some cancers. But simply, some lie within a box of unknowns.

And this leads to one focus of work - an attempt to use genetics or AI histology or something different, to solve the genetic / metastatic / ADT radiation sensitizing component to the issue. And this path likely will demonstrate some real value. Across cancer, with improvements in technology, we are constantly improving our precision along a similar path / approach - be it better imaging or updates to our staging models or new prognostic tests - we continue to generate progress via this approach.

The counter to this argument is to use radiation specific technology to push on dose and right sizing the target - essentially to shrink the size of the black box of unknowns. Stated differently, I’d like to make the work in finding the genetic difference as difficult as possible by using brute force to push the number at risk for failure as low as possible. If 30 people in 100 fail, the cost / benefit ratio for any prognostic test is magnitudes easier to achieve than if 5 in 100 fail. Will it make the other approach obsolete? No. But we can make it far less impactful. I think of it as an argument in favor of our field and in favor of the value we provide.

One approach leans on radiation a bit more, one leans towards tech that is generally beyond our field a bit more.

Yep, I still like that line of thinking.
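The “30 in 100 versus 5 in 100” point from the excerpt can be illustrated with a small sketch. It assumes a purely hypothetical intervention that halves failures in everyone treated; that 50% figure is made up for illustration and does not come from any trial discussed here. The only point is how the number needed to treat scales with the baseline failure rate.

```python
# Hypothetical: an intervention that removes half of all failures.
# This relative reduction is an illustrative assumption, not trial data.
relative_reduction = 0.5


def number_needed_to_treat(baseline_failure_rate):
    """Men treated per failure avoided, for a fixed relative reduction."""
    absolute_reduction = baseline_failure_rate * relative_reduction
    return 1 / absolute_reduction


nnt_at_30 = number_needed_to_treat(0.30)   # 30 in 100 fail
nnt_at_5 = number_needed_to_treat(0.05)    # 5 in 100 fail

print(f"Baseline 30% failure: treat about {nnt_at_30:.0f} to avoid 1 failure")
print(f"Baseline 5% failure: treat about {nnt_at_5:.0f} to avoid 1 failure")
print(f"NNT ratio: {nnt_at_5 / nnt_at_30:.1f}x")
```

For a fixed relative effect, the number needed to treat grows in direct proportion as the baseline failure rate falls, which is the squeeze described in the excerpt.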

Summary:

In my PACE-B summary, I tried to describe that trial as more than an SBRT trial: it is a victory-for-radiation trial. 95%-96% cured at 5 years post-treatment.

PACE B: 95% Cure Rate at 5 yrs!!

Mark Storey MD

October 15, 2023


Simply phenomenal!!

And further, it makes sense. Condense the treatment and toxicity goes up a bit but importantly, it is now measured and for many, minor. You can have clear conversations with your patients today - maybe clearer following the full publication but really granular conversations today regarding shifts in fractions.

And because PACE-B results are so darn strong, they impact other decisions. That is what I’ve tried to touch upon today. Don’t think of it as just an SBRT trial. It is a trial that sets standard-of-care outcome expectations for radiation therapy, and it has implications for your discussions regarding surgery, for things like Decipher, for further dose escalation above 40 Gy and, yes, for your ADT usage patterns.

Hazard rates, numbers at risk, math, statistics - they are all important to understanding our literature and our science. And realistically, you can make them more complex or approach them more simply, as I like to do with simple stories or examples. And yes, different people will land on different topics with different risk / benefit perspectives. That is great, but I’d encourage everyone to be data driven in your practice - not driven by marketing. Listen and consider, but verify. Don’t just read the abstract or headlines or marketing material; know the context of the underlying study(ies) where the test was validated.

And once that work is done, if you can add in compassion and caring for individuals, good discussions of risks and benefits with your patients, along with patience and understanding, then, at least from my perspective, you have the opportunity to be truly impactful in the lives of those you care for.

Thanks as always for following along as we search for better. In the next few weeks, we return to our payment model in the US, have a look at a new SABR indication with 100% control, and hear a story from one of the founders of high dose / limited fraction radiation. At least that’s the plan.
