ChatGPT: It crushed doctors on patient questions, so can it replace Protons101?
A new publication showed physicians preferred chatbot answers to physician answers - which raises the question: do I really need to be writing this stuff?
That’s right, a new study out today clearly shows that for general questions posted on the internet, a chatbot gives better answers than physicians (ref 1) - and the voting was performed by licensed healthcare providers.
Here are just two examples, if you haven’t seen the comparisons:
Gotta admit - yep. I agree with the results here. Chatbot did pretty well.
I found it quite fascinating. So today, we turn to ChatGPT and see if it can just crank out these articles that I’ve been working on. Hopefully it doesn’t just crush me, but this will be a relatively “live” look. A quicker, lighter article on a Friday.
I started by trying to pick a pretty good topic for the ol’ chatbot. I went back through my list of articles and landed on my non-inferiority piece - that seemed like a reasonably fair starting point.
And so we begin:
can you tell me about non-inferiority trials?
Yes, I can definitely help you with that!
Non-inferiority trials are a type of clinical trial that aims to show that a new treatment is not worse than an existing treatment by more than a pre-specified margin of difference.
These trials are typically conducted when a new treatment has potential advantages over an existing treatment, such as improved safety or tolerability, but may not be expected to provide a significant benefit in terms of efficacy.
To conduct a non-inferiority trial, researchers typically select a margin of difference that represents the maximum amount of inferiority that they are willing to accept between the new and existing treatments. This margin is often referred to as the non-inferiority margin.
The trial then compares the new treatment to the existing treatment, typically using a placebo or active control group as a reference. If the new treatment is shown to be not worse than the existing treatment by more than the pre-specified margin of difference, then the trial is considered a success.
Non-inferiority trials are used in many areas of medicine, including oncology, infectious diseases, cardiovascular disease, and psychiatry. However, it is important to note that non-inferiority trials can be challenging to design and interpret correctly, and require careful consideration of factors such as sample size, choice of endpoint, and statistical analysis methods.
Nicely done - not full of flavor, but it seems good and cranks out text quickly. Let’s try to get some critiques of this approach, which was a primary point of my article.
what are some of their problems?
Non-inferiority trials have some inherent problems that can make them challenging to design and interpret correctly. Some of these problems include:
Choice of non-inferiority margin: Choosing an appropriate non-inferiority margin is critical to the success of a non-inferiority trial. If the margin is set too wide, it may be difficult to conclude that the new treatment is truly not worse than the existing treatment. On the other hand, if the margin is set too narrow, it may be too difficult for the new treatment to meet the non-inferiority criteria.
Choice of comparator: The choice of comparator in a non-inferiority trial can also be challenging. The comparator must be an active treatment that is known to be effective, but not necessarily the best available treatment. Selecting an inappropriate comparator can lead to a biased estimate of the non-inferiority margin.
Sample size: Determining an appropriate sample size for a non-inferiority trial can be challenging. The sample size must be large enough to provide adequate statistical power to detect a difference between the new and existing treatments, but not so large that the trial becomes unnecessarily expensive or time-consuming.
Interpretation of results: Interpreting the results of a non-inferiority trial can be difficult, as the trial is designed to show that the new treatment is not worse than the existing treatment, rather than demonstrating superiority. As a result, it is important to carefully consider the clinical relevance of the non-inferiority margin when interpreting the trial results.
Regulatory issues: Non-inferiority trials may also face regulatory challenges, as some regulatory agencies may be hesitant to approve a new treatment that is not superior to an existing treatment. As a result, it is important to carefully consider the regulatory landscape when designing and conducting non-inferiority trials.
Again, a bit dry, but it reads pretty well, I must say (the answer was actually formatted more nicely, but that didn’t copy here well). I could probably ask it to jazz things up a bit and add more commentary, and it would - quickly. But we move on instead.
Next, I ask a question about non-inferiority trials establishing standard of care in radiation, and it gives two examples: breast cancer (whole breast vs. partial breast radiation) and prostate cancer (hypofractionated approaches).
It was a more general answer and realistically, spot on. I think those are two great examples of non-inferiority design within radiation oncology. And for a moment I’m thinking: “every time I write something, I need to be talking to this bot to look for strengths / weaknesses. I mean I really need to be integrating this into my daily workflow somehow.”
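(A quick aside before question four, since the “margin” keeps coming up: here is a toy sketch of the usual non-inferiority decision rule - declare non-inferiority only if the confidence interval for the new-minus-standard difference stays above the pre-specified margin. The numbers, the 10% margin, and the little noninferiority_check helper are all mine, purely for illustration - nothing here comes from the bot or from any real trial.)

```python
# Toy sketch of a non-inferiority decision rule (illustrative numbers only).
# Declare non-inferiority if the lower bound of the 95% CI for the
# (new minus standard) success-rate difference stays above -margin.
import math

def noninferiority_check(events_new, n_new, events_std, n_std, margin=0.10, z=1.96):
    """Normal-approximation CI for a difference in proportions."""
    p_new = events_new / n_new          # e.g., 3-year control rate, new arm
    p_std = events_std / n_std          # same endpoint, standard arm
    diff = p_new - p_std
    se = math.sqrt(p_new * (1 - p_new) / n_new + p_std * (1 - p_std) / n_std)
    lower, upper = diff - z * se, diff + z * se
    non_inferior = lower > -margin      # CI must exclude a deficit bigger than the margin
    return diff, (lower, upper), non_inferior

# Hypothetical arms: 85% vs. 87% control, 400 patients per arm, 10% margin
diff, ci, ok = noninferiority_check(340, 400, 348, 400, margin=0.10)
print(f"difference = {diff:+.3f}, 95% CI = ({ci[0]:+.3f}, {ci[1]:+.3f}), non-inferior: {ok}")
```

Notice how much the answer hinges on that margin: with the same data, a 5% margin instead of 10% flips the call - which is exactly the first problem the bot listed above.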
So with question four I ask a slightly more nuanced question, just to see where it would go.
are there any current non-inferiority trials in radiation going on for head and neck cancer?
There are ongoing non-inferiority trials in radiation oncology for head and neck cancer. One example is the RTOG 1016 trial, which is a phase III non-inferiority trial comparing cetuximab and cisplatin as concurrent chemotherapy with radiation therapy in patients with locally advanced head and neck cancer. The trial aims to show that cetuximab is non-inferior to cisplatin in terms of overall survival.
Another ongoing non-inferiority trial in head and neck cancer is the DE-ESCALaTE trial, which is a phase III non-inferiority trial comparing cetuximab with cisplatin as concurrent chemotherapy with radiation therapy in patients with human papillomavirus (HPV)-positive oropharyngeal cancer. The trial aims to show that cetuximab is non-inferior to cisplatin in terms of overall survival.
There are also ongoing non-inferiority trials in head and neck cancer that are exploring different radiation therapy approaches. For example, the HNCG18 trial is a phase III non-inferiority trial comparing intensity-modulated radiation therapy (IMRT) to volumetric-modulated arc therapy (VMAT) in patients with head and neck cancer. The trial aims to show that VMAT is non-inferior to IMRT in terms of toxicity and disease control.
Overall, non-inferiority trials continue to be an important tool in radiation oncology research for head and neck cancer, as well as for other types of cancer. These trials are critical for identifying treatment approaches that provide comparable efficacy and safety to existing treatments while offering other advantages such as reduced toxicity or cost.
Wow. This all seemed reasonable and, in fact, quite impressive. It all flowed nicely, and it spits out text with such speed. From an impression standpoint, this was it - the high-water mark. It is simply cruising, and I’m pretty impressed with it reeling off trials and giving easy-to-read text.
And so I proceeded.
This time I went a little deeper and asked it about the MD Anderson phase IIb prospective trial comparing IMRT to proton therapy (this is my next article to publish, and I thought I should see what it thinks). It’s been out since 2020, so I thought maybe it would know about it - maybe not, but maybe. It’s a specific question about a specific reference, and I just wanted to see how recent its knowledge base is.
do you know about the phase IIb esophageal cancer trial comparing IMRT to proton therapy?
I do not have knowledge of any specific phase IIb trial comparing intensity-modulated radiation therapy (IMRT) to proton therapy for esophageal cancer. However, there have been several clinical trials comparing these two radiation therapy techniques for various cancer types, including esophageal cancer.
One example is the Proton BEam Radiation Therapy versus IMRT for Esophageal Cancer (PROBE) trial, which was a phase III randomized controlled trial comparing proton beam therapy with IMRT for patients with locally advanced esophageal cancer. The trial aimed to assess the efficacy and toxicity of the two treatments and to determine whether proton beam therapy offered any advantages over IMRT.
The PROBE trial found that there was no significant difference in overall survival or disease-free survival between the two treatment groups. However, patients who received proton beam therapy had significantly lower rates of severe toxicity, particularly with regards to the esophagus and lungs.
While I do not have knowledge of a specific phase IIb trial, ongoing research continues to evaluate the effectiveness and safety of different radiation therapy techniques, including IMRT and proton therapy, for the treatment of esophageal cancer.
“Wait… what? A PROBE trial? A completed phase III that I don’t know about?” Well, this isn’t good, I thought - I should know this. So I read the findings the AI reports: the results here seem to mirror the phase IIb trial - OS and DFS the same, but less toxicity. “Wow - why don’t I know this?” I wonder to myself.
So I go looking around… and I find nothing.
But it said “there have been several clinical trials,” so I asked:
what is another phase III esophageal cancer trial comparing protons to photons?
One example of a phase III esophageal cancer trial comparing proton therapy to photon therapy is the Randomized Trial of Proton vs. Carbon Ion Radiation Therapy in Patients With Esophageal Cancer (Gastrointestinal Tumor Study Group-1203, NCT01528626).
This trial is designed to compare the efficacy and safety of proton therapy and carbon-ion therapy (a type of particle therapy) with photon therapy (IMRT or VMAT) for the treatment of locally advanced esophageal cancer. The trial aims to enroll 360 patients who will be randomized to receive either proton therapy, carbon-ion therapy, or photon therapy.
The primary endpoint of the trial is progression-free survival at 24 months, and secondary endpoints include overall survival, local control, and toxicity. The trial will also evaluate quality of life and cost-effectiveness of the different treatments.
This trial is still ongoing, and final results are not yet available. However, it is an important study that will help to further our understanding of the potential benefits and limitations of different radiation therapy techniques for the treatment of esophageal cancer.
“Ok - clearly there are errors here,” I think, as I try to step out of the AI bot haze. “The title is Proton vs. Carbon, so this wouldn’t be a proton vs. photon trial. But really? There is a proton vs. carbon vs. IMRT phase III trial I don’t know about?”, still almost trying to make it true.
Fortunately for me and my sanity, this one begins to be easy to track down and verify as wrong. It gives a study number of 1203 - the closest I find is EORTC-1203-GITCG, the “INNOVATION” trial - and that is a chemotherapy esophageal trial, so, ah, wrong - certainly not a proton vs. photon trial. Well, it has an NCT number as well, so I head to ClinicalTrials.gov and perform a quick search.
So, ah… strike two. Back to the prompt:
I can't find the PROBE trial you reference, can you give me a link or protocol reference number?
I apologize for the confusion. It appears that there is no trial named "PROBE" that compares proton beam therapy with IMRT for esophageal cancer.
However, there have been several trials that have compared proton beam therapy to other radiation therapy techniques for esophageal cancer. One example is the randomized phase II/III trial comparing proton beam therapy with intensity-modulated radiation therapy (IMRT) for esophageal cancer (NCT01447654).
Another example is the randomized phase II/III trial comparing proton therapy with chemoradiation therapy for esophageal cancer (NCT01993810).
I apologize for any confusion my previous response may have caused, and I hope this information is helpful.
“Ok… maybe this time it did find Steven Lin’s trial publication,” I think to myself - seriously, this is what I thought. We’ve been so trained to believe what comes back from these prompts that it is a terribly difficult habit to break.
A quick search of “NCT01447654” returns: “Inhibition of the Renin Angiotensin System With Losartan in Patients With Hypertrophic Cardiomyopathy (INHERIT)”.
Goodness, that isn’t even close.
“Well, maybe that last reference is correct - I mean, it does seem to be trying, and the chatbot is rather empathic,” I think as I fall again for its wordsmith trickery. A search for “NCT01993810” lands on: “Comparing Photon Therapy To Proton Therapy To Treat Patients With Lung Cancer.”
For a second, when I started reading the title I thought - “wow”. But then nope. Lung cancer and esophageal cancer are, in fact, quite different.
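By the way, if you want to run this same sanity check yourself, a few lines of code will pull the official title for whatever NCT number the bot hands you. This is just a rough sketch - I’m assuming ClinicalTrials.gov’s current v2 REST API and the Python requests package, so adjust as needed:

```python
# Sanity-check an NCT number against ClinicalTrials.gov.
# Assumes the v2 REST API (https://clinicaltrials.gov/api/v2/) and the requests package.
import requests

def official_title(nct_id: str) -> str:
    url = f"https://clinicaltrials.gov/api/v2/studies/{nct_id}"
    resp = requests.get(url, timeout=10)
    if resp.status_code == 404:
        return "no such study registered"
    resp.raise_for_status()
    ident = resp.json()["protocolSection"]["identificationModule"]
    return ident.get("officialTitle") or ident.get("briefTitle", "")

# The two NCT numbers the chatbot offered up:
for nct in ["NCT01447654", "NCT01993810"]:
    print(nct, "->", official_title(nct))
```

Run on those two numbers, the titles that come back are the hypertrophic cardiomyopathy and lung cancer studies I found by hand - no esophageal proton trial in sight.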
My summary using what I learned from ChatGPT today:
DE-ESCALaTE! PROBE! INNOVATION! and INHERIT! - Yes!! The AI path to nowhere!
Just follow the made-up NCT numbers and the made-up references, and if you use this thing, hope that you have the knowledge base to discern when it goes off the rails and just begins to make stuff up.
Scary times in a way. The line between science and fiction continues to be blurred and the level of knowledge required to differentiate between the two is being raised. We have been conditioned to believe what we see if it “looks” reasonable. And this thing spits out reasonable all day long - possibly wrong, but written with phrasing to make it “seem correct”.
What a bad experience. I worry that so much information in journals will simply be inaccurate as we move forward - large sections of introductions and discussions made up via a bot. It will sound correct and plausible, but many times it will be incorrect or lack context. Hopefully I’m wrong. For me, this taught me to place more trust in specific authors that I know, and then do even more work performing my own research and review of the literature. Not sure what else to do.
For the moment, looks like I’m going to keep writing these things.
www.protons101.com
Home to the musings of a radiation oncologist - with a slant on protons and dose and optimizing cancer outcomes.
REFERENCES:
1. Comparing Physician and Artificial Intelligence Chatbot Responses to Patient Questions Posted to a Public Social Media Forum
https://jamanetwork.com/journals/jamainternalmedicine/fullarticle/2804309

2. Randomized Phase IIB Trial of Proton Beam Therapy Versus Intensity-Modulated Radiation Therapy for Locally Advanced Esophageal Cancer
https://ascopubs.org/doi/10.1200/JCO.19.02503
This is the second article I have seen about AI improperly citing articles or pointing to articles that don’t exist. It seems like it isn’t able to cite just one source when it synthesizes an answer to a deeper question that requires references. To me it looks like it may be splicing sentences together from multiple sources into something that appears to be coherent, but isn’t necessarily factual. I wonder if it can differentiate between old research and new research that has been updated and if it would mix the results from both studies.