Assessing the Promise of AI in Oncology - Episode 5
In this fifth episode of OncChats: Assessing the Promise of AI in Oncology, Toufic A. Kachaamy, MD, of City of Hope, and Douglas Flora, MD, LSSBB, FACCC, of St. Elizabeth Healthcare, discuss the need for evidence to support the utilization of different artificial intelligence (AI) tools in healthcare.
Kachaamy: Let’s shift a little bit to something that is not just oncology based, but [has to do with] healthcare in general. With fast-changing technology like artificial intelligence interacting with healthcare, which is a more conservative, slow-moving field, one of the problems we are facing is that technological advances happen quickly, but [measuring] the impact on human beings and [their] biology [can take] decades. How do we measure the impact of fast-changing technology on clinical outcomes, knowing that clinical outcomes might take years [to mature] and, by then, the technology might have gone through multiple generations of evolution? Do you have any thoughts on that?
Flora: That’s a vexing problem, right? We, as providers, can’t afford hallucinations in our ChatGPT because lives depend upon it. [As such,] I think we’re going to hold these tools to the same academic rigor that we apply to drug choices, radiation choices, or other [decisions]. I think the promise is enormous, [but] we have to do this responsibly; there are patients’ lives that will be saved if we can introduce these [tools] carefully but quickly.
[That said,] I do think the level of evidence that’s going to be required will depend upon the tool being deployed. For instance, it’s not difficult for us to prove that we’ve given time back to doctors by removing documentation burdens; the tools already exist. There are many companies doing voice recognition that are taking conversations and translating them into actionable notes, placing orders for MRI, CT, or other [tests], writing letters of reference for our graduating fellows, or conducting peer-to-peer reviews.
So, those rote, mechanical things don’t need a lot of evidence; those are efficiencies we’re finding we can institute quickly to give doctors time back in the rooms with their patients. The things I think you and I are more concerned about are how we prove the validity of a tool if we’re asking [it] to perform decision analytics, or if we’re having [these tools] guide drug selection or therapeutics. There are tools being developed to validate these, as well. I think that’s what this journal is about; it’s to say, “Okay, well, if you’re going to tell me that the tool designed by PathAI is better than an academic pathologist at identifying prostate cancer on a hematoxylin and eosin slide, I want to see the data, I would like to see how you trained the model, I’d like to understand the increases in accuracy, and I want to know the false-positive rate.” All [this information] should be published in peer-reviewed journals.
I don’t know if this is common knowledge, but there are about 300 papers on AI and medicine produced each day, and there hasn’t been a repository where those things can be shared with an editorial staff to really dig in deeply and make sure it’s legitimate; that’s what we want to do. Rather than taking the word of the sales force from some vendor who may be well intentioned, well educated, and well informed, we need to make it evidence based before I use it in my clinic.
Kachaamy: Where do you see the highest level of evidence, [like] meta-analyses of randomized controlled trials [RCTs], fitting with a new tool like artificial intelligence? [For example, we know that] even [with] colonoscopy, we’ve seen studies designed 10 years ago [only now] reporting outcomes, and they’re being criticized [with regard to their] methodology. If the tool has changed 10 times, can we rely on surrogate markers to get outcomes? Do we still need RCTs when we’re looking at AI? What are your thoughts on that? I know that there might not be an answer.
Flora: Right. We could talk for an hour about just that, and people are. I think there will be a lot of [discussion] about this. Obviously, you know, RCTs are critical, important, and incontrovertible when it comes to evidence-based decisions for the safety of an intervention. I think we try to adhere to that as much as possible. When you talk about real-world evidence or surrogates, I think there’s some promise there in areas where you can’t safely or efficiently conduct a trial and you can’t wait for those answers. You need those [answers] as quickly as you can to save lives.
I do think, again, it depends upon the AI tool. If the tool is being used in a form of diagnosis that supplements rather than replaces one of our existing standards, then maybe the case is easier to prove; you’re not replacing a standard of care [SOC], you’re augmenting [it]. Even then, the control group gets what we’re used to giving. In a lung cancer screening example, perhaps the analysis of the 3D images by the pattern recognition of this deep-learning model could be tested prospectively alongside a control group of radiologists reading those CT scans, to see whether we added to the accuracy, reduced the false-positive rate, or found nodules at an even earlier, more nascent stage. I think that’s publishable; I think that is trackable; and I think it’s important that we do so, even if it slows down progress a little bit. It just can’t slow things down for 10 years. We can’t wait that long.
Check back on Monday for the next episode in the series.