How to Make Clinical Decisions With AI in Oncology

Oncology Fellows, Vol. 16, No. 1

In Partnership With:

Vanderbilt-Ingram Cancer Center


"Any sufficiently advanced technology is indistinguishable from magic.”1

Arthur C. Clarke’s third law is frequently quoted and may well describe one’s first experience with contemporary large language models (LLMs) such as OpenAI’s ChatGPT 4.0,2 Google’s Gemini,3 or Meta’s Llama 2.4 Depending on the prompt, these tools can appear to possess human-like intelligence or an almost magical understanding. Speculation on the impact of LLMs, and of artificial intelligence (AI) more broadly, on the future of health care is commonplace,5-7 causing both excitement and concern about their use in clinical settings.

Here we take a pragmatic approach to the near-term opportunities and risks of AI and LLMs in oncology. Specifically, we focus on a few core ideas every oncologist should understand when deciding whether and how to use these tools in practice.

Brief History of AI and LLMs

AI and LLMs represent an incremental advancement in computer science and statistics, and medicine has absorbed such advances before. The phrase “risk factor” is now ubiquitous in medicine but was coined only after the groundbreaking Framingham study.8 Likewise, today we hardly think twice about calculating a likelihood ratio to determine how accurately a physical examination finding predicts a phenotype, or about using a Student t test to understand the difference in response between 2 populations.
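
For context, that kind of comparison is a one-line computation in a modern statistics library. Here is a minimal sketch in Python; the response values for the 2 populations are invented for illustration:

    # Comparing response between 2 populations with a Student t test;
    # all values are invented for illustration.
    from scipy import stats

    # Hypothetical response measurements for two treatment arms
    arm_a = [12.0, 18.5, 9.2, 22.1, 15.3, 11.8]
    arm_b = [25.4, 30.1, 19.8, 27.6, 33.2, 21.9]

    result = stats.ttest_ind(arm_a, arm_b)  # independent two-sample t test
    print(f"t = {result.statistic:.2f}, p = {result.pvalue:.4f}")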

Machine learning, broadly, is a method of using a set of inputs to predict an outcome. In statistics, we call those inputs independent variables and the outcome the dependent variable; in computer science, we use features to predict labels. The idea is the same. Current LLMs may contain trillions of parameters, which are used both to interpret the prompt and to generate a natural language response to it, which is why these models are also termed “generative” AI. The term “model” encompasses the black box that has been trained and is now ready to make predictions on new data.
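
To make the framing concrete, here is a minimal sketch in Python; the patient features, labels, and the unseen case are entirely invented, and a simple logistic regression stands in for the trained “black box.”

    # Toy illustration of features predicting labels; all data invented.
    from sklearn.linear_model import LogisticRegression

    # Features: [age, biomarker level]; labels: 1 = responded, 0 = did not
    X_train = [[54, 2.1], [61, 0.4], [47, 3.3], [70, 0.9], [58, 2.8], [66, 0.2]]
    y_train = [1, 0, 1, 0, 1, 0]

    model = LogisticRegression()  # the "black box" to be trained
    model.fit(X_train, y_train)   # training sets the model's parameters

    # Once trained, the model predicts a label for an unseen patient
    print(model.predict([[60, 1.8]]))  # e.g., [1] -> predicted responder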

Clinical Utility

The ability of AI and LLMs to improve oncology care delivery is unclear. Some believe that AI will make care decisions more efficiently and with fewer mistakes; I feel that this is unlikely in the near term. As mentioned above, each model must be trained on large datasets with labeled outcomes. Three consequences follow:

  1. Those models will be, at best, only as good as the care delivered in their training set.
  2. Rare scenarios will be difficult to train for because of the lack of data.
  3. Fields that advance quickly, such as oncology, pose an additional challenge: treatment recommendations based on past practice that do not incorporate recent clinical trial results will be erroneous.

In the meantime, opportunities remain to improve note writing, prior authorization, patient-facing health record portals, and other workflows.

Risks

The 2 primary risks of AI and LLMs are confabulation and probabilistic prediction. Confabulation occurs when a model produces a result that appears factual but is entirely fictitious. For instance, an attorney was fined in 2023 for submitting case law that was erroneously produced by an LLM.9 In oncology, we have no tolerance for models that incorrectly cite the results of clinical trials or, worse, invent a clinical trial that never occurred. The second risk is probabilistic prediction. When we calculate an area under the curve, as in carboplatin dosing, the same inputs will always yield the same output: the calculation is deterministic. With many of the AI models in this space, by contrast, the same prompt run many times may produce very different outputs. We have not yet determined the best methods to offset this.
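
The contrast is easy to demonstrate. Below is a minimal sketch in Python: the Calvert formula for carboplatin dosing as an example of a deterministic calculation, next to a toy token sampler standing in for how LLMs generate text. The candidate tokens and their probabilities are invented for illustration.

    import random

    # Deterministic: the Calvert formula for carboplatin dosing.
    # The same inputs always produce the same dose.
    def carboplatin_dose(target_auc: float, gfr: float) -> float:
        return target_auc * (gfr + 25)

    print(carboplatin_dose(5, 90))  # always 575.0

    # Probabilistic: a toy stand-in for LLM text generation. The model
    # assigns probabilities to candidate next tokens and samples one,
    # so the same prompt may yield different outputs on repeated runs.
    tokens = ["cisplatin", "carboplatin", "oxaliplatin"]
    probs = [0.5, 0.3, 0.2]  # invented probabilities for illustration
    for _ in range(3):
        print(random.choices(tokens, weights=probs, k=1)[0])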

Cost

Simple clinical decision support, such as checking drug interactions before prescribing chemotherapy, can already be computed within the electronic health record (EHR). Many AI models, by contrast, operate on a pay-per-query basis. Although EHR vendors are rapidly working to add these features, the financial impact is unclear as more AI models enter clinical practice.
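
A back-of-the-envelope sketch makes the concern concrete. Every number below is a hypothetical assumption rather than an actual vendor price, but the arithmetic shows how per-query pricing compounds across a practice.

    # Hypothetical pay-per-query cost model; all figures are assumptions.
    price_per_1k_tokens = 0.03      # assumed vendor price in USD
    tokens_per_query = 2_000        # assumed prompt + response size
    queries_per_clinician_day = 40  # assumed usage
    clinicians = 25
    clinic_days_per_year = 250

    cost_per_query = (tokens_per_query / 1_000) * price_per_1k_tokens
    annual_cost = (cost_per_query * queries_per_clinician_day
                   * clinicians * clinic_days_per_year)
    print(f"${cost_per_query:.2f} per query, ${annual_cost:,.0f} per year")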

Summary

AI and LLMs will continue to be incorporated into oncology practice. Like the invention and subsequent adoption of the EHR, these models will solve old problems and create new ones. The rising generation of oncologists should stay apprised as these fields grow and learn to weigh the opportunities against the risks. I feel these tools will help us deliver better patient care, but optimizing their benefits will take longer than many people think.

Travis J. Osterman, DO, is a medical oncologist, an informatician, and the associate vice president for Research Informatics at Vanderbilt University Medical Center, as well as the director of Cancer Clinical Informatics at Vanderbilt-Ingram Cancer Center in Nashville, Tennessee.

References

  1. Clarke AC. Profiles of the Future: An Inquiry Into the Limits of the Possible. Holt, Rinehart, and Winston; 1984.
  2. OpenAI. GPT-4 is OpenAI’s most advanced system, producing safer and more useful responses. Accessed February 2, 2024. https://openai.com/gpt-4
  3. Google. Bard: a conversational tool by Google. Accessed February 2, 2024. https://bard.google.com
  4. Meta. Discover the power of Llama. Accessed February 2, 2024. https://llama.meta.com
  5. Thirunavukarasu AJ, Ting DSJ, Elangovan K, Gutierrez L, Tan TF, Ting DSW. Large language models in medicine. Nat Med. 2023;29(8):1930-1940. doi:10.1038/s41591-023-02448-8
  6. Sallam M. ChatGPT utility in healthcare education, research, and practice: systematic review on the promising perspectives and valid concerns. Healthcare (Basel). 2023;11(6):887. doi:10.3390/healthcare11060887
  7. Cascella M, Montomoli J, Bellini V, Bignami E. Evaluating the feasibility of ChatGPT in healthcare: an analysis of multiple clinical and research scenarios. J Med Syst. 2023;47(1):33. doi:10.1007/s10916-023-01925-4
  8. Dawber TR, Moore FE, Mann GV. Coronary heart disease in the Framingham study. Am J Public Health Nations Health. 1957;47(4 Pt 2):4-24. doi:10.2105/ajph.47.4_pt_2.4
  9. Neumeister L. Lawyers submitted bogus case law created by ChatGPT. A judge fined them $5,000. Associated Press News. June 22, 2023. Accessed February 2, 2024. https://apnews.com/article/artificial-intelligence-chatgpt-fake-case-lawyers-d6ae9fa79d0542db9e1455397aef381c