A study found ChatGPT 3.5 was more likely to give incomplete or inaccurate answers to specific hematologic malignancy queries vs general questions.
ChatGPT version 3.5 struggled to provide current and specific information regarding patient-specific queries and novel therapies compared with more general inquiries about the management of hematologic malignancies, according to findings from a study published in Future Science OA.1
To conduct the study, researchers input questions from oncologists and reputable web resources into ChatGPT 3.5, and hematology-oncology experts rated the artificial intelligence (AI) platform’s responses for accuracy and applicability to patients on a scale of 1 (strongly disagree) to 5 (strongly agree). The 10 questions included general queries (n = 5) and highly specific questions regarding novel therapies and mutations (n = 5).
ChatGPT’s Hematologic Cancer Question-Answering Capabilities: Key Takeaways
When reviewed by hematology-oncology physicians, ChatGPT’s responses to general questions about hematologic cancer received a higher average score (3.38) compared with questions regarding novel therapies and specific mutations (3.06).
None of the 10 questions asked of ChatGPT achieved a score of 5 (“strongly agree”), indicating that no topic met a strong consensus among reviewers as accurate, clear, and comprehensive enough for physician recommendation to a patient.
Because ChatGPT 3.5 was found to lack the latest information and accurate answers on specialized topics—often attributed to its knowledge cutoff date—physicians should vet and approve chatbot-generated information before it is provided to patients.
Findings showed that general queries received an average score of 3.38 from the 4 clinician reviewers compared with 3.06 for the highly specific questions (κ = 0.164). Notably, none of the questions received a score of 5 from any of the 4 reviewers. The mean score for 9 of the 10 questions ranged between 3.0 and 3.8; the lone question to fall below this range, with a mean score of 2.25, was, “How can I lower my measurable residual disease?”
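For readers curious how such figures are derived, the following is a minimal sketch of computing group means and a multi-rater agreement statistic for 4 reviewers rating 10 responses on a 1-to-5 scale. The rating matrix is a hypothetical placeholder, and the use of Fleiss’ kappa via statsmodels is an assumption for illustration; the paper does not specify which kappa statistic or software produced κ = 0.164.

```python
# Sketch: mean Likert scores and a multi-rater kappa for 4 reviewers
# rating 10 ChatGPT responses on a 1-5 scale.
# The ratings below are illustrative placeholders, NOT the study's data.
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# rows = questions (first 5 general, last 5 specific), columns = reviewers
ratings = np.array([
    [4, 3, 4, 3],
    [3, 4, 3, 4],
    [4, 4, 3, 3],
    [3, 3, 4, 3],
    [4, 3, 3, 4],
    [3, 3, 3, 3],
    [3, 2, 3, 3],
    [3, 3, 4, 3],
    [2, 2, 3, 2],  # e.g., a question averaging 2.25, like the MRD question
    [3, 4, 3, 3],
])

general_mean = ratings[:5].mean()   # average over the 5 general questions
specific_mean = ratings[5:].mean()  # average over the 5 specific questions

# Collapse raw ratings into a (questions x categories) count table,
# then compute Fleiss' kappa for chance-corrected inter-rater agreement.
table, _ = aggregate_raters(ratings)
kappa = fleiss_kappa(table, method="fleiss")

print(f"general mean:  {general_mean:.2f}")
print(f"specific mean: {specific_mean:.2f}")
print(f"Fleiss' kappa: {kappa:.3f}")
```

A kappa near 0.164, as reported, would conventionally be read as only slight agreement among reviewers, which underscores the subjectivity of judging chatbot answer quality.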
“ChatGPT may lack the latest information and accurate answers on specialized topics due to its reliance on training data,” lead study author Tiffany Nong, a third-year medical student at Florida State University College of Medicine in Tallahassee, and colleagues wrote in a publication of the research. “Due to the fact that AI is malleable and these studies have shown that AI does not present 100% accurate or updated information needed to effectively and safely educate patients, a physician will always be needed, at least at this time, to approve AI information.”
Why Was ChatGPT Version 3.5 Selected for this Study?
In recent years, multiple updated versions of ChatGPT have been released for public use, including version 5.0, released in August 2025.2 During this study, questions were submitted to ChatGPT in July 2024, when ChatGPT version 3.5 was available to all users and version 4.0 was limited to paid subscribers.1
“ChatGPT 3.5’s knowledge cutoff of September 2021 likely contributed to outdated information on recent therapeutic advances, such as FLT3 inhibitors like midostaurin [Rydapt] and quizartinib [Vanflyta],” study authors wrote. “Machine learning models rely on training data, and when ChatGPT only has a small number of sources available, it may pull information from less reliable sources.”
How Was this Study of ChatGPT Conducted?
Researchers formulated questions spanning various hematologic malignancy topics in conjunction with a hematology oncologist, drawing on organization websites including those of the National Cancer Institute and the American Cancer Society.
The questions were crafted to reflect the evolving needs of patients over the course of their treatment journey, with general inquiries reflecting common early questions and specific questions highlighting more nuanced needs.
The 10 questions were input into ChatGPT separately for each of the 4 reviewers to verify consistency, and the questions were submitted in new chats set to privacy mode to avoid bias in responses. The 4 reviewers—3 male and 1 female—were hematology-oncology physicians with subspecialties in leukemias.
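The study used the consumer web interface, but the same fresh-conversation protocol could be approximated programmatically. Below is a minimal sketch using the OpenAI Python SDK; the model name, the question list beyond the one quoted in the study, and the temperature setting are assumptions for illustration, not details from the paper.

```python
# Sketch: approximating the study's "new chat per question" protocol
# via the OpenAI API rather than the web interface the study used.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

questions = [
    "How can I lower my measurable residual disease?",  # quoted in the study
    "What treatment options exist for my leukemia?",    # hypothetical example
    # ... the remaining study questions would go here
]

responses = {}
for question in questions:
    # A separate API call with no prior messages mimics starting a new
    # chat for each question, so earlier answers cannot bias later ones.
    completion = client.chat.completions.create(
        model="gpt-3.5-turbo",  # assumed stand-in for the 3.5 web version
        messages=[{"role": "user", "content": question}],
        temperature=0,  # reduce run-to-run variation across repeated queries
    )
    responses[question] = completion.choices[0].message.content
```

Issuing each question as a standalone call with an empty message history mirrors the study’s use of new private chats, ensuring no conversational carryover between questions or between reviewers’ copies of the responses.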
What Were the Limitations of the Study?
Authors noted that although the 10 questions reflected common issues posed by patients with hematologic malignancies, they represented a small sample of the potential oncology-related questions patients may have. Additionally, answers were limited to ChatGPT version 3.5, and the results may not be generalizable to other versions of ChatGPT or to other AI chatbots. They also acknowledged that answers were generated at a single time point in July 2024, which does not reflect how ChatGPT evolves with training.
Finally, they noted that the phrasing of the questions may not entirely reflect patient phrasing, as technical level and complexity may vary.
“As chatbots become more prevalent in cancer care, they may help triage routine patient questions and concerns, allowing oncologists to allocate more time or be more efficient during their visits. However, successful implementation requires protocols for physicians to effectively vet and approve chatbot-generated responses before they reach the patient,” study authors concluded.
References
1. Nong T, Britton S, Bhanderi V, Taylor J. ChatGPT’s role in the rapidly evolving hematologic cancer landscape. Future Sci OA. 2025;11(1):2546259. doi:10.1080/20565623.2025.2546259
2. Introducing GPT-5. OpenAI. August 7, 2025. Accessed October 15, 2025. https://openai.com/index/introducing-gpt-5/