AI-powered medical recommendations biased and unreliable

November 27, 2024

A Sunnybrook study suggests that medical recommendations generated by popular generative artificial intelligence (AI) programs such as ChatGPT are heavily influenced by human-like biases that deviate from rational logic and lead to faulty medical decisions.

The study, published in the New England Journal of Medicine AI, found that pre-trained AI models display cognitive biases – described as the human tendency to deviate from logical thought, mathematical precision, and rational judgment – and that the magnitude of this bias can be larger than among practicing clinicians.

“Generative AI models do not have the capacity to think intuitively, yet might still exhibit human-like biases due to human calibration, thereby replicating multiple pitfalls of reasoning,” says Dr. Donald Redelmeier, co-author of the new study, a staff internist, and a senior scientist at Sunnybrook Health Sciences Centre.

“They may be particularly prone to biases in medicine, where uncertainty and complexity are widespread. Furthermore, the allure of sophisticated algorithms and the immensity of medical knowledge can prevent unskilled users from detecting such biases. These biases are much harder to identify and correct than standard obscenities or sexism in AI programs.”

Generative AI chatbots are powered by large language models that can compose medical histories, produce differential diagnoses, formulate medical recommendations, and even pass licensing examinations, say the authors. These programs are sometimes consulted by patients searching for information and guidance on medical conditions.

“These capabilities emerge after applying computational resources to vast quantities of training text, and require no expert fine-tuning,” says co-author Jonathan Wang, a medical student at Sunnybrook Research Institute. “However, training texts are rarely neutral, not always factual, and of variable quality, ranging from science fiction to love songs. Generative AI models can therefore inadvertently produce misinformation, such as fictitious therapies.”

The study also suggests there are notable differences between AI models in their susceptibility to cognitive bias, especially when compared with the recommendations of human clinicians. A common feature across AI models is that their medical recommendations initially appear authoritative and sensible.

“Generative AI is increasingly being used for medical applications, and it’s important to be mindful of its limitations and balance them with trained clinical interpretation,” adds Dr. Redelmeier, also a Canada Research Chair in Medical Decision Sciences and a professor in the Temerty Faculty of Medicine and the Institute of Health Policy, Management and Evaluation at the University of Toronto. “Future enhancements might provide some safeguards against human-like biases in generative AI models, but more research will be needed to tell.”