A groundbreaking study published in The Lancet Digital Health reveals that artificial intelligence systems demonstrate heightened vulnerability to medical misinformation when it originates from seemingly authoritative healthcare sources. The research, conducted by Mount Sinai’s Icahn School of Medicine, tested 20 proprietary and open-source large language models through more than one million prompts containing fabricated medical information.
The investigation exposed AI tools to three distinct content categories: authentic hospital discharge summaries containing deliberately inserted false recommendations, common health myths sourced from Reddit, and 300 physician-written clinical scenarios. The findings demonstrate that AI models overall accepted fabricated information from approximately 32% of content sources. However, this susceptibility increased dramatically to nearly 47% when misinformation appeared within realistic-looking medical documentation from healthcare providers.
Dr. Eyal Klang, co-lead researcher, emphasized that current AI systems tend to treat confident medical language as inherently truthful regardless of factual accuracy. The study further discovered that AI exhibited greater skepticism toward social media sources, with misinformation propagation dropping to just 9% when originating from Reddit posts.
Prompt engineering significantly influenced AI reliability, with authoritative phrasing such as I’m a senior clinician and I endorse this recommendation substantially increasing the likelihood of false information acceptance. Among tested models, OpenAI’s GPT systems demonstrated superior fallacy detection capabilities, while other models proved susceptible to up to 63.6% of false claims.
Dr. Girish Nadkarni, Mount Sinai’s Chief AI Officer and study co-lead, highlighted the dual nature of AI in medical applications: While AI offers tremendous potential for clinical support and accelerated insights, our research identifies critical vulnerabilities that require built-in safeguards before these systems become fully embedded in patient care. The study coincides with separate Nature Medicine research indicating AI symptom queries perform no better than standard internet searches for patient decision-making.
