
arXiv:2605.28025v1 (Announce Type: new)

Abstract: Large language models (LLMs) are increasingly used to provide public-facing health information, yet existing safety evaluations overlook whether responses preserve comparable medical information across different user phrasings of the same question. To address this, we introduce the Medical Information Response Audit (MIRA), a bilingual, controlled benchmark that assesses whether LLMs provide comparable medical information across user-side language, register, and health literacy signals. MIRA contains 4,320 prompts built from 60 medically reviewed
The growing public-facing use of LLMs for health information calls for evaluations of something current safety benchmarks overlook: whether answers stay consistent when different users phrase the same question differently.
MIRA targets a critical gap in LLM safety evaluation by testing whether the medical information an AI system provides stays reliable and consistent regardless of how a user phrases a query or how much health literacy the phrasing signals.
MIRA gives developers a standardized tool for auditing LLMs on medical-information consistency, which may push them to build more robust and equitable AI systems for healthcare.
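The controlled design described in the abstract (one medically reviewed question, varied only in user-side language, register, and health-literacy signals) can be illustrated with a minimal Python sketch. Everything in it is a hypothetical illustration, not MIRA's actual method: the factor levels (`lang_a`/`lang_b`, `formal`/`informal`, `low`/`high`), the "key facts" reference set, and the coverage-gap score are assumptions, since the abstract does not specify them. For scale, 4,320 prompts built from 60 questions works out to 72 prompts per question on average; the toy grid below crosses only 2 × 2 × 2 = 8 variants.

```python
from itertools import product

# Hypothetical factor levels: the abstract does not name MIRA's languages or levels.
LANGUAGES = ("lang_a", "lang_b")
REGISTERS = ("formal", "informal")
LITERACY = ("low", "high")


def build_variants(question_id: str) -> list[tuple[str, str, str, str]]:
    """Cross every factor level so variants differ only in user-side signals."""
    return [
        (question_id, lang, reg, lit)
        for lang, reg, lit in product(LANGUAGES, REGISTERS, LITERACY)
    ]


def coverage(response_facts: set[str], key_facts: set[str]) -> float:
    """Hypothetical score: fraction of reference key facts a response mentions."""
    return len(response_facts & key_facts) / len(key_facts)


def coverage_gap(responses: dict[tuple, set[str]], key_facts: set[str]) -> float:
    """Spread between the best- and worst-served phrasings of one question."""
    scores = [coverage(facts, key_facts) for facts in responses.values()]
    return max(scores) - min(scores)


if __name__ == "__main__":
    key = {"dose", "side_effects", "see_doctor"}  # hypothetical reference facts
    variants = build_variants("q1")
    # Toy responses: the low-literacy, informal variant in lang_a omits two facts.
    answers = {v: set(key) for v in variants}
    answers[("q1", "lang_a", "informal", "low")] = {"dose"}
    print(len(variants), round(coverage_gap(answers, key), 2))
```

A gap of 0 would mean every phrasing of the question received the same share of the key facts; a larger gap flags a question where some users are served worse than others. Comparing only within a question keeps the medical content fixed, which is the point of a controlled benchmark.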
Likely beneficiaries:
- Healthcare consumers
- LLM developers investing in safety
- Fair AI advocacy groups
Likely under pressure:
- LLM providers delivering inconsistent health information
- Developers neglecting robust safety benchmarks
LLMs used in healthcare are likely to face increased scrutiny and greater demand for consistent, high-quality medical information.
More investment may flow toward LLMs with stronger natural language understanding and multilingual capabilities, in support of equitable healthcare access.
The benchmark could become a de facto standard, influencing regulatory frameworks for AI in health and potentially accelerating the adoption of specialized, verified medical LLMs over general-purpose ones.
This signal links to a primary source. Continuum Brief monitors and indexes it as part of the live intelligence stream — we do not republish source content.
Read at arXiv cs.AI