Universal Boosts, Specific Suppressors: Sparse Autoencoder Steering of Medical Vision-Language Models

arXiv:2605.24977v1 (announce type: cross). Abstract: Medical vision-language models (VLMs) often hallucinate findings when generating chest X-ray reports: they fabricate findings that are not present in the image, miss important ones, or locate them incorrectly. We mitigate this without weight updates by decoding-time residual steering on a per-token sparse autoencoder (SAE) basis: Top-$K$ SAEs on late layers, causal steering against clinical errors, then combined suppress/boost intervention at inference time. On the MIMIC-CXR test split, our inference-only method improves the quality of generate
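The mechanism the abstract names (a Top-$K$ SAE over a residual-stream vector, followed by a suppress/boost edit at decoding time) can be illustrated with a small sketch. This is not the paper's implementation: the function names, the weight layout, the choice to zero suppressed latents, and the choice to add only the decoded difference back to the residual are all assumptions made for illustration, and the toy uses plain Python lists rather than a real model.

```python
from typing import Dict, List, Set

Vector = List[float]
Matrix = List[List[float]]  # one row per SAE latent


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a row-major matrix by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def topk_encode(h: Vector, w_enc: Matrix, b_enc: Vector,
                b_dec: Vector, k: int) -> Dict[int, float]:
    """Top-K SAE encoder: keep the k largest pre-activations, drop the rest."""
    centered = [x - b for x, b in zip(h, b_dec)]
    pre = [a + b for a, b in zip(matvec(w_enc, centered), b_enc)]
    keep = sorted(range(len(pre)), key=lambda i: pre[i], reverse=True)[:k]
    # Clamp at zero so kept activations are non-negative (an assumption).
    return {i: max(pre[i], 0.0) for i in keep}


def decode(z: Dict[int, float], w_dec: Matrix, b_dec: Vector) -> Vector:
    """Linear decoder: bias plus activation-weighted decoder rows."""
    out = list(b_dec)
    for i, act in z.items():
        for j, w in enumerate(w_dec[i]):
            out[j] += act * w
    return out


def steer(h: Vector, w_enc: Matrix, b_enc: Vector, w_dec: Matrix,
          b_dec: Vector, k: int, suppress: Set[int],
          boost: Dict[int, float]) -> Vector:
    """Edit one token's residual vector in the SAE basis (hypothetical scheme).

    Latents in `suppress` are zeroed and latents in `boost` are raised by the
    given amount. Only the decoded difference is added to h, so whatever the
    SAE fails to reconstruct is left untouched.
    """
    z = topk_encode(h, w_enc, b_enc, b_dec, k)
    z_new = dict(z)
    for i in suppress:
        z_new[i] = 0.0
    for i, delta in boost.items():
        z_new[i] = z_new.get(i, 0.0) + delta
    old = decode(z, w_dec, b_dec)
    new = decode(z_new, w_dec, b_dec)
    return [x + (n - o) for x, n, o in zip(h, new, old)]


# Toy weights: 2-dimensional residual stream, 3 latents (all values made up).
W_ENC = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W_DEC = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
B_ENC = [0.0, 0.0, 0.0]
B_DEC = [0.0, 0.0]

steered = steer([2.0, 1.0], W_ENC, B_ENC, W_DEC, B_DEC, k=2,
                suppress={0}, boost={1: 0.5})
```

In a real model the same edit would be applied per token to a late-layer residual stream during generation, for example through a forward hook. Which latents to suppress or boost is the part the abstract attributes to causal steering against clinical errors, and this sketch does not attempt it.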
As large medical Vision-Language Models (VLMs) proliferate and their outputs grow more complex, controlling them and mitigating their errors becomes more pressing, which makes research into steering mechanisms timely.
This development offers a novel, inference-time method to improve the reliability and safety of AI in critical applications like medical diagnosis, directly addressing a major hurdle for clinical adoption.
If the reported results hold up, clinical diagnostic AI systems could be made more robust against hallucinations and errors without weight updates, which could shorten their path to trustworthy deployment.
- Healthcare AI developers
- Medical diagnostic companies
- Patients
- AI safety researchers
- Companies relying solely on black-box VLM deployment without error mitigation
Improved accuracy and reduced hallucination in medical VLM outputs for chest X-ray reports.
Accelerated integration of AI into clinical workflows due to enhanced reliability and trust.
The methodology could generalize to other high-stakes AI applications beyond medicine, enabling more controllable and safer AI systems across industries.
Read at arXiv cs.CL