Respecting Modality Gap in Post-hoc Out-of-distribution Detection with Pre-trained Vision-Language Models

arXiv:2605.26661v1 (Announce Type: cross)

Abstract: Out-of-distribution (OOD) detection has emerged as a popular technique to enhance the reliability of machine learning models by identifying unexpected inputs from unknown classes. Recent progress in pre-trained vision-language models (VLMs) has enabled zero-shot OOD detection without access to in-distribution (ID) training data; in this setting, existing methods commonly treat text embeddings of class names as class prototypes. In this paper, we challenge the widely adopted text-as-prototype paradigm by theoretically showing that off-the-shelf …
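In the text-as-prototype setting the abstract describes, each class name is passed through the VLM's text encoder and the resulting embedding serves as that class's prototype. An image is then scored by how strongly it matches its best-matching prototype. A well-known instance of this paradigm is the Maximum Concept Matching (MCM) score, a softmax over temperature-scaled image-to-text cosine similarities. The sketch below is a minimal, dependency-free illustration under stated assumptions: the 3-D vectors are toy stand-ins for real encoder outputs, and `text_prototype_score`, `is_ood`, the temperature, and the threshold are hypothetical names and values chosen for illustration. It shows the baseline paradigm the paper challenges, not the paper's own method.

```python
import math
from typing import Dict, List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length; CLIP-style embeddings are compared on the unit sphere."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity: dot product of the two unit-normalized vectors."""
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))


def text_prototype_score(
    image_emb: Sequence[float],
    class_text_embs: Dict[str, Sequence[float]],
    temperature: float = 1.0,
) -> float:
    """MCM-style ID score (hypothetical helper).

    Each class-name text embedding acts as a class prototype. The score is the
    largest softmax probability over temperature-scaled image-to-text cosine
    similarities; higher means more likely in-distribution.
    """
    sims = [cosine(image_emb, t) / temperature for t in class_text_embs.values()]
    m = max(sims)  # subtract the max before exp for numerical stability
    exps = [math.exp(s - m) for s in sims]
    return max(exps) / sum(exps)


def is_ood(score: float, threshold: float) -> bool:
    """Flag an input as OOD when its ID score falls below the (assumed) threshold."""
    return score < threshold


if __name__ == "__main__":
    # Toy 3-D "embeddings" standing in for real text-encoder outputs.
    prototypes = {"cat": [1.0, 0.0, 0.0], "dog": [0.0, 1.0, 0.0]}
    near_cat = [0.9, 0.1, 0.0]   # resembles the "cat" prototype
    unrelated = [0.0, 0.0, 1.0]  # equally far from every prototype
    for name, emb in [("near_cat", near_cat), ("unrelated", unrelated)]:
        s = text_prototype_score(emb, prototypes, temperature=0.1)
        print(name, round(s, 4), "OOD" if is_ood(s, threshold=0.9) else "ID")
```

In the toy run, the input near the "cat" prototype scores close to 1 and is kept as ID, while the unrelated input scores 0.5 (a uniform split over two classes) and is flagged as OOD. Real deployments would pick the threshold on held-out ID data; the 0.9 used here is only illustrative.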
The rapid advancement and widespread adoption of pre-trained vision-language models necessitate robust methods for identifying unexpected inputs to ensure AI reliability and prevent failures in real-world deployments.
Better OOD detection for VLMs is especially important in sensitive applications, where unexpected inputs can lead to critical errors or security vulnerabilities.
The paper's central argument is that zero-shot OOD detection with VLMs should move away from the widely adopted 'text-as-prototype' approach toward methods that respect the inherent 'modality gap' between vision and language embeddings, with the goal of more accurate and robust detection.
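The "modality gap" refers to the observation that, in contrastively trained VLMs such as CLIP, image embeddings and text embeddings tend to occupy separate regions of the shared embedding space rather than intermixing. One simple way to quantify it, assumed here rather than taken from the paper, is the vector between the centroid of the L2-normalized image embeddings and the centroid of the L2-normalized text embeddings. The sketch uses toy vectors, and `modality_gap` and `gap_magnitude` are hypothetical helper names. A nonzero gap offers one intuition for why scoring images directly against text prototypes can be miscalibrated: every image-to-text similarity is computed across that gap.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Project a vector onto the unit sphere, as is standard for CLIP-style embeddings."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def centroid(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    if not vectors:
        raise ValueError("need at least one vector")
    n, dim = len(vectors), len(vectors[0])
    return [sum(v[i] for v in vectors) / n for i in range(dim)]


def modality_gap(
    image_embs: Sequence[Sequence[float]], text_embs: Sequence[Sequence[float]]
) -> List[float]:
    """Gap vector (hypothetical helper): image centroid minus text centroid,
    both computed over L2-normalized embeddings."""
    img_c = centroid([l2_normalize(v) for v in image_embs])
    txt_c = centroid([l2_normalize(v) for v in text_embs])
    return [a - b for a, b in zip(img_c, txt_c)]


def gap_magnitude(gap: Sequence[float]) -> float:
    """Euclidean length of the gap vector; 0 means the two centroids coincide."""
    return math.sqrt(sum(g * g for g in gap))


if __name__ == "__main__":
    # Toy unit vectors: images cluster in one region, texts in another.
    images = [[0.6, 0.8, 0.0], [0.8, 0.6, 0.0]]
    texts = [[0.0, 0.6, 0.8], [0.0, 0.8, 0.6]]
    gap = modality_gap(images, texts)
    print([round(g, 3) for g in gap], round(gap_magnitude(gap), 3))
```

With these toy inputs the gap vector is roughly (0.7, 0, -0.7), a length of about 0.99; passing the same set for both modalities yields a gap of zero. Measuring the gap says nothing by itself about how a detector should correct for it.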
- AI safety researchers
- Developers of VLM-powered applications
- Industries deploying AI in critical infrastructure
- Machine learning models with poor OOD detection
- Naive VLM deployment strategies
- Applications vulnerable to adversarial attacks
Enhanced reliability and trustworthiness of AI models, especially in autonomous systems and sensitive decision-making.
Accelerated adoption of AI in high-stakes domains due to increased confidence in model robustness against unforeseen inputs.
New regulatory frameworks and certification standards for AI systems, placing a higher emphasis on OOD detection capabilities.
Read at arXiv cs.AI