When No Answer Is Correct: Diagnosing Absent Answer Detection for MLLMs in Video Understanding

arXiv:2606.08239v1 Announce Type: new Abstract: Multimodal large language models (MLLMs) have made substantial advancements in video understanding, yet the reliability of their responses remains underexplored. This work presents a diagnostic study of absent answer detection for MLLMs in video understanding, where the correct answer is deliberately excluded from the candidate set and a reliable model is expected to recognize that no valid option exists. We evaluate the absent answer detection behavior under three settings: multiple-choice questions augmented with a "None of the Above" option …
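To make the absent-answer setup concrete, here is a minimal sketch of how a multiple-choice item could be turned into an absent-answer item: the correct option is removed and a "None of the Above" (NOTA) choice is appended, which then becomes the expected answer. This is an illustration under stated assumptions, not the paper's pipeline: `MCQItem`, `make_absent_answer_item`, and `with_nota` are hypothetical names, and the control variant (NOTA offered while the correct option stays listed) is an assumed counterpart rather than something the abstract confirms.

```python
from dataclasses import dataclass

NOTA = "None of the Above"


@dataclass(frozen=True)
class MCQItem:
    # Hypothetical container for one video question (the video itself is omitted).
    question: str
    options: tuple[str, ...]
    answer: str  # text of the option the model is expected to choose


def make_absent_answer_item(item: MCQItem) -> MCQItem:
    """Remove the correct option and append NOTA, which becomes the target."""
    if item.answer not in item.options:
        raise ValueError("answer must be one of the listed options")
    if NOTA in item.options:
        raise ValueError("item already offers a NOTA option")
    remaining = tuple(opt for opt in item.options if opt != item.answer)
    return MCQItem(item.question, remaining + (NOTA,), NOTA)


def with_nota(item: MCQItem) -> MCQItem:
    """Assumed control variant: NOTA is offered but the correct option is kept."""
    return MCQItem(item.question, item.options + (NOTA,), item.answer)
```

For example, an item with options `("a cup", "a phone", "a book")` and answer `"a phone"` becomes an item with options `("a cup", "a book", "None of the Above")` whose expected answer is the NOTA choice. Keeping both variants lets an evaluator check that a model picks NOTA only when the correct option is genuinely missing.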
The rapid development and adoption of Multimodal Large Language Models (MLLMs) call for a deeper understanding of their reliability and failure modes before they are deployed widely in high-stakes applications.
Reliable MLLM behavior when no correct answer exists is critical for avoiding misleading outputs, maintaining user trust, and supporting safe autonomous decision-making in complex environments.
This research provides a diagnostic framework for systematically evaluating, and ultimately improving, MLLMs' ability to recognize when no valid option is present, moving beyond simple accuracy metrics to assess how models handle questions whose candidate set contains no correct answer.
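This summary does not state which metrics the paper reports. As a hedged illustration of going "beyond simple accuracy", the sketch below assumes each evaluated item records the option the model chose and computes three rates: ordinary accuracy on items whose correct option is still listed, how often the model wrongly picks "None of the Above" (NOTA) on those items, and how often it correctly picks NOTA on absent-answer items. The function and metric names (`absent_answer_report`, `false_nota_rate`, `absent_detection_rate`) are hypothetical.

```python
from typing import Sequence

NOTA = "None of the Above"


def _rate(hits: int, total: int) -> float:
    # Guard against dividing by zero when a split is empty.
    if total == 0:
        raise ValueError("cannot compute a rate over zero items")
    return hits / total


def absent_answer_report(
    present_preds: Sequence[str],
    present_targets: Sequence[str],
    absent_preds: Sequence[str],
) -> dict[str, float]:
    """Hypothetical report over two splits of model choices.

    present_*: items where the correct option is still listed (NOTA also offered).
    absent_preds: choices on items where the correct option was removed.
    """
    if len(present_preds) != len(present_targets):
        raise ValueError("predictions and targets must be aligned")
    n_present = len(present_preds)
    return {
        # Standard accuracy when a valid answer exists.
        "present_accuracy": _rate(
            sum(p == t for p, t in zip(present_preds, present_targets)), n_present
        ),
        # Over-abstention: NOTA chosen although a valid option was listed.
        "false_nota_rate": _rate(sum(p == NOTA for p in present_preds), n_present),
        # Detection: NOTA chosen when the correct option was removed.
        "absent_detection_rate": _rate(
            sum(p == NOTA for p in absent_preds), len(absent_preds)
        ),
    }
```

Reporting the false-NOTA rate next to the detection rate matters because a model that always answers "None of the Above" would score perfectly on absent-answer items while being useless when a valid answer exists.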
- AI safety researchers
- MLLM developers and evaluators
- Video analytics companies
- Undiagnosed MLLMs in critical applications
- AI systems lacking robustness in 'no answer' scenarios
Improved MLLM robustness by incorporating 'absent answer detection' into training and evaluation protocols.
Increased trust in MLLM applications, particularly in fields requiring high precision and safety, such as autonomous systems or medical diagnostics.
The development of a new class of AI models intrinsically designed to quantify and communicate their own uncertainty and limitations, leading to more transparent and explainable AI.
Read at arXiv cs.AI