Grounding or Guessing? Visual Signals for Detecting Hallucinations in Sign Language Translation

arXiv:2510.18439v3 (Announce Type: replace)

Abstract: Hallucination, where models generate fluent text unsupported by visual evidence, remains a major flaw in vision-language models and is particularly critical in sign language translation (SLT). In SLT, meaning depends on precise grounding in video, and gloss-free models are especially vulnerable because they map continuous signer movements directly into natural language without intermediate gloss supervision that serves as alignment. We argue that hallucinations arise when models rely on language priors rather than visual input. To capture thi
As vision-language models proliferate, hallucination becomes an increasingly critical problem, especially when these models are applied to high-stakes domains such as sign language translation.
Addressing hallucinations in vision-language models is crucial for their reliability and adoption in sensitive applications where accuracy and grounding in real-world data are paramount.
This research proposes using visual signals to detect, and potentially mitigate, hallucinations in sign language translation, a step toward more trustworthy and visually grounded AI systems.
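The abstract is cut off before it describes the detection method, so the following is not the paper's technique. It is a minimal Python sketch of the stated intuition that hallucinations come from language priors rather than visual input. It assumes a hypothetical setup in which each output token is scored twice by the same SLT model: once with the real signing video and once with the video masked. Tokens whose log-probability barely changes are ones the model could have produced without looking at the signer. The function names `visual_reliance` and `flag_prior_driven`, the masking setup, the 0.1 threshold, and all numbers are illustrative assumptions.

```python
from typing import List, Tuple


def visual_reliance(
    logp_with_video: List[float],
    logp_without_video: List[float],
) -> List[float]:
    """Per-token gap: how much the real video raises each token's log-probability.

    Both lists hold log p(token_t | previous tokens, input) for the SAME output
    tokens, scored once with the real signing video and once with the video
    masked out. The masking step itself is a hypothetical assumption here.
    """
    if len(logp_with_video) != len(logp_without_video):
        raise ValueError("both runs must score the same token sequence")
    return [w - wo for w, wo in zip(logp_with_video, logp_without_video)]


def flag_prior_driven(
    tokens: List[str],
    gaps: List[float],
    threshold: float = 0.1,
) -> Tuple[float, List[str]]:
    """Return (mean gap, low-gap tokens).

    The mean gap serves as a crude sentence-level grounding score. Tokens whose
    gap is below `threshold` are predicted about as confidently without seeing
    the signer, so they are candidates for prior-driven output.
    The threshold value is an illustrative assumption, not a tuned constant.
    """
    if len(tokens) != len(gaps):
        raise ValueError("need one gap per token")
    score = sum(gaps) / len(gaps) if gaps else 0.0
    suspicious = [tok for tok, gap in zip(tokens, gaps) if gap < threshold]
    return score, suspicious


if __name__ == "__main__":
    # Toy log-probabilities for a five-token translation (invented numbers).
    tokens = ["the", "weather", "is", "sunny", "today"]
    with_video = [-0.5, -0.25, -0.125, -2.0, -0.5]
    without_video = [-0.5, -2.25, -0.125, -2.0, -3.0]

    gaps = visual_reliance(with_video, without_video)
    score, suspicious = flag_prior_driven(tokens, gaps)
    print(gaps)        # [0.0, 2.0, 0.0, 0.0, 2.5]
    print(score)       # 0.9
    print(suspicious)  # ['the', 'is', 'sunny']
```

With these toy numbers, "sunny" is flagged: the model assigns it the same probability with or without the video, a pattern consistent with a guess driven by language priors. Function words such as "the" and "is" are also flagged, which is expected because they are largely predictable from context alone; a practical detector would need to account for that, for example by scoring only content tokens.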
- AI researchers in vision-language models
- Developers of sign language translation technologies
- Deaf and hard-of-hearing communities
- Developers of ungrounded or hallucinating vision-language models
- Users relying on inaccurate AI translation
Improved reliability and trust in vision-language translation models, particularly for sign language.
Accelerated development of more robust AI models that deeply integrate visual grounding to prevent fabricated outputs.
Enhanced accessibility and communication for sign language users through more accurate and dependable AI tools, potentially leading to wider societal integration.
Read at arXiv cs.CL