CORA: Analyzing and bridging thinking-answer gap in Multimodal RLVR via Consistency-Oriented Reasoning Alignment

arXiv:2606.14691v1
Abstract: Reinforcement learning with verifiable rewards (RLVR) has successfully elicited the reasoning capabilities of large language models, motivating its extension to multimodal scenarios. Existing methods primarily focus on improving the visual coverage of reasoning traces and mitigating visual hallucinations, but underestimate the semantic inconsistency between the reasoning process and the final answer. In this paper, we delve into thinking-answer inconsistency in RLVR for large vision-language models (LVLMs), showing thorough analyses of rollouts c…
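To make the thinking-answer gap concrete, here is a minimal Python sketch. It assumes R1-style rollouts that wrap the reasoning in <think> tags and the final answer in <answer> tags. The tag format, the "answer is X" heuristic, and the helper names (parse_rollout, verifiable_reward, is_consistent) are hypothetical illustrations, not the paper's CORA method. The sketch shows how a reward that checks only the final answer can score a rollout as correct even when its reasoning concludes something else.

```python
import re
from typing import Optional, Tuple

# ASSUMPTION: R1-style rollout format with <think> and <answer> tags.
# Hypothetical illustration only; this is not the CORA method.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)
# Hypothetical heuristic: the reasoning's conclusion is its last "answer is X" phrase.
CONCLUSION_RE = re.compile(r"answer is\s*\(?([A-Za-z0-9.\-]+)\)?", re.IGNORECASE)


def normalize(text: str) -> str:
    """Trim whitespace and surrounding periods, then lowercase, for exact matching."""
    return text.strip().strip(".").lower()


def parse_rollout(rollout: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (answer implied by the reasoning, final answer), with None where missing."""
    think = THINK_RE.search(rollout)
    answer = ANSWER_RE.search(rollout)
    implied = None
    if think is not None:
        conclusions = CONCLUSION_RE.findall(think.group(1))
        if conclusions:
            implied = normalize(conclusions[-1])
    final = normalize(answer.group(1)) if answer is not None else None
    return implied, final


def verifiable_reward(rollout: str, gold: str) -> float:
    """Outcome-only reward: checks the final answer and ignores the reasoning."""
    _, final = parse_rollout(rollout)
    return 1.0 if final is not None and final == normalize(gold) else 0.0


def is_consistent(rollout: str) -> bool:
    """True only if the reasoning's stated conclusion matches the final answer."""
    implied, final = parse_rollout(rollout)
    return implied is not None and implied == final


if __name__ == "__main__":
    gap = "<think>The bars show 3 and 4, so the answer is 7.</think><answer>12</answer>"
    print(verifiable_reward(gap, "12"), is_consistent(gap))  # 1.0 False
```

Here a rollout whose reasoning concludes 7 but whose answer is 12 earns a full reward against a gold answer of 12, while is_consistent flags the mismatch. A string heuristic like this only catches explicitly stated conclusions; detecting semantic inconsistency in general would likely need a stronger checker, such as a model-based judge.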
As large vision-language models (LVLMs) advance and are deployed more widely, issues such as reasoning consistency must be addressed to ensure reliable agent performance.
Aligning a model's final answers with its stated reasoning is important for deploying trustworthy and effective multimodal AI systems, particularly in high-stakes applications.
This work identifies thinking-answer inconsistency as a key limitation of multimodal RLVR and proposes Consistency-Oriented Reasoning Alignment (CORA) to address it, a step toward more robust and reliable agentic behavior.
- AI researchers
- Developers of multimodal AI agents
- Industries adopting autonomous AI systems
- Companies relying on inconsistent multimodal AI
- Users experiencing unreliable AI outputs
Large vision-language models may produce more accurate and trustworthy reasoning.
More reliable multimodal AI agents could see faster adoption in complex decision-making roles.
More consistent AI reasoning could motivate new benchmarks and reshape competition in AI development.
This signal links to a primary source. Continuum Brief monitors and indexes it as part of the live intelligence stream; we do not republish source content.
Read at arXiv cs.CL