PRISMR: Overcoming Parse Collapse in Multimodal Listwise Ranking via Parameterized Representation Internalization

arXiv:2606.12942v1 | Announce Type: new
Abstract: Generative listwise ranking with Large Multimodal Models (LMMs) aims to capture global list context in a single forward pass, but its effectiveness degrades in long-context multimodal scenarios. We identify a recurring failure mode, parse collapse, where the autoregressive decoder produces fluent yet incomplete rankings by silently omitting candidates and terminating early. This failure stems from limited context utilization rather than simple formatting mistakes, making prompt engineering and constrained decoding insufficient. We propose PRISMR
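To make the failure mode concrete, below is a minimal Python sketch of how one might detect parse collapse in a generated ranking. It assumes the candidates are labelled [1]..[N] in the prompt and that the model emits an ordering such as "[3] > [1] > [2]"; this output format and the `check_ranking` and `ParseReport` names are hypothetical, not taken from the paper. The sketch only flags omitted candidates and does not implement PRISMR.

```python
import re
from dataclasses import dataclass


@dataclass
class ParseReport:
    ranking: list[int]     # candidate IDs in emitted order (first occurrence only)
    missing: list[int]     # candidate IDs the model never mentioned
    duplicates: list[int]  # IDs the model emitted more than once
    collapsed: bool        # True when at least one candidate was silently omitted


def check_ranking(output: str, num_candidates: int) -> ParseReport:
    """Parse a ranking like '[3] > [1] > [2]' and flag silent omissions.

    Hypothetical helper: assumes candidates are labelled [1]..[num_candidates].
    """
    ranking: list[int] = []
    seen: set[int] = set()
    duplicates: list[int] = []
    for match in re.finditer(r"\[(\d+)\]", output):
        idx = int(match.group(1))
        if not 1 <= idx <= num_candidates:
            continue  # ignore identifiers that do not correspond to any candidate
        if idx in seen:
            duplicates.append(idx)
        else:
            seen.add(idx)
            ranking.append(idx)
    # Candidates that never appear in the output were dropped by the decoder.
    missing = [i for i in range(1, num_candidates + 1) if i not in seen]
    return ParseReport(ranking, missing, duplicates, collapsed=bool(missing))


if __name__ == "__main__":
    # A fluent-looking ranking that stops after 4 of 8 candidates.
    report = check_ranking("[4] > [2] > [7] > [1]", num_candidates=8)
    print(report.missing, report.collapsed)
```

Running the script prints `[3, 5, 6, 8] True`: the output is well formed, yet half of the candidates are missing, which is the kind of incomplete-but-fluent ranking the abstract describes.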
As multimodal LMMs are asked to handle more complex inputs and longer contexts, the limitations of current architectures become more apparent.
Overcoming parse collapse is critical for deploying advanced multimodal AI systems reliably in real-world applications.
This research targets a fundamental limitation of generative listwise ranking, with the aim of letting LMMs rank longer, more complex multimodal candidate lists without silently dropping items.
- AI developers
- Multimodal AI applications
- Natural language processing
- Generative AI
- Inefficient LMM architectures
- Applications requiring extensive manual prompt engineering
LMMs could become more effective and reliable in long-context multimodal ranking tasks.
This improvement could accelerate the adoption of LMMs in domains that require precise long-form data analysis and structured output.
Enhanced LMM capabilities could lead to new types of AI agents that can autonomously process and synthesize complex multimodal information more accurately.
Source: arXiv cs.AI