Structural Anchor Pruning: Training-Free Multi-Vector Compression for Visual Document Retrieval

arXiv:2601.20107v2. Abstract: Recent Vision-Language Models (e.g., ColPali) enable fine-grained Visual Document Retrieval (VDR) but incur prohibitive multi-vector index storage overhead. Existing training-free pruning methods either rely on heuristic layer choices or degrade sharply under aggressive compression, leading prior work to argue that effective high-compression pruning requires query-dependent training. We challenge this view with Structural Anchor Pruning (SAP), a self-calibrating, training-free, and query-agnostic index-time pruning framework with three
The proliferation of large vision-language models necessitates more efficient indexing and retrieval mechanisms to overcome prohibitive storage and computational overheads.
This development allows for more resource-efficient deployment and scaling of fine-grained visual document retrieval systems, expanding their practical applicability for intelligence and analytics.
The ability to achieve high compression for multi-vector indexes without training significantly reduces the cost and complexity of deploying advanced VDR systems.
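To make the storage argument concrete, the following is a minimal sketch of how multi-vector index size scales and how pruning interacts with late-interaction (MaxSim) scoring. The page count, vectors per page, embedding width, and byte width are illustrative assumptions, not figures from the paper. The `importance` scores are a hypothetical stand-in for any query-agnostic ranking of document vectors; they are not SAP's criterion, which the abstract excerpt above does not describe.

```python
def index_bytes(num_pages, vectors_per_page, dim, bytes_per_value=2):
    """Raw size of a multi-vector index, ignoring metadata and ANN structures."""
    return num_pages * vectors_per_page * dim * bytes_per_value


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def maxsim(query_vecs, doc_vecs):
    """Late-interaction score: sum over query vectors of the best-matching doc vector."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


def prune(doc_vecs, importance, keep_ratio):
    """Keep the top fraction of doc vectors by a query-agnostic importance score.

    `importance` is a hypothetical per-vector score supplied by the caller.
    """
    k = max(1, int(len(doc_vecs) * keep_ratio))
    top = sorted(range(len(doc_vecs)), key=lambda i: importance[i], reverse=True)[:k]
    return [doc_vecs[i] for i in sorted(top)]  # preserve original order


# Illustrative sizes (assumed): 1M pages, 1000 vectors/page, 128 dims, 2 bytes each.
full_size = index_bytes(1_000_000, 1000, 128)
pruned_size = index_bytes(1_000_000, 100, 128)  # keeping 10% of vectors

# Toy example showing that pruning can change the retrieval score.
query = [[1.0, 0.0], [0.0, 1.0]]
doc = [[1.0, 0.0], [0.0, 2.0], [0.5, 0.5], [0.0, 0.0]]
kept = prune(doc, importance=[0.9, 0.1, 0.5, 0.0], keep_ratio=0.5)
score_full = maxsim(query, doc)
score_pruned = maxsim(query, kept)
```

Under these assumed sizes the raw index shrinks from 256 GB to 25.6 GB at a 10% keep ratio. In the toy example the pruned score drops because a vector that matched the second query token was discarded, which is the kind of degradation a pruning criterion has to avoid.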
- AI/ML developers
- Cloud infrastructure providers
- Digital archives and libraries
- Intelligence agencies
- Companies relying on inefficient, high-storage VDR solutions
- Legacy document management systems
More widespread adoption of visual document retrieval across various sectors due to lower operational costs.
Increased ability to process and search through vast amounts of visual and text data, enhancing competitive intelligence and research.
New applications emerging from the ability to quickly and cheaply analyze large visual datasets, potentially impacting fields from legal discovery to medical imaging.
Read at arXiv cs.CL