A Pilot Study on Curator-Guided Multilingual Art Description for Blind and Low-Vision Audiences with Small Vision-Language Models

arXiv:2605.31080v1 (announce type: cross)

Abstract: Blind and low-vision (BLV) audiences remain underserved by visual art descriptions, particularly across languages and in museum settings where privacy and intellectual-property constraints may favour small on-premise vision-language models (VLMs). This pilot study investigates curator-guided multilingual art description with Qwen2.5-VL-3B-Instruct for German, Romanian, and Serbian. We construct a parallel BLV-oriented caption corpus from artwork images and metadata, and compare language-specific LoRA adapters with a single multilingual adapter
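The comparison described in the abstract implies two ways of slicing the same parallel corpus: one training set per language for the language-specific adapters, and one pooled set for the multilingual adapter. The sketch below illustrates only that data split. The `Caption` record, the `build_training_sets` helper, the artwork IDs, and the placeholder texts are all hypothetical and are not taken from the paper; no model or LoRA library is involved.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Caption:
    """One BLV-oriented description of one artwork (hypothetical record layout)."""
    artwork_id: str
    lang: str  # ISO 639-1 code: "de", "ro", or "sr"
    text: str


def build_training_sets(corpus):
    """Split a parallel corpus into per-language sets and one pooled set.

    per_language feeds one adapter per language; pooled feeds a single
    multilingual adapter. Both views cover exactly the same captions.
    """
    per_language = defaultdict(list)
    for caption in corpus:
        per_language[caption.lang].append(caption)
    pooled = list(corpus)
    return dict(per_language), pooled


# Placeholder data: two artworks, each described in all three languages.
corpus = [
    Caption(artwork_id, lang, f"placeholder description ({lang})")
    for artwork_id in ("art-001", "art-002")
    for lang in ("de", "ro", "sr")
]

per_language, pooled = build_training_sets(corpus)

for lang, captions in sorted(per_language.items()):
    print(lang, len(captions))
print("pooled", len(pooled))
```

Because the corpus is parallel, every language-specific set has the same size and covers the same artworks, so any difference between the two adapter setups would come from the training configuration rather than from the data each one sees.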
The study builds on recent advances in small vision-language models (VLMs) and on growing demand for inclusive AI applications that serve underserved communities.
The pilot explores whether small, on-premise VLMs can handle specialized, privacy-sensitive tasks, which points to a role for AI beyond large, general-purpose models.
Attention is shifting from exclusively large, cloud-based AI toward testing smaller, specialized, customizable models for niche applications, especially those with multilingual and accessibility requirements.
- Museums and cultural institutions
- Accessibility technology providers
- Smaller VLM developers
- BLV audiences
- One-size-fits-all AI solution providers
Increased accessibility of visual art for blind and low-vision audiences through AI-generated descriptions.
Development of more specialized, privacy-preserving AI models for on-premise institutional use in various sectors.
Potential for a competitive market for 'small AI' solutions tailored for specific industries, data sovereignty, and ethical considerations.
Read at arXiv cs.AI