Retrieval-Augmented Long-Context Translation for Cultural Image Captioning: Gators submission for AmericasNLP 2026 shared task

arXiv:2605.20626v1. Abstract: We present the University of Florida Gators submission to the AmericasNLP 2026 shared task on cultural image captioning for Indigenous languages. Our two-stage pipeline generates a Spanish intermediate caption with Qwen2.5-VL, then produces the target-language caption using retrieval-augmented many-shot prompting with Gemini 2.5 Flash. We achieve 164.1%, 131.7%, and 122.6% improvements over the shared task baseline for Bribri, Guaraní, and Orizaba Nahuatl captioning, respectively, in our dev set evaluation and maintain >150% improvements for th
Advances in large language models and multimodal AI, together with a growing focus on linguistic diversity, are making technology-assisted cultural preservation more practical.
The submission shows concrete progress in applying these models to long-tail languages, and could extend the usefulness of AI to culturally specific applications beyond dominant languages.
Retrieval-augmented captioning of this quality for Indigenous languages makes localizing and preserving cultural content more feasible.
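The second stage named in the abstract, retrieval-augmented many-shot prompting, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not say how examples are retrieved, so a simple bag-of-words cosine similarity stands in for the retriever, the function names (`retrieve`, `build_prompt`) are hypothetical, and the target-language strings are placeholders rather than real Bribri text. The calls to Qwen2.5-VL (stage one) and Gemini 2.5 Flash (stage two) are deliberately left out; the sketch only builds the prompt that would be sent.

```python
import re
from collections import Counter
from math import sqrt


def tokenize(text):
    """Lowercase and split into word tokens (Unicode-aware in Python 3)."""
    return re.findall(r"\w+", text.lower())


def cosine(tokens_a, tokens_b):
    """Bag-of-words cosine similarity; a stand-in for the paper's retriever."""
    ca, cb = Counter(tokens_a), Counter(tokens_b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = sqrt(sum(v * v for v in ca.values())) * sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def retrieve(spanish_caption, pairs, k):
    """Return the k (Spanish, target) pairs whose Spanish side is most similar."""
    query = tokenize(spanish_caption)
    ranked = sorted(pairs, key=lambda p: cosine(query, tokenize(p[0])), reverse=True)
    return ranked[:k]


def build_prompt(spanish_caption, pairs, target_language, k=2):
    """Assemble a many-shot translation prompt from the retrieved examples."""
    blocks = [f"Translate the Spanish caption into {target_language}."]
    for spanish, target in retrieve(spanish_caption, pairs, k):
        blocks.append(f"Spanish: {spanish}\n{target_language}: {target}")
    # The final block is left open for the language model to complete.
    blocks.append(f"Spanish: {spanish_caption}\n{target_language}:")
    return "\n\n".join(blocks)


# Placeholder parallel data: the target sides are NOT real Bribri.
EXAMPLE_PAIRS = [
    ("Una mujer teje una canasta", "<TGT-1>"),
    ("Un hombre pesca en el río", "<TGT-2>"),
    ("Dos mujeres tejen canastas de palma", "<TGT-3>"),
]

# In the described pipeline this caption would come from the stage-one model.
prompt = build_prompt("Una mujer teje una canasta de palma", EXAMPLE_PAIRS, "Bribri")
print(prompt)
```

In a real many-shot setting `k` would be far larger than 2, which is where the long-context model matters. The value here is small only to keep the example readable.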
- Indigenous language communities
- AI developers specializing in long-tail languages
- Cultural preservation organizations
- Academics in computational linguistics
- Monolingual content platforms
- Traditional translation services without AI integration
Improved accessibility and understanding of culturally specific visual content for Indigenous language speakers.
Increased demand for culturally relevant datasets and multimodal AI models trained on a broader range of languages.
The potential for AI to become a critical tool in reversing language endangerment and empowering cultural digital sovereignty.
Read at arXiv cs.CL