Conversational Domain Adaptation of IndicTrans2 across 21 Indic Languages via Experience Replay and Model Soups

arXiv:2606.29024v1 (Announce Type: new)
Abstract: IndicTrans2 is the strongest open English to Indic translation system, but like most systems it is trained on general text and tends to sound stiff on casual, conversational input. We adapt IndicTrans2-1B to conversational register across all 21 Indic languages using only public data (OpenSubtitles, BPCC-H-Daily, Tatoeba). Plain fine-tuning improves conversational chrF but forgets the general domain (it drops 3.9 chrF on FLORES for Hindi). Mixing general data back into training (experience replay) and then averaging the fine-tuned weights with th…
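The abstract names two remedies for forgetting the general domain: replaying general-domain data during fine-tuning, and averaging model weights. The Python sketch below illustrates both ideas under stated assumptions and is not the authors' code. `replay_mix`, `soup`, `replay_ratio`, and `alpha` are hypothetical names, and plain Python lists stand in for model tensors. The abstract is cut off before saying what the fine-tuned weights are averaged with, so the sketch simply interpolates linearly between any two checkpoints that share parameter names and shapes.

```python
import random
from typing import Dict, List, Sequence, Tuple

# Hypothetical types: a translation pair and a checkpoint of named parameters.
Pair = Tuple[str, str]               # (English source, Indic target)
Checkpoint = Dict[str, List[float]]  # parameter name -> flattened values


def replay_mix(conv: Sequence[Pair], general: Sequence[Pair],
               replay_ratio: float, seed: int = 0) -> List[Pair]:
    """Experience replay as data mixing: keep every conversational pair and
    add enough randomly sampled general-domain pairs that they make up about
    `replay_ratio` of the final training set (capped by what is available)."""
    if not 0.0 <= replay_ratio < 1.0:
        raise ValueError("replay_ratio must be in [0, 1)")
    rng = random.Random(seed)
    # Solve n_gen / (len(conv) + n_gen) = replay_ratio for n_gen.
    n_gen = round(len(conv) * replay_ratio / (1.0 - replay_ratio))
    n_gen = min(n_gen, len(general))
    mixed = list(conv) + rng.sample(list(general), n_gen)
    rng.shuffle(mixed)  # interleave domains so batches see both
    return mixed


def soup(first: Checkpoint, second: Checkpoint, alpha: float) -> Checkpoint:
    """Linear weight averaging of two checkpoints with identical parameter
    names and shapes: (1 - alpha) * first + alpha * second, per parameter."""
    if first.keys() != second.keys():
        raise ValueError("checkpoints must share parameter names")
    averaged: Checkpoint = {}
    for name, a in first.items():
        b = second[name]
        if len(a) != len(b):
            raise ValueError(f"shape mismatch for {name!r}")
        averaged[name] = [(1.0 - alpha) * x + alpha * y for x, y in zip(a, b)]
    return averaged


if __name__ == "__main__":
    conv = [(f"conv {i}", f"tgt {i}") for i in range(6)]
    general = [(f"gen {i}", f"tgt {i}") for i in range(10)]
    print(len(replay_mix(conv, general, replay_ratio=0.25)))  # → 8 (6 conv + 2 general)
    base = {"w": [0.0, 2.0]}
    tuned = {"w": [1.0, 4.0]}
    print(soup(base, tuned, alpha=0.5))  # → {'w': [0.5, 3.0]}
```

With `alpha = 0` the average reproduces the first checkpoint and with `alpha = 1` the second; intermediate values are the usual way to balance in-domain gains against general-domain retention. The paper's actual replay ratio and averaging scheme are not given in the truncated abstract, so both knobs here are illustrative.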
Steady improvements in machine translation and large language models have put domain adaptation for specific use cases such as conversational AI in focus, driven by demand for more natural and culturally relevant interactions.
The work improves the practical usability of English-to-Indic machine translation for speakers of all 21 Indic languages it covers, moving toward output that reads naturally in everyday conversational contexts.
The adapted IndicTrans2 model handles casual, conversational English input more effectively without significantly compromising general-domain performance, which plain fine-tuning alone degrades (by 3.9 chrF on Hindi FLORES).
- Indic language speakers
- AI service providers targeting India
- Developers needing conversational AI in Indic languages
- Companies with operations in India
- Generic translation services with stiff outputs
Improved user experience for AI applications in Indic languages.
Increased adoption of AI tools and services in Indic-speaking regions due to better localization.
Potential acceleration of digital content creation and consumption in Indic languages, fostering local digital economies.
This signal links to a primary source. Continuum Brief monitors and indexes it as part of the live intelligence stream — we do not republish source content.
Read at arXiv cs.CL