
arXiv:2606.06038v1. Abstract: We study English-to-Prakrit machine translation in a low-resource setting where the target language is unsupported by IndicTrans2. We adapt the multilingual model by mapping Prakrit to the Hindi language tag (hin_Deva) without modifying the tokenizer, vocabulary, or architecture. Using a 1,474-pair Maharashtri Prakrit parallel corpus and evaluation on a 20-sample Ardhamagadhi test set, we report corpus BLEU improvements over an untuned baseline. The results indicate that script-compatible language routing can enable feasible transfer to unsupport
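Corpus BLEU pools n-gram counts over all test segments before computing precisions, rather than averaging per-sentence scores. The sketch below is an unsmoothed, single-reference version with whitespace tokenization. These choices, and the function names, are assumptions made for illustration: the abstract does not state which tool or tokenizer produced the reported scores.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Unsmoothed corpus BLEU (0-100) with one reference per hypothesis.

    Assumption: whitespace tokenization, which may differ from the
    tokenization used in the paper.
    """
    matches = [0] * max_n   # clipped n-gram matches, pooled over the corpus
    totals = [0] * max_n    # hypothesis n-gram counts, pooled over the corpus
    hyp_len = ref_len = 0

    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += sum(h_counts.values())

    # Without smoothing, any order with zero matches gives a score of zero.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0

    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity_penalty = 1.0 if hyp_len >= ref_len else math.exp(1 - ref_len / hyp_len)
    return 100.0 * brevity_penalty * math.exp(log_precision)
```

With only 20 test segments, a single changed translation can move the pooled score noticeably, so small differences between systems should be read with caution.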
This research explores a practical approach to machine translation for low-resource languages, showing how an existing multilingual model can be adapted without changes to its tokenizer, vocabulary, or architecture.
It offers a possible template for extending machine translation to more languages, particularly ancient or lesser-used ones that share a script with a supported language, which may support digital inclusivity and cultural preservation.
The key capability is adapting an existing multilingual model such as IndicTrans2 to an unsupported low-resource language by routing it through a script-compatible language tag and tuning on a small parallel corpus (1,474 pairs here), rather than training a new model from scratch.
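The tag re-routing idea can be sketched as follows. This is a minimal illustration, not the authors' code: the `pra_Deva` label, the `TAG_ROUTES` and `SUPPORTED_TAGS` tables, the helper names, and the space-separated "source tag, target tag, sentence" input layout are all assumptions made for the example.

```python
# Hypothetical routing table: an unsupported language label mapped to a
# supported tag that shares its script. "pra_Deva" is an illustrative label,
# not an official code; "hin_Deva" is the tag named in the abstract.
TAG_ROUTES = {"pra_Deva": "hin_Deva"}

# Tiny illustrative subset of tags the model already knows.
SUPPORTED_TAGS = {"eng_Latn", "hin_Deva"}


def route_tag(lang: str) -> str:
    """Return a tag the model supports, re-routing unsupported labels."""
    tag = TAG_ROUTES.get(lang, lang)
    if tag not in SUPPORTED_TAGS:
        raise ValueError(f"no supported tag for {lang!r}")
    return tag


def tag_source(sentence: str, src_lang: str, tgt_lang: str) -> str:
    """Prefix a source sentence with routed language tags.

    Assumed input layout: "<src_tag> <tgt_tag> <sentence>".
    The tokenizer and vocabulary are left untouched; only the tag changes.
    """
    return f"{route_tag(src_lang)} {route_tag(tgt_lang)} {sentence.strip()}"


def build_training_inputs(pairs, src_lang="eng_Latn", tgt_lang="pra_Deva"):
    """Turn (english, prakrit) pairs into (tagged_source, target) examples."""
    return [(tag_source(src, src_lang, tgt_lang), tgt) for src, tgt in pairs]
```

For example, `tag_source("Hello", "eng_Latn", "pra_Deva")` yields `"eng_Latn hin_Deva Hello"`, so Prakrit targets are presented to the model under the Hindi tag during tuning and inference.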
- Linguists and researchers of ancient languages
- Developers of multilingual AI platforms
- Low-resource language communities
- Academic AI research
- Creators of bespoke, from-scratch models for every new language
Increased accessibility and utility of machine translation for historically under-represented languages.
Potential for further research into script-compatible language routing and other transfer learning techniques for similar language groups.
Long-term preservation and digitization efforts for linguistic heritage, potentially impacting cultural or historical studies through automated translation tools.
Read at arXiv cs.CL