
arXiv:2606.02806v1 Announce Type: new Abstract: We introduce Padyam2Gadyam, a dataset for the task of poem-to-prose translation from 13th-17th century Telugu classical poetry to contemporary Telugu and English prose. The dataset consists of 600 poems and their human-verified Telugu and English prose translations. We evaluated 5 contemporary Large Language Models (LLMs) on their ability to do poem-to-prose translation into Telugu and English. Our results indicate that while there are differences across LLMs, their overall performance leaves substantial room for improvement in both languages. Through
The proliferation of powerful LLMs is driving the need for new, diverse datasets to evaluate and improve their capabilities beyond mainstream use cases.
This development highlights the ongoing challenges and opportunities in advanced natural language processing, particularly for culturally specific and historically nuanced content.
Padyam2Gadyam provides a new benchmark for evaluating how well LLMs handle poem-to-prose translation across both register and era, especially for lower-resourced languages such as Telugu.
- AI researchers
- Linguists
- Cultural preservation initiatives
- Developers of less common language models
- Monolingual content creators
- LLMs lacking cultural and linguistic diversity
- Users relying solely on English-centric AI tools
The new dataset will facilitate targeted improvements in LLM performance for translating classical poetry into modern prose.
Improved translation of classical texts could lead to greater accessibility and appreciation of diverse cultural heritage globally.
This could inspire similar dataset creation efforts for other niche and under-represented linguistic and cultural domains, broadening AI's scope.
Read at arXiv cs.CL