Structure-Preserving Document Translation via Multi-Stage LLM Pipeline: A Case Study in Marathi

arXiv:2606.28796v1 (announce type: cross)
Abstract: Government documents in India are predominantly issued in regional languages such as Marathi, creating substantial accessibility barriers for non-native readers, interstate administrative bodies, and policy analysts. Although recent advances in neural machine translation have improved sentence-level translation quality, existing systems largely neglect document structure, formatting integrity, and domain-specific terminology, thereby limiting their applicability to official documentation. This paper presents a structure-preserving Marathi-to-En
The proliferation of advanced LLMs and increasing digital governance initiatives are driving demand for nuanced, structure-preserving translation solutions for official documents.
Accurate, structure-preserving translation of critical government documents can significantly enhance administrative efficiency, cross-border accessibility, and policy analysis, especially in nations with high linguistic diversity.
The ability to translate complex, domain-specific government documents while maintaining structural integrity and formatting improves usability over existing sentence-level translation methods.
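To make the contrast with sentence-level translation concrete, here is a minimal sketch of the general idea: structural markers (indentation, bullets, numbering, heading marks) are set aside, only the text is sent to a translator, and the markers are reattached afterwards. This is an illustration only and not the paper's pipeline, whose stages are not described in the truncated abstract above. The marker pattern, the function names, and the `translate` callable (a stand-in for any MT or LLM call) are all assumptions.

```python
import re
from typing import Callable

# Hypothetical marker pattern (an assumption, not from the paper):
# group 1 = leading indentation, group 2 = bullet / numbering / heading marker,
# group 3 = the translatable text.
_LINE = re.compile(r"^(\s*)((?:[-*•]|\d+[.)]|#+)\s+)?(.*)$")


def translate_preserving_structure(
    document: str, translate: Callable[[str], str]
) -> str:
    """Translate line by line while leaving layout markers untouched.

    `translate` is a placeholder for any translation backend; it receives
    only the text portion of each line, never the structural prefix.
    """
    out = []
    for line in document.split("\n"):
        indent, marker, text = _LINE.match(line).groups()
        prefix = indent + (marker or "")
        # Blank lines carry structure (paragraph breaks) and are passed through.
        out.append(prefix + (translate(text) if text.strip() else text))
    return "\n".join(out)


if __name__ == "__main__":
    # Stub translator for demonstration: upper-casing stands in for translation.
    sample = "# Title\n\n1. first item\n  - nested item\nplain paragraph"
    print(translate_preserving_structure(sample, str.upper))
```

A real system would need far more than this (tables, multi-line paragraphs, domain terminology), but the sketch shows why operating on text units inside a retained layout differs from translating a flat stream of sentences.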
- Indian government
- LLM developers
- Multinational organizations operating in India
- Policy analysts
- Manual translation services for official documents
- Generic machine translation services
Increased accessibility of government services and information to a wider, multilingual population.
Potential for more streamlined inter-state and international administrative collaborations and policy harmonization.
Reduced administrative friction and potential for economic growth in regions previously hampered by linguistic barriers.
Read at arXiv cs.LG