AgriGov: A Structured Multilingual Dataset Curation for Indian Government Schemes for Farmers

arXiv:2606.08272v1. Abstract: AgriGov is a curated, trilingual (English-Hindi-Marathi) dataset designed to address the scarcity of domain-grounded multilingual resources for agricultural policies and farmer welfare schemes. Initially, we collected and structured data from 50 government schemes sourced from trusted portals using automated scraping techniques, organizing it into predefined semantic fields (e.g., title, eligibility, application process, documents, exclusions). Translations were performed using a pipeline combining Google Translate API, MarianMT, and human post
The proliferation of AI models necessitates high-quality, domain-specific multilingual datasets to bridge language barriers and ensure equitable access to government services.
The work highlights the need for structured government data in local languages, which directly affects how useful AI can be for public service delivery and for narrowing India's digital divide.
The availability of a curated, trilingual dataset for Indian agricultural policies will enable the development of more effective and accessible AI applications for farmers, potentially improving scheme uptake and welfare.
- Indian farmers
- AI developers in India
- Agricultural technology sector
- Indian government (public service delivery)
- Bureaucratic inefficiencies (gradually)
Improved understanding and access to government schemes for farmers through AI-powered interfaces.
Increased adoption of agricultural policies and welfare programs, leading to better farmer outcomes and economic stability.
The dataset could serve as a blueprint for other multilingual government data initiatives, fostering a more inclusive and AI-enabled public sector globally.
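To make the abstract's "predefined semantic fields" concrete, here is a minimal Python sketch of how one trilingual scheme record might be represented and checked for gaps. It is an illustration only, not the authors' schema: the class name `SchemeRecord`, the `scheme_id` field, the language codes, and the `missing` helper are all assumptions, and the field list covers only the examples the abstract names (it says "e.g.", so the real dataset may define more).

```python
from dataclasses import dataclass, field

# Field names follow the examples listed in the abstract; the full schema is not given there.
SEMANTIC_FIELDS = ("title", "eligibility", "application_process", "documents", "exclusions")
# ISO 639-1 codes for English, Hindi, Marathi (an assumed encoding of the three languages).
LANGUAGES = ("en", "hi", "mr")


@dataclass
class SchemeRecord:
    """Hypothetical container for one scheme: language -> semantic field -> text."""

    scheme_id: str  # hypothetical identifier, not taken from the dataset
    text: dict[str, dict[str, str]] = field(default_factory=dict)

    def missing(self) -> list[tuple[str, str]]:
        """Return (language, field) pairs that are absent or blank."""
        return [
            (lang, name)
            for lang in LANGUAGES
            for name in SEMANTIC_FIELDS
            if not self.text.get(lang, {}).get(name, "").strip()
        ]


# Placeholder content only; this is not real text from AgriGov.
record = SchemeRecord(
    scheme_id="example-scheme",
    text={"en": {name: "placeholder text" for name in SEMANTIC_FIELDS}},
)
missing = record.missing()
```

With only the English fields filled in, `missing` lists every Hindi and Marathi field, which is the kind of gap a translation step would then need to close.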
Read at arXiv cs.AI