
arXiv:2605.27379v1. Abstract: We present Soro, a family of Tajik-specialized conversational large language models (LLMs) designed for real-world deployment under tight compute and connectivity constraints in Tajikistan. Starting from open-weight Gemma 3 checkpoints, we perform Tajik-only continual pretraining on a curated 1.9-billion-token corpus spanning filtered web text, PDF documents, and curriculum-aligned educational materials, followed by supervised instruction tuning on 40K Tajik teacher-style examples. To enable rigorous evaluation despite the limited coverage of Taj
The proliferation of open-weight foundation models and growing global awareness of AI's strategic importance enable smaller nations like Tajikistan to develop localized AI capabilities.
This initiative reflects a growing trend of nations developing language-specific AI models, which reduces reliance on technologies aligned with major powers and helps protect linguistic and cultural specificity.
A lightweight LLM adapted for Tajik could empower local innovation, improve digital inclusion, and give the country greater control over domestic AI deployment.
- Tajikistan (government, populace)
- Open-source AI community
- Linguistic minorities
- Developers in emerging markets
- Monopolistic global AI providers
- Language-agnostic AI deployment strategies
Tajikistan gains an AI chatbot tailored to its language and local needs, potentially improving access to information and services.
If it succeeds, this effort could inspire other smaller nations or linguistic groups to pursue similar sovereign AI strategies, which would decentralize AI development.
The global AI landscape might fragment as nation-specific or locale-specific models become common, creating both interoperability challenges and new opportunities.
Read at arXiv cs.AI