
arXiv:2606.04694v1 (announce type: new)
Abstract: Small language models (SLMs) are efficient and scalable, but their multilingual capabilities degrade severely at sub-billion scales, especially for Southeast Asian (SEA) languages. We introduce DuDi, a dual-signal multilingual distillation framework that combines an online sequence-level signal with off-policy and on-policy token-level signals. DuDi further uses a cross-lingual verbalizer to refine teacher feedback and improve teacher-student transferability in multilingual settings. Experiments on SEA-HELM across multiple model families, scales,
Most AI models are built around dominant languages such as English and Chinese, so efficient cross-lingual adaptation methods are needed for underrepresented languages, including those of Southeast Asia, to ensure broader utility and equitable access.
This work matters for extending AI to a global user base, especially in linguistically diverse regions, and it could reduce the computational cost of building a separate model for each language.
The multilingual capabilities of small language models, particularly for Southeast Asian languages, can be improved through dual-signal distillation and a cross-lingual verbalizer, making these models more accessible and effective.
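To make the "dual-signal" idea concrete, here is a minimal Python sketch of a generic distillation loss that blends a token-level term with a sequence-level term. It is not DuDi's actual objective, which the abstract does not specify. The function names, the `alpha` weight, and the `seq_score` input (assumed to be a quality score in [0, 1] for a whole student output) are all hypothetical.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def token_kl(teacher_logits, student_logits):
    """KL(teacher || student) at a single token position."""
    lp = log_softmax(teacher_logits)
    lq = log_softmax(student_logits)
    return sum(math.exp(a) * (a - b) for a, b in zip(lp, lq))


def dual_signal_loss(teacher_seq, student_seq, seq_score, alpha=0.5):
    """Blend a token-level and a sequence-level signal (illustrative only).

    teacher_seq, student_seq: per-token logit lists of equal length.
    seq_score: hypothetical sequence-level quality score in [0, 1].
    alpha: hypothetical weight on the token-level term.
    """
    # Token-level signal: mean KL divergence across token positions.
    token_term = sum(
        token_kl(t, s) for t, s in zip(teacher_seq, student_seq)
    ) / len(teacher_seq)
    # Sequence-level signal: penalty that shrinks as the score improves.
    seq_term = 1.0 - seq_score
    return alpha * token_term + (1.0 - alpha) * seq_term
```

In this sketch the loss is zero only when the student matches the teacher at every token and the sequence-level score is perfect; either signal alone can still push the student when the other is already satisfied.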
- Southeast Asian language users
- Developers of small language models (SLMs)
- AI companies targeting diverse linguistic markets
- Researchers in cross-lingual NLP
- Monolingual large language models (LLMs) in niche markets
- AI development approaches that neglect linguistic diversity
Improved multilingual SLMs would make AI more accessible and useful for non-English-speaking populations, especially in Southeast Asia.
This could accelerate digital inclusion, foster local content creation, and empower economic development in linguistically diverse regions.
In the long term, it may shift the center of gravity for AI innovation toward more diverse linguistic and cultural contexts, reducing dependence on Western models.
Read at arXiv cs.CL