
arXiv:2605.27358v1. Abstract: Mixture-of-Experts (MoE) has become the de facto architecture for hundred-billion-parameter language models, yet its advantages at sub-billion scales for on-device deployment remain largely unexplored. To close this gap, we present MobileMoE, a family of on-device MoE language models with sub-billion active parameters (0.3-0.9B active and 1.3-5.3B total) that establish a new Pareto frontier for on-device LLMs. We first formulate an on-device MoE scaling law that jointly optimizes MoE architecture under mobile memory and compute constraints, ident
The push for more efficient and capable AI models on edge devices is intensifying, driven by hardware advancements and increasing demand for localized processing.
If the reported results hold, models of this size could run advanced AI capabilities directly on mobile devices, reducing reliance on cloud infrastructure and improving privacy and responsiveness.
The abstract's central claim is that MoE architectures can pay off at sub-billion active-parameter scales, with MobileMoE reported to set a new Pareto frontier for on-device LLMs.
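The distinction between active and total parameters in an MoE model can be made concrete with a little arithmetic. The sketch below is illustrative only: the `MoEConfig` class, its field names, and every number in it are hypothetical and are not taken from the MobileMoE paper. It assumes each expert is a plain two-matrix feed-forward block without biases, and lumps attention, embeddings, norms, and routers into a single always-active `shared_params` figure.

```python
from dataclasses import dataclass


@dataclass
class MoEConfig:
    # All values are hypothetical, chosen only for illustration.
    num_layers: int     # number of MoE layers
    d_model: int        # model (residual stream) width
    d_ff: int           # hidden width of each expert FFN
    num_experts: int    # experts per MoE layer
    top_k: int          # experts activated per token
    shared_params: int  # always-active weights (attention, embeddings, ...)


def expert_params(cfg: MoEConfig) -> int:
    """Parameters in one expert: up-projection plus down-projection, no biases."""
    return 2 * cfg.d_model * cfg.d_ff


def total_params(cfg: MoEConfig) -> int:
    """All weights that must be stored, which drives the memory footprint."""
    return cfg.shared_params + cfg.num_layers * cfg.num_experts * expert_params(cfg)


def active_params(cfg: MoEConfig) -> int:
    """Weights used for a single token, which drives per-token compute."""
    return cfg.shared_params + cfg.num_layers * cfg.top_k * expert_params(cfg)


# Hypothetical configuration, not the paper's.
cfg = MoEConfig(
    num_layers=16,
    d_model=1024,
    d_ff=2048,
    num_experts=16,
    top_k=2,
    shared_params=200_000_000,
)

print(f"total:  {total_params(cfg) / 1e9:.2f}B")   # total:  1.27B
print(f"active: {active_params(cfg) / 1e9:.2f}B")  # active: 0.33B
```

Under these assumptions, only `top_k` of the `num_experts` experts run for each token, so per-token compute tracks the active count while storage tracks the total. This is why an on-device design has to weigh both numbers against memory and compute limits.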
- Smartphone manufacturers
- On-device AI developers
- Consumers of AI-powered mobile applications
- Edge computing hardware providers
- Cloud-centric AI model providers
- Companies relying solely on large, centralized models for mobile applications
On-device AI models become more powerful and efficient, enabling new classes of mobile applications.
Increased adoption of edge AI could shift data processing away from large data centers, impacting cloud infrastructure demand.
The proliferation of highly capable on-device AI might accelerate the development of personalized, context-aware AI agents on mobile platforms.
Read at arXiv cs.LG