RepetitionCurse: Measuring and Understanding Router Imbalance in Mixture-of-Experts LLMs under DoS Stress

arXiv:2512.23995v2. Abstract: Mixture-of-Experts architectures have become the standard for scaling large language models due to their superior parameter efficiency. To accommodate the growing number of experts in practice, modern inference systems commonly adopt expert parallelism to distribute experts across devices. However, the absence of explicit load balancing constraints during inference allows adversarial inputs to trigger severe routing concentration. We demonstrate that out-of-distribution prompts can manipulate the routing strategy such that all tokens ar
As Mixture-of-Experts (MoE) architectures are increasingly adopted in Large Language Models (LLMs), their vulnerability to Denial-of-Service (DoS) attacks that target routing is a timely finding.
This research reveals a significant security and performance vulnerability in a foundational AI architecture, which could impact the reliability and cost-efficiency of large-scale AI deployments.
The paper demonstrates that adversarial inputs can manipulate MoE routing, producing severe load imbalance that can serve as a DoS attack. This calls for new approaches to model robustness and inference system security.
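To make "routing concentration" concrete, here is a minimal sketch of how load imbalance could be measured for a toy top-k router. Everything in it is an assumption for illustration, not the paper's method: the random linear router, the sizes (`NUM_EXPERTS`, `DIM`, `TOP_K`, `NUM_TOKENS`), the max-over-mean imbalance ratio, and the use of identical token vectors as a stand-in for an adversarial prompt. In a real model, repeated tokens would not necessarily have identical hidden states.

```python
import random
from collections import Counter


def top_k_experts(token, router_weights, k):
    """Return the indices of the k experts with the highest router logit."""
    logits = [sum(w * x for w, x in zip(row, token)) for row in router_weights]
    return sorted(range(len(logits)), key=lambda e: logits[e], reverse=True)[:k]


def expert_loads(tokens, router_weights, k):
    """Count how many token assignments each expert receives."""
    loads = Counter()
    for token in tokens:
        loads.update(top_k_experts(token, router_weights, k))
    return [loads[e] for e in range(len(router_weights))]


def imbalance(loads):
    """Max load divided by mean load; 1.0 means perfectly balanced."""
    mean = sum(loads) / len(loads)
    return max(loads) / mean


# Hypothetical toy sizes, chosen only for illustration.
rng = random.Random(0)
NUM_EXPERTS, DIM, TOP_K, NUM_TOKENS = 8, 16, 2, 64

# A random linear router: one weight row per expert.
router_weights = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_EXPERTS)]

# Benign stand-in: every token has a different random vector.
varied = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_TOKENS)]

# Adversarial stand-in (assumption): every token has the same vector,
# so every token selects the same TOP_K experts.
one_token = [rng.gauss(0, 1) for _ in range(DIM)]
repeated = [one_token] * NUM_TOKENS

varied_loads = expert_loads(varied, router_weights, TOP_K)
repeated_loads = expert_loads(repeated, router_weights, TOP_K)

varied_ratio = imbalance(varied_loads)
repeated_ratio = imbalance(repeated_loads)
print("varied input imbalance:  ", varied_ratio)
print("repeated input imbalance:", repeated_ratio)
```

For the identical-token input, the ratio reaches its upper bound of `NUM_EXPERTS / TOP_K` (4.0 with these sizes), because only `TOP_K` experts receive any work. Under expert parallelism, the devices hosting those experts would carry the whole batch while the rest sit idle, which is the kind of concentration the abstract describes.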
- AI security researchers
- Companies developing robust MoE load balancing solutions
- Organizations prioritizing AI model resilience
- Developers relying on current MoE routing without explicit load balancing
- Users vulnerable to prompt-based DoS attacks
- LLM providers with unmitigated MoE vulnerabilities
Immediate efforts will focus on developing and implementing more robust routing and load-balancing mechanisms for MoE architectures.
The discovery could lead to a re-evaluation of MoE security postures and potential short-term delays in large-scale MoE deployments for critical applications.
This vulnerability might prompt a broader industry focus on 'adversarial robustness' not just in model outputs, but also in internal architectural mechanisms and resource allocation.
Read at arXiv cs.LG