State Backdoor: Towards Stealthy Real-world Poisoning Attack on Vision-Language-Action Model in State Space

arXiv:2601.04266v2. Abstract: Vision-Language-Action (VLA) models are widely deployed in safety-critical embodied AI applications such as robotics. However, their complex multimodal interactions also expose new security vulnerabilities. In this paper, we investigate a backdoor threat in VLA models, where malicious inputs cause targeted misbehavior while preserving performance on clean data. Existing backdoor methods predominantly rely on inserting visible triggers into visual modality, which suffer from poor robustness and low insusceptibility in real-world settings
The increasing deployment of Vision-Language-Action models in real-world, safety-critical applications like robotics highlights the urgency of understanding and mitigating novel security vulnerabilities.
This research reveals a new, stealthy attack vector ('State Backdoor') against embodied AI, threatening the reliability and safety of autonomous systems and potentially undermining public trust.
AI security can no longer focus only on visible triggers: it must also cover more sophisticated, state-dependent backdoor attacks, which demand more robust and proactive defenses in VLA model development.
- AI security researchers
- Developers of robust VLA models
- Sovereign entities developing secure AI infrastructure

- Developers of insecure VLA models
- Embodied AI applications without strong security protocols
- Users vulnerable to VLA model manipulation
This research is likely to focus near-term attention on new detection and mitigation strategies for 'State Backdoor' attacks in VLA models.
Increased investment in secure AI development will become a critical differentiator for companies and nations deploying advanced robotics and autonomous systems.
The potential for sophisticated, untraceable attacks could lead to regulatory pressure for mandatory security standards in AI, particularly within critical infrastructure and defense applications.
Read at arXiv cs.LG