How VLAs Fail Differently: Black-Box Action Monitoring Reveals Architecture-Specific Failure Signatures

arXiv:2605.28726v1. Abstract: We discover that VLA architectures fail in fundamentally different, predictable ways at the motor-command level. Running VQ-BeT, Diffusion Policy, and ACT on identical evaluation protocols (n=450 episodes across PushT and ALOHA 14-DOF bimanual manipulation), we find: (1) direction reversal rate is a universal failure predictor across all three architectures (AUROC=0.93, 0.79, 0.91; p<0.001); (2) jerk monitoring is predictive only for discrete-token architectures, following a discrete-to-continuous gradient (0.88, 0.69, 0.41); (3) velocity viola
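The abstract names direction reversal rate and jerk as black-box signals but this excerpt does not define them. The sketch below shows one plausible way to compute both from a logged action sequence. It assumes a reversal is a negative dot product between consecutive action deltas and that jerk is the third finite difference of the action trajectory. The function names, the `dt` parameter, and both definitions are illustrative assumptions, not the paper's method.

```python
import math

def deltas(actions):
    """First differences between consecutive action vectors."""
    return [[b - a for a, b in zip(prev, cur)]
            for prev, cur in zip(actions, actions[1:])]

def direction_reversal_rate(actions):
    """Fraction of consecutive delta pairs pointing in opposing directions.

    Assumption: a "reversal" is a negative dot product between two
    consecutive action deltas. The source does not give its definition.
    """
    d = deltas(actions)
    pairs = list(zip(d, d[1:]))
    if not pairs:
        return 0.0
    reversals = sum(1 for u, v in pairs
                    if sum(x * y for x, y in zip(u, v)) < 0)
    return reversals / len(pairs)

def mean_jerk(actions, dt=1.0):
    """Mean Euclidean norm of the third finite difference, scaled by dt**3.

    Assumption: actions behave like positions sampled every dt seconds.
    """
    third = deltas(deltas(deltas(actions)))
    if not third:
        return 0.0
    total = sum(math.sqrt(sum(x * x for x in j)) for j in third)
    return total / (len(third) * dt ** 3)

# Hypothetical trajectories: a straight push and a one-dimensional zigzag.
smooth = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0]]
zigzag = [[0.0], [1.0], [0.0], [1.0], [0.0]]

print(direction_reversal_rate(smooth), mean_jerk(smooth))  # 0.0 0.0
print(direction_reversal_rate(zigzag), mean_jerk(zigzag))  # 1.0 4.0
```

Both functions read only the emitted actions, which is what makes this kind of monitoring black-box: no model weights or internal activations are needed.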
This research compares how three robot policy architectures (VQ-BeT, Diffusion Policy, and ACT) fail under identical evaluation protocols, which matters as AI agents are integrated into more complex physical systems.
Knowing that different architectures fail in different, predictable ways lets developers match monitoring to the architecture they deploy, which bears directly on reliability and safety.
Because the failure signatures are read from motor commands alone, a monitor can flag likely failures without access to model internals, giving operators a practical diagnostic for deployed systems.
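The AUROC figures quoted in the abstract measure how well a per-episode score separates failed episodes from successful ones. As a minimal sketch of that evaluation, the function below computes AUROC as the probability that a randomly chosen failed episode scores higher than a randomly chosen successful one, counting ties as half. The function name and the sample scores are hypothetical and are not taken from the paper.

```python
def auroc(scores, failed):
    """AUROC of `scores` as a predictor of the boolean labels in `failed`.

    Computed pairwise: the fraction of (failed, successful) episode pairs
    in which the failed episode has the higher score; ties count as 0.5.
    """
    pos = [s for s, f in zip(scores, failed) if f]
    neg = [s for s, f in zip(scores, failed) if not f]
    if not pos or not neg:
        raise ValueError("need at least one failed and one successful episode")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical per-episode monitor scores (e.g. a reversal rate) and outcomes.
scores = [0.1, 0.8, 0.2, 0.9]
failed = [False, False, True, True]
print(auroc(scores, failed))  # 0.75
```

A value of 0.5 means the score carries no information about failure, and 1.0 means it separates the two groups perfectly. The pairwise loop is quadratic in the number of episodes, which is acceptable at the scale of a few hundred episodes.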
Likely to benefit:
- Robotics developers
- AI safety researchers
- Automation industries
- AI agent providers
Likely to lose out:
- Developers ignoring failure modes
- Unreliable AI-driven hardware
Improved reliability and faster deployment cycles for robotic systems using VLAs due to better failure prediction.
Reduced operational costs and increased safety in sectors adopting advanced automation, accelerating their integration.
Enhanced public trust and regulatory acceptance of AI-driven robotics in critical applications, potentially broadening their societal impact.
Read at arXiv cs.LG