Capability and Robustness Cannot Both Be Free: An Information-Theoretic Bound for Vision-Language-Action Models

arXiv:2605.25889v1 (announce type: cross)

Abstract: Vision-Language-Action (VLA) models are increasingly deployed on real robots, where each predicted action is executed and each failure carries a safety cost. They reach high success rates on clean inputs but collapse under small adversarial perturbations. A $16/255$ PGD attack on OpenVLA-7B drops LIBERO success from above $95\%$ to under $5\%$. Empirical defenses recover some robustness at a cost in clean accuracy, but the literature does not say whether the trade-off has a theoretical floor. We prove that it does. For any VLA policy with discr
As Vision-Language-Action models are increasingly deployed on real robots, understanding their fundamental limitations, especially in robustness, has become urgent.
If capability and robustness cannot both be maximized in a VLA model, every practical deployment involves a trade-off between them, with consequences for safety, reliability, and deployment timelines.
This research reframes VLA design: rather than aiming for perfect capability and perfect robustness at once, designers must deliberately manage the trade-off between the two, particularly in safety-critical applications.
- AI safety researchers
- Developers of specialized robust control systems
- Manufacturers of resilient robotic hardware
- Developers neglecting adversarial robustness
- Companies deploying VLA models without robust testing
- Users expecting infallible robot performance
Further research will likely focus on optimizing this capability-robustness trade-off and on architectures that operate closer to the theoretical bound.
Regulatory bodies might develop new certification standards for VLA systems that explicitly account for this trade-off, particularly in high-stakes robotic applications.
This fundamental limitation could accelerate the development of hybrid human-AI systems where humans compensate for AI's robustness failures, especially in unstructured or adversarial environments.
Read at arXiv cs.LG