
arXiv:2606.14238v1 (announce type: cross)
Abstract: Safety certification of Vision-Language-Action (VLA) driving planners under ISO 21448 (SOTIF) rests on an Operational Design Domain (ODD) specification that answers two complementary questions: when does the planner start to fail, and how severely does it fail once it does? We evaluate Alpamayo R1, a 10B-parameter open-weight driving VLA, on 15,968 (clip, attack) pairs. We find a conservative-aggregate gap: an aggregate safe threshold of $\sigma \leq 50$ under a 15% average displacement error (ADE) budget masks well-sampled scenarios that toler
The increasing complexity and deployment of Vision-Language-Action (VLA) models in critical applications like autonomous driving necessitate rigorous and nuanced safety certification before widespread adoption.
This research provides a framework for characterizing when VLA driving planners begin to fail and how severely they fail in specific driving scenarios, which is crucial for establishing trust and regulatory pathways for AI-driven autonomous systems.
The focus shifts from broad safety thresholds to scenario-specific safety envelopes, allowing for more precise risk assessment and targeted mitigation strategies for VLA-powered autonomous vehicles.
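The difference between a single aggregate threshold and scenario-specific envelopes can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the scenario names, the perturbation levels, and the degradation numbers are synthetic and are not taken from the paper; the 15% ADE budget is read as a relative increase in ADE over an unperturbed baseline; and the aggregate curve is taken to be the plain mean across scenarios. The helper names `safe_threshold` and `aggregate_curve` are hypothetical.

```python
from typing import Dict, List

ADE_BUDGET = 0.15  # assumed reading: at most a 15% relative ADE increase


def safe_threshold(sigmas: List[float], rel_increase: List[float],
                   budget: float = ADE_BUDGET) -> float:
    """Largest sigma such that every level up to and including it stays within budget."""
    best = 0.0
    for sigma, inc in zip(sigmas, rel_increase):
        if inc > budget:
            break  # first violation ends the safe range
        best = sigma
    return best


def aggregate_curve(curves: Dict[str, List[float]]) -> List[float]:
    """Mean relative ADE increase across scenarios at each perturbation level."""
    n = len(curves)
    return [sum(col) / n for col in zip(*curves.values())]


# Synthetic perturbation levels and per-scenario degradation curves (toy data).
SIGMAS = [10, 25, 50, 100, 200]
SCENARIOS = {
    "highway_cruise":     [0.01, 0.02, 0.04, 0.09, 0.30],
    "urban_intersection": [0.03, 0.08, 0.14, 0.40, 0.90],
    "night_rain":         [0.05, 0.10, 0.13, 0.55, 1.20],
}

per_scenario = {name: safe_threshold(SIGMAS, curve)
                for name, curve in SCENARIOS.items()}
aggregate = safe_threshold(SIGMAS, aggregate_curve(SCENARIOS))

# Gap between each scenario's own envelope and the aggregate threshold.
gaps = {name: thr - aggregate for name, thr in per_scenario.items()}

if __name__ == "__main__":
    print("aggregate threshold:", aggregate)
    for name, thr in per_scenario.items():
        print(f"{name}: threshold={thr}, gap={gaps[name]}")
```

In this toy data the aggregate threshold is set by the scenarios that degrade fastest, so a scenario with a flatter degradation curve ends up with a positive gap: its own envelope is wider than the aggregate number suggests.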
- Autonomous vehicle developers
- AI safety researchers
- Regulatory bodies (ISO 21448)
- AI model auditing firms
- Companies relying on blanket safety assumptions
- Developers ignoring scenario-specific risks
- Traditional, less granular safety assessment methodologies
Improved safety and reliability standards for autonomous driving systems powered by VLAs.
Accelerated development and adoption of safer autonomous vehicles, potentially reducing accidents due to human error.
Enhanced public trust in AI-driven mobility solutions, paving the way for broader integration of AI into other safety-critical infrastructure.