
arXiv:2605.23163v1 (Announce Type: new)

Abstract: End-to-end autonomous driving via Vision-Language-Action (VLA) models demands a precarious balance between high-fidelity trajectory planning and efficient inference. Existing paradigms typically fall short: autoregressive (AR) VLAs are memory-bandwidth-bound on edge hardware and prone to exposure-bias drift, while full-sequence diffusion models preclude KV-cache reuse and suffer from "logical leakage" that violates the fundamental perceive-then-plan causality. We present Fast-dDrive, a block-diffusion VLA that performs bidirectional refinement wi
As AI and robotics push autonomous systems toward deployment on edge hardware, efficient VLA models for autonomous driving are a timely development.
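Fast-dDrive targets the limitations the abstract names in existing VLA models: autoregressive VLAs are memory-bandwidth-bound and prone to exposure-bias drift, while full-sequence diffusion models cannot reuse the KV cache and suffer from "logical leakage". Addressing these would make autonomous driving systems more reliable and efficient, which matters for both safety and widespread adoption.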
Bidirectional refinement in a block-diffusion VLA could change how VLA models are applied in autonomous systems, allowing more sophisticated and less error-prone planning.
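The contrast between full-sequence diffusion and a block-based scheme can be illustrated with an attention mask. What follows is a minimal sketch, assuming the block-causal pattern commonly used in block-diffusion language models: positions attend bidirectionally inside their own block and only backward to earlier blocks. The truncated abstract does not confirm that Fast-dDrive uses exactly this mask, and the names `block_causal_mask`, `render`, and the sizes below are hypothetical.

```python
def block_causal_mask(seq_len: int, block_size: int) -> list[list[bool]]:
    """Return mask[i][j], True when position i may attend to position j.

    Assumption (not taken from the source): attention is bidirectional
    within a block and causal across blocks.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [
        # j is visible to i when j's block is i's block or an earlier one.
        [(j // block_size) <= (i // block_size) for j in range(seq_len)]
        for i in range(seq_len)
    ]


def render(mask: list[list[bool]]) -> str:
    """Draw the mask with 'x' for allowed attention and '.' for blocked."""
    return "\n".join("".join("x" if v else "." for v in row) for row in mask)


if __name__ == "__main__":
    # Six positions in blocks of two (hypothetical sizes).
    print(render(block_causal_mask(6, 2)))
```

Under this assumption, earlier blocks never see later ones, so their keys and values could in principle be cached once finished, while positions inside the current block can still refine each other in both directions.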
- Autonomous vehicle developers
- AI hardware manufacturers
- Logistics and transportation companies
- Consumers of autonomous services
- Companies relying on less efficient VLA architectures
- Traditional human-driven transport services
More robust and reliable autonomous driving systems could become feasible, which would accelerate deployment.
Increased adoption of autonomous vehicles could disrupt transportation logistics and urban planning.
Reduced human error in driving could lead to significantly lower accident rates and to changes in how insurance is priced.
Read at arXiv cs.CL