OmniDrive: An LLM-Choreographed Multi-Agent World Model with Unified Latent Co-Compression for Multi-View Driving Video Generation

arXiv:2606.17536v1 (announce type: cross)

Abstract: Generative world models for autonomous driving face two unresolved tensions: heterogeneous control injection, where free-form language, HD-maps, trajectories, and camera poses reside in incompatible representational spaces, and post-hoc cross-view fusion, where per-camera latents fail to encode global 3-D geometry. We trace both to a single root cause: the absence of a shared symbolic interlingua aligning language, geometry, and pixels at the latent-token level. We present DRIVE-CHOREO, an LLM-choreographed multi-agent world model that recasts
Advances in large language models (LLMs) are converging with growing demand for autonomous driving, making more sophisticated driving world models feasible.
The work targets two problems named in the abstract: control inputs (language, HD maps, trajectories, camera poses) that live in incompatible representations, and per-camera latents that fail to capture global 3D geometry. Solving both would support more robust and reliable self-driving systems.
If LLM-choreographed multi-agent world models with unified latent co-compression work as proposed, they could change how autonomous driving systems perceive and interact with their environment.
- Autonomous vehicle manufacturers
- AI research institutions
- Urban logistics companies
- Sensor technology providers
- Companies with less sophisticated perception systems
- Traditional control system designers
- Manual driving instructors
- Insurance companies (long-term risk models)
More accurate and reliable autonomous driving systems are likely to emerge.
This could accelerate the adoption of Level 4 and Level 5 autonomous vehicles in diverse environments.
Urban planning and infrastructure development may adapt to a future with widespread autonomous mobility, potentially reducing traffic congestion and improving safety.
Read at arXiv cs.AI