Zero-Shot Cross-City Generalization in End-to-End Autonomous Driving: Self-Supervised versus Supervised Representations

arXiv:2603.11417v2. Abstract: End-to-end autonomous driving models are typically trained on multi-city datasets using supervised ImageNet-pretrained backbones, yet their ability to generalize to unseen cities remains largely unexamined. When training and evaluation data are geographically mixed, models may implicitly rely on city-specific cues, masking failure modes that would occur under real-world domain shifts when generalizing to new locations. In this work, we formulate zero-shot cross-city transfer as a controlled representation-level stress test for end-to-en
As end-to-end autonomous driving models proliferate, robust generalization beyond the training environment becomes critical, which makes cross-city transfer a timely research focus.
Zero-shot cross-city generalization is fundamental to the widespread deployment and safety of autonomous vehicles because it reduces the need for costly, time-consuming localized retraining.
The focus now shifts toward more resilient, broadly applicable autonomous driving models that can operate effectively in entirely new urban environments without prior exposure.
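The contrast the abstract draws between geographically mixed evaluation and zero-shot transfer to an unseen city comes down to how samples are split. The following is a minimal sketch of the two splitting strategies, not the paper's actual protocol: the sample layout, the function names, and the placeholder city labels (`city_a`, `city_b`, `city_c`) are all assumptions made for illustration.

```python
import random


def mixed_split(samples, test_fraction=0.2, seed=0):
    """Geographically mixed split: any city may land on both sides."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def city_held_out_split(samples, held_out_city):
    """Zero-shot split: the held-out city never appears in training."""
    train = [s for s in samples if s["city"] != held_out_city]
    test = [s for s in samples if s["city"] == held_out_city]
    return train, test


def cities(samples):
    """Return the set of city labels present in a list of samples."""
    return {s["city"] for s in samples}


# Hypothetical toy dataset: 10 frames from each of three placeholder cities.
samples = [
    {"city": city, "frame_id": i}
    for city in ("city_a", "city_b", "city_c")
    for i in range(10)
]

mixed_train, mixed_test = mixed_split(samples)
train, test = city_held_out_split(samples, "city_c")

# In the held-out split the train and test city sets are disjoint by construction.
print("held-out train cities:", sorted(cities(train)))
print("held-out test cities:", sorted(cities(test)))
```

Under the mixed split, a model can score well while leaning on city-specific cues, because the evaluation cities were also seen during training. The held-out split removes that shortcut, which is the failure mode the abstract says mixed evaluation can mask.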
- Autonomous vehicle developers
- Smart city infrastructure providers
- AI research in robust generalization
- Companies relying on hyper-localized autonomous driving solutions
- Traditional mapping and data collection services
Autonomous driving deployments become significantly faster and less resource-intensive.
Reduced barriers to entry for AV companies in new markets, intensifying competition.
Generalizable models accelerate the standardization of autonomous driving AI architectures across diverse geographies.
Read at arXiv cs.LG