LIBERO-Occ: Evaluating and Improving Vision-Language-Action Models under Scene-Induced Occlusion via Viewpoint Imagination

arXiv:2606.10862v1 Announce Type: cross Abstract: Vision-Language-Action (VLA) models achieve strong performance on standard manipulation benchmarks, but most evaluations assume that task-relevant objects are fully visible. This assumption often fails in realistic settings, where occlusion makes manipulation partially observable. In this paper, we study scene-induced occlusion as a fundamental challenge for VLA models and introduce LIBERO-Occ, an occlusion-oriented extension of LIBERO. Experiments show that state-of-the-art VLAs suffer substantial performance degradation unde
The paper highlights a critical limitation of current Vision-Language-Action (VLA) models: their performance degrades under scene-induced occlusion, a problem that becomes more apparent as these models move from benchmarks to real-world applications.
The research identifies a fundamental obstacle to the robust deployment of VLA models in complex, unstructured environments and motivates more occlusion-resilient systems.
The focus for VLA model development is shifting to address partial observability and occlusion handling, moving beyond idealized benchmarks to more realistic performance criteria.
- AI hardware manufacturers
- Robotics companies
- Research institutions
- Software developers
- VLA models without occlusion handling
- Unsupervised robotics applications
- Companies relying on idealized robotics environments
LIBERO-Occ provides a new benchmark for evaluating and improving VLA models under occluded conditions.
Improved VLA models could accelerate the deployment of autonomous systems in diverse and challenging real-world scenarios, particularly in logistics and manufacturing.
Enhanced robotic perception and manipulation capabilities could contribute to more generalized AI agents, reducing the need for human intervention in physical tasks.