Why Can't I Open My Drawer? Mitigating Object-Driven Shortcuts in Zero-Shot Compositional Action Recognition

arXiv:2601.16211v3. Abstract: Zero-Shot Compositional Action Recognition (ZS-CAR) requires recognizing novel verb-object combinations composed of previously observed primitives. In this work, we tackle a key failure mode: models predict verbs via object-driven shortcuts (i.e., relying on the labeled object class) rather than temporal evidence. We argue that sparse compositional supervision and verb-object learning asymmetry can promote object-driven shortcut learning. Our analysis with proposed diagnostic metrics shows that existing methods overfit to training co-oc
The proliferation of advanced AI systems and their application in complex human-like tasks necessitates a deeper understanding of their failure modes and biases.
Improving the robustness and reliability of compositional action recognition is critical for the safe and effective deployment of AI in robotics, autonomous systems, and human-computer interaction.
This research provides a diagnostic framework and highlights a specific failure mode, object-driven shortcuts, in how models interpret complex actions, which can inform the development of more robust models.
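To make the failure mode concrete, here is a minimal toy sketch of an object-driven shortcut: a predictor that guesses the verb purely from how often it co-occurred with the object label in training. It is an illustration only, not the paper's method or its diagnostic metrics. The training pairs and the names `TRAIN_PAIRS`, `fit_object_prior`, `shortcut_verb`, and `verb_accuracy` are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical toy training labels as (verb, object) pairs; not data from the paper.
# "open drawer" never appears, although "open" and "drawer" are both seen.
TRAIN_PAIRS = [
    ("close", "drawer"), ("close", "drawer"), ("close", "drawer"),
    ("open", "door"), ("open", "door"),
    ("close", "door"),
]


def fit_object_prior(pairs):
    """Count how often each verb co-occurs with each object in training."""
    counts = defaultdict(Counter)
    for verb, obj in pairs:
        counts[obj][verb] += 1
    return counts


def shortcut_verb(counts, obj):
    """Predict the verb from the object label alone, ignoring any video evidence.

    Assumes the object was seen in training, as in the ZS-CAR setting.
    """
    return counts[obj].most_common(1)[0][0]


def verb_accuracy(counts, samples):
    """Fraction of samples whose verb the object-only shortcut gets right."""
    hits = sum(shortcut_verb(counts, obj) == verb for verb, obj in samples)
    return hits / len(samples)


counts = fit_object_prior(TRAIN_PAIRS)

seen = [("close", "drawer"), ("open", "door")]  # compositions present in training
unseen = [("open", "drawer")]                   # novel composition of seen primitives

seen_acc = verb_accuracy(counts, seen)
unseen_acc = verb_accuracy(counts, unseen)
print(f"seen verb accuracy:   {seen_acc:.2f}")
print(f"unseen verb accuracy: {unseen_acc:.2f}")
```

On this toy data the shortcut looks perfect on seen compositions and fails on the unseen one, because a drawer always came with "close" during training. A model that actually used temporal evidence would not be tied to that co-occurrence.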
- AI researchers
- Robotics developers
- Autonomous vehicle manufacturers
- AI safety researchers
- AI systems prone to shortcuts
- Developers relying on brittle AI models
More accurate and reliable AI systems for understanding and executing complex tasks will emerge.
This will accelerate the deployment of AI in high-stakes environments where misinterpretation is costly, such as elder care or advanced manufacturing.
Increased trust in AI's ability to interpret human intent could broaden the societal integration of AI and open up new human-AI interaction paradigms.
Read at arXiv cs.AI