
arXiv:2603.09170v2 (announce type: replace-cross)
Abstract: Achieving versatile and natural whole-body humanoid interaction control remains challenging due to the high cost of whole-body teleoperation data. We present ZeroWBC, a teleoperation-free framework that learns humanoid whole-body interaction from human egocentric videos paired with synchronized whole-body motion and text annotations. ZeroWBC adopts a generation-then-tracking formulation to tackle the static scene whole-body interaction control problem. Given an initial egocentric image and a language instruction, a fine-tuned Vision-Lan
The growing availability of human egocentric video data, combined with advances in vision-language models, is enabling new approaches to humanoid control.
This research points toward more natural and versatile humanoid interaction without expensive teleoperation data, which could accelerate the development of capable robots.
Under this approach, humanoid robots learn whole-body interactions from human egocentric video paired with motion and text annotations rather than from teleoperated demonstrations, which makes training data easier to scale.
- Humanoid robotics developers
- AI model developers (Vision-Language)
- Automation sector
- Traditional teleoperation methods
More sophisticated humanoid robot capabilities in unstructured environments become feasible.
Accelerated deployment of humanoid robots into commercial and industrial settings.
Increased societal debate and policy development around human-robot interaction and integration.
Read at arXiv cs.AI