SCRIPT: Scalable Diffusion Policy with Multi-stage Training for Language-driven Physics-Based Humanoid Control

arXiv:2605.22894v1 (announce type: cross)

Abstract: Controlling physics-based humanoids from natural-language instructions is a critical step toward general-purpose embodied agents. However, existing methods remain constrained by a tension between semantic expressiveness and physical feasibility, often failing to jointly achieve faithful instruction following, high-quality motion, and stable long-horizon control. We propose SCRIPT, a scalable diffusion policy with a multi-stage training framework for language-driven physics-based humanoid control. The core of SCRIPT is a Joint Action-State-Text
Advances in diffusion models and large language models are enabling more sophisticated control of robotic systems and making human-robot interaction through natural language more feasible.
The work advances language-driven robotic control, which is crucial for building versatile, autonomous embodied agents that can understand and execute complex human instructions in physical environments.
Natural-language control of physics-based humanoids is moving from constrained, task-specific settings toward more versatile, long-horizon applications, narrowing the gap between semantic understanding and physical execution.
- Robotics research labs
- AI software developers
- Humanoid robotics manufacturers
- Logistics and manufacturing sectors
- Manufacturers of highly specialized industrial robots
- Companies relying on manual labor for complex tasks
More capable humanoid robots that follow instructions reliably move out of the lab and into practical applications.
Reduced need for highly specialized coding and robotics expertise to program complex humanoid tasks, democratizing access to advanced robotics.
Accelerated integration of humanoid robots into diverse, unstructured environments, altering workflows in service, care, and industrial settings.
Read at arXiv cs.LG