
arXiv:2605.03065v2 (announce type: replace)
Abstract: Generative control policies (GCPs), such as diffusion- and flow-based control policies, have emerged as effective parameterizations for robot learning. This work introduces Off-policy Generative Policy Optimization (OGPO), a sample-efficient algorithm for finetuning GCPs that maintains off-policy critic networks to maximize data reuse and propagate policy gradients through the full generative process of the policy via a modified PPO objective, using critics as the terminal reward. OGPO achieves state-of-the-art performance on manipulation tas
The proliferation of generative AI models is naturally extending to physical control applications, making sample efficiency a critical bottleneck for real-world robotics.
If the reported results hold, OGPO offers a more sample-efficient path to deploying generative control policies on robots, which could accelerate the development of capable autonomous systems.
Robot learning, especially for complex manipulation tasks, could advance more rapidly given more sample-efficient finetuning of generative control policies.
- Robotics companies
- Automation sector
- AI research institutions
- Companies reliant on traditional, less efficient robot training methods
Faster and more robust training of robotic systems using generative policies becomes feasible.
Accelerated deployment of advanced automation in industries requiring fine manipulation and adaptive control.
Enhanced capabilities of general-purpose robots could lead to broader economic and societal integration of autonomous physical agents.
Read at arXiv cs.LG