Behavior Uncloning: Distilling Mode Redirection into Policy Weights without Inference-Time Steering

arXiv:2606.29201v1 (announce type: cross)
Abstract: Behavior-cloned policies often learn multiple behavior modes from demonstration datasets, including modes that are unsafe or otherwise undesired at deployment. For example, a policy trained on diverse handover demonstrations may learn to pass a knife blade-first. Standard remedies such as data curation and inference-time steering either require access to the original demonstrations for full retraining or add substantial inference-time overhead. To address this gap, we propose MoRE (Mode Redirection), which redirects policy rollouts toward desire
The proliferation of behavior-cloned policies in real-world applications highlights the urgent need to refine and control their potentially undesirable behaviors without complex retraining or performance compromises.
This development allows for safer and more robust deployment of AI policies by efficiently removing problematic behaviors, thus accelerating the adoption of autonomous systems in sensitive applications.
AI policies can now be more effectively steered post-training to align with safety and ethical guidelines, reducing the need for costly data curation or performance-draining inference-time interventions.
- AI developers
- Robotics companies
- Safety-critical AI applications
- Development cycles relying on extensive data curation
- Unsafe AI systems
Behavior-cloned policies become more reliable and trustworthy for deployment.
Reduced barriers to entry for AI systems in regulated industries due to enhanced safety mechanisms.
Increased societal acceptance and integration of autonomous AI agents in daily life and critical infrastructure.
Read at arXiv cs.AI