Success Conditioning as Policy Improvement: The Optimization Problem Solved by Imitating Success

arXiv:2601.18175v2 Abstract: A widely used technique for improving policies is success conditioning, in which one collects trajectories, identifies those that achieve a desired outcome, and updates the policy to imitate the actions taken along successful trajectories. This principle appears under many names -- rejection sampling with SFT, goal-conditioned RL, Decision Transformers -- yet what optimization problem it solves, if any, has remained unclear. We prove that success conditioning exactly solves a trust-region optimization problem, maximizing policy improvem
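The collect / filter / imitate loop described in the abstract can be sketched on the simplest possible case. This is a minimal illustration, assuming a one-step (bandit-style) task whose per-action success probabilities are the hypothetical values in `SUCCESS_PROB`; the function names are also hypothetical, and the sketch does not reproduce the paper's trust-region formulation. It compares one sampled round of success conditioning with its infinite-sample limit, which by Bayes' rule is pi_new(a) = pi(a) * P(success | a) / P(success).

```python
import random
from collections import Counter

# Hypothetical one-step task: action -> probability that the episode succeeds.
# These numbers are illustrative assumptions, not taken from the paper.
SUCCESS_PROB = {"a": 0.9, "b": 0.5, "c": 0.1}


def success_conditioning_step(policy, n_samples, rng):
    """One round of success conditioning on a one-step task.

    Sample actions from `policy`, keep only those whose episode succeeded,
    and fit the new policy by maximum likelihood on the kept actions,
    which here is just the empirical action frequency among successes.
    """
    actions = list(policy)
    weights = [policy[a] for a in actions]
    kept = Counter()
    for _ in range(n_samples):
        action = rng.choices(actions, weights=weights)[0]
        if rng.random() < SUCCESS_PROB[action]:  # this episode succeeded
            kept[action] += 1
    total = sum(kept.values())
    if total == 0:
        return dict(policy)  # no successes observed: keep the old policy
    return {a: kept[a] / total for a in actions}


def exact_success_conditioned(policy):
    """Infinite-sample limit: pi_new(a) = pi(a) * P(success | a) / P(success)."""
    p_success = sum(policy[a] * SUCCESS_PROB[a] for a in policy)
    return {a: policy[a] * SUCCESS_PROB[a] / p_success for a in policy}


if __name__ == "__main__":
    rng = random.Random(0)
    uniform = {"a": 1 / 3, "b": 1 / 3, "c": 1 / 3}
    print("sampled:", success_conditioning_step(uniform, 100_000, rng))
    print("exact:  ", exact_success_conditioned(uniform))
```

Starting from a uniform policy, the exact update gives roughly 0.6 / 0.33 / 0.07 to actions a / b / c, and the sampled estimate approaches it as `n_samples` grows. Probability mass shifts toward actions that succeed more often, in proportion to how often they already were taken, so the new policy never places mass on actions the old policy never tried.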
This research gives a formal account of a widely used policy improvement technique, placing it on firmer theoretical ground amid rapid progress in AI agents and decision-making systems.
Identifying the optimization problem that success conditioning solves (a trust-region problem, per the abstract) provides a clearer theoretical framework for designing and improving AI agents, potentially leading to more robust and explainable systems.
The formal proof offers a principled basis for developing and evaluating policy improvement methods, moving beyond purely empirical justification for techniques such as rejection sampling with SFT, goal-conditioned RL, and Decision Transformers.
- AI researchers
- AI developers
- AI agent platforms
- Ad-hoc AI development methods
Training certain types of AI policies is expected to become more efficient and predictable.
This improved understanding could accelerate the development of more capable and reliable autonomous AI agents.
More robust AI agents could enable new applications across sectors, consolidating multi-step workflows and increasing automation.
This signal links to a primary source. Continuum Brief monitors and indexes it as part of the live intelligence stream; we do not republish source content.
Read at arXiv cs.LG