When Should the Teacher Move? Temporal Coupling and Stability in Self On-Policy Distillation

arXiv:2606.03532v1. Abstract: Self on-policy distillation trains a student policy against a teacher derived from its own parameter history, yet the teacher's update schedule, which governs the temporal coupling between teacher and student, has not been systematically studied as a stability variable. Through a controlled schedule sweep on Qwen3-8B, we establish that isolation periods, defined as complete teacher freezing between updates, are the key structural property enabling stable learning, not teacher age. To characterize these underlying training dynami
The paper treats the teacher's update schedule in self on-policy distillation as a stability variable and studies it systematically through a controlled schedule sweep on Qwen3-8B.
If the finding holds more broadly, it could make self on-policy distillation of large language models more stable and efficient to train.
The reported result shifts attention from teacher age to isolation periods, meaning intervals in which the teacher is completely frozen between updates, as the critical factor for stable learning.
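To make the distinction between a frozen teacher and a continuously moving one concrete, here is a minimal Python sketch. It is not the paper's method or code: the scalar "parameter" stands in for a full model, the exponential-moving-average teacher is an assumed contrast case that the abstract does not mention, and all function names are hypothetical.

```python
from typing import List


def periodic_snapshot_teacher(student_values: List[float], period: int) -> List[float]:
    """Teacher is a hard copy of the student, refreshed every `period` steps.

    Between refreshes the teacher is completely frozen (an isolation period).
    """
    teacher = student_values[0]
    history = []
    for step, student in enumerate(student_values):
        if step % period == 0:
            teacher = student  # refresh, then stay frozen until the next refresh
        history.append(teacher)
    return history


def ema_teacher(student_values: List[float], decay: float) -> List[float]:
    """Assumed contrast case: the teacher tracks the student at every step."""
    teacher = student_values[0]
    history = []
    for student in student_values:
        teacher = decay * teacher + (1.0 - decay) * student
        history.append(teacher)
    return history


def isolation_lengths(teacher_history: List[float]) -> List[int]:
    """Lengths of maximal runs of consecutive steps with an unchanged teacher."""
    runs = []
    run = 1
    for prev, cur in zip(teacher_history, teacher_history[1:]):
        if cur == prev:
            run += 1
        else:
            runs.append(run)
            run = 1
    runs.append(run)
    return runs


# Toy student trajectory: a scalar parameter that drifts by 1 each step.
student = [float(t) for t in range(8)]
snap = periodic_snapshot_teacher(student, period=4)
ema = ema_teacher(student, decay=0.5)

print(isolation_lengths(snap))  # runs of frozen steps under periodic snapshots
print(isolation_lengths(ema))   # runs of frozen steps under EMA tracking
```

In this toy setting the snapshot teacher stays fixed for four steps at a time, while the EMA teacher changes at every step and so has no isolation period, whatever its effective age.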
- AI researchers
- Large language model developers
- AI-powered product companies
- AI models with unstable training architectures
- Developers neglecting distillation stability
Improved stability in self on-policy distillation could lead to more robust and better-performing AI models.
More stable training could shorten development cycles for advanced AI capabilities.
Fewer failed or repeated training runs could reduce the compute and energy needed to train future large AI models.
Read at arXiv cs.LG