
arXiv:2603.24472v3 (Announce Type: replace-cross)
Abstract: Self-distillation has emerged as an effective post-training paradigm for LLMs, often improving performance while shortening reasoning traces. However, in mathematical reasoning, we find that it can reduce response length while degrading performance. We trace this degradation to the suppression of epistemic verbalization (the model's expression of uncertainty during reasoning). Through controlled experiments varying conditioning context richness and task coverage, we show that conditioning the teacher on rich information suppresses uncer
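To make "epistemic verbalization" concrete, the sketch below counts uncertainty phrases in a reasoning trace alongside its length. This is a hypothetical illustration, not the paper's metric: the marker list `UNCERTAINTY_MARKERS` and the helper `trace_stats` are assumptions, and the authors may operationalize uncertainty expression quite differently.

```python
import re
from dataclasses import dataclass

# Hypothetical phrase list treated as uncertainty markers; the paper's own
# operationalization of epistemic verbalization may differ.
UNCERTAINTY_MARKERS = ("wait", "hmm", "perhaps", "maybe", "not sure",
                       "let me check", "double-check", "i think")


@dataclass
class TraceStats:
    num_words: int    # crude length proxy: alphanumeric word count
    num_markers: int  # total occurrences of any uncertainty marker

    @property
    def markers_per_100_words(self) -> float:
        # Normalize by length so short and long traces can be compared.
        if self.num_words == 0:
            return 0.0
        return 100.0 * self.num_markers / self.num_words


def trace_stats(trace: str) -> TraceStats:
    """Count words and whole-phrase uncertainty markers in a trace."""
    text = trace.lower()
    words = re.findall(r"[a-z0-9']+", text)
    count = 0
    for marker in UNCERTAINTY_MARKERS:
        # Word boundaries avoid matching e.g. "wait" inside "awaiting".
        count += len(re.findall(r"\b" + re.escape(marker) + r"\b", text))
    return TraceStats(num_words=len(words), num_markers=count)


if __name__ == "__main__":
    verbose = "Let me compute 12 * 7. Hmm, that is 84. Wait, let me check: 12 * 7 = 84. Yes."
    terse = "12 * 7 = 84."
    for name, trace in (("verbose", verbose), ("terse", terse)):
        s = trace_stats(trace)
        print(name, s.num_words, s.num_markers, round(s.markers_per_100_words, 1))
```

Here the verbose trace contains three markers ("hmm", "wait", "let me check") and the terse one contains none. A drop in both length and marker rate after distillation would match the pattern the abstract describes, though on its own it says nothing about accuracy; that link has to be measured separately.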
The proliferation of LLMs and growing investment in optimizing them make it important now to understand how training methods such as self-distillation can degrade performance.
This research reveals a critical limitation of a widely used LLM post-training technique: apparent efficiency gains, such as shorter responses, may come at the cost of reasoning quality, particularly on demanding tasks such as mathematical reasoning.
It also refines the understanding of self-distillation's 'black box' behavior: a shorter response is not necessarily a more useful one, and the reduction in length can mask a degradation in core reasoning and explanation capabilities.
- AI researchers focusing on model interpretability
- Developers of LLMs for high-stakes reasoning applications
- Sectors requiring explainable AI, such as finance or healthcare
- LLM developers prioritizing only inference speed or brevity
- Companies relying on over-optimized LLMs for complex reasoning
Further research is likely to focus on mitigating the negative effects of self-distillation on epistemic verbalization while retaining its benefits, such as shorter reasoning traces.
New self-distillation techniques will emerge that explicitly preserve or enhance uncertainty expression in LLMs.
The development and deployment of agentic AI systems will become more robust as their underlying LLMs are trained to better articulate reasoning and uncertainty.
This signal links to a primary source. Continuum Brief monitors and indexes it as part of the live intelligence stream — we do not republish source content.
Read at arXiv cs.LG