SIGNAL · AI · May 29, 2026, 4:00 AM · Signal 75 · Medium term

Mechanistic origins of catastrophic forgetting: why RL preserves circuits better than SFT?

Source: arXiv cs.LG

arXiv:2605.28860v1 Announce Type: new Abstract: Fine-tuning large language models (LLMs) frequently induces catastrophic forgetting of prior capabilities. Recent work has shown that reinforcement learning (RL) retains prior capabilities more effectively than supervised fine-tuning (SFT), attributing this to policy-gradient updates remaining closer to the base policy \cite{shenfeld2025rl}. We extend this behavioral account to the mechanistic level and ask whether RL's advantage is mirrored by stronger preservation of internal computational circuits. We introduce differential circuit vulnerabili

Why this matters
Why now

Ongoing research into LLM fine-tuning and catastrophic forgetting is central to improving AI model stability and performance.

Why it’s important

Understanding the mechanistic origins of catastrophic forgetting provides a critical pathway to developing more robust and efficient AI models, reducing retraining costs and improving continuous learning capabilities.

What changes

Understanding why RL preserves circuits better than SFT shifts research focus toward leveraging RL's strengths or mitigating SFT's weaknesses for more stable model updates.

Winners
  • AI researchers
  • LLM developers
  • Companies deploying AI agents
  • AI-dependent industries
Losers
  • Developers reliant solely on SFT
  • AI applications requiring frequent and extensive retraining
Second-order effects
Direct

Improved model retention and reduced forgetting in large language models will lead to more stable and adaptable AI systems.

Second

This could accelerate the development and deployment of sophisticated AI agents and continuously learning systems across various sectors.

Third

Enhanced model stability might reduce the computational and energy burden associated with frequent model updates, indirectly impacting compute supply chains and energy consumption.

Editorial confidence: 90 / 100 · Structural impact: 60 / 100
Original report

This signal links to a primary source. Continuum Brief monitors and indexes it as part of the live intelligence stream — we do not republish source content.

Tracked by The Continuum Brief · live intelligence network