
arXiv:2606.19222v1 Announce Type: cross Abstract: We propose MAST (Mechanism-Aligned Selective Targeting), a mechanism-guided method for unlearning RLVR-induced reasoning with substantially lower collateral damage than standard full-parameter updates. In matched SFT/RLVR checkpoints on Qwen2.5-Math-1.5B and Qwen3-1.7B-Base, the SFT-to-RLVR increment differs sharply from the SFT update in token-level delta-log-probability, and full-parameter gradient ascent forgets only by damaging retain MATH and GSM8K. MAST ranks attention-projection tensors by off-principal energy, update magnitude, and forg
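As a rough illustration of the token-level delta-log-probability mentioned in the abstract, the sketch below compares the log-probability that two checkpoints assign to each observed token of the same sequence. It is a minimal sketch under stated assumptions, not the paper's implementation: the function names, the list-of-logits input format, and the toy numbers are all hypothetical, and in practice the logits would come from scoring the sequence with the SFT and RLVR checkpoints.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over one list of vocabulary logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def token_delta_logprob(logits_before, logits_after, token_ids):
    """Return log p_after(token) - log p_before(token) at each position.

    logits_before / logits_after: one list of logits per position, produced
    by two checkpoints on the same sequence (hypothetical input format).
    token_ids: the token actually observed at each position.
    """
    deltas = []
    for lb, la, tok in zip(logits_before, logits_after, token_ids):
        deltas.append(log_softmax(la)[tok] - log_softmax(lb)[tok])
    return deltas

# Toy data (assumed): 2 positions, vocabulary of 3 tokens.
before = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
after = [[2.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
deltas = token_delta_logprob(before, after, [0, 0])
```

A positive entry means the later checkpoint made that token more likely; an entry near zero means the update left that position essentially unchanged.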
The increasing sophistication of large language models and their fine-tuning processes necessitates more precise control over learned behaviors, particularly in areas like reasoning.
This research proposes a method for selectively removing RLVR-induced reasoning behavior with substantially less damage to retained capabilities than full-parameter updates, which matters for the safe, reliable, and ethical deployment of advanced AI.
Targeted unlearning with less collateral damage could make iterative development more efficient and make biases or harmful outputs easier to address.
- AI developers
- AI ethics and safety researchers
- Companies deploying fine-tuned LLMs
More robust and controllable AI models could be developed and deployed faster.
This capability allows for more agile remediation of AI model misbehaviors post-deployment.
The precision of unlearning could lead to entirely new methods of AI model editing and capability modulation, creating more adaptable AI systems.