
arXiv:2602.19612v5
Abstract: Machine Unlearning (MU) enables Large Language Models (LLMs) to remove unsafe or outdated information. However, existing work assumes that all facts are equally forgettable and largely ignores whether the forgotten knowledge originates from pretraining or supervised fine-tuning (SFT). In this paper, we introduce DUET (Dual Unlearning Evaluation across Training Stages), a benchmark of 28.6k Wikidata-derived triplets annotated with fact popularity using Wikipedia link counts and LLM-based salience scores. Our experiments show that pretrained an
The paper addresses a critical, ongoing challenge in AI alignment and control, one that grows more urgent as LLMs are deployed in sensitive applications that require robust unlearning capabilities.
This research provides a more nuanced understanding of how knowledge is removed from LLMs, highlighting that not all facts are equally forgettable and that the unlearning process is influenced by the training stage.
The focus shifts from generic unlearning methods to strategies that account for fact salience and the origin of knowledge (pretraining vs. fine-tuning), leading to more effective and targeted unlearning techniques.
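To make the idea of accounting for fact salience and knowledge origin concrete, here is a minimal Python sketch of how a forget set might be split before unlearning is evaluated. It is an illustration only: the field names, the thresholds, the "popular"/"rare" labels, and the helper functions are all hypothetical and are not taken from the DUET benchmark or the paper.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    """One (subject, relation, object) triplet with illustrative annotations."""
    subject: str
    relation: str
    obj: str
    link_count: int   # assumed: a Wikipedia link count used as a popularity proxy
    salience: float   # assumed: an LLM-based salience score scaled to [0, 1]
    origin: str       # assumed labels: "pretraining" or "sft"


def popularity_bucket(fact, link_threshold=1000, salience_threshold=0.5):
    """Label a fact "popular" only if both signals clear their (arbitrary) thresholds."""
    is_popular = (fact.link_count >= link_threshold
                  and fact.salience >= salience_threshold)
    return "popular" if is_popular else "rare"


def group_forget_set(facts):
    """Group facts by (origin, popularity bucket) so each slice can be scored separately."""
    groups = defaultdict(list)
    for fact in facts:
        groups[(fact.origin, popularity_bucket(fact))].append(fact)
    return dict(groups)


facts = [
    Fact("Paris", "capital of", "France", 50000, 0.9, "pretraining"),
    Fact("Example Person", "born in", "Example Town", 12, 0.1, "sft"),
]
groups = group_forget_set(facts)
```

Reporting unlearning metrics per group, rather than over the pooled forget set, is one way to see whether popular and rare facts, or pretraining and SFT facts, behave differently.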
- AI ethicists
- Developers of custom/private LLMs
- Users concerned about data privacy
- Developers of undifferentiated general-purpose unlearning algorithms
- Entities struggling with LLM ethical compliance
More sophisticated and reliable methods for removing unwanted information from large language models will emerge.
Improved unlearning could increase trust in AI systems and accelerate their adoption in regulated industries requiring data deletion capabilities.
The ability to precisely control what an LLM 'knows' might lead to new forms of intellectual property and content management within AI models, potentially impacting content licensing and model ownership.
Read at arXiv cs.CL