Influcoder: Distilling Decoders' Gradient Influence Rankings into an Encoder for Data Attribution

arXiv:2606.13668v1. Abstract: With the growth in the capabilities of Large Language Models (LLMs), there has been an increasing push to curate high-quality datasets by filtering samples in the training data. In general, Data Attribution (DA) methods aim to estimate how individual samples in a training dataset can precondition a model to generate certain outputs. For example, one might want to know which samples in the data could be the source of toxic behavior after training the LLM. Many methods quantify this conditioning through the paradigm of influence functions. While me
The increasing scale and complexity of LLMs necessitate advanced data attribution methods to manage training data quality and mitigate risks like toxic outputs.
Understanding data attribution in LLMs is crucial for responsible AI development, ensuring model reliability, and addressing bias or unintended behaviors originating from training data.
New methodologies like Influcoder aim to make data attribution in LLMs more efficient and interpretable, allowing for targeted dataset curation and bias mitigation.
- AI developers
- Dataset curators
- AI ethics and safety researchers
- Enterprises deploying LLMs
- Developers of uninterpretable AI systems
- Suppliers of low-quality training data
Improved methods for identifying and correcting problematic training data samples in LLMs.
Reduced incidence of biased or toxic outputs from LLMs due to better data quality controls.
Increased trust and adoption of LLMs in sensitive applications as their training data becomes more auditable and controllable.
Source: arXiv cs.CL