
arXiv:2606.19625v1. Abstract: We use training-data attribution as an interpretable tool for capability discovery, mapping which regions of the pretraining corpus support social-reasoning versus STEM-reasoning in OLMo3-7B. Training-data attribution measures how strongly each training document influences a model's predictions on a benchmark, but document-level scores are too noisy to identify which corpus regions support which capabilities, and prior work has emphasized factual knowledge rather than reasoning. We compute gradient-based attribution (TrackStar via Bergson) over a
This research offers a method for tracing how specific training data shapes an LLM's capabilities, a question that becomes more pressing as models grow in complexity and societal impact.
Understanding the provenance of capabilities in LLMs can support more robust, ethical, and performant AI systems by enabling targeted training and helping to mitigate unwanted biases or limitations.
Attributing specific model capabilities to portions of the training data allows more deliberate engineering of AI, shifting from black-box development toward informed design.
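As a rough illustration of what region-level attribution means, the sketch below scores training documents by the cosine similarity of their loss gradients to the gradients of benchmark queries, then averages those scores per corpus region. It is a minimal sketch under loud assumptions: a toy linear model stands in for OLMo3-7B, plain gradient cosine similarity stands in for TrackStar, the Bergson API is not used, and the data, region labels, and function names are all hypothetical.

```python
import numpy as np

def per_example_grads(X, y, w):
    """Gradient of 0.5 * (x @ w - y)**2 with respect to w, one row per example."""
    residual = X @ w - y
    return residual[:, None] * X

def unit_rows(G, eps=1e-12):
    """Scale each gradient row to unit length so dot products are cosine similarities."""
    return G / (np.linalg.norm(G, axis=1, keepdims=True) + eps)

def region_attribution(train_grads, query_grads, regions):
    """Average document-level scores within each corpus region."""
    sims = unit_rows(train_grads) @ unit_rows(query_grads).T  # (n_train, n_query)
    doc_scores = sims.mean(axis=1)  # one noisy score per training document
    return {str(r): float(doc_scores[regions == r].mean()) for r in np.unique(regions)}

rng = np.random.default_rng(0)
w = np.zeros(4)  # toy model parameters (assumption: untrained linear model)

def noisy(base, n):
    """Hypothetical documents: a shared feature direction plus small noise."""
    return np.asarray(base, dtype=float) + 0.05 * rng.normal(size=(n, 4))

stem_dir, social_dir = [1.0, 0.5, 0.0, 0.0], [0.0, 0.0, 1.0, 0.5]
X_train = np.vstack([noisy(stem_dir, 50), noisy(social_dir, 50)])
regions = np.array(["stem"] * 50 + ["social"] * 50)
X_query = noisy(stem_dir, 20)  # stand-in for a STEM-reasoning benchmark

train_grads = per_example_grads(X_train, np.ones(100), w)
query_grads = per_example_grads(X_query, np.ones(20), w)
result = region_attribution(train_grads, query_grads, regions)
print(result)
```

Averaging within a region is the simplest way to trade document-level noise for a coarser but steadier signal; the aggregation actually used in the paper may differ.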
- AI developers
- ML researchers
- AI ethics and safety organizations
- Developers relying on opaque model development
- AI systems with unexplainable internal functions
Researchers gain a tool for identifying which regions of the training corpus contribute to specific reasoning abilities in large language models.
This capability allows for more efficient and targeted data curation, potentially accelerating the development of specialized and reliable AI agents.
Improved provenance could lead to regulatory requirements for 'explainability' in AI capabilities, influencing future AI development and deployment standards.
Read at arXiv cs.CL