Faithfulness Evaluation for Decoder-only LLM Attributions with Controlled Retained Information

arXiv:2601.03089v2 Announce Type: replace-cross
Abstract: Large Language Models (LLMs) are increasingly evaluated with input attribution methods, yet comparing such explanations remains challenging. Existing soft-perturbation faithfulness metrics, such as Soft-NC and Soft-NS, can conflate attribution quality with the number of words retained during perturbation: attribution methods with larger average scores may keep more words and therefore obtain inflated scores. To address this issue, we propose $\pi$-Soft-NC and $\pi$-Soft-NS, an evaluation framework that compares attribution methods under …
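The conflation described in the abstract can be made concrete with a minimal Python sketch under stated assumptions. It assumes a sufficiency-style soft perturbation that keeps each word independently with probability equal to its attribution score, with scores already in [0, 1]. Under that assumption the expected number of retained words is simply the sum of the scores, so a method with larger scores keeps more words regardless of how well it ranks them. The functions `soft_keep_probs`, `expected_retained`, `sample_mask` and `rescale_to_fraction`, and the target retention fraction `pi`, are hypothetical names. The rescaling step is only one illustrative way to hold expected retention fixed; it is not the paper's definition of π-Soft-NC or π-Soft-NS, which the truncated abstract does not give.

```python
import numpy as np


def soft_keep_probs(scores):
    """Keep probability per word for a sufficiency-style soft perturbation.

    ASSUMPTION (illustrative, not the paper's definition): scores already lie
    in [0, 1] and word i is kept with probability equal to its score.
    """
    return np.clip(np.asarray(scores, dtype=float), 0.0, 1.0)


def expected_retained(p):
    """Expected number of words kept by independent Bernoulli(p_i) draws."""
    return float(np.sum(p))


def sample_mask(p, rng):
    """Draw one keep/drop mask: True means the word is retained."""
    return rng.random(len(p)) < p


def rescale_to_fraction(p, pi, iters=60):
    """HYPOTHETICAL control: scale p by c so that mean(min(c * p, 1)) == pi.

    The mean is non-decreasing in c, so bisection finds c. The word ranking
    implied by p is preserved, but the expected retained fraction is fixed.
    """
    p = np.asarray(p, dtype=float)
    if not 0.0 < pi <= np.mean(p > 0):
        raise ValueError("pi must lie in (0, fraction of words with p > 0]")
    lo, hi = 0.0, 1.0
    # Grow the upper bound until it reaches the target retention.
    while np.mean(np.minimum(hi * p, 1.0)) < pi:
        hi *= 2.0
    # Bisection: hi always satisfies mean(min(hi * p, 1)) >= pi.
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if np.mean(np.minimum(mid * p, 1.0)) < pi:
            lo = mid
        else:
            hi = mid
    return np.minimum(hi * p, 1.0)


# Two hypothetical attribution vectors with the same word ranking
# but different magnitudes.
method_a = soft_keep_probs([0.9, 0.8, 0.7, 0.6])
method_b = soft_keep_probs([0.3, 0.2, 0.1, 0.05])

# Uncontrolled: method A keeps far more words in expectation.
print(expected_retained(method_a), expected_retained(method_b))

# Controlled: both keep roughly half the words in expectation.
ctrl_a = rescale_to_fraction(method_a, pi=0.5)
ctrl_b = rescale_to_fraction(method_b, pi=0.5)
print(expected_retained(ctrl_a), expected_retained(ctrl_b))

# One sampled perturbation mask for the controlled method B.
rng = np.random.default_rng(0)
print(sample_mask(ctrl_b, rng))
```

With retention fixed at the same `pi`, the two hypothetical methods differ only in how they distribute keep probability across words, which is the kind of like-for-like comparison the abstract argues for.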
The rapid deployment of, and growing reliance on, Large Language Models (LLMs) across many applications call for robust and reliable methods for understanding how these models reach their outputs.
Better faithfulness evaluation of LLM attributions is crucial for building trustworthy AI, especially in sensitive domains, and for guiding responsible development.
This research introduces a framework for evaluating the faithfulness of LLM attribution methods while controlling how much of the input is retained during perturbation, so that techniques can be compared and selected on attribution quality rather than on how many words they happen to keep.
Likely to benefit:
- AI researchers
- Developers of explainable AI (XAI) tools
- Organizations deploying LLMs in critical applications
Likely to be disadvantaged:
- Poorly designed LLM attribution methods
- Developers reliant on less rigorous evaluation metrics
The new evaluation metrics could lead to more robust and transparent LLM applications.
Greater trust in LLM outputs could accelerate adoption in regulated industries and high-stakes domains.
A clearer understanding of LLM reasoning could inform the design of more intrinsically interpretable and safer AI models.
Source: arXiv cs.LG