What LLMs explain is not what they believe: Evaluating explanation sufficiency under models' own input beliefs

arXiv:2606.28615v1 (Announce Type: new)
Abstract: Large language models (LLMs) are increasingly deployed in high-stakes domains, where free-text explanations such as chain-of-thought and post-hoc rationales are used to justify model outputs. Yet it remains unclear whether these explanations are sufficient, i.e., if they contain enough information to explain the model's output-generating process. We generalize classical sufficiency from feature attributions to arbitrary explanations and prove that explanation sufficiency can change depending on the input distribution, which must be explicitly def
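To make the abstract's central claim concrete, here is a minimal Python sketch of classical (feature-attribution) sufficiency. It assumes the common formulation in which the features named by an explanation are held fixed and the remaining features are resampled from some input distribution. The toy model, the `sufficiency` function, and the three distributions are hypothetical illustrations, not the paper's method, which generalizes the notion to free-text explanations and evaluates it under the model's own input beliefs. What the sketch does show is the distribution dependence the abstract describes: the same explanation for the same input can score as fully sufficient, partially sufficient, or not sufficient at all, depending only on which distribution fills in the omitted features.

```python
import random
from typing import Callable, List, Sequence

# Hypothetical toy classifier over three binary features (not from the paper):
# predicts 1 when feature 0 is on AND at least one of features 1 or 2 is on.
def toy_model(x: Sequence[int]) -> int:
    return int(x[0] == 1 and (x[1] == 1 or x[2] == 1))

def sufficiency(
    model: Callable[[Sequence[int]], int],
    x: Sequence[int],
    kept: List[int],
    sample_input: Callable[[random.Random], List[int]],
    n_samples: int = 10_000,
    seed: int = 0,
) -> float:
    """Monte Carlo estimate of how often the original prediction survives.

    Features listed in `kept` (the explanation) are fixed to their values in x;
    every other feature is overwritten with the value from a fresh draw of
    `sample_input`, which stands in for the assumed input distribution.
    """
    rng = random.Random(seed)
    kept_set = set(kept)
    target = model(x)
    hits = 0
    for _ in range(n_samples):
        draw = sample_input(rng)
        z = [x[i] if i in kept_set else draw[i] for i in range(len(x))]
        if model(z) == target:
            hits += 1
    return hits / n_samples

# Three hypothetical input distributions over {0, 1}^3.
def uniform(rng: random.Random) -> List[int]:
    return [rng.randint(0, 1) for _ in range(3)]

def all_on(rng: random.Random) -> List[int]:
    return [1, 1, 1]

def all_off(rng: random.Random) -> List[int]:
    return [0, 0, 0]

x = [1, 1, 0]       # toy_model(x) == 1
explanation = [0]   # the explanation claims feature 0 alone accounts for the output

print(sufficiency(toy_model, x, explanation, all_on))   # 1.0
print(sufficiency(toy_model, x, explanation, uniform))  # close to 0.75 (exact value is 3/4)
print(sufficiency(toy_model, x, explanation, all_off))  # 0.0
```

The explanation and the model are identical across the three calls; only the distribution used to fill in the unexplained features changes. Which distribution to use is therefore a modeling choice that has to be made explicit before a sufficiency score means anything.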
As LLMs move into high-stakes domains, it is no longer enough for their explanations to look plausible; we need to know whether they actually capture how the model arrived at its output.
This research exposes a critical limitation in how LLM explanations are evaluated: current methods may not reflect the process by which models actually reach their outputs, a gap that carries significant risk in sensitive applications.
Assessments of LLM explanation sufficiency must now account for their dependence on the input distribution. This calls for evaluation methods that probe a model's own 'beliefs' about its inputs, rather than only the justifications it offers for its outputs.
- AI explainability researchers
- Developers of transparent AI systems
- Regulatory bodies
- LLM operators relying solely on current explanation methods
- Users unaware of explanation limitations
- Developers of opaque AI systems
Increased scrutiny and demand for more rigorous explanation methods for LLMs in critical applications such as healthcare and finance.
Development of new AI models and architectures designed with intrinsic explainability and verifiable 'beliefs' rather than post-hoc rationalizations.
Potential slowdown in widespread LLM adoption in highly regulated sectors until better explanation sufficiency metrics and techniques are widely implemented.
This signal links to a primary source. Continuum Brief monitors and indexes it as part of the live intelligence stream; we do not republish source content.
Read at arXiv cs.LG