CAREF: Calibration-Aware Regularization for Explanation Faithfulness Without Rationale Supervision

arXiv:2605.27835v1. Abstract: We introduce CAREF, a parameter-efficient fine-tuning framework that jointly optimizes predictive accuracy and explanation faithfulness via calibration-aware regularization. At its core, CAREF couples entropy-based calibration with token-level sparsity control through a single unified loss, the Calibration-Aware Regularization for Explanation Faithfulness (LSCED), without requiring rationale supervision. Evaluated on four NLE benchmarks (COS-E, ECQA, ComVE, e-SNLI) with Flan-T5, our lightweight CAREF-AQ variant attains the best average accuracy (
As large language models move into more critical and widespread applications, demand grows for explanations that are faithful to the model's actual behavior, which creates a pull for methods like CAREF.
Improving the trustworthiness and reliability of AI explanations is crucial for broader adoption, regulatory acceptance, and the safe deployment of increasingly autonomous systems, especially as AI integrates into sensitive sectors.
CAREF aims to make AI explanations more faithful without expensive human-annotated rationale data, which could accelerate the development and deployment of more transparent and accountable AI systems.
- AI developers
- AI ethics and safety researchers
- Sectors requiring high AI explainability (e.g., healthcare, finance)
- Users of large language models
- Platforms reliant on opaque black-box AI
- Techniques requiring extensive human rationale supervision
Increased capability to understand and debug complex AI models.
Faster and more widespread adoption of AI in regulated and high-stakes environments due to enhanced transparency.
Potential for new regulatory frameworks and industry standards to mandate explainability features directly inspired by or utilizing such transparent models.
Read at arXiv cs.LG