The Perceived Fragility of Explanations in Audio Models: Manipulation of Attribution with Unchanged Predictions

arXiv:2606.14466v1 (cross-listed). Abstract: This paper investigates the fragility of post-hoc explanation methods in audio deepfake detection. While previous work on explanation manipulation focused on images using standard $L_p$ metrics, we introduce a psychoacoustic framework that optimizes inaudible perturbations to decouple model attributions from final classifications. We evaluate this vulnerability across state-of-the-art architectures under strict prediction-preserving constraints. By evaluating the manipulation cost through domain-specific perceptual audio quality metrics alongsi
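The general idea in the abstract, perturbing an input so that its attribution moves while the prediction stays fixed, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the paper's method: the two-feature quadratic "detector", the plain gradient saliency, the target attribution, and the function name `manipulate_explanation` are all hypothetical, and a simple amplitude bound (`eps`) stands in for the psychoacoustic constraint the paper describes.

```python
import numpy as np

# Toy differentiable "detector" (hypothetical): score(x) = 0.5 * x^T A x,
# predicted class = sign(score). A is symmetric.
A = np.diag([2.0, 1.0])

def score(x):
    return 0.5 * x @ A @ x

def saliency(x):
    # Plain gradient attribution: gradient of the score w.r.t. the input.
    return A @ x

def manipulate_explanation(x, target, eps=0.3, lam=1.0, lr=0.1, steps=200):
    """Projected gradient descent on delta; only label-preserving iterates are kept."""
    s0 = score(x)
    label = np.sign(s0)
    delta = np.zeros_like(x)
    best_delta = delta.copy()
    best_dist = np.linalg.norm(saliency(x) - target)
    for _ in range(steps):
        xp = x + delta
        g = saliency(xp)
        # Gradient of ||saliency - target||^2 + lam * (score - s0)^2 w.r.t. delta.
        grad = 2.0 * A @ (g - target) + 2.0 * lam * (score(xp) - s0) * g
        # Project onto the amplitude budget (stand-in for a perceptual constraint).
        delta = np.clip(delta - lr * grad, -eps, eps)
        xp = x + delta
        dist = np.linalg.norm(saliency(xp) - target)
        # Accept the iterate only if the predicted class is unchanged.
        if np.sign(score(xp)) == label and dist < best_dist:
            best_delta, best_dist = delta.copy(), dist
    return best_delta

x = np.array([1.0, 0.0])
target = np.array([2.0, 0.3])  # hypothetical target: shift attribution onto feature 2
delta = manipulate_explanation(x, target)
print("label before/after:", np.sign(score(x)), np.sign(score(x + delta)))
print("attribution before/after:", saliency(x), saliency(x + delta))
```

In this toy setting the predicted class is identical before and after, while the attribution moves toward the chosen target. A real audio attack would replace the amplitude bound with a perceptual masking model and the quadratic score with a trained detector.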
The rapid deployment of AI models in sensitive areas such as deepfake detection makes it necessary to understand both their vulnerabilities and the robustness of the methods used to explain them.
This research highlights a vulnerability in AI transparency: the explanation attached to a model's decision can be manipulated even when the prediction itself is unchanged, which undermines trust and accountability.
The understanding that post-hoc explanations for AI models can be unreliable and manipulable introduces a new layer of complexity for AI safety, auditing, and deployment, especially in high-stakes applications.
- AI safety researchers
- Cybersecurity firms specializing in AI red-teaming
- Developers of robust explainable AI (XAI) methods
- Platforms relying solely on current post-hoc XAI for trust
- Organizations deploying AI without considering explanation vulnerability
- Users who implicitly trust AI explanations
Increased scrutiny and demand for more robust and secure explainable AI (XAI) techniques across various domains.
Potential for new attack vectors and adversarial manipulations targeting AI explanations, leading to misinformed human decisions.
A broader philosophical debate on the ultimate trustworthiness of AI systems, even those that appear to provide explanations for their decisions.
Read at arXiv cs.AI