
arXiv:2509.14959v3 Announce Type: replace-cross Abstract: In this paper, we investigate discrete optimal transport (DOT) as a black-box attack against modern automatic speaker verification (ASV) and anti-spoofing countermeasure (CM) systems. Our attack operates as a post-processing distribution-alignment step. Frame-level WavLM embeddings of generated speech (or of another person's speech) are aligned to an unpaired bona fide speech pool using entropic optimal transport and a top-k barycentric projection, followed by neural vocoding. Unlike gradient-based attacks, the proposed method requires no ac
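The alignment step described in the abstract (entropic optimal transport between source frame embeddings and an unpaired bona fide pool, then a top-k barycentric projection) can be illustrated with a short NumPy sketch. This is a minimal sketch under stated assumptions, not the authors' implementation: random arrays stand in for WavLM frame embeddings, the cost is squared Euclidean distance, all frames get uniform weight, "top-k barycentric projection" is taken to mean a renormalized weighted mean of each source frame's k most strongly coupled target frames, and the function names `sinkhorn_plan` and `topk_barycentric_projection` as well as the values of `eps`, `n_iters`, and `k` are hypothetical. The vocoding step is omitted.

```python
import numpy as np


def _logsumexp(a: np.ndarray, axis: int) -> np.ndarray:
    """Numerically stable log-sum-exp along one axis."""
    a_max = np.max(a, axis=axis, keepdims=True)
    out = a_max + np.log(np.sum(np.exp(a - a_max), axis=axis, keepdims=True))
    return np.squeeze(out, axis=axis)


def sinkhorn_plan(x: np.ndarray, y: np.ndarray, eps: float = 0.05,
                  n_iters: int = 500) -> np.ndarray:
    """Entropic OT plan between uniform measures on the rows of x and y.

    Assumption: squared Euclidean cost, log-domain Sinkhorn iterations.
    """
    n, m = x.shape[0], y.shape[0]
    # Pairwise squared Euclidean cost, scaled to [0, 1] so eps is scale-free.
    cost = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    cost = cost / max(cost.max(), 1e-12)
    log_a = np.full(n, -np.log(n))  # uniform weights on source frames
    log_b = np.full(m, -np.log(m))  # uniform weights on pool frames
    f = np.zeros(n)
    g = np.zeros(m)
    for _ in range(n_iters):
        # Alternate dual updates so row sums approach a and column sums equal b.
        f = eps * (log_a - _logsumexp((g[None, :] - cost) / eps, axis=1))
        g = eps * (log_b - _logsumexp((f[:, None] - cost) / eps, axis=0))
    # Plan entries P_ij = exp((f_i + g_j - C_ij) / eps).
    return np.exp((f[:, None] + g[None, :] - cost) / eps)


def topk_barycentric_projection(plan: np.ndarray, y: np.ndarray,
                                k: int = 4) -> np.ndarray:
    """Map each source row to a weighted mean of its k most-coupled target rows."""
    k = min(k, y.shape[0])
    idx = np.argsort(plan, axis=1)[:, -k:]     # (n, k) indices of largest entries
    w = np.take_along_axis(plan, idx, axis=1)  # (n, k) transported masses
    w = w / w.sum(axis=1, keepdims=True)       # renormalize within each row
    return np.einsum("nk,nkd->nd", w, y[idx])  # (n, d) convex combinations


# Usage with stand-in data (hypothetical shapes, not real WavLM features).
rng = np.random.default_rng(0)
src = rng.normal(size=(50, 16))              # "generated speech" frames
pool = rng.normal(loc=1.0, size=(200, 16))   # "bona fide pool" frames
plan = sinkhorn_plan(src, pool)
aligned = topk_barycentric_projection(plan, pool, k=4)
print(aligned.shape)  # (50, 16)
```

Restricting the projection to the top k couplings keeps each output frame a blend of a few real bona fide frames rather than an average over the whole pool; in the full attack the aligned frames would then be passed to a neural vocoder.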
The proliferation of advanced AI-driven audio systems creates a strong incentive for sophisticated adversarial attacks that challenge their robustness.
This research presents a new and potent black-box technique for audio adversarial attacks, with direct consequences for the security and reliability of critical AI applications such as automatic speaker verification and anti-spoofing countermeasures.
Black-box attacks that need no gradients significantly increase the exposure of audio AI systems, which calls for new defense strategies.
- Cybersecurity researchers
- Developers of robust AI defense mechanisms
- Providers of less robust automatic speaker verification systems
- Users relying on current audio anti-spoofing countermeasures
Increased pressure on AI developers to integrate advanced adversarial training and robust models into their audio systems.
Potential for new regulations or industry standards for the security and resilience of AI-driven voice authentication.
A shift in cyber warfare tactics to include sophisticated audio manipulation for intelligence gathering or disruption.
Read at arXiv cs.AI