
arXiv:2510.10988v2. Abstract: Learning-to-Defer (L2D) enables hybrid decision-making by routing inputs either to a predictor or to external experts. While promising, L2D is highly vulnerable to adversarial perturbations, which can not only flip predictions but also manipulate deferral decisions. Prior robustness analyses focus solely on two-stage settings, leaving open the end-to-end (one-stage) case where predictor and allocation are trained jointly. We introduce the first framework for adversarial robustness in one-stage L2D, covering both classification and regre
As AI models are increasingly deployed in hybrid decision-making systems, ensuring their robustness against adversarial attacks in real-world, end-to-end scenarios becomes critical for trust and reliability.
AI systems, even those designed for human-AI collaboration, remain vulnerable to manipulation, with consequences for the security, reliability, and scaling of deployments in critical applications.
This research provides the first framework for adversarial robustness in one-stage learning-to-defer systems. It extends prior robustness analyses, which cover only two-stage settings, to the case where predictor and allocation are trained jointly.
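To make the threat concrete, the following is a minimal toy sketch of how a small input perturbation can change a deferral decision. It is not the paper's framework or attack: it assumes a hypothetical linear scorer with one score per class plus one extra "defer" score, and all weights, names (`scores`, `decide`, `linf_attack_toward`), and inputs are invented for illustration.

```python
# Toy illustration only: a linear scorer over K classes plus one "defer" option.
# All weights, names, and inputs are hypothetical, not taken from the paper.

def scores(W, b, x):
    """Return one score per row of W: K class scores followed by a defer score."""
    return [sum(w_i * x_i for w_i, x_i in zip(row, x)) + b_j
            for row, b_j in zip(W, b)]

def decide(W, b, x):
    """Index of the highest score; the last index means 'defer to the expert'."""
    s = scores(W, b, x)
    return max(range(len(s)), key=lambda j: s[j])

def linf_attack_toward(W, b, x, target, eps):
    """Move every feature by +/- eps in the direction that raises the target
    score relative to the currently selected option. For linear scores this is
    the strongest L-infinity step of size eps for that pair of options."""
    current = decide(W, b, x)
    diff = [wt - wc for wt, wc in zip(W[target], W[current])]
    sign = lambda v: (v > 0) - (v < 0)
    return [x_i + eps * sign(d) for x_i, d in zip(x, diff)]

# Two classes (rows 0 and 1) and a defer option (row 2).
W = [[1.0, 0.0],
     [0.0, 1.0],
     [0.5, 0.5]]
b = [0.0, 0.0, 0.2]
DEFER = 2

x = [1.0, 0.25]
x_adv = linf_attack_toward(W, b, x, target=DEFER, eps=0.25)

clean_decision = decide(W, b, x)      # the model predicts class 0 itself
adv_decision = decide(W, b, x_adv)    # the perturbed input is routed to the expert
```

In this toy setup the clean input is handled by the predictor (class 0), while the perturbed input `[0.75, 0.5]` is deferred. The same mechanism could in principle be aimed the other way, pushing inputs away from the expert, which is why deferral decisions need a robustness analysis of their own.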
- AI robustness researchers
- Developers of secure AI systems
- High-stakes AI deployment sectors
- Adversarial attackers
- AI systems lacking robustness frameworks
- Sectors reliant on unhardened hybrid AI
One-stage Learning-to-Defer (L2D) systems can now be systematically evaluated and defended against adversarial attacks, improving their reliability.
Increased robustness could accelerate the adoption of L2D in sensitive applications such as finance, defense, and healthcare, where trust is paramount.
The development of robust hybrid AI systems could lead to new regulatory standards and certification processes for AI safety and security.
Read at arXiv cs.LG