Unsafer in Many Turns: Benchmarking and Defending Multi-Turn Safety Risks in Tool-Using Agents

arXiv:2602.13379v2. Abstract: LLM-based agents are becoming increasingly capable, yet their safety lags behind. This creates a gap between what agents can do and should do. This gap widens as agents engage in multi-turn interactions and employ diverse tools, introducing new risks overlooked by existing benchmarks. To systematically scale safety testing into multi-turn, tool-realistic settings, we propose a principled taxonomy that transforms single-turn harmful tasks into multi-turn attack sequences. Using this taxonomy, we construct MT-AgentRisk (Multi-Turn Agent R
The rapid development and deployment of LLM-based agents call for safety benchmarks that keep pace as agent capabilities expand into multi-turn, tool-using scenarios.
The safety of AI agents is paramount for their widespread adoption and integration into critical systems; this research highlights escalating risks and offers a new framework for assessment.
Existing safety benchmarks are now inadequate for multi-turn, tool-using AI agents, demanding new methodologies to prevent unforeseen harmful outcomes.
- AI safety researchers
- Organizations developing secure agent platforms
- Governments establishing AI regulations
- Developers neglecting multi-turn safety
- Users engaging with unvetted AI agents
- Companies relying on outdated safety benchmarks
Increased focus on robust safety protocols for autonomous AI agents will become a competitive differentiator.
New regulatory frameworks specifically addressing multi-turn agent safety and accountability will emerge.
The development of 'AI safety auditing' as a specialized and high-demand professional service will accelerate.
Read at arXiv cs.CL