Domain-Conditioned Safety in Frontier Computer-Using Agents: A 793-Episode Browser Benchmark, a Coding-Domain Cross-Reference, and a Reproducibility Audit of Recent Red-Teaming

arXiv:2606.05233v1 Announce Type: cross Abstract: Recent computer-using-agent (CUA) red-teaming papers report prompt-injection attack success rates (ASR) of 42-98%, but these headline numbers cluster on retired models and on the most-vulnerable model in each paper's panel. We ask whether those techniques, reproduced as hand-crafted templates, still work against current frontier CUAs. We release CUA-HandCrafted, a public benchmark of 793 episodes spanning 24 multi-step web tasks, 56 attack templates, 8 attack families, and 4 system-prompt configurations. Against Claude Sonnet 4.6 and GPT-5.4 we
The proliferation of advanced AI agents calls for immediate, robust safety testing to manage emerging risks as these systems are integrated into critical functions.
This benchmark directly addresses critical security vulnerabilities in frontier AI agents, providing tools and insights for developers and strategists concerned with safe deployment and resilience.
It aims to improve the transparency and reproducibility of AI agent red-teaming by shifting the focus from historical attack success rates, often measured on retired models, to current frontier models evaluated with a standardized methodology.
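The abstract reports results as attack success rate (ASR), which is commonly defined as the fraction of attacked episodes in which the agent carries out the injected instruction, and the benchmark varies tasks, attack templates, attack families, and system-prompt configurations. A minimal Python sketch of that bookkeeping follows. The `Episode` schema, its field names, the `attack_success_rate` helper, and the toy episodes are all hypothetical illustrations, not the released CUA-HandCrafted format or results from the paper. The abstract also does not say how the 793 episodes are drawn from the 24 × 56 × 4 combinations, so the sketch does not assume a full grid.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Episode:
    """One benchmark episode (hypothetical schema, not the released format)."""
    task_id: str            # which multi-step web task was run
    template_id: str        # which hand-crafted attack template was injected
    family: str             # attack family the template belongs to
    prompt_config: str      # system-prompt configuration in effect
    model: str              # agent under test
    attack_succeeded: bool  # did the agent follow the injected instruction?


def attack_success_rate(episodes, key=lambda ep: ep.model):
    """Return {group: successes / total} for each group produced by `key`.

    Grouping defaults to per-model ASR; pass a different `key` to slice by
    attack family, system-prompt configuration, or a tuple of several axes.
    """
    totals = defaultdict(int)
    successes = defaultdict(int)
    for ep in episodes:
        group = key(ep)
        totals[group] += 1
        successes[group] += ep.attack_succeeded  # True counts as 1, False as 0
    return {group: successes[group] / totals[group] for group in totals}


# Invented toy data for illustration only; these are not benchmark results.
toy = [
    Episode("t1", "a1", "family_a", "baseline", "model_x", True),
    Episode("t1", "a2", "family_b", "baseline", "model_x", False),
    Episode("t2", "a1", "family_a", "hardened", "model_y", False),
    Episode("t2", "a2", "family_b", "hardened", "model_y", False),
]

print(attack_success_rate(toy))                                 # per model
print(attack_success_rate(toy, key=lambda ep: ep.family))       # per family
print(attack_success_rate(toy, key=lambda ep: (ep.model, ep.family)))
```

Reporting ASR per model, rather than only the maximum over a model panel, is what separates the per-model measurements the paper describes from headline numbers that cluster on each panel's most-vulnerable model.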
- AI developers
- Cybersecurity researchers
- Organizations deploying AI agents
- AI safety communities
- Malicious actors exploiting AI agent vulnerabilities
- AI models with weak safety protocols
Improved safety and resilience of AI agents through robust benchmarking and red-teaming practices.
Accelerated development of more secure and trustworthy AI agents, leading to broader adoption and integration into sensitive applications.
Enhanced public trust in AI technologies, potentially influencing regulatory frameworks and international standards for AI safety.
This signal links to a primary source. Continuum Brief monitors and indexes it as part of the live intelligence stream — we do not republish source content.
Read at arXiv cs.CL