Which Defense Closes Which Threat? Attributing OWASP-LLM-Top-10 Coverage and Its Brittleness Under Paraphrasing

arXiv:2606.02822v1. Abstract: Production LLM applications stack several defense families -- refusal-phrase filters, token-budget controls, model allowlists, rate limits, tool-registry authentication -- yet existing breach-and-attack-simulation (BAS) benchmarks report a single aggregate coverage number, hiding which family closes which threat. We measure attribution. We add four OWASP-LLM-Top-10-aware agents to a 21-agent baseline scanner and target a lattice of four synthetic LLM endpoints: $L_0$ (no defenses), $L_1$ (refusal-only), $L_2$ (budget-only), and $L_3$ (full stac
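The attribution idea in the abstract can be illustrated with a small sketch: compare what a scanner can still exploit at the undefended endpoint against what it can exploit at each single-defense endpoint, and credit the difference to that defense family. This is a minimal illustration, not the paper's method. The function name `attribute_closures`, the threat labels `T1` to `T3`, and every outcome below are hypothetical, and only the three endpoints fully described in the excerpt are modeled.

```python
from typing import Dict, Set


def attribute_closures(
    open_threats: Dict[str, Set[str]], baseline: str = "L0"
) -> Dict[str, Set[str]]:
    """Return, per non-baseline endpoint, the threats open at the
    baseline but no longer open at that endpoint."""
    base = open_threats[baseline]
    return {
        name: base - still_open
        for name, still_open in open_threats.items()
        if name != baseline
    }


# Hypothetical scan outcomes: threats still exploitable at each endpoint.
# These labels and sets are made up for illustration, not paper results.
open_threats = {
    "L0": {"T1", "T2", "T3"},  # no defenses
    "L1": {"T2", "T3"},        # refusal-only
    "L2": {"T1", "T3"},        # budget-only
}

closed = attribute_closures(open_threats)

# A single aggregate number: share of baseline threats closed by any endpoint.
closed_anywhere = set().union(*closed.values())
aggregate = len(closed_anywhere) / len(open_threats["L0"])

for endpoint in sorted(closed):
    print(endpoint, "closes", sorted(closed[endpoint]))
print("aggregate coverage:", round(aggregate, 2))
```

In this toy data the aggregate figure alone would not reveal that the refusal-only and budget-only endpoints close different threats, which is the gap the per-family breakdown is meant to expose.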
The rapid deployment of LLM applications in production demands robust security measures, which has driven more research into measuring how effective individual defenses actually are.
This research measures which defense family closes which known LLM threat, moving beyond aggregate coverage metrics to granular attribution.
Security practices for LLM applications could become more targeted, letting developers select and layer defenses according to their specific threat models and each defense's measured effectiveness.
- LLM application developers
- Cybersecurity firms specializing in AI
- Industries deploying LLMs
- Malicious actors targeting LLMs
- LLM security solutions based on aggregate metrics
Improved security posture for LLM-powered applications through better understanding of defense capabilities.
Development of more sophisticated, attribution-aware breach-and-attack simulation tools for AI systems.
Enhanced trust and broader adoption of LLM technologies across sensitive sectors due to demonstrable security advancements.
Read at arXiv cs.AI