
arXiv:2605.31140v1 Announce Type: cross Abstract: Large Language Models (LLMs) remain highly vulnerable to diverse attacks, particularly in black-box settings where the internals of target models are inaccessible. Existing black-box defenses typically rely on pre-defined filtering heuristics, which often fail to generalize to unseen attack types and target model architectures. We introduce EvoDefense, an experience-guided co-evolving black-box defense paradigm. EvoDefense employs a guard LLM to detect malicious queries and an experience memory module to accumulate defense knowledge from previo
The proliferation of Large Language Models (LLMs) in various applications necessitates robust black-box defense mechanisms as their vulnerabilities become more apparent.
Sophisticated black-box defenses are critical for the secure deployment and trustworthiness of LLMs, directly impacting their commercial viability and adoption in sensitive areas.
The shift from predefined filtering heuristics to an 'experience-guided co-evolving black-box defense' represents a significant methodological change in AI security for LLMs.
- · AI security researchers
- · Organizations deploying LLMs
- · Developers of defensive AI architectures
- · Malicious actors targeting LLMs
- · Systems relying on static defense mechanisms
Increased resilience of LLMs against adversarial attacks, particularly in black-box scenarios.
Reduced incidence of successful LLM exploits, enhancing user trust and expanding LLM applications in critical infrastructure.
The acceleration of a defensive AI arms race, where defense mechanisms dynamically adapt to new attack vectors, pushing the frontier of AI security.
This signal links to a primary source. Continuum Brief monitors and indexes it as part of the live intelligence stream — we do not republish source content.
Read at arXiv cs.CL