Beyond Pass/Fail: Using Process Mining to Understand How LLMs Resist (and Fail) Red Team Attacks

arXiv:2606.07833v1. Announce type: cross. Abstract: Standard AI red teaming evaluations reduce adversarial campaigns to a single binary outcome, attack success rate (ASR), not taking into account the sequential structure of how models resist or yield to attacks. We propose applying process mining, a discipline for discovering and analyzing process models from event logs, to red teaming traces. We conduct a controlled experiment pitting 60 HarmBench prompts against two LLMs, GPT-OSS 120B and Llama 3.3 70B, using 10 prompt mutation strategies over up to 110 attempts per prompt. From the resulting
This research arrives as AI red teaming and safety become critical to deploying LLMs, which calls for evaluation methods more nuanced than simple pass/fail metrics.
Understanding in detail how LLMs resist adversarial attacks is crucial for developing more robust and secure AI, and it can inform regulatory frameworks and enterprise adoption.
The proposed application of process mining offers a richer, sequential analysis of LLM vulnerabilities and defenses, shifting evaluation from binary outcomes to detailed adversarial process flows.
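To make the contrast between a binary outcome and a process flow concrete, here is a minimal sketch in Python. It is not the paper's pipeline: the event-log layout, the strategy names, and the outcome labels are all hypothetical. It computes ASR over a toy set of traces and then builds a directly-follows graph, a common starting point in process mining that counts how often one activity immediately follows another.

```python
from collections import Counter

# Hypothetical event log: one trace per prompt, each event is
# (mutation_strategy, outcome). Names are illustrative only and
# are not taken from the paper.
traces = {
    "prompt_a": [("baseline", "refused"), ("role_play", "refused"), ("encoding", "complied")],
    "prompt_b": [("baseline", "refused"), ("role_play", "refused")],
    "prompt_c": [("baseline", "refused"), ("encoding", "complied")],
}


def attack_success_rate(traces):
    """Fraction of prompts whose trace contains at least one compliance."""
    hits = sum(
        any(outcome == "complied" for _, outcome in events)
        for events in traces.values()
    )
    return hits / len(traces)


def directly_follows(traces):
    """Count how often activity a is immediately followed by activity b."""
    dfg = Counter()
    for events in traces.values():
        # Artificial START/END markers make entry and exit points visible.
        activities = ["START"] + [f"{s}:{o}" for s, o in events] + ["END"]
        dfg.update(zip(activities, activities[1:]))
    return dfg


asr = attack_success_rate(traces)
dfg = directly_follows(traces)

print(f"ASR: {asr:.2f}")
for (src, dst), count in dfg.most_common():
    print(f"{src} -> {dst}: {count}")
```

In this toy log, ASR collapses three campaigns into one number, while the edge counts show which attempt sequences preceded a compliance and which campaigns ended in refusal.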
- AI safety researchers
- LLM developers
- Cybersecurity firms
- Regulatory bodies
- Malicious actors
- Undifferentiated red teaming services
Improved understanding of LLM failure modes leads to more resilient and safer AI systems.
This detailed analysis could inform better guardrail design and AI governance policies for critical applications.
Enhanced AI security may accelerate broader adoption of LLMs in sensitive sectors, contingent on sustained progress in robustness.