
arXiv:2512.02318v4 (Announce Type: replace-cross)

Abstract: This paper studies how multimodal large language models (MLLMs) undermine the security guarantees of visual CAPTCHA. We identify the attack surface where an adversary can cheaply automate CAPTCHA solving using off-the-shelf models. We evaluate 7 representative MLLMs on 18 real-world CAPTCHA task types, measuring single-shot accuracy, success under limited retries, end-to-end latency, and per-solve cost. We further validate our findings through a supplemental external dataset and an adaptive-attacker setting with session memory, while al…
The rapid advancement and accessibility of multimodal large language models (MLLMs) have made it feasible to cheaply automate tasks previously thought to require human-like perception, such as solving visual CAPTCHAs.
This development exposes a critical new attack surface: traditional mechanisms for distinguishing humans from bots can now be circumvented cheaply with off-the-shelf AI models.
Security that relies on visual CAPTCHAs must therefore be re-evaluated. Because MLLMs can substantially weaken their effectiveness, services may need to shift toward more sophisticated or multi-factor verification methods.
- Cybersecurity firms developing advanced bot detection
- AI developers focused on defensive AI
- Adversarial AI researchers
- Websites and services relying solely on CAPTCHAs
- Developers of traditional CAPTCHA systems
- Sectors vulnerable to automated credential stuffing
Automated bots are likely to gain significantly easier access to online services protected by visual CAPTCHAs.
A rapid innovation cycle will ensue in bot detection and human verification methods, moving beyond purely visual tests.
The definition of 'human' online will become increasingly blurred, potentially accelerating the development of robust 'proof-of-human' protocols across the internet.
Source: arXiv cs.AI