SIGNAL · AI · Jun 15, 2026, 4:00 AM · Signal 75 · Short term

COGNITION: From Evaluation to Defense against Multimodal LLM CAPTCHA Solvers

arXiv:2512.02318v4 · Announce type: replace-cross

Abstract: This paper studies how multimodal large language models (MLLMs) undermine the security guarantees of visual CAPTCHA. We identify the attack surface where an adversary can cheaply automate CAPTCHA solving using off-the-shelf models. We evaluate 7 representative MLLMs on 18 real-world CAPTCHA task types, measuring single-shot accuracy, success under limited retries, end-to-end latency, and per-solve cost. We further validate our findings through a supplemental external dataset and an adaptive-attacker setting with session memory, while al
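As a rough illustration of how two of the metrics named in the abstract relate, the sketch below derives success under limited retries and expected spend per solved challenge from a single-shot accuracy. It assumes attempts are independent and that each attempt costs the same, which the paper's own methodology may not assume. The function names and the example numbers are hypothetical and are not taken from the paper.

```python
def success_within_retries(p_single: float, attempts: int) -> float:
    """Probability that at least one of `attempts` tries succeeds.

    Assumption (not from the paper): tries are independent and each
    succeeds with the same probability `p_single`.
    """
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be in [0, 1]")
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    return 1.0 - (1.0 - p_single) ** attempts


def expected_cost_per_success(p_single: float, attempts: int,
                              cost_per_attempt: float) -> float:
    """Expected spend per solved challenge.

    The solver stops at the first success or after `attempts` tries,
    then moves on to a fresh challenge.
    """
    # Expected tries used on one challenge: sum of P(reaching try i).
    expected_tries = sum((1.0 - p_single) ** i for i in range(attempts))
    p_success = success_within_retries(p_single, attempts)
    if p_success == 0.0:
        return float("inf")
    return cost_per_attempt * expected_tries / p_success


if __name__ == "__main__":
    # Hypothetical inputs: 50% single-shot accuracy, 3 tries, $0.01 per try.
    print(success_within_retries(0.5, 3))
    print(expected_cost_per_success(0.5, 3, 0.01))
```

Under these assumptions the expected cost per success reduces to the per-attempt cost divided by the single-shot accuracy, so a retry cap changes how often a given challenge is solved but not the average spend per solve.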

Why this matters

Why now

The rapid advancement and accessibility of multimodal large language models (MLLMs) have made it feasible to cheaply automate tasks previously thought to require human-like perception, such as solving visual CAPTCHAs.

Why it’s important

This development exposes a new attack surface in cybersecurity: visual CAPTCHAs, a traditional mechanism for distinguishing humans from bots, can increasingly be circumvented with off-the-shelf AI models.

What changes

Reliance on visual CAPTCHAs as a security control must now be re-evaluated: if MLLMs can solve them cheaply, their effectiveness erodes, pushing operators toward more sophisticated bot detection or layered human-verification methods.

Winners

· Cybersecurity firms developing advanced bot detection
· AI developers focused on defensive AI
· Adversarial AI researchers

Losers

· Websites and services relying solely on CAPTCHAs
· Developers of traditional CAPTCHA systems
· Sectors vulnerable to automated credential stuffing

Second-order effects

Direct

Automated bots will have significantly easier access to online services and systems protected by visual CAPTCHAs.

Second

A rapid innovation cycle will ensue in bot detection and human verification methods, moving beyond purely visual tests.

Third

The definition of 'human' online will become increasingly blurred, potentially accelerating the development of robust 'proof-of-human' protocols across the internet.

Editorial confidence: 90 / 100 · Structural impact: 60 / 100

Original report

Read at arXiv cs.AI

#cs.CR #cs.AI
