FraudSMSWalker: Benchmarking Agentic Large Language Models for SMS-to-Webpage Fraud Detection

arXiv:2606.16659v1. Abstract: SMS fraud is increasingly cross-channel: a message directs the user to a webpage, and the final risk depends on how the SMS claim aligns with the page content and requested user action. However, existing evaluations either focus on message-only smishing classification or expose URL and domain cues that allow models to rely on reputation shortcuts. To address this gap, we introduce FraudSMSWalker, a controlled benchmark for URL-masked SMS-to-webpage fraud judgment. FraudSMSWalker contains 699 bilingual chains, including 332 fraudulent and
As AI models grow more capable and fraud increasingly spans multiple channels, detection has to go beyond surface-level analysis of a single message.
This benchmark targets a gap in current evaluation: it tests whether agentic LLMs can judge cross-channel SMS-to-webpage fraud when URL and domain cues are masked.
Per the abstract, existing evaluations either stop at the message or expose URL and domain cues that invite reputation shortcuts; the benchmark instead favors agentic approaches that 'walk' the user journey from SMS to webpage.
- AI-powered cybersecurity firms
- Financial institutions
- Large Language Model developers
- Consumers
- Traditional fraud detection vendors
- SMS fraudsters
- Organizations with inadequate cybersecurity
Increased efficacy of fraud detection systems against multi-channel attacks.
A race among AI developers to create more sophisticated and autonomous 'agentic' models for cybersecurity applications, beyond just fraud.
A shift in cybercrime tactics towards more complex, human-like social engineering that LLMs may still struggle with, leading to an arms race.
Read at arXiv cs.CL