Brief · PulseAugur

RESEARCH · arXiv cs.CL English(EN) · 23h · [2 sources]

FraudSMSWalker: Benchmarking Agentic Large Language Models for SMS-to-Webpage Fraud Detection

Researchers have introduced FraudSMSWalker, a new benchmark for evaluating how well agentic large language models detect SMS-based fraud that directs users to malicious webpages. The benchmark masks URLs and other reputation shortcuts, so models must base their fraud judgments solely on the SMS content and sanitized webpage evidence. Initial evaluations show that current agents can identify some suspicious cues but struggle to maintain accuracy on benign cases and often base their predictions on weak evidence.
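The brief does not say how FraudSMSWalker performs its masking. As a minimal sketch of the idea, assuming a simple regex-based replacement, the snippet below swaps URL-like spans in an SMS for a neutral placeholder so that a model cannot lean on a suspicious-looking domain name. The pattern, the `mask_urls` function, and the `[URL]` placeholder are all hypothetical illustrations, not the benchmark's actual procedure.

```python
import re

# Hypothetical pattern (not from FraudSMSWalker): matches http(s) URLs,
# "www." hosts, and bare domains ending in a few common TLDs.
URL_PATTERN = re.compile(
    r"(?:https?://\S+"
    r"|www\.\S+"
    r"|\b[a-z0-9-]+(?:\.[a-z0-9-]+)*\.(?:com|net|org|info|xyz|top|cn)\b\S*)",
    re.IGNORECASE,
)


def mask_urls(sms_text: str, placeholder: str = "[URL]") -> str:
    """Replace every URL-like span with a neutral placeholder, removing the
    domain name as a reputation shortcut while keeping the message wording."""
    return URL_PATTERN.sub(placeholder, sms_text)


if __name__ == "__main__":
    print(mask_urls("Your parcel is held. Pay the fee at https://track-parcel.example/pay now"))
```

After masking, any fraud judgment has to come from the wording of the message itself (urgency, payment requests, impersonation) or from the sanitized webpage content. A real benchmark would likely need broader coverage, for example shortened links and obfuscated domains.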

IMPACT By removing reputation shortcuts from evaluation, this benchmark aims to drive improvements in LLM agents' ability to detect sophisticated cross-channel fraud.

Hugging Face
arXiv