New benchmark tests LLMs on SMS-to-webpage fraud detection

By PulseAugur Editorial · [2 sources] · 2026-06-15 12:53

Researchers have introduced FraudSMSWalker, a benchmark for evaluating how well agentic large language models detect SMS fraud that directs users to malicious webpages. The benchmark masks URLs and other reputation shortcuts, so models must base their fraud judgments on the SMS content and sanitized webpage evidence alone. Initial evaluations show that current agents can identify some suspicious cues, but they struggle to stay accurate on benign cases and often base their predictions on weak evidence.
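The summary does not say how FraudSMSWalker performs its masking. As a rough illustration only, here is a minimal sketch that assumes a simple regex-based approach: URL-like spans in the SMS are swapped for a neutral placeholder, so a model cannot judge the message by the reputation of its link. The `[URL]` token, the `mask_urls` helper and the pattern are hypothetical, and the sketch covers only URLs, not the other reputation shortcuts the benchmark removes.

```python
import re

# Hypothetical placeholder token; the benchmark's actual masking format is not described.
URL_PLACEHOLDER = "[URL]"

# Coarse heuristic: matches http(s):// links and www.-prefixed hosts up to the next whitespace.
URL_PATTERN = re.compile(r"https?://\S+|www\.\S+", re.IGNORECASE)


def mask_urls(sms_text: str) -> str:
    """Replace every URL-like span with a neutral placeholder (hypothetical helper)."""
    return URL_PATTERN.sub(URL_PLACEHOLDER, sms_text)


if __name__ == "__main__":
    # The model would see the claim and the call to action, but not the domain.
    print(mask_urls("Your parcel is held. Pay at https://bad-example.top/pay now"))
```

A pattern this coarse also swallows punctuation attached to the end of a link and misses bare domains written without `www.`, so a real evaluation pipeline would likely need a more careful URL parser.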

IMPACT By removing reputation shortcuts, this benchmark aims to measure, and ultimately help improve, LLM agents' ability to detect sophisticated cross-channel fraud.

RANK_REASON The cluster describes a new academic paper introducing a benchmark for evaluating LLMs on a specific task.


paper
safety

AI-generated summary · Google Gemini · from 2 sources.

COVERAGE [2]

arXiv cs.CL TIER_1 English(EN) · Y. H. Zhou, Z. M. Ma, Y. J. Zhou, Y. T. Li, H. X. Xiang, Y. M. Cheng, T. L. Chen, K. J. Zhang, Z. H. Nan, J. H. Ni, Z. Wu, Q. Y. Pan, S. Zhang, S. Cheng, M. Y. Luo · 2026-06-16 04:00

FraudSMSWalker: Benchmarking Agentic Large Language Models for SMS-to-Webpage Fraud Detection

arXiv:2606.16659v1 · Announce Type: new

Abstract: SMS fraud is increasingly cross-channel: a message directs the user to a webpage, and the final risk depends on how the SMS claim aligns with the page content and requested user action. However, existing evaluations either focus on …
arXiv cs.CL TIER_1 English(EN) · M. Y. Luo · 2026-06-15 12:53

FraudSMSWalker: Benchmarking Agentic Large Language Models for SMS-to-Webpage Fraud Detection

SMS fraud is increasingly cross-channel: a message directs the user to a webpage, and the final risk depends on how the SMS claim aligns with the page content and requested user action. However, existing evaluations either focus on message-only smishing classification or expose U…
