PulseAugur

New benchmark tests LLMs on SMS-to-webpage fraud detection

Researchers have introduced FraudSMSWalker, a benchmark for evaluating how well agentic large language models detect SMS fraud that directs users to malicious webpages. The benchmark masks URLs and other reputation shortcuts, so models must judge fraud solely from the SMS content and sanitized webpage evidence. Initial evaluations show that current agents can identify some suspicious cues but struggle to maintain accuracy on benign cases and often base their predictions on weak evidence.
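
To make the idea of masking URLs concrete, here is a minimal Python sketch. It is an illustration only and is not taken from the paper: the placeholder token, the regular expression, and the function name `mask_urls` are all assumptions, and the benchmark's actual sanitization is not described in this summary.

```python
import re

# Hypothetical placeholder token (an assumption, not the benchmark's own format).
URL_PLACEHOLDER = "[URL]"

# Deliberately simple pattern: anything starting with http://, https://, or www.
# up to the next whitespace. Trailing punctuation attached to a link is masked too.
_URL_RE = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)


def mask_urls(sms_text: str) -> str:
    """Replace every URL-like substring with a fixed placeholder.

    The goal is that a model reading the message cannot lean on the
    domain name or link shape as a reputation shortcut.
    """
    return _URL_RE.sub(URL_PLACEHOLDER, sms_text)


if __name__ == "__main__":
    print(mask_urls("Your parcel is held. Pay at https://example.test/pay now"))
```

A real pipeline would likely also need to handle bare domains, shortened links, and brand names that act as reputation signals; this sketch covers only explicit `http(s)://` and `www.` links.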

IMPACT This benchmark aims to improve LLM agents' ability to detect sophisticated cross-channel fraud by removing reputation shortcuts.

RANK_REASON The cluster describes a new academic paper introducing a benchmark for evaluating LLMs on a specific task.


AI-generated summary · Google Gemini · from 2 sources.

COVERAGE [2]

  1. arXiv cs.CL TIER_1 English(EN) · Y. H. Zhou, Z. M. Ma, Y. J. Zhou, Y. T. Li, H. X. Xiang, Y. M. Cheng, T. L. Chen, K. J. Zhang, Z. H. Nan, J. H. Ni, Z. Wu, Q. Y. Pan, S. Zhang, S. Cheng, M. Y. Luo ·

    FraudSMSWalker: Benchmarking Agentic Large Language Models for SMS-to-Webpage Fraud Detection

    arXiv:2606.16659v1. Abstract: SMS fraud is increasingly cross-channel: a message directs the user to a webpage, and the final risk depends on how the SMS claim aligns with the page content and requested user action. However, existing evaluations either focus on …

  2. arXiv cs.CL TIER_1 English(EN) · M. Y. Luo ·

    FraudSMSWalker: Benchmarking Agentic Large Language Models for SMS-to-Webpage Fraud Detection

    SMS fraud is increasingly cross-channel: a message directs the user to a webpage, and the final risk depends on how the SMS claim aligns with the page content and requested user action. However, existing evaluations either focus on message-only smishing classification or expose U…