PulseAugur / Brief

last 24h
[1/1] 224 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.

  1. FraudSMSWalker: Benchmarking Agentic Large Language Models for SMS-to-Webpage Fraud Detection

    Researchers have introduced FraudSMSWalker, a benchmark for evaluating how well agentic large language models detect SMS-based fraud that directs users to malicious webpages. The benchmark masks URLs and other reputation shortcuts, so models must base their fraud judgments solely on the SMS content and sanitized webpage evidence. Initial evaluations show that current agents can identify some suspicious cues, but they struggle to stay accurate on benign cases and often base their predictions on weak evidence.

    IMPACT This benchmark aims to improve LLM agents' ability to detect sophisticated cross-channel fraud by removing reputation shortcuts.
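
    To make the idea of masking URLs concrete, here is a minimal Python sketch of what removing a reputation shortcut from an SMS could look like. It is an illustration only: the brief does not describe the benchmark's actual masking procedure, so the placeholder token `[URL]`, the regular expression, and the function name `mask_urls` are all assumptions.

```python
import re

# Hypothetical placeholder token; the benchmark's real masking scheme
# is not described in the brief.
URL_MASK = "[URL]"

# Deliberately simple pattern: matches http(s) links and bare "www." links
# up to the next whitespace character.
URL_PATTERN = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)


def mask_urls(sms_text: str) -> str:
    """Replace every URL-like substring with a neutral placeholder."""
    return URL_PATTERN.sub(URL_MASK, sms_text)


if __name__ == "__main__":
    example = "Your parcel is held. Pay the fee at https://example.com/pay now"
    print(mask_urls(example))
```

    With the domain hidden, a model can no longer lean on a familiar or blocklisted hostname and has to judge the message from its wording and the page evidence, which is the setting the summary describes.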