PulseAugur

Expanded dataset boosts transformer models in smishing detection

Researchers have developed COVA-X, an expanded dataset of 10,985 synthetic conversations for detecting multi-turn smishing attacks, particularly those targeting the elderly. It improves on the initial COVA dataset by fixing several issues in the generation pipeline, yielding cleaner and more comprehensive data. On the expanded dataset, the Longformer model outperformed XGBoost at detecting smishing attempts, with higher accuracy and macro F1 scores, which suggests that transformer models need larger conversational corpora to reach their full potential.
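For readers unfamiliar with the two metrics named above, here is a minimal sketch of how accuracy and macro F1 are computed. It is not the paper's evaluation code: the label names `smish` and `benign` and the six example predictions are hypothetical, chosen only to show why macro F1 can sit below accuracy when one class is rarer.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of conversations whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so every class counts equally."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels: one of the two smishing conversations is missed.
y_true = ["smish", "smish", "benign", "benign", "benign", "benign"]
y_pred = ["smish", "benign", "benign", "benign", "benign", "benign"]

print(f"accuracy = {accuracy(y_true, y_pred):.4f}")
print(f"macro F1 = {macro_f1(y_true, y_pred):.4f}")
```

In this toy case the single missed smishing conversation lowers macro F1 more than accuracy, because the per-class scores are averaged without weighting by class size.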

IMPACT Improved datasets and models for smishing detection can enhance cybersecurity defenses against evolving scam tactics.

RANK_REASON The cluster contains a research paper detailing a new dataset and improved model performance on a specific task.

AI-generated summary · Google Gemini · from 2 sources.

COVERAGE [2]

  1. arXiv cs.CL TIER_1 English(EN) · Carl Lochstampfor, Ayan Roy

    An Expanded Synthetic Conversation Dataset for Multi-Turn Smishing Detection

    arXiv:2606.06879v1. Abstract: Our prior work introduced COVA, a synthetically generated multi-turn conversational smishing dataset of 3,201 labeled conversations, establishing baseline detection benchmarks across eight models. While XGBoost with TF-IDF features …

  2. arXiv cs.CL TIER_1 English(EN) · Ayan Roy

    An Expanded Synthetic Conversation Dataset for Multi-Turn Smishing Detection

    Our prior work introduced COVA, a synthetically generated multi-turn conversational smishing dataset of 3,201 labeled conversations, establishing baseline detection benchmarks across eight models. While XGBoost with TF-IDF features achieved the best performance, with 72.5% accur…