Expanded dataset boosts transformer models in smishing detection

By PulseAugur Editorial · [2 sources] · 2026-06-05 03:46

Researchers have developed COVA-X, an expanded dataset of 10,985 synthetic conversations for detecting multi-turn smishing (SMS phishing) attacks, particularly those targeting the elderly. It extends the original COVA dataset of 3,201 conversations and fixes several issues in the generation pipeline to provide cleaner and more comprehensive data. With the expanded dataset, the Longformer model outperformed XGBoost at detecting smishing attempts, achieving higher accuracy and macro F1 scores, which suggests that transformer models need larger conversational corpora to reach their full potential.

IMPACT Improved datasets and models for smishing detection can enhance cybersecurity defenses against evolving scam tactics.

RANK_REASON The cluster contains a research paper detailing a new dataset and improved model performance on a specific task.

paper
safety

AI-generated summary · Google Gemini · from 2 sources.

COVERAGE [2]

arXiv cs.CL TIER_1 English(EN) · Carl Lochstampfor, Ayan Roy · 2026-06-08 04:00

An Expanded Synthetic Conversation Dataset for Multi-Turn Smishing Detection

arXiv:2606.06879v1 · Abstract: Our prior work introduced COVA, a synthetically generated multi-turn conversational smishing dataset of 3,201 labeled conversations, establishing baseline detection benchmarks across eight models. While XGBoost with TF-IDF features …

arXiv cs.CL TIER_1 English(EN) · Ayan Roy · 2026-06-05 03:46

An Expanded Synthetic Conversation Dataset for Multi-Turn Smishing Detection

Our prior work introduced COVA, a synthetically generated multi-turn conversational smishing dataset of 3,201 labeled conversations, establishing baseline detection benchmarks across eight models. While XGBoost with TF-IDF features achieved the best performance, with 72.5% accur…
