PulseAugur

New Dataset and SLM Evaluation for Multi-Source Cybersecurity Logs

Researchers have built a multi-source cybersecurity dataset that combines system, network, and browser logs with detailed MITRE ATT&CK technique labels. The dataset comprises 870 sessions and approximately 2.3 million events, and it addresses the limitations of existing datasets by labeling malicious activity at a granular level. To demonstrate its utility, the researchers fine-tuned three Small Language Models (SLMs), Qwen2.5-1.5B, Llama-3.2-3B, and Phi-4-Mini, using Low-Rank Adaptation (LoRA). Fine-tuning improved every model on every metric: chunk classification accuracy rose from around 8% to 90-97%. Technique identification remained harder, with a best exact-match accuracy of 42%.
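To make the gap between the two headline numbers concrete, here is a minimal sketch of how an exact-match score for technique identification could be computed. It assumes that "exact match" means the predicted set of ATT&CK technique IDs must equal the labeled set for a chunk. The summary does not spell out the paper's definition, so treat this as one plausible reading. The function names and sample IDs are hypothetical and not taken from the dataset.

```python
from typing import Iterable, Sequence, Set


def normalize(techniques: Iterable[str]) -> Set[str]:
    """Uppercase and trim technique IDs so ' t1059.001' equals 'T1059.001'."""
    return {t.strip().upper() for t in techniques if t.strip()}


def exact_match_accuracy(
    predicted: Sequence[Iterable[str]], labeled: Sequence[Iterable[str]]
) -> float:
    """Fraction of chunks whose predicted technique set equals the labeled set.

    Assumption: order and duplicates are ignored; a partially correct
    prediction (for example, the parent technique without its sub-technique)
    counts as a miss.
    """
    if len(predicted) != len(labeled):
        raise ValueError("predicted and labeled must have the same length")
    if not labeled:
        return 0.0
    hits = sum(normalize(p) == normalize(g) for p, g in zip(predicted, labeled))
    return hits / len(labeled)


# Hypothetical example data, for illustration only.
predicted = [["T1059.001"], ["T1071"], ["T1105", "T1059"], []]
labeled = [["t1059.001"], ["T1071.001"], ["T1059", "T1105"], []]

score = exact_match_accuracy(predicted, labeled)
print(f"exact-match accuracy: {score:.2f}")
```

Under this reading, a model that flags a chunk as malicious but names only the parent technique still scores zero for that chunk, which is one way a high chunk classification accuracy can coexist with a much lower exact-match figure.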

IMPACT This dataset and evaluation could advance the development of more robust AI-powered cybersecurity threat detection systems.

RANK_REASON The cluster describes a new academic dataset and an evaluation of existing models, fitting the research bucket.


AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

  1. arXiv cs.LG TIER_1 English(EN) · Md Rayhanur Rahman

    Multi-Source Cybersecurity Logs: An ATT&CK-Labeled Dataset and SLM Evaluation

    Multi-stage cyberattacks span system, network, and browser logs. Detecting them requires correlating events across all three sources. Machine learning methods can learn these cross-source patterns, but they need labeled multi-source data. Existing public datasets fall short. Netw…