PulseAugur

New dataset combines system, network, and browser logs for cybersecurity

Researchers have built a multi-source cybersecurity dataset that combines system, network, and browser logs from Windows endpoints. The dataset contains 870 sessions and approximately 2.3 million events, each labeled with specific MITRE ATT&CK technique IDs, which addresses a gap in existing public datasets. To test its utility, three Small Language Models (SLMs), Qwen2.5-1.5B, Llama-3.2-3B, and Phi-4-Mini, were fine-tuned using Low-Rank Adaptation (LoRA). Fine-tuning raised chunk classification accuracy from around 8% to 90-97%, but technique identification remained difficult: the best exact-match accuracy was 42%.
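
The summary does not say how exact-match accuracy is computed. As a rough illustration only, the sketch below assumes that a chunk counts as correct when the predicted set of ATT&CK technique IDs equals the labeled set exactly. The function name `exact_match_accuracy` and the toy labels are hypothetical and are not taken from the paper or the dataset.

```python
def exact_match_accuracy(predicted, labeled):
    """Fraction of chunks whose predicted technique-ID set equals the labeled set.

    Assumption: "exact match" means set equality per chunk; the paper's
    actual metric definition may differ.
    """
    if len(predicted) != len(labeled):
        raise ValueError("predicted and labeled must have the same length")
    if not labeled:
        return 0.0
    hits = sum(1 for p, y in zip(predicted, labeled) if set(p) == set(y))
    return hits / len(labeled)


# Hypothetical toy data (illustrative only, not from the dataset).
labeled = [["T1059"], ["T1566", "T1204"], ["T1003"], []]
predicted = [["T1059"], ["T1566"], ["T1003"], []]

score = exact_match_accuracy(predicted, labeled)
print(f"exact-match accuracy: {score:.2f}")
```

Under this assumed definition, a prediction that recovers only one of two labeled techniques scores zero for that chunk, which is one plausible reason an exact-match figure can stay low even when coarse chunk classification is accurate.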

IMPACT This new dataset and fine-tuned SLM evaluations could improve multi-stage cyberattack detection capabilities.

RANK_REASON The cluster describes a new academic dataset and evaluation of existing models on that dataset, published on arXiv.


AI-generated summary · Google Gemini · from 2 sources.


COVERAGE [2]

  1. arXiv cs.LG TIER_1 English(EN) · Abir Ashab Niloy, Ahmed Ryan, Imamul Hossain Rafi, Md Erfan, Md Rayhanur Rahman

    Multi-Source Cybersecurity Logs: An ATT&CK-Labeled Dataset and SLM Evaluation

    arXiv:2606.18190v1 · Announce Type: cross · Abstract: Multi-stage cyberattacks span system, network, and browser logs. Detecting them requires correlating events across all three sources. Machine learning methods can learn these cross-source patterns, but they need labeled multi-sour…

  2. arXiv cs.LG TIER_1 English(EN) · Md Rayhanur Rahman

    Multi-Source Cybersecurity Logs: An ATT&CK-Labeled Dataset and SLM Evaluation

    Multi-stage cyberattacks span system, network, and browser logs. Detecting them requires correlating events across all three sources. Machine learning methods can learn these cross-source patterns, but they need labeled multi-source data. Existing public datasets fall short. Netw…