PulseAugur

Open-source LLMs fall short on complex cyber threat intelligence classification

A new research paper evaluates seven open-source large language models (LLMs) on multi-label classification of complex cyber threat intelligence (CTI) reports. The study constructed a dataset of 2,076 human-annotated sentences mapped to 114 MITRE ATT&CK techniques. The best-performing LLM achieved a micro-averaged F1 score of only 0.22, indicating that current open-source LLMs are not yet sufficient for production-grade ATT&CK classification. The researchers found a positive correlation between model parameter count and performance, whereas changing the prompt strategy or temperature yielded no significant gains.
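To make the headline metric concrete, the sketch below shows how a micro-averaged F1 score is typically computed for multi-label classification: true positives, false positives and false negatives are pooled across every sentence and every label before precision and recall are calculated. This is a minimal illustration, not the paper's evaluation code. The function name `micro_f1` and the example annotations are hypothetical, and the ATT&CK technique IDs are used only as sample labels.

```python
def micro_f1(gold: list[set[str]], pred: list[set[str]]) -> float:
    """Micro-averaged F1 for multi-label classification (illustrative sketch).

    Each element of `gold` and `pred` is the set of technique IDs assigned to
    one sentence. Counts are pooled over all sentences and labels.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)  # techniques both annotated and predicted
        fp += len(p - g)  # techniques predicted but not annotated
        fn += len(g - p)  # annotated techniques the model missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical annotations for three sentences (not from the paper's dataset).
gold = [{"T1566"}, {"T1059", "T1071"}, {"T1059"}]
pred = [{"T1566", "T1204"}, {"T1059"}, set()]
print(round(micro_f1(gold, pred), 3))  # prints 0.571 (TP=2, FP=1, FN=2)
```

Because counts are pooled, frequently occurring techniques weigh more heavily in a micro-averaged score than rare ones, which is the main difference from macro-averaging (averaging per-label F1 scores).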

IMPACT Current open-source LLMs demonstrate insufficient capability for complex cyber threat intelligence classification, highlighting a need for further research and development in this domain.

RANK_REASON The cluster contains an academic paper evaluating LLM performance on a specific task.

Read on arXiv cs.LG →

AI-generated summary · Google Gemini · from 2 sources.

COVERAGE [2]

  1. arXiv cs.LG TIER_1 English(EN) · Ahmed Ryan, Saad Sakib Noor, Md Erfan, Shaswata Mitra, Sudip Mittal, Md Rayhanur Rahman

    Evaluating Open-Source LLMs for Multi-Label ATT&CK Technique Classification on CTI Reports

    arXiv:2606.18166v1 Announce Type: cross Abstract: Classifying Cyber Threat Intelligence (CTI) using MITRE Adversarial Tactics, Techniques, and Common Knowledge (ATT&CK) is essential for proactive defense, but historically required extensive human effort. Pre-Large Language Mo…

  2. arXiv cs.LG TIER_1 English(EN) · Md Rayhanur Rahman

    Evaluating Open-Source LLMs for Multi-Label ATT&CK Technique Classification on CTI Reports

    Classifying Cyber Threat Intelligence (CTI) using MITRE Adversarial Tactics, Techniques, and Common Knowledge (ATT&CK) is essential for proactive defense, but historically required extensive human effort. Pre-Large Language Model (LLM) automation sped up this process, but could n…