Open-source LLMs struggle with complex cyber threat intelligence classification

By PulseAugur Editorial · [1 source] · 2026-06-16 17:04

A new study published on arXiv evaluates how well seven open-source large language models (LLMs) classify cyber threat intelligence (CTI) reports under the MITRE ATT&CK framework. The researchers built a dataset of 2,076 human-annotated sentences drawn from complex CTI reports and mapped them to 114 unique ATT&CK techniques. The best-performing LLM reached a micro-averaged F1 score of only 0.22, indicating that current open-source LLMs are not yet adequate for production-grade ATT&CK classification. The study found that performance correlated positively with model parameter count, while changes to prompt strategy and temperature produced no significant gains.
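For readers unfamiliar with the headline metric, the following is a minimal sketch of how a micro-averaged F1 score can be computed for multi-label classification, where each sentence may carry several technique labels. It is not the study's evaluation code: the function name `micro_f1` and the sample sentences and labels are hypothetical, chosen only to illustrate the arithmetic. Micro-averaging pools true positives, false positives, and false negatives across all sentences and labels before computing a single F1.

```python
from typing import List, Set


def micro_f1(gold: List[Set[str]], pred: List[Set[str]]) -> float:
    """Micro-averaged F1 for multi-label predictions.

    Counts are pooled over every sentence and label before F1 is computed,
    so frequent labels weigh more than rare ones.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)  # labels predicted and present in the annotation
        fp += len(p - g)  # labels predicted but not annotated
        fn += len(g - p)  # labels annotated but missed
    denom = 2 * tp + fp + fn
    # F1 = 2TP / (2TP + FP + FN); defined here as 0.0 when there is nothing to score
    return 2 * tp / denom if denom else 0.0


# Hypothetical example: three sentences with illustrative ATT&CK technique IDs.
gold = [{"T1059", "T1105"}, {"T1566"}, {"T1027"}]
pred = [{"T1059"}, {"T1566", "T1204"}, {"T1055"}]
score = micro_f1(gold, pred)
print(round(score, 3))  # prints 0.5
```

In this toy case there are 2 true positives, 2 false positives, and 2 false negatives, giving 4 / 8 = 0.5. A score of 0.22 on the same scale means that wrong or missed labels outnumber correct ones by a wide margin.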

IMPACT Current open-source LLMs are insufficient for production-grade ATT&CK classification, highlighting a gap in their ability to process complex, real-world threat intelligence.



AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.LG TIER_1 English(EN) · Md Rayhanur Rahman · 2026-06-16 17:04

Evaluating Open-Source LLMs for Multi-Label ATT&CK Technique Classification on CTI Reports

Classifying Cyber Threat Intelligence (CTI) using MITRE Adversarial Tactics, Techniques, and Common Knowledge (ATT&CK) is essential for proactive defense, but historically required extensive human effort. Pre-Large Language Model (LLM) automation sped up this process, but could n…

