A new study published on arXiv evaluates the performance of seven open-source large language models (LLMs) in classifying cyber threat intelligence (CTI) reports using the MITRE ATT&CK framework. Researchers developed a dataset of 2,076 human-annotated sentences from complex CTI reports, mapping them to 114 unique ATT&CK techniques. The highest-performing LLM achieved a micro-averaged F1 score of 0.22, indicating that current open-source LLMs are not yet sufficient for production-grade ATT&CK classification. The study found a positive correlation between LLM parameter size and performance, but prompt strategy and temperature did not yield significant gains.
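For readers unfamiliar with the headline metric, here is a minimal sketch of how a micro-averaged F1 score can be computed for sentence-level technique labelling. It pools true positives, false positives, and false negatives across all sentences before computing F1 once. The function name `micro_f1` and the sample labels are illustrative assumptions; this is not the study's evaluation code or data.

```python
def micro_f1(gold, pred):
    """Micro-averaged F1 over per-sentence label sets.

    gold, pred: equal-length lists of sets of technique IDs.
    Counts are pooled across all sentences before F1 is computed.
    """
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        g, p = set(g), set(p)
        tp += len(g & p)   # techniques predicted and annotated
        fp += len(p - g)   # techniques predicted but not annotated
        fn += len(g - p)   # techniques annotated but missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical toy labels, chosen only to exercise the function.
gold = [{"T1059"}, {"T1566", "T1204"}, {"T1027"}]
pred = [{"T1059"}, {"T1566"}, {"T1105"}]
score = micro_f1(gold, pred)  # pooled counts: tp=2, fp=1, fn=2
print(round(score, 3))
```

Because the counts are pooled, frequent techniques weigh more heavily than rare ones, which matters when labels are spread over many techniques.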
IMPACT Current open-source LLMs are insufficient for production-grade ATT&CK classification, highlighting a gap in their ability to process complex, real-world threat intelligence.
RANK_REASON Academic paper evaluating open-source LLMs on a specific task.
- cyber threat intelligence
- Hugging Face
- large language model
- MITRE Adversarial Tactics, Techniques, and Common Knowledge
- MITRE ATT&CK
- open-source LLMs
AI-generated summary · Google Gemini · from 1 source.