A new research paper evaluates the performance of seven open-source large language models (LLMs) on classifying complex cyber threat intelligence (CTI) reports. The study constructed a dataset of 2,076 human-annotated sentences mapped to 114 MITRE ATT&CK techniques. The highest-performing LLM achieved a micro-averaged F1 score of 0.22, indicating that current open-source LLMs are not yet sufficient for production-grade ATT&CK classification. The research found a positive correlation between LLM parameter size and performance, but varying the prompt strategy and temperature did not yield significant gains.
AI IMPACT Current open-source LLMs demonstrate insufficient capability for complex cyber threat intelligence classification, highlighting a need for further research and development in this domain.
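For readers unfamiliar with the headline metric, here is a minimal sketch of how a micro-averaged F1 score can be computed. It pools true positives, false positives, and false negatives across all sentences before computing a single F1 value. The function name `micro_f1`, the assumption that each sentence may carry a set of technique IDs, and the toy labels are illustrative only; they are not taken from the paper and do not reproduce its evaluation code.

```python
def micro_f1(gold, pred):
    """Micro-averaged F1 over per-sentence sets of technique IDs.

    gold, pred: equal-length lists of sets (hypothetical input format).
    Counts are pooled over all sentences, then F1 = 2TP / (2TP + FP + FN).
    """
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)   # predicted and annotated
        fp += len(p - g)   # predicted but not annotated
        fn += len(g - p)   # annotated but not predicted
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy data for illustration only (not from the study's dataset).
gold = [{"T1566"}, {"T1059"}, {"T1027", "T1105"}, {"T1003"}]
pred = [{"T1566"}, {"T1105"}, {"T1027"}, set()]

score = micro_f1(gold, pred)
print(f"micro-F1 = {score:.2f}")
```

Because the counts are pooled, frequent techniques weigh more heavily in the score than rare ones, which is worth keeping in mind when a label space has as many as 114 classes.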
- cyber threat intelligence
- Hugging Face
- large language model
- MITRE Adversarial Tactics, Techniques, and Common Knowledge
- MITRE ATT&CK
- open source LLMs
AI-generated summary · Google Gemini · from 2 sources.