Brief · PulseAugur

TOOL · arXiv cs.LG · English (EN) · 5d

HIDBench: Benchmarking Large Language Models for Host-Based Intrusion Detection

Researchers have introduced HIDBench, a benchmark for evaluating how well large language models (LLMs) perform host-based intrusion detection from system logs. The benchmark integrates three public datasets with a pipeline that converts raw telemetry into LLM-friendly formats, simulating realistic detection scenarios. Evaluations of leading LLMs showed significant performance variation, with models struggling on noisy and complex log data. The results indicate that LLMs show promise for intrusion detection, but that their reliability depends on data complexity and robust system design.
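
As a rough illustration of what converting raw telemetry into an LLM-friendly format could look like, the sketch below serializes a few host events into a plain-text prompt. It is not HIDBench's actual pipeline: the `Event` fields, the `events_to_prompt` helper, the prompt wording, and the sample events are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Event:
    # Hypothetical minimal schema for one host log event (an assumption,
    # not the schema used by HIDBench or its datasets).
    timestamp: str
    process: str
    action: str
    target: str


def events_to_prompt(events: list[Event], max_events: int = 50) -> str:
    """Serialize the most recent events into a plain-text prompt."""
    # Keep only the last `max_events` events so the prompt stays bounded.
    lines = [
        f"{e.timestamp} {e.process} {e.action} {e.target}"
        for e in events[-max_events:]
    ]
    header = "Classify the following host activity as BENIGN or MALICIOUS:"
    return "\n".join([header, *lines])


# Made-up sample events, for illustration only.
sample = [
    Event("2024-01-01T00:00:01Z", "bash", "exec", "/usr/bin/curl"),
    Event("2024-01-01T00:00:02Z", "curl", "connect", "203.0.113.5:443"),
]
prompt = events_to_prompt(sample)
```

Capping the number of events is one simple way to keep noisy, high-volume logs within a model's context window; a real pipeline would likely need more careful filtering or summarization.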

IMPACT Establishes a new evaluation standard for LLMs in cybersecurity and highlights their current limitations in host-based intrusion detection.

Large Language Models
Host-Based Intrusion Detection Systems
DARPA-E3
HIDBench
NodLink
DARPA-E5