Less Data, More Security: Advancing Cybersecurity LLMs Specialization via Resource-Efficient Domain-Adaptive Continuous Pre-training with Minimal Tokens

New method strengthens cybersecurity large language models with less data

By the PulseAugur editorial team · [1 source] · 2026-07-03 04:00

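Researchers have developed a resource-efficient approach, domain-adaptive continuous pre-training (DAP), for specializing large language models (LLMs) for cybersecurity tasks. Using a curated corpus of 126 million words and a distributed FSDP pipeline, they adapted the Llama-3.1-8B, DeepSeek-R1-Distill-Qwen-14B, and Llama-3.3-70B-Instruct models. The adapted Llama-3.3-70B-Ins-DAP model achieved state-of-the-art performance on three cybersecurity benchmarks while using significantly less training data than comparable models.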

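Impact: This research demonstrates a more efficient way to build specialized cybersecurity AI models, which could lower compute costs and speed up the development of AI assistants for threat analysis.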

AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.AI · Salahuddin Salahuddin, Ahmed Hussain, Jussi Löppönen, Toni Jutila · 2026-07-03 04:00

Less Data, More Security: Advancing Cybersecurity LLMs Specialization via Resource-Efficient Domain-Adaptive Continuous Pre-training with Minimal Tokens

arXiv:2507.02964v2. Abstract: The increasing scale of AI workloads demands High-Performance Computing (HPC) infrastructure and training methodologies that are both scalable and sustainable. While Large Language Models (LLMs) demonstrate exceptional nat…
