Large Byte Model: Teaching Language Models About Compiled Code

New byte-native large model can understand malware binaries

By the PulseAugur editorial team · [1 source] · 2026-06-03 04:00

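Researchers have developed a novel Large Byte Model (LBM) that can process and understand the raw byte representation of executable programs. This byte-native model uses a specialized byte tokenizer to answer complex questions about malware binaries, reaching 69% accuracy on malware family classification and 98% on architecture classification. The study stresses that incorporating domain-specific knowledge during training is important for effective malware analysis, because general-purpose large models are not sufficient for this purpose.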
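To make the idea of feeding raw bytes to a model concrete, here is a minimal sketch of plain byte-level tokenization. It is not the paper's specialized tokenizer, whose design is not described in this summary. The special tokens, the `BYTE_OFFSET` constant, and the `encode`/`decode` helpers are all hypothetical names chosen for illustration.

```python
# Illustrative sketch only: a plain byte-level tokenizer.
# This is NOT the tokenizer from the paper; names and layout are assumptions.

SPECIAL_TOKENS = ["<pad>", "<bos>", "<eos>"]  # hypothetical special tokens
BYTE_OFFSET = len(SPECIAL_TOKENS)             # byte value b maps to id b + 3
VOCAB_SIZE = BYTE_OFFSET + 256                # 3 specials + 256 byte values

BOS_ID = SPECIAL_TOKENS.index("<bos>")
EOS_ID = SPECIAL_TOKENS.index("<eos>")


def encode(data: bytes) -> list[int]:
    """Map each raw byte to a token id, wrapped in <bos>/<eos>."""
    return [BOS_ID] + [b + BYTE_OFFSET for b in data] + [EOS_ID]


def decode(ids: list[int]) -> bytes:
    """Invert encode(): drop special tokens and recover the raw bytes."""
    return bytes(i - BYTE_OFFSET for i in ids if i >= BYTE_OFFSET)


# The first four bytes of every ELF executable are the magic number 0x7f 'E' 'L' 'F'.
elf_magic = b"\x7fELF"
ids = encode(elf_magic)
print(ids)          # [1, 130, 72, 79, 73, 2]
print(decode(ids))  # b'\x7fELF'
```

Because every possible byte value has its own id, no executable is ever out of vocabulary. The cost is sequence length: one token per byte, which is presumably one reason a specialized tokenizer is attractive for large binaries.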

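Impact: Introduces a new model architecture for analyzing compiled code directly, with the potential to improve malware detection and reverse engineering.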

Ranking rationale: This cluster contains a paper that details a new model architecture and its capabilities.


AI-generated summary · Google Gemini · based on 1 source.

Sources [1]

arXiv cs.AI · Florian Störtz, Catalin-Andrei Stan, Alexandru Dinu, Sandra Servia-Rodríguez, Mihaela Gaman, Calin Miron, Edward Raff · 2026-06-03 04:00

Large Byte Model: Teaching Language Models About Compiled Code

arXiv:2606.02834v1. Abstract: Malware analysis starts with the raw bytes of an executable program, and tools to "lift" these to higher-level representations, such as assembly, are expensive and subject to error. Large Language Models (LLMs) cannot process raw …

