Detecting Functional Memorization in Code Language Models

New method detects functional memorization in code LLMs

By PulseAugur Editorial Team · [1 source] · 2026-06-12 04:00

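Researchers have developed a new method for detecting functional memorization in code language models that goes beyond simple textual overlap. By comparing a mid-trained model that has been exposed to the target code against a reference model, they can identify whether the model reproduces functional logic and not just verbatim text. The study uses Olmo-3-32B and Python code, and applies both textual similarity and execution-based functional similarity metrics to show that functional memorization occurs. The findings underscore the need for more advanced auditing metrics that capture functional equivalence in code generation.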
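To make the distinction between the two metric families concrete, here is a minimal Python sketch. It is an illustration only and not the paper's implementation: the summary does not specify the actual metrics, so `difflib`'s ratio stands in for textual similarity, exact agreement of outputs on a handful of inputs stands in for execution-based functional similarity, and the two "model outputs" are hard-coded hypothetical strings, not real generations from Olmo-3-32B. All function and variable names are invented for this example.

```python
from difflib import SequenceMatcher


def text_similarity(a: str, b: str) -> float:
    """Character-level similarity ratio in [0, 1] (assumed stand-in metric)."""
    return SequenceMatcher(None, a, b).ratio()


def run(source: str, func_name: str, arg):
    """Execute `source` in a fresh namespace and call `func_name(arg)`.

    Returns ("ok", value) or ("error", exception type name).
    Only use exec on trusted or sandboxed code.
    """
    namespace = {}
    try:
        exec(source, namespace)
        return ("ok", namespace[func_name](arg))
    except Exception as exc:
        return ("error", type(exc).__name__)


def functional_similarity(a: str, b: str, func_name: str, inputs) -> float:
    """Fraction of inputs on which both programs behave identically."""
    matches = sum(run(a, func_name, x) == run(b, func_name, x) for x in inputs)
    return matches / len(inputs)


# Hypothetical target snippet from the training data.
TARGET = """
def total(xs):
    s = 0
    for x in xs:
        s += x
    return s
"""

# Hypothetical generation from the exposed model: different text, same behavior.
EXPOSED_OUTPUT = """
def total(values):
    return sum(values)
"""

# Hypothetical generation from the reference model: different behavior.
REFERENCE_OUTPUT = """
def total(values):
    return len(values)
"""

INPUTS = [[], [1, 2, 3], [5], [-1, 1], [2, 2]]

exposed_text = text_similarity(TARGET, EXPOSED_OUTPUT)
exposed_func = functional_similarity(TARGET, EXPOSED_OUTPUT, "total", INPUTS)
reference_func = functional_similarity(TARGET, REFERENCE_OUTPUT, "total", INPUTS)

print(f"exposed:   text={exposed_text:.2f} functional={exposed_func:.2f}")
print(f"reference: functional={reference_func:.2f}")
print(f"functional gap (exposed - reference) = {exposed_func - reference_func:.2f}")
```

In this toy case the exposed output is textually different from the target yet behaves identically on every test input, which is the pattern a purely textual audit would miss. Measuring the gap against the reference model's output is one plausible way to separate memorization from logic that any capable model would write anyway.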

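Impact: Highlights the need for more sophisticated evaluation metrics for code generation models, which affects how their safety and originality are assessed.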

Ranking rationale: This is an academic paper detailing a new evaluation method for code language models.

Read on arXiv cs.CL →

AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.CL · Matthieu Meeus, Anil Ramakrishna, Matthew Grange, Zheng Xu, Luca Melis · 2026-06-12 04:00

Detecting Functional Memorization in Code Language Models

arXiv:2606.12764v1 Announce Type: cross Abstract: Large language models (LLMs) are increasingly used to generate code at scale. Meanwhile, prior work has investigated whether training data may be recoverable from model outputs, by auditing the textual overlap between training exa…
