Influcoder: Distilling Decoders' Gradient Influence Rankings into an Encoder for Data Attribution

Influcoder brings scalable data attribution to LLMs

By the PulseAugur editorial team · [2 sources] · 2026-06-11 17:58

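Researchers have developed Influcoder, a new method for efficiently attributing the influence of individual training samples on a large language model (LLM). It addresses the scalability and speed limits of existing influence-function methods, making data attribution practical on large datasets. Influcoder is intended to help curate high-quality datasets by identifying samples that may cause undesirable model behavior, such as toxicity.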
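To illustrate what a gradient-based influence ranking is, here is a minimal sketch of a TracIn-style score computed at a single checkpoint. It assumes a linear model with squared loss so that per-sample gradients have a closed form, and the function names are hypothetical. This is not the Influcoder method; the summary does not describe its internals beyond the title (distilling decoder gradient influence rankings into an encoder).

```python
import numpy as np


def per_sample_grads(w, X, y):
    """Gradient of 0.5 * (x @ w - y)**2 w.r.t. w, one row per sample."""
    residuals = X @ w - y            # shape (n,)
    return residuals[:, None] * X    # shape (n, d)


def gradient_influence(w, X_train, y_train, x_test, y_test):
    """TracIn-style score at one checkpoint (learning rate taken as 1).

    Returns the dot product of each training-sample gradient with the
    test-sample gradient. Positive scores mark "proponents": to first
    order, a gradient step on that training sample also lowers the test loss.
    """
    g_train = per_sample_grads(w, X_train, y_train)                       # (n, d)
    g_test = per_sample_grads(w, x_test[None, :], np.array([y_test]))[0]  # (d,)
    return g_train @ g_test                                               # (n,)


def top_k_influential(scores, k):
    """Indices of the k highest-scoring training samples, highest first."""
    return np.argsort(-scores)[:k]


# Toy usage: four training samples, one test sample, weights at zero.
w = np.zeros(2)
X_train = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 1.0], [-1.0, 0.0]])
y_train = np.array([1.0, 1.0, 3.0, 1.0])
x_test, y_test = np.array([1.0, 0.0]), 1.0

scores = gradient_influence(w, X_train, y_train, x_test, y_test)
# scores: 1, 0, 6, -1 (sample 2 helps most, sample 3 works against the test)
top = top_k_influential(scores, 2)
# top: indices 2 and 0
```

In a curation setting, the test example would be an instance of the undesirable behavior, and its top-ranked training samples would be candidates for review or removal. Computing per-sample gradients over a full LLM training set is a large part of why such methods are costly at scale.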

Impact: enables more efficient dataset curation and debugging for large language models.

Ranking rationale: this cluster covers a recent research paper describing a new data attribution method for LLMs.

AI-generated summary · Google Gemini · from 2 sources.

Sources [2]

arXiv cs.CL · Dimitri Kachler, Damien Sileo, Pascal Denis · 2026-06-12 04:00

Influcoder: Distilling Decoders' Gradient Influence Rankings into an Encoder for Data Attribution

arXiv:2606.13668v1 · Abstract: With the growth of LLMs' (Large Language Models) capabilities, there has been an increasing push to curate high quality datasets by filtering samples in the training data. In general, Data Attribution (DA) methods aim to estimate ho…
arXiv cs.CL · Pascal Denis · 2026-06-11 17:58

Influcoder: Distilling Decoders' Gradient Influence Rankings into an Encoder for Data Attribution

With the growth of LLMs' (Large Language Models) capabilities, there has been an increasing push to curate high quality datasets by filtering samples in the training data. In general, Data Attribution (DA) methods aim to estimate how individual samples in a training dataset can p…