Quantifying the Agreement Between Data-Influence and Data-Similarity to Understand LLM Behavior

New Research Quantifies the Agreement Between Data Influence and Data Similarity in LLMs

By the PulseAugur editorial team · [1 source] · 2026-06-22 17:00

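Researchers have quantified the agreement between data-similarity and data-influence measures, the two kinds of measure used to trace LLM outputs back to their training data. Their findings show significant overlap between the two, with data-influence measures assigning more consistent rankings to the top documents identified by data-similarity. This asymmetry was observed in experiments on OLMo2-1B, Qwen3-1.7B, LlaMa3.2-1B, Gemma3-1B, and GPT2. The study suggests exploiting the asymmetry by using data-influence measures to refine data-similarity results, which yields a better cost-accuracy trade-off.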
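The summary does not say which similarity or influence measures the paper uses, or how it scores agreement, so the following is only a minimal sketch of the general idea: shortlist documents with a cheap similarity score, then re-order just that shortlist with a more expensive influence score. Everything in it is an assumption made for illustration, including cosine similarity as the cheap measure, the top-k overlap metric, the function names, and the toy influence scores (similarity plus noise).

```python
import math
import random

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def top_k(scores, k):
    """Indices of the k highest scores, best first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]

def overlap_at_k(scores_a, scores_b, k):
    """Fraction of documents shared by the two top-k lists."""
    return len(set(top_k(scores_a, k)) & set(top_k(scores_b, k))) / k

def rerank(similarity, influence_fn, k):
    """Shortlist k documents by similarity, then order them by influence.

    influence_fn is called only on the k shortlisted indices, which is
    where the cost saving would come from.
    """
    shortlist = top_k(similarity, k)
    return sorted(shortlist, key=influence_fn, reverse=True)

# Toy data: random embeddings stand in for real document features.
rng = random.Random(0)
query = [rng.gauss(0, 1) for _ in range(8)]
docs = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(50)]
similarity = [cosine(query, d) for d in docs]

# Hypothetical influence scores: similarity plus noise, purely illustrative.
influence = [s + rng.gauss(0, 0.1) for s in similarity]

print("overlap@10:", overlap_at_k(similarity, influence, 10))
print("re-ranked shortlist:", rerank(similarity, lambda i: influence[i], 10))
```

In a real setting, `influence_fn` would wrap whatever influence estimator is available; the point of the two-stage design is that it runs on k candidates rather than on the whole corpus.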

Impact: Offers a new way to understand LLM behavior and could make training-data analysis more efficient.

Ranking rationale: This cluster contains an academic paper detailing a new research method for understanding LLM behavior.

Read on arXiv cs.LG →

AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.LG · Mohammad Emtiyaz Khan · 2026-06-22 17:00

Quantifying the Agreement Between Data-Influence and Data-Similarity to Understand LLM Behavior

One way to understand LLM behavior is to trace its output back to the training data. Two types of measures are commonly used for output tracing: data-similarity and data-influence. The former is cheaper while the latter is believed to be more accurate. Even though many works have…
