LaRA: Layer-wise Representation Analysis for Detecting Data Contamination in RL Post-Training

New LaRA framework detects data contamination in RL-trained LLMs

By the PulseAugur editorial team · [2 sources] · 2026-05-28 00:00

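Researchers have introduced LaRA, a framework for detecting data contamination in large language models that have undergone reinforcement learning (RL) post-training. Unlike existing methods, which rely on output-level signals, LaRA analyzes internal representations layer by layer. It uses three metrics (perturbation sensitivity, directional collapse, and local representation rigidity) to identify geometric deviations that indicate contamination. Experiments show that LaRA's protocol identifies contamination in RL-trained reasoning models more effectively than conventional output-level baselines.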
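The summary names the metrics but does not define them, so the following is only an illustrative sketch of what a layer-wise geometric analysis could look like. The function names, the array layout `(layers, samples, dim)`, and both formulas are assumptions made for illustration; they are not the paper's definitions. Local representation rigidity is left out because the summary gives no detail from which to build even a proxy.

```python
import numpy as np


def perturbation_sensitivity(clean, perturbed, eps=1e-8):
    """Hypothetical proxy: relative shift of each hidden state when the
    input is perturbed, averaged over samples. Returns one value per layer.

    clean, perturbed: arrays of shape (layers, samples, dim).
    """
    shift = np.linalg.norm(clean - perturbed, axis=-1)   # (layers, samples)
    scale = np.linalg.norm(clean, axis=-1) + eps         # (layers, samples)
    return (shift / scale).mean(axis=1)                  # (layers,)


def directional_collapse(hidden, eps=1e-8):
    """Hypothetical proxy: mean pairwise cosine similarity between the
    samples' hidden states at each layer (values near 1 mean the
    directions have collapsed onto one another).

    hidden: array of shape (layers, samples, dim), samples >= 2.
    """
    norms = np.linalg.norm(hidden, axis=-1, keepdims=True) + eps
    unit = hidden / norms
    sims = unit @ unit.transpose(0, 2, 1)                # (layers, n, n)
    n = hidden.shape[1]
    # Drop the diagonal (self-similarity) before averaging.
    off_diag = sims.sum(axis=(1, 2)) - np.trace(sims, axis1=1, axis2=2)
    return off_diag / (n * (n - 1))                      # (layers,)


# Synthetic stand-in for hidden states: 4 layers, 16 samples, 32 dims.
rng = np.random.default_rng(0)
clean = rng.normal(size=(4, 16, 32))
perturbed = clean + 0.1 * rng.normal(size=clean.shape)

sens = perturbation_sensitivity(clean, perturbed)
coll = directional_collapse(clean)
print("sensitivity per layer:", sens)
print("collapse per layer:   ", coll)
```

In practice the arrays would come from a model's per-layer hidden states for benchmark items and their perturbed variants; comparing the per-layer profiles of suspected and known-clean items is one way such a signal could be read, again as an assumption rather than the published protocol.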

Impact: Introduces a new way to ensure the reliability and generalization of RL-trained LLMs by detecting data contamination.

Ranking rationale: This cluster contains an academic paper detailing a new research method for detecting data contamination in LLMs.


AI-generated summary · Google Gemini · based on 2 sources.

Sources [2]

arXiv cs.AI · Minju Gwak, Minseo Kwak, Dongseok Lee, Guijin Son, Alan Ritter, Jaehyung Kim · 2026-05-29 04:00

LaRA: Layer-wise Representation Analysis for Detecting Data Contamination in RL Post-Training

arXiv:2605.29888v1 Announce Type: cross Abstract: Reinforcement learning (RL) post-training has been shown to improve reasoning in large language models (LLMs). However, there has been little exploration of the problem of data contamination in RL post-training, potentially undermining…

Hugging Face Daily Papers · 2026-05-28 00:00

LaRA: Layer-wise Representation Analysis for Detecting Data Contamination in RL Post-Training

LaRA is a layer-wise representation analysis framework that detects data contamination in large language models post-trained with reinforcement learning by analyzing geometric deviations across model layers.
