PulseAugur
EN
LIVE 16:18:04

New LaRA Framework Detects Data Contamination in RL-Trained LLMs

Researchers have introduced LaRA, a novel framework designed to detect data contamination in large language models that have undergone reinforcement learning (RL) post-training. Unlike existing methods that rely on output-level signals, LaRA analyzes internal representations layer by layer. It employs three metrics—perturbation sensitivity, directional collapse, and local representation rigidity—to identify geometric deviations indicative of contamination. Experiments demonstrate that LaRA's protocol is more effective than traditional output-level baselines in identifying contamination in RL-trained reasoning models. AI

IMPACT Introduces a new method for ensuring the reliability and generalization of RL-trained LLMs by detecting data contamination.

RANK_REASON The cluster contains an academic paper detailing a new research methodology for detecting data contamination in LLMs.

Read on Hugging Face Daily Papers →

AI-generated summary · Google Gemini · from 2 sources. How we write summaries →

New LaRA Framework Detects Data Contamination in RL-Trained LLMs

COVERAGE [2]

  1. arXiv cs.AI TIER_1 English(EN) · Minju Gwak, Minseo Kwak, Dongseok Lee, Guijin Son, Alan Ritter, Jaehyung Kim ·

    LaRA: Layer-wise Representation Analysis for Detecting Data Contamination in RL Post-Training

    arXiv:2605.29888v1 Announce Type: cross Abstract: Reinforcement learning (RL) post-training has shown to improve reasoning in large language models (LLMs). However, there has been little exploration on the problem of data contamination in RL post-training, potentially undermining…

  2. Hugging Face Daily Papers TIER_1 English(EN) ·

    LaRA: Layer-wise Representation Analysis for Detecting Data Contamination in RL Post-Training

    LaRA is a layer-wise representation analysis framework that detects data contamination in reinforcement learning-post-trained large language models by analyzing geometric deviations across model layers.