PulseAugur

New frameworks tackle data contamination in code LLMs and backtesting

Two new research papers address data contamination in large language models, one in code generation and the other in backtesting. The first introduces TRACER, a framework that detects semantic similarities in code that indicate contamination, and it achieves high accuracy even with models like GPT-5. The second proposes Shapley-DCLR and TimeSPEC, methods to measure and mitigate temporal contamination in LLM backtesting by focusing on decision-driving claims and ensuring predictions are based solely on pre-cutoff knowledge.

IMPACT These methods aim to make LLM evaluations more reliable and trustworthy, which is crucial for deploying the models safely and effectively.

RANK_REASON Two academic papers published on arXiv introducing novel methods for detecting and mitigating data contamination in LLMs.


AI-generated summary · Google Gemini · from 2 sources.

COVERAGE [2]

  1. arXiv cs.AI TIER_1 English(EN) · Yifeng Di, Xuliang Huang, Tianyi Zhang

    TRACER: A Semantic-Aware Framework for Fine-Grained Contamination Detection in Code LLMs

    arXiv:2605.24079v1 Announce Type: cross Abstract: Data contamination is a known threat to the reliability of model evaluation. However, it remains underexplored in code large language models (LLMs), where contamination often goes beyond exact duplication. We present TRACER, a sem…

  2. arXiv cs.AI TIER_1 English(EN) · Zeyu Zhang, Ryan Chen, Bradly C. Stadie

    All Leaks Count, Some Count More: Interpretable Temporal Contamination Detection and Mitigation in LLM Backtesting

    arXiv:2602.17234v2 Announce Type: replace Abstract: Backtesting LLMs on resolved events assumes models reason only from pre-cutoff knowledge, yet pretrained models inevitably leak post-cutoff knowledge. We introduce a claim-level evaluation framework that decomposes prediction ra…