New frameworks tackle data contamination in code LLMs and backtesting

By PulseAugur Editorial · [2 sources] · 2026-05-26 04:00

Two new research papers address data contamination in large language models, one in code generation and one in backtesting. The first introduces TRACER, a semantic-aware framework for fine-grained contamination detection in code LLMs, where contamination often goes beyond exact duplication; it reportedly achieves high accuracy even with models like GPT-5. The second proposes Shapley-DCLR and TimeSPEC, methods to measure and mitigate temporal contamination in LLM backtesting by focusing on decision-driving claims and ensuring that predictions rest solely on pre-cutoff knowledge.

IMPACT These methods aim to make LLM evaluations more reliable and trustworthy, which is crucial for safe and effective deployment.

RANK_REASON Two academic papers published on arXiv introduce novel methods for detecting and mitigating data contamination in LLMs.

paper
safety

AI-generated summary · Google Gemini · from 2 sources.

COVERAGE [2]

arXiv cs.AI TIER_1 English(EN) · Yifeng Di, Xuliang Huang, Tianyi Zhang · 2026-05-26 04:00

TRACER: A Semantic-Aware Framework for Fine-Grained Contamination Detection in Code LLMs

arXiv:2605.24079v1 Announce Type: cross Abstract: Data contamination is a known threat to the reliability of model evaluation. However, it remains underexplored in code large language models (LLMs), where contamination often goes beyond exact duplication. We present TRACER, a sem…

arXiv cs.AI TIER_1 English(EN) · Zeyu Zhang, Ryan Chen, Bradly C. Stadie · 2026-05-26 04:00

All Leaks Count, Some Count More: Interpretable Temporal Contamination Detection and Mitigation in LLM Backtesting

arXiv:2602.17234v2 Announce Type: replace Abstract: Backtesting LLMs on resolved events assumes models reason only from pre-cutoff knowledge, yet pretrained models inevitably leak post-cutoff knowledge. We introduce a claim-level evaluation framework that decomposes prediction ra…
