TRACER: A Semantic-Aware Framework for Fine-Grained Contamination Detection in Code LLMs
Two new research papers address data contamination in large language models, particularly in code generation and backtesting. The first introduces TRACER, a framework that detects semantic similarities in code that indicate contamination, and reports high accuracy even with models like GPT-5. The second proposes Shapley-DCLR and TimeSPEC, methods to measure and mitigate temporal contamination in LLM backtesting by focusing on decision-driving claims and ensuring predictions rest solely on pre-cutoff knowledge.
AI IMPACT: These methods aim to improve the reliability and trustworthiness of LLM evaluations, which is crucial for safe and effective deployment.