Researchers have developed a method for detecting functional memorization in code language models, which goes beyond simple textual overlap. They compare a mid-trained model that was exposed to the target code against a reference model to determine whether the model reproduces functional logic rather than only verbatim text. The study used Olmo-3-32B and Python code, and applied both textual-similarity and execution-based functional-similarity metrics to show that functional memorization occurs. The findings underscore the need for auditing metrics that capture functional equivalence in generated code.
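To make the distinction between the two kinds of metric concrete, here is a minimal Python sketch. It is not the study's implementation: the function names, the example snippets, and the choice of `difflib` for textual similarity are all assumptions made for illustration, and the comparison against a reference model is left out. It shows how two snippets can look different as text while behaving identically on a set of test inputs.

```python
import difflib


def textual_similarity(a: str, b: str) -> float:
    """Character-level similarity ratio in [0, 1] (illustrative metric only)."""
    return difflib.SequenceMatcher(None, a, b).ratio()


def load_function(source: str, name: str):
    """Execute source code and return the function called `name`.

    Assumption: the code is trusted. Real audits should sandbox execution.
    """
    namespace = {}
    exec(source, namespace)
    return namespace[name]


def outcome(fn, args):
    """Run fn(*args) and capture either its result or the exception type."""
    try:
        return ("ok", fn(*args))
    except Exception as exc:
        return ("error", type(exc).__name__)


def functional_similarity(src_a: str, src_b: str, name: str, inputs) -> float:
    """Fraction of test inputs on which both snippets give the same outcome."""
    f = load_function(src_a, name)
    g = load_function(src_b, name)
    agree = sum(outcome(f, args) == outcome(g, args) for args in inputs)
    return agree / len(inputs)


# Hypothetical target code and a hypothetical model generation.
TARGET = (
    "def total(xs):\n"
    "    result = 0\n"
    "    for x in xs:\n"
    "        result += x\n"
    "    return result\n"
)
GENERATED = (
    "def total(values):\n"
    "    return sum(values)\n"
)

# Each entry is a tuple of positional arguments for the function under test.
TEST_INPUTS = [([],), ([1, 2, 3],), ([-1, 1],), ([0.5, 0.5],)]

text_score = textual_similarity(TARGET, GENERATED)
func_score = functional_similarity(TARGET, GENERATED, "total", TEST_INPUTS)

print(f"textual similarity:    {text_score:.2f}")
print(f"functional similarity: {func_score:.2f}")
```

In this sketch the two snippets agree on every test input even though their text differs, which is the kind of case a purely textual overlap metric would underrate. Agreement on a handful of inputs is only evidence of functional equivalence, not proof of it.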
IMPACT Highlights the need for more sophisticated evaluation metrics for code generation models, impacting how their safety and originality are assessed.
RANK_REASON This is a research paper detailing a new method for evaluating code language models.
AI-generated summary · Google Gemini · from 1 source.