Brief · PulseAugur

TOOL · 雷峰网 (Leiphone) · Chinese (ZH) · 7h

ICML 2026 | University of Electronic Science and Technology of China: Tree Self-Play (TSP), a Fine-Grained Self-Correction Framework for Secure Code Large Language Models

Researchers have developed a framework called Tree Self-Play (TSP) to address the security vulnerabilities inherent in large language models trained on code. Existing methods such as supervised fine-tuning and reinforcement learning are too coarse-grained to fix the localized coding errors that lead to issues such as SQL injection. TSP takes a fine-grained, self-driven approach: it identifies risk nodes in code and uses self-play to generate both safe and vulnerable code paths for targeted optimization.
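To make "localized coding error" concrete, here is a minimal Python sketch of a SQL injection pair. It is not taken from the TSP paper; the table, the function names (`lookup_vulnerable`, `lookup_safe`), and the payload are all illustrative assumptions. The two functions differ only in how the query is built, which is the kind of small, local difference between a safe and a vulnerable code path that the summary describes.

```python
import sqlite3


def make_db():
    # Hypothetical in-memory table used only for this illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "a-secret"), ("bob", "b-secret")],
    )
    return conn


def lookup_vulnerable(conn, name):
    # Vulnerable path: user input is spliced directly into the SQL text,
    # so quotes in `name` can change the structure of the query.
    query = "SELECT secret FROM users WHERE name = '" + name + "'"
    return [row[0] for row in conn.execute(query)]


def lookup_safe(conn, name):
    # Safe path: same query, but the input is bound as a parameter,
    # so it is always treated as data rather than SQL.
    query = "SELECT secret FROM users WHERE name = ?"
    return [row[0] for row in conn.execute(query, (name,))]


conn = make_db()
payload = "x' OR '1'='1"  # classic injection string

leaked = lookup_vulnerable(conn, payload)  # matches every row
blocked = lookup_safe(conn, payload)       # matches no row
```

With this payload the vulnerable version returns every secret in the table, while the parameterized version returns nothing, even though the two functions differ in a single line.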

IMPACT: This framework could improve the security of AI-generated code, reducing vulnerabilities and increasing trust in AI-assisted software development.

Reinforcement learning
HumanEval
Large language models
Supervised fine-tuning
Qwen2.5-Coder-7B
CodeLlama-7B
DiverseVul
Qwen2.5-Coder-3B
Tree Self-Play