PulseAugur
实时 06:46:06

New framework StepCodeReasoner boosts code reasoning with execution traces

Researchers have developed StepCodeReasoner, a new framework designed to improve code reasoning by focusing on intermediate execution states rather than just final outputs. This approach uses structured print statements to create execution-trace anchors, training models to predict runtime states at each step. The framework also incorporates a novel reinforcement learning algorithm, Bi-Level GRPO, for better credit assignment across and within execution paths. Experiments show that StepCodeReasoner achieves state-of-the-art performance on code reasoning benchmarks, with its 7B model surpassing models like GPT-4o and a previous CodeReasoner baseline. AI

影响 This new method for code reasoning could lead to more reliable AI code generation and debugging tools.

排序理由 The cluster contains an academic paper detailing a new method and benchmark results. [lever_c_demoted from research: ic=1 ai=1.0]

在 arXiv cs.CL 阅读 →

AI 生成摘要 · Google Gemini · 来自 1 个来源。 我们如何撰写摘要 →

New framework StepCodeReasoner boosts code reasoning with execution traces

报道来源 [1]

  1. arXiv cs.CL TIER_1 English(EN) · Jie M. Zhang ·

    StepCodeReasoner: Aligning Code Reasoning with Stepwise Execution Traces via Reinforcement Learning

    Existing code reasoning methods primarily supervise final code outputs, ignoring intermediate states, often leading to reward hacking where correct answers are obtained through inconsistent reasoning. We propose StepCodeReasoner, a framework that introduces explicit intermediate …