StepCodeReasoner: Aligning Code Reasoning with Stepwise Execution Traces via Reinforcement Learning

New framework StepCodeReasoner improves code reasoning through execution traces

By the PulseAugur editorial team · [1 source] · 2026-05-12 10:36

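Researchers have developed StepCodeReasoner, a new framework that aims to improve code reasoning by attending to intermediate execution states rather than only final outputs. The method uses structured print statements to create execution-trace anchors and trains the model to predict the runtime state at each step. The framework also includes a novel reinforcement learning algorithm, Bi-Level GRPO, for better credit assignment both across execution paths and within them. Experiments show that StepCodeReasoner achieves state-of-the-art performance on code reasoning benchmarks, with its 7B model outperforming models such as GPT-4o and the earlier CodeReasoner baseline.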
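To make the idea of execution-trace anchors concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's implementation: the `[TRACE] step=<n> best=<value>` line format, the helper names `collect_trace` and `step_match_rate`, and the sample program are all hypothetical. The sketch instruments a small function with structured print statements, records the states it actually emits, and compares them step by step with a hypothetical model prediction.

```python
import contextlib
import io

# Hypothetical program under analysis, instrumented with structured prints.
# The "[TRACE] step=<n> best=<value>" format is an assumption for illustration.
INSTRUMENTED = '''
def running_max(xs):
    best = xs[0]
    print(f"[TRACE] step=0 best={best!r}")
    for i, x in enumerate(xs[1:], start=1):
        if x > best:
            best = x
        print(f"[TRACE] step={i} best={best!r}")
    return best

result = running_max([3, 1, 4, 1, 5])
'''


def collect_trace(source):
    """Execute instrumented source and return the trace lines it printed."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(source, {})
    return [ln for ln in buf.getvalue().splitlines() if ln.startswith("[TRACE]")]


def step_match_rate(predicted, actual):
    """Fraction of executed steps whose predicted state matches exactly."""
    if not actual:
        return 0.0
    hits = sum(p == a for p, a in zip(predicted, actual))
    return hits / len(actual)


actual_trace = collect_trace(INSTRUMENTED)

# Hypothetical model prediction: the final state is right, step 2 is wrong.
predicted_trace = [
    "[TRACE] step=0 best=3",
    "[TRACE] step=1 best=3",
    "[TRACE] step=2 best=3",  # execution actually yields best=4 here
    "[TRACE] step=3 best=4",
    "[TRACE] step=4 best=5",
]

final_correct = predicted_trace[-1] == actual_trace[-1]
rate = step_match_rate(predicted_trace, actual_trace)
print(f"final state correct: {final_correct}, step match rate: {rate}")
```

In this toy case the predicted final state matches the real one, yet only four of the five intermediate states agree. A check on the final output alone would miss that inconsistency, while a step-level comparison exposes it. How StepCodeReasoner turns such comparisons into rewards, and how Bi-Level GRPO assigns credit, is not described in this summary.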

Impact: This new approach to code reasoning could lead to more reliable AI code generation and debugging tools.

Ranking rationale: This cluster contains an academic paper detailing a new method and benchmark results.


AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.CL · Jie M. Zhang · 2026-05-12 10:36

StepCodeReasoner: Aligning Code Reasoning with Stepwise Execution Traces via Reinforcement Learning

Existing code reasoning methods primarily supervise final code outputs, ignoring intermediate states, often leading to reward hacking where correct answers are obtained through inconsistent reasoning. We propose StepCodeReasoner, a framework that introduces explicit intermediate …
