Extrapolative Weight Averaging Extends Code RL Frontiers

By the PulseAugur editorial team · [2 sources] · 2026-05-27 17:09

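Researchers explored extrapolative weight averaging as a way to extend the Pareto front between competing objectives in reinforcement learning for code generation. Training checkpoints with nested unit-test coverage, they observed a correctness-efficiency frontier: higher coverage improves optimization but lowers correctness, while the solve rate stays unchanged. Extrapolating beyond the training endpoints successfully extended this frontier, and the approach proved useful across different inference settings and model scales (32B and 7B parameters). When used for ensembling, the technique raised pass@250 on LCB/hard by 3.3%.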
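To make "extrapolating beyond the training endpoints" concrete, here is a minimal sketch, assuming the usual linear form θ(α) = θ_A + α(θ_B − θ_A), in which α between 0 and 1 interpolates and α outside that range extrapolates. The summary does not state the paper's exact formula, so this form, the function name `extrapolate`, and the toy checkpoints `low_cov` and `high_cov` are illustrative assumptions rather than the authors' implementation.

```python
from typing import Dict, List

# A checkpoint is modelled here as a mapping from parameter name to a flat
# list of floats. Real checkpoints hold tensors; this is a toy stand-in.
Weights = Dict[str, List[float]]


def extrapolate(theta_a: Weights, theta_b: Weights, alpha: float) -> Weights:
    """Return theta_a + alpha * (theta_b - theta_a), parameter by parameter.

    alpha in [0, 1] interpolates between the two checkpoints;
    alpha < 0 or alpha > 1 extrapolates beyond them.
    """
    if theta_a.keys() != theta_b.keys():
        raise ValueError("checkpoints must share the same parameter names")
    merged: Weights = {}
    for name, a in theta_a.items():
        b = theta_b[name]
        if len(a) != len(b):
            raise ValueError(f"size mismatch for parameter {name!r}")
        merged[name] = [x + alpha * (y - x) for x, y in zip(a, b)]
    return merged


# Hypothetical endpoints, e.g. checkpoints trained with lower and higher
# unit-test coverage (values are made up for illustration).
low_cov = {"w": [1.0, 2.0], "b": [0.0]}
high_cov = {"w": [2.0, 4.0], "b": [1.0]}

midpoint = extrapolate(low_cov, high_cov, 0.5)  # interpolation
beyond = extrapolate(low_cov, high_cov, 1.5)    # extrapolation past high_cov

print(midpoint)
print(beyond)
```

No gradient step is involved: the new checkpoint comes from arithmetic on existing weights, which is why the summary describes the approach as requiring no additional RL training.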

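Impact: Extrapolative weight averaging may offer a way to improve model performance without additional training, which could make code generation tasks more efficient.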

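Ranking rationale: This cluster contains a research paper detailing a new method for improving code generation models through extrapolative weight averaging.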


AI-generated summary · Google Gemini · from 2 sources.


Sources [2]

arXiv cs.AI TIER_1 English(EN) · Kunhao Zheng, Pierre Chambon, Juliette Decugis, Jonas Gehring, Taco Cohen, Benjamin Negrevergne, Gabriel Synnaeve · 2026-05-28 04:00

Extrapolative Weight Averaging Reveals Correctness-Efficiency Frontiers in Code RL

arXiv:2605.28751v1 Announce Type: cross Abstract: Linear interpolation between fine-tuned checkpoints has been shown to trace the Pareto front between competing objectives, but whether extrapolative weight averaging can extend such frontiers to new checkpoints useful at inference…
arXiv cs.AI TIER_1 English(EN) · Gabriel Synnaeve · 2026-05-27 17:09

Extrapolative Weight Averaging Reveals Correctness-Efficiency Frontiers in Code RL

Linear interpolation between fine-tuned checkpoints has been shown to trace the Pareto front between competing objectives, but whether extrapolative weight averaging can extend such frontiers to new checkpoints useful at inference time, without additional RL training, remains unc…
