PulseAugur

Extrapolative Weight Averaging Extends Code RL Frontiers

Researchers explored extrapolative weight averaging as a way to extend the Pareto front between competing objectives in reinforcement learning for code generation. Training checkpoints with nested unit-test coverage revealed a correctness-efficiency frontier: increased coverage improved optimization but decreased correctness, leaving the solve rate unchanged. Extrapolating beyond the trained endpoints extended this frontier, and the extrapolated checkpoints proved useful across different inference settings and model scales (32B and 7B parameters). Used in ensembles, the technique improved pass@250 on LCB/hard by 3.3%.
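
The summary does not spell out the paper's exact formula, so the following is only a minimal sketch of the generic idea, assuming the common linear form theta(alpha) = (1 - alpha) * theta_A + alpha * theta_B. With alpha between 0 and 1 this interpolates between two checkpoints; with alpha below 0 or above 1 it extrapolates past one of them. The function name `combine_checkpoints`, the toy checkpoints, and the dict-of-lists weight format are all hypothetical stand-ins, not the authors' code.

```python
from typing import Dict, List

# Hypothetical weight format: parameter name -> flattened list of values.
Weights = Dict[str, List[float]]


def combine_checkpoints(theta_a: Weights, theta_b: Weights, alpha: float) -> Weights:
    """Return (1 - alpha) * theta_a + alpha * theta_b, parameter by parameter.

    0 <= alpha <= 1 interpolates between the two checkpoints;
    alpha < 0 or alpha > 1 extrapolates beyond one of the endpoints.
    """
    if theta_a.keys() != theta_b.keys():
        raise ValueError("checkpoints must share the same parameter names")
    merged: Weights = {}
    for name, a_vals in theta_a.items():
        b_vals = theta_b[name]
        if len(a_vals) != len(b_vals):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        merged[name] = [(1.0 - alpha) * a + alpha * b for a, b in zip(a_vals, b_vals)]
    return merged


# Toy checkpoints standing in for two fine-tunes of the same base model
# (assumed values, chosen only so the arithmetic is easy to follow).
ckpt_low = {"layer.weight": [0.0, 1.0], "layer.bias": [2.0]}
ckpt_high = {"layer.weight": [1.0, 3.0], "layer.bias": [4.0]}

midpoint = combine_checkpoints(ckpt_low, ckpt_high, alpha=0.5)  # interpolation
beyond = combine_checkpoints(ckpt_low, ckpt_high, alpha=1.5)    # extrapolation past ckpt_high
```

In this sketch, no gradient step is involved: the new checkpoint is produced purely by arithmetic on stored weights, which is why such a method needs no additional training. Whether an extrapolated checkpoint is actually useful has to be checked empirically by evaluating it.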

IMPACT Extrapolative weight averaging may offer a way to enhance model performance without additional training, potentially improving efficiency in code generation tasks.

RANK_REASON The cluster contains a research paper detailing a novel method for improving code generation models through extrapolative weight averaging.

AI-generated summary · Google Gemini · from 2 sources.

COVERAGE [2]

  1. arXiv cs.AI TIER_1 English(EN) · Kunhao Zheng, Pierre Chambon, Juliette Decugis, Jonas Gehring, Taco Cohen, Benjamin Negrevergne, Gabriel Synnaeve

    Extrapolative Weight Averaging Reveals Correctness-Efficiency Frontiers in Code RL

    arXiv:2605.28751v1. Abstract: Linear interpolation between fine-tuned checkpoints has been shown to trace the Pareto front between competing objectives, but whether extrapolative weight averaging can extend such frontiers to new checkpoints useful at inference…

  2. arXiv cs.AI TIER_1 English(EN) · Gabriel Synnaeve

    Extrapolative Weight Averaging Reveals Correctness-Efficiency Frontiers in Code RL

    Linear interpolation between fine-tuned checkpoints has been shown to trace the Pareto front between competing objectives, but whether extrapolative weight averaging can extend such frontiers to new checkpoints useful at inference time, without additional RL training, remains unc…