Researchers have developed a new method called Span-level Hidden state Enabled Advantage Reweighting (SHEAR) to improve credit assignment in reinforcement learning for language models. SHEAR uses the Wasserstein distance between the hidden-state distributions of correct and incorrect reasoning paths to identify critical token spans and amplify the learning signal there. The approach requires no additional annotation or reward-model training, and it shows improved performance on mathematical reasoning and code generation tasks compared with existing methods.
AI impact: Introduces a novel, annotation-free method to improve AI reasoning and code generation capabilities.
Ranking rationale: Academic paper introducing a novel method for improving AI training.
AI-generated summary · Google Gemini · from 1 source.
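The reweighting idea in the summary can be illustrated with a small sketch. This is not the paper's implementation: the fixed-length spans, the sliced (random-projection) approximation of the Wasserstein distance, the mean-normalised weights, and every function and parameter name below are assumptions made for illustration only.

```python
import numpy as np


def w1_1d(u, v):
    """Exact 1-D Wasserstein-1 distance between two empirical samples."""
    u_sorted, v_sorted = np.sort(u), np.sort(v)
    all_vals = np.sort(np.concatenate([u, v]))
    deltas = np.diff(all_vals)
    # Empirical CDFs of both samples evaluated at each merged point.
    u_cdf = np.searchsorted(u_sorted, all_vals[:-1], side="right") / len(u)
    v_cdf = np.searchsorted(v_sorted, all_vals[:-1], side="right") / len(v)
    return float(np.sum(np.abs(u_cdf - v_cdf) * deltas))


def sliced_wasserstein(x, y, n_proj=32, seed=0):
    """Approximate the distance between (n, D) and (m, D) point clouds
    by averaging 1-D distances over random unit directions (assumption)."""
    rng = np.random.default_rng(seed)
    dirs = rng.normal(size=(n_proj, x.shape[1]))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    return float(np.mean([w1_1d(x @ d, y @ d) for d in dirs]))


def span_reweighted_advantages(hidden, advantages, reference_hidden, span_len=4):
    """Scale per-token advantages by how far each span's hidden states
    lie from hidden states pooled from rollouts of the opposite outcome.

    hidden:           (T, D) hidden states of one rollout
    advantages:       (T,)   per-token advantages for that rollout
    reference_hidden: (N, D) hidden states from opposite-outcome rollouts
    span_len:         hypothetical fixed span length
    """
    T = hidden.shape[0]
    dists = np.array([
        sliced_wasserstein(hidden[s:s + span_len], reference_hidden)
        for s in range(0, T, span_len)
    ])
    # Normalise so the average span weight is about 1 (assumed scheme).
    weights = dists / (dists.mean() + 1e-8)
    token_weights = np.repeat(weights, span_len)[:T]
    return advantages * token_weights


# Toy usage: the second span diverges from the reference, so it is amplified.
hidden = np.vstack([np.zeros((4, 3)), np.full((4, 3), 5.0)])
reference = np.zeros((10, 3))
reweighted = span_reweighted_advantages(hidden, np.ones(8), reference)
```

In this toy case the first span matches the reference exactly and receives a weight of zero, while the divergent second span receives the larger weight. A real system would likely need a floor on the weights so that no span's gradient vanishes entirely.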