New GFlowNet training method improves prefix balance and diversity in LLMs

By PulseAugur Editorial · [1 source] · 2026-05-29 04:00

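Researchers have introduced a new training method for Generative Flow Networks (GFlowNets), called Rooted absorbed prefix Trajectory Balance (RapTB), that targets prefix collapse and length bias when fine-tuning large language models. RapTB improves credit assignment by anchoring sub-trajectory supervision at the root and propagating reward to intermediate prefixes. The authors also propose SubM, a submodular replay refresh strategy that counters the distribution shift caused by biased replay, so that the training stream favors both high reward and diversity. Empirical results on tasks such as molecule generation indicate that RapTB combined with SubM improves optimization performance and molecular diversity while preserving validity.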
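To make the two ideas above concrete, here is a minimal Python sketch. It is not the paper's implementation: the summary does not give the exact RapTB objective or the SubM set function, so everything below is an assumption chosen for illustration. `tb_loss` is the standard trajectory balance residual for left-to-right generation, where each state has a single parent and the backward-policy term vanishes. `rooted_prefix_tb_loss` is one hypothetical reading of "rooted" prefix supervision, and it assumes that a log-reward is available for every prefix. `greedy_submodular_select` uses a facility-location style objective (reward plus coverage) as a stand-in for whatever submodular function SubM actually uses.

```python
import math


def tb_loss(log_z, token_logprobs, log_reward):
    """Standard trajectory balance loss for one autoregressive sequence.

    Left-to-right generation forms a tree, so the backward policy
    contributes log 1 = 0 and is omitted.
    """
    residual = log_z + sum(token_logprobs) - log_reward
    return residual ** 2


def rooted_prefix_tb_loss(log_z, token_logprobs, prefix_log_rewards):
    """Hypothetical rooted-prefix variant (an assumption, not the paper's loss).

    Applies the TB residual to every prefix that starts at the root, so
    each intermediate prefix receives its own reward signal.
    prefix_log_rewards[t] is the assumed log-reward of the first t+1 tokens.
    """
    total, cumulative = 0.0, 0.0
    for logp, log_r in zip(token_logprobs, prefix_log_rewards):
        cumulative += logp
        total += (log_z + cumulative - log_r) ** 2
    return total / max(len(token_logprobs), 1)


def greedy_submodular_select(rewards, sim, k, lam=1.0):
    """Greedily pick k buffer items under an assumed objective:

        f(S) = sum_{i in S} rewards[i] + lam * sum_j max_{i in S} sim[i][j]

    With non-negative rewards and similarities this is monotone submodular,
    so greedy selection trades off high reward against coverage.
    """
    n = len(rewards)
    selected = []
    coverage = [0.0] * n  # best similarity of each item to the chosen set
    for _ in range(min(k, n)):
        best_i, best_gain = None, -math.inf
        for i in range(n):
            if i in selected:
                continue
            # Marginal gain: own reward plus newly covered similarity mass.
            gain = rewards[i] + lam * sum(
                max(sim[i][j] - coverage[j], 0.0) for j in range(n)
            )
            if gain > best_gain:
                best_i, best_gain = i, gain
        selected.append(best_i)
        coverage = [max(coverage[j], sim[best_i][j]) for j in range(n)]
    return selected
```

In this toy setup, a buffer with two near-duplicate high-reward items and one distinct lower-reward item keeps one of the duplicates and the distinct item, which is the kind of reward-and-diversity trade-off the summary attributes to SubM.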

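Impact: Introduces new techniques for improving LLM training stability and output quality, with the potential to enhance generative AI applications.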

Ranking rationale: This is an academic paper detailing a new training method for GFlowNets.


AI-generated summary · Google Gemini · based on 1 source.

Sources [1]

arXiv cs.AI · English (EN) · Xi Wang, Wenbo Lu, Shengjie Wang · 2026-05-29 04:00

Rooted Absorbed Prefix Trajectory Balance with Submodular Replay for GFlowNet Training

arXiv:2603.00454v2. Abstract: Generative Flow Networks (GFlowNets) enable fine-tuning large language models to approximate reward-proportional posteriors, but they remain prone to mode collapse, manifesting as prefix collapse and length bias. We attrib…

