New LLM Training Methods Improve Efficiency and Error Recovery

By PulseAugur Editorial · [2 sources] · 2026-05-11 13:07

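Researchers have developed new techniques to make large language model (LLM) training more efficient. One method, Stepwise Rejection Fine-Tuning (SRFT), makes use of unsuccessful training trajectories by evaluating the correctness of each step, so the model can learn from those runs without repeating their mistakes. The method raised the resolution rate on SWE-bench tasks by 3.7%. A second development, the Infinite Mask Diffusion Model (IMDM), addresses factorization error in Masked Diffusion Models (MDMs) by introducing a stochastic infinite-state mask. IMDM showed strong few-step generation and, when combined with distillation, outperformed existing methods on the LM1B and OpenWebText datasets.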
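To make the contrast concrete, here is a minimal Python sketch of trajectory-level rejection versus a step-level variant. It is an illustration under stated assumptions, not the paper's implementation: the `Step` and `Trajectory` types, the `is_correct` flag, and the idea of turning step judgments into a 0/1 loss mask are all hypothetical, and the summary does not say how SRFT actually scores steps.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Step:
    action: str
    is_correct: bool  # hypothetical per-step judgment; the scoring method is an assumption


@dataclass
class Trajectory:
    steps: List[Step]
    passed_tests: bool  # whether the submitted patch passed the tests


def rft_filter(trajectories: List[Trajectory]) -> List[Trajectory]:
    """Trajectory-level RFT: discard every run whose final patch failed the tests."""
    return [t for t in trajectories if t.passed_tests]


def stepwise_filter(
    trajectories: List[Trajectory],
) -> List[Tuple[Trajectory, List[float]]]:
    """Step-level variant (assumed): keep all runs, but attach a per-step mask
    so that only steps judged correct would contribute to the training loss."""
    return [
        (t, [1.0 if s.is_correct else 0.0 for s in t.steps])
        for t in trajectories
    ]
```

Under these assumptions, a failed run with one good step and one bad step is dropped entirely by `rft_filter`, while `stepwise_filter` keeps it with a mask of `[1.0, 0.0]`, so the useful step is retained and the mistaken one is not imitated.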

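Impact: These new training techniques could lead to more capable and more efficient LLMs, improving performance on complex tasks and lowering training costs.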

Ranking rationale: Two academic papers introducing new methods for training LLMs.


AI-generated summary · Google Gemini · from 2 sources.

Sources [2]

arXiv cs.AI TIER_1 English(EN) · Yaroslav Zharov · 2026-05-11 14:55

Stepwise Rejection Fine-Tuning: A Practical Distillation Method

Rejection Fine-Tuning (RFT) is a standard method for training LLM agents, where unsuccessful trajectories are discarded from the training set. In the context of SWE-bench tasks, this corresponds to filtering out runs where the submitted patch does not pass the tests. However, thi…
arXiv cs.CL TIER_1 English(EN) · Seunghoon Hong · 2026-05-11 13:07

Infinite Mask Diffusion for Few-Step Distillation

Masked Diffusion Models (MDMs) have emerged as a promising alternative to autoregressive models in language modeling, offering the advantages of parallel decoding and bidirectional context processing within a simple yet effective framework. Specifically, their explicit distinctio…

