PulseAugur
EvoLM: Self-Evolving Language Models through Co-Evolved Discriminative Rubrics

EvoLM enables language models to self-improve without external supervision

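Researchers have introduced EvoLM, a post-training method that lets language models improve themselves without external supervision. The method alternates between training two components: a rubric generator, which writes evaluation criteria specific to each instance, and a policy, which uses those criteria as its reward signal. The authors report that EvoLM trained a Qwen3-8B model to generate rubrics that outperform those produced by GPT-4.1, and that the co-trained policy achieved strong results on a separate set of benchmarks.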
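The summary does not say how a rubric is turned into a reward. As a rough illustration only, the sketch below assumes a rubric is a list of weighted, checkable criteria and scores a response by the weighted fraction of criteria it satisfies. The `Criterion` class, the `rubric_reward` function, and the example criteria are all hypothetical; in practice a judge model, not a string test, would most likely decide whether each criterion is met, and this is not the paper's actual reward formulation.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Criterion:
    """One hypothetical rubric item: a description, a weight, and a checker."""
    description: str
    weight: float
    # Hypothetical stand-in for a judge model deciding if the criterion is met.
    check: Callable[[str], bool]


def rubric_reward(response: str, rubric: List[Criterion]) -> float:
    """Return the weighted fraction of rubric criteria the response satisfies.

    The result lies in [0, 1]; an empty or zero-weight rubric yields 0.0.
    """
    total = sum(c.weight for c in rubric)
    if total == 0:
        return 0.0
    earned = sum(c.weight for c in rubric if c.check(response))
    return earned / total


# Hypothetical instance-specific rubric for the prompt "What is the capital of France?"
rubric = [
    Criterion("Names Paris as the capital", 2.0, lambda r: "Paris" in r),
    Criterion("Answers in under 20 words", 1.0, lambda r: len(r.split()) < 20),
]

print(rubric_reward("The capital of France is Paris.", rubric))
print(rubric_reward("I am not sure.", rubric))
```

Normalizing by the total weight keeps every reward in the same [0, 1] range even when generated rubrics differ in length from one instance to the next, which makes the scores comparable as a training signal.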

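Impact: This approach could reduce the reliance of LLM training on human annotations and proprietary models, potentially speeding up self-improvement cycles.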

Ranking rationale: This is an academic paper describing a new method for language model self-improvement.


AI-generated summary · Google Gemini · from 2 sources.


Sources (2)

  1. arXiv cs.AI · TIER_1 · Shuyue Stella Li, Rui Xin, Teng Xiao, Yike Wang, Rulin Shao, Zoey Hao, Melanie Sclar, Sewoong Oh, Faeze Brahman, Pang Wei Koh, Yulia Tsvetkov

    EvoLM: Self-Evolving Language Models through Co-Evolved Discriminative Rubrics

    arXiv:2605.03871v1 · Announce Type: new
    Abstract: Language models encode substantial evaluative knowledge from pretraining, yet current post-training methods rely on external supervision (human annotations, proprietary models, or scalar reward models) to produce reward signals. Eac…

  2. arXiv cs.AI · TIER_1 · Yulia Tsvetkov

    EvoLM: Self-Evolving Language Models through Co-Evolved Discriminative Rubrics

    Language models encode substantial evaluative knowledge from pretraining, yet current post-training methods rely on external supervision (human annotations, proprietary models, or scalar reward models) to produce reward signals. Each imposes a ceiling. Human judgment cannot super…