PulseAugur
English(EN) SCOPE: Self-Play via Co-Evolving Policies for Open-Ended Tasks

New SCOPE framework trains LLMs through self-play on open-ended tasks

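Researchers have developed SCOPE, a data-free self-play framework for training language models on open-ended tasks without external supervision. The framework co-evolves two policies: a Challenger that creates document-grounded tasks and a Solver that answers them. A frozen copy of the initial model acts as a self-judge, writing the scoring rubrics and evaluating the responses. SCOPE has shown notable performance gains across a range of benchmarks on models including Qwen2.5, Qwen3, and OLMo-3, and even outperforms models trained on curated prompts.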
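The roles described above can be illustrated with a minimal Python sketch of a single self-play round. This is not the authors' implementation: every name here (`Episode`, `self_play_step`, the `toy_*` functions) is hypothetical, the keyword-matching judge is a stand-in for an LLM judge, and passing only the task (not the document) to the Solver is an assumption. The policy updates that would follow each round are omitted, since the summary does not describe them.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Episode:
    task: str
    rubric: List[str]
    answer: str
    score: float  # fraction of rubric items the judge marks as satisfied


def self_play_step(
    document: str,
    challenger: Callable[[str], str],
    solver: Callable[[str], str],
    judge_rubric: Callable[[str, str], List[str]],
    judge_check: Callable[[str, str], bool],
) -> Episode:
    """Run one Challenger -> Solver -> frozen-judge round (illustrative only)."""
    task = challenger(document)             # Challenger writes a document-grounded task
    rubric = judge_rubric(document, task)   # frozen judge drafts a rubric for that task
    answer = solver(task)                   # Solver answers (assumption: sees the task only)
    hits = sum(1 for item in rubric if judge_check(item, answer))
    score = hits / len(rubric) if rubric else 0.0
    return Episode(task, rubric, answer, score)


# Toy stand-ins so the sketch runs without any model; all hypothetical.
def toy_challenger(document: str) -> str:
    return f"Summarize the key claim of: {document}"


def toy_solver(task: str) -> str:
    return "Self-play trains models without external supervision."


def toy_judge_rubric(document: str, task: str) -> List[str]:
    return ["self-play", "supervision", "benchmark"]


def toy_judge_check(item: str, answer: str) -> bool:
    # Keyword match as a placeholder for an LLM judging one rubric item.
    return item in answer.lower()


episode = self_play_step(
    "Self-play can train language models without external supervision.",
    toy_challenger,
    toy_solver,
    toy_judge_rubric,
    toy_judge_check,
)
print(f"rubric score: {episode.score:.2f}")
```

In a real system the rubric score would presumably feed a reward for the Solver, and some signal derived from it would reward the Challenger; how SCOPE defines those rewards is not stated in this summary.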

Impact: This self-play framework could reduce reliance on curated datasets when training LLMs on complex, open-ended tasks.

Ranking rationale: This cluster contains a research paper detailing a new framework for training language models.

Read on Hugging Face Daily Papers →

AI-generated summary · Google Gemini · from 3 sources.

Sources [3]

  1. arXiv cs.CL TIER_1 English(EN) · Wai-Chung Kwan, Aryo Pradipta Gema, Joshua Ong Jun Leang, Pasquale Minervini ·

    SCOPE: Self-Play via Co-Evolving Policies for Open-Ended Tasks

    arXiv:2605.31433v1 Announce Type: new Abstract: Self-play can train language models without external supervision. However, existing methods require rule-checkable answers, leaving open-ended tasks dependent on curated prompts or frontier-model judges. We introduce SCOPE, a data-f…

  2. arXiv cs.CL TIER_1 English(EN) · Pasquale Minervini ·

    SCOPE: Self-Play via Co-Evolving Policies for Open-Ended Tasks

    Self-play can train language models without external supervision. However, existing methods require rule-checkable answers, leaving open-ended tasks dependent on curated prompts or frontier-model judges. We introduce SCOPE, a data-free self-play framework for open-ended tasks tha…

  3. Hugging Face Daily Papers TIER_1 English(EN) ·

    SCOPE: Self-Play via Co-Evolving Policies for Open-Ended Tasks

    SCOPE is a self-play framework that trains language models on open-ended tasks through policy co-evolution, achieving superior performance on both targeted and held-out benchmarks without external supervision.