New SCOPE framework trains LLMs via self-play on open-ended tasks

By PulseAugur Editorial · [3 sources] · 2026-05-29 00:00

Researchers have developed SCOPE, a data-free self-play framework for training language models on open-ended tasks without external supervision. The framework co-evolves two policies: a Challenger that creates document-grounded tasks and a Solver that answers them. A frozen copy of the initial model acts as a self-judge, writing rubrics and grading the Solver's responses. SCOPE has shown performance gains on both targeted and held-out benchmarks for Qwen2.5, Qwen3, and OLMo-3 models, even surpassing models trained on curated prompts.
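The Challenger/Solver/self-judge loop described above can be sketched as a toy program that shows only the data flow of one self-play round. This is not the authors' implementation. `ToyPolicy`, `challenger_make_task`, `solver_answer`, `judge_rubric`, `judge_grade`, and the scalar `skill` are hypothetical names. The keyword-overlap rubric stands in for rubrics an LLM would write. The Challenger's reward (`1 - score`) is an invented placeholder, because the summary does not say how the Challenger is trained.

```python
import copy
from dataclasses import dataclass


@dataclass
class ToyPolicy:
    """Hypothetical stand-in for an LLM; `skill` is a made-up scalar, not a real weight."""
    skill: float = 0.0

    def update(self, reward: float, lr: float = 0.1) -> None:
        # Placeholder for an RL update: move the scalar in the direction of the reward.
        self.skill += lr * reward


def challenger_make_task(challenger: ToyPolicy, document: str) -> str:
    # Hypothetical: the Challenger writes an open-ended task grounded in the document.
    # This toy ignores the Challenger's state and always uses one template.
    return f"Explain the main claim of this passage: {document}"


def solver_answer(solver: ToyPolicy, task: str, document: str) -> str:
    # Hypothetical: a more skilled Solver covers more of the document in its answer.
    # The toy ignores `task`; a real Solver would condition on it.
    words = document.split()
    n = max(1, int(len(words) * min(1.0, 0.5 + solver.skill)))
    return " ".join(words[:n])


def judge_rubric(judge: ToyPolicy, task: str, document: str) -> list[str]:
    # Hypothetical rubric: each criterion is a content word the answer should mention.
    # The toy ignores `task`; a real self-judge would write task-specific criteria.
    return [w for w in document.split() if len(w) > 4]


def judge_grade(judge: ToyPolicy, rubric: list[str], answer: str) -> float:
    # Score = fraction of rubric criteria the answer satisfies, in [0, 1].
    if not rubric:
        return 0.0
    answer_words = set(answer.split())
    return sum(item in answer_words for item in rubric) / len(rubric)


def self_play_round(challenger: ToyPolicy, solver: ToyPolicy,
                    judge: ToyPolicy, document: str) -> float:
    task = challenger_make_task(challenger, document)
    answer = solver_answer(solver, task, document)
    rubric = judge_rubric(judge, task, document)
    score = judge_grade(judge, rubric, answer)
    solver.update(score)  # The Solver is rewarded by the self-judge's grade.
    # ASSUMPTION: the summary does not describe the Challenger's reward; this sketch
    # rewards tasks the Solver only partly solves (1 - score) as a placeholder.
    challenger.update(1.0 - score)
    return score  # The judge is never updated, so it stays a frozen snapshot.


def run(document: str, rounds: int) -> tuple[list[float], ToyPolicy, ToyPolicy]:
    initial = ToyPolicy()
    judge = copy.deepcopy(initial)       # frozen copy of the initial model
    challenger = copy.deepcopy(initial)  # co-evolving policy 1
    solver = copy.deepcopy(initial)      # co-evolving policy 2
    scores = [self_play_round(challenger, solver, judge, document) for _ in range(rounds)]
    return scores, judge, solver
```

Calling `run("Self play trains language models without external supervision", 30)` returns per-round scores that rise as the Solver's placeholder `skill` grows, while the judge's state never changes. Because the judge is a snapshot that is never updated, its grading standard does not shift as the two policies change.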

IMPACT This self-play framework could reduce reliance on curated prompts and frontier-model judges when training LLMs on complex, open-ended tasks.

RANK_REASON The cluster contains a research paper detailing a new framework for training language models.


AI-generated summary · Google Gemini · from 3 sources.

COVERAGE [3]

arXiv cs.CL TIER_1 English(EN) · Wai-Chung Kwan, Aryo Pradipta Gema, Joshua Ong Jun Leang, Pasquale Minervini · 2026-06-01 04:00

SCOPE: Self-Play via Co-Evolving Policies for Open-Ended Tasks

arXiv:2605.31433v1 Announce Type: new Abstract: Self-play can train language models without external supervision. However, existing methods require rule-checkable answers, leaving open-ended tasks dependent on curated prompts or frontier-model judges. We introduce SCOPE, a data-f…
arXiv cs.CL TIER_1 English(EN) · Pasquale Minervini · 2026-05-29 15:28

SCOPE: Self-Play via Co-Evolving Policies for Open-Ended Tasks

Self-play can train language models without external supervision. However, existing methods require rule-checkable answers, leaving open-ended tasks dependent on curated prompts or frontier-model judges. We introduce SCOPE, a data-free self-play framework for open-ended tasks tha…
Hugging Face Daily Papers TIER_1 English(EN) · 2026-05-29 00:00

SCOPE: Self-Play via Co-Evolving Policies for Open-Ended Tasks

SCOPE is a self-play framework that trains language models on open-ended tasks through policy co-evolution, achieving superior performance on both targeted and held-out benchmarks without external supervision.
