Researchers have developed SCOPE, a data-free self-play framework for training language models on open-ended tasks without external supervision. The framework co-evolves two policies: a Challenger that creates document-grounded tasks and a Solver that answers them. A frozen copy of the initial model acts as a self-judge, writing rubrics and grading responses. SCOPE reportedly improves benchmark performance for models such as Qwen2.5, Qwen3, and OLMo-3, even surpassing models trained on curated prompts.
AI IMPACT This self-play framework could reduce reliance on curated datasets when training LLMs on complex, open-ended tasks.
RANK_REASON The cluster contains a research paper detailing a new framework for training language models.
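The roles described in the summary (Challenger, Solver, frozen self-judge) can be pictured as a single round of a loop. The sketch below is only an illustration under stated assumptions: every function name, the word-overlap grading rule, and the toy stand-ins are hypothetical and are not taken from the paper, and the policy-update step that would make the two policies co-evolve is omitted because the summary does not describe it.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Round:
    task: str
    rubric: List[str]
    answer: str
    score: float


def self_play_round(
    document: str,
    challenger: Callable[[str], str],
    solver: Callable[[str], str],
    judge_rubric: Callable[[str, str], List[str]],
    judge_grade: Callable[[List[str], str], float],
) -> Round:
    """Run one hypothetical round: task -> rubric -> answer -> score."""
    task = challenger(document)            # Challenger writes a document-grounded task
    rubric = judge_rubric(document, task)  # frozen judge writes the rubric
    answer = solver(task)                  # Solver answers the task
    score = judge_grade(rubric, answer)    # frozen judge grades against the rubric
    return Round(task, rubric, answer, score)


# Toy stand-ins (assumptions): real policies would be language models.
def toy_challenger(document: str) -> str:
    return f"Summarize the key claim of: {document}"


def toy_judge_rubric(document: str, task: str) -> List[str]:
    # One rubric item per distinct word longer than four characters.
    return sorted({w.lower().strip(".,") for w in document.split() if len(w) > 4})


def toy_solver(task: str) -> str:
    # Echo the text after the task prefix as a trivial "answer".
    return task.split(": ", 1)[1]


def toy_judge_grade(rubric: List[str], answer: str) -> float:
    # Fraction of rubric items that appear in the answer.
    if not rubric:
        return 0.0
    text = answer.lower()
    return sum(1 for item in rubric if item in text) / len(rubric)


result = self_play_round(
    "Self-play trains models without labels.",
    toy_challenger,
    toy_solver,
    toy_judge_rubric,
    toy_judge_grade,
)
```

In a real system the score would presumably feed reward signals back to the Challenger and Solver; how SCOPE does that is not stated in this summary, so it is left out here.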
AI-generated summary · Google Gemini · from 3 sources.