Researchers have developed DORA, an asynchronous reinforcement learning system designed to accelerate language model training. DORA addresses the bottleneck caused by long-tailed trajectories in the rollout phase through multi-version streaming rollout, which lets multiple policy versions run concurrently. The system achieves up to 2-3 times higher throughput than existing methods on benchmarks and 2-4 times faster training in large-scale industrial settings. The resulting open-source models, LongCat-Flash-Thinking, demonstrate competitive performance on complex reasoning tasks.
Impact: Accelerates RL training for LLMs, potentially enabling faster iteration and deployment of advanced models.
Ranking rationale: This is a research paper detailing a new system for language model training.
AI-generated summary · Google Gemini · from 1 source.