Researchers have developed DORA, a novel asynchronous reinforcement learning system designed to accelerate language model training. DORA addresses the bottleneck caused by long-tailed trajectories in the rollout phase by employing multi-version streaming rollout, which allows multiple policy versions to run concurrently. The system achieves up to 2-3 times higher throughput than existing methods on benchmarks and 2-4 times faster training in large-scale industrial settings. The resulting open-source models, LongCat-Flash-Thinking, demonstrate competitive performance on complex reasoning tasks.
AI IMPACT Accelerates RL training for LLMs, potentially enabling faster iteration and deployment of advanced models.
RANK_REASON This is a research paper detailing a new system for language model training.
AI-generated summary · Google Gemini · from 1 source.