Researchers have developed DORA, a novel asynchronous reinforcement learning system designed to accelerate language model training. DORA addresses the bottleneck caused by long-tailed trajectories in the rollout phase by employing multi-version streaming rollout, which allows for concurrent policy versions. This system achieves up to 2-3 times higher throughput than existing methods on benchmarks and 2-4 times faster training in large-scale industrial settings. The resulting open-source models, LongCat-Flash-Thinking, demonstrate competitive performance on complex reasoning tasks.
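The idea behind multi-version streaming rollout can be illustrated with a minimal sketch. This is a hypothetical toy model, not DORA's actual implementation: rollout workers stream trajectories tagged with the policy version they were generated under, while the trainer consumes completed trajectories and publishes new versions without waiting for slow, long-tailed rollouts to finish.

```python
import queue
import random
import threading
import time

# Hypothetical sketch (names and structure are assumptions, not DORA's API):
# workers generate trajectories under a snapshot of the current policy
# version; the trainer updates the version without blocking on stragglers,
# so a batch may mix trajectories from several concurrent policy versions.

policy_version = 0
version_lock = threading.Lock()
trajectories = queue.Queue()

def rollout_worker(worker_id, n_trajectories):
    for _ in range(n_trajectories):
        with version_lock:
            version = policy_version  # snapshot the version for this rollout
        # Simulate long-tailed generation times: some rollouts are much slower.
        time.sleep(random.uniform(0.001, 0.01))
        trajectories.put({"worker": worker_id, "policy_version": version})

def trainer(n_updates, batch_size):
    global policy_version
    for _ in range(n_updates):
        # Consume whichever trajectories finished first, regardless of the
        # policy version that produced them.
        batch = [trajectories.get() for _ in range(batch_size)]
        with version_lock:
            policy_version += 1  # publish a new policy version
        versions = sorted({t["policy_version"] for t in batch})
        print(f"update -> policy v{policy_version}, batch versions: {versions}")

workers = [threading.Thread(target=rollout_worker, args=(i, 6)) for i in range(4)]
for w in workers:
    w.start()
trainer(n_updates=4, batch_size=4)
for w in workers:
    w.join()
```

In a synchronous design the trainer would wait for every rollout in a generation before updating; here the slowest trajectories simply land in a later batch, which is the throughput win the summary describes.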
Summary written by gemini-2.5-flash-lite from 1 source.
IMPACT Accelerates RL training for LLMs, potentially enabling faster iteration and deployment of advanced models.
RANK_REASON This is a research paper detailing a new system for language model training.