DORA system accelerates LLM reinforcement learning by 2-4x with novel asynchronous rollout

By PulseAugur Editorial · [1 source] · 2026-04-30 04:00

Researchers have developed DORA, an asynchronous reinforcement learning system designed to accelerate language model training. DORA targets the rollout phase, where a few long-tailed trajectories hold up the rest of the batch, and addresses the bottleneck with multi-version streaming rollout, which lets multiple policy versions run concurrently. The system reportedly achieves 2-3 times higher throughput than existing methods on benchmarks and 2-4 times faster training in large-scale industrial settings. The resulting open-source models, LongCat-Flash-Thinking, demonstrate competitive performance on complex reasoning tasks.
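To make the long-tail bottleneck concrete, here is a minimal toy sketch in Python. It is not DORA's implementation: the trajectory lengths, the worker count, and both function names are hypothetical, token counts stand in for generation time, and the greedy scheduler is only a simplified stand-in for streaming rollout. It compares a synchronous rollout, where each batch waits for its longest trajectory, with a streaming one, where a free worker immediately starts the next trajectory.

```python
import heapq

def synchronous_time(lengths, n_workers):
    """Fixed batches: each batch finishes only when its longest trajectory does."""
    return sum(
        max(lengths[i:i + n_workers])
        for i in range(0, len(lengths), n_workers)
    )

def streaming_time(lengths, n_workers):
    """Greedy streaming: the worker that frees up first takes the next trajectory."""
    free_at = [0] * n_workers          # time at which each worker becomes idle
    heapq.heapify(free_at)
    for length in lengths:
        start = heapq.heappop(free_at)  # earliest available worker
        heapq.heappush(free_at, start + length)
    return max(free_at)                 # time when the last trajectory finishes

# Hypothetical workload: mostly short generations with two long-tailed ones.
lengths = [100, 100, 100, 900, 100, 100, 100, 900]
n_workers = 4

sync_t = synchronous_time(lengths, n_workers)
stream_t = streaming_time(lengths, n_workers)
print(sync_t, stream_t)  # 1800 1100
```

In this toy workload the synchronous schedule idles three workers while each long trajectory finishes, whereas the streaming schedule keeps them busy. The sketch does not model policy updates; in an asynchronous system the policy may change while long trajectories are still being generated, which is presumably why concurrent policy versions are needed. The numbers here are illustrative only and are not meant to reproduce the reported speedups.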

IMPACT Accelerates RL training for LLMs, potentially enabling faster iteration and deployment of advanced models.

RANK_REASON This is a research paper detailing a new system for language model training.


AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.LG TIER_1 English(EN) · Tianhao Hu, Xiangcheng Liu, Youshao Xiao, Yang Zheng, Xuan Huang, Jinrui Ding, Yufei Zhang, Tao Liang, Hongyu Zang, Quan Chen, Yueqing Sun, Wenjie Shi, Chao Zhang, Wei Wang, Qi Gu, Yerui Sun, Yucheng Xie, Xunliang Cai · 2026-04-30 04:00

DORA: A Scalable Asynchronous Reinforcement Learning System for Language Model Training

arXiv:2604.26256v1 · Abstract: Reinforcement learning (RL) has become a critical paradigm for LLM post-training, yet the rollout phase, accounting for 50-80% of total step time, is bottlenecked by skewed generation: long-tailed trajectories indispensable for…
