DORA system accelerates LLM reinforcement learning by 2-4x with novel asynchronous rollout

By PulseAugur Editorial Team · [1 source] · 2026-04-30 04:00

Researchers have developed DORA, an asynchronous reinforcement learning (RL) system designed to accelerate language model training. DORA targets the rollout-phase bottleneck caused by long-tailed trajectories, using a multi-version streaming rollout that lets multiple policy versions run concurrently. The system achieves up to 2-3 times higher throughput than existing methods on benchmarks and 2-4 times faster training in large-scale industrial settings. The resulting open-source models, LongCat-Flash-Thinking, demonstrate competitive performance on complex reasoning tasks.
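To make the long-tail bottleneck concrete, the toy sketch below compares a rollout that waits for every trajectory in a batch against one that hands a worker its next trajectory as soon as it is free. This is not DORA's implementation and does not model multiple policy versions; the function names, the fixed worker count, and the "one time unit per generated token" cost model are all assumptions made for illustration.

```python
import heapq


def sync_rollout_time(lengths, workers):
    """Batch-synchronous rollout (illustrative): trajectories run in groups of
    `workers`, and each group finishes only when its longest member does."""
    total = 0
    for i in range(0, len(lengths), workers):
        total += max(lengths[i:i + workers])
    return total


def streaming_rollout_time(lengths, workers):
    """Streaming rollout (illustrative): each trajectory is assigned to
    whichever worker becomes free first, with no barrier between groups."""
    free_at = [0] * workers          # time at which each worker is next idle
    heapq.heapify(free_at)
    for n in lengths:
        start = heapq.heappop(free_at)
        heapq.heappush(free_at, start + n)
    return max(free_at)


if __name__ == "__main__":
    # Hypothetical trajectory lengths: mostly short, with two long-tail outliers.
    lengths = [1, 1, 1, 10, 1, 1, 1, 10]
    print("synchronous:", sync_rollout_time(lengths, workers=4))
    print("streaming:  ", streaming_rollout_time(lengths, workers=4))
```

In this hypothetical workload the synchronous schedule pays for the longest trajectory in every group, while the streaming schedule keeps the other workers busy during the long generations. Running policies of different versions concurrently, as the summary describes, raises further questions that this sketch leaves out.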

Impact: Accelerates RL training for LLMs, potentially enabling faster iteration and deployment of advanced models.

Ranking rationale: This is a research paper detailing a new system for language model training.

Read on arXiv cs.LG →

AI-generated summary · Google Gemini · from 1 source. How we write summaries →

Sources [1]

arXiv cs.LG TIER_1 English (EN) · Tianhao Hu, Xiangcheng Liu, Youshao Xiao, Yang Zheng, Xuan Huang, Jinrui Ding, Yufei Zhang, Tao Liang, Hongyu Zang, Quan Chen, Yueqing Sun, Wenjie Shi, Chao Zhang, Wei Wang, Qi Gu, Yerui Sun, Yucheng Xie, Xunliang Cai · 2026-04-30 04:00

DORA: A Scalable Asynchronous Reinforcement Learning System for Language Model Training

arXiv:2604.26256v1. Abstract: Reinforcement learning (RL) has become a critical paradigm for LLM post-training, yet the rollout phase, accounting for 50-80% of total step time, is bottlenecked by skewed generation: long-tailed trajectories indispensable for…

