PlexRL: Cluster-Level Orchestration of Serviceized LLM Execution for RLVR

PlexRL runtime cuts the GPU-hour cost of RLVR training for LLMs by up to 37%

By the PulseAugur editorial team · [1 source] · 2026-05-20 07:55

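Researchers have developed PlexRL, a cluster-level runtime designed to make training large language models (LLMs) with reinforcement learning with verifiable rewards (RLVR) more efficient. RLVR training is often inefficient because long-tailed rollouts and tool-induced stalls leave GPUs idle. PlexRL addresses this by multiplexing LLM services across multiple RLVR jobs: it time-slices model execution so that one job's work fills another job's idle periods, without resorting to costly migration. In the authors' evaluation, PlexRL reduces GPU-hour cost by up to 37.58% while preserving algorithmic flexibility and adding minimal overhead.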
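Why filling idle periods saves GPU-hours can be illustrated with a deliberately simplified toy model. Everything below is a hypothetical assumption for illustration, not PlexRL's design: the segment lengths, the single shared GPU, the round-robin policy, zero context-switch cost, and the function names `dedicated_gpu_steps` and `multiplexed_gpu_steps`. Its numbers are unrelated to the paper's 37.58% figure.

```python
from collections import deque

# Toy model (hypothetical, NOT PlexRL's scheduler): each job is a list of
# (kind, duration) segments with positive durations. "compute" segments need
# the GPU; "stall" segments (waiting on long-tail rollouts or tool calls) do not.
# Switching the shared GPU between jobs is assumed to cost nothing.

def dedicated_gpu_steps(jobs):
    """Each job owns one GPU for its whole run, idle stalls included."""
    return sum(d for job in jobs for _, d in job)

def multiplexed_gpu_steps(jobs):
    """All jobs time-slice ONE shared GPU; returns GPU-steps (= makespan).

    Each step, at most one job whose current segment is "compute" runs
    (round-robin); stalled jobs make progress without holding the GPU.
    """
    queues = [deque([kind, d] for kind, d in job) for job in jobs]
    next_rr = 0  # round-robin pointer for fairness among ready jobs
    steps = 0
    while any(queues):
        ready = [i for i, q in enumerate(queues) if q and q[0][0] == "compute"]
        chosen = None
        if ready:
            chosen = min(ready, key=lambda i: (i - next_rr) % len(jobs))
            next_rr = (chosen + 1) % len(jobs)
        for i, q in enumerate(queues):
            if q and (q[0][0] == "stall" or i == chosen):
                q[0][1] -= 1  # one step of progress on the current segment
                if q[0][1] == 0:
                    q.popleft()
        steps += 1
    return steps

job = [("compute", 2), ("stall", 4), ("compute", 2)]
jobs = [job, job]
ded = dedicated_gpu_steps(jobs)    # 16
mux = multiplexed_gpu_steps(jobs)  # 11
print(ded, mux, f"{1 - mux / ded:.2%}")  # prints: 16 11 31.25%
```

In this toy, sharing one GPU drops the total from 16 to 11 GPU-steps, but each job finishes later (11 steps instead of 8 running alone). That is the throughput-versus-latency trade-off any multiplexing scheme has to manage, and real switching costs, which the toy ignores, are the overhead the summary says PlexRL keeps small.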

Impact: Optimizes LLM training infrastructure, which could lower the cost and raise the throughput of RLVR applications.

Ranking rationale: This cluster contains an academic paper describing a new system for optimizing LLM execution.

AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.LG · Siyuan Feng · 2026-05-20 07:55

PlexRL: Cluster-Level Orchestration of Serviceized LLM Execution for RLVR

Reinforcement learning with verifiable rewards (RLVR) has recently unlocked strong reasoning capabilities in large language models (LLMs), triggering rapid exploration of new algorithms and data. However, RLVR training is notoriously inefficient: long-tailed rollouts, tool-induce…