ScaleBox system improves the accuracy and efficiency of LLM code verification

By PulseAugur Editorial · [3 sources] · 2026-04-30 06:09

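Researchers have developed ScaleBox, a new system designed to improve the accuracy and efficiency of code verification for large language models. Existing code sandboxes struggle under high-concurrency workloads, which leads to inaccurate feedback during reinforcement learning training and evaluation. ScaleBox addresses these problems through automated evaluation generation, parallel execution across multiple nodes, and configurable evaluation suites, improving verification performance and training stability.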
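To make the idea of parallel verification concrete, here is a minimal sketch of fanning test cases out to concurrent workers and collecting one verdict per case. It is not ScaleBox's implementation or API: the names `TestCase`, `run_case`, and `verify` are hypothetical, a local thread pool stands in for multiple nodes, and a plain subprocess stands in for a real sandbox (it provides no isolation).

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class TestCase:
    """Hypothetical test case: text fed to stdin and the expected stdout."""
    stdin: str
    expected_stdout: str


def run_case(source: str, case: TestCase, timeout_s: float = 5.0) -> str:
    """Run one candidate program on one test case in a fresh interpreter process."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", source],
            input=case.stdin,
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return "timeout"
    if proc.returncode != 0:
        return "runtime_error"
    # Assumed checker: exact match after trimming surrounding whitespace.
    if proc.stdout.strip() == case.expected_stdout.strip():
        return "pass"
    return "wrong_answer"


def verify(source: str, cases: list[TestCase], max_workers: int = 4) -> list[str]:
    """Run all test cases concurrently; verdicts come back in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda case: run_case(source, case), cases))


if __name__ == "__main__":
    candidate = "print(sum(int(x) for x in input().split()))"
    cases = [
        TestCase("1 2\n", "3"),
        TestCase("10 20 30\n", "60"),
        TestCase("5\n", "6"),  # deliberately wrong expectation
    ]
    print(verify(candidate, cases))
```

Threads are sufficient in this sketch because each worker spends its time waiting on a child process. A production system would replace the subprocess call with an isolated sandbox and dispatch work across machines.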

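Impact: Improves the reliability and throughput of code verification infrastructure for LLM training, which may improve model performance on coding tasks.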

Ranking rationale: This cluster describes a research paper detailing a code verification system for LLMs.


AI-generated summary · Google Gemini · from 3 sources.

Sources [3]

arXiv cs.CL TIER_1 English(EN) · Jiasheng Zheng, Xin Zheng, Boxi Cao, Pengbo Wang, Zhengzhao Ma, Qiming Zhu, Jiazhen Jiang, Yaojie Lu, Hongyu Lin, Xianpei Han, Le Sun · 2026-05-01 04:00

ScaleBox: Enabling High-Fidelity and Scalable Code Verification for Large Language Models

arXiv:2604.27467v1 · Announce Type: cross · Abstract: Code sandboxes have emerged as a critical infrastructure for advancing the coding capabilities of large language models, providing verifiable feedback for both RL training and evaluation. However, existing systems fail to provide …

arXiv cs.CL TIER_1 English(EN) · Le Sun · 2026-04-30 06:09

ScaleBox: Enabling High-Fidelity and Scalable Code Verification for Large Language Models

Code sandboxes have emerged as a critical infrastructure for advancing the coding capabilities of large language models, providing verifiable feedback for both RL training and evaluation. However, existing systems fail to provide accurate verification and efficiency under high-co…

Hugging Face Daily Papers TIER_1 English(EN) · 2026-04-30 06:09

ScaleBox: Enabling High-Fidelity and Scalable Code Verification for Large Language Models

Code sandboxes have emerged as a critical infrastructure for advancing the coding capabilities of large language models, providing verifiable feedback for both RL training and evaluation. However, existing systems fail to provide accurate verification and efficiency under high-co…
