Which Tokens Matter? Adaptive Token Selection for RLVR with the Relative Surprisal Index

New Relative Surprisal Index Strengthens LLM Reasoning in RLVR

By the PulseAugur editorial team · [2 sources] · 2026-06-30 12:33

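Researchers have introduced the Relative Surprisal Index (RSI), a new metric intended to improve reinforcement learning with verifiable rewards (RLVR) for large language models (LLMs). RSI combines token entropy with the probability of the selected token, reconciling earlier approaches that conflicted by focusing on either high-entropy tokens or low-probability tokens. Using RSI Selection (RSI-S), an adaptive token-filtering method built on the metric, the researchers report gains on benchmarks such as AIME and AMC across several Qwen2.5 model sizes, with average accuracy (avg@32) rising by 2-3 percentage points over GRPO.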
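The summary names the two ingredients of RSI (token entropy and the probability of the selected token) but gives neither the formula nor the RSI-S selection rule. The sketch below is therefore only an illustration under stated assumptions: `relative_surprisal` is a hypothetical score that divides the selected token's surprisal by the entropy of the distribution, and `select_tokens` uses a fixed keep fraction as a stand-in for the paper's adaptive filter. Neither should be read as the authors' definition.

```python
import math


def token_entropy(probs):
    """Shannon entropy (in nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def surprisal(probs, chosen):
    """Negative log-probability of the token that was actually sampled."""
    return -math.log(probs[chosen])


def relative_surprisal(probs, chosen, eps=1e-8):
    """HYPOTHETICAL score: surprisal of the chosen token relative to entropy.

    Assumption for illustration only; the paper's RSI may be defined differently.
    Values near 1 mean the token was about as surprising as expected.
    """
    return surprisal(probs, chosen) / (token_entropy(probs) + eps)


def select_tokens(step_probs, chosen_ids, keep_fraction=0.5):
    """Return a 0/1 mask keeping the highest-scoring positions.

    A fixed keep_fraction is an assumed stand-in for the adaptive rule in RSI-S.
    """
    scores = [relative_surprisal(p, c) for p, c in zip(step_probs, chosen_ids)]
    k = max(1, round(keep_fraction * len(scores)))
    ranked = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)
    keep = set(ranked[:k])
    return [1 if i in keep else 0 for i in range(len(scores))]


if __name__ == "__main__":
    uniform = [0.25, 0.25, 0.25, 0.25]
    peaked = [0.97, 0.01, 0.01, 0.01]
    # Three positions: an uncertain step, a rare token under a confident
    # distribution, and the expected token under a confident distribution.
    mask = select_tokens([uniform, peaked, peaked], [0, 1, 0], keep_fraction=0.34)
    print(mask)
```

In an RLVR setup, a mask like this would typically multiply the per-token policy-gradient loss so that only the selected positions contribute to the update. A ratio is used here because it responds to both signals at once: a low-probability token drawn from a confident distribution scores high, while any token drawn from a uniform distribution scores near 1.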

Impact: Introduces a novel metric and filtering method that may improve the reasoning ability of large language models (LLMs).

Ranking rationale: This item is an academic paper that introduces a new metric and method for improving LLM reasoning.


AI-generated summary · Google Gemini · based on 2 sources.

Sources [2]

arXiv cs.AI · Outongyi Lv, Yanzhao Zheng, Yuanwei Zhang, Zhenghao Huang, Xingjun Wang, Baohua Dong, Hangcheng Zhu, Yingda Chen · 2026-07-01 04:00

Which Tokens Matter? Adaptive Token Selection for RLVR with the Relative Surprisal Index

arXiv:2606.31575v1 Announce Type: new Abstract: Reinforcement learning (RL) has become a powerful tool for propelling Large Language Models (LLMs) beyond imitation-based training towards more robust reasoning capabilities. Among existing approaches, RL with Verifiable Rewards (RL…

arXiv cs.AI · Yingda Chen · 2026-06-30 12:33

Which Tokens Matter? Adaptive Token Selection for RLVR with the Relative Surprisal Index

Reinforcement learning (RL) has become a powerful tool for propelling Large Language Models (LLMs) beyond imitation-based training towards more robust reasoning capabilities. Among existing approaches, RL with Verifiable Rewards (RLVR) has emerged as a pivotal paradigm for advanc…
