PulseAugur

New Relative Surprisal Index Enhances LLM Reasoning in RLVR

Researchers have introduced the Relative Surprisal Index (RSI), a metric designed to improve Reinforcement Learning with Verifiable Rewards (RLVR) for large language models. RSI combines a token's entropy with the probability of the token actually selected, reconciling prior approaches that conflicted by prioritizing either high-entropy tokens or low-probability tokens. Using RSI Selection (RSI-S), an adaptive token-filtering method built on the index, the researchers report improved performance on the AIME and AMC benchmarks across several Qwen2.5 model scales, with avg@32 accuracy 2 to 3 percentage points higher than GRPO.
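The summary does not give RSI's formula or the RSI-S selection rule. As a rough sketch only, the Python below assumes a hypothetical score: the surprisal of the sampled token (-log p) divided by the entropy of the distribution it was drawn from, which is that distribution's expected surprisal. A token scores high when it is more surprising than a typical draw. The sketch then keeps the top fraction of tokens by that score (a fixed-percentile cutoff standing in for the paper's adaptive rule) and computes avg@k, the per-problem accuracy over k sampled answers averaged across problems. The names `relative_surprisal`, `select_tokens`, and `top_frac` are illustrative assumptions, not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple

# One generation step: (next-token probability distribution, index of sampled token).
Step = Tuple[Sequence[float], int]


def token_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def relative_surprisal(probs: Sequence[float], chosen: int) -> float:
    """HYPOTHETICAL relative-surprisal score, not the paper's RSI formula.

    Surprisal of the sampled token (-log p) divided by the entropy of the
    distribution (its expected surprisal). Values above 1 mean the token was
    more surprising than a typical draw from that distribution.
    """
    surprisal = -math.log(probs[chosen])
    h = token_entropy(probs)
    # A zero-entropy (one-hot) distribution carries no surprise at all.
    return surprisal / h if h > 0 else 0.0


def select_tokens(steps: List[Step], top_frac: float = 0.2) -> List[bool]:
    """Mask keeping roughly the top `top_frac` of tokens by score.

    ASSUMPTION: a fixed-percentile cutoff stands in for RSI-S's adaptive
    rule, which the summary does not describe. Ties at the cutoff are kept.
    """
    if not steps:
        return []
    scores = [relative_surprisal(p, c) for p, c in steps]
    k = max(1, int(len(scores) * top_frac))
    cutoff = sorted(scores, reverse=True)[k - 1]
    return [s >= cutoff for s in scores]


def avg_at_k(results: List[List[int]]) -> float:
    """avg@k: per-problem accuracy over k sampled answers, averaged over problems."""
    return sum(sum(r) / len(r) for r in results) / len(results)


if __name__ == "__main__":
    steps = [
        ([0.5, 0.5], 0),   # uniform: surprisal equals entropy, score 1.0
        ([0.9, 0.1], 1),   # unlikely token from a confident distribution
        ([0.9, 0.1], 0),   # likely token from the same distribution
        ([0.25] * 4, 2),   # uniform over 4 tokens, score about 1.0
    ]
    print(select_tokens(steps, top_frac=0.25))      # keeps only the second token
    print(avg_at_k([[1, 1, 0, 0], [1, 1, 1, 1]]))   # (0.5 + 1.0) / 2
```

In an RLVR trainer, a mask like this would gate which tokens contribute to a per-token policy-gradient loss such as GRPO's; whether RSI-S applies its selection that way is an assumption of this sketch.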

IMPACT Introduces a novel metric and filtering method that could lead to more robust reasoning capabilities in large language models.

RANK_REASON The item is an academic paper introducing a new metric and method for improving LLM reasoning.

AI-generated summary · Google Gemini · from 2 sources.

COVERAGE [2]

  1. arXiv cs.AI · English · Outongyi Lv, Yanzhao Zheng, Yuanwei Zhang, Zhenghao Huang, Xingjun Wang, Baohua Dong, Hangcheng Zhu, Yingda Chen

    Which Tokens Matter? Adaptive Token Selection for RLVR with the Relative Surprisal Index

    arXiv:2606.31575v1 · Announce Type: new · Abstract: Reinforcement learning (RL) has become a powerful tool for propelling Large Language Models (LLMs) beyond imitation-based training towards more robust reasoning capabilities. Among existing approaches, RL with Verifiable Rewards (RL…

  2. arXiv cs.AI · English · Yingda Chen

    Which Tokens Matter? Adaptive Token Selection for RLVR with the Relative Surprisal Index

    Reinforcement learning (RL) has become a powerful tool for propelling Large Language Models (LLMs) beyond imitation-based training towards more robust reasoning capabilities. Among existing approaches, RL with Verifiable Rewards (RLVR) has emerged as a pivotal paradigm for advanc…