Researchers have introduced the Relative Surprisal Index (RSI), a metric intended to improve Reinforcement Learning with Verifiable Rewards (RLVR) for large language models. RSI combines token entropy with the probability of the selected token, addressing the conflict between prior approaches that focused on either high-entropy or low-probability tokens. Building on it, the researchers propose RSI Selection (RSI-S), an adaptive token-filtering method, and report improved results on benchmarks such as AIME and AMC across several Qwen2.5 model scales, with avg@32 accuracy 2-3 percentage points higher than GRPO.
AI IMPACT Introduces a novel metric and filtering method that could lead to more robust reasoning capabilities in large language models.
RANK_REASON The item is an academic paper introducing a new metric and method for improving LLM reasoning.
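The summary names the two ingredients of RSI (token entropy and the probability of the selected token) but does not give the formula. As a rough sketch only, the Python below computes both ingredients and combines them in one plausible way: the surprisal of the sampled token minus the entropy of its distribution. The function names `illustrative_score` and `select_top_fraction`, the subtraction, and the keep-fraction rule are all assumptions made for illustration, not the paper's RSI or RSI-S definitions.

```python
import math


def token_entropy(probs):
    """Shannon entropy (in nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def surprisal(probs, chosen):
    """Negative log-probability of the token that was actually sampled."""
    return -math.log(probs[chosen])


def illustrative_score(probs, chosen):
    """ASSUMPTION: surprisal of the sampled token relative to the entropy.

    Entropy is the expected surprisal, so the score is zero when the sampled
    token is exactly as surprising as an average draw. This is a stand-in,
    not the formula used in the paper.
    """
    return surprisal(probs, chosen) - token_entropy(probs)


def select_top_fraction(scores, keep_fraction=0.5):
    """ASSUMPTION: keep the highest-scoring fraction of token positions.

    Returns a boolean mask; a training loop could use it to decide which
    token positions contribute to the policy-gradient loss.
    """
    k = max(1, math.ceil(len(scores) * keep_fraction))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = set(ranked[:k])
    return [i in keep for i in range(len(scores))]


if __name__ == "__main__":
    peaked = [0.9, 0.05, 0.05]
    # Sampling the unlikely token gives a positive score,
    # sampling the dominant token gives a negative one.
    print(illustrative_score(peaked, 1) > 0)
    print(illustrative_score(peaked, 0) < 0)
```

Under this stand-in, a token scores high when it is both unlikely and drawn from a distribution that is otherwise confident, which is one way a single number can reflect both signals the summary mentions.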
- AMC
- GRPO
- large language models
- Qwen2.5-1.5B
- Qwen2.5-3B
- Qwen2.5-7B
- Relative Surprisal Index
- RLVR
- RSI Selection
AI-generated summary · Google Gemini · from 2 sources.