Reinforcement Learning with Verifiable Rewards (RLVR)

PulseAugur coverage of Reinforcement Learning with Verifiable Rewards (RLVR): every cluster mentioning RLVR across labs, papers, and developer communities, ranked by signal.

Total: 13 over 90d
Releases: 0 over 90d
Papers: 13 over 90d

TOPICS

paper 13
model release 9
safety 3
other 1

RECENT · 13 TOTAL

RESEARCH · CL_141144 · Jul 13 · 12:56

New SCOPE-RL framework optimizes LLM reasoning paths for better accuracy and efficiency

Researchers have developed SCOPE-RL, a novel two-stage framework designed to enhance reinforcement learning for large language models (LLMs) by optimizing their reasoning processes. This method introduces more granular …

TOOL · CL_111740 · Jun 26 · 04:00

LLM RLVR training activates memorization shortcuts, researchers find

Researchers have identified a "Perplexity Paradox" in Large Language Models (LLMs) trained using Reinforcement Learning with Verifiable Rewards (RLVR). This paradox occurs when models achieve performance gains despite re…

TOOL · CL_93150 · Jun 16 · 04:00

New STRIDE framework enhances LLM reasoning with verifiable rewards

Researchers have introduced STRIDE, a novel framework for Reinforcement Learning with Verifiable Rewards (RLVR) designed to enhance the reasoning capabilities of large language models. Unlike previous methods that rely …

TOOL · CL_79751 · Jun 9 · 04:00

New RePO framework enhances LLM training with regret minimization

Researchers have introduced a new framework called Regret-based Preference Optimization (RePO) for training large language models using human feedback. RePO reframes the process from reward maximization to regret minimi…

TOOL · CL_68395 · Jun 3 · 04:00

New testbed analyzes RLVR for code verifier training

Researchers have introduced Aletheia, a new testbed designed to analyze the training of code verifiers. The study focuses on the trade-offs between performance and cost in Reinforcement Learning with Verifiable Rewards …

TOOL · CL_65348 · Jun 2 · 04:00

New framework detects bugs in AI reward verifiers before training

Researchers have developed a new framework to identify bugs in reinforcement learning with verifiable rewards (RLVR) systems. This method focuses on fuzzing the verifiers, which act as reward functions, to detect errors…
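
The summary does not describe the framework's internals. As a rough illustration of why a verifier needs testing before it is used as a reward function, here is a minimal sketch, assuming a hypothetical verifier that compares a final numeric answer: the fuzzer feeds it equivalent and perturbed renderings of random fractions and collects every reward that disagrees with the ground truth. The names `verify_numeric_answer`, `buggy_string_verifier`, and `fuzz_verifier` are hypothetical and are not the framework's API.

```python
import random
from fractions import Fraction


def verify_numeric_answer(response: str, reference: str) -> float:
    """Hypothetical verifier: reward 1.0 if the response's last token parses to
    the same rational number as the reference, else 0.0."""
    tokens = response.strip().split()
    if not tokens:
        return 0.0
    try:
        predicted = Fraction(tokens[-1].rstrip("."))
        expected = Fraction(reference)
    except (ValueError, ZeroDivisionError):
        return 0.0
    return 1.0 if predicted == expected else 0.0


def buggy_string_verifier(response: str, reference: str) -> float:
    """Hypothetical naive verifier: string suffix match, blind to equivalent forms."""
    return 1.0 if response.strip().endswith(reference) else 0.0


def fuzz_verifier(verifier, trials: int = 200, seed: int = 0):
    """Feed the verifier equivalent and perturbed answers for random fractions
    and return every case where its reward disagrees with the ground truth."""
    rng = random.Random(seed)
    failures = []
    for _ in range(trials):
        p, q = rng.randint(-50, 50), rng.randint(1, 50)
        reference = f"{p}/{q}"
        cases = [
            (f"{p}/{q}", 1.0),                 # identical form
            (f"{2 * p}/{2 * q}", 1.0),         # equivalent, unreduced form
            (f"The answer is {p}/{q}.", 1.0),  # prose plus trailing period
            (f"{p + 1}/{q}", 0.0),             # genuinely wrong value
        ]
        for response, expected in cases:
            reward = verifier(response, reference)
            if reward != expected:
                failures.append((response, reference, reward, expected))
    return failures


print(len(fuzz_verifier(verify_numeric_answer)))      # 0
print(len(fuzz_verifier(buggy_string_verifier)) > 0)  # True
```

The naive suffix matcher rejects unreduced fractions and answers followed by a period, so during RL training it would withhold reward from correct responses.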

RESEARCH · CL_62293 · May 29 · 09:29

New framework scales LLM coding via atomic task synthesis

Researchers have developed a new framework called Atomic Decomposition and Recombination (ADR) to address the limitations in scaling Reinforcement Learning with Verifiable Rewards (RLVR) for Large Language Models (LLMs)…

TOOL · CL_72412 · May 29 · 00:00

New framework generates novel code tasks for LLM training

Researchers have introduced the Atomic Decomposition and Recombination (ADR) framework to generate challenging and novel code tasks for training Large Language Models (LLMs) using Reinforcement Learning with Verifiable …

TOOL · CL_56297 · May 28 · 04:00

Qwen3 LLMs trained for creativity using a word association game

Researchers have used Reinforcement Learning with Verifiable Rewards (RLVR) to train Large Language Models (LLMs) for creativity, bypassing subjective human judgment. They applied this techniq…

TOOL · CL_51171 · May 26 · 04:00

F-GRPO method improves reinforcement learning by focusing on rare trajectories

Researchers have developed F-GRPO, a novel method to improve reinforcement learning by addressing the issue of rare-correct trajectories being missed during training. The approach introduces a difficulty-aware scaling c…
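
The summary is cut off before it explains the difficulty-aware scaling, so that part is not reproduced here. For context, and assuming F-GRPO builds on GRPO as its name suggests, this sketch shows the standard group-relative advantage GRPO computes per prompt: each sampled trajectory's reward is normalized by the mean and standard deviation of its group. The helper name `group_relative_advantages` is hypothetical, and real implementations differ on details such as sample versus population standard deviation.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style baseline (hypothetical helper): normalize each trajectory's
    reward by the mean and population std of its sampling group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# One correct trajectory out of eight samples for a hard prompt.
rare = group_relative_advantages([1, 0, 0, 0, 0, 0, 0, 0])
print(round(rare[0], 3), round(rare[1], 3))  # 2.646 -0.378

# No correct trajectory: every advantage is zero, so this prompt yields no gradient.
print(group_relative_advantages([0, 0, 0, 0, 0, 0, 0, 0]))
# [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```

A lone correct trajectory receives a large positive advantage when it is sampled, but if the policy never samples it, the group is all zeros and the prompt contributes nothing to the update.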

RESEARCH · CL_51028 · May 26 · 04:00

New research explores advanced masking techniques for LLM fine-tuning and pre-training

Researchers are exploring novel masking strategies to improve the fine-tuning and pre-training of large language models. One approach, EKSFT, selectively masks tokens with high entropy or KL divergence during supervised…
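
The summary gives only EKSFT's selection criterion. A minimal sketch, assuming that "masking" means dropping tokens whose predictive entropy exceeds a fixed threshold from the supervised cross-entropy loss; the threshold, the entropy-only rule, and the helper names are hypothetical, and the KL-divergence variant is not shown.

```python
import math


def log_softmax(logits):
    """Numerically stable log-probabilities for one token position."""
    m = max(logits)
    log_total = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_total for x in logits]


def token_entropy(logits):
    """Shannon entropy (nats) of the model's predictive distribution."""
    return -sum(math.exp(lp) * lp for lp in log_softmax(logits))


def entropy_masked_sft_loss(logits_per_token, targets, max_entropy):
    """Mean cross-entropy over tokens whose entropy is <= max_entropy.
    Hypothetical masking rule: higher-entropy tokens are excluded from the loss."""
    kept = []
    for logits, target in zip(logits_per_token, targets):
        if token_entropy(logits) > max_entropy:
            continue  # masked: contributes nothing to the SFT loss
        kept.append(-log_softmax(logits)[target])
    loss = sum(kept) / len(kept) if kept else 0.0
    return loss, len(kept)


# A confident position (entropy about 0.08 nats) and a uniform one (ln 3, about 1.10 nats).
logits = [[5.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
loss, n_kept = entropy_masked_sft_loss(logits, targets=[0, 2], max_entropy=0.5)
print(n_kept)  # 1
```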

RESEARCH · CL_44028 · May 21 · 16:45

New method stabilizes LLM reasoning by rescuing near-boundary signals

Researchers have identified a key bottleneck in Reinforcement Learning with Verifiable Rewards (RLVR) that hinders LLM reasoning optimization. The study pinpoints rigid clipping decisions in standard hard-clipping metho…
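
"Hard clipping" most likely refers to the PPO-style clipped surrogate that GRPO-family RLVR methods also use. The sketch below (helper names hypothetical; eps = 0.2 is a common default, not a value from the study) shows why the decision is rigid: the gradient with respect to the probability ratio is either the full advantage or exactly zero, so a token just past the boundary loses its signal entirely. The study's rescue mechanism is not described in the summary and is not reproduced.

```python
def clipped_surrogate(ratio, advantage, eps=0.2):
    """PPO-style hard-clipped per-token objective (to be maximized)."""
    clipped_ratio = min(max(ratio, 1 - eps), 1 + eps)
    return min(ratio * advantage, clipped_ratio * advantage)


def surrogate_grad_wrt_ratio(ratio, advantage, eps=0.2):
    """Derivative of clipped_surrogate w.r.t. ratio: the advantage while the
    unclipped branch is active, exactly 0.0 once the clip takes over."""
    if advantage >= 0 and ratio > 1 + eps:
        return 0.0
    if advantage < 0 and ratio < 1 - eps:
        return 0.0
    return advantage


# Positive-advantage tokens: 1.19 keeps its gradient, 1.21 loses it entirely.
for r in (1.05, 1.19, 1.21, 1.5):
    print(r, surrogate_grad_wrt_ratio(r, advantage=1.0))
# 1.05 1.0
# 1.19 1.0
# 1.21 0.0
# 1.5 0.0
```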

TOOL · CL_38296 · May 18 · 11:59

New K2V framework boosts LLM reasoning in knowledge-intensive domains

Researchers have introduced Knowledge-to-Verification (K2V), a new framework designed to improve the reasoning abilities of large language models (LLMs) in knowledge-intensive fields. K2V extends reinforcement learning …
