Reinforcement Learning with Verifiable Rewards

PulseAugur coverage of Reinforcement Learning with Verifiable Rewards (RLVR): every cluster that mentions RLVR across labs, papers, and developer communities, ranked by signal.

Total · 30d

19 over 90d

Releases · 30d

0 over 90d

Papers · 30d

18 over 90d

TIER MIX · 90D

research 6
tool 12
commentary 1

TOPICS

paper 18
model release 16
safety 3
other 1

SENTIMENT · 30D

9 days with sentiment data

RECENT · PAGE 1/1 · 19 TOTAL

COMMENTARY · CL_113898 · Jun 27 · 19:40

Neuralese training method may improve AI alignment via verifiable rewards

The concept of "Neuralese," a method for training AI models, is explored as a potentially beneficial approach for AI alignment. This method leverages Reinforcement Learning with Verifiable Rewards (RLVR) to optimize com…
TOOL · CL_109945 · Jun 25 · 04:00

New RL method trains AI to reason about geological event histories

Researchers have developed Geo-Strat-RL, a synthetic environment designed to train vision-language models (VLMs) in reasoning about geological event histories. This system uses reinforcement learning with verifiable rew…
TOOL · CL_104718 · Jun 21 · 03:15

Curriculum RL pushes LLM reasoning beyond base model limits

Researchers have developed a new Curriculum Reinforcement Learning (CRL) approach designed to enhance the reasoning capabilities of large language models (LLMs) beyond their initial training. This method, termed boundar…
TOOL · CL_93283 · Jun 16 · 04:00

New research frames RLVR diversity collapse as overtraining

A new research paper published on arXiv explores the phenomenon of "diversity collapse" in Reinforcement Learning with Verifiable Rewards (RLVR), a technique used to enhance large language models' reasoning. The paper f…
TOOL · CL_91404 · Jun 15 · 04:00

New RL framework boosts 3D video scene understanding

Researchers have introduced 3D-RFT, a novel framework that applies Reinforcement Learning with Verifiable Rewards (RLVR) to video-based 3D scene understanding. Unlike traditional Supervised Fine-Tuning (SFT) methods tha…
RESEARCH · CL_91209 · Jun 12 · 17:54

New CORA method bridges thinking-answer gap in multimodal AI

Researchers have introduced CORA, a new method to address the thinking-answer inconsistency in multimodal large vision-language models (LVLMs). This inconsistency, where the reasoning process does not align semantically…
TOOL · CL_82523 · Jun 10 · 04:00

TD-Grokking framework enables LLMs to learn from zero-reward problems

Researchers have introduced TD-Grokking, a novel framework designed to enable large language models to learn from zero-reward problems. This method recursively breaks down complex, intractable problems into smaller, ver…
RESEARCH · CL_79524 · Jun 8 · 11:57

Reasoning Arena boosts LLM reasoning with trace tournaments

Researchers have developed "Reasoning Arena," a new framework designed to enhance the reasoning capabilities of large language models. This system addresses a limitation in reinforcement learning with verifiable rewards…
TOOL · CL_62863 · Jun 1 · 04:00

Small language models improve code generation with RLVR

Researchers have explored using reinforcement learning with verifiable rewards (RLVR) to enhance the code generation capabilities of small language models. Their study focused on Python code generation using Qwen3-0.6B …
RESEARCH · CL_51033 · May 26 · 04:00

New RLVR methods boost LLM training efficiency and data selection

Researchers are developing new methods to improve the efficiency and effectiveness of Reinforcement Learning with Verifiable Rewards (RLVR) for training Large Language Models (LLMs). Two papers introduce novel data sele…
RESEARCH · CL_50951 · May 26 · 04:00

New research advances policy optimization for robotics and LLMs

Researchers have introduced several new methods to enhance policy optimization in reinforcement learning, particularly for complex tasks involving robotics and large language models. MODIP aims to efficiently fine-tune …
TOOL · CL_48817 · May 25 · 04:00

New VI-CuRL framework stabilizes LLM reasoning without external verifiers

Researchers have developed VI-CuRL, a new framework designed to stabilize reinforcement learning for large language models without relying on external verifiers. This method uses the model's internal confidence to guide…
TOOL · CL_65073 · May 25 · 00:00

New RLVR method uses temporal scheduling for stable LLM training

Researchers have introduced a new method called Temporal Scheduling for Reinforcement Learning with Verifiable Rewards (RLVR), a technique used in training Large Language Models. This approach addresses the limitation o…
TOOL · CL_38259 · May 18 · 15:14

New AMR-SD method improves LLM reasoning by refining token-level credit assignment

Researchers have developed a new method called Asymmetric Meta-Reflective Self-Distillation (AMR-SD) to improve the alignment of Large Language Models (LLMs) for complex reasoning tasks. Traditional methods struggle wit…
TOOL · CL_22133 · May 8 · 04:00

LLM reasoning emerges via Inverse Tree Freezing, improving multi-step thinking

Researchers have developed a new framework called Inverse Tree Freezing to understand how large language models (LLMs) achieve complex reasoning. This model views the LLM's learning process as a random walk on a 'Concep…
TOOL · CL_20552 · May 7 · 04:00

RLVR training dynamics reveal implicit curriculum in reasoning models

Researchers have developed a theory explaining how reinforcement learning with verifiable rewards (RLVR) aids large reasoning models in overcoming long-horizon challenges. Their analysis reveals that RLVR training natur…
TOOL · CL_18760 · May 6 · 04:00

Systematic errors in RLVR verifiers can cause model performance collapse

A new research paper explores the impact of systematic errors in verifiers used for Reinforcement Learning with Verifiable Rewards (RLVR) in large language models. Unlike previous assumptions that errors only slow down …
RESEARCH · CL_47680 · Oct 22 · 00:00

AI research explores hierarchical reasoning, counterfactuals, and efficient training methods · 10 sources tracked

Several recent research papers explore advanced techniques in AI reasoning and model training. "Concept Flow Models" introduce a hierarchical approach to improve interpretability in concept-based reasoning, mitigating i…
RESEARCH · CL_21967 · Nov 27 · 00:00

New research probes LLM context understanding and confidence calibration

Researchers are developing new methods to evaluate and enhance Large Language Models (LLMs). Apple's research proposes a benchmark to test LLMs' understanding of context, finding that quantized models and pre-trained de…
