PulseAugur

RLVR

PulseAugur coverage of RLVR — every cluster mentioning RLVR across labs, papers, and developer communities, ranked by signal.

Total · 30d: 40 (40 over 90d)
Releases · 30d: 0 (0 over 90d)
Papers · 30d: 38 (38 over 90d)
TIMELINE
  1. 2026-06-03 · research milestone · A new paper introduces a method to address forgetting in RLVR for LLMs.
SENTIMENT · 30D

17 days with sentiment data

RECENT · PAGE 1/2 · 40 TOTAL
  1. RESEARCH · CL_109577 ·

    New Local Branch Routing framework enhances language model reasoning

    Researchers have developed a new framework called Local Branch Routing (LBR) to improve language model reasoning during test-time scaling. LBR operates at the token level, expanding a local lookahead tree and using a li…

  2. RESEARCH · CL_107806 ·

    New research paper details "pigeonholing" effect in LLMs

    A new research paper introduces the concept of "pigeonholing," where suboptimal or incorrect prompts can degrade the performance of large language models (LLMs) and lead to mode collapse. This phenomenon occurs when mod…

  3. TOOL · CL_106811 ·

    RLVR outperforms SFT for LLM reasoning, paper shows

    A new paper analyzes why reinforcement fine-tuning, specifically Reinforcement Learning with Verifiable Rewards (RLVR), outperforms supervised fine-tuning (SFT) for improving the reasoning capabilities of large language…

  4. TOOL · CL_104743 ·

    New RLVR method ACPO enhances LLM reasoning capabilities

    Researchers have analyzed Reinforcement Learning from Verifiable Rewards (RLVR) to understand its impact on large language model reasoning. Their theoretical analysis revealed that the degree of off-policy learning, inf…

  5. RESEARCH · CL_99522 ·

    ELVA framework tackles "grain blindness" in multimodal retrieval · 2 sources tracked

    Researchers have introduced ELVA, a novel framework designed to address "grain blindness" in Universal Multimodal Retrieval (UMR) systems that utilize Multimodal Large Language Models (MLLMs). Grain blindness occurs whe…

  6. RESEARCH · CL_96154 ·

    RLVR research advances improve LLM reasoning and exploration

    Two research papers explore advancements in reinforcement learning with verifiable rewards (RLVR) for large language models. The first paper theoretically analyzes why RLVR outperforms supervised fine-tuning (SFT) for r…

  7. RESEARCH · CL_98026 ·

    AI research: SFT overtraining causes rank inversion in code generation models

    A new research paper explores the phenomenon of supervised fine-tuning (SFT) overtraining in reinforcement learning from human feedback (RLHF) for code generation models. The study, focusing on Qwen2.5-Coder-3B and Deep…

  8. TOOL · CL_93283 ·

    New research frames RLVR diversity collapse as overtraining

    A new research paper published on arXiv explores the phenomenon of "diversity collapse" in Reinforcement Learning with Verifiable Rewards (RLVR), a technique used to enhance large language models' reasoning. The paper f…

  9. RESEARCH · CL_91346 ·

    New RL methods enhance LLM training stability and efficiency · 7 sources tracked

    Researchers have developed several new methods to improve the stability and efficiency of reinforcement learning (RL) in large language models (LLMs). STARE addresses policy entropy collapse by reweighting token-level a…

  10. RESEARCH · CL_93241 ·

    Nemotron 3 Ultra: Open-Source LLM Boasts 1M Context, 6x Throughput

    Researchers have introduced Nemotron 3 Ultra, a 550 billion parameter language model that utilizes a hybrid Mamba-Transformer architecture with a Mixture-of-Experts approach. The model was trained on 20 trillion tokens …

  11. RESEARCH · CL_91199 ·

    On-Policy Distillation Updates Found to Be Sparse and Geometrically Distinct

    A new research paper explores the mechanics of on-policy distillation (OPD), a post-training technique that combines on-policy student trajectories with dense teacher supervision. The study reveals that OPD updates are …

  12. TOOL · CL_79735 ·

    LLMs enhanced with RLVR improve long-horizon maritime forecasting

    Researchers have developed a new framework that applies RLVR to improve long-horizon maritime trajectory and destination forecasting with large language models. This approach converts vessel trajectories into semantic textua…

  13. RESEARCH · CL_79475 ·

    New sGPO strategy cuts RLVR training compute by 3x

    Researchers have developed a new training strategy called sorted Group Policy Optimization (sGPO) to improve the efficiency of Reinforcement Learning with Verifiable Rewards (RLVR). This method uses a small amount of in…

  14. RESEARCH · CL_79193 ·

    AI agents trained to navigate long shopping histories

    Researchers have developed new methods for training AI agents to understand long customer shopping trajectories, a task previously limited by context window constraints in large language models. One approach uses an "ag…

  15. TOOL · CL_70308 ·

    New GeoMin method boosts data efficiency in semi-supervised RLVR

    Researchers have introduced GeoMin, a novel method designed to improve the data efficiency of semi-supervised reinforcement learning with verifiable rewards (RLVR). This approach models global feature distributions from…

  16. TOOL · CL_68473 ·

    New RLVR method combats LLM forgetting of solved problems

    Researchers have identified a phenomenon called "correct-set turnover" in reinforcement learning with verifiable rewards (RLVR) for large language models. This issue causes models to forget previously solved problems as…

  17. RESEARCH · CL_68154 ·

    AI research paper explores synthetic task augmentation for RLVR

    Researchers have developed a method to replace human-curated tasks with synthetically augmented ones for training language models in reinforcement learning from verifiable rewards (RLVR). This approach addresses the sca…

  18. TOOL · CL_65313 ·

    New CAST method improves LLM reasoning via self-distillation

    Researchers have developed CAST, a novel self-distillation method designed to enhance reinforcement learning with verifiable rewards (RLVR) in large language models, particularly for Group Relative Policy Optimization (…

  19. RESEARCH · CL_62929 ·

    AI models improve code generation with new verification techniques

    Researchers have developed new methods to improve the ability of large language models to generate correct code and proofs. One approach, TTRL-CoCoV, uses confidence-conditioned verification to enhance coverage and accu…

  20. COMMENTARY · CL_62484 ·

    AI writing detectors criticized for flagging human text

    The author argues that the prevalence of