PulseAugur

RLVR

PulseAugur coverage of RLVR (reinforcement learning with verifiable rewards): every cluster mentioning RLVR across labs, papers, and developer communities, ranked by signal.

Total · 30 days: 15 (90 days: 15)
Releases · 30 days: 0 (90 days: 0)
Papers · 30 days: 14 (90 days: 14)
Sentiment · 30 days: sentiment data available for 6 days

Recent · Page 1 of 1 · 15 items
  1. TOOL · CL_44357 ·

    Anyscale launches skill to automate LLM post-training runs

    Anyscale has introduced a new Anyscale Agent Skill designed to simplify and automate the process of generating LLM post-training runs. This skill assists users in selecting the most appropriate post-training method, suc…

  2. TOOL · CL_41869 ·

    PlexRL runtime boosts LLM training efficiency by 37%

    Researchers have developed PlexRL, a cluster-level runtime designed to make reinforcement learning with verifiable rewards (RLVR) training of large language models (LLMs) more efficient. RLVR training is often in…

  3. TOOL · CL_40746 ·

    New RLVR framework POW3R adapts rewards for faster learning

    Researchers have developed a new framework called POW3R to improve reinforcement learning with verifiable rewards (RLVR). This method addresses the issue where static rubric rewards in RLVR may not effectively guide tra…

  4. RESEARCH · CL_40825 ·

    New self-distillation methods boost LLM performance on reasoning tasks

    Researchers have developed new self-distillation techniques for large language models to improve their performance without relying on external feedback. AVSD (Adaptive-View Self-Distillation) balances consensus signals …

  5. TOOL · CL_34321 ·

    LLM alignment: PPO, DPO, or verifier-based RL for 2026?

    This article provides a technical guide for selecting the appropriate reinforcement learning technique for aligning large language models in 2026. It contrasts Proximal Policy Optimization (PPO) for Reinforcement Learni…

  6. TOOL · CL_36551 ·

    NudgeRL framework enhances LLM reasoning via structured exploration

    Researchers have developed NudgeRL, a new framework designed to improve the exploration capabilities of reinforcement learning with verifiable rewards (RLVR) for large language models. This method uses "Strategy Nudging…

  7. TOOL · CL_28315 ·

    New RLRT method enhances LLM reasoning by reversing teacher signals

    Researchers have developed a new method called RLRT, which reverses the typical self-distillation process in large language models. Instead of a teacher model guiding a student, RLRT identifies and reinforces the studen…

  8. TOOL · CL_22111 ·

    P^2O method enhances LLM reasoning by optimizing prompts and policies

    Researchers have developed a new method called P^2O (Joint Policy and Prompt Optimization) to address the issue of advantage collapse in Reinforcement Learning with Verifiable Rewards (RLVR) for large language models. T…

  9. TOOL · CL_22082 ·

    New theory explains RLVR optimization dynamics and step-size thresholds

    Researchers have developed a theoretical framework for Reinforcement Learning with Verifiable Rewards (RLVR), a technique used to fine-tune large language models with binary feedback. The study introduces a 'Gradient Ga…

  10. TOOL · CL_21953 ·

    New S-trace method improves RLVR efficiency and credit assignment

    Researchers have introduced Selective Eligibility Traces (S-trace), a novel method designed to enhance the reasoning capabilities of large language models within the Reinforcement Learning with Verifiable Rewards (RLVR)…

  11. TOOL · CL_20552 ·

    RLVR training dynamics reveal implicit curriculum in reasoning models

    Researchers have developed a theory explaining how reinforcement learning with verifiable rewards (RLVR) aids large reasoning models in overcoming long-horizon challenges. Their analysis reveals that RLVR training natur…

  12. TOOL · CL_18760 ·

    Systematic errors in RLVR verifiers can cause model performance collapse

    A new research paper explores the impact of systematic errors in verifiers used for Reinforcement Learning with Verifiable Rewards (RLVR) in large language models. Unlike previous assumptions that errors only slow down …

  13. RESEARCH · CL_08319 ·

    JURY-RL framework enhances LLM reasoning with label-free verifiable rewards

    Researchers have developed JURY-RL, a novel framework for label-free reinforcement learning with verifiable rewards (RLVR) designed to improve the reasoning capabilities of large language models. This method separates t…

  14. RESEARCH · CL_06623 ·

    New method uses hidden states to improve AI reasoning credit assignment

    Researchers have developed a new method called Span-level Hidden state Enabled Advantage Reweighting (SHEAR) to improve credit assignment in reinforcement learning for language models. SHEAR leverages the Wasserstein di…

  15. RESEARCH · CL_21967 ·

    New research probes LLM context understanding and confidence calibration

    Researchers are developing new methods to evaluate and enhance Large Language Models (LLMs). Apple's research proposes a benchmark to test LLMs' understanding of context, finding that quantized models and pre-trained de…