PulseAugur

Outcome Reward Models

PulseAugur coverage of Outcome Reward Models: every cluster that mentions them across labs, papers, and developer communities, ranked by signal.

Total · 30d: 2 (2 over 90d)
Releases · 30d: 0 (0 over 90d)
Papers · 30d: 2 (2 over 90d)
RECENT · PAGE 1/1 · 2 TOTAL
  1. TOOL · CL_58783

    GRPO RL Algorithm Equivalent to Process Reward Model, New Paper Shows

    A new research paper proposes that the Group Relative Policy Optimization (GRPO) reinforcement learning algorithm, when used with outcome reward models, is mathematically equivalent to a process reward model. This equiv…

  2. RESEARCH · CL_10096

    Survey details process reward models for fine-grained LLM reasoning alignment

    This survey paper systematically reviews Process Reward Models (PRMs), which evaluate and guide Large Language Models (LLMs) at the reasoning step or trajectory level, unlike traditional outcome-based models. It details…