This survey paper systematically reviews Process Reward Models (PRMs), which evaluate and guide Large Language Models (LLMs) at the level of individual reasoning steps or whole trajectories, in contrast to traditional outcome-based reward models. It details methods for generating process supervision data, constructing PRMs, and applying them to reinforcement learning and test-time scaling. The paper covers applications across mathematics, coding, text, multimodal reasoning, robotics, and agents, aiming to clarify design choices and identify future research directions for fine-grained reasoning alignment.
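The test-time scaling idea mentioned above can be illustrated with a minimal best-of-N sketch: a PRM scores every step of each candidate reasoning trajectory, and the trajectory whose weakest step scores highest is selected. The `step_scores` function here is a hypothetical stand-in for a learned PRM, and the min-aggregation is one common choice among several surveyed (not the paper's single prescribed method).

```python
# Minimal sketch of PRM-guided best-of-N selection at test time.
# `step_scores` is a hypothetical stand-in for a learned Process Reward
# Model; a real PRM would score each step with a trained neural network.

def step_scores(trajectory):
    """Hypothetical PRM: return a score in [0, 1] for each reasoning step."""
    # Stub heuristic for illustration only: longer, more explicit steps
    # receive higher scores.
    return [min(1.0, len(step) / 40.0) for step in trajectory]

def trajectory_score(trajectory):
    # One common aggregation is the minimum step score: a single bad
    # step invalidates the whole chain of reasoning.
    return min(step_scores(trajectory))

def best_of_n(candidates):
    """Return the candidate trajectory whose worst step scores best."""
    return max(candidates, key=trajectory_score)

candidates = [
    ["Assume x = 2.", "Then x + 3 = 5."],
    ["Let x satisfy x + 3 = 5.", "Subtracting 3 from both sides gives x = 2."],
]
print(best_of_n(candidates)[0])
```

Because the score is per-step rather than per-answer, the same machinery can also supply dense rewards during reinforcement learning, which is the other use the survey covers.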
Summary written by gemini-2.5-flash-lite from 1 source.
IMPACT Provides a structured overview of process-based reward modeling for LLMs, guiding future research in fine-grained reasoning alignment.
RANK_REASON This is a survey paper on a specific technique for improving LLM alignment.