
Survey details process reward models for fine-grained LLM reasoning alignment

This survey paper systematically reviews Process Reward Models (PRMs), which evaluate and guide Large Language Models (LLMs) at the reasoning step or trajectory level, unlike traditional outcome-based models. It details methods for generating process data, constructing PRMs, and utilizing them for reinforcement learning and test-time scaling. The paper covers applications in diverse areas such as mathematics, coding, text, multimodal reasoning, robotics, and agents, aiming to clarify design choices and identify future research directions for improved reasoning alignment.

Summary written by gemini-2.5-flash-lite from 1 source.
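To make the step-level versus outcome-level distinction concrete, here is a minimal Python sketch, not drawn from the paper itself: `orm_score`, `prm_scores`, and the toy candidates are hypothetical stand-ins for trained reward models, and best-of-N reranking is one common way PRM scores are used for test-time scaling. How step scores are aggregated (min, mean, product) is one of the design choices a survey like this catalogs.

```python
# Minimal conceptual sketch (not from the survey): all "models" here are
# hypothetical stand-ins for trained reward models.
from typing import Callable, List

def orm_score(final_answer: str) -> float:
    """Outcome reward model stand-in: one scalar for the final answer only."""
    return 1.0 if final_answer.strip() == "42" else 0.0

def prm_scores(steps: List[str]) -> List[float]:
    """Process reward model stand-in: one scalar per reasoning step."""
    return [0.0 if "guess" in step else 1.0 for step in steps]

def trajectory_score(steps: List[str],
                     aggregate: Callable[[List[float]], float] = min) -> float:
    """Fold step scores into a trajectory score; `min` is a common choice,
    since one bad step should sink the whole trajectory."""
    return aggregate(prm_scores(steps))

def best_of_n(candidates: List[List[str]]) -> List[str]:
    """Test-time scaling via best-of-N: keep the trajectory the PRM ranks
    highest, rather than trusting the final answer alone."""
    return max(candidates, key=trajectory_score)

candidates = [
    ["6 * 7 = 42", "final answer: 42"],          # sound reasoning
    ["guess: probably 42", "final answer: 42"],  # right answer, weak process
]
for cand in candidates:
    final = cand[-1].split(":")[-1]
    # An ORM scores both candidates 1.0 (the final answer is correct either
    # way); the PRM distinguishes them at the step level.
    print(orm_score(final), prm_scores(cand))
print(best_of_n(candidates))  # picks the trajectory with sound steps
```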

IMPACT Provides a structured overview of process-based reward modeling for LLMs, guiding future research in fine-grained reasoning alignment.

RANK_REASON This is a survey paper on a specific technique for improving LLM alignment.

Read on arXiv cs.CL →

COVERAGE [1]

  1. arXiv cs.CL TIER_1 · Congmin Zheng, Jiachen Zhu, Zhuoying Ou, Yuxiang Chen, Kangning Zhang, Rong Shan, Zeyu Zheng, Mengyue Yang, Jianghao Lin, Yong Yu, Weinan Zhang

    A Survey of Process Reward Models: From Outcome Signals to Process Supervisions for Large Language Models

    arXiv:2510.08049v3 Announce Type: replace Abstract: Although Large Language Models (LLMs) exhibit advanced reasoning ability, conventional alignment remains largely dominated by outcome reward models (ORMs) that judge only final answers. Process Reward Models (PRMs) address this g…