Process Reward Model

PulseAugur coverage of Process Reward Model: every cluster mentioning Process Reward Model across labs, papers, and developer communities, ranked by signal.

Total (last 90 days): 2

Releases (last 90 days): 0

Papers (last 90 days): 2

Sentiment (last 30 days): data available for 1 day

Recent · 2 items

RESEARCH · CL_44959 · May 11 · 00:00

New VRPRM model uses visual cues to enhance LLM reasoning

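Researchers have developed VRPRM, a novel process reward model that uses visual reasoning to improve the fine-grained evaluation of large language model (LLM) reasoning steps. The approach significantly reduces the data annotation cost that training such models typically requires. Compared with traditional non-thinking PRMs, VRPRM performs better, achieving substantial improvements with only a fraction of the training data.

RESEARCH · CL_15892 · May 4 · 08:51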

New method debiases LLMs at decoding time, improving fairness without model retraining

Researchers have developed a novel method to mitigate biases in large language models during the decoding phase, without altering the model's weights. This approach uses a separate Process Reward Model (PRM) to score to…
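
The summary above is cut off, so the paper's actual scoring procedure is not shown here. As a rough illustration only of how decoding-time steering by a separate reward model can work in general, the Python sketch below reranks candidate continuations by a score and leaves the generator's weights untouched. The names `rerank_with_prm`, `toy_prm_score`, and the `FLAGGED` word list are hypothetical stand-ins, not the paper's method or any library's API.

```python
from typing import Callable, List


def rerank_with_prm(candidates: List[str], prm_score: Callable[[str], float]) -> str:
    """Return the candidate that the scoring function rates highest."""
    if not candidates:
        raise ValueError("need at least one candidate")
    return max(candidates, key=prm_score)


# Hypothetical stand-in for a trained PRM: a toy rule that lowers the
# score of text containing over-generalizing words. A real PRM would be
# a learned model, not a word list.
FLAGGED = {"always", "never"}


def toy_prm_score(text: str) -> float:
    words = text.lower().split()
    flagged = sum(1 for w in words if w.strip(".,") in FLAGGED)
    return 1.0 - flagged / max(len(words), 1)


# Candidate continuations would normally be sampled from the LLM;
# they are hard-coded here to keep the sketch self-contained.
candidates = [
    "Nurses are always women.",
    "Nurses can be of any gender.",
]
best = rerank_with_prm(candidates, toy_prm_score)
print(best)
```

Because the selection happens after generation, the scoring function can be swapped without retraining the generator, which is the property the headline emphasizes.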
