
RewardBench 2 Factuality

PulseAugur coverage of RewardBench 2 Factuality: every cluster that mentions RewardBench 2 Factuality across labs, papers, and developer communities, ranked by signal.

Total · 1 over 90d
Releases · 0 over 90d
Papers · 1 over 90d


RECENT · 1 TOTAL

TOOL · CL_18886 · May 6 · 04:00

LLM judges gain reliability with permutation-consensus factuality evaluation

Researchers have developed PCFJudge, a new method to improve the reliability of Large Language Model (LLM) factuality evaluations. This technique addresses the issue of candidate-order sensitivity, where the order in wh…
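The summary above is truncated, so PCFJudge's exact procedure is not given here. As a rough illustration of the general idea named in the title (permutation consensus), the sketch below asks a judge to pick the most factual candidate under every ordering of the candidates and takes a majority vote, mapping each pick back to the candidate's original index. This is a generic majority-vote-over-orderings sketch, not PCFJudge's published algorithm; `permutation_consensus`, `toy_judge`, and the judge signature (a callable returning a position in the shown list) are hypothetical names chosen for illustration.

```python
from __future__ import annotations

from collections import Counter
from itertools import permutations
from typing import Callable, Sequence

# Hypothetical judge signature (assumption, not from the source): given candidates
# in some order, return the position (within that order) of the most factual one.
Judge = Callable[[Sequence[str]], int]


def permutation_consensus(candidates: Sequence[str], judge: Judge) -> tuple[int, float]:
    """Query the judge under every ordering of the candidates.

    Returns the original index that won most often and the fraction of orderings
    it won. A fraction well below 1.0 suggests the verdict depends on order.
    """
    votes: Counter[int] = Counter()
    orders = list(permutations(range(len(candidates))))  # n! orderings
    for order in orders:
        shown = [candidates[i] for i in order]
        pick = judge(shown)        # position within this shuffled list
        votes[order[pick]] += 1    # map back to the original candidate index
    # Ties resolve to the candidate that first reached the top count.
    winner, count = votes.most_common(1)[0]
    return winner, count / len(orders)


def toy_judge(shown: Sequence[str]) -> int:
    # Toy stand-in for an LLM judge: prefer the first candidate not flagged "wrong".
    for pos, text in enumerate(shown):
        if "wrong" not in text:
            return pos
    return 0


cands = ["Paris is the capital of France.", "Lyon is the capital of France (wrong)."]
print(permutation_consensus(cands, toy_judge))        # (0, 1.0)
print(permutation_consensus(cands, lambda shown: 0))  # (0, 0.5): pure position bias
```

A judge that always picks the first slot gets only 0.5 agreement here, which is the kind of order sensitivity a consensus step is meant to expose. Enumerating all n! orderings is only practical for a handful of candidates; with more, one would presumably sample a subset of permutations instead.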
