Expected Value Alignment for Generative Reward Modeling in Formal Mathematics Verification
Researchers have introduced Expected Value Alignment (EVA), a procedure for training reward models used with large language models in formal mathematics verification. EVA addresses a trade-off in existing reward models: it extracts continuous scores from the model's token distribution while preserving discrete textual rationales. The method was implemented in a model called Leibniz for Lean 4 formal verification, where it showed fewer discretization artifacts than baseline approaches.
AI IMPACT: This method could improve the accuracy and interpretability of AI systems used in formal mathematical reasoning.
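
The summary does not spell out how EVA computes its scores, so the following is only a minimal sketch of the general idea of reading a continuous score off a token distribution. It assumes the reward model emits logits for a small set of integer score tokens (here 1 to 5), and it takes the probability-weighted mean of those scores instead of the single most likely one. The function names, the score range, and the logit values are all hypothetical and are not taken from the Leibniz implementation.

```python
import math


def expected_score(logits_by_score):
    """Probability-weighted mean score from per-score-token logits.

    `logits_by_score` maps each candidate integer score to the logit the
    model assigned to that score's token (an assumed interface).
    """
    # Numerically stable softmax restricted to the candidate score tokens.
    max_logit = max(logits_by_score.values())
    weights = {s: math.exp(l - max_logit) for s, l in logits_by_score.items()}
    total = sum(weights.values())
    # Expected value: sum over scores of score * probability.
    return sum(s * w / total for s, w in weights.items())


def argmax_score(logits_by_score):
    """Discrete baseline: the single most likely score token."""
    return max(logits_by_score, key=logits_by_score.get)


# Hypothetical logits for the score tokens "1" through "5".
logits = {1: -2.0, 2: -0.5, 3: 1.0, 4: 1.2, 5: -1.0}

continuous = expected_score(logits)  # falls between the two likeliest scores
discrete = argmax_score(logits)      # snaps to one integer
```

In this toy case the discrete readout snaps to a single integer even though a neighbouring score is almost as likely, while the expected value lands between the two. That gap is one plausible reading of the "discretization artifacts" the summary mentions.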