Brief · PulseAugur

TOOL · arXiv cs.AI

SPADER: Step-wise Peer Advantage with Diversity-Aware Exploration Rewards for Multi-Answer Question Answering

Researchers have introduced SPADER, a reinforcement learning framework designed to improve how large language models answer questions that have multiple valid answers. The framework targets two challenges: assigning credit across long sequences of actions, and encouraging exploration of less common (long-tail) answers. SPADER combines a step-wise credit assignment mechanism with a reward that favors discovering diverse, long-tail answers over redundant ones, and reports improved performance on several multi-answer QA benchmarks.
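The brief does not give SPADER's actual equations, so the sketch below is only a rough illustration of how a step-wise, diversity-aware reward and a peer baseline could fit together. Everything in it is an assumption made for illustration, not the paper's formulation: the data layout (a rollout is a list of steps, each step a list of answer strings), the hypothetical helpers `step_rewards` and `step_peer_advantages`, the weighting of each new correct answer by 1 / (number of peer rollouts that found it), and the per-step mean baseline across peers.

```python
from collections import Counter
from statistics import mean

# Hypothetical data layout (an assumption, not from the SPADER paper):
# a rollout is a list of steps, and each step is a list of answer strings
# the model emitted at that step.
Rollout = list[list[str]]


def step_rewards(rollouts: list[Rollout], gold: set[str]) -> list[list[float]]:
    """Illustrative diversity-aware reward for every step of every peer rollout."""
    # Count how many peer rollouts found each correct answer at any step.
    found_by = Counter(
        a for r in rollouts for a in {x for step in r for x in step} if a in gold
    )
    all_rewards = []
    for r in rollouts:
        seen: set[str] = set()  # correct answers already credited in this rollout
        rewards = []
        for step in r:
            reward = 0.0
            for a in step:
                if a in gold and a not in seen:
                    # Rarer correct answers (found by fewer peers) earn more.
                    reward += 1.0 / found_by[a]
                    seen.add(a)
                # Incorrect or already-credited (redundant) answers earn nothing.
            rewards.append(reward)
        all_rewards.append(rewards)
    return all_rewards


def step_peer_advantages(rewards: list[list[float]]) -> list[list[float]]:
    """Hypothetical baseline: subtract the mean peer reward at the same step index."""
    max_len = max(len(r) for r in rewards)
    baselines = [mean(r[t] for r in rewards if t < len(r)) for t in range(max_len)]
    return [[x - baselines[t] for t, x in enumerate(r)] for r in rewards]


# Usage: two peer rollouts for one question whose valid answers are "a" and "b";
# "x" is a wrong answer.
gold = {"a", "b"}
rollouts = [[["a"], ["b"]], [["a"], ["a", "x"]]]
rewards = step_rewards(rollouts, gold)  # [[0.5, 1.0], [0.5, 0.0]]
print(step_peer_advantages(rewards))    # [[0.0, 0.5], [0.0, -0.5]]
```

In this toy run both rollouts find the common answer "a" and split its credit, but only the first goes on to find the rarer "b", so its second step gets a positive advantage while the second rollout's redundant repeat of "a" gets a negative one. Comparing each step against its peers at the same step is one simple way to push credit down to individual steps, in the spirit of group-relative baselines; the paper's mechanism may differ.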

IMPACT Enhances LLM capabilities for complex, multi-faceted queries, potentially improving information retrieval and agentic reasoning.

QUEST
Large language models
WebQSP
Mintaka
SPADER
QAMPARI