SPADER: Step-wise Peer Advantage with Diversity-Aware Exploration Rewards for Multi-Answer Question Answering
Researchers have introduced SPADER, a reinforcement learning framework designed to improve how large language models answer complex questions that have multiple valid answers. The framework targets two challenges: assigning credit across long sequences of actions, and encouraging exploration of less common information. SPADER combines a step-wise credit assignment mechanism with a reward system that prioritizes discovering diverse, long-tail answers over redundant ones, and it shows improved performance on several multi-answer QA benchmarks.
AI IMPACT: Enhances LLM capabilities for complex, multi-faceted queries, potentially improving information retrieval and agentic reasoning.