Who Deserves the Reward? SHARP: Shapley Credit-based Optimization for Multi-Agent System
Researchers have developed a new framework called SHARP to improve the training of multi-agent systems that integrate large language models with external tools. This method addresses the challenge of assigning credit to individual agents for successful outcomes, which is crucial for efficient learning. SHARP utilizes a decomposed reward mechanism, including a Shapley-based marginal-credit reward, to precisely attribute contributions and stabilize training. Experiments show SHARP significantly outperforms existing methods, achieving substantial improvements in accuracy and efficiency.
AI IMPACT: Enhances training efficiency for complex multi-agent LLM systems, potentially accelerating their adoption in real-world problem-solving.
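To make the idea of Shapley-based credit concrete, the sketch below computes exact Shapley values for a small team of agents. It is an illustration of the general Shapley formula only, not SHARP's actual reward: the agent names (`planner`, `worker`, `idle`), the `toy_value` scoring function, and its numbers are all hypothetical and do not come from the paper.

```python
from itertools import combinations
from math import factorial


def shapley_values(agents, value):
    """Exact Shapley value of each agent for a coalition value function.

    `value` maps a frozenset of agent names to a float score.
    Cost grows exponentially with the number of agents, so this is
    only practical for small teams.
    """
    n = len(agents)
    credits = {agent: 0.0 for agent in agents}
    for agent in agents:
        others = [a for a in agents if a != agent]
        for k in range(n):
            # Weight of a coalition of size k in the Shapley average.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                coalition = frozenset(subset)
                # Marginal contribution of `agent` when joining `coalition`.
                marginal = value(coalition | {agent}) - value(coalition)
                credits[agent] += weight * marginal
    return credits


def toy_value(coalition):
    """Hypothetical task score for a coalition (made-up numbers)."""
    score = 0.0
    if "planner" in coalition:
        score += 0.2
    if "worker" in coalition:
        score += 0.3
    if "planner" in coalition and "worker" in coalition:
        score += 0.5  # synergy when both cooperate
    return score  # "idle" never changes the score


agents = ["planner", "worker", "idle"]
credits = shapley_values(agents, toy_value)
for name, credit in credits.items():
    print(f"{name}: {credit:.3f}")
```

In this toy setup the per-agent credits add up to the full team's score, and the agent that never changes the outcome receives zero credit. Those two properties are what make Shapley values attractive for deciding which agent deserves the reward.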