Brief · PulseAugur

TOOL · arXiv cs.AI English(EN) · 10h

MindGames Arena Generalization Track: In2AI Solution with Delayed Per-Step Reward Attribution

Researchers have developed a reinforcement learning technique called delayed per-step reward attribution for training language model agents in complex multi-agent interactions. Rewards are computed and propagated only once an episode has ended, with invalid steps excluded, which the authors report yields stable, sample-efficient training. On the MindGames Arena benchmark, an 8-billion-parameter open-source model trained with this approach outperformed significantly larger proprietary systems, including GPT-5, and took first place in both the open and efficient tracks.
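
The idea of attributing rewards only after an episode ends can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Step` record, the `valid` flag, the `attribute_rewards` helper, and the `gamma` discount are all hypothetical names and choices introduced here for illustration, and the brief does not say how the final outcome is split across steps.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Step:
    """One agent action recorded during an episode (hypothetical structure)."""
    action: str
    valid: bool                     # assumed flag: did the environment accept the action?
    reward: Optional[float] = None  # left unset until the episode has finished


def attribute_rewards(steps: List[Step], final_outcome: float,
                      gamma: float = 1.0) -> List[Step]:
    """Assign per-step rewards after the episode and return only the valid steps.

    Assumption: each valid step receives the final outcome, discounted by
    `gamma` for every valid step that follows it. Invalid steps are skipped
    and keep `reward=None`, so they never enter a training batch.
    """
    valid_steps = [s for s in steps if s.valid]
    n = len(valid_steps)
    for i, step in enumerate(valid_steps):
        step.reward = final_outcome * gamma ** (n - 1 - i)
    return valid_steps


# Usage: a three-step episode with one rejected action and a win (+1.0).
episode = [Step("propose", True), Step("malformed", False), Step("accept", True)]
batch = attribute_rewards(episode, final_outcome=1.0, gamma=0.5)
for s in batch:
    print(s.action, s.reward)
```

With `gamma=1.0` every valid step simply receives the episode outcome; the discount is included only to show one way credit could be weighted toward later steps.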

IMPACT Demonstrates a new method for training AI agents in complex environments, potentially improving performance in multi-agent strategic interactions.

GPT-5
NeurIPS 2025
In2AI
MindGames Arena
Aliaksei Korshuk