PulseAugur
EMAgnet: Parameter-Space EMA Regularization for Policy Gradient Self-Play in Large Games

EMAgnet introduces adaptive regularization for policy gradient self-play

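Researchers have developed EMAgnet, a new parameter-space exponential moving average (EMA) regularization technique for policy gradient self-play in large games. Earlier approaches use the uniform distribution as the regularization target; EMAgnet instead derives its target from the agent's evolving policy, using an exponential moving average of the policy parameters. Across a range of benchmarks the method delivered stronger results, reaching lower exploitability, particularly in games with strictly dominant strategies.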
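The summary does not give the update rule, so the following is only a minimal sketch under stated assumptions, not the paper's algorithm. Tabular softmax policies stand in for neural networks, plain gradient ascent stands in for PPO, and the regularizer is assumed to be a KL penalty `alpha * KL(pi || pi_ref)`. The reference policy `pi_ref` is either uniform or the softmax of an EMA of the current logits, `theta_ema <- (1 - tau) * theta_ema + tau * theta`. All function names and hyperparameters (`alpha`, `lr`, `tau`, `steps`) are hypothetical. The toy game is a 2x2 zero-sum game in which each player has a strictly dominant action, and quality is measured with NashConv (the sum of both players' best-response gains; 0 at a Nash equilibrium).

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over a 1-D logit vector."""
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()


def regularized_grad(theta: np.ndarray, ref_logits: np.ndarray,
                     q: np.ndarray, alpha: float) -> np.ndarray:
    """Exact gradient of  pi.q - alpha * KL(pi || softmax(ref_logits))  w.r.t. logits theta."""
    pi = softmax(theta)
    ref = softmax(ref_logits)
    g = np.log(pi) - np.log(ref)  # per-action log-ratio log(pi / ref)
    kl = pi @ g
    return pi * (q - pi @ q) - alpha * pi * (g - kl)


def nash_conv(A: np.ndarray, x: np.ndarray, y: np.ndarray) -> float:
    """Sum of both players' best-response gains; A holds the row player's payoffs."""
    return float((A @ y).max() - (x @ A).min())


def self_play(A: np.ndarray, target: str = "ema", alpha: float = 0.5,
              lr: float = 0.1, tau: float = 0.01, steps: int = 2000) -> float:
    """Hypothetical regularized self-play; returns NashConv of the final policies."""
    theta_x = np.zeros(A.shape[0])  # row player's logits
    theta_y = np.zeros(A.shape[1])  # column player's logits
    ema_x, ema_y = theta_x.copy(), theta_y.copy()
    for _ in range(steps):
        x, y = softmax(theta_x), softmax(theta_y)
        # Target: the parameter-space EMA, or zero logits (softmax of zeros is uniform).
        ref_x = ema_x if target == "ema" else np.zeros_like(theta_x)
        ref_y = ema_y if target == "ema" else np.zeros_like(theta_y)
        gx = regularized_grad(theta_x, ref_x, A @ y, alpha)     # row maximizes x^T A y
        gy = regularized_grad(theta_y, ref_y, -(x @ A), alpha)  # column maximizes -x^T A y
        theta_x = theta_x + lr * gx
        theta_y = theta_y + lr * gy
        # EMA is taken over parameters (logits), not over action probabilities.
        ema_x = (1 - tau) * ema_x + tau * theta_x
        ema_y = (1 - tau) * ema_y + tau * theta_y
    return nash_conv(A, softmax(theta_x), softmax(theta_y))


# Action 0 is strictly dominant for both players, so the unique Nash equilibrium is pure.
A = np.array([[1.0, 2.0],
              [0.0, 1.0]])
print("uniform target NashConv:", self_play(A, target="uniform"))
print("EMA target NashConv:    ", self_play(A, target="ema"))
```

In this toy game, the uniform target has a stationary point where each player's logit gap equals `1 / alpha`. The policies therefore stay mixed and NashConv settles near `2 - 2 * sigmoid(2)`, about 0.24. With the EMA target, the reference keeps moving with the policy, so the penalty slows the drift toward the dominant action without stopping it. This only illustrates the qualitative point in the summary; it does not reproduce the paper's benchmarks.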

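Impact: EMAgnet's adaptive regularization could improve the performance of AI agents in complex game environments and may shape future research in game theory and reinforcement learning.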

Ranking rationale: this cluster contains an academic paper describing a new method for AI self-play.

Read on arXiv cs.MA (Multiagent)

AI-generated summary · Google Gemini · from 2 sources.


Sources (2)

  1. arXiv cs.AI TIER_1 · Tristan Maidment, JB Lanier, Chase McDonald, Nathan Tsang, Eugene Vinitsky, Roy Fox, Albert Wang, Wesley N. Kerr

    EMAgnet: Parameter-Space EMA Regularization for Policy Gradient Self-Play in Large Games

    arXiv:2606.23995v1 Announce Type: cross Abstract: Recent work has established that regularized policy gradient methods such as PPO, when used in self-play, can match or exceed specialized game-theoretic algorithms for solving two-player zero-sum imperfect-information games. The u…

  2. arXiv cs.MA (Multiagent) TIER_1 · Wesley N. Kerr

    EMAgnet: Parameter-Space EMA Regularization for Policy Gradient Self-Play in Large Games

    Recent work has established that regularized policy gradient methods such as PPO, when used in self-play, can match or exceed specialized game-theoretic algorithms for solving two-player zero-sum imperfect-information games. The uniform distribution has emerged as a strong policy…