GARIP: A Running-Average Moving Reference for Last-Iterate Self-Play in Two-Player Zero-Sum Games

New GARIP method improves the convergence of self-play in zero-sum games

By the PulseAugur editorial team · [1 source] · 2026-06-21 22:08

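Researchers have introduced GARIP, a new method for improving self-play in two-player zero-sum games. Unlike methods that regularize toward a fixed or periodically updated reference, GARIP uses a running average of past policies as its reference. The method is shown theoretically to minimize the peak lag of the reference, which leads to more stable convergence. Experiments on a range of games, including matrix games and board games such as Connect Four and Othello, indicate that GARIP matches or outperforms existing methods in robustness and when run with default hyperparameter settings.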
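To make the idea of a running-average reference concrete, here is a minimal sketch on a toy problem. It is not the paper's algorithm: it assumes an unconstrained bilinear game f(x, y) = x * y with equilibrium (0, 0), a quadratic pull toward the reference, and a uniform average over all past iterates. The function name `self_play` and the parameters `lr` and `tau` are hypothetical, chosen only for illustration.

```python
import math


def self_play(steps=2000, lr=0.1, tau=0.0, start=(1.0, 1.0)):
    """Simultaneous gradient play on f(x, y) = x * y.

    x minimizes f and y maximizes f. With tau > 0, each player is also
    pulled toward a reference point that is the running average of all
    past iterates (an illustrative assumption, not the paper's update).
    Returns the last iterate's distance from the equilibrium (0, 0).
    """
    x, y = start
    ref_x, ref_y = start  # the reference starts at the initial iterate
    count = 1             # number of iterates averaged so far
    for _ in range(steps):
        # Gradients of the regularized objectives for each player.
        grad_x = y + tau * (x - ref_x)
        grad_y = x - tau * (y - ref_y)
        x, y = x - lr * grad_x, y + lr * grad_y
        # Incremental update of the uniform running average.
        count += 1
        ref_x += (x - ref_x) / count
        ref_y += (y - ref_y) / count
    return math.hypot(x, y)


if __name__ == "__main__":
    print("naive last-iterate distance:      ", self_play(tau=0.0))
    print("running-average reference distance:", self_play(tau=0.5))
```

In this discretized toy, the naive last iterate (`tau=0.0`) spirals away from the equilibrium, while the iterate regularized toward its own running average ends up much closer to it. Policies in real games live on a probability simplex and are typically regularized with a divergence rather than a squared distance, so this only illustrates the mechanism.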

Impact: This research could make training AI agents in competitive environments more efficient.

Ranking rationale: Academic paper detailing a new method in game theory and AI.


AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.MA (Multiagent) · Can Savcı · 2026-06-21 22:08

GARIP: A Running-Average Moving Reference for Last-Iterate Self-Play in Two-Player Zero-Sum Games

Self-play with naive gradient ascent cycles in two-player zero-sum games: the last iterate orbits the equilibrium. Modern methods restore last-iterate convergence by regularizing toward a reference policy -- MMD a fixed one (reaching only the regularized equilibrium), R-NaD a per…
