Playing the network backward: A Game Theoretic Attribution Framework

Game-theoretic framework recasts backward attribution methods for AI model interpretability

By the PulseAugur editorial team · [2 sources] · 2026-05-07 13:15

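Researchers have developed a novel game-theoretic framework for unifying and comparing the backward attribution methods used to explain AI model predictions. The approach recasts attribution as a two-player game, which allows desirable explanation properties such as locality and robustness to be expressed as game-theoretic concepts. An adaptation of the framework, applied to a ViT-B/16 model, outperforms existing transformer-specific backward methods on locality metrics.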
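For readers unfamiliar with backward attribution, the following is a minimal sketch of one of the simplest methods in that family, gradient × input, on a tiny two-layer ReLU network. It is not the paper's game-theoretic framework, whose details are not given in this summary. The network, its random weights, and the function name `gradient_times_input` are all hypothetical and chosen only for illustration.

```python
import numpy as np

def gradient_times_input(x, W1, b1, W2, b2, target):
    """Gradient x input attribution for a two-layer ReLU network (illustrative only)."""
    # Forward pass, keeping the pre-activations needed for the backward pass.
    z1 = W1 @ x + b1
    h1 = np.maximum(z1, 0.0)
    logits = W2 @ h1 + b2

    # Backward pass: propagate the target logit back to the input.
    grad_h1 = W2[target]            # d logit[target] / d h1
    grad_z1 = grad_h1 * (z1 > 0)    # ReLU passes gradient only where it was active
    grad_x = W1.T @ grad_z1         # d logit[target] / d x

    # Attribution score per input feature.
    return logits, grad_x * x

# Hypothetical toy network: 3 inputs, 4 hidden units, 2 output logits, zero biases.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)
x = rng.normal(size=3)

logits, attribution = gradient_times_input(x, W1, b1, W2, b2, target=0)
print("target logit:", logits[0])
print("attribution per input feature:", attribution)
```

Because the biases in this toy setup are zero, the network is piecewise linear through the origin, so the attribution scores sum to the target logit. That property does not hold for networks with non-zero biases.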

Impact: Introduces a unifying framework for attribution methods, which could lead to more robust and interpretable AI models.

Ranking rationale: This is a research paper introducing a new theoretical framework for AI model interpretability.

Read on arXiv cs.CV →

AI-generated summary · Google Gemini · from 2 sources.

Sources [2]

arXiv cs.LG · Jakob Paul Zimmermann, Jim Berend, Georg Loho, Sebastian Lapuschkin, Wojciech Samek · 2026-05-08 04:00

Playing the network backward: A Game Theoretic Attribution Framework

arXiv:2605.06212v1 · Announce type: new · Abstract: Attribution methods explain which input features drive a model's prediction, making them central to model debugging and mechanistic interpretability. Yet backward attribution methods, including gradients, LRP, and transformer-specif…

arXiv cs.CV · Wojciech Samek · 2026-05-07 13:15

Playing the network backward: A Game Theoretic Attribution Framework

Attribution methods explain which input features drive a model's prediction, making them central to model debugging and mechanistic interpretability. Yet backward attribution methods, including gradients, LRP, and transformer-specific rules, lack a shared framework in which to co…
