Game theory framework recasts backward attribution methods for AI model interpretability

By the PulseAugur editorial team · [2 sources] · 2026-05-07 13:15

Researchers have proposed a game-theoretic framework that unifies and compares backward attribution methods for explaining AI model predictions. The framework recasts attribution as a two-player game, so that desired explanation properties such as localization and robustness can be integrated as game-theoretic concepts. One adaptation of the framework, applied to the ViT-B/16 model, outperformed existing transformer-specific backward methods on localization metrics.
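For readers new to the term, a backward attribution method propagates a signal from the model's output back to its input features. The sketch below shows gradient × input, one of the simplest gradient-based variants, on a toy bias-free ReLU network. It is not the paper's game-theoretic framework, and every name in it (`forward`, `gradient_times_input`, `W1`, `w2`) is a hypothetical chosen for illustration.

```python
import numpy as np

def forward(x, W1, w2):
    """Toy bias-free two-layer ReLU network: f(x) = w2 . relu(W1 x)."""
    pre = W1 @ x                      # hidden pre-activations
    hidden = np.maximum(pre, 0.0)     # ReLU
    return float(w2 @ hidden), pre

def gradient_times_input(x, W1, w2):
    """Backpropagate the output to the input, then weight by the input."""
    _, pre = forward(x, W1, w2)
    mask = (pre > 0).astype(x.dtype)  # derivative of ReLU at each hidden unit
    grad = W1.T @ (w2 * mask)         # d f / d x via the chain rule
    return grad * x                   # one attribution score per input feature

# Illustrative random weights and input (assumed, not from the paper)
rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))
w2 = rng.normal(size=4)
x = rng.normal(size=3)

output, _ = forward(x, W1, w2)
attribution = gradient_times_input(x, W1, w2)
print("attribution per feature:", attribution)
print("sum of attributions:", attribution.sum(), "model output:", output)
```

Because this toy network has no bias terms, the attribution scores sum to the model output. That holds for this particular construction and is not a general guarantee for gradient × input.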

Impact: Introduces a unified framework for attribution methods, potentially leading to more robust and interpretable AI models.

Ranking rationale: This is a research paper introducing a new theoretical framework for AI model interpretability.

Read on arXiv cs.CV →

AI-generated summary · Google Gemini · from 2 sources.

Sources [2]

arXiv cs.LG TIER_1 English(EN) · Jakob Paul Zimmermann, Jim Berend, Georg Loho, Sebastian Lapuschkin, Wojciech Samek · 2026-05-08 04:00

Playing the network backward: A Game Theoretic Attribution Framework

arXiv:2605.06212v1 Announce Type: new Abstract: Attribution methods explain which input features drive a model's prediction, making them central to model debugging and mechanistic interpretability. Yet backward attribution methods, including gradients, LRP, and transformer-specif…

arXiv cs.CV TIER_1 English(EN) · Wojciech Samek · 2026-05-07 13:15

Playing the network backward: A Game Theoretic Attribution Framework

Attribution methods explain which input features drive a model's prediction, making them central to model debugging and mechanistic interpretability. Yet backward attribution methods, including gradients, LRP, and transformer-specific rules, lack a shared framework in which to co…
