PulseAugur
English(EN) Playing the network backward: A Game Theoretic Attribution Framework

Game-theoretic framework recasts backward attribution methods for AI model interpretability

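Researchers have developed a novel game-theoretic framework for unifying and comparing the backward attribution methods used to explain AI model predictions. The approach recasts attribution as a two-player game, which allows desirable explanation properties such as locality and robustness to be expressed as game-theoretic concepts. An adaptation of the framework, applied to a ViT-B/16 model, outperforms existing transformer-specific backward methods on locality metrics.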
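For readers unfamiliar with the term, a backward attribution method scores input features by propagating a signal from the output back to the input. The sketch below shows gradient × input, one of the simplest such methods, on a tiny bias-free ReLU network in plain Python. It is not the paper's game-theoretic framework, which the summary does not specify; the network, its weights, and the function names are all hypothetical and chosen only for illustration.

```python
# Gradient x input on a tiny bias-free ReLU network (illustrative only;
# the weights and function names are made up for this example).

def forward(x, W1, w2):
    """Return the hidden pre-activations and the scalar output."""
    pre = [sum(w * xi for w, xi in zip(row, x)) for row in W1]
    out = sum(v * max(p, 0.0) for v, p in zip(w2, pre))
    return pre, out

def gradient_x_input(x, W1, w2):
    """Backpropagate d(out)/d(x) by hand, then scale each gradient by its input."""
    pre, _ = forward(x, W1, w2)
    # ReLU passes the gradient only through units that were active.
    grad_hidden = [v if p > 0 else 0.0 for v, p in zip(w2, pre)]
    grad_input = [
        sum(g * row[i] for g, row in zip(grad_hidden, W1))
        for i in range(len(x))
    ]
    return [g * xi for g, xi in zip(grad_input, x)]

W1 = [[1.0, -2.0, 0.5], [0.0, 1.0, 1.0]]  # hidden-layer weights (2 units, 3 inputs)
w2 = [2.0, -1.0]                          # output weights
x = [1.0, 0.5, 2.0]                       # input features

_, output = forward(x, W1, w2)
attribution = gradient_x_input(x, W1, w2)
print(output, attribution)
```

Because this toy network has no biases, the per-feature scores sum to the model output, a property often called completeness. Real architectures such as ViT-B/16 need more elaborate backward rules, which is the setting the paper addresses.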

Impact: Introduces a unified framework for attribution methods, which could lead to more robust and interpretable AI models.

Ranking rationale: This is a research paper introducing a new theoretical framework for AI model interpretability.

Read on arXiv cs.CV →

AI-generated summary · Google Gemini · from 2 sources.

Sources [2]

  1. arXiv cs.LG TIER_1 English(EN) · Jakob Paul Zimmermann, Jim Berend, Georg Loho, Sebastian Lapuschkin, Wojciech Samek

    Playing the network backward: A Game Theoretic Attribution Framework

    arXiv:2605.06212v1 Announce Type: new Abstract: Attribution methods explain which input features drive a model's prediction, making them central to model debugging and mechanistic interpretability. Yet backward attribution methods, including gradients, LRP, and transformer-specif…

  2. arXiv cs.CV TIER_1 English(EN) · Wojciech Samek

    Playing the network backward: A Game Theoretic Attribution Framework

    Attribution methods explain which input features drive a model's prediction, making them central to model debugging and mechanistic interpretability. Yet backward attribution methods, including gradients, LRP, and transformer-specific rules, lack a shared framework in which to co…