Attributions All the Way Down? The Metagame of Interpretability

New "metagame" framework quantifies second-order effects in AI model explanations

By the PulseAugur editorial team · [2 sources] · 2026-05-07 13:59

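Researchers have introduced a new framework, called the metagame, for quantifying second-order interaction effects in model explanations. The framework treats the attribution method itself as a cooperative game and computes its Shapley values. This measures the directional influence of one feature on another feature's attribution. On the theoretical side, the metagame shows that attributions can be decomposed hierarchically into meta-attributions. In practice, the authors demonstrate its usefulness in three settings: token interactions in language models, cross-modal similarity in vision-language models, and concepts in text-to-image transformers.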
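A short worked identity makes the "hierarchical decomposition" claim concrete. The construction here is an assumption for illustration and is not necessarily the paper's definition:

- Take baseline Shapley values $\phi_i(f)$ computed against a fixed baseline $b$.
- Define a restricted model $f_S(x) = f(x_S, b_{-S})$ that freezes every feature outside $S$ at the baseline.
- Call $m_i(S) = \phi_i(f_S)$ the metagame for target feature $i$.

Write $\psi_{j \to i}$ (a hypothetical symbol) for the Shapley value of feature $j$ in $m_i$. The efficiency axiom of the Shapley value then gives

$$\sum_{j=1}^{n} \psi_{j \to i} = m_i(N) - m_i(\emptyset) = \phi_i(f) - \phi_i(f_\emptyset) = \phi_i(f).$$

This holds because $f_N = f$, and $f_\emptyset$ is constant, so every feature is a null player and all of its Shapley values are zero. Under these assumptions, each first-order attribution splits exactly into $n$ meta-attributions.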

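Impact: Introduces a new way to analyze complex interactions in AI model explanations, which could improve transparency and make models easier to debug.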

Ranking rationale: This cluster contains an academic paper detailing a new interpretability framework for AI models.

Read on arXiv stat.ML →

AI-generated summary · Google Gemini · from 2 sources.

Sources [2]

arXiv cs.LG · TIER_1 · Hubert Baniecki, Przemyslaw Biecek, Fabian Fumagalli · 2026-05-08 04:00

Attributions All the Way Down? The Metagame of Interpretability

arXiv:2605.06295v1 Announce Type: new Abstract: We introduce the metagame, a conceptual framework for quantifying second-order interaction effects of model explanations. For any first-order attribution $\phi(f)$ explaining a model $f$, we measure the directional influence of feat…
arXiv stat.ML · TIER_1 · Fabian Fumagalli · 2026-05-07 13:59

Attributions All the Way Down? The Metagame of Interpretability

We introduce the metagame, a conceptual framework for quantifying second-order interaction effects of model explanations. For any first-order attribution $\phi(f)$ explaining a model $f$, we measure the directional influence of feature $j$ on the attribution of feature $i$, denoted …
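The toy Python sketch below computes such directional influences by brute force. Everything in it is an assumption for illustration, not the authors' implementation:

- Shapley values are computed exactly by enumerating coalitions.
- The value function masks features to a fixed baseline.
- The metagame is hypothetical. The value of a coalition $S$ is feature $i$'s attribution after every feature outside $S$ has been frozen at the baseline. The Shapley value of feature $j$ in that game is read as the influence of $j$ on $i$'s attribution.

The helper names (`shapley`, `masked`, `attribution`, `restrict`, `meta_attribution`) are hypothetical, and the paper's own notation for this quantity is not reproduced here.

```python
from itertools import combinations
from math import factorial


def shapley(value, players):
    """Exact Shapley values of the game `value` (frozenset -> float) by enumeration."""
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(len(others) + 1):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S of size k
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                s = frozenset(coalition)
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi


def masked(f, x, baseline, keep):
    """Evaluate f on x with every feature outside `keep` replaced by the baseline."""
    return f([x[k] if k in keep else baseline[k] for k in range(len(x))])


def attribution(f, x, baseline):
    """First-order attribution phi(f): baseline Shapley values of f at x."""
    players = list(range(len(x)))
    return shapley(lambda s: masked(f, x, baseline, s), players)


def restrict(f, baseline, keep):
    """Hypothetical restricted model f_S: features outside `keep` frozen at baseline."""
    return lambda z: masked(f, z, baseline, keep)


def meta_attribution(f, x, baseline, i):
    """Shapley value of each feature j in the assumed metagame for target feature i."""
    players = list(range(len(x)))

    def metagame(s):
        # Value of coalition s: feature i's attribution in the restricted model f_s
        return attribution(restrict(f, baseline, s), x, baseline)[i]

    return shapley(metagame, players)


if __name__ == "__main__":
    def toy_model(z):
        # One pairwise interaction (z0 * z1) plus an additive term (2 * z2)
        return z[0] * z[1] + 2 * z[2]

    x, b = [1.0, 1.0, 1.0], [0.0, 0.0, 0.0]
    phi = attribution(toy_model, x, b)
    psi = meta_attribution(toy_model, x, b, i=0)
    print({j: round(v, 6) for j, v in phi.items()})
    print({j: round(v, 6) for j, v in psi.items()})
    # Efficiency: the meta-attributions of target 0 sum back to phi_0(f)
    print(round(sum(psi.values()), 6), round(phi[0], 6))
```

Consider the toy model $f(z) = z_0 z_1 + 2 z_2$ at $x = (1, 1, 1)$ with a zero baseline. The first-order attributions are 0.5, 0.5 and 2. The meta-attributions for target feature 0 are 0.25, 0.25 and 0. So half of feature 0's attribution is credited to its interaction partner, feature 1, and the total still recovers $\phi_0(f) = 0.5$. Exhaustive enumeration is exponential in the number of features, and the metagame nests one enumeration inside another. This sketch is therefore only practical for a handful of features.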
