When Does Routing Become Interpretable? Causal Probes on Block Attention Residuals

AI routing interpretability: exposing Block Attention Residuals is not enough to reveal the mechanism

By the PulseAugur editorial team · [1 source] · 2026-06-11 10:37

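Researchers investigated the interpretability of routing mechanisms in AI models, focusing on Block Attention Residuals (Block AttnRes). The study applies causal probes to two Qwen3 checkpoints: one trained from scratch with routing as an optimized component, and one that emulates routing through a deterministic schedule. The results indicate that although Block AttnRes exposes routing as an inspectable tensor, that exposure alone is not sufficient for mechanistic interpretation. Structured depth routing emerges only when routing is part of the training process, and even then routing summaries should be treated as hypotheses that require causal interventions to validate.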
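To make the distinction between an inspectable routing tensor and a causal test concrete, here is a minimal sketch in plain Python. It is an illustration under assumptions, not the paper's implementation: the function names (`routed_residual`, `ablate_source`), the toy vectors, and the use of free-standing logits are all hypothetical, and the actual Block AttnRes layer may compute its routing scores differently.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def additive_residual(sources):
    """Fixed additive residual: every earlier representation gets weight 1."""
    dim = len(sources[0])
    return [sum(src[i] for src in sources) for i in range(dim)]


def routed_residual(sources, logits):
    """Softmax-weighted mix over earlier depth sources (assumed form).

    Returns the mixed vector and the routing weights, which play the role
    of the "inspectable tensor" in this toy setting.
    """
    weights = softmax(logits)
    dim = len(sources[0])
    out = [sum(w * src[i] for w, src in zip(weights, sources)) for i in range(dim)]
    return out, weights


def ablate_source(logits, index):
    """Hypothetical causal intervention: mask one depth source with -inf."""
    patched = list(logits)
    patched[index] = float("-inf")
    return patched


# Toy depth sources (three earlier representations of dimension 2).
sources = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
logits = [2.0, 0.0, 0.0]  # illustrative values, not learned

baseline_out, baseline_weights = routed_residual(sources, logits)
patched_out, patched_weights = routed_residual(sources, ablate_source(logits, 0))

# Reading `baseline_weights` alone is a hypothesis about information flow;
# the change from `baseline_out` to `patched_out` is the causal evidence.
effect = [b - p for b, p in zip(baseline_out, patched_out)]
```

In this sketch, a large weight on source 0 only suggests that the source matters. Comparing the output before and after the ablation is what tests that suggestion, which mirrors the summary's point that routing summaries need causal interventions to validate them.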

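Impact: Research on the interpretability of AI models is essential for understanding and trusting complex systems, and may lead to more robust and reliable AI.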

Ranking rationale: This cluster contains an academic paper detailing new research methods and findings on AI model interpretability.


AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.LG · Aydin Javadov · 2026-06-11 10:37

When Does Routing Become Interpretable? Causal Probes on Block Attention Residuals

Block Attention Residuals (Block AttnRes) replace fixed additive residuals with a learned softmax over earlier depth-source representations, surfacing cross-layer routing as an inspectable tensor in the forward pass. This is a tempting interpretability target: information flow…
