PulseAugur
Live 12:07:52
English(EN) Language-Switching Triggers Take a Latent Detour Through Language Models

New XAI Framework Explains Behavioral Shifts in Large Language Models for Governance

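A new paper introduces a framework for comparative XAI (XAIΔ) that aims to explain behavioral shifts in large language models after interventions such as fine-tuning or scaling. The authors argue that current methods fall short for two reasons. They either treat the model as static, or they only compare explanations without describing the transition itself. The proposed approach aims to give a principled way to record the causal chain behind a model modification, which is critical for regulatory compliance.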

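Impact: Proposes new standards for explaining changes in large language models, which are critical for regulatory compliance and model auditing.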

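Ranking rationale: This cluster contains an academic paper that proposes a new research framework for explaining behavioral shifts in large language models.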

Read on Hugging Face Daily Papers →

AI-generated summary · Google Gemini · Based on 3 sources.

Sources [3]

  1. arXiv cs.CL TIER_1 English(EN) · Francis Kulumba, Wissam Antoun, Théo Lasnier, Benoît Sagot, Djamé Seddah ·

    Language-Switching Triggers Take a Latent Detour Through Language Models

    arXiv:2605.18646v2 Announce Type: replace Abstract: Backdoor attacks on language models pose a growing security concern, yet the internal mechanisms by which a trigger sequence hijacks model computations remain poorly understood. We identify a circuit underlying a language-switch…

  2. arXiv cs.AI TIER_1 English(EN) · Martino Ciaperoni, Marzio Di Vece, Roberto Pellungrini, Luca Pappalardo, Fosca Giannotti, Francesco Giannini ·

    Comparing Explanations is Not Enough, Explain the Change: New Standards are Needed to Explain Behavioral Shifts in Large Language Models

    arXiv:2602.02304v2 Announce Type: replace Abstract: Large-scale foundation models exhibit behavioral shifts when subjected to interventions such as scaling, fine-tuning, reinforcement learning with human feedback, or in-context learning. Current explainability methods are …

  3. Hugging Face Daily Papers TIER_1 English(EN) ·

    Language-Switching Triggers Take a Latent Detour Through Language Models

    Backdoor attacks on language models pose a growing security concern, yet the internal mechanisms by which a trigger sequence hijacks model computations remain poorly understood. We identify a circuit underlying a language-switching backdoor in an 8B-parameter autoregressive langu…