Can Global XAI Methods Reveal Injected Behaviours in LLMs? SHAP vs Rule Extraction vs RuleSHAP

New RuleSHAP method reveals injected behaviours in LLMs

By the PulseAugur editorial team · [1 source] · 2026-06-09 04:00

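Researchers have developed a new method, RuleSHAP, to better detect and understand injected behaviours in large language models (LLMs). The technique combines global SHAP aggregation with rule induction. Compared with existing approaches such as RuleFit and global SHAP on its own, it markedly improves the identification of complex, non-univariate triggers. The study reports that RuleSHAP is effective at surfacing the belief-driving heuristics that can lead to misinformation, with MRR@1 improving by 82% relative to RuleFit.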
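As a rough illustration of two ideas named in the summary, global aggregation of SHAP values and rank-based scoring in the spirit of MRR, here is a minimal Python sketch. It is not the paper's implementation: the function names, feature names, and attribution values are all hypothetical, and mean absolute attribution is assumed as the aggregation rule because it is one common convention.

```python
from typing import Dict, List, Sequence


def global_importance(shap_rows: Sequence[Dict[str, float]]) -> Dict[str, float]:
    """Aggregate per-sample attributions into one score per feature.

    Assumption: the global score is the mean absolute attribution, and
    every row contains the same set of features.
    """
    totals: Dict[str, float] = {}
    for row in shap_rows:
        for feature, value in row.items():
            totals[feature] = totals.get(feature, 0.0) + abs(value)
    n = len(shap_rows)
    return {feature: total / n for feature, total in totals.items()}


def rank_features(importance: Dict[str, float]) -> List[str]:
    """Order features from most to least important."""
    return sorted(importance, key=importance.get, reverse=True)


def reciprocal_rank(ranking: List[str], target: str) -> float:
    """Return 1/position of the target in the ranking, or 0.0 if absent."""
    for position, feature in enumerate(ranking, start=1):
        if feature == target:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(rankings: List[List[str]], targets: List[str]) -> float:
    """Average the reciprocal rank over several (ranking, target) pairs."""
    scores = [reciprocal_rank(r, t) for r, t in zip(rankings, targets)]
    return sum(scores) / len(scores)


# Hypothetical per-sample attributions for three made-up input features.
rows = [
    {"valence": 0.4, "length": -0.1, "complexity": 0.05},
    {"valence": -0.2, "length": 0.3, "complexity": 0.05},
]
ranking = rank_features(global_importance(rows))
print(ranking)                             # features ordered by global score
print(reciprocal_rank(ranking, "length"))  # how highly a known trigger is ranked
```

A higher reciprocal rank means the explanation method places the known (injected) trigger nearer the top of its ranking. That is the sense in which a rank-based score can compare explainers.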

Impact: Offers a novel way to detect and understand potential bias or misinformation triggers in LLMs.

Ranking rationale: This cluster contains an academic paper detailing a new method for analysing LLM behaviour.

AI-generated summary · Google Gemini · based on 1 source.

Sources [1]

arXiv cs.AI · Francesco Sovrano · 2026-06-09 04:00

Can Global XAI Methods Reveal Injected Behaviours in LLMs? SHAP vs Rule Extraction vs RuleSHAP

arXiv:2505.11189v3 Announce Type: replace Abstract: Large language models (LLMs) can amplify misinformation, undermining societal goals such as the UN SDGs. We study three documented drivers of misinformation (valence framing, information overload, and oversimplification) often s…
