Harnessing Textual Refusal Directions for Multimodal Safety

New MARS method uses textual refusal directions to improve multimodal LLM safety

By the PulseAugur editorial team · [2 sources] · 2026-06-30 15:57

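Researchers have developed a method called MARS (Modality-Agnostic Refusal Steering) to improve the safety of multimodal large language models (MLLMs). MARS reuses textual refusal directions, which are normally extracted from and applied to unimodal LLMs, so that safety can be improved without collecting unsafe multimodal training data. The method addresses cross-modal alignment and shows consistent safety gains across a range of benchmarks while preserving utility.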
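To make the idea of a refusal direction in activation space concrete, here is a minimal sketch of the generic difference-of-means approach: estimate a direction from text-only activations, then add it to an activation from another input. It is an illustration under stated assumptions, not the MARS procedure, whose details this summary does not give. The toy vectors, the function names, and the steering strength `alpha` are all hypothetical.

```python
import math


def mean_vector(rows):
    """Column-wise mean of a list of equal-length activation vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def refusal_direction(harmful_acts, harmless_acts):
    """Unit vector pointing from the harmless mean to the harmful mean."""
    mu_harmful = mean_vector(harmful_acts)
    mu_harmless = mean_vector(harmless_acts)
    diff = [a - b for a, b in zip(mu_harmful, mu_harmless)]
    norm = math.sqrt(sum(d * d for d in diff))
    if norm == 0.0:
        raise ValueError("class means coincide; no direction can be estimated")
    return [d / norm for d in diff]


def projection(activation, direction):
    """Scalar component of an activation along a unit direction."""
    return sum(h * d for h, d in zip(activation, direction))


def steer(activation, direction, alpha):
    """Shift an activation by alpha units along the given direction."""
    return [h + alpha * d for h, d in zip(activation, direction)]


# Hypothetical toy activations standing in for hidden states collected
# from text-only prompts (assumption: real ones come from a model layer).
harmful_text_acts = [[2.0, 0.0, 1.0], [4.0, 0.0, 1.0]]
harmless_text_acts = [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]

r = refusal_direction(harmful_text_acts, harmless_text_acts)

# Hypothetical activation from a multimodal input; the text-derived
# direction is applied to it unchanged.
multimodal_act = [0.5, 2.0, -1.0]
steered = steer(multimodal_act, r, alpha=4.0)

print("direction:", r)
print("projection before:", projection(multimodal_act, r))
print("projection after:", projection(steered, r))
```

In a real model the vectors would be hidden states read from a chosen layer and the shift would be applied during the forward pass; which layer to use and how large `alpha` should be are open choices that this sketch does not settle.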

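Impact: By enabling alignment without large amounts of specialized safety data, this research could lead to safer and more capable multimodal AI systems.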

Ranking rationale: This cluster contains a paper detailing a new AI safety method.


AI-generated summary · Google Gemini · from 2 sources.

Sources [2]

arXiv cs.AI TIER_1 English(EN) · Moreno D'Incà, Massimiliano Mancini, Nicu Sebe · 2026-07-01 04:00

Harnessing Textual Refusal Directions for Multimodal Safety

arXiv:2606.31876v1. Announce Type: new. Abstract: To improve safety in Large Language Models (LLMs) we can either perform post-training alignment or exploit refusal directions in the activation space. Both strategies are less feasible in Multimodal LLMs (MLLMs) as they require unsa…

arXiv cs.CV TIER_1 English(EN) · Nicu Sebe · 2026-06-30 15:57

Harnessing Textual Refusal Directions for Multimodal Safety

To improve safety in Large Language Models (LLMs) we can either perform post-training alignment or exploit refusal directions in the activation space. Both strategies are less feasible in Multimodal LLMs (MLLMs) as they require unsafe multimodal data, harder to collect than their…
