Multimodal Function Vectors for Visual Relations

Researchers isolate visual relation vectors in large multimodal models

By the PulseAugur editorial team · [1 source] · 2026-06-02 04:00

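Researchers have identified specific attention heads in large multimodal models (LMMs) that are critical for processing visual relations. By extracting and manipulating the outputs of these heads as "function vectors," they can improve the models' zero-shot accuracy on relational tasks. The vectors can also be fine-tuned while the LMM's main parameters stay frozen; this outperforms standard in-context learning and generalizes well to visual analogy problems.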
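To make the idea of extracting, injecting, and tuning a function vector more concrete, here is a minimal sketch of the generic recipe. It is not the authors' implementation, and this summary does not say how the paper selects heads or where it injects vectors. The assumptions are ours: each head's last-token output has already been captured as a plain list of floats for several in-context prompts; head names such as "L5H3" are hypothetical; the function vector is the sum of the selected heads' mean outputs; and a frozen linear map stands in for the rest of the model when the vector alone is tuned.

```python
from statistics import fmean
from typing import Dict, List, Sequence

Vector = List[float]


def mean_head_output(head_outputs: Sequence[Dict[str, Vector]], head: str) -> Vector:
    """Average one head's (hypothetical, pre-captured) output across in-context prompts."""
    vectors = [outputs[head] for outputs in head_outputs]
    return [fmean(components) for components in zip(*vectors)]


def function_vector(head_outputs: Sequence[Dict[str, Vector]], selected_heads: List[str]) -> Vector:
    """Assumed construction: sum the mean outputs of the selected relation-relevant heads."""
    dim = len(head_outputs[0][selected_heads[0]])
    fv = [0.0] * dim
    for head in selected_heads:
        fv = [a + b for a, b in zip(fv, mean_head_output(head_outputs, head))]
    return fv


def inject(hidden: Vector, fv: Vector, scale: float = 1.0) -> Vector:
    """Add the (scaled) function vector to a hidden state during a zero-shot pass."""
    return [h + scale * f for h, f in zip(hidden, fv)]


def tune_vector(fv: Vector, hidden: Vector, frozen_weights: List[Vector],
                target: Vector, lr: float = 0.1, steps: int = 100) -> Vector:
    """Gradient descent on the vector only; frozen_weights is never modified.

    Toy stand-in for the model: y = W (hidden + v), loss = 0.5 * ||y - target||^2,
    so dLoss/dv = W^T (y - target).
    """
    v = list(fv)  # copy so the extracted vector is left untouched
    for _ in range(steps):
        x = inject(hidden, v)
        y = [sum(w * xi for w, xi in zip(row, x)) for row in frozen_weights]
        err = [yi - ti for yi, ti in zip(y, target)]
        grad = [sum(frozen_weights[i][j] * err[i] for i in range(len(err)))
                for j in range(len(v))]
        v = [vj - lr * gj for vj, gj in zip(v, grad)]
    return v


# Usage with made-up numbers: two prompts, three hypothetical heads, 2-D outputs.
outputs = [
    {"L5H3": [1.0, 0.0], "L7H1": [0.0, 2.0], "L9H0": [5.0, 5.0]},
    {"L5H3": [3.0, 0.0], "L7H1": [0.0, 4.0], "L9H0": [-5.0, -5.0]},
]
fv = function_vector(outputs, ["L5H3", "L7H1"])
print(fv)                      # [2.0, 3.0]
print(inject([0.5, 0.5], fv))  # [2.5, 3.5]
```

Summing per-head means keeps the vector in the same space as the hidden state, so injection is a single addition; tuning only `v` mirrors, in toy form, the summary's point that the vector can be adjusted without touching the model's own weights.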

Impact: Deepens understanding of how LMMs work internally and offers a new way to improve their relational reasoning.

Ranking rationale: An academic paper detailing a novel method for understanding and manipulating LMMs.

AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.AI · Shuhao Fu, Esther Goldberg, Ying Nian Wu, Hongjing Lu · 2026-06-02 04:00

Multimodal Function Vectors for Visual Relations

arXiv:2510.02528v2 Announce Type: replace Abstract: Large Multimodal Models (LMMs) demonstrate impressive in-context learning abilities from few multimodal demonstrations, yet the internal mechanisms supporting such task learning remain opaque. Building on prior work of Large Lan…