New Methods Strengthen Continual Learning in Multimodal LLMs

By PulseAugur Editorial · [9 sources] · 2026-06-01 17:11

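Researchers are developing new methods for multimodal continual instruction tuning that aim to improve the efficiency and performance of large language models. One method, CRAM, uses centroid routing and an adaptive mixture of experts (MoE) to isolate task-specific patterns and allocate parameters efficiently, which mitigates catastrophic forgetting. Another, ProtoAda, applies prototype-guided adaptive tuning and adds format-oriented task prototypes to improve routing and parameter consolidation. A third framework, PROXY-MIX, learns a dynamic replay controller on a small proxy model and transfers it to a larger model, so that capabilities and alignment behavior are preserved during continual tuning.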
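To make the idea of centroid routing concrete, here is a minimal sketch in plain Python. It is not the CRAM implementation: the class name `CentroidRouter`, the task names, and the two-dimensional toy features are all assumptions made for illustration. Each task stores the mean of its feature vectors, and a new input is routed to the task whose centroid is closest by cosine similarity.

```python
import math


def centroid(vectors):
    """Return the element-wise mean of equal-length feature vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class CentroidRouter:
    """Hypothetical router: one stored centroid per task (illustrative only)."""

    def __init__(self):
        self.centroids = {}  # task name -> centroid vector

    def add_task(self, name, features):
        # Registering a task only adds a new centroid; existing ones are untouched.
        self.centroids[name] = centroid(features)

    def route(self, feature):
        # Pick the task whose centroid is most similar to the input feature.
        return max(self.centroids, key=lambda name: cosine(feature, self.centroids[name]))


# Toy features, invented for this example.
router = CentroidRouter()
router.add_task("captioning", [[1.0, 0.1], [0.9, 0.0]])
router.add_task("vqa", [[0.0, 1.0], [0.2, 0.9]])
chosen = router.route([0.8, 0.2])
print(chosen)
```

In this sketch, adding a task never modifies the centroids of earlier tasks, which is one simple way a routing step can keep task-specific parameters separate.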

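Impact: These advances aim to make multimodal LLMs more adaptable and efficient in real-world use by improving their ability to learn new tasks without forgetting earlier ones.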

Ranking rationale: Several research papers introduce new methods for multimodal continual instruction tuning.


AI-generated summary · Google Gemini · from 9 sources.

Sources [9]

arXiv cs.AI TIER_1 English(EN) · Wayner Barrios, Andrés Villa, Juan León Alcázar, SouYoung Jin, Bernard Ghanem · 2026-06-08 04:00

MoDA: Modulation Adapter for Fine-Grained Visual Grounding in Instructed MLLMs

arXiv:2506.01850v2 Announce Type: replace-cross Abstract: Multimodal Large Language Models (MLLMs) have achieved remarkable success in instruction-following tasks by integrating pretrained visual encoders with large language models (LLMs). However, existing approaches often strug…
arXiv cs.CL TIER_1 English(EN) · Luis Palacios, Lorenzo Basile, Diego Doimo, Alberto Cazzaniga · 2026-06-03 04:00

Visual Instruction Tuning Aligns Modalities through Abstraction

arXiv:2606.03871v1 Announce Type: cross Abstract: Visual instruction tuning effectively adapts a pre-trained Large Language Model (LLM) to process image information alongside text. Yet, it remains unclear how visual features are embedded into the layer-wise hierarchy of abstracti…
arXiv cs.CL TIER_1 English(EN) · Alberto Cazzaniga · 2026-06-02 16:42

Visual Instruction Tuning Aligns Modalities through Abstraction

Visual instruction tuning effectively adapts a pre-trained Large Language Model (LLM) to process image information alongside text. Yet, it remains unclear how visual features are embedded into the layer-wise hierarchy of abstractions of the LLM backbone. Across a diverse set of v…
arXiv cs.CL TIER_1 English(EN) · Jun-Tao Tang, Zhen-Hao Xie, Yu-Cheng Shi, Da-Wei Zhou · 2026-06-02 04:00

CRAM: Centroid Routing and Adaptive MoE for Multimodal Continual Instruction Tuning

arXiv:2606.02502v1 Announce Type: new Abstract: Multimodal Large Language Models (MLLMs) unify heterogeneous vision-language tasks under a shared generative framework via instruction tuning, yet real-world deployment demands continuous capability expansion, making Multimodal Cont…
arXiv cs.LG TIER_1 English(EN) · Ibne Farabi Shihab, Fariya Afrin, Anuj Sharma · 2026-06-02 04:00

Dynamic Proxy Mixing: Transferring Replay Controllers from Small to Large Models for Continual Instruction Tuning

arXiv:2606.00400v1 Announce Type: new Abstract: Continual instruction tuning updates a language model through a sequence of new domains, yet each update can progressively erode previously learned capabilities and alignment behavior. Replay is the standard mitigation, but fixed re…
arXiv cs.LG TIER_1 English(EN) · Yu-Cheng Shi, Zhen-Hao Xie, Jun-Tao Tang, Da-Wei Zhou · 2026-06-02 04:00

ProtoAda: Prototype-Guided Adaptive Adapter Expansion and Geometry-Consistent Consolidation for Multimodal Continual Instruction Tuning

arXiv:2606.02576v1 Announce Type: cross Abstract: Multimodal Large Language Models (MLLMs) achieve strong performance through instruction tuning, but real-world deployment requires them to continually acquire new vision-language capabilities, making Multimodal Continual Instructi…
arXiv cs.LG TIER_1 English(EN) · Da-Wei Zhou · 2026-06-01 17:59

ProtoAda: Prototype-Guided Adaptive Adapter Expansion and Geometry-Consistent Consolidation for Multimodal Continual Instruction Tuning

Multimodal Large Language Models (MLLMs) achieve strong performance through instruction tuning, but real-world deployment requires them to continually acquire new vision-language capabilities, making Multimodal Continual Instruction Tuning (MCIT) essential. To reduce inter-task i…
arXiv cs.CL TIER_1 English(EN) · Da-Wei Zhou · 2026-06-01 17:11

CRAM: Centroid Routing and Adaptive MoE for Multimodal Continual Instruction Tuning

Multimodal Large Language Models (MLLMs) unify heterogeneous vision-language tasks under a shared generative framework via instruction tuning, yet real-world deployment demands continuous capability expansion, making Multimodal Continual Instruction Tuning (MCIT) essential. Exist…
arXiv cs.CV TIER_1 English(EN) · Ziqi Wang, Chang Che, Qi Wang, Hui Ma, Zenglin Shi, Cees G. M. Snoek, Meng Wang · 2026-06-05 04:00

Harmonious Parameter Adaptation in Continual Visual Instruction Tuning for Safety-Aligned MLLMs

arXiv:2511.20158v2 Announce Type: replace Abstract: While continual visual instruction tuning (CVIT) has shown promise in adapting multimodal large language models (MLLMs), existing studies predominantly focus on models without safety alignment. This critical oversight ignores th…
