MiMIC: Mitigating Visual Modality Collapse in Universal Multimodal Retrieval While Avoiding Semantic Misalignment

MiMIC paper tackles visual modality collapse in multimodal retrieval

By the PulseAugur editorial team · [2 sources] · 2026-04-23 06:29

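Researchers have developed MiMIC, a new method for Universal Multimodal Retrieval (UMR) that addresses both visual modality collapse and semantic misalignment. Unlike earlier methods, which fuse modalities either early or late, MiMIC uses a fusion-in-decoder architecture. It also adds robust training techniques, including single-modality mixing and random caption dropout, to improve performance on datasets such as WebQA+ and EVQA+.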
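The summary names random caption dropout but does not say how it is applied. As a rough illustration only, the sketch below assumes it means removing the caption from an image-caption candidate with some probability during training, so the model cannot rely on text alone. The `Candidate` type, the `p_drop` parameter, and both function names are hypothetical and do not come from the paper.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    """Hypothetical image-caption retrieval candidate (not from the paper)."""
    image_path: str
    caption: Optional[str]


def drop_caption(item: Candidate, p_drop: float, rng: random.Random) -> Candidate:
    """Return a copy of `item` without its caption with probability `p_drop`."""
    if item.caption is not None and rng.random() < p_drop:
        # Keep the image, discard the text, so the encoder must use visual input.
        return Candidate(image_path=item.image_path, caption=None)
    return item


def augment_batch(batch: List[Candidate], p_drop: float = 0.3,
                  seed: Optional[int] = None) -> List[Candidate]:
    """Apply caption dropout independently to every candidate in a batch."""
    rng = random.Random(seed)
    return [drop_caption(item, p_drop, rng) for item in batch]


if __name__ == "__main__":
    batch = [Candidate("a.jpg", "a red bridge"), Candidate("b.jpg", "a harbor at dusk")]
    for item in augment_batch(batch, p_drop=0.5, seed=0):
        print(item)
```

The dropout probability here is an arbitrary placeholder. The value used by the authors, and whether dropout is applied to queries, candidates, or both, is not stated in the summary.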

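Impact: Introduces a new architecture and training strategy for multimodal retrieval systems, which may improve performance on tasks involving mixed visual and textual data.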

Ranking rationale: This is an academic paper detailing a new method for multimodal retrieval.

AI-generated summary · Google Gemini · based on 2 sources.

Sources [2]

Hugging Face Daily Papers TIER_1 English(EN) · 2026-04-23 06:29

MiMIC: Mitigating Visual Modality Collapse in Universal Multimodal Retrieval While Avoiding Semantic Misalignment

Universal Multimodal Retrieval (UMR) aims to map different modalities (e.g., visual and textual) into a shared embedding space for multi-modal retrieval. Existing UMR methods can be broadly divided into two categories: early-fusion approaches, such as Marvel, which projects visua…
arXiv cs.CV TIER_1 English(EN) · Cam-Tu Nguyen · 2026-04-23 06:29

MiMIC: Mitigating Visual Modality Collapse in Universal Multimodal Retrieval While Avoiding Semantic Misalignment

Universal Multimodal Retrieval (UMR) aims to map different modalities (e.g., visual and textual) into a shared embedding space for multi-modal retrieval. Existing UMR methods can be broadly divided into two categories: early-fusion approaches, such as Marvel, which projects visua…