COMPASS: Grounding Composition-Intent Guidance in Unified Multimodal Models

New Frameworks Enhance Unified Multimodal Models for Composition and Self-Rewarding

By the PulseAugur editorial team · [2 sources] · 2026-06-30 04:00

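Two new research papers introduce frameworks for improving unified multimodal models (UMMs). The first, COMPASS, targets composition-intent-guided alignment: it integrates composition expertise into the model's backbone and uses tokens shared between perception and generation. The second, SRUM, uses a fine-grained self-rewarding mechanism in which the UMM's understanding module sends corrective signals to its generation module. This improves overall fidelity and object-level accuracy without any external data.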

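Impact: These frameworks aim to improve the controllability and self-correction of multimodal AI. They could produce images from text prompts that are more accurate and more faithful to the prompt.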

Ranking rationale: Two academic papers posted on arXiv describe new frameworks for multimodal models.


AI-generated summary · Google Gemini · from 2 sources.

Sources [2]

arXiv cs.AI · TIER_1 · English (EN) · Ziqi Zhou, Weize Quan, Mining Tan, Zhihan Chen, Dandan Zheng, Jingdong Chen, Jun Zhou, Weiming Dong, Dong-Ming Yan · 2026-06-30 04:00

COMPASS: Grounding Composition-Intent Guidance in Unified Multimodal Models

arXiv:2606.28696v1 · Announce Type: new

Abstract: Composition is a high-level visual intent that governs where subjects are placed and how a scene is organized, yet current unified multimodal models remain unreliable at fine-grained composition recognition and struggle to turn such…

arXiv cs.CL · TIER_1 · English (EN) · Weiyang Jin, Yuwei Niu, Jiaqi Liao, Chengqi Duan, Aoxue Li, Shenghua Gao, Xihui Liu · 2026-06-30 04:00

SRUM: Fine-Grained Self-Rewarding for Unified Multimodal Models

arXiv:2510.12784v2 · Announce Type: replace-cross

Abstract: Recently, remarkable progress has been made in Unified Multimodal Models (UMMs), which integrate vision-language generation and understanding capabilities within a single framework. However, a model's strong visual underst…
