Context Unrolling in Omni Models

Omni model unrolls context across text, images, video, and 3D for multimodal reasoning

By PulseAugur Editorial · [1 source] · 2026-04-23 17:58

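Researchers have introduced Omni, a new multimodal model designed for native training across diverse data types, including text, images, video, and 3D geometry. This unified training enables "Context Unrolling," in which the model explicitly reasons across different modal representations before generating its output. Omni shows improved performance on both multimodal generation and understanding tasks, demonstrating reasoning that carries across data formats.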

Impact: Introduces a new multimodal model architecture that may improve cross-modal reasoning and generation.

Ranking rationale: This is an academic paper describing a new multimodal model and its capabilities.

Read on arXiv cs.CV →

AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.CV · TIER_1 · English (EN) · Haoqi Fan · 2026-04-23 17:58

Context Unrolling in Omni Models

We present Omni, a unified multimodal model natively trained on diverse modalities, including text, images, videos, 3D geometry, and hidden representations. We find that such training enables Context Unrolling, where the model explicitly reasons across multiple modal representati…

