New ConsistRoll Method Strengthens Multimodal Reasoning Through Cross-View Consistency

By the PulseAugur editorial team · [1 source] · 2026-06-30 04:00

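Researchers have introduced ConsistRoll, a method that aims to strengthen the reasoning of multimodal large language models (MLLMs) by enforcing cross-view consistency. It requires semantically invariant views of the same instance to yield consistent answers, which addresses a limitation of the standard reinforcement learning with verifiable rewards (RLVR) objective. ConsistRoll builds this consistency bias into RLVR training by grouping the original view with a transformed view and assigning a joint reward only when both views are correct and consistent with each other. The approach improves performance across a range of reasoning domains without additional generation overhead or annotation.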
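The joint-reward rule described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `ViewPair`, `normalize`, `per_view_reward`, and `joint_reward` are hypothetical, answers are compared by exact match after simple normalization, and the reward is assumed to be binary.

```python
from dataclasses import dataclass


@dataclass
class ViewPair:
    """Answers for one instance under two semantically equivalent views (hypothetical type)."""
    original_answer: str     # answer produced from the original view
    transformed_answer: str  # answer produced from the transformed view
    reference: str           # verifiable ground-truth answer


def normalize(answer: str) -> str:
    # Assumed normalization; real answer matching is task-specific.
    return answer.strip().lower()


def per_view_reward(answer: str, reference: str) -> float:
    """Standard verifiable reward: each view is scored independently."""
    return 1.0 if normalize(answer) == normalize(reference) else 0.0


def joint_reward(pair: ViewPair) -> float:
    """Reward the pair only if both views are correct and agree with each other."""
    both_correct = (
        per_view_reward(pair.original_answer, pair.reference) == 1.0
        and per_view_reward(pair.transformed_answer, pair.reference) == 1.0
    )
    # With exact-match correctness this check is implied by both_correct;
    # it is kept explicit because looser correctness checks may not imply agreement.
    consistent = normalize(pair.original_answer) == normalize(pair.transformed_answer)
    return 1.0 if both_correct and consistent else 0.0


if __name__ == "__main__":
    pairs = [
        ViewPair("B", " b ", "B"),  # both correct and consistent
        ViewPair("B", "C", "B"),    # only the original view is correct
        ViewPair("C", "C", "B"),    # consistent but wrong
    ]
    for pair in pairs:
        print(pair, joint_reward(pair))
```

Under independent per-view scoring, the second pair would still earn a reward for its original view. Under the joint rule it earns nothing, which is how the consistency preference enters the training signal in this sketch.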

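Impact: By ensuring consistent outputs across different views of the same data, the method could lead to more robust and reliable multimodal AI systems.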

Ranking rationale: This cluster contains a paper detailing a new method for multimodal reasoning.


AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.CV TIER_1 English(EN) · Xin Zou, Haolin Deng, Yibo Yan, Shuliang Liu, Kening Zheng, Zhiwei Jin, Chen Chen, Haonan Lu, Xuming Hu · 2026-06-30 04:00

Consistency as Inductive Bias: Learning Cross-View Invariance for Robust Multimodal Reasoning

arXiv:2606.29812v1. Abstract: Inductive biases steer learning toward generalizable solutions by encoding task structure. In this work, we identify a crucial missing bias in MLLMs: cross-view consistency, i.e., semantically invariant views of the same in…
