Native Active Perception as Reasoning for Omni-Modal Understanding

OmniAgent uses active perception for efficient video understanding · 2 sources tracked

By the PulseAugur editorial team · [3 sources] · 2026-06-17 00:00

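Researchers have introduced OmniAgent, a novel omni-modal agent for video understanding that runs an iterative Observation-Thought-Action loop formulated as a partially observable Markov decision process (POMDP). This approach lets the agent selectively distill audio-visual cues into a textual memory, which decouples reasoning complexity from raw video duration and improves computational efficiency. The paper details two key training methods: Agentic Supervised Fine-Tuning, which bootstraps active perception, and Agentic Reinforcement Learning with TAURA, which refines credit assignment. OmniAgent demonstrates state-of-the-art performance on benchmarks such as LVBench, outperforming larger models such as Qwen2.5-VL-72B.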
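To make the loop concrete, here is a minimal Python sketch of an observe, think, act cycle that writes only relevant findings into a text memory. Everything in it is an assumption made for illustration: the names `AgentState`, `perceive`, `distill`, and `run_agent` are hypothetical, the "video" is a list of caption strings, and evenly spaced sampling stands in for the learned policy. It does not reproduce the paper's POMDP formulation, its training recipe, or TAURA.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AgentState:
    """Hypothetical agent state: distilled notes plus a perception counter."""
    memory: List[str] = field(default_factory=list)  # distilled text notes
    observed: int = 0                                # segments actually inspected


def perceive(segments: List[str], index: int) -> str:
    """Observation: stand-in for decoding one audio-visual segment."""
    return segments[index]


def distill(index: int, observation: str, keyword: str) -> Optional[str]:
    """Thought: keep a short note only when the segment is relevant to the query."""
    if keyword.lower() in observation.lower():
        return f"segment {index}: {observation}"
    return None


def run_agent(segments: List[str], keyword: str, max_steps: int = 4) -> AgentState:
    """Action loop: inspect at most max_steps evenly spaced segments, stopping on a hit.

    The even spacing is a placeholder for a learned policy (an assumption).
    """
    state = AgentState()
    if not segments:
        return state
    stride = max(1, len(segments) // max_steps)
    for index in range(0, len(segments), stride):
        if state.observed >= max_steps:
            break  # perception budget exhausted
        observation = perceive(segments, index)
        state.observed += 1
        note = distill(index, observation, keyword)
        if note is not None:
            state.memory.append(note)
            break  # enough evidence gathered; stop perceiving
    return state
```

In this toy setup, a 100-segment input with the relevant caption at index 50 and a 1000-segment input with it at index 500 both finish after three observations, because the budget is set by `max_steps` and not by the input length. That is the sense in which the perception cost is decoupled from duration here; a real agent would choose where to look based on its memory and the query.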

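Impact: By processing information selectively, OmniAgent introduces a more efficient approach to video understanding that could reduce the computational cost of analyzing long-form content.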

Ranking rationale: This cluster contains an academic paper that details a new model and method.


AI-generated summary · Google Gemini · based on 3 sources.

Coverage sources [3]

arXiv cs.CL TIER_1 English(EN) · Zhenghao Xing, Ruiyang Xu, Yuxuan Wang, Jinzheng He, Ziyang Ma, Qize Yang, Yunfei Chu, Jin Xu, Junyang Lin, Chi-Wing Fu, Pheng-Ann Heng · 2026-06-18 04:00

Native Active Perception as Reasoning for Omni-Modal Understanding

arXiv:2606.19341v1. Abstract: Passive models for long video understanding typically rely on a "watch-it-all" paradigm, processing frames uniformly regardless of query difficulty, causing computational cost to grow with video duration. Although interactive fram…

Hugging Face Daily Papers TIER_1 English(EN) · 2026-06-17 00:00

Native Active Perception as Reasoning for Omni-Modal Understanding

OmniAgent is a novel omni-modal agent that addresses long video understanding by using an iterative observation-thought-action cycle with active perception, achieving superior performance compared to larger models through efficient selective processing.

arXiv cs.CV TIER_1 English(EN) · Pheng-Ann Heng · 2026-06-17 17:59

Native Active Perception as Reasoning for Omni-Modal Understanding

Passive models for long video understanding typically rely on a "watch-it-all" paradigm, processing frames uniformly regardless of query difficulty, causing computational cost to grow with video duration. Although interactive frameworks have emerged, they often rely on global pre…

