MLVU
PulseAugur coverage of MLVU — every cluster mentioning MLVU across labs, papers, and developer communities, ranked by signal.
2 days with sentiment data
-
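Video-o3 framework enhances long-video reasoning through iterative clue seeking
Researchers have developed Video-o3, a new framework that aims to improve long-video understanding by iteratively discovering relevant visual clues and inspecting key segments at fine granularity. The system uses Task-Decoupled Attention Masking to separate reasoning from tool calling while preserving global context, addressing the challenges multimodal models face with tool calling. To manage context length and improve efficiency, it employs a Verifiable Trajectory-Guided Reward mechanism…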
-
ReTool-Video enhances video agents with recursive tool use
Researchers have introduced ReTool-Video, a novel approach for video understanding agents that enhances their reasoning capabilities. This method utilizes an expanded tool library with 134 specialized tools, including m…
-
New AI methods enhance video reasoning by structuring and selecting visual evidence
Researchers are developing new methods to improve how large vision-language models (VLMs) understand and reason about long videos. Several papers introduce techniques for more efficient frame selection and evidence gath…
-
New QEVA metric offers reference-free video summarization evaluation
Researchers have introduced QEVA, a novel reference-free metric designed to evaluate narrative video summarization. Unlike previous methods that rely on human-written summaries, QEVA assesses summaries by comparing them…