MVBench
PulseAugur coverage of MVBench — every cluster mentioning MVBench across labs, papers, and developer communities, ranked by signal.
1 day with sentiment data
-
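ReTool-Video enhances video agents with recursive tool use
Researchers have introduced ReTool-Video, a novel approach for video understanding agents that strengthens their reasoning capabilities. The method draws on an expanded library of 134 specialized tools, including meta-tools for filtering and aggregation, to support fine-grained compositional reasoning. ReTool-Video recursively decomposes high-level video intents into executable tool chains, enabling dynamic parameter repair and tool substitution for complex multimodal operations. Experiments show that ReTool-Video outperforms existing baselines on several video understanding benchmarks.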
-
VideoThinker framework improves lightweight MLLMs' video reasoning via causal debiasing
Researchers have developed VideoThinker, a novel framework designed to enhance the reasoning capabilities of lightweight multimodal language models (MLLMs) in video analysis. This approach addresses the issue of percept…
-
ReGATE method accelerates multimodal LLM training by selectively pruning tokens
Researchers have developed ReGATE, a novel method to accelerate the training of multimodal large language models (MLLMs) by adaptively pruning tokens. This technique uses a teacher-student framework where a frozen teach…
-
New PushupBench benchmark reveals VLMs struggle with counting repetitions
Researchers have introduced PushupBench, a new dataset designed to evaluate the ability of vision-language models (VLMs) to accurately count repetitions in videos. The benchmark highlights that even top-tier VLMs strugg…