Researchers develop DecAF for training-free video reasoning segmentation

By PulseAugur Editorial · [1 source] · 2026-04-27 04:00

Researchers have developed Decomposed Attention Fusion (DecAF), a method for video reasoning segmentation that requires no model retraining. DecAF refines the attention maps produced by multimodal large language models (MLLMs) in two ways: it contrasts object and background activations, and it fuses complementary frame-level attention. The refined attention maps can then be converted directly into segmentation masks, with performance comparable to training-based methods on video object segmentation benchmarks.
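The refinement described above can be illustrated with a toy example. This is only a minimal sketch of the general idea, not the paper's implementation: the clipped difference used for contrasting, the element-wise mean used for fusion, the fixed 0.5 threshold, and all function names and input values are assumptions made for illustration.

```python
from typing import List

Map = List[List[float]]  # one 2D attention map (rows x cols) for a single frame


def normalize(m: Map) -> Map:
    """Min-max scale a map to [0, 1]; a constant map becomes all zeros."""
    flat = [v for row in m for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [[0.0 for _ in row] for row in m]
    return [[(v - lo) / (hi - lo) for v in row] for row in m]


def contrast(obj: Map, bg: Map) -> Map:
    """Assumed contrast rule: keep only object attention above the background response."""
    return [[max(o - b, 0.0) for o, b in zip(ro, rb)] for ro, rb in zip(obj, bg)]


def fuse(a: Map, b: Map) -> Map:
    """Assumed fusion rule: element-wise mean of two maps for the same frame."""
    return [[(x + y) / 2.0 for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def to_mask(m: Map, threshold: float = 0.5) -> List[List[int]]:
    """Binarize a refined map into a segmentation mask (threshold is an assumption)."""
    return [[1 if v >= threshold else 0 for v in row] for row in m]


# Hypothetical 2x2 attention maps for one frame, from two complementary sources.
obj_a, bg_a = [[0.9, 0.2], [0.8, 0.1]], [[0.1, 0.2], [0.2, 0.3]]
obj_b, bg_b = [[0.7, 0.1], [0.2, 0.1]], [[0.1, 0.1], [0.1, 0.1]]

refined_a = normalize(contrast(obj_a, bg_a))
refined_b = normalize(contrast(obj_b, bg_b))
mask = to_mask(fuse(refined_a, refined_b))
print(mask)
```

In this toy case only the top-left cell stays above the threshold after fusion, because it is the only location where both sources agree once the background response has been subtracted.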

Impact: Enables training-free video segmentation by refining MLLM attention maps, potentially simplifying deployment for video analysis tasks.

Ranking rationale: This is a research paper detailing a new method for video reasoning segmentation.


AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.CV · Su Ho Han, Jeongseok Hyun, Pilhyeon Lee, Minho Shim, Dongyoon Wee, Seon Joo Kim · 2026-04-27 04:00

Decomposed Attention Fusion in MLLMs for Training-Free Video Reasoning Segmentation

arXiv:2510.19592v2. Abstract: Multimodal large language models (MLLMs) demonstrate strong video understanding by attending to visual tokens relevant to textual queries. To directly adapt this for localization in a training-free manner, we cast video reasonin…

