TOOL · arXiv cs.CL English(EN) · 1d

Distilling Counterfactual Reasoning from Language to Vision: Causal Graph Guided Post-Training for Video Understanding

Researchers have introduced CounterVQA, a benchmark for evaluating the counterfactual reasoning capabilities of Vision Language Models (VLMs). Current state-of-the-art models reach reasonable accuracy on simpler questions but struggle with complex causal chains. To address this gap, a post-training method called CFGPT has been developed, which enhances visual counterfactual reasoning by distilling knowledge from the language modality.

IMPACT: Highlights a critical gap in VLM reasoning, potentially guiding future model development toward more robust causal understanding.

Vision Language Models
Yuefei Chen
CFGPT
CounterVQA