New framework extracts emotion-cause pairs for richer video captions

By PulseAugur Editorial · [1 source] · 2026-06-09 04:00

Researchers have developed a new framework for generating more accurate and emotionally rich video captions. The approach extracts fine-grained emotion-cause pairs within videos instead of relying on global visual features, which can lead to information redundancy. The proposed method enhances visual features by incorporating scene, object, and motion concepts, and refines emotional features using visual temporal dynamics and VAD-vector constraints. Experiments on three datasets showed significant improvements, including a 4.4% increase in BLEU-2 and a 5.4% increase in ROUGE-L on the EVC-MSVD dataset.
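For readers unfamiliar with the BLEU-2 metric cited above, the following is a minimal sketch of a simplified, single-reference, sentence-level BLEU-2 score: the geometric mean of clipped unigram and bigram precision, scaled by a brevity penalty. It is not the paper's evaluation script, which is not described in this summary; the function names `ngrams` and `bleu2` and the example captions are hypothetical, and standard toolkits typically add smoothing and corpus-level aggregation that this sketch omits.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu2(candidate, reference):
    """Simplified single-reference, sentence-level BLEU-2 (no smoothing)."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    precisions = []
    for n in (1, 2):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        matched = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or matched == 0:
            return 0.0
        precisions.append(matched / total)
    # Brevity penalty discourages candidates shorter than the reference.
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    # Geometric mean of the unigram and bigram precisions.
    return bp * math.exp(sum(math.log(p) for p in precisions) / 2)


if __name__ == "__main__":
    # Made-up captions for illustration only, not taken from any dataset.
    print(bleu2("a man plays guitar", "a man plays a guitar"))
```

A candidate identical to its reference scores 1.0, and a candidate sharing no unigrams or bigrams with the reference scores 0.0 under this unsmoothed variant.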

IMPACT Introduces a novel method for improving the accuracy and emotional depth of video captioning, potentially benefiting content analysis and accessibility tools.

RANK_REASON Academic paper published on arXiv detailing a new method for video captioning.


AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.CV TIER_1 English(EN) · Weidong Chen, Cheng Ye, Zhendong Mao, Liping Wang, Xinyan Liu, Yongdong Zhang · 2026-06-09 04:00

Towards Accurate Emotion-Attributed Video Captioning via Fine-grained Emotion-Cause Pair Extraction

arXiv:2606.08566v1 Announce Type: new Abstract: Emotional Video Captioning (EVC) is a challenging task that aims to generate factually accurate and emotionally rich descriptions for videos. Existing EVC methods leverage holistic visual features to mine global emotional cues, and …
