New Rigel metric enhances image and video captioning evaluation

By PulseAugur Editorial · [1 source] · 2026-06-30 04:00

Researchers have introduced Rigel, a metric for evaluating image and video captioning systems that aims to improve alignment with human judgments. Rigel uses self-distilled score adaptation: an evaluation-specific scoring head derived from a large language model (LLM) captures task-aligned signals without relying on the LLM's large output vocabulary. The metric's backbone is further refined using human judgment data. The authors also construct the Vid-Lepus dataset, which they use to demonstrate the metric's effectiveness. In experiments, Rigel outperforms existing metrics, with substantial improvements on benchmarks such as ActivityNet-Fact.
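
The summary does not explain how Rigel's scoring head is built, so the following is only a toy sketch of the general contrast it describes, under a loudly stated assumption: that "derived from an LLM" means a small head initialized from the LLM's unembedding rows for a few score tokens. One common way to score with an LLM reads the probabilities of score tokens out of a softmax over the entire vocabulary; a dedicated head instead outputs only K score bins. All names (`vocab_based_score`, `make_scoring_head`, `head_score`) and numbers are hypothetical, not Rigel's implementation or its self-distillation procedure.

```python
import math
from typing import List, Sequence

# Hypothetical toy sketch: reading a score off a full-vocabulary softmax
# versus a small head that only outputs K score bins.


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def vocab_based_score(vocab_logits, score_token_ids, score_values) -> float:
    """Softmax over the whole vocabulary, keep only the score tokens
    (e.g. "1".."3"), renormalize them, and return the expected score."""
    probs = softmax(vocab_logits)
    mass = [probs[i] for i in score_token_ids]
    z = sum(mass)
    return sum(v * p / z for v, p in zip(score_values, mass))


def make_scoring_head(unembedding, score_token_ids) -> List[List[float]]:
    """ASSUMED initialization: copy only the unembedding rows of the score
    tokens, giving a K x d head instead of a V x d output layer."""
    return [list(unembedding[i]) for i in score_token_ids]


def head_score(head, hidden, score_values) -> float:
    """Expected score from a softmax over the K head outputs only."""
    probs = softmax([dot(row, hidden) for row in head])
    return sum(v * p for v, p in zip(score_values, probs))


# Toy LLM output layer: vocabulary of 6 tokens, hidden size 3.
UNEMBEDDING = [
    [0.2, -0.1, 0.5],
    [1.0, 0.3, -0.2],
    [0.4, 0.4, 0.1],   # token id 2 -> score "1"
    [-0.3, 0.8, 0.6],  # token id 3 -> score "2"
    [0.9, -0.5, 0.2],  # token id 4 -> score "3"
    [0.0, 0.1, -0.7],
]
SCORE_TOKEN_IDS = [2, 3, 4]
SCORE_VALUES = [1.0, 2.0, 3.0]
hidden = [0.5, 1.0, -0.3]  # final hidden state for a toy (visual input, caption) pair

vocab_logits = [dot(row, hidden) for row in UNEMBEDDING]
full_vocab = vocab_based_score(vocab_logits, SCORE_TOKEN_IDS, SCORE_VALUES)
head = make_scoring_head(UNEMBEDDING, SCORE_TOKEN_IDS)
from_head = head_score(head, hidden, SCORE_VALUES)

# At initialization the two agree: renormalizing a softmax over a subset of
# tokens equals a softmax over just those tokens' logits.
print(f"full vocabulary: {full_vocab:.6f}  head: {from_head:.6f}")
```

The two scores coincide at initialization, so the head starts from the LLM's own judgment; the difference is that it can then be trained (for example on human judgments) while only ever producing a distribution over the K score bins.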

IMPACT This new metric could lead to more accurate benchmarking of image and video captioning models, driving progress in multimodal AI development.
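
Claims that one captioning metric "outperforms" another are commonly backed by rank correlation between metric scores and human ratings. The summary does not say which statistic Rigel reports on ActivityNet-Fact, so this is a generic sketch of one common choice, Kendall's τ-b; the function name `kendall_tau_b` and all scores are made up for illustration and are not results from the paper.

```python
import math
from typing import Sequence


def kendall_tau_b(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall's tau-b between two score lists (O(n^2); ties handled in the denominator)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    concordant = discordant = 0
    untied_x = untied_y = 0  # pairs not tied in x / not tied in y
    for i in range(n):
        for j in range(i + 1, n):
            dx = (x[i] > x[j]) - (x[i] < x[j])  # sign of x[i] - x[j]
            dy = (y[i] > y[j]) - (y[i] < y[j])  # sign of y[i] - y[j]
            untied_x += int(dx != 0)
            untied_y += int(dy != 0)
            if dx * dy > 0:
                concordant += 1
            elif dx * dy < 0:
                discordant += 1
    denom = math.sqrt(untied_x * untied_y)
    if denom == 0:
        raise ValueError("tau-b is undefined when one input is constant")
    return (concordant - discordant) / denom


# Hypothetical toy data: human ratings and two metrics' scores for 5 captions.
human = [1, 2, 3, 4, 5]
metric_a = [0.1, 0.4, 0.35, 0.8, 0.9]  # swaps one pair relative to humans
metric_b = [0.9, 0.2, 0.5, 0.4, 0.7]

tau_metric_a = kendall_tau_b(human, metric_a)
tau_metric_b = kendall_tau_b(human, metric_b)
print(f"metric_a tau_b={tau_metric_a:.3f}, metric_b tau_b={tau_metric_b:.3f}")
```

Here metric_a agrees with the human ordering on 9 of 10 caption pairs (τ-b = 0.8), while metric_b is no better than chance (τ-b = 0.0), so metric_a would be judged the better-aligned metric.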

RANK_REASON The cluster describes a new academic paper introducing a novel evaluation metric for multimodal AI systems.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.CV TIER_1 English(EN) · Shuitsu Koyama, Kazuki Matsuda, Yuiga Wada, Shinnosuke Hirano, Daichi Yashima, Komei Sugiura · 2026-06-30 04:00

Rigel: Self-Distilled Score Adaptation for Image and Video Captioning Evaluation

arXiv:2606.29997v1 · Announce Type: new

Abstract: Automatic evaluation of image and video captioning is essential for benchmarking multimodal systems, although standard evaluation metrics show limited alignment with human judgments. Recent approaches using large language models (LL…