PulseAugur

MERVIN framework enhances Vietnamese news video event retrieval

Researchers have developed MERVIN, a unified multimodal framework for event retrieval in Vietnamese news videos. The system integrates visual features, transcripts, and video summaries; it uses Gemini 1.5 Flash to improve transcript quality and a Perception Encoder to process visual data. MERVIN achieved high scores in the AI Challenge HCMC 2025, retrieving all query results in the final round.

IMPACT This framework could improve how users search and retrieve specific events from large archives of Vietnamese news videos.

RANK_REASON The cluster describes a research paper detailing a new framework and its performance in a competition.

Read on arXiv cs.IR (Information Retrieval) →

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

  1. arXiv cs.IR (Information Retrieval) · TIER_1 · English (EN) · Trung-Hieu Truong-Le

    MERVIN: A Unified Framework for Multimodal Event Retrieval in Vietnamese News Videos

    The growth of online video platforms drives the need for effective, semantically grounded event retrieval. We present MERVIN, a unified multimodal framework for Vietnamese news videos that integrates keyframes, transcripts, and video summaries. Transcript quality is enhanced via …