PulseAugur
research · [4 sources]

HeyGen Research releases TransVLM for shot transition detection and TAVR for talking avatars

Researchers have developed TransVLM, a vision-language model (VLM) framework that detects shot transitions in video and uses optical flow to capture temporal dynamics. Rather than treating a transition as an isolated cut point, as traditional shot boundary detection does, it identifies the continuous segment of frames over which the transition occurs. The framework has been deployed to production and is accompanied by a new benchmark for shot transition detection. A second paper, "Generate Your Talking Avatar from Video Reference", addresses talking avatar generation conditioned on a video reference instead of a single static image.
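To make the difference between cut points and transition segments concrete, here is a minimal Python sketch. It is not the TransVLM method; it only illustrates the task framing under assumed conventions: frames are 0-indexed, a transition is an inclusive `(start, end)` frame range, and the function names `shots_from_cut_points` and `shots_from_transitions` are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # inclusive frame range (start, end); assumed convention


def shots_from_cut_points(num_frames: int, cuts: List[int]) -> List[Interval]:
    """Cut-point view: a cut at frame c means a new shot starts at frame c."""
    bounds = [0] + sorted(cuts) + [num_frames]
    return [(a, b - 1) for a, b in zip(bounds, bounds[1:]) if b > a]


def shots_from_transitions(num_frames: int, transitions: List[Interval]) -> List[Interval]:
    """Segment view: shots are the frame ranges outside every transition segment."""
    shots: List[Interval] = []
    cursor = 0  # first frame not yet assigned to a shot or a transition
    for start, end in sorted(transitions):
        if start > cursor:
            shots.append((cursor, start - 1))
        cursor = max(cursor, end + 1)
    if cursor < num_frames:
        shots.append((cursor, num_frames - 1))
    return shots


# A 100-frame clip with a dissolve covering frames 40..49 (made-up example data).
by_cut = shots_from_cut_points(100, [45])
by_segment = shots_from_transitions(100, [(40, 49)])
print(by_cut)      # both shots still contain blended dissolve frames
print(by_segment)  # the dissolve frames are excluded from both shots
```

With a single cut point placed in the middle of the dissolve, each resulting shot keeps some blended frames, which is the kind of corrupted shot the abstract describes. Marking the whole segment lets those frames be dropped from both neighbours.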

Summary written by gemini-2.5-flash-lite from 4 sources.

IMPACT Introduces a novel VLM approach for video analysis, potentially improving content moderation and editing tools.

RANK_REASON Academic paper introducing a new framework and benchmark for a specific computer vision task.


COVERAGE [4]

  1. arXiv cs.AI TIER_1 · Ce Chen, Yi Ren, Yuanming Li, Viktor Goriachko, Zhenhui Ye, Zujin Guo, Zhibin Hong, Mingming Gong

    TransVLM: A Vision-Language Framework and Benchmark for Detecting Any Shot Transitions

    arXiv:2604.27975v1 (cross-list). Abstract: Traditional Shot Boundary Detection (SBD) inherently struggles with complex transitions by formulating the task around isolated cut points, frequently yielding corrupted video shots. We address this fundamental limitation by forma…

  2. arXiv cs.CV TIER_1 · Zujin Guo, Zhenhui Ye, Yi Ren, Yuanming Li, Ce Chen, Zhibin Hong, Chen Change Loy

    Generate Your Talking Avatar from Video Reference

    arXiv:2604.27918v1 (new). Abstract: Existing talking avatar methods typically adopt an image-to-video pipeline conditioned on a static reference image within the same scene as the target generation. This restricted, single-view perspective lacks sufficient temporal an…

  3. arXiv cs.CV TIER_1 · Mingming Gong

    TransVLM: A Vision-Language Framework and Benchmark for Detecting Any Shot Transitions

    Traditional Shot Boundary Detection (SBD) inherently struggles with complex transitions by formulating the task around isolated cut points, frequently yielding corrupted video shots. We address this fundamental limitation by formalizing the Shot Transition Detection (STD) task. R…

  4. arXiv cs.CV TIER_1 · Chen Change Loy

    Generate Your Talking Avatar from Video Reference

    Existing talking avatar methods typically adopt an image-to-video pipeline conditioned on a static reference image within the same scene as the target generation. This restricted, single-view perspective lacks sufficient temporal and expression cues, limiting the ability to synth…