PEEK: Picking Essential frames via Efficient Knowledge distillation
Researchers have developed PEEK, an efficient method for selecting essential frames from videos for captioning. The technique distills knowledge from a larger teacher model into a smaller student model, which then identifies the most relevant frames with minimal computational overhead. PEEK outperforms existing methods, particularly when only a few frames are used, and significantly reduces processing time compared to other adaptive sampling approaches.
AI IMPACT: Improves the efficiency of video captioning models by optimizing frame selection.
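
To make the frame-selection idea in the summary concrete, here is a minimal sketch and not PEEK's actual implementation. It assumes a teacher and a student each produce one relevance score per frame, that the student is trained to match the teacher's softened score distribution with a KL-divergence loss, and that the top-k frames by student score are kept. The function names (`distillation_loss`, `select_frames`), the temperature parameter, and the choice of loss are all hypothetical illustrations; the summary does not specify them.

```python
import numpy as np


def softmax(scores, temperature=1.0):
    """Numerically stable softmax over per-frame scores."""
    z = np.asarray(scores, dtype=np.float64) / temperature
    z = z - z.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()


def distillation_loss(teacher_scores, student_scores, temperature=1.0):
    """KL(teacher || student) over frames.

    Assumption: the teacher emits one relevance score per frame and the
    student is trained to reproduce the teacher's distribution. This is a
    generic distillation objective, not one confirmed by the source.
    """
    p = softmax(teacher_scores, temperature)
    q = softmax(student_scores, temperature)
    eps = 1e-12  # avoid log(0)
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))


def select_frames(student_scores, k):
    """Return the indices of the k highest-scoring frames in temporal order."""
    scores = np.asarray(student_scores, dtype=np.float64)
    k = min(k, scores.size)
    if k <= 0:
        return []
    top = np.argsort(scores)[-k:]  # indices of the k largest scores
    return sorted(int(i) for i in top)


if __name__ == "__main__":
    # Hypothetical per-frame scores for a five-frame clip.
    teacher = [0.2, 2.1, 0.4, 1.7, -0.9]
    student = [0.1, 2.0, 0.3, 1.5, -1.0]
    print("distillation loss:", distillation_loss(teacher, student))
    print("selected frames:", select_frames(student, k=2))
```

In this sketch only the small student scorer runs at inference time, which is where the reduced overhead of a distilled selector would come from; the teacher is needed only to produce training targets.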