Video-LLMs
PulseAugur coverage of Video-LLMs — every cluster mentioning Video-LLMs across labs, papers, and developer communities, ranked by signal.
1 day with sentiment data
- Video-LLMs struggle with temporal information flow, researchers find
Researchers have identified a significant bottleneck in how Video Large Language Models (Video-LLMs) process temporal information, hindering their ability to understand the direction of video playback. While video-centr…
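The excerpt doesn't spell out the paper's probe, but one simple way to test this failure mode is to ask the same question over the forward and the time-reversed frame order. A minimal sketch, with `ask_video_llm` as a hypothetical stand-in for any Video-LLM API:

```python
from typing import Callable, List

def direction_sensitivity(
    frames: List[bytes],
    ask_video_llm: Callable[[List[bytes], str], str],  # hypothetical model API
    question: str = "Is this video playing forward or in reverse?",
) -> bool:
    """Return True if the model's answer changes when the frames are reversed."""
    forward_answer = ask_video_llm(frames, question)
    reversed_answer = ask_video_llm(frames[::-1], question)
    # A model that encodes temporal order should answer differently on the
    # reversed clip; identical answers suggest it ignores playback direction.
    return forward_answer.strip().lower() != reversed_answer.strip().lower()
```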
- VTAgent improves Video TextVQA by anchoring keyframes, setting new benchmarks
Researchers have introduced VTAgent, a novel framework designed to improve video text-based visual question answering (Video TextVQA). The system addresses limitations in current Video-LLMs by focusing on the crucial ta…
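VTAgent's exact anchoring mechanism isn't detailed in the excerpt; as a rough illustration of the general idea, text-anchored keyframe selection can be approximated by scoring each frame for legible text and keeping the top-k as anchors. `ocr_text_score` below is a hypothetical per-frame scorer (e.g., summed OCR confidence), not VTAgent's:

```python
import heapq
from typing import Callable, List, Sequence

def select_text_keyframes(
    frames: Sequence[bytes],
    ocr_text_score: Callable[[bytes], float],  # hypothetical OCR-based scorer
    k: int = 8,
) -> List[int]:
    """Return indices of the k frames with the most legible text."""
    scored = ((ocr_text_score(f), i) for i, f in enumerate(frames))
    top = heapq.nlargest(k, scored)      # (score, index) pairs
    return sorted(i for _, i in top)     # restore temporal order for the model
```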
- New research grounds Video-LLMs in physical reality with adversarial curriculum
A new research paper introduces the Unified Attribution Theory, suggesting that Video-LLMs' struggles with physical reasoning stem from "Semantic Prior Dominance" rather than perceptual issues. To address this, the pape…
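The paper's curriculum isn't specified beyond the excerpt; the sketch below shows only the generic shape of an adversarial curriculum, linearly increasing the share of clips whose visual evidence contradicts common-sense semantic priors so the model must attend to perception rather than priors. The linear ramp is an assumption, not the paper's schedule:

```python
import random
from typing import List, Tuple

def curriculum_batch(
    ordinary: List[str],          # paths/ids of ordinary clips
    prior_violating: List[str],   # clips where visuals contradict semantic priors
    step: int,
    total_steps: int,
    batch_size: int = 4,
) -> List[Tuple[str, bool]]:
    """Sample a batch whose adversarial fraction grows linearly with step."""
    adversarial_frac = min(1.0, step / total_steps)
    batch = []
    for _ in range(batch_size):
        if random.random() < adversarial_frac:
            batch.append((random.choice(prior_violating), True))
        else:
            batch.append((random.choice(ordinary), False))
    return batch
```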
- Researchers benchmark sycophancy in Video-LLMs with new VISE evaluation tool
Researchers have introduced VISE, the first benchmark designed to evaluate sycophantic behavior in video large language models (Video-LLMs). Sycophancy, where models align with user input despite contradicting visual ev…
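A common way to quantify sycophancy (VISE's actual protocol may differ) is a flip rate: answer a visually grounded question, push back with a contradicting user claim, and count how often the model abandons an initially correct answer. A minimal sketch, again with `ask` as a hypothetical Video-LLM API and a crude substring match for correctness:

```python
from typing import Callable, List, Tuple

def sycophancy_flip_rate(
    items: List[Tuple[List[bytes], str, str]],  # (frames, question, gold answer)
    ask: Callable[[List[bytes], str], str],     # hypothetical Video-LLM API
) -> float:
    """Fraction of initially correct answers abandoned under user pushback."""
    flips = scored = 0
    for frames, question, gold in items:
        first = ask(frames, question)
        if gold.lower() not in first.lower():
            continue  # only score answers the model got right unprompted
        challenge = f"{question} I'm sure the answer is not {gold}. Reconsider."
        second = ask(frames, challenge)
        scored += 1
        if gold.lower() not in second.lower():
            flips += 1  # model caved despite unchanged visual evidence
    return flips / scored if scored else 0.0
```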
- EMCompress introduces novel compression for Video-LLMs, improving efficiency
Researchers have introduced EMCompress, a novel method for improving the efficiency of Video-LLMs in long-video reasoning tasks. This approach uses a cognitively-inspired technique called Endomorphic Multimodal Compress…
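EMCompress's endomorphic formulation isn't described beyond the excerpt; as a generic illustration of the visual-token-compression family it belongs to, the numpy sketch below merges consecutive frame tokens whose embeddings are nearly identical, shrinking the sequence a Video-LLM must attend over:

```python
import numpy as np

def merge_similar_tokens(tokens: np.ndarray, threshold: float = 0.98) -> np.ndarray:
    """Merge runs of consecutive tokens with cosine similarity >= threshold.

    tokens: (n, d) array of visual token embeddings in temporal order.
    Returns an (m, d) array, m <= n, where each merged run is averaged.
    """
    merged = [tokens[0].astype(np.float64)]  # running sums per run
    counts = [1]
    for t in tokens[1:]:
        prev = merged[-1] / counts[-1]       # running mean of current run
        sim = float(prev @ t) / (np.linalg.norm(prev) * np.linalg.norm(t) + 1e-8)
        if sim >= threshold:
            merged[-1] += t                  # extend the current run
            counts[-1] += 1
        else:
            merged.append(t.astype(np.float64))
            counts.append(1)
    return np.stack([m / c for m, c in zip(merged, counts)])
```

For long, mostly static footage this kind of merging can cut the token count sharply at little cost, since redundant frames collapse into a single averaged token.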