Qwen2.5-VL-72B
PulseAugur coverage of Qwen2.5-VL-72B — every cluster mentioning Qwen2.5-VL-72B across labs, papers, and developer communities, ranked by signal.
-
OmniAgent uses active perception for efficient video understanding · 2 sources tracked
Researchers have introduced OmniAgent, a novel omni-modal agent designed for video understanding that utilizes an iterative Observation-Thought-Action cycle based on Partially Observable Markov Decision Processes (POMDP…
-
New CapRL++ framework trains better image and video captioning models
Researchers have developed CapRL++, a novel framework for training image and video captioning models using reinforcement learning with verifiable rewards. This approach moves beyond traditional supervised fine-tuning by…
-
CodePercept boosts LLM visual perception using code, not just reasoning
Researchers from Shanghai Jiao Tong University and the Qwen team have introduced CodePercept, a novel approach to enhance large language models' visual perception capabilities, particularly for STEM tasks. Their researc…
-
WALDO framework improves VLM-based medical imaging anomaly detection
Researchers have developed WALDO, a novel framework for anomaly localization in medical imaging using vision-language models (VLMs). This method reformulates the problem as a comparative inference task, identifying anom…