Qwen2.5-VL

PulseAugur coverage of Qwen2.5-VL — every cluster mentioning Qwen2.5-VL across labs, papers, and developer communities, ranked by signal.

Total · 30d: 15 (41 over 90d)

Releases · 30d: 0 (0 over 90d)

Papers · 30d: 10 (31 over 90d)

TIER MIX · 90D

frontier release 1
research 18
tool 22

SENTIMENT · 30D

9 days with sentiment data

RECENT · PAGE 1/3 · 41 TOTAL

TOOL · CL_159639 · Jul 23 · 13:51

Qwen Image Edit Plus enables text editing in images via text commands

The term "jimniting" (from Gemini) has become popular slang for editing images with text commands, particularly for altering text within an image. Qwen Image Edit Plus, an open-source model from Alibaba, is highlighte…
TOOL · CL_158320 · Jul 23 · 01:32

Visionary app streamlines AI dataset creation on macOS

Visionary is a new, locally-run macOS application designed to streamline the process of building and curating training datasets for AI models. It consolidates functionalities from multiple existing tools, offering featu…
TOOL · CL_154098 · Jul 21 · 04:00

New AI model architecture tackles cross-modal negation detection

Researchers have identified a significant challenge in current multimodal AI systems: the difficulty in detecting high-level semantic concepts like negation across different modalities. Their analysis reveals that stand…
RESEARCH · CL_147821 · Jul 16 · 17:45

New ARMOR++ framework enhances deepfake attack transferability

Researchers have developed ARMOR++, a novel multi-agent framework designed to enhance the transferability of attacks against deepfake detectors. This system utilizes the Qwen2.5-VL Vision-Language Model for semantic pri…
TOOL · CL_141801 · Jul 14 · 04:00

AutoV framework enhances LVLM performance via visual prompt retrieval

Researchers have developed AutoV, a novel framework designed to improve the performance of large vision-language models (LVLMs) by intelligently retrieving optimal visual prompts. This method addresses the limitations o…
TOOL · CL_126156 · Jul 5 · 11:26

VLMs enable open-vocabulary video scene graph generation

A new method for Video Scene Graph Generation (SGG) leverages Vision-Language Models (VLMs) to create structured, machine-readable descriptions of video content. Unlike traditional SGG methods that rely on fixed vocabul…
TOOL · CL_125025 · Jul 4 · 10:03

Fine-tuning vision-language models for high-volume invoice extraction

A technical blog post details the process of fine-tuning vision-language models for efficient invoice extraction. The author describes building an Optical Character Recognition (OCR) pipeline capable of processing over …
RESEARCH · CL_128851 · Jul 4 · 00:00

New method tracks multimodal LLM attention token-by-token

Researchers have developed a new method called "One Token at a Time" (OTaT) to analyze how multimodal large language models (MLLMs) utilize visual and textual information during response generation. This technique track…
RESEARCH · CL_128737 · Jul 2 · 00:00

New AI frameworks tackle long-form video understanding with advanced memory and reasoning

Researchers are developing advanced frameworks to improve how AI models understand and reason about long-form videos. Homer, for instance, uses a hierarchical memory system that organizes information by temporal and cau…
RESEARCH · CL_121570 · Jul 1 · 22:17

New Semi-CoT framework enhances LLM reasoning with pseudo-supervision

Researchers have introduced Semi-CoT, a novel framework for Semi-supervised Chain-of-Thought Learning that leverages unlabeled questions to generate pseudo reasoning supervision. This method refines the self-training ap…
TOOL · CL_121209 · Jul 1 · 11:41

New framework boosts high-resolution image perception in LLMs

Researchers have introduced Hierarchical Entity Exploration (HEE), a novel framework designed to enhance high-resolution image perception in multimodal large language models (MLLMs). Unlike existing methods that require…
TOOL · CL_121184 · Jul 1 · 04:12

New framework edits facial expressions using EEG signals

Researchers have developed MindAU, a novel framework designed to edit facial action units (AUs) based on electroencephalography (EEG) signals. This system aims to translate noisy EEG data into precise, identity-preservi…
TOOL · CL_118020 · Jun 30 · 04:00

HKVLM model improves visual reasoning by separating localization from language

Researchers have developed HKVLM, a novel approach to visual reasoning that separates localization from language generation. This model utilizes a frozen language-aligned detector and a frozen language model, connected …
TOOL · CL_117625 · Jun 30 · 04:00

New ST-Merge framework boosts VLM/VLA inference speed for robotics

Researchers have developed ST-Merge, a novel framework designed to accelerate the inference speed of vision-language models (VLMs) and vision-language-action models (VLAs) used in robotics. This plug-and-play, training-…
RESEARCH · CL_107765 · Jun 23 · 12:13

New methods enhance streaming video understanding with efficient memory and re-watch capabilities · 6 sources tracked

Researchers have developed new methods to improve streaming video understanding (SVU) under strict computational and memory constraints. ProtoKV, a novel memory system, aggregates older video content into a summary stat…
TOOL · CL_99209 · Jun 18 · 19:01

Vision LLM analyzes Stable Diffusion sigma schedules for improved image generation

A user has developed a novel method for improving image generation quality by integrating a vision-capable large language model (LLM) with the Stable Diffusion workflow. This approach uses an LLM, such as Gemma 3 12B or…
RESEARCH · CL_95864 · Jun 16 · 09:22

New research tackles LVLM hallucinations and improves vision-language learning

Researchers are developing new methods to improve the robustness and capabilities of large vision-language models (LVLMs). One approach, SeeMe, focuses on mitigating hallucinations by engineering visual tokens to suppre…
RESEARCH · CL_96074 · Jun 16 · 05:25

OmniDrive uses LLM agents for advanced driving video generation

Researchers have introduced OmniDrive, a novel LLM-choreographed multi-agent world model designed for generating multi-view driving videos. This system addresses challenges in integrating heterogeneous control inputs an…
TOOL · CL_93961 · Jun 16 · 04:00

New GRACE framework boosts video MLLMs for sentiment prediction

Researchers have developed GRACE, a new framework designed to improve the performance of Multimodal Large Language Models (MLLMs) in predicting viewer sentiment for video advertisements. GRACE addresses the limitations …
TOOL · CL_93414 · Jun 16 · 04:00

New DUPL method boosts multimodal reasoning in LLMs

Researchers have introduced DUPL, a novel policy learning approach designed to enhance multimodal reasoning in large language models. This method specifically addresses the challenge of distinguishing between uncertaint…