Entity: vision-language model

PulseAugur coverage of vision-language models: every cluster that mentions vision-language models across labs, papers, and developer communities, ranked by signal.

Total (30 days): 110 · last 90 days: 110

Releases (30 days): 0 · last 90 days: 0

Papers (30 days): 106 · last 90 days: 106

Tier distribution (90 days)

significant 1
research 42
tool 65
commentary 2

Relations

instance of Vision Language Models 90%
instance of MLLMs 90%
used by VSI-Bench 70%
used by foundation model 70%
instance of foundation model 70%
instance of Multimodal Large Language Models and Tunings: Vision, Language, Sensors, Audio, and Beyond 70%

Timeline

2026-05-19 · research_milestone: A new method is proposed to improve out-of-distribution visual document understanding in VLMs.

Sentiment (30 days)

Sentiment data available for 16 days

Recent · page 2 of 6 · 110 items in total

RESEARCH · CL_44004 · May 21 · 00:00

New frameworks tackle faithfulness in multimodal AI reasoning

Researchers have developed Faithful-MR1, a new training framework designed to improve the faithfulness of multimodal reasoning in large language models. This framework addresses the challenge of accurately perceiving an…
RESEARCH · CL_42458 · May 20 · 17:32

New benchmark reveals vision-language models struggle with temporal glitches

Researchers have introduced TempGlitch, a new benchmark designed to evaluate how well vision-language models (VLMs) can detect temporal glitches in gameplay videos. Unlike previous methods that focused on static visual …
RESEARCH · CL_41847 · May 20 · 13:14

AI research advances autonomous driving safety with new RL frameworks

Two new research papers explore advanced reinforcement learning techniques for safer autonomous driving. The first paper introduces a multi-agent reinforcement learning (MARL) approach where self-driving cars and pedest…
TOOL · CL_41913 · May 20 · 06:42

New dataset reveals semantic loss in VLM-based video editing

Researchers have developed a new diagnostic dataset and protocol called TRACE-Edit to evaluate how well semantic information is preserved when Vision-Language Models (VLMs) are used for video editing. Their findings ind…
TOOL · CL_41824 · May 20 · 05:46

Draw2Think framework enhances geometric reasoning in vision-language models

Researchers have developed Draw2Think, a new framework that enhances geometric reasoning in vision-language models by interacting with the GeoGebra constraint engine. This system uses a Propose-Draw-Verify loop to exter…
RESEARCH · CL_41927 · May 20 · 03:44

New VQA benchmarks and methods tackle knowledge, adaptation, and grounding

Researchers have introduced several new benchmarks and methods for Visual Question Answering (VQA) systems. HyLoVQA proposes a dynamic hypernetwork-generated low-rank adaptation technique for continual VQA, improving ad…
RESEARCH · CL_45018 · May 20 · 00:00

AutoRubric-T2I learns interpretable VLM rubrics with minimal data

Researchers have developed AutoRubric-T2I, a novel framework for text-to-image generation that automatically creates and refines explicit rubrics. These rubrics guide Vision-Language Models (VLMs) in evaluating image qu…
RESEARCH · CL_40912 · May 19 · 13:58

New method enhances VLM document layout understanding

Researchers have developed a new method to improve how Vision-Language Models (VLMs) understand document layouts, particularly for documents with structures not seen during training. The approach pre-resolves layout inf…
RESEARCH · CL_40914 · May 19 · 13:50

New research benchmarks and enhances VLM gaze understanding

Researchers have developed new methods to evaluate and improve how vision-language models (VLMs) understand human gaze. One study introduces EyeVLM, a framework to benchmark VLMs on gaze following and social gaze predic…
RESEARCH · CL_40787 · May 19 · 13:40

New FineBench benchmark highlights VLM struggles with human activity

Researchers have introduced FineBench, a new benchmark designed to evaluate the fine-grained human activity understanding capabilities of vision-language models (VLMs). The benchmark includes nearly 200,000 question-ans…
TOOL · CL_40940 · May 19 · 09:53

Vision-language models enhance cross-camera color constancy

Researchers have developed a new framework called VLM-CC to improve cross-camera color constancy in images. This method iteratively refines color balance by using a vision-language model (VLM) to provide feedback on ima…
TOOL · CL_40822 · May 19 · 08:24

Cross-modal skill injection enhances VLM capabilities efficiently

Researchers have explored a technique called cross-modal skill injection to efficiently transfer domain-specific expertise from large language models (LLMs) to vision-language models (VLMs). This method aims to induce n…
TOOL · CL_38811 · May 18 · 17:54

New framework enhances identity tracking in long video generation

Researchers have developed IAMFlow, a novel framework designed to improve consistency and identity tracking in long video generation. This training-free method explicitly models and follows persistent entities acros…
RESEARCH · CL_38247 · May 18 · 16:21

CATA method enables continual machine unlearning for vision-language models

Researchers have introduced CATA, a novel method for continual machine unlearning in vision-language models (VLMs). This approach addresses the challenges of sequentially removing specific data from VLMs while preservin…
TOOL · CL_38817 · May 18 · 16:13

New training method combats 'lazy perception' in vision-language models

Researchers have introduced a new training paradigm called "Starve to Perceive" to address the issue of "lazy perception" in Vision-Language Models (VLMs). This phenomenon occurs when VLMs can achieve adequate accuracy …
TOOL · CL_38258 · May 18 · 15:27

New framework uses speaker-centered visuals for emotion recognition in conversations

Researchers have developed VISAFF, a novel framework for recognizing emotions in conversations by focusing on visual cues from the active speaker. This approach leverages existing Vision-Language Models without requirin…
TOOL · CL_38271 · May 18 · 14:14

Research questions latent tokens' role in vision-language reasoning

A new research paper questions the effectiveness of latent tokens in vision-language models for visual reasoning. The study found that replacing these intermediate "imagination" tokens with uninformative ones did not im…
TOOL · CL_38273 · May 18 · 13:54

New method boosts AI diagnostics in histopathology

Researchers have developed a new method called Geometry-Aware Uncertainty Coresets (GAUC) to improve the reliability of visual in-context learning in histopathology. This training-free approach optimizes the selection o…
TOOL · CL_37943 · May 18 · 10:54

SpatioRoute boosts VLM spatial reasoning with dynamic prompt routing

Researchers have developed SpatioRoute, a novel method for enhancing zero-shot spatial reasoning in Vision-Language Models (VLMs). This approach dynamically routes incoming questions to tailored prompt templates without…
RESEARCH · CL_37951 · May 18 · 10:05

New benchmarks test VLM spatial reasoning, robustness, and consistency

Researchers have developed new benchmarks to evaluate the spatial reasoning capabilities of vision-language models (VLMs). ArchSIBench focuses on architectural space understanding, while Flat-Pack Bench assesses spatio-…