ENTITY vision-language model

PulseAugur coverage of vision-language model — every cluster mentioning vision-language model across labs, papers, and developer communities, ranked by signal.

Total · 30d

195

195 over 90d

Releases · 30d

0 over 90d

Papers · 30d

188

188 over 90d

TIER MIX · 90D

significant 1
research 87
tool 103
commentary 4

TOPICS

paper 188
model release 61
product 57
other 52
safety 40
infra 7

RELATIONSHIPS

instance of Vision Language Models 90%
instance of VSI-Bench 90%
instance of MLLMs 90%
used by autonomous driving 80%
instance of foundation model 70%
instance of Multimodal Large Language Models and Tunings: Vision, Language, Sensors, Audio, and Beyond 70%
instance of multimodal large language model 70%
used by VSI-Bench 70%
used by foundation model 60%
affiliated with autonomous driving 50%

TIMELINE

2026-05-19 research_milestone A new method is proposed to improve out-of-distribution visual document understanding in VLMs.

SENTIMENT · 30D

25 day(s) with sentiment data

RECENT · PAGE 10/10 · 195 TOTAL

RESEARCH · CL_06938 · Apr 28 · 04:00

Researchers unveil defenses against AR-LLM social engineering attacks

Researchers have developed two new frameworks to combat social engineering attacks that leverage augmented reality (AR) and large language models (LLMs). The first, PhySE, uses a visual language model for rapid profile …
RESEARCH · CL_06799 · Apr 28 · 04:00

AI uses hindsight to optimize financial time series advisories

Researchers have developed Hindsight Preference Optimization (HPO), a novel method for training language models to provide financial time series advisories. This technique leverages reinforcement learning principles, sp…
RESEARCH · CL_06646 · Apr 28 · 04:00

Researchers develop multimodal QUD for deeper scientific figure comprehension

Researchers have developed a new dataset and methodology called MQUD to enable Vision-Language Models (VLMs) to ask more insightful questions about scientific figures. This approach extends the linguistic theory of Ques…
RESEARCH · CL_06562 · Apr 28 · 04:00

GA2-CLIP paper introduces generic attribute anchors for VLM prompt tuning

Researchers have developed GA2-CLIP, a novel framework designed to enhance the generalization capabilities of Vision-Language Models (VLMs) in video tasks. This plug-and-play method addresses the issue of semantic space…
RESEARCH · CL_06498 · Apr 28 · 04:00

New VLM framework uses Bayesian inference for efficient expressway anomaly detection

Researchers have developed VIBES, a new framework for detecting anomalies in expressway surveillance videos. VIBES uses Vision-Language Models (VLMs) guided by Bayesian inference to efficiently identify subtle abnormal …
RESEARCH · CL_06439 · Apr 28 · 04:00

AI models offer interpretable diabetic retinopathy grading with visual and text explanations

Researchers have developed a new method for grading diabetic retinopathy (DR) that combines deep learning models with interpretable explanations. The approach uses CNN and transformer architectures, achieving a QWK scor…
RESEARCH · CL_06419 · Apr 28 · 04:00

New benchmark reveals AI models struggle with ego-motion understanding in driving

Researchers have developed EgoDyn-Bench, a new benchmark designed to evaluate how well vision-centric foundation models understand ego-motion in autonomous driving scenarios. The benchmark reveals a significant 'Percept…
RESEARCH · CL_08238 · Apr 27 · 18:59

Agentic AI faces unique challenges in remote sensing workflows

A new position paper outlines the unique technical hurdles in applying agentic AI to remote sensing tasks. It argues that standard agentic models fail due to the complex geospatial and temporal nature of Earth Observati…
RESEARCH · CL_06186 · Apr 27 · 10:45

VLMs tackle visual illusions, spatial reasoning, and evaluation benchmarks

Researchers are developing new methods to improve the robustness and reasoning capabilities of Vision-Language Models (VLMs). One approach, Structured Qualitative Inference (SQI), aims to mitigate visual illusions by en…
RESEARCH · CL_05210 · Apr 27 · 04:00

New research explores GNN interpretability and multi-graph reasoning

Researchers are exploring new methods to enhance the interpretability and utility of Graph Neural Networks (GNNs). One paper investigates the critical role of node features in graph pooling, proposing that effective poo…
RESEARCH · CL_04941 · Apr 24 · 05:30

OccDirector: Language-Guided Behavior and Interaction Generation in 4D Occupancy Space

Researchers have introduced OccDirector, a new framework designed to generate complex 4D occupancy dynamics for autonomous driving simulations based solely on natural language instructions. This system acts as a "scenar…
RESEARCH · CL_03039 · Apr 23 · 14:38

AI research explores solar profiling, institutional redesign, and surgical limitations

A new paper proposes that the AI revolution has shifted scarcity from judgment to complements like verified signal and legitimacy, necessitating institutional redesign. Another study examines university students' willin…
RESEARCH · CL_02085 · Apr 23 · 07:03

Symbolic inputs reveal representation bottlenecks in abstract visual reasoning for VLMs

A new paper investigates why vision-language models struggle with abstract visual reasoning tasks like Bongard problems. Researchers found that the primary limitation is not reasoning ability but representational capaci…
RESEARCH · CL_02944 · Apr 23 · 01:19

New frameworks enhance VLM spatial reasoning with world models and multi-agent systems

Researchers have developed World2VLM, a novel training framework that distills spatial reasoning capabilities from generative world models into vision-language models (VLMs). This approach synthesizes future views to pr…
TOOL · CL_17916 · Mar 17 · 19:46

Kita uses VLM agents to automate credit review from messy financial documents

Kita, a startup founded by Carmel and Rhea, has launched a new product designed to automate credit review for lenders in emerging markets. The system utilizes Visual Language Models (VLMs) to process diverse and often u…
