ENTITY vision-language model

PulseAugur coverage of vision-language model — every cluster mentioning vision-language model across labs, papers, and developer communities, ranked by signal.

Total · 30d · 195 · 195 over 90d

Releases · 30d · 0 over 90d

Papers · 30d · 188 · 188 over 90d

TIER MIX · 90D

significant 1
research 87
tool 103
commentary 4

TOPICS

paper 188
model release 61
product 57
other 52
safety 40
infra 7

RELATIONSHIPS

instance of Vision Language Models 90%
instance of VSI-Bench 90%
instance of MLLMs 90%
used by autonomous driving 80%
instance of foundation model 70%
instance of Multimodal Large Language Models and Tunings: Vision, Language, Sensors, Audio, and Beyond 70%
instance of multimodal large language model 70%
used by VSI-Bench 70%
used by foundation model 60%
affiliated with autonomous driving 50%

TIMELINE

2026-05-19 research_milestone A new method is proposed to improve out-of-distribution visual document understanding in VLMs.

SENTIMENT · 30D

25 days with sentiment data

RECENT · PAGE 8/10 · 195 TOTAL

RESEARCH · CL_27989 · May 11 · 15:59

New UJEM-KL attack bypasses VLM safety measures with entropy maximization

Researchers have developed a new method called Untargeted Jailbreak via Entropy Maximization (UJEM-KL) to bypass safety measures in vision-language models (VLMs). This technique focuses on manipulating high-entropy toke…

TOOL · CL_27992 · May 11 · 15:54

TINS method enhances OOD detection in vision-language models

Researchers have developed TINS, a novel method for Out-of-Distribution (OOD) detection in vision-language models. TINS addresses limitations of static negative labels by learning dynamic negative semantics during test-…

TOOL · CL_28024 · May 11 · 11:47

New AI method simplifies images while keeping them photorealistic

Researchers have developed a new framework for simplifying images while maintaining photorealism, moving beyond traditional non-photorealistic rendering techniques. Their method iteratively removes and inpaints elements…

TOOL · CL_28030 · May 11 · 11:20

New SleepWalk benchmark tests AI's 3D navigation and instruction grounding

Researchers have introduced SleepWalk, a new benchmark designed to rigorously test instruction-guided vision-language navigation capabilities of AI models. This benchmark focuses on localized, interaction-centric embodi…

RESEARCH · CL_26359 · May 11 · 10:12

GPT-5 Mini leads Agentick benchmark, but no agent paradigm dominates

The new Agentick benchmark, which assesses various AI agents across 37 tasks, shows GPT-5 Mini achieving the top score of 0.309. However, no single agent paradigm, including reinforcement learning, LLM, VLM, or hybrid a…

TOOL · CL_25598 · May 8 · 08:53

New SAEgis framework detects adversarial attacks on vision-language models

Researchers have developed a new framework called SAEgis to detect adversarial attacks on vision-language models (VLMs). This method utilizes sparse autoencoders (SAEs) as a plug-and-play module, requiring no additional…

TOOL · CL_22401 · May 8 · 04:00

ChartZero uses synthetic data to extract chart data without real-world annotation

Researchers have developed ChartZero, a novel framework designed to extract data from line charts with zero-shot capabilities. This approach bypasses the need for real-world annotations by training exclusively on synthe…

TOOL · CL_22124 · May 8 · 04:00

CompART training improves VLM multi-object grounding and visual understanding

Researchers have developed a new training method called Compositional Attention-Regularized Training (CompART) to improve how Vision-Language Models (VLMs) handle complex, multi-object references. Current VLMs struggle …

RESEARCH · CL_21791 · May 7 · 16:01

GeoStack framework enables efficient VLM knowledge composition, preventing catastrophic forgetting

Researchers have developed GeoStack, a novel framework designed to enhance knowledge composition in Vision-Language Models (VLMs). This approach addresses the issue of catastrophic forgetting, where models lose previous…

TOOL · CL_20775 · May 7 · 04:00

Consensus Entropy improves VLM OCR accuracy by measuring inter-model agreement

Researchers have developed a new metric called Consensus Entropy (CE) to assess the reliability of Optical Character Recognition (OCR) outputs from Vision-Language Models (VLMs). CE measures the agreement between multip…

TOOL · CL_20754 · May 7 · 04:00

Researchers propose new framework for generative recommendation systems

Researchers have developed a new framework to improve the generation of Semantic IDs (SIDs) for generative recommendation systems. This approach addresses issues of information and semantic degradation by integrating de…

RESEARCH · CL_20275 · May 6 · 17:33

PhysForge generates physics-grounded 3D assets for virtual worlds and embodied AI

Researchers have introduced PhysForge, a novel framework designed to generate physics-grounded 3D assets for interactive virtual worlds and embodied AI. This system addresses the limitations of existing methods by focus…

RESEARCH · CL_20307 · May 6 · 06:57

New AI models InterMesh and Anny-Fit advance 3D human pose and shape recovery

Researchers have developed InterMesh, a new framework for multi-person human mesh recovery that explicitly incorporates human-environment interaction information. This approach enhances pose and shape estimation by enri…

TOOL · CL_18874 · May 6 · 04:00

VLM pipeline enables viewpoint-agnostic grasping for robots with partial observations

Researchers have developed a new end-to-end pipeline for language-guided grasping that enhances the robustness of mobile manipulators in cluttered environments. This system uses vision-language models (VLMs) and partial…

RESEARCH · CL_18576 · May 6 · 04:00

Researchers unveil new stealthy backdoor attacks on AI models using diffusion and style features

Researchers have developed new methods for backdoor attacks on advanced AI models, specifically targeting Vision-Language Models (VLMs) and Diffusion Models (DMs). One approach, CBV, uses diffusion models to create natu…

RESEARCH · CL_18299 · May 5 · 14:08

New GLANCE framework enhances VLM agents with curiosity-driven visual-linguistic exploration

Researchers have developed a new framework called GLANCE to enhance the exploration capabilities of vision-language model (VLM) agents. This framework aims to improve how these agents navigate complex and partially ob…

TOOL · CL_15782 · May 5 · 04:00

New benchmark reveals video models forget long-term context

Researchers have introduced SceneBench, a new benchmark designed to evaluate video understanding models' ability to retain context over long videos, particularly across different scenes. Their findings indicate that cur…

TOOL · CL_15622 · May 5 · 04:00

VISTA benchmark launched for advanced VLM spatio-temporal interaction analysis

Researchers have introduced VISTA, a new benchmark designed to evaluate the spatio-temporal understanding capabilities of Vision-Language Models (VLMs). Unlike existing benchmarks that focus on simple actions and limite…

TOOL · CL_15616 · May 5 · 04:00

Researchers propose Gromov-Wasserstein distance for VLM vision encoder selection

Researchers have developed a new method for selecting optimal vision encoders for Vision-Language Models (VLMs). Traditional approaches, like choosing encoders with high accuracy or large size, were found to be ineffect…

TOOL · CL_15611 · May 5 · 04:00

Chain of Evidence framework enables pixel-level visual attribution for retrieval-augmented generation

Researchers have developed a new framework called Chain of Evidence (CoE) to improve iterative retrieval-augmented generation (iRAG) systems. CoE utilizes Vision-Language Models to directly analyze screenshots of retrie…
