ENTITY vision-language model

PulseAugur coverage of vision-language model — every cluster mentioning vision-language model across labs, papers, and developer communities, ranked by signal.

Total · 30d: 176 (176 over 90d)
Releases · 30d: 0 over 90d
Papers · 30d: 171 (171 over 90d)

TIER MIX · 90D

significant 1
research 76
tool 96
commentary 3

TOPICS

paper 171
model release 53
other 50
product 48
safety 38
infra 6

RELATIONSHIPS

instance of Vision Language Models 90%
instance of VSI-Bench 90%
instance of MLLMs 90%
used by autonomous driving 80%
instance of foundation model 70%
instance of Multimodal Large Language Models and Tunings: Vision, Language, Sensors, Audio, and Beyond 70%
used by VSI-Bench 70%
used by foundation model 60%
affiliated with autonomous driving 50%

TIMELINE

2026-05-19 · research milestone · A new method is proposed to improve out-of-distribution visual document understanding in VLMs.

SENTIMENT · 30D

27 days with sentiment data

RECENT · PAGE 2/9 · 176 TOTAL

TOOL · CL_68542 · Jun 3 · 04:00

New benchmark tests vision-language models on 3D oncology scans

Researchers have developed an automated pipeline to create a benchmark for evaluating vision-language models (VLMs) on 3D medical imaging, specifically for oncology. This pipeline generates question-answer datasets dire…

TOOL · CL_68539 · Jun 3 · 04:00

New benchmark tests AI models on road damage detection

Researchers have introduced WildRoadBench, a new benchmark designed to evaluate vision-language models (VLMs) and LLM-driven agents in identifying road damage from aerial imagery. The benchmark includes two tracks: one …

TOOL · CL_68399 · Jun 3 · 04:00

New PAND framework enhances VLM knowledge distillation for visual classification

Researchers have developed a new framework called PAND (Prompt-Aware Neighborhood Distillation) to improve knowledge transfer from large vision-language models (VLMs) to smaller, more efficient networ…

RESEARCH · CL_68584 · Jun 2 · 14:49

New methods boost VLM robustness against adversarial attacks

Researchers have developed new methods to improve the adversarial robustness of vision-language models (VLMs) like CLIP. SS-TPT uses stability and suitability scores to guide adaptation and inference, amplifying trustwo…

TOOL · CL_65011 · Jun 2 · 05:40

LiDAR detector latency cut by optimizing voxelization, not backbone

Researchers profiling a LiDAR object detector discovered that the voxelization and scatter-to-pillars steps, not the 3D convolutional backbone, consumed approximately 40% of the per-frame latency. By moving the voxeliza…

RESEARCH · CL_66306 · Jun 2 · 04:00

New frameworks reconstruct 3D objects from hand interaction videos

Two new research papers introduce novel frameworks for reconstructing 3D objects from egocentric videos, focusing on hand interactions. The first, ROHIT, uses a Constrained Optimisation and Propagation (COP) framework t…

TOOL · CL_66180 · Jun 2 · 04:00

New VLM reranking method boosts video retrieval performance

Researchers have developed a novel approach for video retrieval tasks, specifically for the CoVR-R challenge. Their method, termed Dual-Route Top-K Retrieval with 1v1 VLM Reranking, separates the process into finding a …

TOOL · CL_66047 · Jun 2 · 04:00

New method improves VLM zero-shot classification by addressing spurious correlations

Researchers have introduced Density-Aware Translation (DAT), a novel method to improve the zero-shot classification capabilities of vision-language models (VLMs). DAT addresses the issue of spurious correlations by refi…

RESEARCH · CL_66037 · Jun 2 · 04:00

New methods boost video QA by compressing content and improving temporal reasoning

Researchers have developed new methods to improve video question answering (VQA) for long videos. One approach, MemoryCard, compresses video content into topic-aware "Memory Cards" to better capture event-level semantic…

TOOL · CL_65746 · Jun 2 · 04:00

SceneSmith generates realistic indoor scenes for robot simulation

Researchers have developed SceneSmith, a novel agentic framework designed to generate realistic indoor environments for robot training simulations. This system uses a hierarchical approach with interacting VLM agents to…

TOOL · CL_65656 · Jun 2 · 04:00

Vision-language models fail to grasp physical transformations

A new arXiv paper highlights significant limitations in how current vision-language models (VLMs) understand physical transformations. The study introduced ConservationBench, a da…

TOOL · CL_65642 · Jun 2 · 04:00

VLM safety training flawed by spurious correlations, study finds

Researchers have identified a significant flaw in current safety training for vision-language models (VLMs), termed the "safety mirage." This occurs when models learn spurious correlations between superficial text patte…

TOOL · CL_65428 · Jun 2 · 04:00

New method optimizes VLM reward models using expert demonstrations

Researchers have developed a new method called Demo2Reward to optimize the language instructions used by vision-language models (VLMs) as reward models in reinforcement learning. This technique leverages a small number …

RESEARCH · CL_65287 · Jun 2 · 04:00

New dataset reveals foundation models struggle with Newtonian physics

Researchers have introduced NewtPhys, a new dataset designed to evaluate how well foundation models understand Newtonian physics. This dataset uses real-world scenes with physics-grounded simulations and provides detail…

RESEARCH · CL_68351 · Jun 2 · 00:00

RobotValues benchmark highlights AI's struggle with conflicting human values

Researchers have developed a new benchmark called RobotValues to assess how household robots handle situations where human values conflict. The benchmark includes 10,000 scenarios with realistic household images, each p…

RESEARCH · CL_65397 · Jun 1 · 16:06

AI model bridges sim-to-real gap in semiconductor visual program synthesis

Researchers have developed a novel visual program synthesis framework to address the sim-to-real gap in semiconductor inspection. This approach uses a vision-language model (VLM) to translate inspection images into edit…

RESEARCH · CL_66249 · Jun 1 · 13:59

Vision-language models enhance driver monitoring and attention analysis

Researchers are exploring the use of vision-language models (VLMs) to better understand driver behavior and attention. One study adapted a VLM with a new dataset of fine-grained driver activity descriptions, showing imp…

TOOL · CL_63108 · Jun 1 · 04:00

New benchmark reveals VLM spatial reasoning limitations

Researchers have introduced SSI-Bench, a new benchmark designed to evaluate the spatial intelligence of vision-language models (VLMs) in complex, constraint-governed environments. The benchmark features 1,000 ranking qu…

TOOL · CL_63103 · Jun 1 · 04:00

New benchmark reveals 60% of VLMs can infer private data

Researchers have developed MultiPriv, a new benchmark to assess the individual-level privacy reasoning capabilities of vision-language models (VLMs). The benchmark includes a bilingual multimodal dataset designed to lin…

RESEARCH · CL_63090 · Jun 1 · 04:00

New AI methods boost robot localization in cluttered indoor spaces

Researchers have developed new methods for robots to achieve robust global localization in complex, semi-static indoor environments. ShelfAware uses a semantic particle filter that treats scene semantics as statistical …
