ENTITY vision-language model

vision-language model

PulseAugur coverage of vision-language model — every cluster mentioning vision-language model across labs, papers, and developer communities, ranked by signal.


Total · 30d

195

195 over 90d

Releases · 30d

0 over 90d

Papers · 30d

188

188 over 90d

TIER MIX · 90D

significant 1
research 87
tool 103
commentary 4

TOPICS

paper 188
model release 61
product 57
other 52
safety 40
infra 7

RELATIONSHIPS

instance of Vision Language Models 90%
instance of VSI-Bench 90%
instance of MLLMs 90%
used by autonomous driving 80%
instance of foundation model 70%
instance of Multimodal Large Language Models and Tunings: Vision, Language, Sensors, Audio, and Beyond 70%
instance of multimodal large language model 70%
used by VSI-Bench 70%
used by foundation model 60%
affiliated with autonomous driving 50%

TIMELINE

2026-05-19 research_milestone A new method is proposed to improve out-of-distribution visual document understanding in VLMs.

SENTIMENT · 30D

25 days with sentiment data

RECENT · PAGE 7/10 · 195 TOTAL

TOOL · CL_38258 · May 18 · 15:27

New framework uses speaker-centered visuals for emotion recognition in conversations

Researchers have developed VISAFF, a novel framework for recognizing emotions in conversations by focusing on visual cues from the active speaker. This approach leverages existing Vision-Language Models without requirin…
TOOL · CL_38271 · May 18 · 14:14

Research questions latent tokens' role in vision-language reasoning

A new research paper questions the effectiveness of latent tokens in vision-language models for visual reasoning. The study found that replacing these intermediate "imagination" tokens with uninformative ones did not im…
TOOL · CL_38273 · May 18 · 13:54

New method boosts AI diagnostics in histopathology

Researchers have developed a new method called Geometry-Aware Uncertainty Coresets (GAUC) to improve the reliability of visual in-context learning in histopathology. This training-free approach optimizes the selection o…
TOOL · CL_37943 · May 18 · 10:54

SpatioRoute boosts VLM spatial reasoning with dynamic prompt routing

Researchers have developed SpatioRoute, a novel method for enhancing zero-shot spatial reasoning in Vision-Language Models (VLMs). This approach dynamically routes incoming questions to tailored prompt templates without…
RESEARCH · CL_37951 · May 18 · 10:05

New research tackles VLM spatial reasoning with geometric priors

Researchers are developing new methods to improve the spatial reasoning capabilities of Vision-Language Models (VLMs), which currently struggle with 3D understanding. Several papers propose injecting geometric priors an…
TOOL · CL_37996 · May 18 · 04:00

New framework exposes counting bias in Vision-Language Models

Researchers have developed CounterCount, a new framework designed to diagnose counting biases in Vision-Language Models (VLMs). The framework uses paired factual and counterfactual images to test whether VLMs rely on vi…
TOOL · CL_38011 · May 18 · 01:10

GraSP-VL method unlocks semantic granularity in vision-language embeddings

Researchers have developed GraSP-VL, a method to better utilize frozen vision-language model (VLM) embeddings by treating their length as a semantic interface. This approach learns a shared prefix transform that allows …
RESEARCH · CL_43941 · May 16 · 16:15

New architectures enable real-time video understanding

Researchers are developing new methods for real-time video understanding, moving beyond traditional offline analysis. Several papers propose architectures that decouple visual perception from language generation to impr…
TOOL · CL_36541 · May 15 · 14:42

New rubric assesses VLM adaptivity in math education

Researchers have developed a new rubric to assess the adaptivity of Vision Language Models (VLMs) in mathematics education. The rubric evaluates VLMs based on cognitive and motivational aspects, as well as response corr…
TOOL · CL_36046 · May 15 · 14:27

New framework unifies CT image analysis with language-guided reasoning

Researchers have developed a unified framework that integrates language-guided visual reasoning for CT image interpretation. This autoregressive model uses task-routing tokens to trigger detection and segmentation heads…
TOOL · CL_36058 · May 15 · 11:54

DepthVLM enables vision-language models to predict dense depth maps

Researchers have developed DepthVLM, a new framework that enables Vision-Language Models (VLMs) to predict dense metric depth maps from single images. Unlike previous methods that relied on external models or inefficien…
TOOL · CL_36564 · May 15 · 02:04

DeltaPrompts boosts VLM reasoning by targeting model capability gaps

Researchers have introduced DeltaPrompts, a new method to improve the distillation of knowledge into smaller Vision-Language Models (VLMs). They identified that many existing prompts provide minimal learning signals bec…
TOOL · CL_33402 · May 14 · 03:22

ICED framework enables concept-level unlearning in Vision-Language Models

Researchers have developed a new machine unlearning framework called ICED for Vision-Language Models (VLMs). This method allows for the precise removal of specific concepts from a VLM's knowledge without impacting unrel…
TOOL · CL_31314 · May 13 · 16:54

RoboEvolve framework boosts robotic manipulation with co-evolving AI

Researchers have developed RoboEvolve, a new framework designed to improve robotic manipulation capabilities by addressing the scarcity of training data. This system co-evolves a vision-language model planner with a vid…
COMMENTARY · CL_29648 · May 13 · 06:30

AI transforms robotics, journalism, and environmental monitoring

A new survey highlights the significant impact of vision-language models on industrial robotics, reporting a 90% task success rate in human-robot collaboration. Separately, Al Jazeera is partnering with Google Cloud to …
TOOL · CL_29263 · May 12 · 15:07

New benchmark reveals VLMs struggle with high-res Earth observation details

Researchers have introduced UHR-Micro, a new benchmark designed to evaluate Vision-Language Models (VLMs) on their ability to perceive small, critical details within ultra-high-resolution Earth observation imagery. Curr…
TOOL · CL_28149 · May 12 · 09:15

Fine-tuning VLMs hinges on strategic choices, not just training

This article argues that fine-tuning a vision-language model (VLM) is less about the technical training process and more about strategic decisions made beforehand. The author highlights four key choices that significant…
TOOL · CL_27973 · May 11 · 17:32

New model HieraCount improves object counting with multi-grained approach

Researchers have introduced a new framework for open-world object counting, addressing the brittleness of current vision-language models in accurately identifying and counting objects based on user intent. They propose …
TOOL · CL_28312 · May 11 · 17:02

New framework boosts VLM chart understanding with counterfactual data

Researchers have developed ChartCF, a new framework to improve the data efficiency of vision-language models (VLMs) used for chart understanding. This method leverages counterfactual data synthesis, where small code-con…
TOOL · CL_27979 · May 11 · 17:00

Medical VQA self-verification unreliable, study finds

A new research paper introduces a diagnostic framework to expose the unreliability of self-verification in medical visual question answering (VQA) systems. The study argues that current self-verific…
