ENTITY vision-language model


PulseAugur coverage of vision-language models: every cluster that mentions vision-language models across labs, papers, and developer communities, ranked by signal.


Total · 30d

288

288 over 90d

Releases · 30d

0 over 90d

Papers · 30d

274

274 over 90d

TIER MIX · 90D

significant 1
research 135
tool 146
commentary 6

TOPICS

paper 274
model release 118
product 77
other 63
safety 52
infra 16
opinion 1
funding 1

RELATIONSHIPS

instance of Vision-language-action model 90%
instance of Vista 90%
used by autonomous driving 80%
used by CatalyzeX 70%
instance of Vision--Language Models 70%
instance of Multimodal Large Language Models and Tunings: Vision, Language, Sensors, Audio, and Beyond 70%
instance of VSI-Bench 70%
used by DagsHub 70%
used by VSI-Bench 70%
instance of foundation model 70%
developed computed tomography 70%
used by Bifröst 70%

TIMELINE

2026-05-26 research_milestone A new self-ensembling method for vision-language models was proposed to improve chart data extraction.
2026-05-19 research_milestone A new method was proposed to improve out-of-distribution visual document understanding in VLMs.

SENTIMENT · 30D

25 days with sentiment data

RECENT · PAGE 1/10 · 200 TOTAL

TOOL · CL_111809 · Jun 26 · 04:00

New benchmark reveals critical weaknesses in VLMs for rare medical anatomy

A new benchmark, AdversarialAnatomyBench, has been introduced to evaluate vision-language models (VLMs) on rare anatomical variants in medical imaging. Testing 25 state-of-the-art VLMs revealed a significant drop in acc…
TOOL · CL_111700 · Jun 26 · 04:00

New framework automates editable scientific figure generation

Researchers have developed SciFig, a novel multi-agent framework designed to automate the creation of editable methodology figures for scientific papers. This system addresses the common trade-off between visual quality…
TOOL · CL_110040 · Jun 25 · 04:00

AR system fARfetch boosts human-robot collaboration in outdoor tasks

Researchers have developed fARfetch, a novel augmented reality system designed to enhance human-robot collaboration in large, visually diverse outdoor environments. The system integrates shared semantic mapping for land…
TOOL · CL_109945 · Jun 25 · 04:00

New RL method trains AI to reason about geological event histories

Researchers have developed Geo-Strat-RL, a synthetic environment designed to train vision-language models (VLMs) in reasoning about geological event histories. This system uses reinforcement learning with verifiable rew…
RESEARCH · CL_111341 · Jun 25 · 02:18

New CRISP framework diagnoses VLM spatial reasoning beyond language priors

Researchers have introduced CRISP, a new evaluation framework designed to diagnose the visual spatial intelligence of Vision-Language Models (VLMs). CRISP aims to distinguish genuine spatial reasoning from language prio…
RESEARCH · CL_109666 · Jun 24 · 04:08

New benchmark audits VLM robustness in synthetic medical image detection

A new research paper introduces a benchmark for evaluating the multimodal robustness of vision-language models (VLMs) in detecting synthetic medical images. The study highlights a vulnerability where VLMs may incorrectl…
TOOL · CL_108175 · Jun 24 · 04:00

New benchmark tests VLMs on verifiable map-based mobility decisions

Researchers have introduced MapReason-OSM, a new benchmark designed to evaluate the ability of vision-language models (VLMs) to make verifiable mobility decisions from street maps. The benchmark includes over 6,000 inst…
TOOL · CL_108134 · Jun 24 · 04:00

DriveStack-VLA enhances driving models with spatial intelligence and self-critique

Researchers have introduced DriveStack-VLA, a novel framework designed to enhance the spatial intelligence of vision-language-action driving models. This system leverages a large vision-language model backbone and incor…
TOOL · CL_108119 · Jun 24 · 04:00

New SWIFT method enhances semi-supervised few-shot learning with VLMs

A new paper proposes SWIFT (Stage-Wise Finetuning with Temperatures), a method to improve semi-supervised few-shot learning (SSFSL) by leveraging open-source vision-language models (VLMs) and publicly available data. Ex…
RESEARCH · CL_108054 · Jun 24 · 04:00

Vision-Language Models Tested for Robustness, Causal Reasoning, and Visual Search

Researchers are investigating the robustness and reasoning capabilities of vision-language models (VLMs) across several dimensions. One study introduces OCR-Robust, a benchmark to evaluate VLMs' resilience to visual per…
TOOL · CL_107992 · Jun 24 · 04:00

New E-MRL framework enhances 3D tumor analysis with grounded AI reasoning

Researchers have developed a novel reinforcement learning framework called E-MRL to improve the reliability of 3D tumor analysis using Vision-Language Models (VLMs). This new approach addresses the issue of visual hallu…
RESEARCH · CL_109579 · Jun 24 · 00:06

New bilingual dataset enhances multilingual AI for hematology VQA

Researchers have developed the WBCMor VQA, a new bilingual dataset for hematology visual question answering, supporting both English and Urdu. This benchmark addresses the gap in multilingual resources for medical AI, p…
RESEARCH · CL_109874 · Jun 24 · 00:00

New framework evaluates AI video generation for physical plausibility · 3 sources tracked

Researchers have developed a new evaluation framework called Physics Question Scene Graph (PQSG) to assess the physical plausibility of videos generated by AI models. PQSG uses a hierarchical question-based approach, le…
RESEARCH · CL_109472 · Jun 23 · 23:41

New research tackles zero-shot retrieval with advanced AI frameworks · 2 sources tracked

Two new research papers explore advanced retrieval techniques for large-scale zero-shot scenarios. One paper introduces EMMETT and IRENE, frameworks designed to synthesize classifiers on-the-fly for novel items, improvi…
RESEARCH · CL_107906 · Jun 23 · 15:50

New SER method enhances Video MLLM reasoning with semantic evidence rewards · 4 sources tracked

Researchers have developed a new method called Semantic Evidence Reward (SER) to improve the spatio-temporal reasoning capabilities of Video Multimodal Large Language Models (Video MLLMs). Existing models often struggle…
RESEARCH · CL_107909 · Jun 23 · 13:35

New AI methods boost efficiency and accuracy in 3D medical imaging analysis · 7 sources tracked

Researchers are developing new methods to improve the efficiency and accuracy of vision-language models (VLMs) for 3D medical imaging. MedPruner introduces a training-free framework to prune redundant tokens in 3D medic…
RESEARCH · CL_107916 · Jun 23 · 12:57

VisCritic framework enhances GUI agents with visual state comparison

Researchers have introduced VisCritic, a novel visual process reward framework designed to enhance the performance of GUI agents. Unlike previous methods that rely solely on textual reasoning, VisCritic directly compare…
RESEARCH · CL_107758 · Jun 23 · 12:46

New RL framework uses vision-language models for GUI agent supervision

Researchers have developed a new reinforcement learning framework for Computer-Use Agents (CUAs) that leverages autonomous vision-language evaluation for supervision. This approach addresses the challenge of obtaining s…
RESEARCH · CL_107924 · Jun 23 · 11:34

P-MTP framework accelerates VLM document parsing with 5x speedup

Researchers have introduced P-MTP, a novel framework designed to significantly accelerate document parsing by Vision-Language Models (VLMs). P-MTP employs Progressive Multi-Token Prediction and a Progressive Curriculum …
RESEARCH · CL_107926 · Jun 23 · 10:59

New EgoSAT benchmark tests vision-language models on egocentric video reasoning

Researchers have introduced EgoSAT, a new benchmark designed to evaluate vision-language models (VLMs) on their ability to understand egocentric video streams. This benchmark unifies various tasks into a single streamin…
