Entity: vision-language model

vision-language model

PulseAugur coverage of vision-language model — every cluster mentioning vision-language model across labs, papers, and developer communities, ranked by signal.


Total · 30 days

110

Within 90 days: 110

Releases · 30 days

Within 90 days: 0

Papers · 30 days

106

Within 90 days: 106

Tier distribution · 90 days

significant 1
research 42
tool 65
commentary 2

Relations

instance of Vision Language Models 90%
instance of MLLMs 90%
used by VSI-Bench 70%
used by foundation model 70%
instance of foundation model 70%
instance of Multimodal Large Language Models and Tunings: Vision, Language, Sensors, Audio, and Beyond 70%

Timeline

2026-05-19 research_milestone A new method is proposed to improve out-of-distribution visual document understanding in VLMs.

Sentiment · 30 days

16 days with sentiment data

Recent · Page 3 of 6 · 110 items total

TOOL · CL_37996 · May 18 · 04:00

New framework exposes counting bias in Vision-Language Models

Researchers have developed CounterCount, a new framework designed to diagnose counting biases in Vision-Language Models (VLMs). The framework uses paired factual and counterfactual images to test whether VLMs rely on vi…

TOOL · CL_38011 · May 18 · 01:10

GraSP-VL method unlocks semantic granularity in vision-language embeddings

Researchers have developed GraSP-VL, a method to better utilize frozen vision-language model (VLM) embeddings by treating their length as a semantic interface. This approach learns a shared prefix transform that allows …

RESEARCH · CL_43941 · May 16 · 16:15

New benchmarks VGenST-Bench and CaST-Bench target MLLM spatio-temporal reasoning

Researchers have introduced two new benchmarks, VGenST-Bench and CaST-Bench, designed to more rigorously evaluate the spatio-temporal reasoning capabilities of Multimodal Large Language Models (MLLMs) and Vision-Languag…

TOOL · CL_36541 · May 15 · 14:42

New rubric assesses VLM adaptivity in math education

Researchers have developed a new rubric to assess the adaptivity of Vision Language Models (VLMs) in mathematics education. The rubric evaluates VLMs based on cognitive and motivational aspects, as well as response corr…

TOOL · CL_36046 · May 15 · 14:27

New framework unifies CT image analysis with language-guided reasoning

Researchers have developed a unified framework that integrates language-guided visual reasoning for CT image interpretation. This autoregressive model uses task-routing tokens to trigger detection and segmentation heads…

TOOL · CL_36058 · May 15 · 11:54

DepthVLM enables vision-language models to predict dense depth maps

Researchers have developed DepthVLM, a new framework that enables Vision-Language Models (VLMs) to predict dense metric depth maps from single images. Unlike previous methods that relied on external models or inefficien…

TOOL · CL_36564 · May 15 · 02:04

DeltaPrompts boosts VLM reasoning by targeting model capability gaps

Researchers have introduced DeltaPrompts, a new method to improve the distillation of knowledge into smaller Vision-Language Models (VLMs). They identified that many existing prompts provide minimal learning signals bec…

TOOL · CL_33402 · May 14 · 03:22

ICED framework enables concept-level unlearning in Vision-Language Models

Researchers have developed a new machine unlearning framework called ICED for Vision-Language Models (VLMs). This method allows for the precise removal of specific concepts from a VLM's knowledge without impacting unrel…

TOOL · CL_31314 · May 13 · 16:54

RoboEvolve framework boosts robotic manipulation with co-evolving AI

Researchers have developed RoboEvolve, a new framework designed to improve robotic manipulation capabilities by addressing the scarcity of training data. This system co-evolves a vision-language model planner with a vid…

COMMENTARY · CL_29648 · May 13 · 06:30

AI transforms robotics, journalism, and environmental monitoring

A new survey highlights the impact of vision-language models on industrial robotics, reporting a 90% task success rate in human-robot collaboration. Separately, Al Jazeera is partnering with Google Cloud to …

TOOL · CL_29263 · May 12 · 15:07

New benchmark reveals VLMs struggle with high-res Earth observation details

Researchers have introduced UHR-Micro, a new benchmark designed to evaluate Vision-Language Models (VLMs) on their ability to perceive small, critical details within ultra-high-resolution Earth observation imagery. Curr…

TOOL · CL_28149 · May 12 · 09:15

Fine-tuning VLMs hinges on strategic choices, not just training

This article argues that fine-tuning a vision-language model (VLM) is less about the technical training process and more about strategic decisions made beforehand. The author highlights four key choices that significant…

TOOL · CL_27973 · May 11 · 17:32

New model HieraCount improves object counting with multi-grained approach

Researchers have introduced a new framework for open-world object counting, addressing the brittleness of current vision-language models in accurately identifying and counting objects based on user intent. They propose …

TOOL · CL_28312 · May 11 · 17:02

New framework boosts VLM chart understanding with counterfactual data

Researchers have developed ChartCF, a new framework to improve the data efficiency of vision-language models (VLMs) used for chart understanding. This method leverages counterfactual data synthesis, where small code-con…

TOOL · CL_27979 · May 11 · 17:00

Medical VQA self-verification unreliable, study finds

A new research paper introduces a diagnostic framework to expose the unreliability of self-verification in medical visual question answering (VQA) systems. The study argues that current self-verific…

RESEARCH · CL_27989 · May 11 · 15:59

New UJEM-KL attack bypasses VLM safety measures with entropy maximization

Researchers have developed a new method called Untargeted Jailbreak via Entropy Maximization (UJEM-KL) to bypass safety measures in vision-language models (VLMs). This technique focuses on manipulating high-entropy toke…

TOOL · CL_27992 · May 11 · 15:54

TINS method enhances OOD detection in vision-language models

Researchers have developed TINS, a novel method for Out-of-Distribution (OOD) detection in vision-language models. TINS addresses limitations of static negative labels by learning dynamic negative semantics during test-…

TOOL · CL_28024 · May 11 · 11:47

New AI method simplifies images while keeping them photorealistic

Researchers have developed a new framework for simplifying images while maintaining photorealism, moving beyond traditional non-photorealistic rendering techniques. Their method iteratively removes and inpaints elements…

TOOL · CL_28030 · May 11 · 11:20

New SleepWalk benchmark tests AI's 3D navigation and instruction grounding

Researchers have introduced SleepWalk, a new benchmark designed to rigorously test instruction-guided vision-language navigation capabilities of AI models. This benchmark focuses on localized, interaction-centric embodi…

RESEARCH · CL_26359 · May 11 · 10:12

GPT-5 Mini leads Agentick benchmark, but no agent paradigm dominates

The new Agentick benchmark, which assesses various AI agents across 37 tasks, shows GPT-5 Mini achieving the top score of 0.309. However, no single agent paradigm, including reinforcement learning, LLM, VLM, or hybrid a…
