Vision--Language Models

PulseAugur coverage of Vision--Language Models — every cluster mentioning Vision--Language Models across labs, papers, and developer communities, ranked by signal.

Total · 30d

84 over 90d

Releases · 30d

0 over 90d

Papers · 30d

82 over 90d

TIER MIX · 90D

research 45
tool 38
commentary 1

SENTIMENT · 30D

20 day(s) with sentiment data

RECENT · PAGE 1/5 · 84 TOTAL

TOOL · CL_112872 · Jun 26 · 20:30

Databricks enables searchable video intelligence with VLMs and GPUs

Databricks has developed a new approach to video analysis, treating it as a data engineering problem to make video content searchable and actionable. Their system utilizes Vision Language Models (VLMs) and serverless GP…

TOOL · CL_110055 · Jun 25 · 04:00

New AMVICC benchmark reveals shared failure modes in vision-language and image generation models

Researchers have developed AMVICC, a new benchmark designed to identify and profile failure modes in vision-language models (VLMs) and image generation models (IGMs). The benchmark systematically compares how these mode…

TOOL · CL_109912 · Jun 25 · 04:00

SPARC framework decouples VLM perception and reasoning for enhanced scaling

Researchers have developed SPARC, a novel framework designed to enhance the performance and scalability of vision-language models (VLMs). SPARC separates visual perception from reasoning, allowing for dynamic scaling of…

RESEARCH · CL_109666 · Jun 24 · 04:08

New benchmark audits VLM robustness in synthetic medical image detection

A new research paper introduces a benchmark for evaluating the multimodal robustness of vision-language models (VLMs) in detecting synthetic medical images. The study highlights a vulnerability where VLMs may incorrectl…

TOOL · CL_108130 · Jun 24 · 04:00

New REALM benchmark unifies VLM red-teaming for physical-world safety

Researchers have introduced REALM, a novel benchmark designed to evaluate the vulnerabilities of physical-world Vision-Language Models (VLMs). This benchmark unifies 12 red-teaming methods, 3 defenses, and 13 VLMs under…

TOOL · CL_107981 · Jun 24 · 04:00

New PV-TAM method improves vision-language model evaluation

Researchers have developed a new method called Prompt-Vision Token Activation Map (PV-TAM) to more accurately assess the vision-language consistency in large visual-language models (VLMs). Traditional methods often rely…

RESEARCH · CL_107839 · Jun 23 · 09:43

New OVBS framework enhances autonomous driving perception with VLMs

Researchers have developed OVBEVSeg, a novel framework for open-vocabulary Bird's-Eye View (BEV) segmentation in autonomous driving. This system leverages vision-language models (VLMs) to recognize objects beyond its tr…

RESEARCH · CL_107930 · Jun 23 · 09:20

New VLM evaluation method reveals poor evidence use in large models

A new research paper introduces "Ill-Posed by Design," a novel method for evaluating how Vision-Language Models (VLMs) utilize evidence. The study proposes using monocular metric object-size estimation as an ill-posed t…

RESEARCH · CL_104739 · Jun 20 · 16:53

New benchmarks tackle hallucination in GI endoscopy AI models

Researchers have developed new benchmarks and datasets to address hallucination issues in vision-language models (VLMs) used for gastrointestinal endoscopy. One study introduces a benchmark using the Gut-VLM dataset to …

RESEARCH · CL_99768 · Jun 18 · 17:59

TimeProVe framework enhances long video temporal reasoning with efficient verification

Researchers have developed TimeProVe, a novel framework designed to improve the efficiency of temporal reasoning in long videos. This approach uses lightweight modules to propose potential answers and evidence, only eng…

RESEARCH · CL_99577 · Jun 18 · 13:56

New SPOT-E method enhances frozen vision-language models with visual spotlights

Researchers have developed SPOT-E, a novel test-time method designed to improve the performance of frozen vision-language models (VLMs) on evidence-intensive tasks. SPOT-E addresses the issue of VLMs overlooking crucial…

RESEARCH · CL_99806 · Jun 18 · 09:24

New framework uses vision-language models for occlusion removal in light fields

Researchers have developed a novel framework for occlusion removal in light fields, combining light field integration (LFI) with vision-language models (VLMs). This approach first uses LFI to enhance visibility by suppr…

TOOL · CL_105978 · Jun 18 · 00:00

Humans and VLMs show similar driving generalization across cities

A new research paper, "Robusto-2: Benchmarking Humans & VLMs for Autonomous Driving in Lima & New York City," investigates how well visual language models (VLMs) and human drivers generalize to new geographic locations …

RESEARCH · CL_104715 · Jun 18 · 00:00

New methods enhance AI model adaptation robustness against adversarial attacks and data shifts · 6 sources tracked

Researchers have developed new methods to improve the robustness of test-time adaptation (TTA) for machine learning models, particularly in scenarios with adversarial attacks and evolving data distributions. One approac…

RESEARCH · CL_99778 · Jun 18 · 00:00

S-Agent framework enhances VLMs for 3D spatial reasoning · 4 sources tracked

Researchers have introduced S-Agent, a novel framework designed to enhance visual language models (VLMs) for spatial reasoning in 3D environments. S-Agent integrates temporal memory and a hierarchy of spatial tools to e…

RESEARCH · CL_97665 · Jun 17 · 04:00

New AI Agent and Dataset Enhance Landslide Analysis

Researchers have developed LandslideAgent, an AI framework designed for autonomous landslide identification and analysis. This system utilizes LandslideBench, a new multimodal dataset, and LandslideVLM, a specialized vi…

RESEARCH · CL_97670 · Jun 17 · 01:26

New APT method enhances VLM understanding of physical causality in videos

Researchers have introduced Atomic Physical Transitions (APTs) as a novel method for improving causal video-language understanding in Vision-Language Models (VLMs). Current VLMs struggle to grasp the underlying physics…

TOOL · CL_93978 · Jun 16 · 04:00

New framework Uni-Plan uses multimodal models for enhanced AI decision-making

Researchers have introduced Uni-Plan, a novel planning framework that leverages unified multimodal models (UMMs) for enhanced decision-making. Unlike previous methods that rely solely on language-based reasoning, Uni-Pl…

TOOL · CL_93916 · Jun 16 · 04:00

New Transformer Model Enhances 3D Scene Graph Generation

Researchers have developed SGFormer++, a novel Semantic Graph Transformer designed for incremental 3D scene graph generation. This model utilizes Transformer layers for global message passing, overcoming limitations of …

TOOL · CL_93150 · Jun 16 · 04:00

New STRIDE framework enhances LLM reasoning with verifiable rewards

Researchers have introduced STRIDE, a novel framework for Reinforcement Learning with Verifiable Rewards (RLVR) designed to enhance the reasoning capabilities of large language models. Unlike previous methods that rely …