Visual Language Models

PulseAugur coverage of Visual Language Models — every cluster mentioning Visual Language Models across labs, papers, and developer communities, ranked by signal.

Total · 14 over 90d

Releases · 0 over 90d

Papers · 14 over 90d

Sentiment · 7 days with sentiment data over 30d

RECENT · PAGE 1/1 · 14 TOTAL

TOOL · CL_121136 · Jun 30 · 20:35

New research reveals critical flaws in AI visual question-answering benchmarks

A new paper published on arXiv details significant issues with current Knowledge-Based Visual Question Answering (KB-VQA) benchmarks. The research highlights that common evaluation metrics, such as answer accuracy, are …
RESEARCH · CL_117880 · Jun 30 · 04:00

New benchmarks TSHA and CAREBench reveal LLM safety gaps

Two new benchmarks have been released to evaluate the safety capabilities of language models. TSHA focuses on assessing visual language models' ability to identify safety hazards in real-world indoor environments, using…
TOOL · CL_109908 · Jun 25 · 04:00

New benchmarks and tuning data improve VLM privacy awareness

Researchers have developed new methods to enhance the privacy awareness of Visual Language Models (VLMs). They introduced two benchmarks, PrivBench and PrivBench-H, designed to evaluate VLMs' understanding of visual pri…
RESEARCH · CL_99778 · Jun 18 · 00:00

S-Agent framework enhances VLMs for 3D spatial reasoning · 4 sources tracked

Researchers have introduced S-Agent, a novel framework designed to enhance visual language models (VLMs) for spatial reasoning in 3D environments. S-Agent integrates temporal memory and a hierarchy of spatial tools to e…
TOOL · CL_91498 · Jun 15 · 04:00

VLMs benchmarked for textile sorting, Qwen leads accuracy

Researchers have developed a digital twin-driven robotic system for automated textile sorting, integrating visual language models (VLMs) for classification and foreign object detection. The system was benchmarked using …
RESEARCH · CL_86880 · Jun 11 · 08:21

SeamEdit pipeline enables black-box VLM image editing

Researchers have introduced SeamEdit, a novel pipeline designed for semantic editing of large images using Visual-Language Models (VLMs). This training-free, model-agnostic approach treats VLMs as black-box oracles, add…
TOOL · CL_72812 · Jun 5 · 04:00

FUSAR-GPT advances SAR image interpretation with spatiotemporal features

Researchers have developed FUSAR-GPT, a novel Visual Language Model (VLM) specifically designed for Synthetic Aperture Radar (SAR) imagery. This model addresses the limitations of existing VLMs in interpreting SAR data …
RESEARCH · CL_76815 · Jun 4 · 22:19

AI Research Tackles Hallucinations in Medical Imaging and Document Analysis

Multiple research papers explore methods for detecting and mitigating hallucinations in AI systems, particularly in safety-critical applications like medical imaging and document analysis. One study proposes a cross-mod…
RESEARCH · CL_65287 · Jun 2 · 04:00

New dataset reveals foundation models struggle with Newtonian physics

Researchers have introduced NewtPhys, a new dataset designed to evaluate how well foundation models understand Newtonian physics. This dataset uses real-world scenes with physics-grounded simulations and provides detail…
TOOL · CL_59106 · May 29 · 04:00

New VLM evaluation tackles complex Ancient Greek text recognition

Researchers have developed new resources and evaluated existing visual language models (VLMs) for the complex task of text recognition in Ancient Greek critical editions. These historical texts feature intricate layout …
RESEARCH · CL_48261 · May 22 · 13:41

New DDX-TRACE benchmark evaluates VLM medical diagnostic trajectories

Researchers have introduced DDX-TRACE, a new benchmark designed to evaluate the diagnostic reasoning capabilities of Visual Language Models (VLMs) in medical contexts. Unlike existing benchmarks that focus solely on fin…
TOOL · CL_36087 · May 15 · 06:59

New VCG-Bench benchmark targets VLM diagram generation and editing

Researchers have introduced VCG-Bench, a new benchmark designed to evaluate Visual-Language Models (VLMs) on structured diagram generation and editing tasks. Current VLMs struggle with these professional workflows, ofte…
TOOL · CL_32692 · May 14 · 15:03

New framework boosts visual-language models for procedural tasks

Researchers have introduced a new framework called Chain-of-Procedure (CoP) to enhance visual-language models' ability to answer questions about procedural tasks. This framework addresses limitations in current models b…
RESEARCH · CL_08217 · Apr 28 · 06:02

New algorithm refines VLM supervision for speech-preserving facial expression manipulation

Researchers have developed a new algorithm called Personalized Cross-Modal Emotional Correlation Learning (PCMECL) to improve speech-preserving facial expression manipulation. This method addresses the challenge of limi…
