Large Vision Language Models

PulseAugur coverage of Large Vision Language Models: every cluster that mentions them across labs, papers, and developer communities, ranked by signal.

Total · 30d · 35 over 90d
Releases · 30d · 0 over 90d
Papers · 30d · 35 over 90d

SENTIMENT · 30D · 11 days with sentiment data

RECENT · PAGE 1/2 · 35 TOTAL

RESEARCH · CL_111283 · Jun 25 · 15:50

New HarmVideoBench evaluates LVLMs on nuanced harmful video understanding · 2 sources tracked

Researchers have introduced HarmVideoBench, a new benchmark designed to evaluate the harmful video understanding capabilities of large vision-language models (LVLMs). Existing benchmarks often oversimplify harmful conte…

TOOL · CL_105159 · Jun 22 · 11:51

New CFPO framework enhances multimodal reasoning in LVLMs

Researchers have introduced CounterFactual Policy Optimization (CFPO), a new framework designed to improve multimodal reasoning in Large Vision-Language Models (LVLMs). CFPO addresses grounding failures and hallucinatio…

TOOL · CL_100163 · Jun 19 · 04:00

New Med-R2 strategy enhances AI medical report generation accuracy

Researchers have introduced Med-R2, a novel fine-tuning strategy designed to improve automated medical report generation (MRG) using large vision-language models (LVLMs). This approach addresses limitations in current m…

RESEARCH · CL_95864 · Jun 16 · 09:22

New research tackles VLM hallucinations, distillation, and interpretability

Researchers are developing new methods to improve the capabilities and reliability of vision-language models (VLMs). One approach, DCLA, focuses on mitigating hallucinations by ensuring consistency across different laye…

TOOL · CL_93484 · Jun 16 · 04:00

New RL framework enhances LVLM image captioning by minimizing information loss

Researchers have developed a new reinforcement learning framework called Cross-modal Identity Mapping (CIM) to improve image captioning in Large Vision-Language Models (LVLMs). CIM quantifies information loss by measuri…

TOOL · CL_93476 · Jun 16 · 04:00

New MAD-RAG method tackles Attention Distraction in LVLMs

Researchers have identified a new failure mode in retrieval-augmented large vision-language models (LVLMs) called Attention Distraction (AD). This occurs when highly relevant retrieved text globally suppresses visual at…

RESEARCH · CL_95875 · Jun 16 · 03:06

New MODE-RAG system tackles hallucinations in multimodal AI generation

Researchers have introduced MODE-RAG, a novel multi-agent system designed to combat hallucinations and fabrications in Multimodal Retrieval-Augmented Generation (M-RAG) systems. The system utilizes Variational Free Ener…

RESEARCH · CL_93074 · Jun 15 · 13:02

New method tackles vision-language model hallucinations with evidence acquisition

Researchers have developed a new method called Budgeted Conformal Evidence Acquisition (BCEA) to address hallucinations in large vision-language models (LVLMs). Traditional methods that require abstaining from predictio…

RESEARCH · CL_91209 · Jun 12 · 17:54

New CORA method bridges thinking-answer gap in multimodal AI

Researchers have introduced CORA, a new method to address the thinking-answer inconsistency in multimodal large vision-language models (LVLMs). This inconsistency, where the reasoning process does not align semantically…

RESEARCH · CL_79677 · Jun 8 · 12:09

New CapRL++ framework trains better image and video captioning models

Researchers have developed CapRL++, a novel framework for training image and video captioning models using reinforcement learning with verifiable rewards. This approach moves beyond traditional supervised fine-tuning by…

RESEARCH · CL_70556 · Jun 3 · 07:27

New Impostor benchmark dataset challenges AI image manipulation detection

Researchers have introduced Impostor, a new benchmark dataset designed to improve the detection and localization of AI-generated image manipulations. This dataset comprises 100,000 manipulated images created using a clo…

TOOL · CL_68554 · Jun 3 · 04:00

New framework tests LVLMs' visual reasoning vs. factual recall

Researchers have developed a new framework to distinguish between visual interpretation and factual recall in Large Vision-Language Models (LVLMs). Existing evaluations often conflate these two abilities, making it diff…

RESEARCH · CL_68182 · Jun 2 · 13:09

New framework improves AI's understanding of meme intent

Researchers have developed a new framework called "Intent Projection" to improve how Large Vision Language Models (LVLMs) understand the pragmatic meaning behind multimodal content like memes. This approach separates th…

TOOL · CL_66043 · Jun 2 · 04:00

LVLMs can self-improve small object grounding using attention patterns

Researchers have developed a novel framework, ACS-Learned, that leverages the internal attention patterns of Large Vision Language Models (LVLMs) to improve the grounding of small objects without requiring fine-tuning. …

RESEARCH · CL_65730 · Jun 2 · 04:00

New AI defenses and attacks target vision-language models

Researchers have developed new methods to defend against and exploit backdoor attacks in advanced AI models. One approach, BYORn, aims to improve the robustness of large vision-language models by identifying and replaci…

TOOL · CL_63043 · Jun 1 · 04:00

New Immuno-VLM framework boosts vision-language model trustworthiness

Researchers have introduced Immuno-VLM, a novel framework designed to enhance the trustworthiness of large vision-language models in open-world scenarios. This bio-inspired approach utilizes generative semantic antibodi…

RESEARCH · CL_65234 · May 29 · 19:24

New methods enhance uncertainty quantification in large AI models

Researchers are developing new methods to improve uncertainty quantification in large models. One approach, Semantic Gaussian Process Uncertainty (SGPU), analyzes the geometric structure of answer embeddings to estimate…

TOOL · CL_59108 · May 29 · 04:00

Pointing methods boost LVLM counting accuracy via spatial coordinates

A new research paper explores how "pointing-based methods" can enhance the counting abilities of Large Vision-Language Models (LVLMs). These methods involve the model first identifying and generating coordinates for tar…

RESEARCH · CL_56537 · May 27 · 17:01

New framework SeProD boosts LVLM visual search with self-prophetic decoding

Researchers have introduced SeProD, a novel self-prophetic decoding framework designed to enhance the visual search capabilities of Large Vision-Language Models (LVLMs). This framework addresses challenges such as post-…

TOOL · CL_63444 · May 27 · 04:04

Image-tool interaction boosts multimodal AI safety against jailbreaks

A new paper explores the safety implications of the "think-with-image" reasoning paradigm in large vision-language models. Researchers found that systems using explicit image-tool interaction were significantly more rob…
