MLLMs

PulseAugur coverage of MLLMs — every cluster mentioning MLLMs across labs, papers, and developer communities, ranked by signal.

Total · 30d

103

103 over 90d

Releases · 30d

0 over 90d

Papers · 30d

103

103 over 90d

TOPICS

paper 103
other 35
model release 34
safety 21
product 13
infra 4
policy 1

RELATIONSHIPS

instance of Multimodal Large Language Models and Tunings: Vision, Language, Sensors, Audio, and Beyond 90%
used by Multimodal Large Language Models and Tunings: Vision, Language, Sensors, Audio, and Beyond 70%
used by train of thought 70%
used by Standard Chinese 70%
used by Chain Of Thought 70%
used by English 60%

TIMELINE

2026-05-22 research_milestone A new pipeline was introduced to enhance MLLMs for safety-critical driving video analysis.
2026-05-22 research_milestone Researchers identify a loss of temporal grounding in multimodal large language models and propose a method to recover it.
2026-05-22 research_milestone A new benchmark and dataset were introduced to evaluate MLLMs' ability to reason about personality beyond superficial cues.
2026-05-21 research_milestone A new method using MLLMs for detecting AI-generated Chinese poetry achieves state-of-the-art results.

SENTIMENT · 30D

18 days with sentiment data

RECENT · PAGE 5/6 · 103 TOTAL

RESEARCH · CL_10110 · Apr 30 · 04:00

ReGATE method accelerates multimodal LLM training by selectively pruning tokens

Researchers have developed ReGATE, a novel method to accelerate the training of multimodal large language models (MLLMs) by adaptively pruning tokens. This technique uses a teacher-student framework where a frozen teach…
RESEARCH · CL_11400 · Apr 30 · 03:59

COHERENCE benchmark evaluates MLLMs' fine-grained image-text alignment in interleaved contexts

Researchers have introduced COHERENCE, a new benchmark designed to assess the fine-grained image-text alignment capabilities of Multimodal Large Language Models (MLLMs). Existing benchmarks often overlook the complexiti…
RESEARCH · CL_09749 · Apr 29 · 12:41

New framework improves MLLMs' accuracy in dial-based measurement reading

Researchers have identified a significant weakness in multimodal large language models (MLLMs) when it comes to reading dial-based measurements. These models struggle with accuracy and are highly sensitive to changes in…
RESEARCH · CL_08517 · Apr 28 · 16:57

SIEVES method boosts multimodal LLM coverage on visual tasks with evidence scoring

Researchers have developed SIEVES, a novel method for improving the reliability of multimodal large language models (MLLMs) in out-of-distribution scenarios. SIEVES works by learning to estimate the quality of visual ev…
RESEARCH · CL_07047 · Apr 28 · 04:00

CrossGuard safeguards multimodal LLMs against implicit and explicit attacks

Researchers have developed CrossGuard, a new defense system designed to protect Multimodal Large Language Models (MLLMs) from sophisticated implicit attacks. These attacks combine seemingly benign text and image inputs …
RESEARCH · CL_07035 · Apr 28 · 04:00

MLLMs tested on reconstructing masked text from visual context with MMTR-Bench

Researchers have developed MMTR-Bench, a new benchmark designed to test the ability of Multimodal Large Language Models (MLLMs) to reconstruct missing text solely from visual context. This benchmark avoids explicit prom…
RESEARCH · CL_06941 · Apr 28 · 04:00

AI system SoccerRef-Agents uses multi-agent reasoning for soccer refereeing

Researchers have introduced SoccerRef-Agents, a multi-agent system designed to automate soccer refereeing with enhanced accuracy and explainability. The framework incorporates a new benchmark dataset, SoccerRefBench, fe…
RESEARCH · CL_06571 · Apr 28 · 04:00

New methods enhance LVLMs for fine-grained visual recognition tasks

Two new research papers propose novel methods for improving Fine-Grained Visual Recognition (FGVR) using Large Vision-Language Models (LVLMs). The first paper introduces SARE, a framework that adaptively applies reasoni…
RESEARCH · CL_06531 · Apr 28 · 04:00

OmniVTG dataset and CoT paradigm enhance open-world video temporal grounding

Researchers have introduced OmniVTG, a large-scale dataset and training paradigm designed to improve open-world Video Temporal Grounding (VTG) for Multimodal Large Language Models (MLLMs). The dataset was created using …
RESEARCH · CL_06419 · Apr 28 · 04:00

New benchmark reveals AI models struggle with ego-motion understanding in driving

Researchers have developed EgoDyn-Bench, a new benchmark designed to evaluate how well vision-centric foundation models understand ego-motion in autonomous driving scenarios. The benchmark reveals a significant 'Percept…
RESEARCH · CL_06400 · Apr 28 · 04:00

PivotMerge framework integrates multimodal LLM alignment capabilities

Researchers have introduced PivotMerge, a novel framework designed to integrate the cross-modal alignment capabilities of different multimodal large language models (MLLMs). This approach addresses challenges in merging…
RESEARCH · CL_06631 · Apr 28 · 01:57

New benchmarks SpecVQA and M3-VQA challenge multimodal LLMs in scientific and multi-hop reasoning

Researchers have introduced M³-VQA, a new benchmark designed to evaluate multimodal large language models (MLLMs) on complex reasoning tasks involving multiple entities and multi-hop inference. The benchmark challeng…
RESEARCH · CL_06263 · Apr 27 · 14:51

MEG-RAG framework improves multimodal evidence selection for LLMs

Researchers have introduced MEG-RAG, a novel framework designed to improve Multimodal Retrieval-Augmented Generation (MRAG) systems. Current MRAG models often struggle to accurately assess the relevance of retrieved mul…
RESEARCH · CL_06208 · Apr 27 · 04:42

MLLMs improve object grounding in crowded scenes using language-guided semantic cues

Researchers have developed a new method to improve the robustness of Multimodal Large Language Models (MLLMs) in challenging visual scenarios like crowded scenes. The approach leverages Language-Guided Semantic Cues (LG…
RESEARCH · CL_06209 · Apr 27 · 04:34

New benchmarks and frameworks tackle AI agent limitations in website generation and remote sensing tasks

Researchers have introduced InteractWeb-Bench, a new benchmark designed to evaluate multimodal large language models (MLLMs) in website generation tasks. This benchmark simulates real-world conditions where user instruc…
RESEARCH · CL_05105 · Apr 27 · 04:00

Researchers develop DecAF for training-free video reasoning segmentation

Researchers have developed Decomposed Attention Fusion (DecAF), a novel method for video reasoning segmentation that operates without requiring model retraining. DecAF refines attention maps generated by multimodal larg…
RESEARCH · CL_06302 · Apr 26 · 17:26

New benchmarks SciMDR and ShredBench evaluate multimodal LLMs on scientific documents and reconstruction

Researchers have introduced ShredBench, a new benchmark designed to evaluate the semantic reasoning abilities of multimodal large language models (MLLMs) in reconstructing documents from shredded fragments. This benchma…
RESEARCH · CL_04920 · Apr 24 · 12:26

New CGC framework boosts multimodal LLMs for fine-grained image understanding

Researchers have introduced Compositional Grounded Contrast (CGC), a new framework designed to enhance the fine-grained multi-image understanding capabilities of Multimodal Large Language Models (MLLMs). This approach a…
RESEARCH · CL_04921 · Apr 24 · 12:20

MLLMs predict mouse social dominance in novel MTT-Bench benchmark

Researchers have developed MTT-Bench, a new benchmark for analyzing mouse social dominance using Multimodal Large Language Models (MLLMs). This framework fine-tunes existing MLLM architectures to predict dominance hiera…
RESEARCH · CL_04980 · Apr 24 · 08:59

New benchmark tests MLLMs' Chinese sign language understanding capabilities

Researchers have developed CNSL-bench, a new benchmark designed to evaluate the sign language understanding capabilities of multimodal large language models (MLLMs). This benchmark is grounded in the official Chinese Na…
