MLLMs
PulseAugur coverage of MLLMs — every cluster mentioning MLLMs across labs, papers, and developer communities, ranked by signal.
- instance of Multimodal Large Language Models and Tunings: Vision, Language, Sensors, Audio, and Beyond 90%
- used by Multimodal Large Language Models and Tunings: Vision, Language, Sensors, Audio, and Beyond 70%
- used by train of thought 70%
- used by Standard Chinese 70%
- used by Chain Of Thought 70%
- used by English 60%
- 2026-05-22 research_milestone A new pipeline was introduced to enhance MLLMs for safety-critical driving video analysis.
- 2026-05-22 research_milestone Researchers identify a loss of temporal grounding in multimodal large language models and propose a method to recover it.
- 2026-05-22 research_milestone A new benchmark and dataset were introduced to evaluate MLLMs' ability to reason about personality beyond superficial cues.
- 2026-05-21 research_milestone A new method using MLLMs for detecting AI-generated Chinese poetry achieves state-of-the-art results.
- ReGATE method accelerates multimodal LLM training by selectively pruning tokens
Researchers have developed ReGATE, a novel method to accelerate the training of multimodal large language models (MLLMs) by adaptively pruning tokens. This technique uses a teacher-student framework where a frozen teach…
- COHERENCE benchmark evaluates MLLMs' fine-grained image-text alignment in interleaved contexts
Researchers have introduced COHERENCE, a new benchmark designed to assess the fine-grained image-text alignment capabilities of Multimodal Large Language Models (MLLMs). Existing benchmarks often overlook the complexiti…
- New framework improves MLLMs' accuracy in dial-based measurement reading
Researchers have identified a significant weakness in how multimodal large language models (MLLMs) read dial-based measurements. These models struggle with accuracy and are highly sensitive to changes in…
- SIEVES method boosts multimodal LLM coverage on visual tasks with evidence scoring
Researchers have developed SIEVES, a novel method for improving the reliability of multimodal large language models (MLLMs) in out-of-distribution scenarios. SIEVES works by learning to estimate the quality of visual ev…
- CrossGuard safeguards multimodal LLMs against implicit and explicit attacks
Researchers have developed CrossGuard, a new defense system designed to protect Multimodal Large Language Models (MLLMs) from sophisticated implicit attacks. These attacks combine seemingly benign text and image inputs …
- MLLMs tested on reconstructing masked text from visual context with MMTR-Bench
Researchers have developed MMTR-Bench, a new benchmark designed to test the ability of Multimodal Large Language Models (MLLMs) to reconstruct missing text solely from visual context. This benchmark avoids explicit prom…
- AI system SoccerRef-Agents uses multi-agent reasoning for soccer refereeing
Researchers have introduced SoccerRef-Agents, a multi-agent system designed to automate soccer refereeing with enhanced accuracy and explainability. The framework incorporates a new benchmark dataset, SoccerRefBench, fe…
- New methods enhance LVLMs for fine-grained visual recognition tasks
Two new research papers propose novel methods for improving Fine-Grained Visual Recognition (FGVR) using Large Vision-Language Models (LVLMs). The first paper introduces SARE, a framework that adaptively applies reasoni…
- OmniVTG dataset and CoT paradigm enhance open-world video temporal grounding
Researchers have introduced OmniVTG, a large-scale dataset and training paradigm designed to improve open-world Video Temporal Grounding (VTG) for Multimodal Large Language Models (MLLMs). The dataset was created using …
- New benchmark reveals AI models struggle with ego-motion understanding in driving
Researchers have developed EgoDyn-Bench, a new benchmark designed to evaluate how well vision-centric foundation models understand ego-motion in autonomous driving scenarios. The benchmark reveals a significant 'Percept…
- PivotMerge framework integrates multimodal LLM alignment capabilities
Researchers have introduced PivotMerge, a novel framework designed to integrate the cross-modal alignment capabilities of different multimodal large language models (MLLMs). This approach addresses challenges in merging…
- New benchmarks SpecVQA and M3-VQA challenge multimodal LLMs in scientific and multi-hop reasoning
Researchers have introduced M3-VQA, a new benchmark designed to evaluate multimodal large language models (MLLMs) on complex reasoning tasks involving multiple entities and multi-hop inference. The benchmark challeng…
- MEG-RAG framework improves multimodal evidence selection for LLMs
Researchers have introduced MEG-RAG, a novel framework designed to improve Multimodal Retrieval-Augmented Generation (MRAG) systems. Current MRAG models often struggle to accurately assess the relevance of retrieved mul…
- MLLMs improve object grounding in crowded scenes using language-guided semantic cues
Researchers have developed a new method to improve the robustness of Multimodal Large Language Models (MLLMs) in challenging visual scenarios like crowded scenes. The approach leverages Language-Guided Semantic Cues (LG…
- New benchmarks and frameworks tackle AI agent limitations in website generation and remote sensing tasks
Researchers have introduced InteractWeb-Bench, a new benchmark designed to evaluate multimodal large language models (MLLMs) in website generation tasks. This benchmark simulates real-world conditions where user instruc…
- Researchers develop DecAF for training-free video reasoning segmentation
Researchers have developed Decomposed Attention Fusion (DecAF), a novel method for video reasoning segmentation that operates without requiring model retraining. DecAF refines attention maps generated by multimodal larg…
- New benchmarks SciMDR and ShredBench evaluate multimodal LLMs on scientific documents and reconstruction
Researchers have introduced ShredBench, a new benchmark designed to evaluate the semantic reasoning abilities of multimodal large language models (MLLMs) in reconstructing documents from shredded fragments. This benchma…
- New CGC framework boosts multimodal LLMs for fine-grained image understanding
Researchers have introduced Compositional Grounded Contrast (CGC), a new framework designed to enhance the fine-grained multi-image understanding capabilities of Multimodal Large Language Models (MLLMs). This approach a…
- MLLMs predict mouse social dominance in novel MTT-Bench benchmark
Researchers have developed MTT-Bench, a new benchmark for analyzing mouse social dominance using Multimodal Large Language Models (MLLMs). This framework fine-tunes existing MLLM architectures to predict dominance hiera…
- New benchmark tests MLLMs' Chinese sign language understanding capabilities
Researchers have developed CNSL-bench, a new benchmark designed to evaluate the sign language understanding capabilities of multimodal large language models (MLLMs). This benchmark is grounded in the official Chinese Na…