MLLMs

PulseAugur coverage of MLLMs (multimodal large language models): every cluster mentioning MLLMs across labs, papers, and developer communities, ranked by signal.

Total (90d): 96

Releases (90d): 0

Papers (90d): 96

TOPICS

paper 96
other 35
model release 31
safety 18
product 12
infra 4

RELATIONSHIPS

instance of Multimodal Large Language Models and Tunings: Vision, Language, Sensors, Audio, and Beyond 90%
used by Multimodal Large Language Models and Tunings: Vision, Language, Sensors, Audio, and Beyond 70%
used by train of thought 70%
used by Standard Chinese 70%
used by Chain Of Thought 70%

TIMELINE

2026-05-22 research_milestone A new pipeline was introduced to enhance MLLMs for safety-critical driving video analysis.
2026-05-22 research_milestone Researchers reveal a loss of temporal grounding in multimodal large language models and propose a method to recover it.
2026-05-22 research_milestone A new benchmark and dataset were introduced to evaluate MLLMs' ability to reason about personality beyond superficial cues.
2026-05-21 research_milestone A new method using MLLMs for detecting AI-generated Chinese poetry achieves state-of-the-art results.

SENTIMENT · 30D

18 days with sentiment data

RECENT · 96 TOTAL

TOOL · CL_38243 · May 18 · 16:31

New CrossView Suite enhances multimodal models' spatial reasoning

Researchers have introduced the CrossView Suite, a comprehensive framework designed to enhance the spatial reasoning capabilities of multimodal large language models (MLLMs). This suite addresses limitations in cross-vi…
RESEARCH · CL_37979 · May 18 · 07:09

New image tokenization methods boost MLLM performance

Two new research papers propose novel methods for tokenizing images to improve multimodal large language models (MLLMs). The first paper, VFMTok, uses a frozen vision foundation model as a tokenizer, achieving significa…
RESEARCH · CL_43941 · May 16 · 16:15

New architectures enable real-time video understanding

Researchers are developing new methods for real-time video understanding, moving beyond traditional offline analysis. Several papers propose architectures that decouple visual perception from language generation to impr…
TOOL · CL_36926 · May 12 · 17:11

New benchmark reveals MLLMs struggle with spatial reasoning

Researchers have developed PCSR-Bench, a new benchmark designed to evaluate the spatial reasoning capabilities of Multimodal Large Language Models (MLLMs) when processing omnidirectional images. The benchmark, comprisin…
TOOL · CL_27571 · May 11 · 01:59

New benchmark EgoMemReason tests AI memory in week-long videos

Researchers have introduced EgoMemReason, a new benchmark designed to test the memory capabilities of multimodal large language models (MLLMs) and agentic frameworks in understanding long-horizon egocentric videos. The …
TOOL · CL_22498 · May 8 · 04:00

New metric evaluates MLLMs for logical consistency without annotations

Researchers have introduced a new metric, VL-LCM, to evaluate the logical consistency of multimodal large language models (MLLMs) without requiring ground-truth annotations. This metric assesses the cause-effect reasoni…
RESEARCH · CL_22492 · May 8 · 04:00

AI research highlights challenges in cross-cultural and non-English language model development

Two new research papers highlight challenges in developing AI for non-English languages and cultures. One paper reflects on two decades of building Arabic NLP resources, concluding that social and institutional factors …
TOOL · CL_22465 · May 8 · 04:00

New research reveals MLLM jailbreaks exploit reconstruction-concealment tradeoff

Researchers have identified a critical tradeoff in multimodal large language models (MLLMs) related to how harmful queries are concealed and reconstructed. They found that existing methods for transforming harmful input…
TOOL · CL_22437 · May 8 · 04:00

Visual Para-Thinker introduces parallel reasoning to multimodal LLMs

Researchers have introduced Visual Para-Thinker, a novel framework for parallel reasoning in multimodal large language models (MLLMs). This approach shifts from vertical scaling of reasoning depth to a parallel strategy…
TOOL · CL_22420 · May 8 · 04:00

New SOW method uses MLLMs to improve image generation coherence

Researchers have introduced Selective One-Way Diffusion (SOW), a novel approach to image generation that reframes diffusion models for improved contextual coherence. SOW utilizes Multimodal Large Language Models (MLLMs)…
TOOL · CL_22405 · May 8 · 04:00

MLLMs enable training-free dense hand contact estimation, outperforming supervised methods

Researchers have developed ContactPrompt, a novel training-free method for dense hand contact estimation that utilizes multimodal large language models (MLLMs). This approach addresses challenges in encoding 3D hand ge…
RESEARCH · CL_21787 · May 7 · 16:37

New MedHorizon benchmark tests AI's ability to understand long medical videos

Researchers have introduced MedHorizon, a new benchmark designed to test multimodal large language models (MLLMs) on understanding long-form medical videos. This benchmark includes 759 hours of clinical procedures and 1…
TOOL · CL_20778 · May 7 · 04:00

Vision-EKIPL framework boosts MLLM visual reasoning with external knowledge infusion

Researchers have introduced Vision-EKIPL, a novel reinforcement learning framework designed to enhance visual reasoning in Multimodal Large Language Models (MLLMs). This approach incorporates high-quality actions genera…
TOOL · CL_18628 · May 6 · 04:00

New MSEarth benchmark uses MLLMs for Earth science discovery

Researchers have developed MSEarth, a new multimodal benchmark designed to evaluate the capabilities of multimodal large language models (MLLMs) in Earth science reasoning. This dataset comprises over 289,000 figures wi…
RESEARCH · CL_18678 · May 5 · 14:18

New VQA methods enhance explainability and knowledge integration for multimodal LLMs

Researchers have developed CoExVQA, a new framework for Document Visual Question Answering (DocVQA) that enhances explainability by breaking down the reasoning process. This method first identifies relevant evidence, th…
RESEARCH · CL_18700 · May 5 · 04:14

MLLMs show promise in analyzing seizure movements, outperforming traditional models

A pilot study explored the use of multimodal large language models (MLLMs) for analyzing pathological movements in seizure videos. The research found that MLLMs, without specific training, outperformed traditional compu…
RESEARCH · CL_21948 · May 5 · 04:00

New AI unlearning methods balance data removal with model utility

Researchers have developed new methods for machine unlearning, a process that removes specific data from AI models without full retraining. One approach, SHRED, uses self-distillation and logit demotion to identify and …
TOOL · CL_15945 · May 5 · 04:00

New In-Prompt Process Supervision framework enhances MLLMs for video moderation

Researchers have developed a new framework called IPS (In-Prompt Process Supervision) to enhance the accuracy of multimodal large language models (MLLMs) in content moderation for short videos. This method incorporates …
TOOL · CL_15707 · May 5 · 04:00

Researchers use RL to improve MLLM regression on imbalanced data

Researchers have developed a new framework to improve how multimodal large language models (MLLMs) handle numerical regression tasks, particularly those with imbalanced data distributions. Existing training methods ofte…
RESEARCH · CL_15670 · May 5 · 04:00

New HERMES and DSCache methods improve streaming video understanding with KV cache

Researchers have developed new methods to improve the efficiency of multimodal large language models (MLLMs) for understanding streaming video. One approach, HERMES, conceptualizes the KV cache as a hierarchical memory …
