ENTITY MLLMs

PulseAugur coverage of MLLMs — every cluster mentioning MLLMs across labs, papers, and developer communities, ranked by signal.

Total · 30d: 103 (103 over 90d)
Releases · 30d: 0 over 90d
Papers · 30d: 103 (103 over 90d)

TOPICS

paper 103
other 35
model release 34
safety 21
product 13
infra 4
policy 1

RELATIONSHIPS

instance of Multimodal Large Language Models and Tunings: Vision, Language, Sensors, Audio, and Beyond 90%
used by Multimodal Large Language Models and Tunings: Vision, Language, Sensors, Audio, and Beyond 70%
used by train of thought 70%
used by Standard Chinese 70%
used by Chain Of Thought 70%
used by English 60%

TIMELINE

2026-05-22 research_milestone A new pipeline was introduced to enhance MLLMs for safety-critical driving video analysis.
2026-05-22 research_milestone Researchers reveal and propose a method to recover temporal grounding in multimodal large language models.
2026-05-22 research_milestone A new benchmark and dataset were introduced to evaluate MLLMs' ability to reason about personality beyond superficial cues.
2026-05-21 research_milestone A new method using MLLMs for detecting AI-generated Chinese poetry achieves state-of-the-art results.

SENTIMENT · 30D

18 day(s) with sentiment data

RECENT · PAGE 4/6 · 103 TOTAL

TOOL · CL_18628 · May 6 · 04:00

New MSEarth benchmark uses MLLMs for Earth science discovery

Researchers have developed MSEarth, a new multimodal benchmark designed to evaluate the capabilities of multimodal large language models (MLLMs) in Earth science reasoning. This dataset comprises over 289,000 figures wi…
RESEARCH · CL_18678 · May 5 · 14:18

New VQA methods enhance explainability and knowledge integration for multimodal LLMs

Researchers have developed CoExVQA, a new framework for Document Visual Question Answering (DocVQA) that enhances explainability by breaking down the reasoning process. This method first identifies relevant evidence, th…
RESEARCH · CL_18700 · May 5 · 04:14

MLLMs show promise in analyzing seizure movements, outperforming traditional models

A pilot study explored the use of multimodal large language models (MLLMs) for analyzing pathological movements in seizure videos. The research found that MLLMs, without specific training, outperformed traditional compu…
RESEARCH · CL_21948 · May 5 · 04:00

New AI unlearning methods balance data removal with model utility

Researchers have developed new methods for machine unlearning, a process that removes specific data from AI models without full retraining. One approach, SHRED, uses self-distillation and logit demotion to identify and …
TOOL · CL_15945 · May 5 · 04:00

New In-Prompt Process Supervision framework enhances MLLMs for video moderation

Researchers have developed a new framework called IPS (In-Prompt Process Supervision) to enhance the accuracy of multimodal large language models (MLLMs) in content moderation for short videos. This method incorporates …
TOOL · CL_15707 · May 5 · 04:00

Researchers use RL to improve MLLM regression on imbalanced data

Researchers have developed a new framework to improve how multimodal large language models (MLLMs) handle numerical regression tasks, particularly those with imbalanced data distributions. Existing training methods ofte…
RESEARCH · CL_15670 · May 5 · 04:00

New HERMES and DSCache methods improve streaming video understanding with KV cache

Researchers have developed new methods to improve the efficiency of multimodal large language models (MLLMs) for understanding streaming video. One approach, HERMES, conceptualizes the KV cache as a hierarchical memory …
TOOL · CL_15615 · May 5 · 04:00

VideoThinker framework improves lightweight MLLMs' video reasoning via causal debiasing

Researchers have developed VideoThinker, a novel framework designed to enhance the reasoning capabilities of lightweight multimodal language models (MLLMs) in video analysis. This approach addresses the issue of percept…
RESEARCH · CL_15728 · May 4 · 15:36

MLLMs show foundational visual gaps despite progress in multimodal reasoning

A new paper introduces a method to improve latent reasoning in multimodal large language models (MLLMs) by optimizing visual latents at inference time, addressing a pathology where their contribution is suppressed. Sepa…
RESEARCH · CL_15514 · May 4 · 14:14

New benchmark and models advance generalized moment retrieval in videos

Researchers have introduced Generalized Moment Retrieval (GMR), a new framework for video analysis that moves beyond the assumption of a single matching moment per query. This approach aims to retrieve all relevant temp…
RESEARCH · CL_14485 · May 4 · 04:00

MLLMs struggle with Chinese short-video misinformation, Gemini-2.5-Pro leads

Researchers have developed a new framework to evaluate how well multimodal large language models (MLLMs) can identify misinformation in Chinese short videos. The study used a dataset of 200 videos annotated for dece…
RESEARCH · CL_14374 · May 4 · 04:00

New AI models tackle complex chart reasoning and generation challenges

Researchers have developed new frameworks and benchmarks to improve how multimodal large language models (MLLMs) reason across complex visual data, such as charts. One approach, HierVA, uses a hierarchical agent to mana…
RESEARCH · CL_14367 · May 4 · 04:00

VideoDetective framework enhances long video understanding for MLLMs

Researchers have introduced VideoDetective, a novel framework designed to enhance the understanding of long videos by multimodal large language models (MLLMs). This approach addresses the challenge of limited context wi…
RESEARCH · CL_14362 · May 4 · 04:00

GeoThinker framework actively integrates geometry for advanced spatial reasoning

Researchers have developed GeoThinker, a novel framework that enhances spatial reasoning in multimodal large language models (MLLMs) by actively integrating geometric information. Unlike previous passive fusion methods,…
RESEARCH · CL_14352 · May 4 · 04:00

FreeRet framework turns multimodal LLMs into training-free retrievers

Researchers have developed FreeRet, a novel framework that enables multimodal large language models (MLLMs) to function as effective retrievers without requiring additional training. This plug-and-play system extracts s…
RESEARCH · CL_11849 · May 1 · 04:00

GuideDog dataset aids blind and low-vision navigation with egocentric multimodal data

Researchers have introduced GuideDog, a new dataset designed to aid the development of multimodal large language models (MLLMs) for blind and low-vision (BLV) individuals. The dataset comprises 22,000 image-description …
RESEARCH · CL_11777 · May 1 · 04:00

New benchmark tackles visual-semantic knowledge conflicts in surgical AI

Researchers have introduced OR-VSKC, a new benchmark designed to address visual-semantic knowledge conflicts in multimodal large language models (MLLMs) within operating room settings. The benchmark utilizes 28,190 high…
RESEARCH · CL_11343 · Apr 30 · 17:56

New AEGIS benchmark reveals AI image forensics lag behind generative advances

Researchers have introduced AEGIS, a new benchmark designed to evaluate the forensic analysis of AI-generated academic images. This benchmark addresses domain-specific complexity across seven academic categories and inc…
RESEARCH · CL_11383 · Apr 30 · 08:57

New SPUR benchmark reveals AI models struggle with scientific image interpretation

Researchers have introduced the SPUR benchmark, designed to evaluate multimodal large language models (MLLMs) on their ability to interpret scientific experimental images. SPUR includes over 4,000 question-answering pai…
RESEARCH · CL_10116 · Apr 30 · 04:00

New STAR-64K dataset and training framework boost MLLM reasoning

Researchers have developed a new method for training multimodal large language models (MLLMs) to improve their ability to reason with abstract relational knowledge presented in images. This approach involves an automat…
