ENTITY MLLMs

MLLMs

PulseAugur coverage of MLLMs — every cluster mentioning MLLMs across labs, papers, and developer communities, ranked by signal.

Total · 30d

145

145 over 90d

Releases · 30d

0 over 90d

Papers · 30d

145

145 over 90d

TOPICS

paper 145
model release 59
other 41
safety 29
product 17
infra 4
policy 1

RELATIONSHIPS

instance of Multimodal Large Language Models and Tunings: Vision, Language, Sensors, Audio, and Beyond 95%
instance of CatalyzeX 90%
instance of DagsHub 90%
instance of Gotit.pub 90%
used by Chain Of Thought 70%
used by Standard Chinese 70%
used by alphaXiv 70%
used by train of thought 70%
used by Multimodal Large Language Models and Tunings: Vision, Language, Sensors, Audio, and Beyond 70%
used by English 60%

TIMELINE

2026-05-22 research_milestone A new pipeline was introduced to enhance MLLMs for safety-critical driving video analysis.
2026-05-22 research_milestone Researchers reveal and propose a method to recover temporal grounding in multimodal large language models.
2026-05-22 research_milestone A new benchmark and dataset were introduced to evaluate MLLMs' ability to reason about personality beyond superficial cues.
2026-05-21 research_milestone A new method using MLLMs for detecting AI-generated Chinese poetry achieves state-of-the-art results.

SENTIMENT · 30D

21 days with sentiment data

RECENT · PAGE 1/8 · 145 TOTAL

TOOL · CL_111649 · Jun 26 · 04:00

New paper identifies critical gaps in multimodal LLM evaluation

A new paper published on arXiv highlights significant gaps in the evaluation of multimodal large language models (MLLMs). The research points out that current benchmarks often focus on isolated tasks and fail to assess …
RESEARCH · CL_111284 · Jun 25 · 15:06

New FlameVQA benchmark tests MLLMs on UAV wildfire intelligence

Researchers have introduced FlameVQA, a new benchmark designed to improve wildfire monitoring capabilities using Unmanned Aerial Vehicles (UAVs). This benchmark leverages paired RGB and radiometric thermal imagery to en…
RESEARCH · CL_111291 · Jun 25 · 13:12

New EVIS system segments videos by event for improved understanding

Researchers have developed EVIS, an Event-Aware Instructed Assistant for Referring Video Segmentation. This new method addresses limitations in existing approaches by decomposing videos into distinct events, allowing fo…
RESEARCH · CL_111505 · Jun 25 · 06:31

New SocialPersona benchmark tests MLLMs' ability to infer user preferences from social media

Researchers have introduced SocialPersona, a new benchmark designed to evaluate the ability of multimodal large language models (MLLMs) to infer user preferences from social media data. The benchmark utilizes longitudin…
RESEARCH · CL_111336 · Jun 25 · 05:02

New DiCoBench benchmark reveals MLLM struggles with high-resolution visual perception

Researchers have introduced DiCoBench, a new benchmark designed to evaluate the fine-grained perception capabilities of Multimodal Large Language Models (MLLMs) using high-resolution, multi-image inputs. The benchmark f…
TOOL · CL_110022 · Jun 25 · 04:00

New research evaluates MLLMs for assistive AI tasks

A new paper explores the capabilities of Multimodal Large Language Models (MLLMs) for assistive AI applications. Researchers developed a system called NetraLink, using a GoPro camera to capture egocentric data, and crea…
RESEARCH · CL_111613 · Jun 24 · 21:12

New VIGIL framework combats visual laziness in multimodal LLMs

Researchers have introduced VIGIL, a novel reinforcement learning framework designed to address "visual laziness" in multimodal large language models (MLLMs). This issue causes MLLMs to generate responses that contradic…
RESEARCH · CL_109506 · Jun 24 · 17:00

New benchmark reveals MLLMs struggle with complex visual reasoning · 2 sources tracked

A new benchmark called TriViewBench has been developed to assess the structural reasoning capabilities of Multimodal Large Language Models (MLLMs). The benchmark, comprising synthetic 3D scenes with varying object count…
RESEARCH · CL_109634 · Jun 24 · 13:55

New framework uses scene graphs to enable LLMs to reason over long videos

Researchers have developed a new framework to enable multi-modal large language models (MLLMs) to reason over long-form egocentric videos, overcoming current token limitations. The approach utilizes Egocentric Scene Gra…
RESEARCH · CL_109637 · Jun 24 · 00:00

ShutterMuse: New MLLM offers capture-time photography guidance · 3 sources tracked

Researchers have introduced ShutterMuse, a multimodal large language model designed to assist with photography during image capture. This model addresses the gap in current benchmarks by providing both composition guida…
RESEARCH · CL_107733 · Jun 23 · 14:03

New benchmarks push video AI to ground answers in temporal evidence · 4 sources tracked

Two new research papers introduce benchmarks and models for video question answering that focus on temporal reasoning and evidence grounding. The EG-VQA benchmark, with over 11,000 QA pairs and temporal evidence annotat…
RESEARCH · CL_107915 · Jun 23 · 13:06

ForensicsTok uses token generation for precise image tampering localization

Researchers have introduced ForensicsTok, a novel approach for localizing image tampering by reframing the task as an autoregressive sequence generation problem. This method directly generates token sequences to predict…
RESEARCH · CL_107936 · Jun 23 · 08:17

ActiveScope framework enhances MLLM perception by correcting errors

Researchers have introduced ActiveScope, a novel training-free framework designed to improve the perception capabilities of Multimodal Large Language Models (MLLMs). This framework addresses limitations in high-resoluti…
TOOL · CL_105100 · Jun 22 · 17:58

New AIR system enhances MLLMs with adaptive code-based numerical reasoning

Researchers have developed AIR, an Adaptive Interleaved Reasoning system designed to enhance multimodal large language models (MLLMs). This system extends reinforcement learning to enable MLLMs to perform complex numeri…
RESEARCH · CL_105087 · Jun 22 · 09:38

New PIVOTSBench benchmark evaluates MLLMs on interpersonal relationship reasoning

Researchers have introduced PIVOTSBench, a new benchmark designed to evaluate how well multimodal large language models (MLLMs) can understand and reason about interpersonal relationships. This benchmark, derived from S…
TOOL · CL_100258 · Jun 19 · 04:00

MLLM agents show promise in zero-shot disease diagnosis, but clinical deployment remains distant

A pilot study published on arXiv explores the capability of multimodal large language models (MLLMs) to distinguish between visually similar diseases in a zero-shot setting. Researchers introduced a multi-agent framewor…
RESEARCH · CL_99522 · Jun 18 · 14:23

ELVA framework tackles "grain blindness" in multimodal retrieval · 2 sources tracked

Researchers have introduced ELVA, a novel framework designed to address "grain blindness" in Universal Multimodal Retrieval (UMR) systems that utilize Multimodal Large Language Models (MLLMs). Grain blindness occurs whe…
RESEARCH · CL_99584 · Jun 18 · 12:46

New benchmark and method improve MLLM negation comprehension in remote sensing

Researchers have developed RS-Neg, a new benchmark designed to evaluate and improve the negation comprehension abilities of Multimodal Large Language Models (MLLMs) in remote sensing tasks. Current advanced MLLMs exhibi…
RESEARCH · CL_99810 · Jun 18 · 08:09

SpatialSV framework enhances MLLMs' 3D spatial awareness with interpretable visual supervision

Researchers have introduced SpatialSV, a novel framework aimed at enhancing the 3D spatial awareness of multimodal large language models (MLLMs). Unlike existing methods that rely on external tools or opaque feature dis…
TOOL · CL_98070 · Jun 18 · 04:00

New attack hijacks MLLMs with single perturbation · arXiv research

Researchers have developed a novel attack method called Semantic-Aware Hijacking that can compromise Multimodal Large Language Models (MLLMs) with a single adversarial perturbation. This technique, termed Semantic-Aware…
