Multimodal Large Language Models
PulseAugur coverage of Multimodal Large Language Models: every cluster mentioning the topic across labs, papers, and developer communities, ranked by signal.
-
New GSEC framework uses LLMs for improved image clustering
Researchers have developed a new image clustering framework called GSEC, which combines generative semantic guidance with a bi-layer ensemble strategy. This approach employs multimodal large language models to create sem…
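The GSEC details are truncated above, so as a hedged illustration only, the general idea of merging two clustering "layers" (e.g., one from visual features, one from LLM-generated semantic labels) can be sketched with a co-association consensus; the sample partitions below are invented for illustration, not GSEC's actual method:

```python
from itertools import combinations

def consensus_clusters(partitions, threshold=0.5):
    """Merge several cluster assignments via co-association + union-find.

    Each partition is a list mapping item index -> cluster id. Two items
    land in the same consensus cluster when the fraction of base
    partitions grouping them together reaches `threshold`.
    """
    n = len(partitions[0])
    parent = list(range(n))

    def find(x):
        # Find the union-find root with path compression.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        # Fraction of base partitions that place i and j together.
        co = sum(p[i] == p[j] for p in partitions) / len(partitions)
        if co >= threshold:
            parent[find(i)] = find(j)

    # Relabel roots as consecutive consensus cluster ids.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]

# Toy example: one partition from visual features, one from hypothetical
# MLLM semantic tags (both made up here).
visual   = [0, 0, 1, 1, 2, 2]
semantic = [0, 0, 1, 2, 2, 2]
print(consensus_clusters([visual, semantic]))  # → [0, 0, 1, 1, 1, 1]
```

Items 2–5 merge because at least one of the two layers links each adjacent pair, which is the kind of disagreement an ensemble step is meant to resolve.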
-
New benchmark CiteVQA exposes "Attribution Hallucination" in MLLMs
Researchers have introduced CiteVQA, a new benchmark designed to evaluate multimodal large language models (MLLMs) on their ability to accurately attribute answers to specific source regions within documents. Unlike pre…
-
New benchmark reveals AI models lag human experts in judging image beauty
Researchers have developed the Visual Aesthetic Benchmark (VAB) to evaluate how well multimodal large language models (MLLMs) can judge beauty in images. Their study found that current frontier MLLMs perform significant…
-
New benchmark reveals MLLMs struggle with spatial reasoning
Researchers have introduced PCSR-Bench, a new diagnostic benchmark designed to evaluate the spatial reasoning capabilities of multimodal large language models (MLLMs) when processing omnidirectional images. The benchmar…
-
New benchmark tests multimodal LLMs on complex optimization tasks
Researchers have introduced MM-OptBench, a new benchmark designed to evaluate multimodal large language models (MLLMs) on optimization modeling tasks. This benchmark incorporates both text and visual information, a depa…
-
New multimodal benchmark uses 900K Japanese student responses
Researchers have developed a new multimodal benchmark using data from Japan's National Assessment of Academic Ability, which includes approximately 900,000 aggregated student responses. This dataset features real exam m…
-
New V-ABS framework enhances multimodal visual reasoning
Researchers have developed V-ABS, a novel beam search framework designed to improve multi-step visual reasoning in multimodal large language models. This approach addresses the imagination-action-observer bias by iterat…
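The V-ABS specifics are cut off above, so as an assumption-laden sketch, here is only the generic beam-search skeleton such multi-step reasoning frameworks build on: keep the top-k partial reasoning chains at each step, ranked by some verifier score (the digit "steps" and sum-based scorer below are invented stand-ins):

```python
def beam_search(initial, expand, score, beam_width=2, max_steps=3):
    """Generic beam search over reasoning chains (lists of steps).

    `expand(chain)` proposes candidate next steps; `score(chain)` is the
    verifier. Only the `beam_width` best partial chains survive each round.
    """
    beam = [initial]
    for _ in range(max_steps):
        candidates = [chain + [step] for chain in beam for step in expand(chain)]
        if not candidates:
            break
        candidates.sort(key=score, reverse=True)
        beam = candidates[:beam_width]
    return max(beam, key=score)

# Toy stand-ins: each "step" is a digit, and the "verifier" prefers
# chains whose digits sum closest to 10.
expand = lambda chain: [1, 2, 3]
score = lambda chain: -abs(10 - sum(chain))
best = beam_search([], expand, score, beam_width=2, max_steps=4)
print(best, sum(best))  # the winning chain sums to exactly 10
```

In an MLLM setting the expansion step would sample candidate reasoning actions from the model and the scorer would be a learned or prompted verifier; this sketch only shows the search scaffold, not V-ABS's bias-correction mechanism.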