PulseAugur
MLLMs

PulseAugur coverage of MLLMs (multimodal large language models): every cluster mentioning MLLMs across labs, papers, and developer communities, ranked by signal.

Total · 30d: 103 (103 over 90d)
Releases · 30d: 0 (0 over 90d)
Papers · 30d: 103 (103 over 90d)
TIMELINE
  1. 2026-05-22 research_milestone A new pipeline was introduced to enhance MLLMs for safety-critical driving video analysis.
  2. 2026-05-22 research_milestone Researchers reveal a loss of temporal grounding in multimodal large language models and propose a method to recover it.
  3. 2026-05-22 research_milestone A new benchmark and dataset were introduced to evaluate MLLMs' ability to reason about personality beyond superficial cues.
  4. 2026-05-21 research_milestone A new method using MLLMs for detecting AI-generated Chinese poetry achieves state-of-the-art results.
SENTIMENT · 30D

18 days with sentiment data

RECENT · PAGE 5/6 · 103 TOTAL
  1. RESEARCH · CL_10110

    ReGATE method accelerates multimodal LLM training by selectively pruning tokens

    Researchers have developed ReGATE, a novel method to accelerate the training of multimodal large language models (MLLMs) by adaptively pruning tokens. This technique uses a teacher-student framework where a frozen teach…

  2. RESEARCH · CL_11400

    COHERENCE benchmark evaluates MLLMs' fine-grained image-text alignment in interleaved contexts

    Researchers have introduced COHERENCE, a new benchmark designed to assess the fine-grained image-text alignment capabilities of Multimodal Large Language Models (MLLMs). Existing benchmarks often overlook the complexiti…

  3. RESEARCH · CL_09749

    New framework improves MLLMs' accuracy in dial-based measurement reading

    Researchers have identified a significant weakness in multimodal large language models (MLLMs) when it comes to reading dial-based measurements. These models struggle with accuracy and are highly sensitive to changes in…

  4. RESEARCH · CL_08517

    SIEVES method boosts multimodal LLM coverage on visual tasks with evidence scoring

    Researchers have developed SIEVES, a novel method for improving the reliability of multimodal large language models (MLLMs) in out-of-distribution scenarios. SIEVES works by learning to estimate the quality of visual ev…

  5. RESEARCH · CL_07047

    CrossGuard safeguards multimodal LLMs against implicit and explicit attacks

    Researchers have developed CrossGuard, a new defense system designed to protect Multimodal Large Language Models (MLLMs) from sophisticated implicit attacks. These attacks combine seemingly benign text and image inputs …

  6. RESEARCH · CL_07035

    MLLMs tested on reconstructing masked text from visual context with MMTR-Bench

    Researchers have developed MMTR-Bench, a new benchmark designed to test the ability of Multimodal Large Language Models (MLLMs) to reconstruct missing text solely from visual context. This benchmark avoids explicit prom…

  7. RESEARCH · CL_06941

    AI system SoccerRef-Agents uses multi-agent reasoning for soccer refereeing

    Researchers have introduced SoccerRef-Agents, a multi-agent system designed to automate soccer refereeing with enhanced accuracy and explainability. The framework incorporates a new benchmark dataset, SoccerRefBench, fe…

  8. RESEARCH · CL_06571

    New methods enhance LVLMs for fine-grained visual recognition tasks

    Two new research papers propose novel methods for improving Fine-Grained Visual Recognition (FGVR) using Large Vision-Language Models (LVLMs). The first paper introduces SARE, a framework that adaptively applies reasoni…

  9. RESEARCH · CL_06531

    OmniVTG dataset and CoT paradigm enhance open-world video temporal grounding

    Researchers have introduced OmniVTG, a large-scale dataset and training paradigm designed to improve open-world Video Temporal Grounding (VTG) for Multimodal Large Language Models (MLLMs). The dataset was created using …

  10. RESEARCH · CL_06419

    New benchmark reveals AI models struggle with ego-motion understanding in driving

    Researchers have developed EgoDyn-Bench, a new benchmark designed to evaluate how well vision-centric foundation models understand ego-motion in autonomous driving scenarios. The benchmark reveals a significant 'Percept…

  11. RESEARCH · CL_06400

    PivotMerge framework integrates multimodal LLM alignment capabilities

    Researchers have introduced PivotMerge, a novel framework designed to integrate the cross-modal alignment capabilities of different multimodal large language models (MLLMs). This approach addresses challenges in merging…

  12. RESEARCH · CL_06631

    New benchmarks SpecVQA and M³-VQA challenge multimodal LLMs in scientific and multi-hop reasoning

    Researchers have introduced M³-VQA, a new benchmark designed to evaluate multimodal large language models (MLLMs) on complex reasoning tasks involving multiple entities and multi-hop inference. The benchmark challeng…

  13. RESEARCH · CL_06263

    MEG-RAG framework improves multimodal evidence selection for LLMs

    Researchers have introduced MEG-RAG, a novel framework designed to improve Multimodal Retrieval-Augmented Generation (MRAG) systems. Current MRAG models often struggle to accurately assess the relevance of retrieved mul…

  14. RESEARCH · CL_06208

    MLLMs improve object grounding in crowded scenes using language-guided semantic cues

    Researchers have developed a new method to improve the robustness of Multimodal Large Language Models (MLLMs) in challenging visual scenarios like crowded scenes. The approach leverages Language-Guided Semantic Cues (LG…

  15. RESEARCH · CL_06209

    New benchmarks and frameworks tackle AI agent limitations in website generation and remote sensing tasks

    Researchers have introduced InteractWeb-Bench, a new benchmark designed to evaluate multimodal large language models (MLLMs) in website generation tasks. This benchmark simulates real-world conditions where user instruc…

  16. RESEARCH · CL_05105

    Researchers develop DecAF for training-free video reasoning segmentation

    Researchers have developed Decomposed Attention Fusion (DecAF), a novel method for video reasoning segmentation that operates without requiring model retraining. DecAF refines attention maps generated by multimodal larg…

  17. RESEARCH · CL_06302

    New benchmarks SciMDR and ShredBench evaluate multimodal LLMs on scientific documents and reconstruction

    Researchers have introduced ShredBench, a new benchmark designed to evaluate the semantic reasoning abilities of multimodal large language models (MLLMs) in reconstructing documents from shredded fragments. This benchma…

  18. RESEARCH · CL_04920

    New CGC framework boosts multimodal LLMs for fine-grained image understanding

    Researchers have introduced Compositional Grounded Contrast (CGC), a new framework designed to enhance the fine-grained multi-image understanding capabilities of Multimodal Large Language Models (MLLMs). This approach a…

  19. RESEARCH · CL_04921

    MLLMs predict mouse social dominance in novel MTT-Bench benchmark

    Researchers have developed MTT-Bench, a new benchmark for analyzing mouse social dominance using Multimodal Large Language Models (MLLMs). This framework fine-tunes existing MLLM architectures to predict dominance hiera…

  20. RESEARCH · CL_04980

    New benchmark tests MLLMs' Chinese sign language understanding capabilities

    Researchers have developed CNSL-bench, a new benchmark designed to evaluate the sign language understanding capabilities of multimodal large language models (MLLMs). This benchmark is grounded in the official Chinese Na…