PulseAugur

MLLMs

PulseAugur coverage of MLLMs — every cluster mentioning MLLMs across labs, papers, and developer communities, ranked by signal.

Total · 30d: 145 (145 over 90d)
Releases · 30d: 0 (0 over 90d)
Papers · 30d: 145 (145 over 90d)
TIMELINE
  1. 2026-05-22 research_milestone A new pipeline was introduced to enhance MLLMs for safety-critical driving video analysis.
  2. 2026-05-22 research_milestone Researchers propose a method to recover temporal grounding in multimodal large language models.
  3. 2026-05-22 research_milestone A new benchmark and dataset were introduced to evaluate MLLMs' ability to reason about personality beyond superficial cues.
  4. 2026-05-21 research_milestone A new method using MLLMs for detecting AI-generated Chinese poetry achieves state-of-the-art results.
SENTIMENT · 30D

21 days with sentiment data

RECENT · PAGE 1/8 · 145 TOTAL
  1. TOOL · CL_111649

    New paper identifies critical gaps in multimodal LLM evaluation

    A new paper published on arXiv highlights significant gaps in the evaluation of multimodal large language models (MLLMs). The research points out that current benchmarks often focus on isolated tasks and fail to assess …

  2. RESEARCH · CL_111284

    New FlameVQA benchmark tests MLLMs on UAV wildfire intelligence

    Researchers have introduced FlameVQA, a new benchmark designed to improve wildfire monitoring capabilities using Unmanned Aerial Vehicles (UAVs). This benchmark leverages paired RGB and radiometric thermal imagery to en…

  3. RESEARCH · CL_111291

    New EVIS system segments videos by event for improved understanding

    Researchers have developed EVIS, an Event-Aware Instructed Assistant for Referring Video Segmentation. This new method addresses limitations in existing approaches by decomposing videos into distinct events, allowing fo…

  4. RESEARCH · CL_111505

    New SocialPersona benchmark tests MLLMs' ability to infer user preferences from social media

    Researchers have introduced SocialPersona, a new benchmark designed to evaluate the ability of multimodal large language models (MLLMs) to infer user preferences from social media data. The benchmark utilizes longitudin…

  5. RESEARCH · CL_111336

    New DiCoBench benchmark reveals MLLM struggles with high-resolution visual perception

    Researchers have introduced DiCoBench, a new benchmark designed to evaluate the fine-grained perception capabilities of Multimodal Large Language Models (MLLMs) using high-resolution, multi-image inputs. The benchmark f…

  6. TOOL · CL_110022

    New research evaluates MLLMs for assistive AI tasks

    A new paper explores the capabilities of Multimodal Large Language Models (MLLMs) for assistive AI applications. Researchers developed a system called NetraLink, using a GoPro camera to capture egocentric data, and crea…

  7. RESEARCH · CL_111613

    New VIGIL framework combats visual laziness in multimodal LLMs

    Researchers have introduced VIGIL, a novel reinforcement learning framework designed to address "visual laziness" in multimodal large language models (MLLMs). This issue causes MLLMs to generate responses that contradic…

  8. RESEARCH · CL_109506

    New benchmark reveals MLLMs struggle with complex visual reasoning · 2 sources tracked

    A new benchmark called TriViewBench has been developed to assess the structural reasoning capabilities of Multimodal Large Language Models (MLLMs). The benchmark, comprising synthetic 3D scenes with varying object count…

  9. RESEARCH · CL_109634

    New framework uses scene graphs to enable MLLMs to reason over long videos

    Researchers have developed a new framework to enable multimodal large language models (MLLMs) to reason over long-form egocentric videos, overcoming current token limitations. The approach utilizes Egocentric Scene Gra…

  10. RESEARCH · CL_109637

    ShutterMuse: New MLLM offers capture-time photography guidance · 3 sources tracked

    Researchers have introduced ShutterMuse, a multimodal large language model designed to assist with photography during image capture. This model addresses the gap in current benchmarks by providing both composition guida…

  11. RESEARCH · CL_107733

    New benchmarks push video AI to ground answers in temporal evidence · 4 sources tracked

    Two new research papers introduce benchmarks and models for video question answering that focus on temporal reasoning and evidence grounding. The EG-VQA benchmark, with over 11,000 QA pairs and temporal evidence annotat…

  12. RESEARCH · CL_107915

    ForensicsTok uses token generation for precise image tampering localization

    Researchers have introduced ForensicsTok, a novel approach for localizing image tampering by reframing the task as an autoregressive sequence generation problem. This method directly generates token sequences to predict…

  13. RESEARCH · CL_107936

    ActiveScope framework enhances MLLM perception by correcting errors

    Researchers have introduced ActiveScope, a novel training-free framework designed to improve the perception capabilities of Multimodal Large Language Models (MLLMs). This framework addresses limitations in high-resoluti…

  14. TOOL · CL_105100

    New AIR system enhances MLLMs with adaptive code-based numerical reasoning

    Researchers have developed AIR, an Adaptive Interleaved Reasoning system designed to enhance multimodal large language models (MLLMs). This system extends reinforcement learning to enable MLLMs to perform complex numeri…

  15. RESEARCH · CL_105087

    New PIVOTSBench benchmark evaluates MLLMs on interpersonal relationship reasoning

    Researchers have introduced PIVOTSBench, a new benchmark designed to evaluate how well multimodal large language models (MLLMs) can understand and reason about interpersonal relationships. This benchmark, derived from S…

  16. TOOL · CL_100258

    MLLM agents show promise in zero-shot disease diagnosis, but clinical deployment remains distant

    A pilot study published on arXiv explores the capability of multimodal large language models (MLLMs) to distinguish between visually similar diseases in a zero-shot setting. Researchers introduced a multi-agent framewor…

  17. RESEARCH · CL_99522

    ELVA framework tackles "grain blindness" in multimodal retrieval · 2 sources tracked

    Researchers have introduced ELVA, a novel framework designed to address "grain blindness" in Universal Multimodal Retrieval (UMR) systems that utilize Multimodal Large Language Models (MLLMs). Grain blindness occurs whe…

  18. RESEARCH · CL_99584

    New benchmark and method improve MLLM negation comprehension in remote sensing

    Researchers have developed RS-Neg, a new benchmark designed to evaluate and improve the negation comprehension abilities of Multimodal Large Language Models (MLLMs) in remote sensing tasks. Current advanced MLLMs exhibi…

  19. RESEARCH · CL_99810

    SpatialSV framework enhances MLLMs' 3D spatial awareness with interpretable visual supervision

    Researchers have introduced SpatialSV, a novel framework aimed at enhancing the 3D spatial awareness of multimodal large language models (MLLMs). Unlike existing methods that rely on external tools or opaque feature dis…

  20. TOOL · CL_98070

    New attack hijacks MLLMs with single perturbation · arXiv research

    Researchers have developed a novel attack method called Semantic-Aware Hijacking that can compromise Multimodal Large Language Models (MLLMs) with a single adversarial perturbation. This technique, termed Semantic-Aware…