PulseAugur

MLLMs

PulseAugur coverage of MLLMs — every cluster mentioning MLLMs across labs, papers, and developer communities, ranked by signal.

Total · 30d: 103 (103 over 90d)
Releases · 30d: 0 (0 over 90d)
Papers · 30d: 103 (103 over 90d)
TIMELINE
  1. 2026-05-22 research_milestone A new pipeline was introduced to enhance MLLMs for safety-critical driving video analysis.
  2. 2026-05-22 research_milestone Researchers reveal a loss of temporal grounding in multimodal large language models and propose a method to recover it.
  3. 2026-05-22 research_milestone A new benchmark and dataset were introduced to evaluate MLLMs' ability to reason about personality beyond superficial cues.
  4. 2026-05-21 research_milestone A new method using MLLMs for detecting AI-generated Chinese poetry achieves state-of-the-art results.
SENTIMENT · 30D

18 days with sentiment data

RECENT · PAGE 4/6 · 103 TOTAL
  1. TOOL · CL_18628

    New MSEarth benchmark uses MLLMs for Earth science discovery

    Researchers have developed MSEarth, a new multimodal benchmark designed to evaluate the capabilities of multimodal large language models (MLLMs) in Earth science reasoning. This dataset comprises over 289,000 figures wi…

  2. RESEARCH · CL_18678

    New VQA methods enhance explainability and knowledge integration for multimodal LLMs

    Researchers have developed CoExVQA, a new framework for Document Visual Question Answering (DocVQA) that enhances explainability by breaking down the reasoning process. This method first identifies relevant evidence, th…

  3. RESEARCH · CL_18700

    MLLMs show promise in analyzing seizure movements, outperforming traditional models

    A pilot study explored the use of multimodal large language models (MLLMs) for analyzing pathological movements in seizure videos. The research found that MLLMs, without specific training, outperformed traditional compu…

  4. RESEARCH · CL_21948

    New AI unlearning methods balance data removal with model utility

    Researchers have developed new methods for machine unlearning, a process that removes specific data from AI models without full retraining. One approach, SHRED, uses self-distillation and logit demotion to identify and …

  5. TOOL · CL_15945

    New In-Prompt Process Supervision framework enhances MLLMs for video moderation

    Researchers have developed a new framework called IPS (In-Prompt Process Supervision) to enhance the accuracy of multimodal large language models (MLLMs) in content moderation for short videos. This method incorporates …

  6. TOOL · CL_15707

    Researchers use RL to improve MLLM regression on imbalanced data

    Researchers have developed a new framework to improve how multimodal large language models (MLLMs) handle numerical regression tasks, particularly those with imbalanced data distributions. Existing training methods ofte…

  7. RESEARCH · CL_15670

    New HERMES and DSCache methods improve streaming video understanding with KV cache

    Researchers have developed new methods to improve the efficiency of multimodal large language models (MLLMs) for understanding streaming video. One approach, HERMES, conceptualizes the KV cache as a hierarchical memory …

  8. TOOL · CL_15615

    VideoThinker framework improves lightweight MLLMs' video reasoning via causal debiasing

    Researchers have developed VideoThinker, a novel framework designed to enhance the reasoning capabilities of lightweight multimodal large language models (MLLMs) in video analysis. This approach addresses the issue of percept…

  9. RESEARCH · CL_15728

    MLLMs show foundational visual gaps despite progress in multimodal reasoning

    A new paper introduces a method to improve latent reasoning in multimodal large language models (MLLMs) by optimizing visual latents at inference time, addressing a pathology where their contribution is suppressed. Sepa…

  10. RESEARCH · CL_15514

    New benchmark and models advance generalized moment retrieval in videos

    Researchers have introduced Generalized Moment Retrieval (GMR), a new framework for video analysis that moves beyond the assumption of a single matching moment per query. This approach aims to retrieve all relevant temp…

  11. RESEARCH · CL_14485

    MLLMs struggle with Chinese short-video misinformation, Gemini-2.5-Pro leads

    Researchers have developed a new framework to evaluate how well multimodal large language models (MLLMs) can identify misinformation in Chinese short videos. The study used a dataset of 200 videos annotated for dece…

  12. RESEARCH · CL_14374

    New AI models tackle complex chart reasoning and generation challenges

    Researchers have developed new frameworks and benchmarks to improve how multimodal large language models (MLLMs) reason across complex visual data, such as charts. One approach, HierVA, uses a hierarchical agent to mana…

  13. RESEARCH · CL_14367

    VideoDetective framework enhances long video understanding for MLLMs

    Researchers have introduced VideoDetective, a novel framework designed to enhance the understanding of long videos by multimodal large language models (MLLMs). This approach addresses the challenge of limited context wi…

  14. RESEARCH · CL_14362

    GeoThinker framework actively integrates geometry for advanced spatial reasoning

    Researchers have developed GeoThinker, a novel framework that enhances spatial reasoning in multimodal large language models (MLLMs) by actively integrating geometric information. Unlike previous passive fusion methods,…

  15. RESEARCH · CL_14352

    FreeRet framework turns multimodal LLMs into training-free retrievers

    Researchers have developed FreeRet, a novel framework that enables multimodal large language models (MLLMs) to function as effective retrievers without requiring additional training. This plug-and-play system extracts s…

  16. RESEARCH · CL_11849

    GuideDog dataset aids blind and low-vision navigation with egocentric multimodal data

    Researchers have introduced GuideDog, a new dataset designed to aid the development of multimodal large language models (MLLMs) for blind and low-vision (BLV) individuals. The dataset comprises 22,000 image-description …

  17. RESEARCH · CL_11777

    New benchmark tackles visual-semantic knowledge conflicts in surgical AI

    Researchers have introduced OR-VSKC, a new benchmark designed to address visual-semantic knowledge conflicts in multimodal large language models (MLLMs) within operating room settings. The benchmark utilizes 28,190 high…

  18. RESEARCH · CL_11343

    New AEGIS benchmark reveals AI image forensics lag behind generative advances

    Researchers have introduced AEGIS, a new benchmark designed to evaluate the forensic analysis of AI-generated academic images. This benchmark addresses domain-specific complexity across seven academic categories and inc…

  19. RESEARCH · CL_11383

    New SPUR benchmark reveals AI models struggle with scientific image interpretation

    Researchers have introduced the SPUR benchmark, designed to evaluate multimodal large language models (MLLMs) on their ability to interpret scientific experimental images. SPUR includes over 4,000 question-answering pai…

  20. RESEARCH · CL_10116

    New STAR-64K dataset and training framework boost MLLM reasoning

    Researchers have developed a new method for training multimodal large language models (MLLMs) to improve their ability to reason with abstract relational knowledge presented in images. This approach involves an automat…