PulseAugur

MLLMs

PulseAugur coverage of MLLMs — every cluster mentioning MLLMs across labs, papers, and developer communities, ranked by signal.

Total · 30 days: 73 (last 90 days: 73)
Releases · 30 days: 0 (last 90 days: 0)
Papers · 30 days: 73 (last 90 days: 73)
Timeline
  1. 2026-05-22 research_milestone A new pipeline was introduced to enhance MLLMs for safety-critical driving video analysis.
  2. 2026-05-22 research_milestone Researchers reveal and propose a method to recover temporal grounding in multimodal large language models.
  3. 2026-05-22 research_milestone A new benchmark and dataset were introduced to evaluate MLLMs' ability to reason about personality beyond superficial cues.
  4. 2026-05-21 research_milestone A new method using MLLMs for detecting AI-generated Chinese poetry achieves state-of-the-art results.
Sentiment · 30 days

9 days with sentiment data

Recent · Page 2 of 4 · 73 items total
  1. TOOL · CL_36926 ·

    New benchmark reveals MLLMs struggle with spatial reasoning

    Researchers have developed PCSR-Bench, a new benchmark designed to evaluate the spatial reasoning capabilities of Multimodal Large Language Models (MLLMs) when processing omnidirectional images. The benchmark, comprisin…

  2. TOOL · CL_27571 ·

    New benchmark EgoMemReason tests AI memory in week-long videos

    Researchers have introduced EgoMemReason, a new benchmark designed to test the memory capabilities of multimodal large language models (MLLMs) and agentic frameworks in understanding long-horizon egocentric videos. The …

  3. TOOL · CL_22498 ·

    New metric evaluates MLLMs for logical consistency without annotations

    Researchers have introduced a new metric, VL-LCM, to evaluate the logical consistency of multimodal large language models (MLLMs) without requiring ground-truth annotations. This metric assesses the cause-effect reasoni…

  4. RESEARCH · CL_22492 ·

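    AI research highlights challenges in developing cross-cultural and non-English language models

    Two new research papers highlight the challenges of developing AI for non-English languages and cultures. One reviews two decades of building Arabic natural language processing resources and concludes that social and institutional factors are harder to overcome than linguistic ones. The other introduces a benchmark that evaluates whether multimodal large language models (MLLMs) can adapt to different cultures without degrading their performance in other cultural contexts.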

  5. TOOL · CL_22465 ·

    New research reveals MLLM jailbreaks exploit reconstruction-concealment tradeoff

    Researchers have identified a critical tradeoff in multimodal large language models (MLLMs) related to how harmful queries are concealed and reconstructed. They found that existing methods for transforming harmful input…

  6. TOOL · CL_22437 ·

    Visual Para-Thinker introduces parallel reasoning to multimodal LLMs

    Researchers have introduced Visual Para-Thinker, a novel framework for parallel reasoning in multimodal large language models (MLLMs). This approach shifts from vertical scaling of reasoning depth to a parallel strategy…

  7. TOOL · CL_22420 ·

    New SOW method uses MLLMs to improve image generation coherence

    Researchers have introduced Selective One-Way Diffusion (SOW), a novel approach to image generation that reframes diffusion models for improved contextual coherence. SOW utilizes Multimodal Large Language Models (MLLMs)…

  8. TOOL · CL_22405 ·

    MLLMs enable training-free dense hand contact estimation, outperforming supervised methods

    Researchers have developed ContactPrompt, a novel training-free method for dense hand contact estimation that utilizes multi-modal large language models (MLLMs). This approach addresses challenges in encoding 3D hand ge…

  9. RESEARCH · CL_21787 ·

    New MedHorizon benchmark tests AI's ability to understand long medical videos

    Researchers have introduced MedHorizon, a new benchmark designed to test multimodal large language models (MLLMs) on understanding long-form medical videos. This benchmark includes 759 hours of clinical procedures and 1…

  10. TOOL · CL_20778 ·

    Vision-EKIPL framework boosts MLLM visual reasoning with external knowledge infusion

    Researchers have introduced Vision-EKIPL, a novel reinforcement learning framework designed to enhance visual reasoning in Multimodal Large Language Models (MLLMs). This approach incorporates high-quality actions genera…

  11. TOOL · CL_18628 ·

    New MSEarth benchmark uses MLLMs for Earth science discovery

    Researchers have developed MSEarth, a new multimodal benchmark designed to evaluate the capabilities of multimodal large language models (MLLMs) in Earth science reasoning. This dataset comprises over 289,000 figures wi…

  12. RESEARCH · CL_18678 ·

    New VQA methods enhance explainability and knowledge integration for multimodal LLMs

    Researchers have developed CoExVQA, a new framework for Document Visual Question Answering (DocVQA) that enhances explainability by breaking down the reasoning process. This method first identifies relevant evidence, th…

  13. RESEARCH · CL_18700 ·

    MLLMs show promise in analyzing seizure movements, outperforming traditional models

    A pilot study explored the use of multimodal large language models (MLLMs) for analyzing pathological movements in seizure videos. The research found that MLLMs, without specific training, outperformed traditional compu…

  14. RESEARCH · CL_21948 ·

    New AI unlearning methods balance data removal with model utility

    Researchers have developed new methods for machine unlearning, a process that removes specific data from AI models without full retraining. One approach, SHRED, uses self-distillation and logit demotion to identify and …

  15. TOOL · CL_15945 ·

    New In-Prompt Process Supervision framework enhances MLLMs for video moderation

    Researchers have developed a new framework called IPS (In-Prompt Process Supervision) to enhance the accuracy of multimodal large language models (MLLMs) in content moderation for short videos. This method incorporates …

  16. TOOL · CL_15707 ·

    Researchers use RL to improve MLLM regression on imbalanced data

    Researchers have developed a new framework to improve how multimodal large language models (MLLMs) handle numerical regression tasks, particularly those with imbalanced data distributions. Existing training methods ofte…

  17. RESEARCH · CL_15670 ·

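    New HERMES and DSCache methods improve streaming video understanding via KV cache

    Researchers have developed new methods to make multimodal large language models (MLLMs) more efficient at understanding streaming video. One method, HERMES, conceptualizes the KV cache as a hierarchical memory system, achieving faster processing and higher accuracy with less memory. The other, DSCache, decouples the past and present KV caches and uses position-agnostic encoding to handle unbounded streams and to generalize to sequences longer than those seen in training.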

  18. TOOL · CL_15615 ·

    VideoThinker framework improves lightweight MLLMs' video reasoning via causal debiasing

    Researchers have developed VideoThinker, a novel framework designed to enhance the reasoning capabilities of lightweight multimodal language models (MLLMs) in video analysis. This approach addresses the issue of percept…

  19. RESEARCH · CL_15728 ·

    MLLMs show foundational visual gaps despite progress in multimodal reasoning

    A new paper introduces a method to improve latent reasoning in multimodal large language models (MLLMs) by optimizing visual latents at inference time, addressing a pathology where their contribution is suppressed. Sepa…

  20. RESEARCH · CL_15514 ·

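    New benchmark and models advance general moment retrieval in video

    Researchers have introduced General Moment Retrieval (GMR), a new framework for video analysis that drops the assumption that each query has exactly one matching moment. The approach aims to retrieve all relevant temporal segments, or to correctly recognize when no moment matches a given natural language query. To support this, they built the Soccer-GMR benchmark from soccer videos and proposed two modeling paradigms: a GMR adapter for existing models and a GRPO reward for fine-tuning multimodal large language models.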