PulseAugur
Entity: MLLMs

MLLMs

PulseAugur coverage of MLLMs — every cluster mentioning MLLMs across labs, papers, and developer communities, ranked by signal.

Total · 30 days: 73 (90 days: 73)
Releases · 30 days: 0 (90 days: 0)
Papers · 30 days: 73 (90 days: 73)
Timeline
  1. 2026-05-22 research_milestone A new pipeline was introduced to enhance MLLMs for safety-critical driving video analysis.
  2. 2026-05-22 research_milestone Researchers identify a loss of temporal grounding in multimodal large language models and propose a method to recover it.
  3. 2026-05-22 research_milestone A new benchmark and dataset were introduced to evaluate MLLMs' ability to reason about personality beyond superficial cues.
  4. 2026-05-21 research_milestone A new method that uses MLLMs to detect AI-generated Chinese poetry achieves state-of-the-art results.
Sentiment · 30 days

Sentiment data is available for 9 days.

Recent · page 1 of 4 · 73 items total
  1. TOOL · CL_49280

    New framework AKT-Rec improves e-commerce recommendations using LLM-generated IDs

    Researchers have developed a new framework called AKT-Rec to address challenges in long-tail recommendation systems, particularly those in e-commerce platforms with significant data imbalance. This framework utilizes mu…

  2. RESEARCH · CL_45045

    New methods and benchmarks boost MLLM visual grounding

    Researchers have developed new methods to improve visual grounding in multimodal large language models (MLLMs). One approach, PGT, uses procedurally generated tasks with geometric primitives to provide denser supervisio…

  3. TOOL · CL_45094

    SkeletonLLM enables LLMs to process human skeleton data

    Researchers have developed SkeletonLLM, a novel approach to enable multimodal large language models (MLLMs) to understand structured, non-visual data like human skeletons. The system uses DrAction, a differentiable rend…

  4. TOOL · CL_45081

    New benchmark identifies perception and spatiotemporal modeling as MLLM weaknesses

    Researchers have introduced BEAR, a new benchmark designed to evaluate and diagnose the skill-level capabilities of embodied multimodal large language models (MLLMs). This benchmark decomposes embodied tasks into 14 dis…

  5. TOOL · CL_45070

    New ST-SimDiff framework boosts MLLM video processing efficiency

    Researchers have developed ST-SimDiff, a novel framework designed to make multimodal large language models (MLLMs) more efficient at processing long videos. The method addresses the computational burden by focusing on b…

  6. TOOL · CL_45035

    MLLMs struggle with video timing; new method recovers temporal grounding

    Researchers have identified a temporal grounding issue in multimodal large language models (MLLMs) where the models understand event timing during an initial phase but lose this signal during answer generation. They dis…

  7. TOOL · CL_44979

    New MapTab benchmark tests multimodal LLMs on complex route planning

    Researchers have introduced MapTab, a new benchmark designed to evaluate the multi-criteria reasoning abilities of multimodal large language models (MLLMs). This benchmark utilizes route planning tasks that combine visu…

  8. TOOL · CL_44952

    New pipeline enhances LLMs for safety-critical driving analysis

    Researchers have developed a new pipeline to improve the ability of multimodal large language models (MLLMs) to analyze safety-critical driving events. This pipeline fuses downsampled video frames with telematics data a…

  9. RESEARCH · CL_43971

    AI-generated Chinese poetry detected using image-semantic method

    Researchers have developed a novel method for detecting AI-generated modern Chinese poetry by integrating image semantics with text analysis. This approach uses images related to the poem's content to provide complement…

  10. TOOL · CL_43934

    New benchmark evaluates human and LLM text-to-image prompting skills

    Researchers have introduced AtelierEval, a novel benchmark designed to evaluate the proficiency of both humans and multimodal large language models (MLLMs) in generating effective text-to-image prompts. This benchmark, …

  11. RESEARCH · CL_45069

    MLLMs show prejudice gap in personality assessments, new benchmark reveals

    Researchers have introduced a new benchmark and dataset called MM-OCEAN to evaluate how well multimodal large language models (MLLMs) can reason about personality. The study found that a significant portion of MLLMs, ov…

  12. RESEARCH · CL_44007

    LatentOmni framework unifies audio-visual reasoning for omnimodal understanding

    Researchers have introduced LatentOmni, a novel framework designed to enhance omnimodal understanding by unifying audio-visual reasoning within a latent space. This approach aims to overcome limitations in current multi…

  13. TOOL · CL_41890

    TextSculptor framework advances scene text editing with new dataset and benchmark

    Researchers have introduced TextSculptor, a new framework designed to improve scene text editing in images. This framework includes an automated data construction pipeline that generates a large dataset of 3.2 million s…

  14. RESEARCH · CL_41749

    New methods tackle AI hallucinations in research and medical Q&A

    Two new research papers address the critical issue of AI hallucinations in different domains. One paper introduces ACL-Verbatim, an extractive question-answering system designed to provide hallucination-free answers fro…

  15. RESEARCH · CL_44092

    New methods boost video diffusion model efficiency and quality

    Researchers have developed several new techniques to improve video diffusion models, focusing on efficiency and quality. One approach, LocalDPO, optimizes alignment at a localized spatio-temporal region level for better…

  16. TOOL · CL_46843

    New benchmark EgoCoT-Bench tests MLLM reasoning in egocentric video

    Researchers have introduced EgoCoT-Bench, a new benchmark designed to evaluate the reasoning capabilities of Multimodal Large Language Models (MLLMs) when processing egocentric video data. This benchmark specifically fo…

  17. RESEARCH · CL_38223

    New ESI-Bench benchmark tests AI agents' active spatial reasoning

    Researchers have introduced ESI-Bench, a new benchmark designed to evaluate embodied spatial intelligence in AI agents. This benchmark focuses on the perception-action loop, where agents actively explore their environme…

  18. TOOL · CL_38243

    New CrossView Suite enhances multimodal models' spatial reasoning

    Researchers have introduced the CrossView Suite, a comprehensive framework designed to enhance the spatial reasoning capabilities of multimodal large language models (MLLMs). This suite addresses limitations in cross-vi…

  19. RESEARCH · CL_37979

    New image tokenization methods boost MLLM performance

    Two new research papers propose novel methods for tokenizing images to improve multimodal large language models (MLLMs). The first paper, VFMTok, uses a frozen vision foundation model as a tokenizer, achieving significa…

  20. RESEARCH · CL_43941

    New benchmarks VGenST-Bench and CaST-Bench target MLLM spatio-temporal reasoning

    Researchers have introduced two new benchmarks, VGenST-Bench and CaST-Bench, designed to more rigorously evaluate the spatio-temporal reasoning capabilities of Multimodal Large Language Models (MLLMs) and Vision-Languag…