PulseAugur
ENTITY multimodal large language model

PulseAugur coverage of multimodal large language models (MLLMs): every cluster that mentions the term across labs, papers, and developer communities, ranked by signal.

Total · 30d: 63 (63 over 90d)
Releases · 30d: 0 (0 over 90d)
Papers · 30d: 63 (63 over 90d)
SENTIMENT · 30D: 17 days with sentiment data

RECENT · PAGE 1/4 · 63 TOTAL
  1. TOOL · CL_111792

    DocArena pipeline automates document search agent training environments

    Researchers have developed DocArena, a novel pipeline that automatically transforms raw document collections into training environments for search agents. This system leverages multimodal large language models (MLLMs) f…

  2. RESEARCH · CL_110189

    New TOPS method prunes visual tokens for efficient MLLM inference

    Researchers have developed TOPS, a novel method for pruning visual tokens in multimodal large language models (MLLMs) to improve efficiency. Unlike previous approaches that relied on attention scores or token similarity…

  3. RESEARCH · CL_111339

    New ForeAgent framework advances AI-generated image detection

    Researchers have developed ForeAgent, a novel framework for detecting AI-generated images. This agentic system utilizes a Perception-Verdict architecture that combines multi-view forensic cues with a multimodal large la…

  4. RESEARCH · CL_109631

    New USS framework enhances embodied visual tracking with spatial-semantic prompts

    Researchers have introduced USS, a novel framework for Embodied Visual Tracking (EVT) that moves beyond text-only prompts to incorporate unified spatial-semantic inputs. This approach allows for a more precise indicatio…

  5. RESEARCH · CL_109648

    New FeVOS task and dataset enable predictive video object segmentation

    Researchers have introduced FeVOS, a novel task called Foresight Expression Video Object Segmentation, which requires models to predict future events in videos and identify corresponding objects. This task addresses lim…

  6. TOOL · CL_103176

    ByteDance Seed's SpatialTree framework accepted for CVPR 2026

    ByteDance Seed, in collaboration with academic partners, has introduced SpatialTree, a novel hierarchical framework designed to enhance the spatial intelligence of multimodal large language models (MLLMs). This new fram…

  7. TOOL · CL_99527

    Stellar framework enhances multimodal document retrieval scalability

    Researchers have introduced Stellar, a new framework designed to make multimodal document retrieval more scalable for Natural Language Query (NLQ) systems. Current methods often use multiple token-level embeddings, whic…

  8. TOOL · CL_96285

    BusterX++ MLLM Unifies Image and Video AI-Generated Content Detection

    Researchers have developed BusterX++, a novel multimodal large language model (MLLM) designed for unified detection and explanation of AI-generated content across images and videos. This approach aims to address the gro…

  9. TOOL · CL_94000

    New DPC-VQA framework uses MLLMs for efficient video quality assessment

    Researchers have developed DPC-VQA, a new framework for video quality assessment that leverages multimodal large language models (MLLMs). This approach decouples the perceptual capabilities of a frozen MLLM from a light…

  10. TOOL · CL_93980

    MIRAGE framework enhances image retrieval accuracy and efficiency for MLLMs

    Researchers have introduced MIRAGE, a new framework designed to improve the efficiency and accuracy of multi-vector image retrieval (MVR) within multimodal large language models (MLLMs). MIRAGE addresses limitations in …

  11. TOOL · CL_93959

    MLLMs Enhance Person Re-Identification Through Inference Re-Ranking

    Researchers have developed a novel method for improving person re-identification (Re-ID) in unseen real-world scenarios by leveraging multimodal large language models (MLLMs). Unlike traditional approaches that focus on…

  12. RESEARCH · CL_93088

    UniBrain MLLM advances brain MRI imputation and understanding

    Researchers have introduced UniBrain, a novel multimodal large language model (MLLM) designed for brain magnetic resonance imaging (MRI) analysis. This model addresses the challenges of limited training data and missing…

  13. RESEARCH · CL_86634

    New MACCO Framework Enhances Vision-Language Model Compositionality

    Researchers have developed MACCO, a novel framework designed to improve the compositional understanding of vision-language models (VLMs). MACCO addresses the limitations of existing models, which often struggle with obj…

  14. RESEARCH · CL_84530

    MLLM framework boosts forensic image retrieval accuracy

    Researchers have developed a unified retrieval framework using a multimodal large language model (MLLM) to enhance forensic image analysis. The system generates textual descriptions for images and queries, enabling text…

  15. RESEARCH · CL_84418

    New framework enhances social intelligence reasoning with distilled MLLM

    Researchers have developed a new framework called MODF-SIR, which utilizes a lightweight Multimodal Large Language Model (MLLM) for social intelligence reasoning. The framework enhances both training and inference throu…

  16. TOOL · CL_80232

    HDRAgent uses LLMs for adaptive HDR imaging

    Researchers have introduced HDRAgent, a novel framework for High Dynamic Range (HDR) imaging that utilizes an agent-driven approach to adaptively select reconstruction strategies. This method aims to mitigate ghosting a…

  17. TOOL · CL_79965

    New SMART framework enhances video moment retrieval with audio and shot-aware compression

    Researchers have developed SMART, a new framework for video moment retrieval that enhances multimodal understanding by integrating audio cues with visual information. This approach utilizes a Multimodal Large Language M…

  18. RESEARCH · CL_79121

    New benchmark CoVEBench tests complex video editing AI

    Researchers have introduced CoVEBench, a new benchmark designed to evaluate the capabilities of text-guided video editing models. This benchmark addresses the limitations of existing models that struggle with complex, m…

  19. RESEARCH · CL_76917

    New benchmark tackles privacy blind spots in AI image editing

    Researchers have introduced SPPE, a new benchmark for evaluating privacy-preserving image editing in Multimodal Large Language Models (MLLMs). This benchmark addresses the issue where standard privacy methods often resu…

  20. RESEARCH · CL_68200

    New benchmark WebRISE tests MLLM-generated web artifacts

    Researchers have developed WebRISE, a new benchmark for evaluating Multi-modal Large Language Models (MLLMs) that generate web artifacts. Unlike previous methods, WebRISE focuses on requirement-induced states and transi…