multimodal large language model
PulseAugur coverage of multimodal large language models: every cluster mentioning multimodal large language models across labs, papers, and developer communities, ranked by signal.
-
DocArena pipeline automates document search agent training environments
Researchers have developed DocArena, a novel pipeline that automatically transforms raw document collections into training environments for search agents. This system leverages multimodal large language models (MLLMs) f…
-
New TOPS method prunes visual tokens for efficient MLLM inference
Researchers have developed TOPS, a novel method for pruning visual tokens in multimodal large language models (MLLMs) to improve efficiency. Unlike previous approaches that relied on attention scores or token similarity…
-
New ForeAgent framework advances AI-generated image detection
Researchers have developed ForeAgent, a novel framework for detecting AI-generated images. This agentic system utilizes a Perception-Verdict architecture that combines multi-view forensic cues with a multimodal large la…
-
New USS framework enhances embodied visual tracking with spatial-semantic prompts
Researchers have introduced USS, a novel framework for Embodied Visual Tracking (EVT) that moves beyond text-only prompts to incorporate unified spatial-semantic inputs. This approach allows for a more precise indicatio…
-
New FeVOS task and dataset enable predictive video object segmentation
Researchers have introduced FeVOS (Foresight Expression Video Object Segmentation), a novel task that requires models to predict future events in videos and identify the corresponding objects. This task addresses lim…
-
ByteDance Seed's SpatialTree framework accepted for CVPR 2026
ByteDance Seed, in collaboration with academic partners, has introduced SpatialTree, a novel hierarchical framework designed to enhance the spatial intelligence of multimodal large language models (MLLMs). This new fram…
-
Stellar framework enhances multimodal document retrieval scalability
Researchers have introduced Stellar, a new framework designed to make multimodal document retrieval more scalable for Natural Language Query (NLQ) systems. Current methods often use multiple token-level embeddings, whic…
-
BusterX++ MLLM unifies image and video AI-generated content detection
Researchers have developed BusterX++, a novel multimodal large language model (MLLM) designed for unified detection and explanation of AI-generated content across images and videos. This approach aims to address the gro…
-
New DPC-VQA framework uses MLLMs for efficient video quality assessment
Researchers have developed DPC-VQA, a new framework for video quality assessment that leverages multimodal large language models (MLLMs). This approach decouples the perceptual capabilities of a frozen MLLM from a light…
-
MIRAGE framework enhances image retrieval accuracy and efficiency for MLLMs
Researchers have introduced MIRAGE, a new framework designed to improve the efficiency and accuracy of multi-vector image retrieval (MVR) within multimodal large language models (MLLMs). MIRAGE addresses limitations in …
-
MLLMs enhance person re-identification through inference re-ranking
Researchers have developed a novel method for improving person re-identification (Re-ID) in unseen real-world scenarios by leveraging multimodal large language models (MLLMs). Unlike traditional approaches that focus on…
-
UniBrain MLLM advances brain MRI imputation and understanding
Researchers have introduced UniBrain, a novel multimodal large language model (MLLM) designed for brain magnetic resonance imaging (MRI) analysis. This model addresses the challenges of limited training data and missing…
-
New MACCO framework enhances vision-language model compositionality
Researchers have developed MACCO, a novel framework designed to improve the compositional understanding of vision-language models (VLMs). MACCO addresses the limitations of existing models, which often struggle with obj…
-
MLLM framework boosts forensic image retrieval accuracy
Researchers have developed a unified retrieval framework using a multimodal large language model (MLLM) to enhance forensic image analysis. The system generates textual descriptions for images and queries, enabling text…
-
New framework enhances social intelligence reasoning with distilled MLLM
Researchers have developed a new framework called MODF-SIR, which utilizes a lightweight multimodal large language model (MLLM) for social intelligence reasoning. The framework enhances both training and inference throu…
-
HDRAgent uses LLMs for adaptive HDR imaging
Researchers have introduced HDRAgent, a novel framework for High Dynamic Range (HDR) imaging that utilizes an agent-driven approach to adaptively select reconstruction strategies. This method aims to mitigate ghosting a…
-
New SMART framework enhances video moment retrieval with audio and shot-aware compression
Researchers have developed SMART, a new framework for video moment retrieval that enhances multimodal understanding by integrating audio cues with visual information. This approach utilizes a Multimodal Large Language M…
-
New benchmark CoVEBench tests complex video editing AI
Researchers have introduced CoVEBench, a new benchmark designed to evaluate the capabilities of text-guided video editing models. This benchmark addresses the limitations of existing models that struggle with complex, m…
-
New benchmark tackles privacy blind spots in AI image editing
Researchers have introduced SPPE, a new benchmark for evaluating privacy-preserving image editing in multimodal large language models (MLLMs). This benchmark addresses the issue where standard privacy methods often resu…
-
New benchmark WebRISE tests MLLM-generated web artifacts
Researchers have developed WebRISE, a new benchmark for evaluating multimodal large language models (MLLMs) that generate web artifacts. Unlike previous methods, WebRISE focuses on requirement-induced states and transi…