multimodal large language model
PulseAugur coverage of multimodal large language model: every cluster mentioning multimodal large language model across labs, papers, and developer communities, ranked by signal.
7 days with sentiment data
-
New AI methods boost industrial anomaly detection with multimodal data and LLMs
Researchers have developed three new frameworks for industrial anomaly detection using multimodal data and advanced AI techniques. One approach, EAGLE, integrates expert anomaly detectors with frozen multimodal large la…
-
Rethinking Token Pruning for Historical Screenshots in GUI Visual Agents: Semantic, Spatial, and Temporal Perspectives
Researchers have explored token pruning strategies for GUI visual agents that utilize Multimodal Large Language Models (MLLMs). Their study revealed that background regions in screenshots, often overlooked, can provide …
-
MLLMs adapted for nuanced video retrieval, achieving SOTA performance
Researchers have developed a novel method for video retrieval that enhances understanding of nuanced queries. This approach adapts Multimodal Large Language Models (MLLMs) to better interpret temporal actions, negations…
-
New frameworks enhance VLM spatial reasoning with world models and multi-agent systems
Researchers have developed World2VLM, a novel training framework that distills spatial reasoning capabilities from generative world models into vision-language models (VLMs). This approach synthesizes future views to pr…
-
New agent systems self-evolve for image, video generation
Researchers have developed several self-evolving agent systems for complex generative tasks. GenEvolve focuses on image generation by orchestrating tools and distilling visual experience for improved prompt construction…
-
ByteDance unveils Astra, a dual-model AI for advanced robot navigation
ByteDance has introduced Astra, a novel dual-model architecture designed to enhance autonomous robot navigation in complex indoor environments. The system employs a System 1/System 2 approach, with Astra-Global handling…