PulseAugur

multimodal large language model

PulseAugur coverage of multimodal large language models: every cluster mentioning multimodal large language models across labs, papers, and developer communities, ranked by signal.

Total (30 days): 26 · last 90 days: 26
Releases (30 days): 0 · last 90 days: 0
Papers (30 days): 26 · last 90 days: 26
Sentiment (30 days): data available for 7 days

Recent · page 2 of 2 · 26 items total
  1. RESEARCH · CL_06429

    New AI methods boost industrial anomaly detection with multimodal data and LLMs

    Researchers have developed three new frameworks for industrial anomaly detection using multimodal data and advanced AI techniques. One approach, EAGLE, integrates expert anomaly detectors with frozen multimodal large la…

  2. RESEARCH · CL_05114

    Rethinking Token Pruning for Historical Screenshots in GUI Visual Agents: Semantic, Spatial, and Temporal Perspectives

    Researchers have explored token pruning strategies for GUI visual agents that utilize Multimodal Large Language Models (MLLMs). Their study revealed that background regions in screenshots, often overlooked, can provide …

  3. RESEARCH · CL_05108

    MLLMs adapted for nuanced video retrieval, achieving SOTA performance

    Researchers have developed a novel method for video retrieval that enhances understanding of nuanced queries. This approach adapts Multimodal Large Language Models (MLLMs) to better interpret temporal actions, negations…

  4. RESEARCH · CL_02944

    New frameworks enhance VLM spatial reasoning with world models and multi-agent systems

    Researchers have developed World2VLM, a novel training framework that distills spatial reasoning capabilities from generative world models into vision-language models (VLMs). This approach synthesizes future views to pr…

  5. RESEARCH · CL_44107

    New agent systems self-evolve for image and video generation

    Researchers have developed several self-evolving agent systems for complex generative tasks. GenEvolve focuses on image generation by orchestrating tools and distilling visual experience for improved prompt construction…

  6. RESEARCH · CL_05787

    ByteDance unveils Astra, a dual-model AI for advanced robot navigation

    ByteDance has introduced Astra, a novel dual-model architecture designed to enhance autonomous robot navigation in complex indoor environments. The system employs a System 1/System 2 approach, with Astra-Global handling…