PulseAugur
LIVE 06:38:39
ENTITY multimodal large language model

PulseAugur coverage of multimodal large language models: every cluster mentioning the term across labs, papers, and developer communities, ranked by signal.

Total · 30d: 25 · 90d: 25
Releases · 30d: 0 · 90d: 0
Papers · 30d: 25 · 90d: 25
TIER MIX · 90D [chart]
SENTIMENT · 30D [chart] · 5 days with sentiment data

RECENT · PAGE 1/2 · 22 TOTAL
  1. TOOL · CL_30750 ·

    New MLLM framework unifies surgical scene understanding

    Researchers have developed SurgMLLM, a novel framework that unifies surgical scene understanding by integrating high-level reasoning with low-level visual grounding. This multimodal large language model (MLLM) is fine-t…

  2. TOOL · CL_29245 ·

    AlphaGRPO framework boosts multimodal AI generation with self-reflection

    Researchers have introduced AlphaGRPO, a new framework designed to improve multimodal generation in Unified Multimodal Models (UMMs). This approach uses Group Relative Policy Optimization (GRPO) to enable models to perf…
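
    The summary names Group Relative Policy Optimization (GRPO). As a reference point, here is a minimal sketch of GRPO's core step, scoring each sampled response against its own group's mean and standard deviation; AlphaGRPO's self-reflection machinery is not reproduced here, and the reward values are illustrative.

```python
import numpy as np

def grpo_advantages(group_rewards, eps=1e-8):
    """Group-relative advantages as used in GRPO: each sampled
    response is scored against the mean/std of its own group,
    so no learned value baseline is needed."""
    r = np.asarray(group_rewards, dtype=np.float64)
    return (r - r.mean()) / (r.std() + eps)

# Illustrative only: rewards for 4 responses sampled from one prompt.
print(grpo_advantages([0.2, 0.9, 0.5, 0.4]))
```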

  3. TOOL · CL_27987 ·

    New MPerS method uses MLLMs for remote sensing scene segmentation

    Researchers have developed MPerS, a novel approach for remote sensing scene segmentation that leverages multimodal large language models (MLLMs). This method generates high-quality captions for remote sensing images usi…

  4. TOOL · CL_25593 ·

    New MLLM WeatherSyn generates weather reports, outperforms existing models

    Researchers have introduced WeatherSyn, a novel instruction-tuned multimodal large language model (MLLM) designed for generating weather forecast reports. This model is trained on a new dataset, which includes data f…
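
    Instruction tuning a model like this generally means records pairing multimodal inputs with an instruction and a target report. The record below is schematic; its field names are assumptions, not the paper's dataset schema.

```python
# Schematic instruction-tuning record; all field names are illustrative.
sample = {
    "instruction": "Write a 24-hour forecast report for the region shown.",
    "inputs": {
        "satellite_image": "img_0421.png",
        "station_readings": [{"t": "06:00", "temp_c": 11.2, "wind_kts": 8}],
    },
    "target": "Morning fog clearing by noon; light southerly winds...",
}
print(sample["instruction"])
```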

  5. RESEARCH · CL_22410 ·

    New benchmarks and models advance video understanding reward modeling

    Researchers have developed new methods for training reward models for video understanding tasks, addressing a gap in current AI capabilities. One approach introduces a benchmark called VURB and a dataset VUP-35K, leadin…
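
    Reward models of this kind are commonly trained with a pairwise Bradley-Terry objective. The sketch below shows that generic loss in PyTorch; it is an assumption about the family of method, not the VURB or VUP-35K training recipe.

```python
import torch

def pairwise_reward_loss(r_chosen, r_rejected):
    """Bradley-Terry loss: push the reward of the preferred
    (chosen) response above the rejected one."""
    return -torch.nn.functional.logsigmoid(r_chosen - r_rejected).mean()

# Illustrative scores from a reward head over a batch of two pairs.
loss = pairwise_reward_loss(torch.tensor([1.2, 0.3]), torch.tensor([0.4, 0.9]))
print(loss.item())
```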

  6. TOOL · CL_22442 ·

    Motion-MLLM enhances 3D scene understanding with egomotion data

    Researchers have developed Motion-MLLM, a new framework that integrates egomotion data from Inertial Measurement Units (IMUs) with video to enhance Multimodal Large Language Models (MLLMs) for 3D scene understanding. Th…
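
    A common pattern for feeding egomotion into an MLLM is to project windowed IMU readings into the model's token space and concatenate them with the visual tokens. The module below sketches that pattern; its names and dimensions are illustrative, not Motion-MLLM's.

```python
import torch
import torch.nn as nn

class IMUTokenizer(nn.Module):
    """Projects windows of 6-axis IMU readings (accel + gyro) into
    LLM-dimension tokens; illustrative, not Motion-MLLM's design."""
    def __init__(self, window=50, d_model=4096):
        super().__init__()
        self.proj = nn.Linear(6 * window, d_model)

    def forward(self, imu):                       # imu: (B, T, window, 6)
        B, T = imu.shape[:2]
        return self.proj(imu.reshape(B, T, -1))   # (B, T, d_model)

imu_tokens = IMUTokenizer()(torch.randn(2, 8, 50, 6))
video_tokens = torch.randn(2, 256, 4096)
fused = torch.cat([video_tokens, imu_tokens], dim=1)  # fed to the LLM
print(fused.shape)  # torch.Size([2, 264, 4096])
```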

  7. TOOL · CL_20769 ·

    RemoteZero framework enables geospatial reasoning without human annotations

    Researchers have introduced RemoteZero, a novel framework designed for geospatial reasoning that eliminates the need for human-annotated ground-truth coordinates. This approach leverages an MLLM's stronger ability to ve…
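
    The "verify instead of annotate" idea generalizes to a propose-and-verify loop: sample candidate answers, score each with the model acting as its own verifier, keep the best. In the sketch below, propose and verify are hypothetical stand-ins for MLLM calls.

```python
import random

def propose_and_verify(image, query, propose, verify, n=8):
    """Generic propose-and-verify loop: `propose` samples a candidate
    answer and `verify` returns a scalar confidence for it; both are
    hypothetical MLLM calls, not RemoteZero's actual interface."""
    candidates = [propose(image, query) for _ in range(n)]
    return max(candidates, key=lambda c: verify(image, query, c))

# Toy usage with stub callables standing in for model calls.
best = propose_and_verify(
    image=None, query="locate the runway",
    propose=lambda img, q: (random.random(), random.random()),
    verify=lambda img, q, c: -abs(c[0] - 0.5),
)
print(best)
```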

  8. TOOL · CL_18547 ·

    Valley3 model scales multimodal AI for global e-commerce tasks

    Researchers have introduced Valley3, a new omni multimodal large language model designed for e-commerce applications. This model integrates text, image, video, and audio understanding, with a particular focus on multili…

  9. TOOL · CL_26969 ·

    New ReasonAudio benchmark reveals AI struggles with complex audio reasoning

    Researchers have introduced ReasonAudio, a new benchmark designed to evaluate text-audio retrieval models on complex reasoning tasks beyond simple semantic matching. The benchmark includes 1,000 queries and 1,000 audio …
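
    Text-audio retrieval benchmarks of this shape are usually scored with recall@k over embedding similarities. The sketch below shows that standard metric; the random embeddings merely stand in for real model outputs.

```python
import numpy as np

def recall_at_k(text_emb, audio_emb, k=10):
    """Recall@k for text->audio retrieval: query i counts as correct
    if audio i appears among its top-k most similar audio embeddings."""
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    a = audio_emb / np.linalg.norm(audio_emb, axis=1, keepdims=True)
    ranks = np.argsort(-t @ a.T, axis=1)[:, :k]
    return np.mean([i in ranks[i] for i in range(len(t))])

rng = np.random.default_rng(0)
print(recall_at_k(rng.normal(size=(1000, 64)), rng.normal(size=(1000, 64))))
```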

  10. RESEARCH · CL_15684 ·

    New benchmarks challenge MLLMs' spatial and functional reasoning abilities

    Researchers have introduced new benchmarks to evaluate the spatial and functional reasoning capabilities of multimodal large language models (MLLMs). These benchmarks aim to move beyond basic geometric perception to ass…

  11. RESEARCH · CL_14055 ·

    New AI methods enhance video temporal grounding with MLLMs and graph networks

    Researchers have developed two new frameworks for Temporal Video Grounding (TVG), a task focused on localizing specific moments in videos based on text queries. The MASRA framework utilizes a Multimodal Large Language M…
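
    Temporal video grounding systems are conventionally evaluated with temporal IoU between predicted and ground-truth moments; the metric itself is standard and sketched below.

```python
def temporal_iou(pred, gt):
    """IoU between two (start, end) intervals in seconds; the standard
    TVG metric behind scores like R@1 with IoU >= 0.5."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = max(pred[1], gt[1]) - min(pred[0], gt[0])
    return inter / union if union > 0 else 0.0

print(temporal_iou((12.0, 18.0), (14.0, 20.0)))  # 0.5
```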

  12. RESEARCH · CL_14082 ·

    New framework enables scalable video understanding with multi-agent collaboration

    Researchers have introduced a Multi-Agent Collaboration Framework (MACF) designed to enhance the understanding of long videos by multimodal large language models (MLLMs). MACF addresses the context budget limitations o…
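
    The context-budget workaround described here typically amounts to chunking the video, letting worker agents summarize each chunk, and aggregating their notes. The sketch below assumes hypothetical summarize and aggregate model calls rather than MACF's actual agents.

```python
def answer_long_video(frames, question, summarize, aggregate, chunk=256):
    """Split a frame sequence that exceeds any single MLLM context
    budget; `summarize` and `aggregate` are hypothetical MLLM calls."""
    chunks = [frames[i:i + chunk] for i in range(0, len(frames), chunk)]
    notes = [summarize(c, question) for c in chunks]   # worker agents
    return aggregate(notes, question)                  # coordinator agent
```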

  13. RESEARCH · CL_11705 ·

    MLLM feedback on student drawings shows significant grounding failures

    A new study published on arXiv reveals significant grounding failures in multimodal large language models (MLLMs) when generating feedback on student science drawings. Researchers found that 41.3% of feedback instances …

  14. RESEARCH · CL_11488 ·

    New VeriGround model achieves reliable circuit-to-Verilog code generation

    Researchers have identified a significant reliability issue in multimodal large language models (MLLMs) when generating hardware description language (HDL) code from circuit diagrams. This "Mirage" phenomenon occurs whe…

  15. RESEARCH · CL_08185 ·

    OcularChat MLLM accurately diagnoses age-related macular degeneration with interactive explanations

    Researchers have developed OcularChat, a multimodal large language model (MLLM) fine-tuned from Qwen2.5-VL, designed to diagnose age-related macular degeneration (AMD) using color fundus photographs. The model was train…

  16. RESEARCH · CL_06429 ·

    New AI methods boost industrial anomaly detection with multimodal data and LLMs

    Researchers have developed three new frameworks for industrial anomaly detection using multimodal data and advanced AI techniques. One approach, EAGLE, integrates expert anomaly detectors with frozen multimodal large la…
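
    Pairing an expert detector with a frozen MLLM usually means converting the detector's anomaly map into a prompt the language model can reason over. The sketch below follows that general recipe; the detector callable and prompt wording are assumptions, not EAGLE's.

```python
import numpy as np

def anomaly_prompt(image, detector, threshold=0.8):
    """Build an MLLM prompt from an expert detector's output;
    `detector` is a hypothetical callable returning an (H, W) score map."""
    score_map = detector(image)
    ys, xs = np.nonzero(score_map > threshold)
    if len(xs) == 0:
        return "No region exceeds the anomaly threshold; confirm the part looks normal."
    box = (int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max()))
    return (f"An expert detector flags region {box} "
            f"(peak score {score_map.max():.2f}). Describe the defect, if any.")

# Toy call: an all-zero score map triggers the "no anomaly" branch.
print(anomaly_prompt(None, lambda img: np.zeros((8, 8))))
```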

  17. RESEARCH · CL_06609 ·

    Audio-Omni framework unifies audio generation, editing, and understanding

    Researchers have introduced Audio-Omni, a novel framework designed to unify audio understanding, generation, and editing across diverse domains like speech, music, and general sounds. This system integrates a frozen Mul…

  18. RESEARCH · CL_06582 ·

    Chat-Scene++ advances 3D LLM scene understanding with context-rich object identification

    Researchers have introduced Chat-Scene++, a novel framework designed to enhance multimodal large language models (MLLMs) for 3D scene understanding. This approach structures 3D scenes as sequences of objects, incorpora…
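
    Structuring a 3D scene as a sequence of objects commonly means serializing each detected object (identifier, class, position, size) into text the MLLM ingests. The format below is illustrative, not Chat-Scene++'s exact scheme.

```python
def serialize_scene(objects):
    """Render detected 3D objects as an object-sequence prompt;
    field names and format are illustrative."""
    lines = [
        f"<obj{i}> {o['label']} at ({o['center'][0]:.1f}, "
        f"{o['center'][1]:.1f}, {o['center'][2]:.1f}), size {o['size']:.1f}m"
        for i, o in enumerate(objects)
    ]
    return "\n".join(lines)

print(serialize_scene([
    {"label": "sofa", "center": (1.0, 0.4, 2.3), "size": 1.8},
    {"label": "lamp", "center": (0.2, 1.5, 2.0), "size": 0.4},
]))
```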

  19. RESEARCH · CL_05108 ·

    MLLMs adapted for nuanced video retrieval, achieving SOTA performance

    Researchers have developed a novel method for video retrieval that enhances understanding of nuanced queries. This approach adapts Multimodal Large Language Models (MLLMs) to better interpret temporal actions, negations…

  20. RESEARCH · CL_05114 ·

    Rethinking Token Pruning for Historical Screenshots in GUI Visual Agents: Semantic, Spatial, and Temporal Perspectives

    Researchers have explored token pruning strategies for GUI visual agents that utilize Multimodal Large Language Models (MLLMs). Their study revealed that background regions in screenshots, often overlooked, can provide …
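
    Token pruning for screenshot histories generally ranks patch tokens by a salience score and keeps the top fraction per frame. The sketch below shows that generic top-k mechanism; the paper's semantic, spatial, and temporal criteria are not reproduced here.

```python
import torch

def prune_tokens(tokens, scores, keep_ratio=0.25):
    """Keep the top-scoring fraction of patch tokens per screenshot;
    `scores` could come from attention weights or a learned scorer."""
    k = max(1, int(tokens.shape[1] * keep_ratio))
    idx = scores.topk(k, dim=1).indices.sort(dim=1).values  # preserve order
    return torch.gather(
        tokens, 1, idx.unsqueeze(-1).expand(-1, -1, tokens.shape[-1])
    )

toks = torch.randn(2, 1024, 768)            # two screenshots, 1024 patches each
pruned = prune_tokens(toks, torch.rand(2, 1024))
print(pruned.shape)  # torch.Size([2, 256, 768])
```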