PulseAugur

multimodal large language model

PulseAugur coverage of multimodal large language model — every cluster mentioning multimodal large language model across labs, papers, and developer communities, ranked by signal.

Total · 30 days: 26 (90 days: 26)
Releases · 30 days: 0 (90 days: 0)
Papers · 30 days: 26 (90 days: 26)
Tier distribution · 90 days
Sentiment · 30 days

7 days with sentiment data

Recent · Page 1 of 2 · 26 items total
  1. RESEARCH · CL_41904 ·

    FruitEnsemble uses MLLM to boost fruit classification accuracy

    Researchers have developed FruitEnsemble, a novel framework for fine-grained fruit classification that addresses challenges like limited datasets and visual similarity between fruit types. The system utilizes a two-stag…

  2. RESEARCH · CL_41910 ·

    OSGNet and MLLM win Ego4D Episodic Memory Challenge

    Researchers have developed a novel approach for the Ego4D Episodic Memory Challenge, achieving first place in both the Natural Language Queries and GoalStep tracks. Their method combines the OSGNet localization model wi…

  3. TOOL · CL_36043 ·

    EndoGSim uses MLLMs for physics-aware surgical simulation

    Researchers have developed EndoGSim, a new framework for simulating dynamic endoscopic scenes in robot-assisted surgery. This system uses Multi-modal Large Language Models (MLLMs) to guide Gaussian Splatting, enabling p…

  4. TOOL · CL_30750 ·

    New MLLM framework unifies surgical scene understanding

    Researchers have developed SurgMLLM, a novel framework that unifies surgical scene understanding by integrating high-level reasoning with low-level visual grounding. This multimodal large language model (MLLM) is fine-t…

  5. TOOL · CL_29245 ·

    AlphaGRPO framework boosts multimodal AI generation with self-reflection

    Researchers have introduced AlphaGRPO, a new framework designed to improve multimodal generation in Unified Multimodal Models (UMMs). This approach uses Group Relative Policy Optimization (GRPO) to enable models to perf…

  6. TOOL · CL_27987 ·

    New MPerS method uses MLLMs for remote sensing scene segmentation

    Researchers have developed MPerS, a novel approach for remote sensing scene segmentation that leverages multimodal large language models (MLLMs). This method generates high-quality captions for remote sensing images usi…

  7. TOOL · CL_25593 ·

    New MLLM WeatherSyn generates weather reports, outperforms existing models

    Researchers have introduced WeatherSyn, a novel instruction-tuned multimodal large language model (MLLM) designed for generating weather forecast reports. This model is trained on a new dataset that includes data f…

  8. TOOL · CL_22442 ·

    Motion-MLLM enhances 3D scene understanding with egomotion data

    Researchers have developed Motion-MLLM, a new framework that integrates egomotion data from Inertial Measurement Units (IMUs) with video to enhance Multimodal Large Language Models (MLLMs) for 3D scene understanding. Th…

  9. RESEARCH · CL_22410 ·

    New benchmarks and models advance video understanding reward modeling

    Researchers have developed new methods for training reward models for video understanding tasks, addressing a gap in current AI capabilities. One approach introduces a benchmark called VURB and a dataset VUP-35K, leadin…

  10. TOOL · CL_20769 ·

    RemoteZero framework enables geospatial reasoning without human annotations

    Researchers have introduced RemoteZero, a novel framework designed for geospatial reasoning that eliminates the need for human-annotated ground-truth coordinates. This approach leverages an MLLM's stronger ability to ve…

  11. TOOL · CL_18547 ·

    Valley3 model scales multimodal AI for global e-commerce tasks

    Researchers have introduced Valley3, a new omni multimodal large language model designed for e-commerce applications. This model integrates text, image, video, and audio understanding, with a particular focus on multili…

  12. TOOL · CL_26969 ·

    New ReasonAudio benchmark reveals AI struggles with complex audio reasoning

    Researchers have introduced ReasonAudio, a new benchmark designed to evaluate text-audio retrieval models on complex reasoning tasks beyond simple semantic matching. The benchmark includes 1,000 queries and 1,000 audio …

  13. RESEARCH · CL_15684 ·

    New benchmarks challenge MLLMs' spatial and functional reasoning abilities

    Researchers have introduced new benchmarks to evaluate the spatial and functional reasoning capabilities of multimodal large language models (MLLMs). These benchmarks aim to move beyond basic geometric perception to ass…

  14. RESEARCH · CL_14055 ·

    New AI methods enhance video temporal grounding with MLLMs and graph networks

    Researchers have developed two new frameworks for Temporal Video Grounding (TVG), a task focused on localizing specific moments in videos based on text queries. The MASRA framework utilizes a Multimodal Large Language M…

  15. RESEARCH · CL_14082 ·

    New framework enables scalable video understanding with multi-agent collaboration

    Researchers have introduced a Multi-Agent Collaboration Framework (MACF) designed to enhance the understanding of long videos by multi-modal large language models (MLLMs). MACF addresses the context budget limitations o…

  16. RESEARCH · CL_11705 ·

    MLLM feedback on student drawings shows significant grounding failures

    A new study published on arXiv reveals significant grounding failures in multimodal large language models (MLLMs) when generating feedback on student science drawings. Researchers found that 41.3% of feedback instances …

  17. RESEARCH · CL_11488 ·

    New VeriGround model achieves reliable circuit-to-Verilog code generation

    Researchers have identified a significant reliability issue in multimodal large language models (MLLMs) when generating hardware description language (HDL) code from circuit diagrams. This "Mirage" phenomenon occurs whe…

  18. RESEARCH · CL_08185 ·

    OcularChat MLLM accurately diagnoses age-related macular degeneration with interactive explanations

    Researchers have developed OcularChat, a multimodal large language model (MLLM) fine-tuned from Qwen2.5-VL, designed to diagnose age-related macular degeneration (AMD) using color fundus photographs. The model was train…

  19. RESEARCH · CL_06609 ·

    Audio-Omni framework unifies audio generation, editing, and understanding

    Researchers have introduced Audio-Omni, a novel framework designed to unify audio understanding, generation, and editing across diverse domains like speech, music, and general sounds. This system integrates a frozen Mul…

  20. RESEARCH · CL_06582 ·

    Chat-Scene++ advances 3D LLM scene understanding with context-rich object identification

    Researchers have introduced Chat-Scene++, a novel framework designed to enhance multi-modal large language models (MLLMs) for 3D scene understanding. This approach structures 3D scenes as sequences of objects, incorpora…