PulseAugur

Large Multimodal Models

PulseAugur coverage of Large Multimodal Models — every cluster mentioning Large Multimodal Models across labs, papers, and developer communities, ranked by signal.

Total · 30d: 24 (24 over 90d)
Releases · 30d: 0 (0 over 90d)
Papers · 30d: 23 (23 over 90d)
Sentiment · 30d: 5 days with sentiment data

RECENT · PAGE 1/2 · 24 TOTAL
  1. RESEARCH · CL_111283

    New HarmVideoBench evaluates LVLMs on nuanced harmful video understanding · 2 sources tracked

    Researchers have introduced HarmVideoBench, a new benchmark designed to evaluate the harmful video understanding capabilities of large vision-language models (LVLMs). Existing benchmarks often oversimplify harmful conte…

  2. TOOL · CL_108142

    New LMM 'PreciseDoc' Enhances Document Element Grounding Accuracy

    Researchers have developed PreciseDoc, a new Large Multimodal Model (LMM) designed to improve the accuracy of grounding specific elements within documents. Existing models struggle with precise localization in text-heav…

  3. RESEARCH · CL_86751

    PRISMR framework enhances LMMs for multimodal listwise ranking

    Researchers have developed PRISMR, a new framework designed to improve the performance of Large Multimodal Models (LMMs) in listwise ranking tasks, particularly in long-context scenarios. PRISMR addresses a failure mode…

  4. TOOL · CL_80226

    New method uses location attention and LMMs for worldwide image geo-localization

    Researchers have developed TransGeoCLIP, a new framework for worldwide image geo-localization that uses a location attention mechanism and large multimodal models. This method aims to improve accuracy by distinguishing …

  5. TOOL · CL_65649

    Researchers isolate visual relation vectors in LMMs

    Researchers have identified specific attention heads within Large Multimodal Models (LMMs) that are crucial for processing visual relations. By extracting and manipulating these "function vectors," they can improve the …

  6. TOOL · CL_53658

    New Benchmark Reveals LMMs Struggle with Real-World High School Exams

    A new benchmark called LiveK12Bench has been developed to assess the capabilities of Large Multimodal Models (LMMs) in high school-level examinations. This dynamic, multi-disciplinary benchmark includes over 2,000 quest…

  7. TOOL · CL_51150

    New benchmark M3-Verse tests LMMs on dynamic video scene changes

    Researchers have introduced M3-Verse, a new benchmark designed to test large multimodal models (LMMs) on their ability to understand dynamic changes in video scenes. The benchmark features paired videos of indoor scenes…

  8. RESEARCH · CL_53650

    New Benchmark Tests LMMs' Creative Physical Intelligence

    Researchers have developed MM-CreativityBench, a new benchmark designed to evaluate the creative physical intelligence of large multimodal models (LMMs). The benchmark focuses on the ability of LMMs to identify and repu…

  9. TOOL · CL_45083

    LongVT framework enhances AI video reasoning with tool-calling

    Researchers have developed LongVT, a new framework designed to improve how large multimodal models (LMMs) process and reason about long videos. This approach mimics human comprehension by first skimming the entire video…

  10. TOOL · CL_41291

    AWS Strands Evals adds multimodal judges for image-to-text tasks

    Amazon Web Services has introduced new multimodal evaluators for its Strands Evals SDK, designed to assess image-to-text tasks. These tools leverage large multimodal models (LMMs) to judge responses by directly referen…

  11. RESEARCH · CL_40792

    AI research tackles temporal grounding for AVs and video analysis

    Two new research papers explore methods to improve temporal grounding in AI systems, particularly for autonomous vehicles and video analysis. The first paper, "From Prompts to Pavement Through Time," investigates tempor…

  12. TOOL · CL_49337

    New AQuaUI method slashes GUI agent visual tokens

    Researchers have developed AQuaUI, a novel method to reduce the number of visual tokens processed by Large Multimodal Models (LMMs) when interacting with graphical user interfaces (GUIs). This training-free technique co…

  13. RESEARCH · CL_37942

    New benchmarks and synthetic data aim to boost AI's egocentric video understanding

    Researchers have introduced new benchmarks and synthetic data generation methods to improve the performance of large multimodal models (LMMs) on egocentric video data. The EgoBabyVLM benchmark focuses on language ground…

  14. RESEARCH · CL_36070

    New research explores synergy between visual understanding and generation in multimodal models

    Researchers are exploring new methods to improve unified multimodal models (UMMs) by enhancing the synergy between visual understanding and generation. One approach, Semantic Generative Tuning (SGT), uses image segmenta…

  15. TOOL · CL_30558

    New FIKA-Bench tests AI knowledge acquisition beyond visual recognition

    Researchers have introduced FIKA-Bench, a new benchmark designed to evaluate the ability of AI systems to acquire knowledge about unfamiliar objects, moving beyond simple visual recognition. The benchmark consists of 31…

  16. RESEARCH · CL_27969

    New benchmarks reveal major gaps in multimodal context learning for LLMs

    Two new benchmarks, MMCL-Bench and Personal-VCL-Bench, have been introduced to evaluate the multimodal context learning capabilities of large language models. MMCL-Bench focuses on learning from visual rules, procedures…

  17. TOOL · CL_28006

    New method enhances LMM spatial reasoning with generated viewpoints

    Researchers have introduced a new paradigm called Thinking with Novel Views (TwNV) to enhance the spatial reasoning capabilities of Large Multimodal Models (LMMs). This approach integrates generative novel-view synthesi…

  18. TOOL · CL_25781

    New LithoBench benchmark reveals large multimodal model limitations

    Researchers have introduced LithoBench, a new benchmark designed to evaluate the capabilities of large multimodal models in interpreting geological lithology from remote sensing data. This benchmark includes 10,000 expe…

  19. RESEARCH · CL_18242

    New CC-OCR V2 benchmark reveals LMMs fall short in real-world document processing

    A new benchmark, CC-OCR V2, has been released to evaluate Large Multimodal Models (LMMs) on real-world document processing tasks. The benchmark includes 7,093 challenging samples across five OCR-centric tracks, addressi…

  20. TOOL · CL_15665

    New CSteer method guides large multimodal models to refer multiple regions without fine-tuning

    Researchers have developed a new training-free method called Contextual Latent Steering (CSteer) to enhance the ability of Large Multimodal Models (LMMs) to accurately identify and refer to multiple specific regions wit…