Multi-modal Large Language Models
PulseAugur coverage of Multi-modal Large Language Models: every cluster mentioning the topic across labs, papers, and developer communities, ranked by signal.
-
New 'Ground Then Rank' method boosts knowledge-based visual question answering
Researchers have developed a new framework called "Ground Then Rank" (GTR) to improve Knowledge-Based Visual Question Answering (KB-VQA) performance. This method decouples entity identification from evidence ranking, ad…
-
New benchmarks and methods tackle visual document retrieval challenges
Researchers have developed new methods to improve visual document retrieval, particularly for large collections of similar documents like invoices. One approach, Invoice Haystack, introduces a benchmark designed to stre…
-
New TASM framework boosts MLLM efficiency with structured memory
Researchers have developed a new framework called TASM (Task-Aware Structured Memory) to improve the efficiency of multi-modal large language models (MLLMs). This training-free approach addresses the limitations of curr…
-
New benchmarks and frameworks enhance video temporal grounding
Researchers have introduced new benchmarks and frameworks for improving temporal grounding in long-form videos. One study posits that hour-scale video grounding is primarily a search problem, not a recognition one, and …
-
LLM privacy research tackles Japanese data, multi-modal risks, and DP adaptation
Researchers are exploring privacy risks associated with large language models (LLMs) and their adaptations. One study focuses on detecting sensitive personal information in Japanese pre-training corpora, developing a cl…
-
Survey details LLM and MM-LLM use in transportation operations
A new survey paper explores the application of large language models (LLMs) and multi-modal large language models (MM-LLMs) in transportation systems management and operations. The research synthesizes current studies a…
-
AI agents learn human beliefs and spatial reasoning
Researchers are exploring how AI agents can better understand human beliefs and intentions, particularly in interactive scenarios. One paper proposes a second-order Theory of Mind (ToM-2) framework using I-POMDP to enab…
-
AI research questions video anomaly detection framing
Two new research papers challenge the current direction of video anomaly detection (VAD). The first paper argues that the field's focus on general models and multi-modal large language models (MLLMs) has shifted focus a…