LMMs
PulseAugur coverage of LMMs — every cluster mentioning LMMs across labs, papers, and developer communities, ranked by signal.
-
UnAC method enhances LMMs for complex multimodal reasoning with adaptive prompting
Researchers have introduced UnAC, a novel multimodal prompting method designed to enhance the reasoning capabilities of Large Multimodal Models (LMMs) on complex visual tasks. This method employs adaptive visual prompti…
-
New CSteer method guides large multimodal models to refer to multiple regions without fine-tuning
Researchers have developed a new training-free method called Contextual Latent Steering (CSteer) to enhance the ability of Large Multimodal Models (LMMs) to accurately identify and refer to multiple specific regions wit…
-
Researchers develop Glance-or-Gaze to improve LMM visual search with adaptive focus
Researchers have introduced Glance-or-Gaze (GoG), a new framework designed to improve Large Multimodal Models (LMMs) in handling knowledge-intensive visual queries. Unlike previous methods that retrieve information indi…
-
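New UNIKIE-BENCH benchmark evaluates large models on document information extraction
Researchers have introduced UNIKIE-BENCH, a new benchmark designed to systematically evaluate how well Large Multimodal Models (LMMs) extract key information from visual documents. The benchmark has two tracks: one for constrained-category key information extraction (KIE) with predefined schemas, and one for open-category KIE. Experiments with 15 state-of-the-art LMMs show significant performance drops when handling diverse schemas, long-tail information, and complex layouts, indicating that LMMs still face challenges in accuracy and reasoning in this domain.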