PulseAugur

vision-language model

PulseAugur coverage of vision-language models: every cluster mentioning vision-language models across labs, papers, and developer communities, ranked by signal.

Total · 30 days: 111 (90 days: 111)
Releases · 30 days: 0 (90 days: 0)
Papers · 30 days: 107 (90 days: 107)
Timeline
  1. 2026-05-19 research_milestone A new method is proposed to improve out-of-distribution visual document understanding in VLMs.
Sentiment · 30 days

17 days with sentiment data

Recent · Page 4 of 6 · 111 items total
  1. RESEARCH · CL_26359

    GPT-5 Mini leads Agentick benchmark, but no agent paradigm dominates

    The new Agentick benchmark, which assesses various AI agents across 37 tasks, shows GPT-5 Mini achieving the top score of 0.309. However, no single agent paradigm, including reinforcement learning, LLM, VLM, or hybrid a…

  2. TOOL · CL_25598

    New SAEgis framework detects adversarial attacks on vision-language models

    Researchers have developed a new framework called SAEgis to detect adversarial attacks on vision-language models (VLMs). This method utilizes sparse autoencoders (SAEs) as a plug-and-play module, requiring no additional…

  3. TOOL · CL_22401

    ChartZero uses synthetic data to extract chart data without real-world annotation

    Researchers have developed ChartZero, a novel framework designed to extract data from line charts with zero-shot capabilities. This approach bypasses the need for real-world annotations by training exclusively on synthe…

  4. TOOL · CL_22124

    CompART training improves VLM multi-object grounding and visual understanding

    Researchers have developed a new training method called Compositional Attention-Regularized Training (CompART) to improve how Vision-Language Models (VLMs) handle complex, multi-object references. Current VLMs struggle …

  5. RESEARCH · CL_21791

    GeoStack framework enables efficient VLM knowledge composition, preventing catastrophic forgetting

    Researchers have developed GeoStack, a novel framework designed to enhance knowledge composition in Vision-Language Models (VLMs). This approach addresses the issue of catastrophic forgetting, where models lose previous…

  6. TOOL · CL_20775

    Consensus Entropy improves VLM OCR accuracy by measuring inter-model agreement

    Researchers have developed a new metric called Consensus Entropy (CE) to assess the reliability of Optical Character Recognition (OCR) outputs from Vision-Language Models (VLMs). CE measures the agreement between multip…

  7. TOOL · CL_20754

    Researchers propose new framework for generative recommendation systems

    Researchers have developed a new framework to improve the generation of Semantic IDs (SIDs) for generative recommendation systems. This approach addresses issues of information and semantic degradation by integrating de…

  8. RESEARCH · CL_20275

    PhysForge generates physics-grounded 3D assets for virtual worlds and embodied AI

    Researchers have introduced PhysForge, a novel framework designed to generate physics-grounded 3D assets for interactive virtual worlds and embodied AI. This system addresses the limitations of existing methods by focus…

  9. RESEARCH · CL_20307

    New AI models InterMesh and Anny-Fit advance 3D human pose and shape recovery

    Researchers have developed InterMesh, a new framework for multi-person human mesh recovery that explicitly incorporates human-environment interaction information. This approach enhances pose and shape estimation by enri…

  10. TOOL · CL_18874

    VLM pipeline enables viewpoint-agnostic grasping for robots with partial observations

    Researchers have developed a new end-to-end pipeline for language-guided grasping that enhances the robustness of mobile manipulators in cluttered environments. This system uses vision-language models (VLMs) and partial…

  11. RESEARCH · CL_18576

    Researchers unveil new stealthy backdoor attacks on AI models using diffusion and style features

    Researchers have developed new methods for backdoor attacks on advanced AI models, specifically targeting Vision-Language Models (VLMs) and Diffusion Models (DMs). One approach, CBV, uses diffusion models to create natu…

  12. RESEARCH · CL_18299

    New GLANCE framework enhances VLM agents with curiosity-driven visual-linguistic exploration

    Researchers have developed a new framework called GLANCE to enhance the exploration capabilities of Vision-Language Model (VLM) agents. This framework aims to improve how these agents navigate complex and partially ob…

  13. TOOL · CL_15782

    New benchmark reveals video models forget long-term context

    Researchers have introduced SceneBench, a new benchmark designed to evaluate video understanding models' ability to retain context over long videos, particularly across different scenes. Their findings indicate that cur…

  14. TOOL · CL_15622

    VISTA benchmark launched for advanced VLM spatio-temporal interaction analysis

    Researchers have introduced VISTA, a new benchmark designed to evaluate the spatio-temporal understanding capabilities of Vision-Language Models (VLMs). Unlike existing benchmarks that focus on simple actions and limite…

  15. TOOL · CL_15616

    Researchers propose Gromov-Wasserstein distance for VLM vision encoder selection

    Researchers have developed a new method for selecting optimal vision encoders for Vision-Language Models (VLMs). Traditional approaches, like choosing encoders with high accuracy or large size, were found to be ineffect…

  16. TOOL · CL_15611

    Chain of Evidence framework enables pixel-level visual attribution for retrieval-augmented generation

    Researchers have developed a new framework called Chain of Evidence (CoE) to improve iterative retrieval-augmented generation (iRAG) systems. CoE utilizes Vision-Language Models to directly analyze screenshots of retrie…

  17. RESEARCH · CL_16299

    Coral and CoRAL systems optimize LLM serving and robotic control

    Researchers have developed two distinct systems named Coral and CoRAL. Coral is an adaptive system designed for cost-efficient serving of multiple large language models across heterogeneous cloud GPUs, aiming to optimiz…

  18. RESEARCH · CL_16304

    Robots gain semantic understanding with VLM and adaptive memory

    Researchers have developed a "Semantic Autonomy Stack" to enable indoor mobile robots to understand natural language instructions, overcoming the latency and memory limitations of current Vision-Language Models (VLMs). …

  19. RESEARCH · CL_14362

    GeoThinker framework actively integrates geometry for advanced spatial reasoning

    Researchers have developed GeoThinker, a novel framework that enhances spatial reasoning in multimodal large language models (MLLMs) by actively integrating geometric information. Unlike previous passive fusion methods,…

  20. RESEARCH · CL_21819

    New benchmarks tackle 'Entity Identity Confusion' in LLM knowledge editing

    Researchers have identified a new failure mode in multimodal knowledge editing called Entity Identity Confusion (EIC), where edited vision-language models incorrectly associate new entity information with original image…