PulseAugur

vision-language model

PulseAugur coverage of vision-language models: every cluster mentioning vision-language models across labs, papers, and developer communities, ranked by signal.

Total · 30 days: 110 (90 days: 110)
Releases · 30 days: 0 (90 days: 0)
Papers · 30 days: 106 (90 days: 106)
Timeline
  1. 2026-05-19 research_milestone A new method is proposed to improve out-of-distribution visual document understanding in VLMs.
Sentiment · 30 days

16 days with sentiment data

Recent · Page 3 of 6 · 110 items total
  1. TOOL · CL_37996 ·

    New framework exposes counting bias in Vision-Language Models

    Researchers have developed CounterCount, a new framework designed to diagnose counting biases in Vision-Language Models (VLMs). The framework uses paired factual and counterfactual images to test whether VLMs rely on vi…

  2. TOOL · CL_38011 ·

    GraSP-VL method unlocks semantic granularity in vision-language embeddings

    Researchers have developed GraSP-VL, a method to better utilize frozen vision-language model (VLM) embeddings by treating their length as a semantic interface. This approach learns a shared prefix transform that allows …

  3. RESEARCH · CL_43941 ·

    New benchmarks VGenST-Bench and CaST-Bench target MLLM spatio-temporal reasoning

    Researchers have introduced two new benchmarks, VGenST-Bench and CaST-Bench, designed to more rigorously evaluate the spatio-temporal reasoning capabilities of Multimodal Large Language Models (MLLMs) and Vision-Languag…

  4. TOOL · CL_36541 ·

    New rubric assesses VLM adaptivity in math education

    Researchers have developed a new rubric to assess the adaptivity of Vision Language Models (VLMs) in mathematics education. The rubric evaluates VLMs based on cognitive and motivational aspects, as well as response corr…

  5. TOOL · CL_36046 ·

    New framework unifies CT image analysis with language-guided reasoning

    Researchers have developed a unified framework that integrates language-guided visual reasoning for CT image interpretation. This autoregressive model uses task-routing tokens to trigger detection and segmentation heads…

  6. TOOL · CL_36058 ·

    DepthVLM enables vision-language models to predict dense depth maps

    Researchers have developed DepthVLM, a new framework that enables Vision-Language Models (VLMs) to predict dense metric depth maps from single images. Unlike previous methods that relied on external models or inefficien…

  7. TOOL · CL_36564 ·

    DeltaPrompts boosts VLM reasoning by targeting model capability gaps

    Researchers have introduced DeltaPrompts, a new method to improve the distillation of knowledge into smaller Vision-Language Models (VLMs). They identified that many existing prompts provide minimal learning signals bec…

  8. TOOL · CL_33402 ·

    ICED framework enables concept-level unlearning in Vision-Language Models

    Researchers have developed a new machine unlearning framework called ICED for Vision-Language Models (VLMs). This method allows for the precise removal of specific concepts from a VLM's knowledge without impacting unrel…

  9. TOOL · CL_31314 ·

    RoboEvolve framework boosts robotic manipulation with co-evolving AI

    Researchers have developed RoboEvolve, a new framework designed to improve robotic manipulation capabilities by addressing the scarcity of training data. This system co-evolves a vision-language model planner with a vid…

  10. COMMENTARY · CL_29648 ·

    AI transforms robotics, journalism, and environmental monitoring

    A new survey highlights the significant impact of vision-language models on industrial robotics, with a reported 90% task success rate in human-robot collaboration. Separately, Al Jazeera is partnering with Google Cloud to …

  11. TOOL · CL_29263 ·

    New benchmark reveals VLMs struggle with high-res Earth observation details

    Researchers have introduced UHR-Micro, a new benchmark designed to evaluate Vision-Language Models (VLMs) on their ability to perceive small, critical details within ultra-high-resolution Earth observation imagery. Curr…

  12. TOOL · CL_28149 ·

    Fine-tuning VLMs hinges on strategic choices, not just training

    This article argues that fine-tuning a vision-language model (VLM) is less about the technical training process and more about strategic decisions made beforehand. The author highlights four key choices that significant…

  13. TOOL · CL_27973 ·

    New model HieraCount improves object counting with multi-grained approach

    Researchers have introduced a new framework for open-world object counting, addressing the brittleness of current vision-language models in accurately identifying and counting objects based on user intent. They propose …

  14. TOOL · CL_28312 ·

    New framework boosts VLM chart understanding with counterfactual data

    Researchers have developed ChartCF, a new framework to improve the data efficiency of vision-language models (VLMs) used for chart understanding. This method leverages counterfactual data synthesis, where small code-con…

  15. TOOL · CL_27979 ·

    Medical VQA self-verification unreliable, study finds

    A new research paper introduces a diagnostic framework to expose the unreliability of self-verification in medical visual question answering (VQA) systems. The study argues that current self-verific…

  16. RESEARCH · CL_27989 ·

    New UJEM-KL attack bypasses VLM safety measures with entropy maximization

    Researchers have developed a new method called Untargeted Jailbreak via Entropy Maximization (UJEM-KL) to bypass safety measures in vision-language models (VLMs). This technique focuses on manipulating high-entropy toke…

  17. TOOL · CL_27992 ·

    TINS method enhances OOD detection in vision-language models

    Researchers have developed TINS, a novel method for Out-of-Distribution (OOD) detection in vision-language models. TINS addresses limitations of static negative labels by learning dynamic negative semantics during test-…

  18. TOOL · CL_28024 ·

    New AI method simplifies images while keeping them photorealistic

    Researchers have developed a new framework for simplifying images while maintaining photorealism, moving beyond traditional non-photorealistic rendering techniques. Their method iteratively removes and inpaints elements…

  19. TOOL · CL_28030 ·

    New SleepWalk benchmark tests AI's 3D navigation and instruction grounding

    Researchers have introduced SleepWalk, a new benchmark designed to rigorously test instruction-guided vision-language navigation capabilities of AI models. This benchmark focuses on localized, interaction-centric embodi…

  20. RESEARCH · CL_26359 ·

    GPT-5 Mini leads Agentick benchmark, but no agent paradigm dominates

    The new Agentick benchmark, which assesses various AI agents across 37 tasks, shows GPT-5 Mini achieving the top score of 0.309. However, no single agent paradigm, including reinforcement learning, LLM, VLM, or hybrid a…