PulseAugur

Qwen2.5-VL

PulseAugur coverage of Qwen2.5-VL — every cluster mentioning Qwen2.5-VL across labs, papers, and developer communities, ranked by signal.

Total · 30d: 14 (14 over 90d)
Releases · 30d: 0 (0 over 90d)
Papers · 30d: 11 (11 over 90d)
TIER MIX · 90D, TOPICS, and SENTIMENT · 30D charts not shown; 7 days with sentiment data.

RECENT · PAGE 1/1 · 14 TOTAL
  1. TOOL · CL_79831

    New benchmark reveals multilingual safety gaps in vision-language models

    Researchers have developed MLingualFC, a new multilingual benchmark to test the safety vulnerabilities of vision-language models (VLMs). This benchmark uses flowchart images encoded with harmful instructions in five lan…

  2. TOOL · CL_66123

    New CoCoA method boosts multimodal embedding quality

    Researchers have introduced CoCoA, a novel pre-training paradigm designed to enhance multimodal embedding models. This method focuses on content reconstruction through collaborative attention, aiming to create more comp…

  3. RESEARCH · CL_66037

    New methods boost video QA by compressing content and improving temporal reasoning

    Researchers have developed new methods to improve video question answering (VQA) for long videos. One approach, MemoryCard, compresses video content into topic-aware "Memory Cards" to better capture event-level semantic…

  4. RESEARCH · CL_47640

    llama.cpp releases add Vulkan, optimize matrix math, and improve server logging

    The llama.cpp project has released several updates, including version b9580, which adds Vulkan support for matrix-matrix multiplication and Flash Attention, along with optimizations for FP16 dot2 extensions. Other recent…

  5. TOOL · CL_44756

    New framework boosts VLM anomaly detection for self-driving cars

    Researchers have developed SAVANT, a new framework designed to improve the detection of semantic anomalies in autonomous driving systems using Vision-Language Models (VLMs). SAVANT reformulates anomaly detection as a la…

  6. RESEARCH · CL_41802

    UF Gators win AmericasNLP 2026 task with novel captioning system

    A University of Florida team, the UF Gators, has won the AmericasNLP 2026 shared task on cultural image captioning for Indigenous languages. Their two-stage system uses Qwen2.5-VL for an intermediate Spanish capti…

  7. FRONTIER RELEASE · CL_42261

    ByteDance releases Lance, a unified multimodal AI model

    ByteDance has released Lance, an open-source multimodal AI model capable of understanding, generating, and editing both images and videos within a single framework. This lightweight model, with only 3 billion active par…

  8. TOOL · CL_32566

    Video2GUI generates 12M GUI trajectories from unlabeled videos

    Researchers have developed Video2GUI, an automated framework designed to generate large-scale interaction trajectories for training GUI agents. This system extracts data from unlabeled internet videos, converting them i…

  9. TOOL · CL_22434

    New DICModel enhances ICT image captioning with multi-modal LLMs

    Researchers have developed a novel Domain-specific Image Captioning Model (DICModel) designed for the ICT industry, utilizing a multi-stage progressive training strategy. This approach combines synthesized image-text pa…

  10. TOOL · CL_22400

    Medical VLMs struggle with negated answers, new benchmark reveals

    Researchers have developed CXR-ContraBench, a new benchmark designed to evaluate the performance of medical vision-language models (VLMs) in correctly interpreting negated statements within chest X-ray analyses. The ben…

  11. RESEARCH · CL_09753

    DenseStep2M pipeline automates video annotation for improved understanding

    Researchers have developed DenseStep2M, a novel pipeline that automatically extracts detailed procedural annotations from instructional videos without requiring training data. This system segments videos, filters irrele…

  12. RESEARCH · CL_08185

    OcularChat MLLM accurately diagnoses age-related macular degeneration with interactive explanations

    Researchers have developed OcularChat, a multimodal large language model (MLLM) fine-tuned from Qwen2.5-VL, designed to diagnose age-related macular degeneration (AMD) using color fundus photographs. The model was train…

  13. TOOL · CL_47693

    Arcee AI moves to Together Endpoints for cost-efficient SLMs

    Arcee AI has migrated its specialized small language models (SLMs) from AWS to Together Dedicated Endpoints, seeking improved cost, performance, and operational agility. The company focuses on training efficient models …

  14. RESEARCH · CL_04681

    New research tackles LLM hallucinations with novel methods and benchmarks

    Multiple research papers released on arXiv address the challenge of hallucinations in large language and vision-language models. One paper introduces In-Context Visual Contrastive Optimization (IC-VCO) to mitigate multi…