Qwen3-VL
PulseAugur coverage of Qwen3-VL — every cluster mentioning Qwen3-VL across labs, papers, and developer communities, ranked by signal.
3 days with sentiment data
-
VLMs fail to re-examine images when prompted, study finds
Researchers have developed a new framework called VisualSwap to test whether Vision-Language Models (VLMs) truly re-examine images when they claim to. Their experiments using the VS-Bench dataset on models like Qwen3-VL…
-
Alibaba's Qwen unveils advanced image generation and VAE models
Alibaba's Qwen team has released technical reports for two new image models: Qwen-Image-VAE-2.0 and Qwen-Image-2.0. Qwen-Image-VAE-2.0 is a high-compression Variational Autoencoder designed for improved reconstruction f…
-
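New framework lets remote sensing models adapt to scale variation
Researchers have developed ScaleEarth, a novel framework for remote sensing vision-language models (RS-VLMs) that addresses the challenges posed by variation in ground sampling distance (GSD). Unlike previous approaches that treat GSD as a discrete token, ScaleEarth uses a continuous conditioning variable to dynamically adjust the model's computation path according to physical scale. The method performs GSD prediction via CS-HLoRA and SSE-U and achieves state-of-the-art results on remote sensing benchmarks.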
-
Persistent Visual Memory: Sustaining Perception for Deep Generation in LVLMs
Researchers have introduced Persistent Visual Memory (PVM), a novel module designed to address the "Visual Signal Dilution" problem in Large Vision-Language Models (LVLMs). This issue causes visual attention to weaken a…
-
WaferSAGE uses LLMs to analyze semiconductor defects with synthetic data
Researchers have developed WaferSAGE, a framework utilizing a 4B-parameter Qwen3-VL model for visual question answering on wafer defects in semiconductor manufacturing. The system addresses data scarcity by employing a …
-
Researchers develop precise video language models with human-AI oversight
Researchers have developed a new framework called CHAI (Critique-based Human-AI Oversight) to improve video captioning and generation. This method uses AI to generate initial captions, which are then refined by human ex…
-
Researchers probe VLM safety with embedding-guided typographic attacks
Researchers have developed a method to probe the safety vulnerabilities of vision-language models (VLMs) by using typographic prompt injections. Their study found that multimodal embedding distance strongly predicts att…
-
Alibaba's Qwen3.5-397B-A17B model offers multimodal capabilities and efficient inference
Alibaba has released Qwen3.5-397B-A17B, an open-weight, natively multimodal model featuring a hybrid attention mechanism and sparse Mixture-of-Experts architecture. The model boasts support for 201 languages and demonst…
-
Alibaba Cloud launches 7 new AI models and a $52B roadmap
Alibaba Cloud announced a significant expansion of its AI capabilities, releasing seven new models over a four-day period. Among these were the Qwen3-Max, Qwen3-Omni, and Qwen3-VL models, indicating advancements in vari…