Qwen VL
PulseAugur coverage of Qwen VL — every cluster mentioning Qwen VL across labs, papers, and developer communities, ranked by signal.
-
New AI framework enables robots to co-create music with humans
Researchers have developed Co-policy, a novel framework enabling robots to co-create music with humans. This system integrates semantic understanding with physical execution, allowing robots to generate complementary mu…
-
Alibaba's Qwen LLMs now available in Russia via promptra.ru API
The Qwen family of large language models, developed by Alibaba Group, is now accessible in Russia through an API aggregator called promptra.ru. This service allows Russian users to pay for Qwen models, including the Qwe…
-
Alibaba unveils Qwen-Robot embodied AI model series
Alibaba has launched the Qwen-Robot series, its first comprehensive embodied AI model family. This series includes three distinct models: Qwen-RobotManip for VLA (Vision-Language-Action) operations, Qwen-RobotNav for VL…
-
Alibaba's Qwen launches embodied AI suite for robotics
Alibaba's Qwen has launched the Qwen-Robot Suite, a collection of three foundation models designed for embodied intelligence. The suite includes Qwen-RobotNav for navigation, Qwen-RobotManip for physical interaction and…
-
Qwen-RobotManip and PAIWorld advance robotic manipulation foundation models
Researchers have developed Qwen-RobotManip, a foundation model for robotic manipulation that leverages a unified alignment framework to process heterogeneous data at scale. This approach enables the model to achieve sig…
-
PhyDrawGen generates accurate physics diagrams using neuro-symbolic AI
Researchers have developed PhyDrawGen, a novel system for generating physics diagrams from natural language descriptions. This neuro-symbolic pipeline first uses a large language model to extract a scene graph from text…
-
NVIDIA's PiD upscaler shows promise but struggles with text
A comparison between NVIDIA's new latent-space upscaler model, PiD (Pixel Diffusion Decoder), and the popular SeedVR2 model reveals mixed results. PiD excels at rendering faces with fewer artifacts and noise due to its …
-
SenseTime's 8B model redefines open-source image generation
SenseTime has released SenseNova U1, an 8B parameter open-source model that redefines image generation capabilities by removing the VAE component. This new architecture, called NEO-unify, enables end-to-end modeling of …
-
Vision Wormhole enables latent-space communication for heterogeneous AI agents
Researchers have introduced the "Vision Wormhole," a novel method for enabling communication between heterogeneous multi-agent systems (MAS) by leveraging the visual interface of Vision-Language Models (VLMs). This appr…
-
VLMs fail visual re-examination tests, research finds
Recent research indicates that Vision-Language Models (VLMs) may not be as visually grounded as their self-reflective statements suggest. Studies using image-swapping techniques and counterfactual interventions reveal t…
-
AMD eyes tens of billions in AI revenue, robot model RAM debuts, Blue Origin revises incentives
Researchers from Zhejiang University and the Chinese University of Hong Kong have developed a new model called RAM for 3D spatial understanding and manipulation in robots. This model addresses limi…