PulseAugur / Brief

last 24h
[2/2] 222 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.
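The scoring described above can be sketched as a weighted sum of the three signals multiplied by an exponential time-decay factor. This is only an illustrative guess: the weights, the half-life, the assumption that each signal is normalized to [0, 1], and the name `score_story` are all hypothetical, since the brief does not state its actual formula.

```python
# Hypothetical weights; the brief does not publish its real formula.
WEIGHTS = {"authority": 0.4, "cluster_strength": 0.35, "headline_signal": 0.25}
HALF_LIFE_HOURS = 12.0  # assumed half-life for the time-decay term


def score_story(authority, cluster_strength, headline_signal, age_hours):
    """Combine three signals in [0, 1] and decay the result by story age.

    Returns a score on a 0-100 scale, rounded to one decimal place.
    """
    base = (
        WEIGHTS["authority"] * authority
        + WEIGHTS["cluster_strength"] * cluster_strength
        + WEIGHTS["headline_signal"] * headline_signal
    )
    # The score halves every HALF_LIFE_HOURS hours.
    decay = 0.5 ** (age_hours / HALF_LIFE_HOURS)
    return round(100 * base * decay, 1)
```

Under these assumptions, a story with perfect signals scores 100 when fresh and half that after one half-life, so older clusters drop down the list unless new coverage refreshes them.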

  1. Alibaba Qwen Team Introduces Qwen3.5-LiveTranslate-Flash: Real-Time Multimodal Interpretation Across 60 Languages at 2.8-Second Latency

    Alibaba's Qwen team has released Qwen3.5-LiveTranslate-Flash, a real-time multimodal translation model that reduces latency to 2.8 seconds. The model expands language support to 60 input languages and 29 output languages, and it uses visual cues such as lip movements to improve accuracy in noisy environments. It can also clone the original speaker's voice in real time for the translated output, which makes for a more natural listening experience.

    IMPACT Enhances real-time multilingual communication by reducing latency and improving accuracy through multimodal input and voice cloning.

  2. Code-Driven Visual Perception: Why "Understanding Code" is the Real Key for Large Models to Conquer STEM Problems | CVPR 2026

    Researchers from Shanghai Jiao Tong University and the Qwen team have introduced CodePercept, an approach for improving the visual perception of large language models, particularly on STEM tasks. Their research suggests that visual perception, rather than reasoning alone, is the key bottleneck for models tackling science and math problems. CodePercept uses code as a precise language for visual understanding: models generate executable code that accurately represents image content, which avoids the inherent ambiguity of natural-language descriptions.

    IMPACT This approach could significantly improve LLMs' ability to understand and solve complex STEM problems by enhancing their visual perception through precise code-based representations.