PulseAugur / Brief

last 24h
[3/3] 221 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.
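A minimal sketch of how a 0–100 score over those four factors could be combined. The page does not publish its formula, so the weights, the 12-hour half-life, and the function name `brief_score` are all assumptions chosen for illustration only.

```python
# Hypothetical weights for the three quality signals (assumed, not from the page).
WEIGHTS = {"authority": 0.40, "cluster_strength": 0.35, "headline_signal": 0.25}
HALF_LIFE_HOURS = 12.0  # assumed half-life for the time-decay factor


def brief_score(authority, cluster_strength, headline_signal, age_hours):
    """Combine three signals in [0, 1] with exponential time decay into a 0-100 score."""
    signals = {
        "authority": authority,
        "cluster_strength": cluster_strength,
        "headline_signal": headline_signal,
    }
    for name, value in signals.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    if age_hours < 0:
        raise ValueError("age_hours must be non-negative")

    # Weighted average of the signals, still in [0, 1].
    base = sum(WEIGHTS[name] * value for name, value in signals.items())
    # The score halves every HALF_LIFE_HOURS hours.
    decay = 0.5 ** (age_hours / HALF_LIFE_HOURS)
    return round(100.0 * base * decay, 1)
```

Under these assumptions, a story with perfect signals scores 100 when fresh and 50 after one half-life; a multiplicative decay keeps older stories ranked by quality relative to one another while letting newer ones rise.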

  1. Qwen3-Coder-Next: 80B total, 3B active, 70.6 on SWE-Bench Verified

    Alibaba's Qwen3-Coder-Next, a model with 80 billion total parameters of which 3 billion are active per token, scores 70.6 on the SWE-Bench Verified benchmark. The result rivals top closed-source models, and the weights are downloadable under the Apache 2.0 license. The model uses a sparse Mixture-of-Experts architecture and a hybrid attention mechanism that combines linear attention for long contexts with standard attention for global context reconstruction.

    IMPACT Sets a new state of the art for open-weight coding models on SWE-Bench Verified, making advanced coding assistance more accessible.

  2. The Open Agent Leaderboard

    Hugging Face has launched the Open Agent Leaderboard, a new framework for evaluating the performance and cost of AI agent systems. The benchmark assesses how well an agent generalizes across diverse tasks and settings, rather than measuring only the underlying model's capabilities. It draws on six established benchmarks, including SWE-Bench Verified and AppWorld, to test agents in areas such as coding, customer service, and research.

    IMPACT Provides a new standardized method for evaluating AI agent generality and cost, potentially guiding development towards more practical applications.

  3. DeepSeek-V4 Pro now available on Together AI

    DeepSeek-V4 Pro, a large Mixture-of-Experts model with 1.6 trillion parameters, is now available on the Together AI platform. The model is designed for long-context reasoning: the initial Together AI deployment supports a context window of up to 512K tokens, with a 1M-token window planned. It offers controllable reasoning modes that trade speed against depth, and discounted pricing for cached input tokens to reduce the cost of repeated queries.

    IMPACT Enables new applications requiring reasoning over extensive datasets, potentially lowering costs for repeated long-context queries.
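
To illustrate why cached-input pricing in item 3 matters for repeated long-context queries, here is a rough cost sketch. The brief gives no prices, so the per-million-token rates below are placeholders, and the function name `input_cost_usd` is hypothetical; substitute the provider's published rates before drawing conclusions.

```python
def input_cost_usd(fresh_tokens, cached_tokens, price_per_m, cached_price_per_m):
    """Input-side cost when part of the prompt is billed at a cached rate.

    Prices are in USD per million tokens. Output tokens are ignored here.
    """
    if min(fresh_tokens, cached_tokens) < 0:
        raise ValueError("token counts must be non-negative")
    total = fresh_tokens * price_per_m + cached_tokens * cached_price_per_m
    return total / 1_000_000


# Placeholder rates, NOT real Together AI prices.
PRICE_PER_M = 2.00
CACHED_PRICE_PER_M = 0.20

# A 400K-token shared document plus a 2K-token question (fits a 512K window).
DOC_TOKENS, QUESTION_TOKENS = 400_000, 2_000

# First query: nothing is cached yet, so every input token is billed in full.
first = input_cost_usd(DOC_TOKENS + QUESTION_TOKENS, 0, PRICE_PER_M, CACHED_PRICE_PER_M)
# Follow-up query: the document prefix is cached, only the question is fresh.
repeat = input_cost_usd(QUESTION_TOKENS, DOC_TOKENS, PRICE_PER_M, CACHED_PRICE_PER_M)
```

With these placeholder rates the first query costs about 0.80 USD of input and each follow-up about 0.08 USD, so the saving scales with how much of the prompt is a reused prefix.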