Brief

last 24h

[3/3] 221 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.
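A minimal sketch of how a 0–100 score could combine the four named components. The brief does not say how they are weighted or how decay is computed, so the weights, the 72-hour half-life, and the function name `score_item` are all assumptions for illustration only.

```python
# Hypothetical weights (they sum to 100); the brief names the components
# but does not publish how they are combined.
WEIGHTS = {"authority": 40, "cluster_strength": 30, "headline_signal": 30}
HALF_LIFE_HOURS = 72.0  # assumed half-life for the time-decay factor


def score_item(authority, cluster_strength, headline_signal, age_hours):
    """Return a 0-100 score from three components in [0, 1] and an age in hours."""
    base = (WEIGHTS["authority"] * authority
            + WEIGHTS["cluster_strength"] * cluster_strength
            + WEIGHTS["headline_signal"] * headline_signal)
    # Exponential decay: the score halves every HALF_LIFE_HOURS.
    decay = 0.5 ** (age_hours / HALF_LIFE_HOURS)
    return round(base * decay, 1)
```

Under these assumed settings, an item with perfect component values loses half its score after three days, so older stories need stronger signals to stay near the top.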

TOOL · dev.to — LLM tag English(EN) · 3d

Qwen3-Coder-Next: 80B total, 3B active, 70.6 on SWE-Bench

Alibaba's Qwen3-Coder-Next, a model with 80 billion total parameters of which 3 billion are active, scores 70.6 on the SWE-Bench Verified benchmark. The result is notable because it rivals top closed-source models while the weights are downloadable under the Apache 2.0 license. The model uses a sparse Mixture-of-Experts architecture and a hybrid attention mechanism that combines linear attention for long contexts with standard attention for global context reconstruction.

IMPACT Sets a new SOTA for open-source coding models on SWE-Bench, making advanced coding assistance more accessible.
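The gap between 80B total and 3B active parameters comes from sparse routing: only a few experts run for each input. The sketch below shows generic top-k gating, not Qwen3-Coder-Next's actual router. The expert functions, the gate logits, and `k=2` are hypothetical values chosen for illustration.

```python
import math


def top_k_route(gate_logits, k):
    """Pick the k highest-scoring experts and softmax-normalize their weights."""
    order = sorted(range(len(gate_logits)),
                   key=lambda i: gate_logits[i], reverse=True)
    top = order[:k]
    # Subtract the max logit before exponentiating for numerical stability.
    m = max(gate_logits[i] for i in top)
    exps = [math.exp(gate_logits[i] - m) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_forward(x, experts, gate_logits, k=2):
    """Weighted sum of the outputs of only the selected experts."""
    return sum(w * experts[i](x) for i, w in top_k_route(gate_logits, k))


# Toy experts standing in for feed-forward blocks (hypothetical).
experts = [lambda x: x, lambda x: 2 * x, lambda x: 3 * x, lambda x: 4 * x]
output = moe_forward(2.0, experts, gate_logits=[0.0, 5.0, 5.0, -1.0], k=2)

# Share of parameters active, using the figures quoted in the summary.
active_fraction = 3 / 80
```

Only two of the four toy experts are evaluated here; the others contribute no compute, which is the same reason a sparse model's active parameter count can be a small fraction of its total.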

RESEARCH · Hugging Face Blog English(EN) · 1w · [3 sources]

The Open Agent Leaderboard

Hugging Face has launched the Open Agent Leaderboard, a framework for evaluating both the performance and the cost of AI agent systems. It assesses an agent's generality across diverse tasks and settings rather than only the underlying model's capabilities. The leaderboard draws on six established benchmarks, including SWE-Bench Verified and AppWorld, to test agents in areas such as coding, customer service, and research, giving a broader view of their real-world applicability.

IMPACT Provides a new standardized method for evaluating AI agent generality and cost, potentially guiding development towards more practical applications.

RESEARCH · Together AI blog English(EN) · 3w

DeepSeek-V4 Pro now available on Together AI

DeepSeek-V4 Pro, a Mixture-of-Experts model with 1.6 trillion parameters, is now available on the Together AI platform. It is designed for long-context reasoning: the initial Together AI deployment supports a context window of up to 512K tokens, with a 1M-token window planned. It offers controllable reasoning modes that trade speed against depth, and dedicated pricing for cached input tokens to reduce the cost of repeated queries.

IMPACT Enables new applications requiring reasoning over extensive datasets, potentially lowering costs for repeated long-context queries.
