Brief

last 24h

[4/4] 221 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.
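The brief does not publish its scoring formula, so the following is only a minimal sketch of how four such signals could be combined into a 0–100 score. The weights, the 24-hour half-life, the 50-source cluster cap, and the names `Story` and `score` are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass

# Hypothetical weights; the brief does not publish its actual formula.
WEIGHTS = {"authority": 0.35, "cluster": 0.30, "headline": 0.15, "freshness": 0.20}
HALF_LIFE_HOURS = 24.0  # assumed: freshness halves every 24 hours
CLUSTER_CAP = 50        # assumed: clusters of 50+ sources count as full strength


@dataclass
class Story:
    authority: float   # source authority, expected in [0, 1]
    sources: int       # number of sources in the deduplicated cluster
    headline: float    # headline signal, expected in [0, 1]
    age_hours: float   # hours since publication


def score(story: Story) -> float:
    """Return a 0-100 score as a weighted sum of four normalized components."""
    components = {
        # Clamp inputs so out-of-range values cannot push the score past 100.
        "authority": min(max(story.authority, 0.0), 1.0),
        "cluster": min(max(story.sources, 0), CLUSTER_CAP) / CLUSTER_CAP,
        "headline": min(max(story.headline, 0.0), 1.0),
        # Exponential time decay: 1.0 when fresh, 0.5 after one half-life.
        "freshness": 0.5 ** (max(story.age_hours, 0.0) / HALF_LIFE_HOURS),
    }
    return 100.0 * sum(WEIGHTS[name] * components[name] for name in WEIGHTS)
```

Under these assumptions, two otherwise identical stories differ only through the freshness term, so a four-day-old item always scores below a one-hour-old one.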

SIGNIFICANT · 量子位 (QbitAI) Chinese(ZH) · 4d · [15 sources]

Artificial Analysis Ranking: Qwen3.7 Tops Chinese Models, Top 5 Globally

Alibaba's Qwen3.7-Max has been ranked the top-performing Chinese large language model and fifth globally by Artificial Analysis, a third-party evaluation platform. The new flagship model scored 56.6, ahead of other Chinese models and close to leading international models such as GPT, Claude, and Gemini. Qwen3.7-Max is designed for agentic tasks: it shows significant gains in programming, reasoning, and tool use, and it can handle complex, long-running tasks that involve extensive tool calls.

IMPACT Sets a new benchmark for Chinese LLMs and signals increased competition at the frontier of global model performance.

FRONTIER RELEASE · Simon Willison English(EN) · 1mo · [88 sources]

Gemini 3.5 Flash: more expensive, but Google plan to use it for everything

Google has launched Gemini 3.5 Flash, a new model designed for agentic workflows and coding tasks, available immediately across its consumer and developer platforms. This release also introduces Gemini Omni for multimodal generation, particularly video, and the Antigravity agent stack. While Gemini 3.5 Flash offers significant speed and a 1 million token context window, its pricing has increased substantially compared to previous versions, aligning with a trend of rising costs among major AI labs.

IMPACT Sets a new standard for agentic AI performance and multimodal capabilities, potentially accelerating enterprise adoption and putting pressure on competitors to respond.

TOOL · X — Together (inference / OSS) English(EN) · 1w

Together AI STT models now hold the top two spots for transcription speed on the @ArtificialAnlys Speech to Text leaderboard.

Together AI's speech-to-text models have achieved the top two positions on the Artificial Analysis leaderboard for transcription speed. The NVIDIA Parakeet TDT 0.6B V3 model, running on Together AI, is currently ranked first, processing 303 seconds of audio for every second of computation.
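To make the 303x figure concrete, here is a small sketch that converts a speed factor into an estimated transcription time. It assumes throughput stays constant regardless of clip length and ignores network and queueing overhead; the function name `transcription_seconds` is illustrative only.

```python
SPEED_FACTOR = 303.0  # seconds of audio transcribed per second of compute (from the post)


def transcription_seconds(audio_seconds: float, speed_factor: float = SPEED_FACTOR) -> float:
    """Estimate the compute time needed to transcribe a clip at a given speed factor."""
    if speed_factor <= 0:
        raise ValueError("speed_factor must be positive")
    return audio_seconds / speed_factor


one_hour = transcription_seconds(3600)
print(f"{one_hour:.1f} s")  # about 11.9 s for one hour of audio
```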

IMPACT Sets a new state of the art on transcription speed benchmarks, potentially improving efficiency for voice AI applications.

TOOL · Bluesky Jetstream — AI desk English(EN) · 2w

Artificial Analysis relies on our IFBench eval to test how closely models follow user prompts.

Artificial Analysis uses IFBench, an evaluation built by the post's authors, to measure how closely AI models follow user instructions. Unlike many benchmarks that saturate quickly, IFBench remains effective because it assesses aspects of instruction following that are often overlooked and that continue to challenge even advanced models. It offers a view of model behavior beyond standard performance metrics.

IMPACT Provides a method for assessing how well AI models follow user instructions, addressing a gap in current evaluation practices.
- Artificial Analysis
- IFBench
