Alibaba has released Qwen3.7-Max, an agent-first LLM with a 1 million token context window, capable of autonomous coding tasks. The model demonstrated a 35-hour coding session without human intervention, optimizing code for unfamiliar hardware and achieving a 10x speedup on a custom chip performance kernel. While independent reproduction of this demo is pending, Qwen3.7-Max shows strong performance on benchmarks like Terminal-Bench 2.0 and MCP-Atlas, surpassing some competitors, though it trails in graduate-level science reasoning and has a lower attempt rate. AI
IMPACT Sets a new bar for agentic coding and long-context reasoning, potentially pressuring competitors in specialized tasks.
RANK_REASON Frontier-lab model release with system card and benchmark data. [lever_c_demoted from frontier_release: ic=1 ai=1.0]
- Alibaba
- Claude Opus 4.6
- Claude Opus 4.7
- GPQA Diamond
- GPT-5.5
- MCP-Atlas
- Qwen3.7-Max
- Qwen3.7-Plus
- Qwen3-Coder-Next
- Terminal-Bench 2.0
AI-generated summary · Google Gemini · from 1 sources. How we write summaries →