PulseAugur

Subquadratic debuts 12M-token context window with linear scaling architecture

Subquadratic, a startup with 11 PhD researchers, has launched a new model built on its Subquadratic Selective Attention (SSA) architecture, which the company claims scales linearly with context length. That linear scaling enables a 12-million-token context window, sidestepping the quadratic cost of traditional dense attention in LLMs. Early benchmarks show competitive performance against models such as GPT-5.5 and Claude Opus on tasks like MRCR v2 and SWE-Bench, with significantly faster inference.
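Subquadratic has not published SSA's internals, so the sketch below uses a generic kernelized linear attention (in the style of Katharopoulos et al.'s "Transformers are RNNs") purely as a stand-in to show how reordering the attention matmuls drops the cost from O(n²·d) to O(n·d²) in sequence length n. The feature map `phi` and all shapes are illustrative assumptions, not SSA itself.

```python
import numpy as np

def dense_attention(Q, K, V):
    # Standard softmax attention: materializes an (n, n) score matrix, O(n^2 * d).
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def linear_attention(Q, K, V):
    # Kernelized attention: computing phi(K)^T V first yields a (d, d) state,
    # so total work is O(n * d^2) and grows linearly with sequence length n.
    phi = lambda x: np.maximum(x, 0.0) + 1e-6    # illustrative positive feature map
    Qf, Kf = phi(Q), phi(K)
    kv_state = Kf.T @ V                          # (d, d), independent of n
    normalizer = Qf @ Kf.sum(axis=0)[:, None]    # (n, 1)
    return (Qf @ kv_state) / normalizer

rng = np.random.default_rng(0)
n, d = 1024, 64
Q, K, V = (0.1 * rng.standard_normal((n, d)) for _ in range(3))
print(dense_attention(Q, K, V).shape, linear_attention(Q, K, V).shape)  # (1024, 64) twice
```

The key move is associativity: (Qφ Kφᵀ) V and Qφ (Kφᵀ V) are the same product, but the second never builds the n-by-n matrix, which is what makes very long windows affordable.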

Summary written by gemini-2.5-flash-lite from 1 source.

IMPACT Linear scaling in compute and memory with context length could significantly reduce the cost and improve the ROI of RAG and agentic decomposition.
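As a back-of-envelope check on what quadratic versus linear means at this window size (the 12M-token figure is the company's claim; the per-head, per-layer framing is an assumption for illustration):

```python
# Rough scale comparison at the claimed 12M-token window: a dense softmax
# attention layer materializes n*n pairwise scores per head, while a
# linear-scaling design keeps work proportional to n.
n = 12_000_000
dense = n * n  # ~1.44e14 score entries per head per layer
print(f"dense: {dense:.2e}  linear: {n:.2e}  ratio: {dense // n:,}x")
```

At linear scaling, doubling the context doubles rather than quadruples the work, which is what makes a window of this size plausible.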

RANK_REASON A startup released a new model with a novel architecture and provided benchmark results.


COVERAGE [1]

  1. dev.to — LLM tag · TIER_1 · Andrew Kew

    12 million tokens, linear cost: Subquadratic's bet against the attention tax

    The quadratic attention problem has quietly shaped everything you've built with LLMs. RAG pipelines, agentic decomposition, hybrid architectures — these aren't the natural shape of AI systems. They're workarounds. Doubling the context quadruples the compute, so everyone stoppe…