Subquadratic debuts 12M-token context window with linear scaling architecture

By PulseAugur Editorial · [1 source] · 2026-05-06 12:15

Subquadratic, a startup with 11 PhD researchers, has launched a model built on its Subquadratic Selective Attention (SSA) architecture, which the company says scales linearly with context length. The company claims this enables a 12-million-token context window and avoids the quadratic cost of the dense attention used in conventional LLMs. Early benchmarks reportedly show competitive performance against models such as GPT-5.5 and Claude Opus on MRCR v2 and SWE-Bench, with significantly faster inference.
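To make the scaling claim concrete, here is a minimal sketch that counts query-key pairs under dense attention and under a generic scheme in which each token attends to a fixed number of positions. It is not SSA: the summary does not describe how SSA selects what to attend to, so the function names and the `budget` value of 4096 are assumptions chosen only for illustration.

```python
def dense_attention_pairs(n_tokens: int) -> int:
    """Query-key pairs scored by dense attention: every token sees every token."""
    return n_tokens * n_tokens


def fixed_budget_pairs(n_tokens: int, budget: int = 4096) -> int:
    """Pairs scored when each token attends to at most `budget` positions.

    Hypothetical stand-in for any linear-cost attention scheme; this is
    NOT the SSA mechanism, whose details the summary does not give.
    """
    return n_tokens * min(budget, n_tokens)


if __name__ == "__main__":
    for n in (1_000_000, 2_000_000, 12_000_000):
        dense = dense_attention_pairs(n)
        sparse = fixed_budget_pairs(n)
        print(
            f"{n:>12,} tokens: dense={dense:.2e} "
            f"fixed-budget={sparse:.2e} ratio={dense / sparse:,.0f}x"
        )
```

Doubling the token count multiplies the dense pair count by four but the fixed-budget count by only two, which is the gap the linear-scaling claim is about. Real costs also depend on model width, number of layers, and memory traffic, none of which this sketch models.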

IMPACT Linear scaling in compute and memory with context length could significantly reduce the cost and improve the ROI of RAG and agentic decomposition.

RANK_REASON A startup released a new model with a novel architecture and provided benchmark results.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

dev.to — LLM tag TIER_1 English(EN) · Andrew Kew · 2026-05-06 12:15

12 million tokens, linear cost: Subquadratic's bet against the attention tax

The quadratic attention problem has quietly shaped everything you've built with LLMs. RAG pipelines, agentic decomposition, hybrid architectures: these aren't the natural shape of AI systems. They're workarounds. Doubling the context quadruples the compute, so everyone stoppe…
