PulseAugur
Live 07:12:59

HubRouter offers sub-quadratic routing for sequence models, improving throughput

A new arXiv paper introduces HubRouter, a pluggable module that replaces computationally expensive O(n^2) attention layers in sequence models with O(nM) hub-mediated routing, where M, the number of learned hub tokens, is much smaller than the sequence length n. Routing information through this small set of hubs improves training throughput by up to 90x in certain configurations. HubRouter shows particular promise in hybrid architectures such as a Jamba-style model, but it gives up a small amount of model quality compared with standard Transformers.
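The summary does not spell out how hub-mediated routing works internally. Below is a minimal sketch of the general idea, assuming a two-stage bottleneck in the style of induced-point attention (as in the Set Transformer), which may differ from HubRouter's actual design: M hub vectors first attend over all n tokens (gather), then every token attends over the M updated hubs (scatter). Each stage builds an M × n or n × M score matrix, so cost grows as O(nM) rather than O(n^2). The helper names (`attend`, `hub_route`), the random stand-ins for learned hub embeddings, and the omission of learned query/key/value projections and causal masking are all simplifications for illustration.

```python
import math
import random


def softmax(row):
    # Numerically stable softmax over one row of scores.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]


def matmul(a, b):
    # a is p x q, b is q x r; returns p x r.
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def attend(queries, keys, values):
    # Scaled dot-product attention without projections (simplification).
    # Returns the outputs and the number of query-key scores computed.
    d = len(queries[0])
    scores = matmul(queries, transpose(keys))  # len(queries) x len(keys)
    scale = 1.0 / math.sqrt(d)
    weights = [softmax([s * scale for s in row]) for row in scores]
    return matmul(weights, values), len(queries) * len(keys)


def hub_route(tokens, hubs):
    # Gather: M hubs read from n tokens (M x n scores).
    hub_state, gather_cost = attend(hubs, tokens, tokens)
    # Scatter: n tokens read from the M updated hubs (n x M scores).
    out, scatter_cost = attend(tokens, hub_state, hub_state)
    return out, gather_cost + scatter_cost


random.seed(0)
n, M, d = 16, 4, 8
tokens = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
# Random vectors stand in for learned hub embeddings (hypothetical).
hubs = [[random.gauss(0, 1) for _ in range(d)] for _ in range(M)]
out, cost = hub_route(tokens, hubs)
print(len(out), len(out[0]), cost)  # 16 8 128 (dense attention: 256)
```

With `M = 1` every token receives the identical hub summary, which shows the bottleneck at its extreme; a larger `M` lets different tokens read different mixtures of hub states while the cost stays linear in n.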

Impact: Introduces a more efficient routing mechanism for sequence models, potentially reducing computational costs and accelerating training.

Ranking rationale: This cluster covers a new academic paper describing a novel technical approach for sequence models.

Read on arXiv cs.LG →

AI-generated summary · Google Gemini · from 2 sources.

Sources (2)

  1. arXiv cs.LG · TIER_1 · English (EN) · Abhinaba Basu

    HubRouter: A Pluggable Sub-Quadratic Routing Primitive for Hybrid Sequence Models

    arXiv:2604.22442v1 · Announce Type: new · Abstract: We introduce HubRouter, a pluggable module that replaces O(n^2) attention layers with O(nM) hub-mediated routing, where M << n is a small number of learned hub tokens. We demonstrate it in two from-scratch architectures: a Jamba-sty…

  2. arXiv cs.LG · TIER_1 · English (EN) · Abhinaba Basu

    HubRouter: A Pluggable Sub-Quadratic Routing Primitive for Hybrid Sequence Models

    We introduce HubRouter, a pluggable module that replaces O(n^2) attention layers with O(nM) hub-mediated routing, where M << n is a small number of learned hub tokens. We demonstrate it in two from-scratch architectures: a Jamba-style hybrid and a 12-layer Transformer; retrofit i…
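The gap between O(n^2) and O(nM) can be made concrete by counting query-key score entries per layer. This sketch assumes a two-stage bottleneck in which M hubs score all n tokens and all n tokens then score the M hubs (2nM entries in total); `M = 64` is a hypothetical value, not one taken from the paper. It counts arithmetic only and ignores memory traffic, the dot-product dimension, and any layers left unchanged, so it is not a reconstruction of the reported throughput gains.

```python
def score_entries(n, M=None):
    """Query-key score entries per layer.

    Dense self-attention scores every token against every token: n * n.
    A two-stage hub bottleneck (assumed here) scores M hubs against n tokens
    and then n tokens against M hubs: 2 * n * M.
    """
    if M is None:
        return n * n
    return 2 * n * M


M = 64  # hypothetical hub count, not taken from the paper
for n in (1024, 4096, 16384):
    dense = score_entries(n)
    hub = score_entries(n, M)
    # The ratio simplifies to n / (2 * M), so the saving grows linearly with n.
    print(n, dense, hub, dense / hub)
# 1024 1048576 131072 8.0
# 4096 16777216 524288 32.0
# 16384 268435456 2097152 128.0
```

Because the ratio is n / (2M), keeping M fixed while sequences grow is what makes the routing sub-quadratic; doubling M halves the saving at any given length.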