PulseAugur

New Oryx Model Flexibly Switches Between Attention and Recurrent Mixers

Researchers have introduced Oryx, a hybrid model that can switch between different sequence mixers, such as quadratic attention and linear recurrences, at different points within a single sequence. Attention mode gives rich use of context, while linear-recurrence mode gives efficient generation, and the two modes share over 90% of their parameters. In experiments with Mamba-2 and Gated DeltaNet variants at scales up to 1.4B parameters, Oryx matched or outperformed single-mixer baselines on language modeling tasks, and it matched a Transformer baseline on retrieval tasks while processing significantly fewer tokens in attention mode.
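To make the idea of switching mixers over shared parameters concrete, here is a minimal NumPy sketch. It is an illustration only and not the Oryx implementation: the function names (init_params, mix_sequence), the single-head layout, and the use of plain unnormalized linear attention as the recurrent mixer are all assumptions made for this example. Mamba-2 and Gated DeltaNet add gating and decay terms that are omitted here, and the summary does not say how Oryx hands context between modes.

```python
import numpy as np


def init_params(d_model, d_head, seed=0):
    """Create one set of Q/K/V projections shared by both mixers (hypothetical layout)."""
    rng = np.random.default_rng(seed)
    scale = 1.0 / np.sqrt(d_model)
    return {name: rng.normal(0.0, scale, size=(d_model, d_head))
            for name in ("wq", "wk", "wv")}


def mix_sequence(x, params, modes):
    """Mix a sequence token by token, choosing the mixer per position.

    x:     array of shape (T, d_model)
    modes: length-T sequence of "attn" (causal softmax attention)
           or "rec" (linear recurrence with a fixed-size state)
    """
    # Both mixers read the same projections, so these weights are shared.
    q = x @ params["wq"]
    k = x @ params["wk"]
    v = x @ params["wv"]
    T, d = q.shape

    state = np.zeros((d, d))  # recurrent state: size does not grow with T
    out = np.zeros_like(v)
    for t in range(T):
        # Always update the recurrent state so it is ready when the mode switches.
        state = state + np.outer(k[t], v[t])
        if modes[t] == "attn":
            # Attention looks at every cached key/value up to t: O(t) work per token.
            scores = k[: t + 1] @ q[t] / np.sqrt(d)
            weights = np.exp(scores - scores.max())
            weights /= weights.sum()
            out[t] = weights @ v[: t + 1]
        elif modes[t] == "rec":
            # Recurrence reads only the fixed-size state: O(1) work per token.
            out[t] = q[t] @ state
        else:
            raise ValueError(f"unknown mode: {modes[t]!r}")
    return out


# Usage: attention over the first three tokens, recurrence for the rest.
params = init_params(d_model=8, d_head=4)
x = np.random.default_rng(1).normal(size=(6, 8))
out = mix_sequence(x, params, ["attn"] * 3 + ["rec"] * 3)
print(out.shape)
```

In this sketch every parameter is shared between the two modes, which is a simplification; the summary states only that Oryx shares over 90% of parameters. The sketch also keeps both the key/value cache and the recurrent state up to date at every step so that either mode can be used at any position, which is convenient for illustration but gives up the memory savings a real design would aim for.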

IMPACT Introduces a novel hybrid architecture that could improve efficiency and performance in long-context sequence modeling.

RANK_REASON The cluster contains a research paper detailing a new model architecture.

AI-generated summary · Google Gemini · from 2 sources.

COVERAGE [2]

  1. arXiv cs.LG TIER_1 English(EN) · Kevin Y. Li, Asher Trockman, Ananda Theertha Suresh, Ziteng Sun ·

    Multi-Mixer Models: Flexible Sequence Modeling with Shared Representations

    arXiv:2605.28769v1. Abstract: Softmax attention is the cornerstone of modern large language models, but its memory scales linearly and compute quadratically with sequence length. Linear recurrent models, such as linear attention and state space models, have beco…

  2. arXiv cs.LG TIER_1 English(EN) · Ziteng Sun ·

    Multi-Mixer Models: Flexible Sequence Modeling with Shared Representations

    Softmax attention is the cornerstone of modern large language models, but its memory scales linearly and compute quadratically with sequence length. Linear recurrent models, such as linear attention and state space models, have become widely studied as alternatives to attention d…