Researchers have introduced Oryx, a hybrid model that can switch between sequence mixers, such as quadratic attention and linear recurrences, at different points within a sequence. Attention is used where rich context utilization matters and linear recurrences where efficient generation matters, with over 90% of parameters shared across these modes. Validated with Mamba-2 and Gated DeltaNet variants at scales up to 1.4B parameters, Oryx matches or outperforms single-mixer baselines on language modeling tasks and matches Transformer baseline performance on retrieval tasks while processing significantly fewer tokens in attention mode.
AI IMPACT Introduces a novel hybrid architecture that could improve efficiency and performance in long-context sequence modeling.
RANK_REASON The cluster contains a research paper detailing a new model architecture.
AI-generated summary · Google Gemini · from 2 sources.