Researchers have introduced Mixture of Activations (MoA), a feedforward network (FFN) design for large language models (LLMs). MoA uses token-adaptive activation mixing: lightweight, input-dependent gates allow different activation functions to be applied to different tokens. In theory, this offers greater expressivity than both fixed-activation FFNs and learnable activations (LA). In empirical evaluations on models from 0.12B to 2B parameters, MoA consistently reaches lower terminal loss and shows better scaling behavior, with minimal overhead.
AI IMPACT This new FFN design could lead to more efficient and powerful LLMs by improving their nonlinear expressivity and scaling behavior.
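To make the idea of token-adaptive activation mixing concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the summary does not give the exact formulation, so the softmax gate, the choice of ReLU, GELU, and SiLU as candidate activations, and all names (`moa_ffn_token`, `w_in`, `w_gate`, `w_out`) are assumptions made for illustration only.

```python
import math

# Candidate activations (an assumed set; the actual set used by MoA may differ).
def relu(v):
    return max(v, 0.0)

def gelu(v):
    # Exact GELU via the Gaussian error function.
    return 0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0)))

def silu(v):
    return v / (1.0 + math.exp(-v))

ACTIVATIONS = (relu, gelu, silu)

def matvec(x, w):
    # x: list of n floats; w: n rows of m floats. Returns x @ w as m floats.
    return [sum(xi * row[j] for xi, row in zip(x, w)) for j in range(len(w[0]))]

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def moa_ffn_token(x, w_in, w_gate, w_out):
    """Hypothetical MoA-style FFN applied to a single token vector x."""
    h = matvec(x, w_in)                 # up-projection to the hidden width
    gates = softmax(matvec(x, w_gate))  # one input-dependent weight per activation
    # Mix the activations elementwise, weighted by this token's gates.
    mixed = [sum(g * f(v) for g, f in zip(gates, ACTIVATIONS)) for v in h]
    return matvec(mixed, w_out), gates  # down-projection, plus gates for inspection

# Two tokens passed through the same weights receive different gate weights.
w_in = [[0.5, -1.0, 0.25], [0.1, 0.3, -0.7]]
w_gate = [[1.0, 0.0, -1.0], [0.0, 2.0, 0.0]]
w_out = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
for token in ([1.0, -2.0], [-0.5, 1.5]):
    y, gates = moa_ffn_token(token, w_in, w_gate, w_out)
    print(token, [round(g, 3) for g in gates], [round(v, 3) for v in y])
```

The gate here adds only one small projection per token (`w_gate` has one column per candidate activation), which is one plausible reading of "lightweight". With a constant gate, the layer reduces to a fixed blend of activations shared by all tokens.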
- Feedforward network (FFN)
- GELU
- Large language models (LLMs)
- Learnable activations (LA)
- Mixture of Activations (MoA)
- ReLU
- SwiGLU
AI-generated summary · Google Gemini · from 2 sources.