WikiText-103
PulseAugur coverage of WikiText-103 — every cluster mentioning WikiText-103 across labs, papers, and developer communities, ranked by signal.
-
New parameter E predicts Mixture-of-Experts model health, preventing dead experts.
Researchers have introduced a new dimensionless control parameter, E = T*H/(O+B), to predict the health of expert ecologies in Mixture-of-Experts (MoE) models. This parameter, derived from four hyperparameters, can prev…
-
New framework uses masked language models for efficient wireless token communication
Researchers have developed a novel context-aware wireless token communication framework that utilizes a masked language model (MLM) to improve transmission efficiency. This system enables robust token inference over noi…
-
Jordan-RoPE: Non-Semisimple Relative Positional Encoding via Complex Jordan Blocks
Researchers have introduced Jordan-RoPE, a novel relative positional encoding method for transformer models that utilizes complex Jordan blocks. This approach generates oscillatory-polynomial features, enabling a distan…
-
Researchers explore weight decay, in-context learning, and acceleration for Transformer models
Researchers have developed several new methods to improve the efficiency and theoretical understanding of Transformer models. One paper provides a functional-analytic characterization of weight decay, demonstrating its …
-
Phase-Associative Memory: Sequence Modeling in Complex Hilbert Space
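Researchers have introduced a novel complex-valued sequence model called Phase-Associative Memory (PAM), which uses the Hilbert space formalism to better capture the uncertainty in the meaning of semantic expressions. Although PAM's absolute loss is higher than that of its real-valued counterpart, it improves faster as the parameter count increases. This suggests that PAM-style architectures could reach state-of-the-art language modeling capability with significantly fewer parameters, making them viable on consumer-grade hardware.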
-
AutoCompress method isolates critical transformer layers for efficient compression
Researchers have developed AutoCompress, a novel method for compressing transformer models by isolating and preserving the critical first layer (Layer 0). This approach, termed Critical Layer Isolation (CLI), showed tha…