WikiText-2
PulseAugur coverage of WikiText-2 — every cluster mentioning WikiText-2 across labs, papers, and developer communities, ranked by signal.
3 days with sentiment data
-
Llama 3.1 8B benchmark reveals memory bandwidth bottleneck on Apple M4
A benchmark of Llama 3.1 8B on an Apple M4 Mac Mini with 16GB unified memory revealed that the Q8_0 quantization, despite fitting entirely in memory, suffers from slow token generation due to memory bandwidth limitation…
-
New ScaleSearch method boosts generative model efficiency via optimized quantization
Researchers have developed a new method called ScaleSearch to improve the efficiency of generative models through quantization. This technique optimizes the selection of scale factors in Block Floating Point (BFP) forma…
-
New BCJR-QAT method pushes LLM quantization to 2 bits per weight
Researchers have developed BCJR-QAT, a novel method for quantizing large language models to 2 bits per weight, a significant advancement beyond current post-training quantization techniques. This new approach uses a dif…
-
New parameter E predicts Mixture-of-Experts model health, preventing dead experts
Researchers have introduced a new dimensionless control parameter, E = T*H/(O+B), to predict the health of expert ecologies in Mixture-of-Experts (MoE) models. This parameter, derived from four hyperparameters, can prev…
-
New MetaAdamW optimizer uses self-attention for adaptive learning rates
Researchers have developed MetaAdamW, a novel optimizer that enhances adaptive learning rates and weight decay by employing a self-attention mechanism. This Transformer-based approach dynamically adjusts hyperparameters…
-
Associative-State Universal Transformers improve parameter efficiency with sparse retrieval
Researchers have developed UniMatrix, a novel Universal Transformer architecture that integrates structured recurrence with sparse retrieval mechanisms. While initial versions showed parameter efficiency and competitive…