GPT-2
PulseAugur coverage of GPT-2 — every cluster mentioning GPT-2 across labs, papers, and developer communities, ranked by signal.
Sentiment data available for 7 days
-
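GPT-2 models show mixed success in replicating human language patterns
Researchers investigated how language models handle differential argument marking (DAM), a linguistic feature in which marking depends on semantic prominence. Using GPT-2 models trained on synthetic data, they found that the models replicate the human preference for natural markedness direction, favoring systems that mark semantically atypical arguments. However, the models did not reproduce the human tendency in DAM systems to mark objects more often than subjects, which suggests that different typological tendencies may have different origins.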
-
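New metric reveals how language models process metaphor
Researchers developed a new metric called Conditional Scale Entropy (CSE) to analyze how decoder-only language models process metaphor. CSE measures the breadth of computational engagement across frequency scales within Transformer layers. Studies using CSE show that across models ranging from 124 million to 20 billion parameters, including architectures such as GPT-2, LLaMA-2, and GPT-oss, metaphorical tokens consistently engage a broader range of computational scales than literal tokens.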
-
SymbolicLight V1 language model achieves high sparsity
Researchers have developed SymbolicLight V1, a novel spiking language model that integrates binary Leaky Integrate-and-Fire dynamics with a continuous residual stream. This model employs a unique Dual-Path SparseTCAM mo…
-
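GLU structure accelerates LLM optimization by reshaping the NTK spectrum
Researchers investigated why gated linear units (GLU) outperform non-GLU structures in large language models. Their analysis in the neural tangent kernel (NTK) regime shows that GLU reshapes the NTK spectrum, which reduces the condition number and speeds up convergence. Although GLU appears to accelerate optimization, empirical observations indicate that it does little to narrow the generalization gap in models such as ViT and GPT-2.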
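For readers unfamiliar with the distinction, the following is a minimal sketch of a gated unit next to a non-gated one. It is not taken from the paper: the function names, the sigmoid gate, and the omission of bias terms are assumptions made for illustration, and the study may analyze a different GLU variant.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function, used here as the gate nonlinearity (an assumption)."""
    return 1.0 / (1.0 + math.exp(-z))


def linear(x: list[float], weight: list[list[float]]) -> list[float]:
    """Apply a bias-free linear map; each row of `weight` yields one output."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def plain_unit(x: list[float], w_up: list[list[float]]) -> list[float]:
    """Non-GLU unit: a single projection followed by a nonlinearity."""
    return [sigmoid(h) for h in linear(x, w_up)]


def glu_unit(
    x: list[float], w_up: list[list[float]], w_gate: list[list[float]]
) -> list[float]:
    """GLU unit: a linear 'value' path multiplied elementwise by a gate path."""
    values = linear(x, w_up)
    gates = [sigmoid(g) for g in linear(x, w_gate)]
    return [v * g for v, g in zip(values, gates)]


# Hypothetical toy weights: identity value path, all-zero gate path.
x = [1.0, 2.0]
w_up = [[1.0, 0.0], [0.0, 1.0]]
w_gate = [[0.0, 0.0], [0.0, 0.0]]

glu_out = glu_unit(x, w_up, w_gate)
plain_out = plain_unit(x, w_up)
```

With an all-zero gate matrix every gate equals sigmoid(0) = 0.5, so the gated unit simply halves the value path. The point of the sketch is only that the GLU has a second, input-dependent multiplicative path that the plain unit lacks.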
-
Self-training restructures language models, research finds
A new research paper challenges the common understanding of self-training in language models, suggesting it restructures rather than flattens language. The study found that while surface-level linguistic features like d…
-
LLM Fine-Tuning Explained: SFT, RAG, and Data Preparation
This blog post explains the process and necessity of fine-tuning large language models (LLMs) for specific tasks. It differentiates fine-tuning from Retrieval-Augmented Generation (RAG), stating that fine-tuning is best…
-
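New architectures tackle catastrophic forgetting in LLMs
Researchers developed new architectural approaches to address catastrophic forgetting in large language models (LLMs) during continual pre-training and fine-tuning. One approach, TFGN, introduces an overlay layer that enables parameter-efficient updates without changing the core Transformer, and it substantially preserves prior knowledge across domains and model scales. Another approach, UAM, is inspired by biological vision and uses a dual-stream architecture to separate semantic understanding from action control, preserving multimodal capabilities during VLA model training. These advances aim to let models learn continually without degrading performance on previously acquired knowledge.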
-
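FibQuant method delivers significant KV cache compression for LLMs
Researchers developed FibQuant, a novel vector quantization method for compressing the key-value (KV) cache used in large language models (LLMs). By replacing scalar quantization with a more efficient vector-based approach, the technique aims to reduce the memory traffic associated with long-context inference. Experiments show that FibQuant achieves substantial compression ratios while maintaining high fidelity, for example 34x compression on the GPT-2 small KV cache, and better (lower) perplexity than existing methods on models such as TinyLlama-1.1B.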
-
New method offers formal guarantees for LLM safety classifiers
Researchers have developed a new method to formally verify the safety of Large Language Model (LLM) guardrail classifiers, moving beyond traditional red-teaming. This approach shifts verification from the discrete input…
-
New research links optimizers to mode connectivity in neural networks
Researchers have explored the role of optimizers in mode connectivity within neural networks, a concept previously underexplored. Their work demonstrates that solutions generated by a single optimizer, such as AdamW or …
-
New theory tackles bandwidth limits for distributed language models
Researchers have developed new theoretical frameworks for training and calibrating language models in distributed settings with limited bandwidth. The Federated Probe-Logit Distillation (FPLD) protocol offers a statisti…
-
New theory reveals optimal learning rate schedules for deep learning
Researchers have developed a theoretical framework for optimal learning rate schedules in deep learning, specifically analyzing a random feature model trained with stochastic gradient descent. The study identifies two d…
-
Small-scale models show bilingualism poses no challenge for language acquisition
Researchers have developed a method using language models to simulate multilingual language acquisition in children. By training GPT-2 models on controlled monolingual and bilingual datasets, they investigated how diffe…
-
Pro-KLShampoo optimizer improves LLM pre-training with spectral structure analysis
Researchers have developed Pro-KLShampoo, an optimization technique that combines gradient preconditioning with orthogonalization for more efficient LLM pre-training. This method leverages the observed spike-and-flat ei…
-
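Theory explains the performance gains of the SignSGD and Muon optimizers
Researchers analyzed theoretically why sign-based optimization algorithms such as SignSGD and Muon can outperform standard SGD when training large models. One new study shows that SignSGD's advantage stems from its effectiveness under specific conditions, such as sparse noise and $\ell_1$-norm stationarity, which standard SGD handles inefficiently. Another paper questions whether Muon's complex geometric structure is necessary, proposing that simpler methods, such as random or inverted spectra, can achieve similar performance by focusing on local alignment and descent potential.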
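As background, the basic difference between the two update rules can be sketched as follows. This is a textbook illustration without momentum, not the setup analyzed in either paper; the helper names and toy values are hypothetical.

```python
def sign(value: float) -> float:
    """Return -1.0, 0.0, or 1.0 according to the sign of `value`."""
    if value > 0:
        return 1.0
    if value < 0:
        return -1.0
    return 0.0


def sgd_step(params: list[float], grads: list[float], lr: float) -> list[float]:
    """Standard SGD: move each coordinate in proportion to its gradient."""
    return [p - lr * g for p, g in zip(params, grads)]


def signsgd_step(params: list[float], grads: list[float], lr: float) -> list[float]:
    """SignSGD: move each coordinate by a fixed step in the gradient's direction."""
    return [p - lr * sign(g) for p, g in zip(params, grads)]


# Hypothetical toy gradient: one very large entry, one tiny entry, one zero.
params = [1.0, -2.0, 0.5]
grads = [100.0, -0.001, 0.0]

after_sgd = sgd_step(params, grads, lr=0.1)
after_signsgd = signsgd_step(params, grads, lr=0.1)
```

In this toy case the large gradient entry dominates the SGD update, while SignSGD moves every coordinate with a nonzero gradient by the same amount (`lr`), regardless of magnitude.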
-
New Polar Express method accelerates matrix decomposition for deep learning
Researchers have developed a new GPU-friendly algorithm called Polar Express for computing matrix decompositions, which is crucial for the Muon optimizer used in training deep neural networks. This method optimizes for …
-
LLMs achieve real-time text transmission via entropy coding
Researchers have explored the connection between learning, prediction, and compression for real-time text transmission using LLM-based entropy coding. They analyzed the trade-off between compression efficiency and trans…
-
Researchers develop SNMF for interpretable LLM feature analysis
Researchers have developed a new method for understanding the internal workings of large language models by decomposing MLP activations. This technique, semi-nonnegative matrix factorization (SNMF), identifies interpret…
-
Researchers explore weight decay, in-context learning, and acceleration for Transformer models
Researchers have developed several new methods to improve the efficiency and theoretical understanding of Transformer models. One paper provides a functional-analytic characterization of weight decay, demonstrating its …
-
Researchers develop parametric memory network for efficient token communication in wireless transmission
Researchers have developed an evolving semantic token communication system using a parametric memory network designed for MIMO fading channels. This system transmits only a prefix of each semantic token to reduce overhe…