GPT-2
PulseAugur coverage of GPT-2 — every cluster mentioning GPT-2 across labs, papers, and developer communities, ranked by signal.
8 days with sentiment data
-
Researchers explore weight decay, in-context learning, and acceleration for Transformer models
Researchers have proposed several new methods that make Transformer models more efficient and deepen theoretical understanding of how they work. One paper provides a functional-analytic characterization of weight decay, demonstrating its …
-
Researchers develop parametric memory network for efficient token communication in wireless transmission
Researchers have developed an evolving semantic token communication system using a parametric memory network designed for MIMO fading channels. This system transmits only a prefix of each semantic token to reduce overhe…
-
What is Tokenization Drift and How to Fix It?
Tokenization drift occurs when minor formatting changes in input text, such as spacing or line breaks, lead to different token IDs being generated by a model. This can cause unpredictable shifts in model behavior becaus…
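Here is a toy illustration of the effect. It assumes a hypothetical five-token vocabulary and a simple greedy longest-match tokenizer, not a real BPE model. As in GPT-2's vocabulary, a word with a leading space (" world") gets a different ID from the bare word ("world"), so an extra space or a line break changes the ID sequence. Collapsing whitespace before encoding is shown as one plausible mitigation. It is an assumption here, not necessarily the fix the article proposes.

```python
import re

# Hypothetical toy vocabulary: " world" (leading space) and "world" get
# different IDs, mirroring how GPT-2-style BPE vocabularies treat spaces.
VOCAB = {"hello": 0, " world": 1, "world": 2, " ": 3, "\n": 4}
LONGEST = max(len(tok) for tok in VOCAB)

def encode(text: str) -> list[int]:
    """Greedy longest-match tokenization over the toy VOCAB."""
    ids = []
    i = 0
    while i < len(text):
        # Try the longest candidate substring first, then shorter ones.
        for size in range(min(LONGEST, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in VOCAB:
                ids.append(VOCAB[piece])
                i += size
                break
        else:
            raise ValueError(f"untokenizable character: {text[i]!r}")
    return ids

def normalize(text: str) -> str:
    """Collapse runs of whitespace (spaces, tabs, newlines) into one space."""
    return re.sub(r"\s+", " ", text).strip()

variants = ["hello world", "hello  world", "hello\nworld"]
for v in variants:
    print(repr(v), encode(v), encode(normalize(v)))
```

Without normalization the three strings encode to [0, 1], [0, 3, 1], and [0, 4, 2]. After `normalize`, all three encode to [0, 1]. The trade-off is that normalization also erases whitespace that can be meaningful, such as indentation in code.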
-
Researchers explore efficient transformers via attention control and algorithmic capture
Researchers are exploring methods to enhance transformer efficiency and understanding. One paper introduces Budgeted Attention Allocation, a head-gating mechanism that allows for cost-quality trade-offs. Another study d…
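The head-gating idea can be sketched as follows. The sketch assumes each head has a scalar gate score and that a fixed budget limits how many heads may run, with the top-scoring heads kept. The scores, the top-k rule, and the function names are illustrative assumptions, not the Budgeted Attention Allocation method itself.

```python
def gate_heads(head_outputs, gate_scores, budget):
    """Keep the `budget` highest-scoring heads and zero out the others.

    head_outputs: one output vector (list of floats) per attention head
    gate_scores:  one hypothetical importance score per head
    budget:       how many heads are allowed to run
    """
    if len(head_outputs) != len(gate_scores):
        raise ValueError("need exactly one gate score per head")
    ranked = sorted(range(len(gate_scores)), key=lambda h: gate_scores[h], reverse=True)
    active = set(ranked[:budget])
    gated = [
        # An inactive head contributes zeros; in a real model its attention
        # computation would be skipped, which is where the cost saving comes from.
        list(out) if h in active else [0.0] * len(out)
        for h, out in enumerate(head_outputs)
    ]
    return gated, sorted(active)

def combine(gated):
    """Sum the per-head contributions (a stand-in for the output projection)."""
    return [sum(values) for values in zip(*gated)]

heads = [[1.0, 0.0], [0.5, 0.5], [0.0, 2.0], [0.1, 0.1]]
scores = [0.9, 0.2, 0.7, 0.05]
for budget in (1, 2, 4):
    gated, active = gate_heads(heads, scores, budget)
    print(budget, active, combine(gated))
```

Raising `budget` trades cost for quality. With a budget of 2, only heads 0 and 2 contribute ([1.0, 2.0]). A budget of 4 recovers the full sum (about [1.6, 2.6]).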
-
NetNomos framework integrates logic rules into generative ML for networking
Researchers have developed NetNomos, a novel framework designed to integrate explicit network knowledge into generative machine learning models for networking tasks. This approach addresses limitations in current models…
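One simple way to picture "explicit network knowledge" is as a set of logic rules that every generated record must satisfy. The sketch applies made-up rules as a post-hoc filter on a stand-in random generator. The rules, the field names, and the rejection-filtering approach are all illustrative assumptions, and they are likely much simpler than how NetNomos integrates rules into the model.

```python
import random

# Hypothetical domain rules: (description, predicate over a flow record).
RULES = [
    ("port in valid range", lambda r: 0 <= r["dst_port"] <= 65535),
    ("ICMP carries no port", lambda r: r["proto"] != "ICMP" or r["dst_port"] == 0),
    ("at least one byte per packet", lambda r: r["bytes"] >= r["packets"]),
]

def violations(record):
    """Names of all rules the record breaks (an empty list means valid)."""
    return [name for name, rule in RULES if not rule(record)]

def toy_generator(rng):
    """Stand-in for a generative model: samples fields with no knowledge of the rules."""
    return {
        "proto": rng.choice(["TCP", "UDP", "ICMP"]),
        "dst_port": rng.choice([0, rng.randint(1, 70000)]),
        "packets": rng.randint(1, 100),
        "bytes": rng.randint(1, 5000),
    }

def sample_valid(n, seed=0, max_tries=10_000):
    """Rejection filtering: keep drawing until n records satisfy every rule."""
    rng = random.Random(seed)
    kept = []
    for _ in range(max_tries):
        if len(kept) == n:
            break
        record = toy_generator(rng)
        if not violations(record):
            kept.append(record)
    return kept

valid = sample_valid(5)
print(len(valid), valid[0])
```

Post-hoc filtering like this throws away every invalid draw. That waste is one reason to build such knowledge into the generator itself.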
-
Porting microgpt to Futhark, Part I
The author details their experience porting Andrej Karpathy's microgpt, a concise Python implementation of a GPT-2-like neural network, to the data-parallel language Futhark. The goal was to improve scalability beyond P…
-
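Galaxy General's LDA-1B model unifies diverse data, heralding a "GPT-2 moment" for embodied AI
Galaxy General LDA has launched LDA-1B, a 1.6-billion-parameter model designed to unify the use of diverse data sources in embodied AI. The model uses a novel world-action fusion approach that lets it learn from a broad range of data, including virtual simulations, real-world footage, and even noisy or unlabeled inputs. By breaking down data silos, LDA-1B aims to overcome the limitations of earlier embodied AI models and usher in an era of scalable, general-purpose robot intelligence.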
-
Latent reasoning models may offer safer, more interpretable AI
A LessWrong post explores the potential benefits of latent reasoning models (LRMs) for AI safety and interpretability. These models, which perform Chain-of-Thought (CoT) reasoning within their internal activations rathe…
-
Transformer research probes security flaws, training dynamics, and in-context learning limits
Researchers have identified vulnerabilities in the shuffling defense mechanism used to secure Transformer models during inference, demonstrating an attack that can extract model weights by aligning permuted activations.…
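Why can shuffling alone be fragile? Here is a toy illustration. It assumes the attacker sees both reference (unshuffled) activations and shuffled activations for the same inputs. A permutation only reorders hidden units, so each shuffled column can be aligned with the reference column it matches. This is a deliberately simplified sketch of the alignment idea, not the attack from the paper. A realistic attack would likely have to match noisier statistics rather than exact values.

```python
import random

def shuffle_columns(rows, perm):
    """Apply a secret permutation: output column j is input column perm[j]."""
    return [[row[p] for p in perm] for row in rows]

def recover_permutation(reference, shuffled):
    """Align each shuffled column with the identical reference column."""
    # Key each reference column (hidden unit) by its values across all inputs.
    index = {tuple(col): i for i, col in enumerate(zip(*reference))}
    return [index[tuple(col)] for col in zip(*shuffled)]

rng = random.Random(0)
n_inputs, width = 8, 6
# Hypothetical hidden activations: one row per input, one column per hidden unit.
reference = [[rng.gauss(0.0, 1.0) for _ in range(width)] for _ in range(n_inputs)]
secret = list(range(width))
rng.shuffle(secret)
shuffled = shuffle_columns(reference, secret)
recovered = recover_permutation(reference, shuffled)
print(secret, recovered)
```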
-
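Study: Removing LayerNorm from LLMs acts as an implicit regularizer whose effect depends on training data size
Researchers investigated the effects of removing layer normalization (LayerNorm) from neural network architectures, specifically in models such as GPT-2 and Llama. Their findings indicate that replacing LayerNorm with Dynamic Tanh (DyT), a learned activation-bounding mechanism, can act as an implicit regularizer whose effect depends on the training regime. In other words, DyT can improve performance in some regimes (e.g., smaller datasets) but degrade it in others (e.g., larger datasets or increased model capacity). The study indicates that activation saturation is a key factor in DyT's performance, with saturation levels varying by mod…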
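For reference, DyT is usually defined elementwise as $\mathrm{DyT}(x) = \gamma \odot \tanh(\alpha x) + \beta$, with a learnable scalar $\alpha$ and per-channel $\gamma, \beta$. Unlike LayerNorm, it computes no per-token mean or variance. The sketch contrasts the two on one vector and adds a simple saturation measure: the fraction of entries where $|\tanh(\alpha x)|$ exceeds a threshold. The 0.99 threshold and the sample values are arbitrary illustrative choices, not the study's metric.

```python
import math

def layer_norm(x, gamma, beta, eps=1e-5):
    """Standard LayerNorm over one vector: subtract the mean, divide by the std."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [g * (v - mean) / math.sqrt(var + eps) + b for v, g, b in zip(x, gamma, beta)]

def dyt(x, alpha, gamma, beta):
    """Dynamic Tanh: elementwise gamma * tanh(alpha * x) + beta, no statistics."""
    return [g * math.tanh(alpha * v) + b for v, g, b in zip(x, gamma, beta)]

def saturation_fraction(x, alpha, threshold=0.99):
    """Share of entries where tanh(alpha * x) sits in its flat, saturated region."""
    return sum(abs(math.tanh(alpha * v)) > threshold for v in x) / len(x)

x = [0.1, -0.4, 2.0, 8.0, -12.0]
ones, zeros = [1.0] * len(x), [0.0] * len(x)
print(layer_norm(x, ones, zeros))
print(dyt(x, 0.5, ones, zeros))
print(saturation_fraction(x, alpha=0.5), saturation_fraction(x, alpha=0.1))
```

With $\alpha = 0.5$, two of the five entries (8.0 and -12.0) are saturated, so `saturation_fraction` returns 0.4. With $\alpha = 0.1$, none are. In the saturated region the derivative $1 - \tanh^2(\alpha x)$ is close to zero. That is one intuitive reason saturation could shape how DyT behaves during training.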
-
Talkie-1930: New 13B AI model trained on pre-1931 text explores historical knowledge
A new project called Talkie has released a 13-billion parameter language model trained exclusively on English text from before 1931. This "vintage" model aims to explore AI's ability to predict the future and generate n…
-
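New GEM activation function offers a smoother, rational alternative to ReLU
Researchers have introduced Geometric Monomial (GEM), a new family of activation functions designed for deep neural networks. These functions use purely rational arithmetic and provide $C^{2N}$ smoothness, aiming to overcome the limitations of the standard ReLU. Experiments show that GEM variants match or even exceed established functions such as GELU across benchmarks including CIFAR-10, CIFAR-100, MNIST, GPT-2, and BERT-small.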
-
Researchers find variance doesn't equal importance in transformer compression
Researchers have conducted a systematic study of transformer compression, spanning more than 40 experiments on GPT-2 and Mistral 7B models. Their findings indicate that variance in activation directions does not correla…
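Here is a toy example of how high variance and high importance can come apart. It assumes hypothetical 2-D activations: coordinate 0 has large variance, coordinate 1 has small variance, and a downstream readout depends only on coordinate 1. Axis-aligned directions stand in for principal directions to keep the code dependency-free. Keeping only the top-variance direction, as a variance-based compressor would, preserves almost all of the variance yet destroys the output.

```python
import random

def variance(values):
    """Population variance of a list of floats."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def readout(a):
    """Hypothetical downstream computation that only reads coordinate 1."""
    return 50.0 * a[1]

def keep_only(a, dim):
    """Project onto a single coordinate axis (1-D 'compression')."""
    return tuple(v if i == dim else 0.0 for i, v in enumerate(a))

def output_error(acts, dim):
    """Mean squared change in the readout after keeping only `dim`."""
    return sum((readout(a) - readout(keep_only(a, dim))) ** 2 for a in acts) / len(acts)

rng = random.Random(0)
# Coordinate 0 varies a lot (std 10); coordinate 1 barely varies (std 0.1).
acts = [(rng.gauss(0.0, 10.0), rng.gauss(0.0, 0.1)) for _ in range(1000)]

var_per_dim = [variance([a[d] for a in acts]) for d in range(2)]
top_var_dim = max(range(2), key=lambda d: var_per_dim[d])
print("variance per dim:", var_per_dim)
print("keep top-variance dim ->", output_error(acts, top_var_dim))
print("keep low-variance dim ->", output_error(acts, 1 - top_var_dim))
```

Here importance is measured by how much the output changes when a direction is discarded, not by how much the activations vary along it.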
-
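EleutherAI releases open-source tool for interpreting AI model features
EleutherAI has released an open-source library for automatically interpreting features in sparse autoencoders, a method for decomposing model activations. The tool uses large language models such as Llama 3.1 and Claude 3.5 Sonnet to generate natural-language explanations of these features, greatly reducing cost and effort compared with earlier manual approaches. The library aims to make it easier for the community to study these interpretable features.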
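Automated interpretation pipelines of this kind generally show an explainer LLM the text where a feature fires most strongly and ask it to name the pattern. The sketch assembles such a prompt from hypothetical per-token activations. The `Context` class, the prompt wording, the `<<...>>` highlighting, and the top-k rule are illustrative assumptions rather than the API of EleutherAI's library. The actual LLM call is omitted.

```python
from dataclasses import dataclass

@dataclass
class Context:
    tokens: list[str]         # tokens of one text window
    activations: list[float]  # this feature's activation on each token

def top_k_contexts(contexts, k):
    """Rank windows by their peak activation for the feature."""
    return sorted(contexts, key=lambda c: max(c.activations), reverse=True)[:k]

def highlight(ctx, threshold):
    """Wrap tokens whose activation exceeds `threshold` in << >>."""
    return "".join(
        f"<<{tok}>>" if act > threshold else tok
        for tok, act in zip(ctx.tokens, ctx.activations)
    )

def build_prompt(contexts, k=2, threshold=0.5):
    """Assemble a (hypothetical) explanation prompt for an explainer LLM."""
    lines = [
        "Here are text snippets where a model feature activates.",
        "Marked tokens (<<like this>>) activate it most. "
        "In one short phrase, what does the feature detect?",
        "",
    ]
    for i, ctx in enumerate(top_k_contexts(contexts, k), start=1):
        lines.append(f"{i}. {highlight(ctx, threshold)}")
    return "\n".join(lines)

examples = [
    Context(["The", " cat", " sat"], [0.0, 0.9, 0.1]),
    Context(["A", " dog", " ran"], [0.0, 0.7, 0.0]),
    Context(["It", " rained", " today"], [0.1, 0.0, 0.2]),
]
print(build_prompt(examples))
```

The weakly activating third snippet is left out, and the two strongest snippets appear with " cat" and " dog" marked.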
-
OpenAI launches affordable GPT-4o mini and open-weight gpt-oss models
OpenAI has released GPT-4o mini, a new, highly cost-efficient small model designed to broaden AI accessibility and application development. This model demonstrates superior performance on benchmarks like MMLU, MGSM, and…
-
OpenAI explores weak-to-strong generalization for AI alignment
OpenAI has introduced a new research direction called weak-to-strong generalization, aiming to address the challenge of aligning future superintelligent AI systems with human supervision. Their initial experiments show …
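The weak-to-strong work reports results with a metric called performance gap recovered (PGR). PGR is the share of the gap between the weak supervisor and a strong model trained on ground truth (the "strong ceiling") that the weakly supervised strong model closes: $\mathrm{PGR} = \frac{\text{weak-to-strong} - \text{weak}}{\text{strong ceiling} - \text{weak}}$. Here is a minimal helper, with made-up accuracies:

```python
def performance_gap_recovered(weak, weak_to_strong, strong_ceiling):
    """PGR = (weak_to_strong - weak) / (strong_ceiling - weak).

    1.0 means the weakly supervised strong model matches the strong ceiling;
    0.0 means it does no better than its weak supervisor.
    """
    gap = strong_ceiling - weak
    if gap == 0:
        raise ValueError("strong ceiling equals weak performance; PGR is undefined")
    return (weak_to_strong - weak) / gap

# Hypothetical accuracies, for illustration only (not results from the paper).
print(performance_gap_recovered(weak=0.60, weak_to_strong=0.70, strong_ceiling=0.80))
```

The example prints a value very close to 0.5: the strong student closed about half of the gap.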
-
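Replit and Weights & Biases host machine learning hackathon and award prizes
Replit and Weights & Biases recently wrapped up their first machine learning hackathon, held from February 4 to 11, 2023. Participants around the world used Replit's platform and Weights & Biases' tools to build and fine-tune machine learning models. More than 500,000 Cycles in total prizes went to standout projects, including one that used GPT-3 to scale up human effort, one that used a fine-tuned GPT-2 to generate synthetic Zen koans, and one that implemented Q-Learning.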