nanoGPT
PulseAugur coverage of nanoGPT — every cluster mentioning nanoGPT across labs, papers, and developer communities, ranked by signal.
- 2026-05-15 research_milestone AI agents achieved new records in the nanoGPT training speedrun benchmark, surpassing human performance.
-
BeamGPT operator enhances language model training efficiency
A novel operator called BeamGPT has been developed, which significantly improves learning curves in language models by identifying sequence structures that standard attention mechanisms miss. This operator, when integra…
-
Developer implements GPTQ quantization from scratch, achieving minimal performance loss
A developer detailed their process of implementing the GPTQ quantization method from scratch on a nanoGPT model. This technique reduces model size and speeds up inference by lowering the precision of weights, but unlike…
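The summary only says that GPTQ lowers weight precision. The published GPTQ idea is to round one weight column at a time and push each column's rounding error onto the columns not yet quantized, using the inverse Hessian of the layer's calibration inputs. Below is a minimal NumPy sketch of that loop, not the developer's code. The per-row symmetric scale is an assumption, and the helper names `quantize_rtn` and `gptq_quantize` are hypothetical. The sketch also omits the blocked ("lazy batch") updates and Cholesky-based inverse that the paper uses for speed and stability.

```python
import numpy as np

def quantize_rtn(w, scale, bits=4):
    """Symmetric round-to-nearest onto the grid {-2^(b-1), ..., 2^(b-1)-1} * scale."""
    qmax = 2 ** (bits - 1) - 1
    return np.clip(np.round(w / scale), -qmax - 1, qmax) * scale

def gptq_quantize(W, X, bits=4, damp=0.01):
    """Hypothetical GPTQ-style quantizer for W (out_features x in_features).

    X holds calibration inputs (in_features x n_samples). After each column is
    rounded, its error is spread over the remaining columns via H^-1 with
    H = 2 X X^T, so W @ X is preserved better than with plain rounding.
    """
    W = W.astype(np.float64).copy()
    n_cols = W.shape[1]
    # Assumption: one scale per output row, fixed from the original weights.
    scale = np.abs(W).max(axis=1) / (2 ** (bits - 1) - 1) + 1e-12
    H = 2.0 * X @ X.T
    H += damp * np.mean(np.diag(H)) * np.eye(n_cols)  # dampening for stability
    Hinv = np.linalg.inv(H)
    Q = np.zeros_like(W)
    for j in range(n_cols):
        Q[:, j] = quantize_rtn(W[:, j], scale, bits)
        err = (W[:, j] - Q[:, j]) / Hinv[j, j]
        # Compensate the not-yet-quantized columns (optimal-brain-surgeon update).
        W[:, j + 1:] -= np.outer(err, Hinv[j, j + 1:])
        # Drop column j from the inverse Hessian (one Gaussian-elimination step).
        Hinv -= np.outer(Hinv[:, j], Hinv[j, :]) / Hinv[j, j]
    return Q
```

When the calibration inputs are uncorrelated, H is diagonal and the loop reduces to plain rounding. The compensation pays off when input features are correlated, because rounding errors on one column can be absorbed by the others.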
-
New AngularMuown optimizer improves Transformer pre-training
Researchers have introduced AngularMuown, a novel optimization algorithm that implicitly performs angular step-size decay, building upon the principles of matrix-aware optimizers like Muon and Muown. This new method exp…
-
Hybrid LLM-GNN Model Enhances Quantum Circuit Optimization
A developer has created a hybrid model combining Large Language Models (LLMs) and Graph Neural Networks (GNNs) to improve the efficiency of the ADAPT-QAOA algorithm for optimizing quantum circuits. This approach aims to…
-
Student proposes Silia Transformer for parameter-efficient small models
A student researcher has introduced "Silia," a novel Transformer architecture designed for parameter efficiency in models under 10 million parameters. The architecture aims to combine the dynamic mixing of attention mec…
-
New 'Muon' optimization technique flattens matrix gradients
A new research paper introduces "Muon," an optimization technique that replaces matrix gradients with their polar factors. This method maintains singular directions but flattens the update spectrum, which the authors su…
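The polar factor of a gradient matrix G with SVD G = U Σ Vᵀ is U Vᵀ. It keeps G's singular directions but sets every singular value to 1, which is the "flattened spectrum" described above. Below is a minimal NumPy illustration, not the paper's or Muon's reference code. It computes the polar factor exactly with an SVD and approximately with the classic cubic Newton-Schulz iteration. `muon_like_step` is a hypothetical name. It omits the momentum and tuned iteration coefficients that published Muon implementations use.

```python
import numpy as np

def polar_factor_svd(G: np.ndarray) -> np.ndarray:
    """Exact polar factor U @ Vt from the thin SVD G = U diag(S) Vt."""
    U, _, Vt = np.linalg.svd(G, full_matrices=False)
    return U @ Vt

def polar_factor_newton_schulz(G: np.ndarray, steps: int = 30) -> np.ndarray:
    """Approximate the polar factor with the cubic Newton-Schulz iteration.

    X <- 1.5 X - 0.5 X X^T X drives every singular value toward 1 as long as
    they start in (0, sqrt(3)); dividing by the Frobenius norm ensures all are <= 1.
    """
    X = G / (np.linalg.norm(G) + 1e-12)
    for _ in range(steps):
        X = 1.5 * X - 0.5 * X @ X.T @ X
    return X

def muon_like_step(W: np.ndarray, G: np.ndarray, lr: float = 0.02) -> np.ndarray:
    """Hypothetical update: step along the orthogonalized gradient (no momentum)."""
    return W - lr * polar_factor_newton_schulz(G)
```

The iteration uses only matrix multiplications, which is why orthogonalizing optimizers favor it over an SVD on GPUs.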
-
Kronecker Embeddings slash language model parameters, boost performance
Researchers have developed Kronecker Embeddings, a novel method for representing tokens in language models that significantly reduces the number of trainable parameters. This approach replaces large embedding tables wit…
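The summary is cut off before it describes the replacement for the embedding table. As an assumption suggested only by the name, the sketch below shows one generic Kronecker-factored embedding. A vocabulary of V = V1·V2 tokens with width d = d1·d2 is represented implicitly as A ⊗ B, with A of shape (V1, d1) and B of shape (V2, d2). Token t maps to row kron(A[t // V2], B[t % V2]). The class `KroneckerEmbedding` is hypothetical and may differ from the researchers' construction.

```python
import numpy as np

class KroneckerEmbedding:
    """Hypothetical embedding whose (V1*V2) x (d1*d2) table is implicitly kron(A, B)."""

    def __init__(self, V1: int, V2: int, d1: int, d2: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.A = rng.standard_normal((V1, d1)) / np.sqrt(d1)
        self.B = rng.standard_normal((V2, d2)) / np.sqrt(d2)
        self.V2 = V2

    def num_params(self) -> int:
        # Only the two small factors are trainable.
        return self.A.size + self.B.size

    def __call__(self, token_ids) -> np.ndarray:
        ids = np.asarray(token_ids)
        i, j = np.divmod(ids, self.V2)  # row of A, row of B
        # Row i*V2 + j of kron(A, B) is the flattened outer product of A[i] and B[j].
        rows = self.A[i][..., :, None] * self.B[j][..., None, :]
        return rows.reshape(*ids.shape, -1)
```

For a 65,536-token vocabulary at width 768 (256·256 tokens, 32·24 dimensions), a dense table has 50,331,648 parameters, while the two factors have 256·32 + 256·24 = 14,336. A single Kronecker product is a strong structural constraint, so practical methods may add further terms or layers.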
-
Community project proposed for training LLMs on 8GB VRAM consumer hardware
A user on r/LocalLLaMA is proposing a community project to train a large language model from scratch using only consumer-grade hardware, specifically targeting an 8GB VRAM limit. The goal is to create an accessible, fre…
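An 8GB budget can be sanity-checked with back-of-envelope arithmetic, because training keeps weights, gradients, and optimizer state resident at once. The sketch below assumes commonly cited per-parameter byte counts and treats 8GB as 8 GiB. It ignores activations, the CUDA context, and fragmentation unless a fraction of memory is reserved explicitly. `max_params` and `SETUPS` are hypothetical names.

```python
def max_params(vram_gib: float, bytes_per_param: int, activation_fraction: float = 0.0) -> int:
    """Upper bound on trainable parameters if weights, grads and optimizer
    state must all fit in VRAM. activation_fraction reserves a share of memory
    for activations (a rough, assumed knob)."""
    usable = vram_gib * 2**30 * (1.0 - activation_fraction)
    return int(usable // bytes_per_param)

# Assumed bytes per parameter: weights + gradients + optimizer state.
SETUPS = {
    "fp32 SGD with momentum": 4 + 4 + 4,
    "fp32 Adam (m, v)": 4 + 4 + 4 + 4,
    "mixed precision Adam (fp16 weights/grads, fp32 master copy, m, v)": 2 + 2 + 4 + 4 + 4,
}

for name, b in SETUPS.items():
    print(f"{name}: at most ~{max_params(8, b) / 1e6:.0f}M parameters in 8 GiB")
```

At 16 bytes per parameter, 8 GiB holds at most about 537M parameters before any activation memory. This is why such projects usually target models well below that size.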
-
AI agents set new records in nanoGPT training speedrun
Prime Intellect utilized advanced AI models, specifically Codex (based on GPT-5.5) and Claude Code (based on Opus 4.7), to autonomously optimize the nanoGPT training process. The AI agents conducted approximately 10,000…
-
Tilde Research launches Aurora optimizer to fix neuron death in Muon
Tilde Research has introduced Aurora, a novel optimizer designed to train neural networks more effectively. Aurora addresses a critical issue in the popular Muon optimizer where a significant number of neurons become pe…
-
Aurora optimizer boosts neural network training efficiency
Researchers have introduced Aurora, a new optimizer designed to improve the training of large neural networks, particularly those with rectangular matrices. Aurora addresses issues like neuron death in MLP layers that c…
-
Muon optimizer fails on convex Lipschitz functions, study finds
A new paper challenges the theoretical underpinnings of the Muon optimization algorithm, demonstrating that it does not converge on convex Lipschitz functions. The research suggests that Muon's practical success likely …