Pythia
PulseAugur coverage of Pythia — every cluster mentioning Pythia across labs, papers, and developer communities, ranked by signal.
-
AI models forget learned rules mid-training, research finds
A new research paper introduces the concept of "natural ungrokking," describing how language models can learn a rule during pretraining, only to forget it later without any change in the loss curve. The study found that…
-
Keyless Attention mechanism halves KV cache and boosts transformer efficiency
Researchers have introduced Keyless Attention, a novel attention mechanism for transformers that eliminates the key projection entirely, operating solely on queries and values. This approach results in a Value-Only Cach…
-
New benchmarks tackle privacy risks in large language models
Researchers have developed new methods to evaluate membership inference attacks (MIAs) against large language models (LLMs), particularly focusing on audio and text modalities. The first study introduces a systematic ev…
-
AI transparency debate: 'Open weights' insufficient, requires data and value insight
The article "Open Weights, Closed Minds: What AI Transparency Actually Requires" argues that releasing only model weights, a practice termed "open weights," is insufficient for true AI transparency. While this allows us…
-
LLM function-vector heads split into 'writers' and 'cancellers'
Researchers have identified two distinct populations within function-vector (FV) heads in large language models, challenging the assumption that these heads are a homogeneous group. By employing a sign-preserving criter…
-
New framework predicts side effects of AI model steering
Researchers have developed a new framework to predict side effects of using sparse autoencoders (SAEs) to steer language models. This method analyzes feature statistics before intervention to forecast issues like incons…
-
LLMs crystallize factual knowledge in late layers, study finds
Researchers have identified a phenomenon called "Late Crystallization" in large language models, where factual knowledge primarily emerges in the final layers rather than gradually across all layers. This finding, obser…
-
Study: Language model circuits vary by architecture
A new study published on arXiv investigates how different language model architectures implement similar task functionalities. Researchers found that the specific circuits responsible for task execution vary significant…
-
New metric predicts language processing costs beyond surprisal
Researchers have introduced a new metric called trajectory extrapolation error to better predict human language processing costs. This metric analyzes the trajectory of hidden states in transformer language models, goin…
-
AI circuit discovery methods may mistake structure for function
Researchers have identified a phenomenon called "phantom specialization" in AI models, where variations in input statistics can lead to structurally different circuits that perform the same function. This suggests that …
-
AI benchmark auditing methods fail under real-world conditions
A new research paper highlights significant issues with current methods for detecting benchmark contamination in large language models. The study, which evaluated 27 models including frontier industry ones, found that c…
-
Language models fail to transfer reasoning states via direct activation injection
Researchers have investigated whether one language model can directly transfer its internal reasoning states to another model during inference. While a linear translation layer successfully mapped hidden states between …
-
New BLISS method speeds up LLM pretraining with efficient data selection
Researchers have developed BLISS, a novel method for selecting data to pretrain large language models more efficiently. Unlike previous methods, BLISS does not require external pretrained models and accounts for the lon…
-
New research explores advanced compression techniques for AI models
Researchers are exploring novel methods for compressing large models and datasets to improve efficiency. Papers discuss unifying dataset pruning and distillation, bootstrapped tokenization for image generation, and acti…
-
AI models learn same features but in rotated bases, researchers find
Researchers have discovered that while independently trained transformer models of the same architecture learn similar features, their internal activation representations are rotated by a random amount. This "polymorphi…
-
Language models improve via compatible self-generated data
A new research paper explores the concept of "latent capability resurfacing" in language models, suggesting that self-generated data can improve a model's performance only if it's compatible with the model's existing ca…
-
New method combats data laundering in LLM training
A new research paper introduces Synthesis Data Reversion (SDR), a method designed to combat data laundering in Large Language Model (LLM) training. Data laundering involves transforming proprietary data to obscure its o…
-
LLMs can learn synthetic dishonesty, research finds
Researchers have investigated how Large Language Models (LLMs) can be trained to produce deceptive outputs, even when their internal representations remain honest. Studies using models like Pythia, Gemma, Qwen, and Llam…
-
New methods unveiled for interpreting transformer attention circuits
Two new research papers propose methods for interpreting the internal workings of transformer models, particularly focusing on their attention mechanisms. The first paper introduces a generic interpretation approach for…
-
New theory predicts concept emergence in neural networks
Researchers have developed a bifurcation theory to better understand how neural networks develop structured representations during training. This theory introduces a new, label-free metric called the beta/beta_c ratio, …