PulseAugur

Pythia

PulseAugur coverage of Pythia — every cluster mentioning Pythia across labs, papers, and developer communities, ranked by signal.

Total · 30 days: 11 (90 days: 11)
Releases · 30 days: 0 (90 days: 0)
Papers · 30 days: 11 (90 days: 11)
Tier distribution · 90 days
Sentiment · 30 days: 4 days with sentiment data

Recent · Page 1 of 1 · 11 items
  1. RESEARCH · CL_48758 ·

    New Unpack method deciphers transformer component interactions

    Researchers have developed a new method called Unpack to analyze the internal workings of transformer models. This technique uses backward recursion to trace how different components, like attention and MLP layers, cont…

  2. RESEARCH · CL_47622 ·

    New theory models LLM training as noisy channel communication

    Researchers have introduced the Shannon Scaling Law, a new theoretical framework for understanding Large Language Model (LLM) training. This model views LLM training as information transmission through a noisy channel, …

  3. RESEARCH · CL_41829 ·

    Self-training restructures language models, research finds

    A new research paper challenges the common understanding of self-training in language models, suggesting it restructures rather than flattens language. The study found that while surface-level linguistic features like d…

  4. TOOL · CL_36526 ·

    Transformer layer pruning tests yield divergent results

    Researchers have identified that the definition of 'layer equivalence' in transformer models is not a fixed property but depends heavily on the testing methodology. Two distinct tests, 'replacement' and 'interchange', c…

  5. RESEARCH · CL_22182 ·

    Language model surprisal may not predict metaphor novelty as thought

    A new paper published on arXiv suggests that language model surprisal, often used as a proxy for contextual predictability and metaphor novelty, may be misleading. The research indicates that lexical frequency is a stro…

  6. TOOL · CL_26990 ·

    New AEN-SAE architecture tackles feature starvation in LLM interpretability

    Researchers have introduced Adaptive Elastic Net Sparse Autoencoders (AEN-SAEs) to address feature starvation in sparse autoencoders used for interpreting LLM representations. Traditional methods struggle with dead neur…

  7. RESEARCH · CL_18265 ·

    Researchers find Transformers know counts but struggle to output them

    A new paper identifies a specific bottleneck in Transformer models that hinders their ability to perform counting tasks. Researchers found that while models like Pythia, Qwen3, and Mistral store count information accura…

  8. RESEARCH · CL_15547 ·

    HeadQ: Model-Visible Distortion and Score-Space Correction for KV-Cache Quantization

    Researchers are developing several novel methods to optimize the Key-Value (KV) cache in large language models, which is a major bottleneck for long-context processing. These approaches include training models to inhere…

  9. RESEARCH · CL_09277 ·

    AI model evaluations are becoming a costly bottleneck, surpassing training expenses

    AI model evaluations are becoming prohibitively expensive, with recent benchmarks costing tens of thousands of dollars and consuming thousands of GPU hours. This high cost is particularly pronounced for agent-based eval…

  10. RESEARCH · CL_08642 ·

    Transformer architecture significantly impacts model error detection capabilities

    A new paper reveals that a transformer model's architecture significantly impacts its ability to signal decision quality through internal activations, a property termed 'observability.' This observability is crucial for…

  11. RESEARCH · CL_06772 ·

    Transformer research probes security flaws, training dynamics, and in-context learning limits

    Researchers have identified vulnerabilities in the shuffling defense mechanism used to secure Transformer models during inference, demonstrating an attack that can extract model weights by aligning permuted activations.…