PulseAugur
English(EN) Trajectory Geometry of Transformer Representations Across Layers

Exploring Transformer Geometry: Module-Specific Optimization and Representation Trajectories

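Two new research papers examine the internal geometry of transformer models, focusing on how representations evolve across layers. One paper studies module-specific weight-space geometry for optimization and finds that assigning different manifold constraints to the attention and MLP layers in GPT-2 can improve performance and stability. The other analyzes the trajectory geometry of representations, using metrics such as length, curvature, and convergence to understand how semantically related prompts evolve; it reveals distinct processing stages and relates curvature to computational complexity in GPT-2, TinyLlama, and Qwen2.5.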
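To make the trajectory metrics named in the summary concrete, here is a minimal sketch in Python. It assumes that a prompt's trajectory is a sequence of per-layer hidden-state vectors, that length is the sum of Euclidean step sizes between consecutive layers, that curvature is the turning angle between consecutive steps, and that convergence is tracked as the per-layer distance between two prompts' trajectories. These definitions and the function names `trajectory_metrics` and `layerwise_distance` are illustrative assumptions; the summary does not state the paper's exact formulas.

```python
import numpy as np

def trajectory_metrics(states):
    """Path length and turning angles of one layer-wise trajectory.

    states: array of shape (num_layers + 1, hidden_dim), one vector per layer
    (assumed layout; hypothetical helper, not taken from the paper).
    """
    states = np.asarray(states, dtype=float)
    steps = np.diff(states, axis=0)              # displacement between consecutive layers
    step_norms = np.linalg.norm(steps, axis=1)   # size of each layer-to-layer step
    length = step_norms.sum()                    # total path length

    # Turning angle between consecutive steps as a discrete curvature proxy.
    dots = np.sum(steps[:-1] * steps[1:], axis=1)
    denom = np.maximum(step_norms[:-1] * step_norms[1:], 1e-12)  # avoid division by zero
    angles = np.arccos(np.clip(dots / denom, -1.0, 1.0))
    return length, angles

def layerwise_distance(states_a, states_b):
    """Euclidean distance between two trajectories at each layer.

    A decreasing sequence suggests the two prompts converge with depth.
    """
    a = np.asarray(states_a, dtype=float)
    b = np.asarray(states_b, dtype=float)
    return np.linalg.norm(a - b, axis=1)

if __name__ == "__main__":
    # Toy 2-D trajectory with one right-angle turn.
    length, angles = trajectory_metrics([[0, 0], [1, 0], [1, 1]])
    print(length, angles)
```

In practice the per-layer vectors would come from a model's hidden states (for example, mean-pooled over tokens), which is a further assumption outside this sketch.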

Impact: Offers new insights into transformer architecture and optimization, which may lead to more efficient and more stable model training.

Ranking rationale: This cluster contains two academic papers posted on arXiv that detail pioneering research into the internals of transformer models.


AI-generated summary · Google Gemini · from 6 sources.

Sources [6]

  1. arXiv cs.AI TIER_1 English(EN) · Kirato Yoshihara ·

    Different Layers, Different Manifolds: Module-Wise Weight-Space Geometry in Transformer Optimization

    arXiv:2606.13276v1 Announce Type: cross Abstract: Weight-space geometry plays a central role in neural network optimization, yet manifold constraints are often applied uniformly across all weight matrices. In this work, we ask whether different transformer modules prefer differen…

  2. arXiv cs.AI TIER_1 English(EN) · Kirato Yoshihara ·

    Different Layers, Different Manifolds: Module-Wise Weight-Space Geometry in Transformer Optimization

    Weight-space geometry plays a central role in neural network optimization, yet manifold constraints are often applied uniformly across all weight matrices. In this work, we ask whether different transformer modules prefer different manifold geometries. We study Manifold Muon for …

  3. Hugging Face Daily Papers TIER_1 English(EN) ·

    Different Layers, Different Manifolds: Module-Wise Weight-Space Geometry in Transformer Optimization

    Weight-space geometry plays a central role in neural network optimization, yet manifold constraints are often applied uniformly across all weight matrices. In this work, we ask whether different transformer modules prefer different manifold geometries. We study Manifold Muon for …

  4. arXiv cs.LG TIER_1 English(EN) · Vishal Pandey, Gopal Singh ·

    Trajectory Geometry of Transformer Representations Across Layers

    arXiv:2606.09287v1 Announce Type: new Abstract: Understanding how transformer representations evolve across layers, not merely what they encode, remains an open problem in mechanistic interpretability. We recast the transformer forward pass as a discrete population trajectory thr…

  5. arXiv cs.LG TIER_1 English(EN) · Gopal Singh ·

    Trajectory Geometry of Transformer Representations Across Layers

    Understanding how transformer representations evolve across layers, not merely what they encode, remains an open problem in mechanistic interpretability. We recast the transformer forward pass as a discrete population trajectory through a high-dimensional representation manifold,…

  6. arXiv stat.ML TIER_1 English(EN) · Tiexin Ding ·

    A Two-Parameter Weibull Framework for Diagnosing Transformer Weight Distributions

    arXiv:2605.18898v1 Announce Type: cross Abstract: We apply the Weibull distribution -- a two-parameter family from extreme-value theory -- as a diagnostic framework for element-wise weight magnitude distributions in transformers. At initialization, i.i.d. Gaussian weights give |w…