Trajectory Geometry of Transformer Representations Across Layers
Two new research papers explore the internal geometry of transformer models. The first investigates module-specific weight-space geometries for optimization and finds that assigning different manifold constraints to the attention and MLP layers of GPT-2 improves performance and training stability. The second analyzes the trajectory geometry of representations across layers, using metrics such as length, curvature, and convergence to track how semantically related prompts evolve. It reports distinct phases of processing and a correlation between curvature and computational complexity across GPT-2, TinyLlama, and Qwen2.5.
AI IMPACT: Provides new insights into transformer architecture and optimization, potentially leading to more efficient and stable model training.
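
To make the trajectory metrics mentioned above concrete, here is a minimal sketch of one way length, curvature, and convergence could be computed from per-layer hidden states. The summary does not give the paper's exact definitions, so everything here is an assumption: the function names `trajectory_metrics` and `convergence` are hypothetical, length is taken as the summed Euclidean step size between consecutive layers, curvature as the mean turning angle between consecutive steps, and convergence as the ratio of final to initial distance between two prompts' trajectories.

```python
import numpy as np


def trajectory_metrics(states):
    """Return (length, curvature) for one trajectory.

    states: array of shape (num_layers, dim), one hidden-state vector per layer.
    These definitions are illustrative assumptions, not the paper's.
    """
    states = np.asarray(states, dtype=float)
    steps = np.diff(states, axis=0)               # displacement between consecutive layers
    step_norms = np.linalg.norm(steps, axis=1)
    length = float(step_norms.sum())              # total path length

    # Mean turning angle (radians) between consecutive displacement vectors.
    unit = steps / np.maximum(step_norms[:, None], 1e-12)
    cosines = np.clip(np.sum(unit[:-1] * unit[1:], axis=1), -1.0, 1.0)
    curvature = float(np.arccos(cosines).mean()) if cosines.size else 0.0
    return length, curvature


def convergence(states_a, states_b):
    """Ratio of final to initial distance between two trajectories.

    Values below 1 mean the two trajectories end closer than they started.
    """
    a = np.asarray(states_a, dtype=float)
    b = np.asarray(states_b, dtype=float)
    dist = np.linalg.norm(a - b, axis=1)          # per-layer distance
    return float(dist[-1] / max(dist[0], 1e-12))
```

In practice the `states` arrays would come from a model's per-layer hidden states for a chosen token position; how the paper selects and normalizes those vectors is not stated in this summary.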