This paper analyzes the phenomenon of "suspicious alignment" in stochastic gradient descent (SGD) on ill-conditioned optimization problems. The study focuses on how step size selection influences the alignment of gradient updates with the dominant subspace. The authors propose a step-size condition that separates alignment-decreasing from alignment-increasing regimes, and demonstrate that, under certain conditions, projecting SGD updates onto the dominant subspace can paradoxically increase the loss.
AI impact: Provides a theoretical understanding of SGD behavior, potentially informing the development of more robust optimization techniques for AI models.
Ranking rationale: This is a research paper published on arXiv detailing a theoretical analysis of an optimization algorithm.
AI-generated summary · Google Gemini · from 1 source.