PulseAugur
research · [2 sources]

New research quantifies hyperparameter transfer in LLMs

A new paper introduces a framework for quantifying hyperparameter transfer, a technique that is crucial for scaling up large language model (LLM) training. The authors find that the main benefit of the Maximal Update parameterization (μP) over standard parameterization comes from maximizing the embedding layer's learning rate. This change smooths training and improves hyperparameter transfer. Weight decay shows mixed results for both scaling-law fits and extrapolation robustness.

Summary written by gemini-2.5-flash-lite from 2 sources.

IMPACT Identifies key factors for efficient LLM scaling, potentially improving training stability and performance.

RANK_REASON The cluster contains an academic paper detailing novel research findings on LLM training techniques.


COVERAGE [2]

  1. arXiv stat.ML TIER_1 · Dayal Singh Kalra, Maissam Barkeshli

    Quantifying Hyperparameter Transfer and the Importance of Embedding Layer Learning Rate

    arXiv:2605.21486v1. Abstract: Hyperparameter transfer allows extrapolating optimal optimization hyperparameters from small to large scales, making it critical for training large language models (LLMs). This is done either by fitting a scaling law to the hyperp…

  2. arXiv stat.ML TIER_1 · Maissam Barkeshli

    Quantifying Hyperparameter Transfer and the Importance of Embedding Layer Learning Rate

    Hyperparameter transfer allows extrapolating optimal optimization hyperparameters from small to large scales, making it critical for training large language models (LLMs). This is done either by fitting a scaling law to the hyperparameters or by a judicious choice of parameteriza…
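
The abstract mentions fitting a scaling law to the hyperparameters as one route to transfer. As a rough illustration of that idea only, the sketch below fits a power law to the best learning rate found at several small model widths and extrapolates it to a larger width. It is not the paper's framework: the function names, the choice of width as the scale variable, and all numeric values are assumptions made up for this example.

```python
import numpy as np


def fit_power_law(scales, optimal_lrs):
    """Fit lr_opt ~= a * scale**b by least squares in log-log space.

    Hypothetical helper for illustration; not taken from the paper.
    """
    log_scales = np.log(np.asarray(scales, dtype=float))
    log_lrs = np.log(np.asarray(optimal_lrs, dtype=float))
    # A degree-1 polynomial fit in log space yields (slope, intercept).
    slope, intercept = np.polyfit(log_scales, log_lrs, deg=1)
    return float(np.exp(intercept)), float(slope)


def extrapolate_lr(a, b, scale):
    """Predict the optimal learning rate at a new scale from the fit."""
    return a * scale ** b


# Synthetic placeholder sweep results (NOT data from the paper):
# the best learning rate is constructed to follow 0.5 / width exactly.
widths = np.array([256.0, 512.0, 1024.0, 2048.0])
best_lrs = 0.5 * widths ** -1.0

a, b = fit_power_law(widths, best_lrs)
predicted_lr = extrapolate_lr(a, b, 8192.0)
print(f"fitted prefactor a={a:.4f}, exponent b={b:.4f}")
print(f"extrapolated learning rate at width 8192: {predicted_lr:.3e}")
```

With real sweep data the points would be noisy, so how well the fit holds up under extrapolation becomes the practical question. A parameterization such as μP instead aims for a fitted exponent near zero, so that the small-scale optimum can be reused directly.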