Brief · PulseAugur

RESEARCH · Hugging Face Daily Papers English(EN) · 5d · [4 sources]

Quantifying Hyperparameter Transfer and the Importance of Embedding Layer Learning Rate

Researchers have developed new methods for hyperparameter transfer, enabling more efficient scaling of large neural networks. One paper introduces a parameterization justified by dynamical mean-field theory, allowing reliable hyperparameter transfer across models ranging from 51 million to over 2 billion parameters. Another study quantifies hyperparameter transfer and highlights the critical role of the embedding layer's learning rate, suggesting that maximizing it can significantly improve training stability and performance, particularly when using the AdamW optimizer. AI

IMPACT New parameterization and optimization techniques could significantly reduce the cost and complexity of training large-scale AI models.

Dayal Singh Kalra
AdamW
Maximal Update
Standard Parameterization
Mixture-of-Experts
Tianze Jiang
large language models
hyperparameter transfer
embedding layer