PulseAugur

New 'three-term law' refines AI model scaling by factoring in batch size

A proposed scaling law, called the 'three-term law,' accounts for model size and training data, with the data term split explicitly into training steps and batch size. Fitted to a large set of training runs, the law accurately predicted the optimal batch size. According to the research, the three-term law can be fitted robustly with fewer training runs than previous methods, and it can be used to derive scaling laws for suboptimal batch sizes that agree with existing empirical observations on critical batch sizes.
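
The split of training data into steps and batch size rests on a simple identity: tokens processed = training steps × batch size, with batch size measured in tokens. As a minimal illustration of that identity only (not the paper's fitted law, whose functional form is not given here), the sketch below lists the ways a fixed token budget could be divided. The function name `token_allocations` and all numbers are hypothetical.

```python
def token_allocations(token_budget, batch_sizes):
    """Return (batch_size, steps) pairs with batch_size * steps == token_budget.

    batch_size is measured in tokens per step; candidates that do not
    divide the budget evenly are skipped.
    """
    allocations = []
    for batch_size in batch_sizes:
        if token_budget % batch_size == 0:
            allocations.append((batch_size, token_budget // batch_size))
    return allocations


# Hypothetical budget and candidate batch sizes, chosen only for illustration.
budget = 2**20
candidates = [2**k for k in range(8, 13)]

for batch_size, steps in token_allocations(budget, candidates):
    print(f"batch size {batch_size:>5} tokens -> {steps:>5} steps")
```

Every row consumes the same number of tokens. A law that treats steps and batch size as separate terms is what allows such rows to be assigned different predicted losses.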

IMPACT Refines understanding of optimal training parameters, potentially leading to more efficient model development.

RANK_REASON The cluster contains a research paper detailing a new scaling law for AI models.


AI-generated summary · Google Gemini · from 2 sources.


COVERAGE [2]

  1. arXiv stat.ML TIER_1 English(EN) · Fabian Schaipp ·

    How to Allocate Your Tokens? Scaling Laws with Training Steps and Batch Size

    arXiv:2607.01487v1. Abstract: We propose a scaling law that takes into account model size and training data while explicitly splitting the latter into training steps and batch size (called three-term law). Fitting the proposed law on a large set of training ru…

  2. arXiv stat.ML TIER_1 English(EN) · Fabian Schaipp ·

    How to Allocate Your Tokens? Scaling Laws with Training Steps and Batch Size

    We propose a scaling law that takes into account model size and training data while explicitly splitting the latter into training steps and batch size (called three-term law). Fitting the proposed law on a large set of training runs, we find that it correctly recovers the scaling…