A transparent, step-by-step guide to estimating the constant terms in scaling law formulas. By training small models (1M-60M parameters), the authors demonstrate how to accurately predict the performance and optimal training configurations for models up to 33B parameters.
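The core idea (fit a scaling law's constants on small models, then extrapolate to a much larger one) can be sketched as below. This is a minimal illustration under stated assumptions, not the paper's procedure. It assumes a single-variable power law L(N) = (N_c / N)^alpha in parameter count N, the form popularized by Kaplan et al. (2020), and fits it by least squares in log-log space. The constants `TRUE_NC` and `TRUE_ALPHA` are hypothetical, and the "measured" losses are synthetic values generated from them, so the fit should recover them almost exactly. The helper names `fit_power_law` and `predict_loss` are also hypothetical.

```python
import math


def fit_power_law(params, losses):
    """Fit L(N) = (N_c / N) ** alpha by least squares in log-log space.

    Taking logs gives ln L = alpha * ln(N_c) - alpha * ln(N), which is a
    straight line in ln(N). Returns (n_c, alpha). Assumes all losses > 0.
    """
    xs = [math.log(n) for n in params]
    ys = [math.log(loss) for loss in losses]
    k = len(xs)
    x_mean = sum(xs) / k
    y_mean = sum(ys) / k
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx                     # slope = -alpha
    intercept = y_mean - slope * x_mean   # intercept = alpha * ln(N_c)
    alpha = -slope
    n_c = math.exp(intercept / alpha)
    return n_c, alpha


def predict_loss(n, n_c, alpha):
    """Evaluate the assumed power law at parameter count n."""
    return (n_c / n) ** alpha


# Hypothetical constants, used only to generate synthetic "measurements".
TRUE_NC = 8.8e13
TRUE_ALPHA = 0.076

# Small models spanning roughly the 1M-60M range mentioned above.
small_sizes = [1e6, 3e6, 1e7, 3e7, 6e7]
small_losses = [predict_loss(n, TRUE_NC, TRUE_ALPHA) for n in small_sizes]

# Fit the constants on the small models only, then extrapolate to 33B.
fitted_nc, fitted_alpha = fit_power_law(small_sizes, small_losses)
predicted_33b = predict_loss(33e9, fitted_nc, fitted_alpha)

print(f"fitted N_c = {fitted_nc:.3e}, alpha = {fitted_alpha:.4f}")
print(f"predicted loss at 33B parameters: {predicted_33b:.4f}")
```

With real training runs the losses are noisy, so the recovered constants carry uncertainty that grows when extrapolating over several orders of magnitude. The paper also predicts optimal training configurations, which this single-variable sketch does not attempt.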

Paper

arXiv: 2403.06563

scaling research
