Unraveling the Mystery of Scaling Laws: Part I
A transparent, step-by-step guide to estimating the constant terms in scaling law formulas. By training small models (1M-60M parameters), the authors demonstrate how to accurately predict the performance and optimal training configurations for models up to 33B parameters.
Paper
arXiv: 2403.06563