Unraveling the Mystery of Scaling Laws: Part I

A transparent, step-by-step guide to estimating the constant terms in scaling-law formulas. By training small models (1M to 60M parameters), the authors show how to accurately predict the performance and optimal training configurations of models with up to 33B parameters.
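
To make the idea of estimating constants from small models concrete, here is a minimal sketch. It assumes a single-variable power law of the form L(N) = (N_c / N)^alpha, which this summary does not state; the paper's own formulas may differ. The function names, the constants TRUE_ALPHA and TRUE_N_C, and the loss values are hypothetical and synthetic, not results from the paper.

```python
import numpy as np


def fit_power_law(n_params, losses):
    """Estimate alpha and N_c in L(N) = (N_c / N) ** alpha.

    Taking logs gives log L = alpha * log N_c - alpha * log N,
    a straight line in (log N, log L), so ordinary least squares suffices.
    """
    slope, intercept = np.polyfit(np.log(n_params), np.log(losses), 1)
    alpha = -slope
    n_c = np.exp(intercept / alpha)
    return alpha, n_c


def predict_loss(n_params, alpha, n_c):
    """Evaluate the assumed power law at the given model sizes."""
    return (n_c / np.asarray(n_params, dtype=float)) ** alpha


# Hypothetical constants used only to generate synthetic, noise-free data.
TRUE_ALPHA, TRUE_N_C = 0.08, 1.0e13

# "Small" models in the 1M to 60M parameter range mentioned in the summary.
small_sizes = np.array([1e6, 3e6, 1e7, 3e7, 6e7])
small_losses = predict_loss(small_sizes, TRUE_ALPHA, TRUE_N_C)

# Fit the constants on small models, then extrapolate to 33B parameters.
alpha_hat, n_c_hat = fit_power_law(small_sizes, small_losses)
extrapolated = predict_loss(33e9, alpha_hat, n_c_hat)

print(f"alpha = {alpha_hat:.4f}, N_c = {n_c_hat:.3e}")
print(f"extrapolated loss at 33B parameters: {float(extrapolated):.4f}")
```

With real training runs the losses are noisy and depend on more than model size (for example batch size and training steps), so the fitted constants carry uncertainty that this noise-free sketch does not show.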
