"Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift." Normalizes layer inputs during training, enabling much higher learning rates and reducing sensitivity to initialization. Achieves same accuracy with 14x fewer training steps.

Became a standard component of deep networks, especially convolutional architectures, and is one of the most-cited papers in deep learning (50K+ citations). By Sergey Ioffe and Christian Szegedy.

Paper

arXiv: 1502.03167

Venue: ICML 2015

foundational