Since its introduction by Ioffe and Szegedy in 2015, Batch Normalization has gained popularity among Deep Learning practitioners as a technique for achieving faster convergence by reducing internal covariate shift and, to some extent, regularizing the network. We discuss the salient features of the paper, followed by the calculation of the derivatives needed for backpropagation through the Batch Normalization layer. Lastly, we explain an efficient implementation of backpropagation using Python and NumPy.
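
As a preview of where we are headed, here is a minimal sketch of such an implementation, assuming a 2-D mini-batch of shape `(N, D)`; the function names and the compact form of the input gradient are illustrative choices, not necessarily identical to the code developed later in the article.

```python
import numpy as np

def batchnorm_forward(x, gamma, beta, eps=1e-5):
    # x: mini-batch of shape (N, D); gamma, beta: learnable scale/shift of shape (D,)
    mu = x.mean(axis=0)                    # per-feature batch mean
    var = x.var(axis=0)                    # per-feature batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)  # normalize each feature
    out = gamma * x_hat + beta             # scale and shift
    cache = (x_hat, gamma, var, eps)       # saved for the backward pass
    return out, cache

def batchnorm_backward(dout, cache):
    # dout: upstream gradient of shape (N, D)
    x_hat, gamma, var, eps = cache
    N = dout.shape[0]
    dgamma = np.sum(dout * x_hat, axis=0)
    dbeta = np.sum(dout, axis=0)
    dx_hat = dout * gamma
    # compact form of the gradient w.r.t. the layer input
    dx = (1.0 / N) / np.sqrt(var + eps) * (
        N * dx_hat
        - np.sum(dx_hat, axis=0)
        - x_hat * np.sum(dx_hat * x_hat, axis=0)
    )
    return dx, dgamma, dbeta

# Tiny usage example
x = np.random.randn(4, 3)
gamma, beta = np.ones(3), np.zeros(3)
out, cache = batchnorm_forward(x, gamma, beta)
dx, dgamma, dbeta = batchnorm_backward(np.ones_like(out), cache)
```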
