Neural Networks

Different results on different batch sizes? Culprit: Batch Normalization

We can train our model with or without Batch Norm. Batch Norm normalizes the activations of a layer using the mean and standard deviation of the current mini-batch (if used after each layer, it normalizes the activations after every layer). Each Batch Norm layer also learns two additional parameters, a scale and a shift, which set the standard deviation and mean of its output. This extra flexibility helps to represent identity transformation and preserve the…
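To make the batch-size dependence concrete, here is a minimal NumPy sketch of the training-time Batch Norm computation. It is an illustration under assumptions, not any framework's implementation: the function name `batch_norm`, the parameter names `gamma` and `beta`, the `eps` value, and the random input are all chosen for this example.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Training-mode batch norm over the batch axis (hypothetical helper)."""
    mean = x.mean(axis=0)                    # per-feature mean of this batch
    var = x.var(axis=0)                      # per-feature variance of this batch
    x_hat = (x - mean) / np.sqrt(var + eps)  # normalized activations
    return gamma * x_hat + beta              # learned scale and shift

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(8, 4))  # 8 samples, 4 features
gamma = np.ones(4)   # scale, initialized to 1
beta = np.zeros(4)   # shift, initialized to 0

full = batch_norm(x, gamma, beta)      # batch size 8
half = batch_norm(x[:4], gamma, beta)  # same first 4 samples, batch size 4

# The same samples get different outputs because the batch statistics differ.
same = np.allclose(full[:4], half)

# Setting the scale and shift to the batch statistics undoes the normalization,
# which is how the layer can represent the identity transformation.
identity = batch_norm(x, np.sqrt(x.var(axis=0) + 1e-5), x.mean(axis=0))
```

The first four rows of `full` and the rows of `half` come from identical inputs, yet they are normalized with different means and variances. That is the effect the title refers to whenever batch statistics are used at prediction time.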

