Initialization techniques used for parameters

Why is initialization needed?

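We need to initialize the parameters (weights and biases) before training a model. If all the weights start with identical values, the network cannot learn useful features: every neuron in a layer computes the same output and receives the same gradient, so all the neurons keep learning the same thing. We therefore need randomness to break this symmetry. However, if the random values are poorly scaled (too large or too small), the activations and gradients can explode or vanish as they pass through many layers. So we need special techniques that choose the scale of the random values carefully.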
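The symmetry problem can be checked directly on a toy network. The sketch below assumes a tiny one-hidden-layer ReLU network without biases, trained with squared-error loss on a single made-up example; the helper names `train_step`, `rows_identical` and `train` are hypothetical and exist only for this illustration. With identical initial weights, every hidden unit (row of `W1`) receives exactly the same update, so the units remain copies of each other; random initial weights do not have this problem.

```python
import numpy as np

def train_step(W1, w2, x, y, lr=0.05):
    """One gradient-descent step on L = 0.5 * (y_hat - y)^2 for a tiny ReLU net.

    W1: (hidden, inputs) first-layer weights, w2: (hidden,) output weights.
    Biases are omitted to keep the sketch short.
    """
    z = W1 @ x                  # pre-activations of the hidden units
    h = np.maximum(0.0, z)      # ReLU
    y_hat = w2 @ h              # scalar prediction
    d_out = y_hat - y           # dL/dy_hat
    grad_w2 = d_out * h         # dL/dw2
    d_z = d_out * w2 * (z > 0)  # dL/dz, backpropagated through ReLU
    grad_W1 = np.outer(d_z, x)  # dL/dW1, one row per hidden unit
    return W1 - lr * grad_W1, w2 - lr * grad_w2

def rows_identical(M):
    """True if every hidden unit (row of M) has the same weights."""
    return bool(np.allclose(M, M[0]))

def train(W1, w2, steps=20):
    x = np.array([1.0, 2.0, -0.5])  # one toy input (hypothetical data)
    y = 1.0                         # its toy target
    for _ in range(steps):
        W1, w2 = train_step(W1, w2, x, y)
    return W1, w2

# Identical initialization: all 4 hidden units start with the same weights.
W1_same, w2_same = train(np.full((4, 3), 0.1), np.full(4, 0.1))
print(rows_identical(W1_same))  # True: the units are still copies of each other

# Random initialization: the hidden units start (and stay) different.
rng = np.random.default_rng(0)
W1_rand, w2_rand = train(rng.normal(0.0, 0.5, size=(4, 3)),
                         rng.normal(0.0, 0.5, size=4))
print(rows_identical(W1_rand))  # False
```

However many steps are run, the identically initialized network behaves like a network with a single hidden unit, which is what "all the neurons learn the same thing" means in practice.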

He-Initialization

He initialization is designed specifically for layers that use the ReLU activation function. Since ReLU(x) = max(0, x) sets every negative input to zero, when a layer's pre-activations are zero-mean and symmetric, ReLU discards roughly half of the signal's variance (more precisely, its second moment) at each layer. He initialization compensates for this by giving the weights a variance of 2/n instead of 1/n, i.e. twice the value that would keep the variance constant through a purely linear layer. Concretely, each weight is drawn at random from a Gaussian (normal) distribution with mean 0.0 and standard deviation sqrt(2/n), where n is the number of inputs to the node (its fan-in).
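To see why the factor 2/n matters, the sketch below (written with NumPy as an assumption; the notes name no library) pushes random inputs through a deep stack of ReLU layers and reports the mean square of the final activations. The helper names `he_normal`, `fixed_normal` and `mean_square_after`, as well as the depth of 50, width of 256 and batch of 200, are hypothetical choices made only for this illustration, and biases are set to zero. It compares He initialization against two fixed standard deviations that ignore the fan-in.

```python
import numpy as np

def he_normal(n_in, n_out, rng):
    """Draw an (n_out, n_in) weight matrix from N(0, 2 / n_in)."""
    std = np.sqrt(2.0 / n_in)  # n_in = number of inputs to each node (fan-in)
    return rng.normal(0.0, std, size=(n_out, n_in))

def fixed_normal(std):
    """Return an initializer that ignores the fan-in and uses a constant std."""
    def init(n_in, n_out, rng):
        return rng.normal(0.0, std, size=(n_out, n_in))
    return init

def mean_square_after(init, depth=50, width=256, batch=200, seed=0):
    """Push random inputs through `depth` ReLU layers; return E[h^2] at the end."""
    rng = np.random.default_rng(seed)
    h = rng.normal(0.0, 1.0, size=(width, batch))  # inputs with E[x^2] = 1
    for _ in range(depth):
        W = init(width, width, rng)
        h = np.maximum(0.0, W @ h)  # ReLU(W h), biases set to 0
    return float(np.mean(h ** 2))

he = mean_square_after(he_normal)
big = mean_square_after(fixed_normal(1.0))
small = mean_square_after(fixed_normal(0.01))
print(f"He:         {he:.3g}")
print(f"std = 1:    {big:.3g}")
print(f"std = 0.01: {small:.3g}")
```

In expectation, each layer multiplies the mean square by n * Var(w) / 2. With He initialization this factor is 1, so the signal should stay roughly of order 1 even after 50 layers. With std = 1 and n = 256 the factor is about 128 (the signal explodes), and with std = 0.01 it is about 0.0128 (the signal vanishes). In PyTorch, `torch.nn.init.kaiming_normal_` with `nonlinearity='relu'` and the default `mode='fan_in'` applies the same N(0, 2/n) rule.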