Calculating the gain value
torch.nn.init.calculate_gain(nonlinearity, param=None)
Computes the recommended gain value for the given nonlinearity.
The gain is a scaling factor that regulates the relationship between the magnitude of a layer's inputs and the magnitude of its outputs.
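A quick look at the recommended gains (a minimal sketch, not from the original post; it assumes torch and torch.nn are imported, with torch.nn as nn, as in the examples below):

>>> import torch
>>> import torch.nn as nn
>>> gain_tanh = nn.init.calculate_gain('tanh')              # 5/3 for tanh
>>> gain_relu = nn.init.calculate_gain('relu')              # sqrt(2) for ReLU
>>> gain_lrelu = nn.init.calculate_gain('leaky_relu', 0.2)  # sqrt(2 / (1 + 0.2**2)) for leaky ReLU with negative slope 0.2

These are the factors that the xavier_* and kaiming_* initializers below multiply into their bounds and standard deviations.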
fan_in and fan_out
PyTorch computes fan_in and fan_out with the following source code:

def _calculate_fan_in_and_fan_out(tensor):
    dimensions = tensor.dim()
    if dimensions < 2:
        raise ValueError("Fan in and fan out can not be computed for tensor with fewer than 2 dimensions")

    if dimensions == 2:  # Linear layer weight: (out_features, in_features)
        fan_in = tensor.size(1)
        fan_out = tensor.size(0)
    else:
        # Convolution weight: (out_channels, in_channels, *kernel_size)
        num_input_fmaps = tensor.size(1)
        num_output_fmaps = tensor.size(0)
        receptive_field_size = 1
        if tensor.dim() > 2:
            receptive_field_size = tensor[0][0].numel()
        fan_in = num_input_fmaps * receptive_field_size
        fan_out = num_output_fmaps * receptive_field_size

    return fan_in, fan_out
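As an example (not from the original post): a Conv2d weight of shape (out_channels, in_channels, kH, kW) = (64, 3, 3, 3) has a receptive field of 3*3 = 9 elements, so fan_in = 3*9 = 27 and fan_out = 64*9 = 576. Checking with the function above:

>>> w = torch.empty(64, 3, 3, 3)   # Conv2d weight layout: (out_channels, in_channels, kH, kW)
>>> _calculate_fan_in_and_fan_out(w)
(27, 576)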
Xavier distribution
Xavier distribution explained: /2016/03/29/understanding-xavier-initialization-in-deep-neural-networks/
Assume the activation function is a sigmoid. If the weights are too small (in absolute value), the variance of the signal shrinks as it passes through each layer and the weighted sum at every layer stays close to 0; near 0 the sigmoid is approximately linear, so the network loses the nonlinearity that gives a DNN its expressive power.
If the weights are too large, the variance of the signal grows rapidly from layer to layer and the pre-activation values at each layer become very large; the sigmoid then saturates and the gradient at each layer tends to 0.
Xavier initialization is designed so that the variance of a layer's output y equals the variance of its input x after the signal passes through the layer.
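This can be checked with a small illustrative experiment (not part of the original post): pass a standard-normal batch through a stack of Xavier-initialized linear layers with tanh activations and watch the variance after each layer; it stays on the same order of magnitude instead of collapsing to 0 or blowing up.

import torch
import torch.nn as nn

torch.manual_seed(0)
h = torch.randn(1024, 256)                     # input batch with variance ~1

for i in range(10):
    layer = nn.Linear(256, 256)
    # Xavier (Glorot) initialization with the gain recommended for tanh
    nn.init.xavier_uniform_(layer.weight, gain=nn.init.calculate_gain('tanh'))
    nn.init.zeros_(layer.bias)
    h = torch.tanh(layer(h))
    print(f"layer {i + 1}: activation variance = {h.var().item():.4f}")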
(1) Xavier uniform distribution
torch.nn.init.xavier_uniform_(tensor, gain=1)
Also known as Glorot initialization.
>>> w = torch.empty(3, 5)
>>> nn.init.xavier_uniform_(w, gain=nn.init.calculate_gain('relu'))
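For this (3, 5) tensor, fan_in = 5 and fan_out = 3, and xavier_uniform_ draws from U(-a, a) with a = gain * sqrt(6 / (fan_in + fan_out)); a quick check on the tensor just filled:

>>> import math
>>> bound = nn.init.calculate_gain('relu') * math.sqrt(6.0 / (5 + 3))   # sqrt(1.5), about 1.2247
>>> bool(w.abs().max() <= bound)
True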
(2) Xavier normal distribution
torch.nn.init.xavier_normal_(tensor, gain=1)
Also known as Glorot initialization.
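xavier_normal_ draws weights from a normal distribution N(0, std^2) with std = gain * sqrt(2 / (fan_in + fan_out)). A usage sketch patterned after the uniform example above:

>>> w = torch.empty(3, 5)
>>> nn.init.xavier_normal_(w, gain=nn.init.calculate_gain('tanh'))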
Kaiming distribution
Xavier initialization works well with tanh but poorly with the ReLU activation function, so Kaiming He proposed an initialization method tailored to ReLU (see the comparison below). By default, PyTorch initializes convolution layer parameters with a Kaiming-based scheme.
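The difference can be seen with a small illustrative comparison (not from the original post): push a standard-normal batch through 20 ReLU layers, once with Xavier-uniform weights and once with Kaiming-uniform weights. Each ReLU roughly halves the signal energy that Xavier was tuned to preserve, so the Xavier variant shrinks toward 0, while the Kaiming variant stays on the same order of magnitude.

import torch
import torch.nn as nn

def final_variance(init_fn, depth=20, width=256):
    """Variance of the activations after `depth` ReLU layers whose weights are set by `init_fn`."""
    torch.manual_seed(0)
    h = torch.randn(1024, width)
    for _ in range(depth):
        layer = nn.Linear(width, width, bias=False)
        init_fn(layer.weight)
        h = torch.relu(layer(h))
    return h.var().item()

xavier_var = final_variance(nn.init.xavier_uniform_)
kaiming_var = final_variance(lambda w: nn.init.kaiming_uniform_(w, nonlinearity='relu'))
print(f"after 20 ReLU layers: xavier variance ~ {xavier_var:.6f}, kaiming variance ~ {kaiming_var:.4f}")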
(1) Kaiming uniform distribution
torch.nn.init.kaiming_uniform_(tensor, a=0, mode='fan_in', nonlinearity='leaky_relu')
Also known as He initialization.
a – the negative slope of the rectifier used after this layer (0 by default, which corresponds to ReLU).
mode – either 'fan_in' (default) or 'fan_out'. Choosing 'fan_in' preserves the magnitude of the variance of the weights in the forward pass; choosing 'fan_out' preserves the magnitudes in the backward pass (see the quick check after these parameter notes).
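Concretely, kaiming_uniform_ samples from U(-bound, bound) with bound = gain * sqrt(3 / fan), where fan is fan_in or fan_out depending on mode. For a (3, 5) tensor (fan_in = 5, fan_out = 3) the two modes give different bounds:

>>> import math
>>> gain = nn.init.calculate_gain('relu')
>>> bound_fan_in = gain * math.sqrt(3.0 / 5)    # mode='fan_in',  fan = 5: about 1.095
>>> bound_fan_out = gain * math.sqrt(3.0 / 3)   # mode='fan_out', fan = 3: about 1.414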
>>> w = torch.empty(3, 5)
>>> nn.init.kaiming_uniform_(w, mode='fan_in', nonlinearity='relu')
(2) Kaiming normal distribution
torch.nn.init.kaiming_normal_(tensor, a=0, mode='fan_in', nonlinearity='leaky_relu')
Also known as He initialization.
>>> w = torch.empty(3, 5)
>>> nn.init.kaiming_normal_(w, mode='fan_out', nonlinearity='relu')
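In practice these initializers are usually applied to a whole network by walking its modules with Module.apply; a minimal sketch (the helper name init_conv and the toy architecture are just for illustration):

import torch.nn as nn

def init_conv(m):
    # Re-initialize every Conv2d with Kaiming normal, matched to the ReLU that follows it
    if isinstance(m, nn.Conv2d):
        nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
        if m.bias is not None:
            nn.init.zeros_(m.bias)

model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
)
model.apply(init_conv)          # applies init_conv to every submodule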
That is all I have to share about the details of the Kaiming distribution for PyTorch neural network initialization. I hope it gives you a useful reference, and I hope you will continue to support me.