torch.optim is a package that implements various optimization algorithms. Most of the commonly used methods are supported, and the interface is general enough to allow more complex methods to be integrated in the future.
To use it, an optimizer object must be constructed. This object stores the current state of the parameters and updates them based on the computed gradient.
Example:
import torch.optim as optim

optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
optimizer = optim.Adam([var1, var2], lr=0.0001)
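Since the optimizer object holds this state, it can also be inspected, much like a model. Below is a minimal sketch; the nn.Linear model and the dummy backward/step are only assumptions made so the optimizer has some state to show:

import torch
import torch.nn as nn
import torch.optim as optim

# toy model and a single dummy step, used only to populate the optimizer's state
model = nn.Linear(10, 2)
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

model(torch.randn(4, 10)).sum().backward()
optimizer.step()

# state_dict() holds the option values of each parameter group, plus per-parameter
# state such as the momentum buffers created by the step above
print(optimizer.state_dict()['param_groups'])
print(list(optimizer.state_dict()['state'].keys()))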
Constructor method
The optimizer's __init__ function takes two kinds of arguments: the first is the parameters to be optimized, which must be given as an iterable of Tensors or of dicts; the second is the optimizer options, such as the learning rate, weight decay, and so on.
The parameters to optimize are usually passed as model.parameters(); when different groups of parameters need different options, you can write out a list of dicts by hand instead.
Example:
optim.SGD([
    {'params': model.base.parameters()},
    {'params': model.classifier.parameters(), 'lr': 1e-3}
], lr=1e-2, momentum=0.9)
This means that model.base's parameters use the default learning rate of 1e-2, model.classifier's parameters use a learning rate of 1e-3, and a momentum of 0.9 is used for all parameters.
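To see how the per-group options are resolved, you can look at the optimizer's param_groups and defaults after construction. A minimal sketch, assuming a toy model with base and classifier submodules like the example above:

import torch.nn as nn
import torch.optim as optim

# hypothetical model with two submodules, mirroring the example above
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.base = nn.Linear(10, 10)
        self.classifier = nn.Linear(10, 2)

model = Net()
optimizer = optim.SGD([
    {'params': model.base.parameters()},
    {'params': model.classifier.parameters(), 'lr': 1e-3},
], lr=1e-2, momentum=0.9)

# options not given in a group fall back to the defaults passed to the constructor
for i, group in enumerate(optimizer.param_groups):
    print(i, group['lr'], group['momentum'])   # 0 0.01 0.9 / 1 0.001 0.9
print(optimizer.defaults)                      # {'lr': 0.01, 'momentum': 0.9, ...}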
Gradient control
Before backpropagation, the gradients must be cleared with optimizer.zero_grad(). This is done by iterating over all the parameters in self.param_groups and resetting each parameter's grad attribute (a simplified sketch of this loop follows the example below).
Example:
for input, target in dataset:
    def closure():
        # some algorithms (e.g. L-BFGS) need to re-evaluate the loss several
        # times, so step() accepts a closure that recomputes it
        optimizer.zero_grad()
        output = model(input)
        loss = loss_fn(output, target)
        loss.backward()
        return loss
    optimizer.step(closure)
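For reference, the clearing step described above can be pictured roughly as follows. This is only a simplified sketch, not the library's actual code (recent PyTorch versions can also reset gradients to None instead of zeroing them):

# simplified sketch of what optimizer.zero_grad() does:
# walk every parameter group and reset each parameter's grad
def zero_grad_sketch(optimizer):
    for group in optimizer.param_groups:
        for p in group['params']:
            if p.grad is not None:
                p.grad.detach_()
                p.grad.zero_()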
Adjustment of learning rates
lr_scheduler is used to adjust the learning rate flexibly during training, based on the number of epochs. There are many scheduling policies, but they are all used in much the same way: wrap the original optimizer in a scheduler, pass in the relevant parameters, and then call the scheduler's step() once per epoch.
Let's take LambdaLR as an example:
from torch.optim.lr_scheduler import LambdaLR

lambda1 = lambda epoch: epoch // 30
lambda2 = lambda epoch: 0.95 ** epoch
scheduler = LambdaLR(optimizer, lr_lambda=[lambda1, lambda2])
for epoch in range(100):
    train(...)
    validate(...)
    scheduler.step()
Two lambdas are passed above because the optimizer has two parameter groups; each group's learning rate is set to its initial value multiplied by the value its own lambda returns for the current epoch.
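Other schedulers follow the same wrap-then-step pattern. For instance, StepLR decays every group's learning rate by a factor of gamma every step_size epochs. A minimal sketch, where the toy model and the empty training step are assumptions made only so the loop runs:

import torch.nn as nn
import torch.optim as optim
from torch.optim.lr_scheduler import StepLR

model = nn.Linear(10, 2)                       # toy model for the sketch
optimizer = optim.SGD(model.parameters(), lr=0.1)

# multiply every group's learning rate by 0.1 every 30 epochs
scheduler = StepLR(optimizer, step_size=30, gamma=0.1)
for epoch in range(100):
    # ... training and validation would go here, as in the example above ...
    optimizer.step()       # in a real loop this follows loss.backward()
    scheduler.step()
    if epoch % 30 == 0:
        print(epoch, scheduler.get_last_lr())  # current learning rate of each group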
Optimization methods
Algorithms implemented in the optim library include Adadelta, Adagrad, Adam, a sparse-tensor variant of Adam (SparseAdam), Adam based on the infinity norm (Adamax), Averaged SGD (ASGD), L-BFGS, RMSprop, resilient backpropagation (Rprop), and SGD with optional Nesterov momentum.
Take SGD as an example:
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
optimizer.zero_grad()
loss_fn(model(input), target).backward()
optimizer.step()
Other methods are used in the same way:
opt_Adam = torch.optim.Adam(net_Adam.parameters(), lr=0.1, betas=(0.9, 0.99))
opt_RMSprop = torch.optim.RMSprop(net_RMSprop.parameters(), lr=0.1, alpha=0.9)
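Putting the pieces together, here is a minimal end-to-end sketch; the model, data, and hyperparameters are arbitrary assumptions chosen only so the loop runs:

import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim.lr_scheduler import StepLR

# toy model and random data, chosen only to make the loop self-contained
model = nn.Linear(20, 1)
inputs = torch.randn(64, 20)
targets = torch.randn(64, 1)

loss_fn = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
scheduler = StepLR(optimizer, step_size=10, gamma=0.5)

for epoch in range(30):
    optimizer.zero_grad()                  # clear old gradients
    loss = loss_fn(model(inputs), targets)
    loss.backward()                        # compute new gradients
    optimizer.step()                       # update the parameters
    scheduler.step()                       # adjust the learning rate each epoch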
That concludes this article.