
How to set different learning rates for different layers of a model in PyTorch

When training an object detection model, we usually have a feature-extraction backbone network, such as Darknet in YOLO or VGG-16 in SSD.

To achieve better training results, the pre-trained backbone parameters are usually loaded first; the detection network is then trained on top of them while the backbone is only fine-tuned, which requires a small learning rate (lr) for the backbone.

import torch.nn as nn

class net(nn.Module):
  def __init__(self):
    super(net, self).__init__()
    # backbone: pre-trained feature extractor
    self.backbone = ...
    # detect: detection head, trained from scratch
    self.detect = ...
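
As a minimal sketch, loading the pre-trained backbone weights mentioned above could look like the following; the checkpoint path "backbone_pretrained.pth" and the variable name model are placeholders for illustration, not part of the original code.

import torch

model = net()
# load weights into the backbone only; the detection head keeps its
# random initialization ("backbone_pretrained.pth" is a placeholder path)
state_dict = torch.load("backbone_pretrained.pth")
model.backbone.load_state_dict(state_dict)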

When setting up the optimizer, you only need to split the parameters into two groups and give each group its own learning rate lr.

# collect the ids of the backbone parameters
base_params = list(map(id, model.backbone.parameters()))
# every parameter whose id is not in base_params belongs to the detection head
logits_params = filter(lambda p: id(p) not in base_params, model.parameters())
params = [
  {"params": logits_params, "lr": config.lr},
  {"params": model.backbone.parameters(), "lr": config.backbone_lr},
]
# SGD is assumed for the optimizer; config.lr and config.momentum
# are assumed to mirror the other config fields used here
optimizer = torch.optim.SGD(params, momentum=config.momentum, weight_decay=config.weight_decay)
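
As a quick sanity check (not part of the original recipe), you can inspect optimizer.param_groups to confirm each group got its own learning rate:

# the optimizer materializes each group's params into a list at construction
for i, group in enumerate(optimizer.param_groups):
  print(f"group {i}: lr={group['lr']}, num params={len(group['params'])}")

Note that filter in Python 3 returns a one-shot iterator, so logits_params is consumed when the optimizer is constructed; wrap it in list() first if you need to reuse the split elsewhere.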
 

The above is how to set different learning rates for different layers of a model in PyTorch. I hope it gives you a useful reference, and I hope you will continue to support me.