Originally I used only TensorFlow, but TF lacks some NumPy features, such as slicing arrays with lists, so I turned to PyTorch, which does support them. Fortunately, PyTorch is easier to get started with: it mirrors NumPy's features almost perfectly (though a few are still missing). No wonder its popularity has risen so quickly.
1 Model definition
Much like TF, PyTorch builds custom models by inheriting from a parent class and implementing two methods. In TF they are __init__() and call(); in PyTorch, __init__() and forward(). Their roles are similar: the first initializes the model's internal structure, the second performs inference. Other functions, such as ones that compute the loss or run a training step, can also live in the class, but that is optional. The following demo classifies MNIST handwritten digits; the model code comes first:
import numpy as np
import matplotlib.pyplot as plt
import torch
from torch import nn, optim
from torchsummary import summary
from keras.datasets import mnist
from keras.utils import to_categorical

device = torch.device('cuda') #——————1——————

class ModelTest(nn.Module):
    def __init__(self, device):
        super().__init__()
        self.layer1 = nn.Sequential(nn.Flatten(), nn.Linear(28*28, 512), nn.ReLU()) #——————2——————
        self.layer2 = nn.Sequential(nn.Linear(512, 512), nn.ReLU())
        self.layer3 = nn.Sequential(nn.Linear(512, 512), nn.ReLU())
        self.layer4 = nn.Sequential(nn.Linear(512, 10), nn.Softmax(dim=-1))
        self.to(device) #——————3——————
        self.opt = optim.SGD(self.parameters(), lr=0.01) #——————4——————

    def forward(self, inputs): #——————5——————
        x = self.layer1(inputs)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)
        return x

    def get_loss(self, true_labels, predicts):
        loss = -true_labels * torch.log(predicts) #——————6——————
        loss = torch.mean(loss)
        return loss

    def train(self, imgs, labels):
        predicts = self(imgs)  # calling the model itself runs forward()
        loss = self.get_loss(labels, predicts)
        self.opt.zero_grad() #——————7——————
        loss.backward()      #——————8——————
        self.opt.step()      #——————9——————

model = ModelTest(device)
summary(model, (1, 28, 28), 3, device='cuda') #——————10——————
#1: Gets the device, which makes it easy to migrate the model and variables between memories later. There are only two kinds of device name: 'cuda' and 'cpu'. You normally need this when you have a GPU, so that variables can be moved from main memory to video memory when needed. If you have no GPU, you can skip it; PyTorch keeps all parameters in main memory by default.
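A slightly more defensive variant (my sketch, not from the original code) falls back to the CPU automatically when no GPU is present:

import torch

# Pick the GPU if one is available, otherwise fall back to the CPU.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(device)  # e.g. device(type='cuda') or device(type='cpu')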
#2: Definition of the model's layers. nn.Sequential bundles several layers that you want to manage as a single unit into one layer.
#3: Migrates the model parameters to GPU memory during initialization to accelerate computation. You can also perform the migration externally with model.to(device) when needed.
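If the model did not migrate itself in __init__(), the same migration could be done from outside; a minimal sketch (the input tensor here is illustrative):

model = ModelTest(device)
model.to(device)                           # migrates every parameter and buffer of the model
x = torch.zeros(1, 1, 28, 28).to(device)   # input tensors must be moved individually
y = model(x)                               # model and input now live on the same device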
#4: Defines the model's optimizer. Unlike TF, PyTorch needs to be given the parameters that require gradient descent at definition time; that is what self.parameters() provides, namely all parameters of the current model. You don't actually have to worry much about the ordering of layer and optimizer definitions, because parameters() does not yield copies of the parameter values but references to the live parameter objects, so an optimizer defined after the layers will still update exactly the tensors the model uses. The optimizer can also be defined externally by passing in model.parameters(); see the sketch after this note. Here, plain stochastic gradient descent is defined.
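A sketch of the external variant mentioned above (assuming self.opt was removed from __init__):

from torch import optim

model = ModelTest(device)
opt = optim.SGD(model.parameters(), lr=0.01)  # yields the same live parameter objects as inside the class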
#5: The model's forward propagation, analogous to TF's call(). This is the function that runs when you call model(inputs).
#6: I have folded the loss computation into the model; here the cross entropy between the true and predicted labels is calculated.
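For comparison, here is a sketch of the built-in alternative (not what this article's model uses): nn.CrossEntropyLoss works on raw logits and integer class labels, so the final Softmax layer and the one-hot labels would both be dropped:

import torch
from torch import nn

criterion = nn.CrossEntropyLoss()     # applies log_softmax internally, so no Softmax layer is needed
logits = torch.randn(4, 10)           # raw, unnormalized scores
targets = torch.tensor([3, 1, 0, 9])  # integer class indices rather than one-hot vectors
loss = criterion(logits, targets)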
#7/8/9: In TF, parameter gradients are recorded on a gradient tape; in PyTorch, each parameter's gradient is stored on the parameter itself, in its .grad attribute, where it can be inspected. Every time backward() is called on a loss, PyTorch accumulates (directly sums) into .grad the gradients of all trainable parameters involved in computing that loss. So if we don't intend to accumulate gradients, we must clear the old ones before backward(). Since all trainable parameters were passed to the optimizer earlier, calling zero_grad() on the optimizer zeroes the existing gradients of all of them. When is gradient accumulation useful? In batch gradient descent, for example, when memory cannot hold the gradient computation for the whole batch at once, we can split the batch into pieces, compute each piece's loss, and call backward() once per piece, thereby obtaining the gradient of the entire batch; a sketch follows below. After the gradients are computed, the optimizer's step() is executed, and the optimizer performs one optimization step based on the trainable parameters' gradients.
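A concrete sketch of that accumulation trick, written as if it replaced the body of train() above (micro_batches and n_slices are hypothetical names for the batch slices and their count):

self.opt.zero_grad()                    # clear old gradients once per full batch
for imgs, labels in micro_batches:      # hypothetical iterable of batch slices
    predicts = self(imgs)
    loss = self.get_loss(labels, predicts) / n_slices  # scale so the accumulated sum averages over slices
    loss.backward()                     # gradients accumulate in each parameter's .grad
self.opt.step()                         # one optimization step for the whole batch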
#10: Uses torchsummary to display the model structure. It is odd that this isn't built into torch and you have to install a separate torchsummary package.
2 Training and visualization
Next, the model is trained. Since the MNIST dataset bundled with PyTorch is awkward to use here, I used the one that comes with Keras and defined a generator to feed the data. Below is the full training and plotting code (accuracy is recorded every 50 iterations):
import numpy as np
import matplotlib.pyplot as plt
import torch
from torch import nn, optim
from torchsummary import summary
from keras.datasets import mnist
from keras.utils import to_categorical

device = torch.device('cuda') #——————1——————

class ModelTest(nn.Module):
    def __init__(self, device):
        super().__init__()
        self.layer1 = nn.Sequential(nn.Flatten(), nn.Linear(28*28, 512), nn.ReLU()) #——————2——————
        self.layer2 = nn.Sequential(nn.Linear(512, 512), nn.ReLU())
        self.layer3 = nn.Sequential(nn.Linear(512, 512), nn.ReLU())
        self.layer4 = nn.Sequential(nn.Linear(512, 10), nn.Softmax(dim=-1))
        self.to(device) #——————3——————
        self.opt = optim.SGD(self.parameters(), lr=0.01) #——————4——————

    def forward(self, inputs): #——————5——————
        x = self.layer1(inputs)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)
        return x

    def get_loss(self, true_labels, predicts):
        loss = -true_labels * torch.log(predicts) #——————6——————
        loss = torch.mean(loss)
        return loss

    def train(self, imgs, labels):
        predicts = self(imgs)
        loss = self.get_loss(labels, predicts)
        self.opt.zero_grad() #——————7——————
        loss.backward()      #——————8——————
        self.opt.step()      #——————9——————

def get_data(device, is_train=True, batch=1024, num=10000):
    train_data, test_data = mnist.load_data()
    if is_train:
        imgs, labels = train_data
    else:
        imgs, labels = test_data
    imgs = (imgs/255*2-1)[:, np.newaxis, ...]   # scale to [-1, 1], add channel axis
    labels = to_categorical(labels, 10)         # one-hot labels
    imgs = torch.tensor(imgs, dtype=torch.float32).to(device)
    labels = torch.tensor(labels, dtype=torch.float32).to(device)
    i = 0
    while True:
        i += batch
        if i > num:
            i = batch
        yield imgs[i-batch:i], labels[i-batch:i]

train_dg = get_data(device, True, batch=4096, num=60000)
test_dg = get_data(device, False, batch=5000, num=10000)

model = ModelTest(device)
summary(model, (1, 28, 28), 11, device='cuda')

ACCs = []
import time
start = time.time()
for j in range(20000):
    # Training
    imgs, labels = next(train_dg)
    model.train(imgs, labels)
    # Validation
    img, label = next(test_dg)
    predicts = model(img)
    acc = 1 - torch.count_nonzero(torch.argmax(predicts, axis=1) - torch.argmax(label, axis=1)) / label.shape[0]
    if j % 50 == 0:
        t = time.time() - start
        start = time.time()
        ACCs.append(acc.cpu().numpy())
        print(j, t, 'ACC: ', acc)

# Plotting
x = np.linspace(0, len(ACCs), len(ACCs))
plt.plot(x, ACCs)
plt.show()
A graph of the change in accuracy is shown below:
3 Other tips for use
3.1 tensor vs. array
Note that a PyTorch CPU tensor and the NumPy array obtained from it share the same underlying memory. That is, if you append a tensor to a list and later modify the tensor, the tensor inside the list changes too. Even easier to overlook: if you first convert the tensor to an array with tensor.numpy() before appending it to the list, modifying the original tensor will still modify the array in the list. So if we only want to save a tensor's values rather than the whole object, we have to copy the values, e.g. with np.array(tensor).
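A minimal sketch demonstrating both behaviors (assuming a CPU tensor):

import numpy as np
import torch

t = torch.zeros(3)
a = t.numpy()    # a shares memory with t (CPU tensors only)
t += 1
print(a)         # [1. 1. 1.] -- the array changed along with the tensor
b = np.array(t)  # np.array copies the values into a fresh array
t += 1
print(b)         # still [1. 1. 1.]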
3.2 Custom layers
In TF, custom models usually inherit from Keras's Model, while custom layers inherit from Layer; inheriting from different parent classes often trips up beginners. In PyTorch, by contrast, custom layers and custom models both inherit from nn.Module: PyTorch treats layers and models alike as modules, which is easy to understand, and indeed there is no clear boundary between a layer and a model. A layer is defined the same way as the model above, by implementing the same two functions. A code example follows:
import torch
from torch import nn

class ParaDeconv(nn.Module): #——————1——————
    def __init__(self, in_n, out_n):
        super().__init__()
        self.w = nn.Parameter(torch.normal(0, 0.01, size=[in_n, out_n]), requires_grad=True)
        self.b = nn.Parameter(torch.normal(0, 0.01, size=[out_n]), requires_grad=True)

    def forward(self, inputs):
        x = torch.matmul(inputs, self.w)
        x = x + self.b
        return x

layer = ParaDeconv(2, 3)
y = layer(torch.ones(100, 2)) #——————2——————
loss = torch.sum(y)           #——————3——————
loss.backward()               #——————4——————
for i in layer.parameters():  #——————5——————
    print(i.grad)             #——————6——————
#1: Customizes a fully connected layer. The trainable parameters inside the layer are defined with nn.Parameter; if you created plain tensors directly, they could not be traversed in #5.
#2/3/4: Feed an input and compute the loss, then backpropagate to compute the parameter gradients.
#5/6: After backpropagation completes, print the gradients of the layer's parameters.
A layer defined this way can be inserted directly into a model and used just like the layers that ship with PyTorch.
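For instance, a sketch (layer sizes are arbitrary) mixing the custom layer with built-in ones inside nn.Sequential:

net = nn.Sequential(
    ParaDeconv(2, 8),
    nn.ReLU(),
    ParaDeconv(8, 3),
)
out = net(torch.ones(100, 2))
print(sum(p.numel() for p in net.parameters()))  # counts w and b of both custom layers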
3.3 Save/Load
3.3.1 Saving/loading models
There are two ways to do this. One is to save the model's parameters:
torch.save(model.state_dict(), PATH)                  # save
model.load_state_dict(torch.load(PATH), strict=True)  # load
This loading method requires you to define the model first and then load the parameters. If a parameter name in your model doesn't match the saved parameters, it raises an error. But if you set strict to False, matching is no longer strict: only the keys that correspond are loaded, and no error is reported for extra or missing parameters.
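In recent PyTorch versions, load_state_dict() also returns the lists of mismatched keys, which is handy for checking what a non-strict load actually did; a sketch:

state = torch.load(PATH)
missing, unexpected = model.load_state_dict(state, strict=False)
print(missing)     # keys the model expects but the file lacks
print(unexpected)  # keys the file contains but the model lacks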
The other is to save the model directly:
torch.save(model, PATH)   # save
model = torch.load(PATH)  # load
This approach may seem convenient, but it is actually more error prone. Python cannot serialize the model's class itself; it only records the location of the code file that defines the class, and uses it to rebuild the structure at load time. If you move the code that defines the class, loading risks failing because the class cannot be found.
3.3.2 Saving/loading training checkpoints
When you want to save the state of a training run, including the optimizer state, the model parameters, the number of completed training iterations, and so on, you can do the following:
# Save a training checkpoint
torch.save({
    'epoch': epoch,
    'model_state_dict': model.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
    'loss': loss,
}, PATH)

# Load a training checkpoint
model = TheModelClass(*args, **kwargs)
optimizer = TheOptimizerClass(*args, **kwargs)
checkpoint = torch.load(PATH)
model.load_state_dict(checkpoint['model_state_dict'])
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
epoch = checkpoint['epoch']
loss = checkpoint['loss']
As with saving models, torch.save() is used here. It is flexible enough to save a dictionary, so reading back by dictionary key is just as easy. Note, of course, that not every type can be saved; the four types saved here are (see the sketch after this list):
1. int
2. dict (including OrderedDict, e.g. the state_dicts)
3. torch.Tensor (e.g. the loss)
4. list
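A quick sketch showing such a dictionary round-tripping through torch.save()/torch.load() (the file name and values are illustrative):

ckpt = {
    'epoch': 3,                      # int
    'state': model.state_dict(),     # dict (an OrderedDict of tensors)
    'loss': torch.tensor(0.5),       # torch.Tensor
    'history': [0.90, 0.92, 0.95],   # list
}
torch.save(ckpt, 'ckpt.pth')
restored = torch.load('ckpt.pth')
print(restored['epoch'], restored['history'])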
3.4 Modifying model parameters
PyTorch doesn't provide a dedicated way to modify model parameters; we can reuse the parameter-loading mechanism above to modify them. For a given parameter, just pass load_state_dict() a dictionary containing the key and the new value. If you are not passing in all the parameters, remember to set strict to False. Example below:
model.load_state_dict({'weight': torch.tensor([0.])}, strict=False)  # modify model parameters
The parameter names (i.e. the keys) and the corresponding parameter shapes can be inspected via model.state_dict().
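For example, a short sketch (the printed name and shape shown in the comment are what the model above would produce):

for name, param in model.state_dict().items():
    print(name, param.shape)  # e.g. layer1.1.weight torch.Size([512, 784])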