Empty the gradients before each batch; otherwise, the gradients of different batches will accumulate, resulting in incorrect model parameters.
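As a small illustration (a toy example, not part of the original code), calling backward() twice without clearing the gradients in between sums them:

import torch

# Toy example: gradients accumulate across backward() calls
# unless they are cleared in between.
w = torch.tensor([1.0], requires_grad=True)

loss1 = (w * 2).sum()
loss1.backward()
print(w.grad)          # tensor([2.])

loss2 = (w * 3).sum()
loss2.backward()
print(w.grad)          # tensor([5.]) -- 2 + 3, accumulated

w.grad.zero_()         # what optimizer.zero_grad() does for each parameter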
We then move both the input and target tensors to the desired device and set the gradients of the model to zero. We call model(inputs) to compute the output of the model and use a loss function (in this case cross-entropy) to compute the error between the output and the target. We then call loss.backward() on this error to compute the gradients and finally call optimizer.step() to update the parameters of the model.
During training, we also calculate the accuracy and average loss. We return these values and use them to track training progress.
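The train function itself is not repeated in this section. For reference, here is a minimal sketch of what such a function might look like, matching the signature train(model, optimizer, criterion, train_loader, device) used in the training code below; the exact body may differ from the original.

def train(model, optimizer, criterion, train_loader, device):
    # Sketch of one training epoch; details may differ from the original.
    model.train()
    train_loss = 0
    correct = 0
    total = 0
    for batch_idx, (inputs, targets) in enumerate(train_loader):
        inputs, targets = inputs.to(device), targets.to(device)
        optimizer.zero_grad()                 # empty the gradients for this batch
        outputs = model(inputs)               # forward pass
        loss = criterion(outputs, targets)    # cross-entropy error
        loss.backward()                       # compute gradients
        optimizer.step()                      # update parameters
        train_loss += loss.item()
        _, predicted = outputs.max(1)
        total += targets.size(0)
        correct += predicted.eq(targets).sum().item()
    acc = 100 * correct / total
    avg_loss = train_loss / len(train_loader)
    return acc, avg_loss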
Evaluating the model
We also need a test function for evaluating the performance of the model on the test dataset.
The following is the code for this function:
def test(model, criterion, test_loader, device):
    model.eval()
    test_loss = 0
    correct = 0
    total = 0
    with torch.no_grad():
        for batch_idx, (inputs, targets) in enumerate(test_loader):
            inputs, targets = inputs.to(device), targets.to(device)
            outputs = model(inputs)
            loss = criterion(outputs, targets)
            test_loss += loss.item()
            _, predicted = outputs.max(1)
            total += targets.size(0)
            correct += predicted.eq(targets).sum().item()
    acc = 100 * correct / total
    avg_loss = test_loss / len(test_loader)
    return acc, avg_loss
In the test function, we define a with torch.no_grad() block. This is because we want to speed up execution of the model and save memory by not computing gradients when performing forward passes on the test set.
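As a small illustration (a toy example, not from the original article), tensors produced inside a torch.no_grad() block do not track gradients, so no computation graph is stored:

import torch

x = torch.randn(1, 3, requires_grad=True)

y = x * 2
print(y.requires_grad)        # True -- the graph is recorded

with torch.no_grad():
    z = x * 2
print(z.requires_grad)        # False -- no graph, less memory, faster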
The inputs and targets are also moved to the desired device. We compute the output of the model and use a loss function (in this case cross-entropy) to compute the error between the output and the target. We evaluate the performance of the model by accumulating the losses and then calculating the accuracy and average loss.
Training the ResNet50 model
Next, we need to train the ResNet50 model. We pass the data loaders to the training loop, along with other parameters such as the number of epochs and the learning rate.
Here is the full training code:
num_epochs = 10
learning_rate = 0.001

train_loader = DataLoader(train_set, batch_size=64, shuffle=True, num_workers=2)
test_loader = DataLoader(test_set, batch_size=64, shuffle=False, num_workers=2)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = ResNet(num_classes=1000).to(device)
criterion = nn.CrossEntropyLoss()
# The optimizer is not named in the original text; Adam is assumed here (SGD would also work).
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

for epoch in range(1, num_epochs + 1):
    train_acc, train_loss = train(model, optimizer, criterion, train_loader, device)
    test_acc, test_loss = test(model, criterion, test_loader, device)
    print(f"Epoch {epoch} Train Accuracy: {train_acc:.2f}% Train Loss: {train_loss:.5f} "
          f"Test Accuracy: {test_acc:.2f}% Test Loss: {test_loss:.5f}")
    # Save the model
    if epoch == num_epochs or epoch % 5 == 0:
        torch.save(model.state_dict(), f"resnet-epoch-{epoch}.ckpt")
In the code above, we first define num_epochs and learning_rate. We use two data loaders, one for the training set and the other for the test set. Then we move the model to the desired device and define the loss function and optimizer.
In each epoch, we train the model and compute the accuracy and average loss on the training and test datasets. These values are then printed out, and the model parameters are saved every five epochs and at the final epoch.
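If you later want to continue from one of the saved checkpoints (the file name follows the pattern used in the code above), you can reload the parameters with load_state_dict; a brief sketch:

# Reload a saved checkpoint (e.g. after 10 epochs) for evaluation or further training.
model = ResNet(num_classes=1000).to(device)
model.load_state_dict(torch.load("resnet-epoch-10.ckpt", map_location=device))
model.eval()  # switch to evaluation mode before running the test function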
You can try training the ResNet50 model on your own image data and further improve its accuracy by tuning the learning rate, training for more epochs, and so on. You can also adapt the ResNet architecture and compare performance, for example by using deeper networks such as ResNet101 and ResNet152.
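For example, if you do not want to extend the custom ResNet class yourself, torchvision ships ready-made deeper variants; a brief sketch (assuming torchvision is installed) that could replace the model line in the training code above:

import torchvision

# Deeper ResNet variants from torchvision; either can stand in for
# ResNet(num_classes=1000) in the training code above.
model101 = torchvision.models.resnet101(num_classes=1000).to(device)
model152 = torchvision.models.resnet152(num_classes=1000).to(device)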
This concludes the detailed explanation of implementing the ResNet network with PyTorch. For more information about PyTorch and the ResNet network, please see my other related articles!