
How to train on GPUs with PyTorch

1. Transferring the network model to CUDA

net = AlexNet()
net = net.cuda()  # move the model to CUDA first
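On recent PyTorch versions this step is usually written with an explicit device object. A minimal sketch, assuming a single-GPU machine; torchvision's alexnet stands in for the post's own AlexNet class:

import torch
from torchvision.models import alexnet

# choose the GPU if one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# the post uses its own AlexNet class; torchvision's alexnet is used here
# only to keep the sketch self-contained (10 classes for CIFAR-10)
net = alexnet(num_classes=10)
net = net.to(device)   # equivalent to net.cuda() when a GPU is present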

2. Transferring the loss to CUDA

criterion = nn.CrossEntropyLoss()  # a classification loss, e.g. cross-entropy
criterion = criterion.cuda()       # move the loss to CUDA

It is actually okay to skip this step, because the loss is computed from out and label:

loss = criterion(out, label)

As long as out and label are on CUDA, the loss is naturally on CUDA as well. Surprisingly, though, I found the accuracy was about 1% lower when the loss was not moved to CUDA.
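One way to see this is to check where the loss tensor ends up. A small sketch, assuming a cross-entropy classification loss and stand-in out / label tensors that are already on CUDA:

import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()                  # deliberately left on the CPU
out = torch.randn(4, 10, device="cuda")            # stand-in for the network output (N, 10)
label = torch.randint(0, 10, (4,), device="cuda")  # stand-in for the batch labels

loss = criterion(out, label)
print(loss.is_cuda)   # True: the loss follows its inputs onto the GPU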

3. Transferring datasets to CUDA

Here is a short explanation of how the dataset is used:

# download the dataset
train_set = CIFAR10("./data_cifar10", train=True, transform=data_tf, download=True)
train_data = DataLoader(train_set, batch_size=64, shuffle=True)
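The transform data_tf is not shown in the post; the sketch below is one common way it might be defined for CIFAR-10 (an assumption, not the original code):

from torch.utils.data import DataLoader
from torchvision import transforms
from torchvision.datasets import CIFAR10

# a typical CIFAR-10 pipeline: convert PIL images to tensors, then normalize
data_tf = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])

train_set = CIFAR10("./data_cifar10", train=True, transform=data_tf, download=True)
train_data = DataLoader(train_set, batch_size=64, shuffle=True)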

The dataset is a large multidimensional array of all inputs and labels.

The DataLoader samples from this array to build batches, which are then used to train the network:

    for im, label in train_data:
        i = i + 1
        im = im.cuda()           # migrate the data to CUDA
        im = Variable(im)        # wrap the data in a Variable
        label = label.cuda()
        label = Variable(label)
        out = net(im)            # the output should have the size (N, 10)

When iterating over the batches, the first thing to do is to move the fetched image and label to CUDA, so that all subsequent computation happens on CUDA.

At first I only wrapped the data in a Variable without moving it to CUDA, so the tensors were not on CUDA during forward propagation and the code kept raising errors.
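Putting the three steps together, a minimal end-to-end training step in current PyTorch style might look like the sketch below. It is not the original code: Variable is no longer needed on modern versions, torchvision's alexnet replaces the post's own AlexNet, and the optimizer and learning rate are assumptions.

import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import transforms
from torchvision.datasets import CIFAR10
from torchvision.models import alexnet

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# resize CIFAR-10 images so torchvision's AlexNet accepts them
data_tf = transforms.Compose([transforms.Resize(224), transforms.ToTensor()])
train_set = CIFAR10("./data_cifar10", train=True, transform=data_tf, download=True)
train_data = DataLoader(train_set, batch_size=64, shuffle=True)

net = alexnet(num_classes=10).to(device)                 # step 1: model on the GPU
criterion = nn.CrossEntropyLoss().to(device)             # step 2: loss on the GPU
optimizer = torch.optim.SGD(net.parameters(), lr=0.01)   # assumed optimizer and learning rate

net.train()
for im, label in train_data:
    im, label = im.to(device), label.to(device)   # step 3: batch on the GPU
    out = net(im)                                 # logits of size (N, 10)
    loss = criterion(out, label)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()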

Specifying the GPU card when training the network

See which GPUs are available:

nvidia-smi

Watch the GPU info live, refreshing every 1 second:

watch -n 1 nvidia-smi

Specify the GPUs to use:

import os
# use GPU cards 0 and 3
os.environ["CUDA_VISIBLE_DEVICES"] = "0,3"

The above is based on my personal experience. I hope it can serve as a useful reference, and I hope you will continue to support me.