
Pytorch Training Neural Networks for Deep Learning with Google Colab

Preface

Colab is a cloud computing platform provided by Google, and it is very nice. Recently my own GPU has not been enough, so I decided to take advantage of the free resources. This blog only explains how to use Colab to train with existing deep learning repositories; it will not cover how to access the site or how to register.

This blog only demonstrates the use of Colab, mainly to get you familiar with how it operates. Specific problems need specific analysis: improper operation or version changes can cause some steps to fail. If you hit an error, it is recommended to search for the error message, read the code and the documentation, and track down the cause. Students with some prior experience will find Colab easier to use.

What is Google Colab

Google Colab is a free Jupyter notebook environment provided by Google. It requires no setup or environment configuration, runs entirely in the cloud, and does not affect your local machine.

Google Colab offers researchers a certain amount of free GPU time to write and execute code, all available for free through a browser. Students can easily run deep learning frameworks such as TensorFlow and Pytorch on it.

Although Google Colab offers some free resources, the amount is limited, and all Colab runtimes are reset after a certain period of time. Colab Pro subscribers are still subject to limits, but roughly double those of non-subscribers. Colab Pro+ subscribers also enjoy a higher level of stability.

Related links

Colab's official website: / (access requires a connection to the external network)

ipynb on GitHub: /bubbliiiing/Colab

Training with Colab

In this article, we take the YoloV4-Tiny-Pytorch repository as an example to demonstrate how to train with Colab.

I. Uploading the dataset and pre-trained weights

1. Uploading the dataset

Colab integrates very well with Google's own cloud drive (Google Drive), so we first need to upload the dataset to the drive. The upload process is actually very simple; just prepare the dataset locally first.

Since the repositories I have uploaded all use the VOC dataset format, we need to lay the data out according to the VOC structure.

This article uses the VOC07+12 dataset directly as an example.

JPEGImages stores the image files, Annotations stores the label files, and ImageSets stores the txt files that split the data into training, validation, and test sets, as sketched below.
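For reference, here is a minimal sketch of the expected layout, assuming the usual VOCdevkit/VOC2007 structure (the individual file names are just examples):

VOCdevkit/
└── VOC2007/
    ├── Annotations/          # 000001.xml ... label files
    ├── ImageSets/
    │   └── Main/             # train.txt, val.txt, test.txt, trainval.txt
    └── JPEGImages/           # 000001.jpg ... image files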

The entire VOCdevkit folder is then packaged into a zip. It is important to note that it is not the three folders above that are zipped, but the VOCdevkit folder itself, so that the archive matches the format expected by the data processing code.
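If you are packaging on Linux or macOS, a command along these lines does it (the archive name here is just an example; use whatever name you will reference later when copying and unzipping):

zip -r VOC07+12.zip VOCdevkit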

After getting the packed zip, upload it to Google Drive. I created a new VOC_datasets folder on Google Drive to store the zip.

At this point the upload of the dataset is complete.

2. Uploading the pre-trained weights

Next, create the folders on Google Drive: first create Models, then create yolov4-tiny-pytorch inside Models, then create logs and model_data inside yolov4-tiny-pytorch.

model_data holds the pre-trained weight files.

logs holds the weights generated while the network trains.

Since we are using the YoloV4-Tiny-Pytorch repository this time, we upload its pre-trained weights to the model_data folder.
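The resulting structure on Google Drive then looks roughly like this (the archive and weight file names are just examples):

MyDrive/
├── VOC_datasets/
│   └── VOC07+12.zip
└── Models/
    └── yolov4-tiny-pytorch/
        ├── logs/
        └── model_data/
            └── yolov4_tiny_weights_coco.pth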

II. Open Colab and configure the environment

1. Creating the notebook

In this step, we first open the official website of Colab.

Then click File, then New Notebook, and a Jupyter notebook will be created.

Once it is created, rename the file so it is easier to recognize.

After that, click Runtime, then Change runtime type, and select GPU under Hardware accelerator. Colab will then allocate a machine with a GPU, and at this point the notebook is ready.
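To confirm that a GPU was actually assigned, you can run the following in a code cell; on a GPU runtime it prints the card model and driver information (this check is optional):

!nvidia-smi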

2. Simple configuration of the environment

Colab comes with a PyTorch environment already installed, so there is no need to configure PyTorch specifically, although the pre-installed torch version is fairly new.
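You can check the pre-installed version and GPU availability with a short snippet (just a sanity check, not required by the repository):

import torch
print(torch.__version__)          # version of the pre-installed torch
print(torch.cuda.is_available())  # should print True on a GPU runtime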

Since our dataset is on Google Drive, we also have to mount the drive.

from google.colab import drive
drive.mount('/content/gdrive')

We enter the above code into our notebook and execute it to mount the drive onto the server. Then just click run.

At this point, click the folder-like icon in the left column to open the file browser and see the file layout. gdrive is the Google Drive we just mounted. If it is not there, click refresh on the left.

Open gdrive, which contains our dataset.
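You can also list the drive from a cell to confirm the mount worked; the paths below assume the default MyDrive mount point and the VOC_datasets folder created earlier:

!ls /content/gdrive/MyDrive
!ls /content/gdrive/MyDrive/VOC_datasets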

3. Downloading the deep learning repository

In this step, we need to download the deep learning repository, which we do with the git clone command. After executing the following commands, the yolov4-tiny-pytorch folder will appear in the files on the left. If it is not there, click refresh on the left.

We then move the working directory into the yolov4-tiny-pytorch folder via the cd command.

!git clone https://github.com/bubbliiiing/yolov4-tiny-pytorch.git
%cd yolov4-tiny-pytorch/
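A quick optional check that the clone and the directory change both worked:

%pwd   # should end with /yolov4-tiny-pytorch
!ls    # should list voc_annotation.py, the training script, model_data, VOCdevkit, etc.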

4. Copying and unpacking the dataset

Laying the dataset out directly on Google Drive would mean a large amount of cloud data transfer, which is much slower than local files, so we need to copy the dataset to the local runtime for processing.

We enter the following code to copy and unpack the files. The first command is a delete command that removes the repository's original, empty VOCdevkit folder. Then the archive is copied over and unpacked.

Since the file used here is a zip, we use the unzip command; if your archive is in another format, modify the command accordingly (please look it up). Replace the file name with the name of the zip you uploaded. After executing the following commands, you should find the unpacked VOC dataset in the file list on the left. If not, click refresh on the left.

!rm -rf ./VOCdevkit
!cp /content/gdrive/MyDrive/VOC_datasets/VOC07+12.zip ./
!unzip ./VOC07+12.zip -d ./
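An optional check that the unpacking worked, assuming the usual VOCdevkit/VOC2007 layout:

!ls ./VOCdevkit/VOC2007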

5. Setting the save path

The default save path in the code used in this article is the logs folder, but Colab has a stability problem: the runtime may disconnect after running for a while.

If the weights are saved only in the logs folder of the local root directory, all training done before a disconnection is lost, wasting a lot of time.

You can symlink a Google Drive folder into the root directory; then the weights remain on the drive even if the runtime disconnects.

We already created a logs folder on Google Drive earlier, so we simply link that folder here.

!rm -rf logs
!ln -s /content/gdrive/MyDrive/Models/yolov4-tiny-pytorch/logs logs
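An optional check that the symlink points where we expect:

!ls -ld logs   # should show logs -> /content/gdrive/MyDrive/Models/yolov4-tiny-pytorch/logs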

III. Starting training

1. Processing the label files

Open the voc_annotation.py file. Since we are using the VOC dataset directly, and the training, validation, and test sets have already been split, we set annotation_mode to 2.
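Inside voc_annotation.py the change amounts to a single line (a sketch of the relevant setting; leave the rest of the file untouched):

annotation_mode = 2   # 2: only generate 2007_train.txt and 2007_val.txt from the existing split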

Then run the following command to complete the label processing and generate 2007_train.txt and 2007_val.txt.

!python voc_annotation.py

2. Processing the training file

Processing the training file consists of two main parts:

1. Using the pre-trained weights.

2. Setting the save period; this matters because cloud drive storage is limited, and saving every single epoch will quickly fill it up.

a. Using the pre-trained weights

First modify model_path to point to the weight file we uploaded to Google Drive. In the left file bar, find Models/yolov4-tiny-pytorch/model_data under gdrive and copy the path of the weight file.

Then replace the value of model_path in the training file with this path.
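For example, the edited line might look like the following; the weight file name here is hypothetical, so use the path you actually copied:

model_path = '/content/gdrive/MyDrive/Models/yolov4-tiny-pytorch/model_data/yolov4_tiny_weights_coco.pth'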

b. Setting the save period

Some repositories have already been updated with a save_period parameter that controls how many epochs pass between saves; in that case just modify save_period directly. In this article we set save_period to 4, i.e. save every 4 epochs.

Repositories that have not been updated yet can only save every epoch, so remember to go to Google Drive and delete old weights from time to time.
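For repositories that do expose the parameter, the change in the training file is just one line (sketch):

save_period = 4   # save a weight file every 4 epochs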

3. Start training

At this point, type the following inside the notebook:

!python train.py

Training will then start.

IV. What about disconnections?

1. Anti-disconnection measures

I have heard that you can reduce the frequency of disconnections by simulating clicks automatically.

In Google Colab, press F12, open the browser's console, and paste the following code:

function ConnectButton(){
	console.log("Connect pushed");
	document.querySelector("#top-toolbar > colab-connect-button").shadowRoot.querySelector("#connect").click();
}
setInterval(ConnectButton, 60000);

2. What if it disconnects before training finishes?

There is not much you can do about it; free resources come with their downsides.

Just follow the steps again, but this time set the pre-trained weights to the latest trained weight file inside the logs folder.

In addition, parameters such as Init_Epoch need to be adjusted to match the epoch you are resuming from.
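A sketch of what the resumed settings in the training file might look like; the weight file name and epoch number are hypothetical:

model_path = 'logs/ep036-loss1.234-val_loss2.345.pth'   # latest weights saved to the linked logs folder
Init_Epoch = 36                                          # epoch at which the previous run stopped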

Summary

The most important thing when training with Colab is getting the paths right: knowing where each file is and from which directory each command is executed. Once that is clear, getting the program to run is relatively simple. Colab does, however, have the disconnection problem, so we need to keep our files safe at all times; that is why I save the weights directly to Google Drive, so they are not lost.

The above is a detailed look at using Google Colab to train deep learning neural networks with Pytorch. For more information about training neural networks with Pytorch on Google Colab, please follow my other related articles!