
Details about the preprocessing of Pytorch's MNIST dataset


The CNN reaches 99.7% accuracy on MNIST.

Implementation of a Convolutional Neural Network (CNN) for MNIST with various techniques such as data augmentation, parameter initialization, batch normalization, dropout, and learning-rate decay.

OS: Ubuntu 18.04

Graphics card: GTX 1080 Ti

Python version: 2.7 (3.7 also works)

Network architecture

A CNN with 4 layers has the following architecture (a PyTorch sketch follows the list).

Input layer: 784 nodes (MNIST image size)

First convolutional layer: 5x5x32

First max-pooling layer

Second convolutional layer: 5x5x64

Second max-pooling layer

Third fully connected layer: 1024 nodes

Output layer: 10 nodes (number of classes in MNIST)
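For concreteness, here is a minimal PyTorch sketch of this architecture. The padding, activations, and layer names are my own assumptions, since the article does not give the model code itself.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MnistCNN(nn.Module):
    def __init__(self):
        super(MnistCNN, self).__init__()
        self.conv1 = nn.Conv2d(1, 32, kernel_size=5, padding=2)   # first conv layer: 5x5x32
        self.conv2 = nn.Conv2d(32, 64, kernel_size=5, padding=2)  # second conv layer: 5x5x64
        self.fc1 = nn.Linear(64 * 7 * 7, 1024)                    # third, fully connected: 1024 nodes
        self.fc2 = nn.Linear(1024, 10)                            # output layer: 10 classes

    def forward(self, x):                             # x: (N, 1, 28, 28), i.e. 784 input nodes
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)    # first max-pooling: 28x28 -> 14x14
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)    # second max-pooling: 14x14 -> 7x7
        x = x.view(x.size(0), -1)
        x = F.relu(self.fc1(x))
        return self.fc2(x)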

Tools for improving CNN performance

The following techniques are used to improve the performance of the CNN.

1. Data augmentation

The amount of training data is increased up to five-fold by the following operations (a torchvision sketch follows the list):

Random rotation: each image is rotated randomly within the range [-15°, +15°].

Random shift: each image is shifted randomly on both axes by a value in the range [-2 px, +2 px].

Zero-center normalization: subtract (PIXEL_DEPTH / 2) from each pixel value and divide by PIXEL_DEPTH.
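These three operations can be expressed with torchvision transforms roughly as below; expressing the ±2 px shift as a fraction of the 28-pixel image and taking PIXEL_DEPTH = 255 are my assumptions.

from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomRotation(15),                                   # rotate within [-15, +15] degrees
    transforms.RandomAffine(degrees=0, translate=(2 / 28.0, 2 / 28.0)),  # shift up to +/-2 px per axis
    transforms.ToTensor(),                                           # scales raw pixels to [0, 1]
    # zero-center: (pixel - PIXEL_DEPTH/2) / PIXEL_DEPTH on raw values,
    # which equals (x - 0.5) / 1.0 after ToTensor
    transforms.Normalize((0.5,), (1.0,)),
])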

2. Parameter initializers

Weight initializer: Xavier initializer

Bias initializer: constant (zero) initializer

3. Batch normalization

All convolutional/fully-connected layers use batch normalization.

4. Dropout

The third fully-connected layer employs the dropout technique.

5. Exponentially decayed learning rate

The learning rate is decayed after every epoch (a combined sketch of items 2-5 follows).
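Items 2-5 can be combined in PyTorch roughly as follows. The optimizer, dropout probability, decay factor gamma, and epoch count are my assumptions; MnistCNN refers to the architecture sketch above.

import torch
import torch.nn as nn

def init_weights(m):
    # item 2: Xavier weights, constant-zero biases
    if isinstance(m, (nn.Conv2d, nn.Linear)):
        nn.init.xavier_uniform_(m.weight)
        nn.init.constant_(m.bias, 0.0)

model = MnistCNN()
model.apply(init_weights)

# Items 3-4 would live inside the model definition, e.g.
# nn.BatchNorm2d(32) after each conv layer and nn.Dropout(0.5) after fc1.

# Item 5: exponentially decayed learning rate, stepped once per epoch
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.95)
for epoch in range(10):
    # ... train for one epoch over the DataLoader ...
    scheduler.step()     # decay the learning rate after each epoch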

Code section

Step 1: Understanding the MNIST dataset

The MNIST dataset is a handwritten-digit dataset with 60,000 training images and 10,000 test images, all 28×28 pixels. It can be downloaded from the official page: http://yann.lecun.com/exdb/mnist/. The dataset consists of four parts:

train-images-idx3-ubyte.gz: training set images (9912422 bytes)
train-labels-idx1-ubyte.gz: training set labels (28881 bytes)
t10k-images-idx3-ubyte.gz: test set images (1648877 bytes)
t10k-labels-idx1-ubyte.gz: test set labels (4542 bytes)

That is: one set of training images, one set of training labels, one set of test images, and one set of test labels. Note that these are not ordinary text or image files but gzip archives; after downloading and extracting them, you get binary files in IDX format.
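If you are curious what such a binary file contains, the big-endian IDX header can be parsed by hand. A minimal sketch, assuming the gzipped training-image file listed above:

import gzip
import struct
import numpy as np

def read_idx_images(path):
    with gzip.open(path, 'rb') as f:
        # IDX3 header: magic number, image count, rows, cols (big-endian uint32)
        magic, num, rows, cols = struct.unpack('>IIII', f.read(16))
        assert magic == 2051, 'not an IDX3 image file'
        data = np.frombuffer(f.read(), dtype=np.uint8)
    return data.reshape(num, rows, cols)

images = read_idx_images('train-images-idx3-ubyte.gz')
print(images.shape)   # (60000, 28, 28)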

Step 2: Load the MNIST dataset

Let's start by importing the required libraries:

import torch
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import Dataset, DataLoader
import matplotlib.pyplot as plt

There are many ways to load the MNIST dataset:

Method 1: Under PyTorch, you can directly call the built-in MNIST dataset (this is the officially written dataset class):

train = torchvision.datasets.MNIST(root='./mnist/', train=True, transform=transforms.ToTensor())

Indexing the dataset returns a tuple (train_data, train_target). (There is also a pitfall in using this class: you must index with train[i] for the transform function to be applied.)
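A quick way to see this pitfall in action; the printed shape assumes the transforms.ToTensor() transform from above:

img, target = train[0]        # indexing train[i] triggers the transform
print(type(img), img.shape, target)
# e.g. <class 'torch.Tensor'> torch.Size([1, 28, 28]) 5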

It is generally used together with a DataLoader:

dataloader = DataLoader(train, batch_size=50, shuffle=True, num_workers=4)
for step, (x, y) in enumerate(dataloader):
    b_x = x.shape
    b_y = y.shape
    print('Step: ', step, '| dimension of train_data', b_x, '| dimension of train_target', b_y)

With this setup, the 60,000 images are divided into 1,200 mini-batches of 50 images each, and loading the data in parallel effectively speeds up computation.

It is a matter of personal preference, but I don't like this fixed dataset class; if you want more flexibility, you can write your own dataset class.

Method 2: Set up your own dataset

The API wraps the dataset using PyTorch's related classes, which live in the torch.utils.data package.

For this experiment, the following classes were used:

Use of the Dataset class: all custom datasets should be subclasses of this class (i.e. should inherit from it), and all subclasses should override the __len__() and __getitem__() methods.

Python packages used

python package    goal
numpy             matrix operations, e.g. transposing images
skimage           image processing, image I/O, image transformations
matplotlib        displaying images, visualization
os                file-lookup operations
torch             PyTorch core framework
torchvision       PyTorch vision utilities (datasets, transforms)

Importing related packages

import numpy as np
from skimage import io
from skimage import transform
import matplotlib.pyplot as plt
import os
import torch
import torchvision
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms
from PIL import Image

Step one:

Define a subclass that inherits from the Dataset class and overrides the __len__() and __getitem__() methods.

The details:

1. A sample of the dataset is represented as a dictionary of the form sample = {'img': img, 'target': target}.

2. Image reading: the processed MNIST file is loaded with torch.load, and each raw image is converted to a PIL Image before use.

3. Image transform: supplied through the transform parameter.

class MY_MNIST(Dataset):
    training_file = 'training.pt'   # file names inside MNIST/processed/
    test_file = 'test.pt'

    def __init__(self, root, transform=None):
        self.transform = transform
        self.data, self.targets = torch.load(os.path.join(root, self.training_file))

    def __getitem__(self, index):
        img, target = self.data[index], int(self.targets[index])
        # convert the raw tensor to a PIL image so transforms can be applied
        img = Image.fromarray(img.numpy(), mode='L')

        if self.transform is not None:
            img = self.transform(img)
        img = transforms.ToTensor()(img)

        sample = {'img': img, 'target': target}
        return sample

    def __len__(self):
        return len(self.data)

train = MY_MNIST(root='./mnist/MNIST/processed/', transform=None)

Step two:

Instantiate an object, then read and display the dataset:

for (cnt, i) in enumerate(train):
    image = i['img']
    label = i['target']
    ax = plt.subplot(4, 4, cnt + 1)
    # plt.axis('off')
    plt.imshow(image.squeeze(0))
    ax.set_title(label)
    plt.pause(0.001)
    if cnt == 15:
        break

The output is a 4×4 grid of digit images with their labels, which shows that our own dataset class reads the images successfully!

Step 3 (optional)

Transforming the dataset: collected images generally differ in size, dimensions, brightness, and so on, and the purpose of the transformation is to normalize the data. On the other hand, transformations can also be used for data augmentation.

For more information about transforms in PyTorch, see the previous articles in this series.

Since the samples in our dataset are represented as dictionaries, the methods in torchvision.transforms cannot be called on a sample directly; the transform is applied to the image inside __getitem__ instead.

This experiment performs operations such as rotation, random cropping, and adjusting the brightness, contrast and hue of the images.

compose = transforms.Compose([
    transforms.RandomRotation(20),
    transforms.RandomHorizontalFlip(),   # assumption: the original argument-less transform was lost in extraction
    transforms.RandomCrop(20),
    transforms.ColorJitter(brightness=1, contrast=0.1, hue=0.5),
    # transforms.ToTensor(),
    # transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])
])
train_transformed = MY_MNIST(root='./mnist/MNIST/processed/', transform=compose)

# Display the transformed images
for (cnt, i) in enumerate(train_transformed):
    image = i['img']
    label = i['target']
    ax = plt.subplot(4, 4, cnt + 1)
    # plt.axis('off')
    plt.imshow(image.squeeze(0))
    ax.set_title(label)
    plt.pause(0.001)
    if cnt == 15:
        break

Do you notice any difference in the transformed image, compared to before?

Step 4: Wrapping with DataLoader

Why use DataLoader?

① The input to a deep learning model is given in mini-batches.

② Sample loading often needs to be shuffled into a random order.

③ Sample loading benefits from multiple parallel worker processes.

The DataLoader provided by PyTorch encapsulates all of the above functionality, which makes it much easier to use.

# Use DataLoader to take advantage of multiprocessing, batching, shuffling, etc.
trainset_dataloader = DataLoader(dataset=train_transformed,
                                 batch_size=4,
                                 shuffle=True,
                                 num_workers=4)

Visualization:

dataloader = DataLoader(train_transformed, batch_size=50, shuffle=True, num_workers=4)

After being wrapped by the DataLoader, the samples are output as mini-batches in randomized order.

for step, i in enumerate(dataloader):
    b_x = i['img'].shape
    b_y = i['target'].shape
    print('Step: ', step, '| dimension of train_data', b_x, '| dimension of train_target', b_y)

As the log below shows, the images have been cropped to 20×20, and with parallel loading all 60,000 images are processed in about 3 seconds; the efficiency is very high!

Step:  1186 | dimension of train_data (50, 1, 20, 20) | dimension of train_target (50,)
Step:  1187 | dimension of train_data (50, 1, 20, 20) | dimension of train_target (50,)
Step:  1188 | dimension of train_data (50, 1, 20, 20) | dimension of train_target (50,)
Step:  1189 | dimension of train_data (50, 1, 20, 20) | dimension of train_target (50,)
Step:  1190 | dimension of train_data (50, 1, 20, 20) | dimension of train_target (50,)
Step:  1191 | dimension of train_data (50, 1, 20, 20) | dimension of train_target (50,)
Step:  1192 | dimension of train_data (50, 1, 20, 20) | dimension of train_target (50,)
Step:  1193 | dimension of train_data (50, 1, 20, 20) | dimension of train_target (50,)
Step:  1194 | dimension of train_data (50, 1, 20, 20) | dimension of train_target (50,)
Step:  1195 | dimension of train_data (50, 1, 20, 20) | dimension of train_target (50,)
Step:  1196 | dimension of train_data (50, 1, 20, 20) | dimension of train_target (50,)
Step:  1197 | dimension of train_data (50, 1, 20, 20) | dimension of train_target (50,)
Step:  1198 | dimension of train_data (50, 1, 20, 20) | dimension of train_target (50,)
Step:  1199 | dimension of train_data (50, 1, 20, 20) | dimension of train_target (50,)

To be continued...

This concludes the details of preprocessing PyTorch's MNIST dataset. I hope it serves as a useful reference, and I appreciate your continued support.