Details about the preprocessing of PyTorch's MNIST dataset
The model reaches a 99.7% accuracy rate on MNIST.
An implementation of a Convolutional Neural Network (CNN) for MNIST, using techniques such as data augmentation, parameter initialization, batch normalization and dropout (detailed below).
OS: Ubuntu 18.04
Graphics card: GTX 1080 Ti
Python version: 2.7 (3.7)
Network architecture
The CNN has 4 layers with the following architecture (a minimal PyTorch sketch follows the list).
Input layer: 784 nodes (MNIST image size)
First convolutional layer: 5×5 kernels, 32 feature maps
First max-pooling layer
Second convolutional layer: 5×5 kernels, 64 feature maps
Second max-pooling layer
Third (fully connected) layer: 1024 nodes
Output layer: 10 nodes (number of classes in MNIST)
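The following is a minimal PyTorch sketch of this architecture; the class name, padding choice and ReLU activations are assumptions on my part and may differ from the original implementation.

```python
import torch
import torch.nn as nn

class MnistCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=5, padding=2),   # first conv: 5x5, 32 feature maps -> 28x28x32
            nn.ReLU(),
            nn.MaxPool2d(2),                              # first max pool -> 14x14x32
            nn.Conv2d(32, 64, kernel_size=5, padding=2),  # second conv: 5x5, 64 feature maps -> 14x14x64
            nn.ReLU(),
            nn.MaxPool2d(2),                              # second max pool -> 7x7x64
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(7 * 7 * 64, 1024),                  # fully connected layer: 1024 nodes
            nn.ReLU(),
            nn.Linear(1024, num_classes),                 # output layer: 10 classes
        )

    def forward(self, x):                                 # x: (N, 1, 28, 28), i.e. 784 input pixels
        return self.classifier(self.features(x))

model = MnistCNN()
print(model(torch.randn(1, 1, 28, 28)).shape)             # torch.Size([1, 10])
```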
Tools for improving CNN performance
The following techniques are used to improve the performance of the CNN; a short sketch of how they can be combined in PyTorch follows the list.
1. Data augmentation
The amount of training data is increased up to 5-fold by:
Random rotation: each image is rotated randomly within the range [-15°, +15°].
Random shift: each image is shifted randomly along both axes by a value in the range [-2 px, +2 px].
Zero-center normalization: subtract (PIXEL_DEPTH / 2) from each pixel value and divide by PIXEL_DEPTH.
2. Parameter initializers
Weight initializer: Xavier initializer
Bias initializer: constant (zero) initializer
3. Batch normalization
All convolutional/fully connected layers use batch normalization.
4. Dropout
The third (fully connected) layer employs the dropout technique.
5. Exponentially decayed learning rate
The learning rate is decayed after every epoch.
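The sketch below shows one way to express these five techniques in PyTorch. The hyper-parameters (decay factor, dropout rate, optimizer) and the small stand-in model are illustrative assumptions, not the original code.

```python
import torch
import torch.nn as nn
from torchvision import transforms

# 1. Data augmentation + zero-center normalization for the training set
train_transform = transforms.Compose([
    transforms.RandomRotation(15),                                   # rotate within [-15°, +15°]
    transforms.RandomAffine(degrees=0, translate=(2 / 28, 2 / 28)),  # shift both axes by up to 2 px
    transforms.ToTensor(),                                           # scales raw pixels to [0, 1]
    transforms.Normalize(mean=[0.5], std=[1.0]),   # equals (pixel - PIXEL_DEPTH / 2) / PIXEL_DEPTH on [0, 255] values
])

# 2. Parameter initializers: Xavier weights, zero biases
def init_params(module):
    if isinstance(module, (nn.Conv2d, nn.Linear)):
        nn.init.xavier_uniform_(module.weight)
        nn.init.zeros_(module.bias)

# 3./4. Batch normalization and dropout are added as layers inside the model itself;
# a small stand-in model is used here (see the architecture sketch above for the full CNN).
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 1024), nn.BatchNorm1d(1024), nn.ReLU(), nn.Dropout(0.5),
    nn.Linear(1024, 10),
)
model.apply(init_params)

# 5. Exponentially decayed learning rate, stepped once per epoch
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.95)
for epoch in range(10):
    # ... one epoch of training ...
    scheduler.step()
```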
Code section
Step 1: Understanding the MNIST dataset
The MNIST dataset is a handwritten-digit dataset with 60,000 training images and 10,000 test images, all of which are 28×28. Download the dataset at: datasetofficial.com. This dataset consists of four parts, which are:
train-images-idx3-ubyte.gz: training set images (9912422 bytes)
train-labels-idx1-ubyte.gz: training set labels (28881 bytes)
t10k-images-idx3-ubyte.gz: test set images (1648877 bytes)
t10k-labels-idx1-ubyte.gz: test set labels (4542 bytes)
That is, there is one training image set, one training label set, one test image set and one test label set. These are not ordinary text or image files but gzip archives; after downloading and extracting them, you are left with binary (IDX) files, which can be inspected as sketched below.
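If you want to peek at one of the extracted binary files, a minimal sketch follows; it assumes the standard MNIST file name train-images-idx3-ubyte and that the archive has already been unpacked next to the script.

```python
import struct

# The IDX header is four big-endian 32-bit integers: magic number, image count, rows, columns.
with open('train-images-idx3-ubyte', 'rb') as f:
    magic, num_images, rows, cols = struct.unpack('>IIII', f.read(16))
print(magic, num_images, rows, cols)   # expected: 2051 60000 28 28 for the training image file
```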
Step 2: Load the MNIST dataset
Let's start by importing some libraries:

```python
import torch
import torchvision
from torchvision import transforms
from torch.utils.data import Dataset, DataLoader
import matplotlib.pyplot as plt
```
There are many ways to load MNIST datasets:
Method 1: Use the MNIST dataset class built into torchvision.datasets (the officially provided dataset class):

```python
train = torchvision.datasets.MNIST(root='./mnist/', train=True,
                                   transform=transforms.ToTensor())  # add download=True to fetch the files if missing
```

Indexing the dataset returns a tuple (train_data, train_target). (There is also a pitfall in using this class: the transform is only applied when you index the dataset with train[i], as the quick check below shows.)
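A quick check of this pitfall might look as follows; note that the attribute holding the raw tensors depends on the torchvision version (older releases expose train.train_data rather than train.data).

```python
img, target = train[0]        # __getitem__ applies the transform -> FloatTensor of shape (1, 28, 28)
print(type(img), img.shape)

raw = train.data[0]           # raw uint8 tensor stored on the dataset, no transform applied
print(type(raw), raw.shape)   # torch.Size([28, 28])
```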
It is generally used in conjunction with a DataLoader:

```python
dataloader = DataLoader(train, batch_size=50, shuffle=True, num_workers=4)
for step, (x, y) in enumerate(dataloader):
    b_x = x.shape
    b_y = y.shape
    print('Step: ', step, '| dimension of train_data', b_x, '| dimension of train_target', b_y)
```
As the output shows, the 60,000 images are split into 1,200 batches of 50 images each, so the data can be loaded in parallel, which effectively speeds up computation.
This is a matter of personal preference: I do not like being tied to this fixed dataset class, so if you want more flexibility you can write your own dataset class.
Method 2: Set up your own dataset
Wrap the dataset using PyTorch's dataset-related classes, which are located in the torch.utils.data package.
For this experiment, the following classes were used:
Use of the Dataset class: all custom datasets should be subclasses of this class (i.e. inherit from it), and every subclass should override the `__len__()` and `__getitem__()` methods.
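A bare-bones skeleton of that contract, with purely illustrative names, looks like this:

```python
from torch.utils.data import Dataset

class MySet(Dataset):                      # hypothetical minimal example
    def __init__(self, samples):
        self.samples = samples             # any indexable collection

    def __len__(self):
        return len(self.samples)           # number of samples

    def __getitem__(self, index):
        return self.samples[index]         # return one sample
```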
Python packages used

| Python package | Purpose |
|---|---|
| numpy | matrix operations, e.g. transposing images |
| skimage | image processing, image I/O, image transforms |
| matplotlib | displaying images, visualization |
| os | file lookup operations |
| torch | PyTorch core library |
| torchvision | PyTorch datasets and image transforms |
Importing related packages
```python
import numpy as np
from skimage import io
from skimage import transform
import matplotlib.pyplot as plt
import os
import torch
import torchvision
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms
from PIL import Image
```
Step one:
Define a subclass that inherits from the Dataset class and overrides the `__len__()` and `__getitem__()` methods.
The details:
1. Sample representation: each sample of the dataset is a dictionary of the form sample = {'img': img, 'target': target}.
2. Image reading: the processed data is loaded with torch.load, and each image is converted to a PIL Image (mode 'L') inside `__getitem__`.
3. Image transform: an optional transform passed through the transform parameter is applied to every image.
```python
class MY_MNIST(Dataset):
    # training.pt / test.pt are the files found in torchvision's processed MNIST folder
    training_file = 'training.pt'
    test_file = 'test.pt'

    def __init__(self, root, transform=None):
        self.transform = transform
        self.data, self.targets = torch.load(os.path.join(root, self.training_file))

    def __getitem__(self, index):
        img, target = self.data[index], int(self.targets[index])
        img = Image.fromarray(img.numpy(), mode='L')   # uint8 tensor -> PIL image
        if self.transform is not None:
            img = self.transform(img)
        img = transforms.ToTensor()(img)               # PIL image -> FloatTensor in [0, 1]
        sample = {'img': img, 'target': target}
        return sample

    def __len__(self):
        return len(self.data)

train = MY_MNIST(root='./mnist/MNIST/processed/', transform=None)
```
Step two:
Instantiate an object, then read and display the dataset.
```python
for (cnt, i) in enumerate(train):
    image = i['img']
    label = i['target']
    ax = plt.subplot(4, 4, cnt + 1)
    # plt.axis('off')
    ax.imshow(image.squeeze(0))
    ax.set_title(label)
    plt.pause(0.001)
    if cnt == 15:
        break
```
The resulting plot shows that the dataset class we wrote successfully reads the images and their labels!
Step 3 (optional)
Transforming the dataset: collected images generally differ in size, dimensions, brightness, etc., and one purpose of the transforms is to normalize the data. Transforms can also be used for data augmentation.
For more information about transforms in pytorch, see the previous articles in this series.
Since the samples in the dataset are represented as dicts, the transforms in torchvision.transforms cannot be called on a sample directly; in this experiment the transform is therefore applied to the image inside `__getitem__` (a dict-aware wrapper is sketched below as an alternative).
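As an alternative, sketched here purely for illustration (the class name is my own, and it is not what the code below does), a small wrapper can make a torchvision-style transform operate on the 'img' field of a sample dict:

```python
class DictTransform:
    """Apply a torchvision-style transform to the 'img' entry of a sample dict."""
    def __init__(self, transform):
        self.transform = transform

    def __call__(self, sample):
        return {'img': self.transform(sample['img']), 'target': sample['target']}
```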
This experiment applies operations such as rotation, random cropping, and adjustment of the images' color, saturation and brightness.
```python
compose = transforms.Compose([
    transforms.RandomRotation(20),
    # (one further no-argument transform was used here in the original; its name is unclear)
    transforms.RandomCrop(20),
    transforms.ColorJitter(brightness=1, contrast=0.1, hue=0.5),
    # transforms.ToTensor(),
    # transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])
])
train_transformed = MY_MNIST(root='./mnist/MNIST/processed/', transform=compose)

# Display the transformed images
for (cnt, i) in enumerate(train_transformed):
    image = i['img']
    label = i['target']
    ax = plt.subplot(4, 4, cnt + 1)
    # plt.axis('off')
    ax.imshow(image.squeeze(0))
    ax.set_title(label)
    plt.pause(0.001)
    if cnt == 15:
        break
```
Do you notice any difference in the transformed image, compared to before?
Step 4: Wrapping with DataLoader
Why use DataLoader?
① The input to a deep-learning model is given in mini-batches.
② Sample loading often needs the order to be randomized (the shuffle operation).
③ Sample loading benefits from being parallelized across multiple workers.
The DataLoader provided by PyTorch encapsulates all of the above, which makes it much easier to use.
```python
# Use DataLoader to take advantage of multiple workers, batching, shuffling, etc.
trainset_dataloader = DataLoader(dataset=train_transformed, batch_size=4,
                                 shuffle=True, num_workers=4)
```
Visualization:
```python
# wrap the transformed dataset; the output below reflects its 20x20 crops
dataloader = DataLoader(train_transformed, batch_size=50, shuffle=True, num_workers=4)
```
After being wrapped by the DataLoader, the samples are output as mini-batches and their order is randomized.
```python
for step, i in enumerate(dataloader):
    b_x = i['img'].shape
    b_y = i['target'].shape
    print('Step: ', step, '| dimension of train_data', b_x, '| dimension of train_target', b_y)
```
As the output shows, the images have been cropped to 20×20, and thanks to parallel loading the 60,000 samples can be processed in about 3 seconds; the efficiency is very high!
```
Step: 1186 | dimension of train_data (50, 1, 20, 20) | dimension of train_target (50,)
Step: 1187 | dimension of train_data (50, 1, 20, 20) | dimension of train_target (50,)
Step: 1188 | dimension of train_data (50, 1, 20, 20) | dimension of train_target (50,)
Step: 1189 | dimension of train_data (50, 1, 20, 20) | dimension of train_target (50,)
Step: 1190 | dimension of train_data (50, 1, 20, 20) | dimension of train_target (50,)
Step: 1191 | dimension of train_data (50, 1, 20, 20) | dimension of train_target (50,)
Step: 1192 | dimension of train_data (50, 1, 20, 20) | dimension of train_target (50,)
Step: 1193 | dimension of train_data (50, 1, 20, 20) | dimension of train_target (50,)
Step: 1194 | dimension of train_data (50, 1, 20, 20) | dimension of train_target (50,)
Step: 1195 | dimension of train_data (50, 1, 20, 20) | dimension of train_target (50,)
Step: 1196 | dimension of train_data (50, 1, 20, 20) | dimension of train_target (50,)
Step: 1197 | dimension of train_data (50, 1, 20, 20) | dimension of train_target (50,)
Step: 1198 | dimension of train_data (50, 1, 20, 20) | dimension of train_target (50,)
Step: 1199 | dimension of train_data (50, 1, 20, 20) | dimension of train_target (50,)
```
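To check the loading speed for yourself, a minimal timing sketch (my own addition, reusing the dataloader defined above) could look like this:

```python
import time

start = time.time()
for step, i in enumerate(dataloader):
    pass                                   # iterate only, no training
print('one pass over 60,000 samples took %.2f s' % (time.time() - start))
```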
To be continued...
That is everything I have to share in this article on the details of preprocessing PyTorch's MNIST dataset. I hope it can serve as a useful reference, and I hope you will continue to support me.