[Artificial Intelligence Project] MNIST Handwriting Recognition Experiment and Analysis
1. Brief description of the experiment
1.1 Experimental environment
The hardware and software environment used in this experiment is as follows: MNIST was trained and tested on the Windows operating system using Keras, a deep learning framework running on top of TensorFlow.
Keras is a Python library designed for simple neural network assembly. It ships with a large number of pre-packaged layer and network types, including 2D and 3D convolutional layers, long short-term memory (LSTM) networks, and a wide range of general-purpose layers. Building networks with Keras is straightforward: its API is organized around layers, so assembling a network is intuitive. Keras was therefore chosen for this work for its user-friendliness, modularity, and scalability.
1.2 Introduction to the MNIST dataset
MNIST is a very famous dataset for handwritten digit recognition. It consists of images of handwritten digits and their corresponding labels.
The MNIST dataset is divided into a training set of 60,000 images and a test set of 10,000 images. Each image is a 28x28 pixel grayscale matrix representing one of the digits 0-9. The dataset comes as four files:
- train-images-idx3-ubyte.gz: training set images (9912422 bytes)
- train-labels-idx1-ubyte.gz: training set labels (28881 bytes)
- t10k-images-idx3-ubyte.gz: test set images (1648877 bytes)
- t10k-labels-idx1-ubyte.gz: test set labels (4542 bytes)
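For reference, the dataset does not have to be downloaded by hand: the Keras built-in loader fetches these files automatically. A minimal sketch:

```python
# Load MNIST directly through Keras; the files above are downloaded on first use
from keras.datasets import mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()
print(x_train.shape)  # (60000, 28, 28) -- 60,000 training images
print(x_test.shape)   # (10000, 28, 28) -- 10,000 test images
```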
1.3 Data pre-processing
In the data preprocessing stage we normalize the images: pixel values are scaled to the range 0 to 1 before being fed to the neural network. To do this, the image data is converted from integers to floating-point numbers and then divided by 255, which makes training easier. The training set and the test set must be preprocessed in the same way, as in the sketch below.
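A minimal normalization sketch (array names follow the loader sketch above; for the MLP the images are additionally flattened to 784-dimensional vectors, and for the CNN reshaped to 28x28x1):

```python
# Convert integers to floats and scale [0, 255] -> [0.0, 1.0];
# the same transformation is applied to the training and test sets
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0

# CNN input: add the single grayscale channel expected by Conv2D
x_train_cnn = x_train.reshape(-1, 28, 28, 1)
x_test_cnn = x_test.reshape(-1, 28, 28, 1)

# MLP input: flatten each image to a 784-dimensional vector
x_train_mlp = x_train.reshape(-1, 784)
x_test_mlp = x_test.reshape(-1, 784)
```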
Afterwards, one-hot encoding is applied to the labels. One-hot encoding extends the values of a discrete feature into Euclidean space, so that each value of the feature corresponds to a point in that space. Since many machine learning algorithms compute distances or similarities between features in Euclidean space, one-hot encoding makes these distance computations more reasonable for discrete features.
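A minimal sketch of the label encoding, using the standard Keras utility:

```python
from keras.utils import to_categorical

# Each label 0-9 becomes a 10-dimensional one-hot vector, e.g. 3 -> [0,0,0,1,0,0,0,0,0,0]
y_train = to_categorical(y_train, num_classes=10)
y_test = to_categorical(y_test, num_classes=10)
```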
2. Experimental core code
(1) MLP (Multilayer Perceptron)
```python
# Build the MLP
from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(units=256, input_dim=784, kernel_initializer='normal', activation='relu'))
model.add(Dense(units=128, kernel_initializer='normal', activation='relu'))
model.add(Dense(units=64, kernel_initializer='normal', activation='relu'))
model.add(Dense(units=10, kernel_initializer='normal', activation='softmax'))
model.summary()  # print the layer structure and parameter counts
```
(2) CNN (Convolutional Neural Network)
```python
# Build LeNet-5
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential()
model.add(Conv2D(filters=6, kernel_size=(5, 5), padding='valid',
                 input_shape=(28, 28, 1), activation='relu'))  # C1
model.add(MaxPooling2D(pool_size=(2, 2)))  # S2
model.add(Conv2D(filters=16, kernel_size=(5, 5), padding='valid', activation='relu'))  # C3
model.add(MaxPooling2D(pool_size=(2, 2)))  # S4
model.add(Flatten())
model.add(Dense(120, activation='tanh'))  # C5
model.add(Dense(84, activation='tanh'))  # F6
model.add(Dense(10, activation='softmax'))  # output
model.summary()  # print the layer structure and parameter counts
```
Model interpretation
During model training, we use the LeNet-5 convolutional neural network structure.
Layer 1, convolutional layer
The input to this layer is the original image pixels. The classic LeNet-5 model accepts an input of size 32x32x1 (the 28x28 MNIST images are padded to 32x32; the Keras implementation above instead takes 28x28x1 directly, a slight difference from the description that follows). The first convolutional layer uses a 5x5 filter with a depth (number of convolution kernels) of 6, no all-zero padding, and a stride of 1. Because no all-zero padding is used, the output of this layer has size 32-5+1=28 and depth 6. This layer has 5x5x1x6+6=156 trainable parameters, of which 6 are bias terms. Since the node matrix of the next layer has 28x28x6=4704 nodes (neurons), and each node is connected to a 5x5 region of the current layer, the total number of connections in this convolutional layer is 28x28x6x(5x5+1)=122,304.
Layer 2, pooling layer
The input to this layer is the output of the first layer, a 28x28x6 node matrix (4704 nodes). This layer uses a 2x2 filter with a stride of 2 in both length and width, so the output matrix of this layer has size 14x14x6. The filter used here differs slightly from the one in the original LeNet-5 model, which we do not cover in detail here.
Layer 3, convolutional layer
The input matrix of this layer has size 14x14x6, and the filter used is 5x5 with a depth of 16. This layer does not use all-zero padding and has a stride of 1, so its output matrix has size 10x10x16. Treated as a standard convolutional layer, this layer has 5x5x6x16+16=2416 trainable parameters and 10x10x16x(5x5+1)=41,600 connections.
Layer 4, pooling layer
The input matrix size for this layer is 10x10x16 and the filter size used is 2x2 with a step size of 2. The output matrix size for this layer is 5x5x16.
Layer 5, fully connected layer
The input matrix of this layer has size 5x5x16. If the nodes in this matrix are flattened into a vector, it is identical to the input of a fully connected layer. This layer has 120 output nodes, for a total of 5x5x16x120+120=48,120 parameters.
Layer 6, fully connected layer
The number of input nodes in this layer is 120 and the number of output nodes is 84 with a total of 120x84+84=10164 parameters.
Layer 7, fully connected layer
The structure of the last output layer in the original LeNet-5 model differs from that of a plain fully connected layer, but here we approximate it with one. This layer has 84 input nodes and 10 output nodes, for a total of 84x10+10=850 parameters.
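As a sanity check, a minimal sketch that reproduces the parameter-count arithmetic above (these counts describe the classic 32x32-input LeNet-5; with the 28x28 input used in the Keras code, the dense-layer counts differ):

```python
# Per-layer trainable parameter counts of the classic LeNet-5, as derived above
c1 = 5 * 5 * 1 * 6 + 6       # 156   (C1: six 5x5 kernels plus 6 biases)
c3 = 5 * 5 * 6 * 16 + 16     # 2416  (C3 treated as a standard convolution)
c5 = 5 * 5 * 16 * 120 + 120  # 48120 (C5 fully connected layer)
f6 = 120 * 84 + 84           # 10164 (F6)
out = 84 * 10 + 10           # 850   (output layer)
print(c1, c3, c5, f6, out)   # pooling layers add no trainable parameters
```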
Modeling process
After setting the initial parameters, training begins; each run requires fine-tuning the parameters to obtain better results. After many attempts, the parameters were finally set as follows (see the training sketch after this list):
- Optimizer: Adam
- Number of training rounds: 10
- Batch size (amount of data per input): 500
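A minimal training sketch with these settings (x_train and y_train stand for the preprocessed arrays from Section 1.3; the validation split is an assumption):

```python
# Compile and train LeNet-5 with the parameters listed above
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
history = model.fit(x_train, y_train,
                    validation_split=0.2,  # assumption: hold out part of the data for validation
                    epochs=10,             # number of training rounds
                    batch_size=500)        # amount of data per input
```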
The LeNet-5 convolutional neural network was trained on the MNIST dataset with the model parameters described above for 10 rounds, achieving 95% accuracy on the training set.
3. Results analysis and summary
3.1 Model testing and analysis of results
To verify the robustness of the model, the model that performs best on the validation set under the optimal parameters above is saved, and a final test is performed on the test set, yielding a final accuracy of 95.13%.
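A minimal sketch of this save-best-then-evaluate workflow using the standard Keras ModelCheckpoint callback (the file name is illustrative, and the monitored metric is called 'val_acc' in older Keras versions):

```python
from keras.callbacks import ModelCheckpoint
from keras.models import load_model

# Keep only the weights that achieve the best accuracy on the validation set
checkpoint = ModelCheckpoint('best_model.h5', monitor='val_accuracy',
                             save_best_only=True, verbose=1)
model.fit(x_train, y_train, validation_split=0.2,
          epochs=10, batch_size=500, callbacks=[checkpoint])

# Final evaluation on the held-out test set
best = load_model('best_model.h5')
test_loss, test_acc = best.evaluate(x_test, y_test)
print('Test accuracy:', test_acc)
```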
To better analyze the results, a confusion matrix is used here to evaluate model performance. Before evaluating the model, a few metrics need to be introduced.
- TP (True Positive): a positive instance correctly predicted as positive
- FN (False Negative): a positive instance incorrectly predicted as negative
- FP (False Positive): a negative instance incorrectly predicted as positive
- TN (True Negative): a negative instance correctly predicted as negative
A confusion matrix is an analysis table used in machine learning to summarize the predictions of a classification model: the records in the data set are aggregated, in matrix form, according to two criteria, their true category and the category predicted by the model. The rows of the matrix represent the true values and the columns represent the predicted values, as in the example below.
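A minimal sketch computing the confusion matrix for the 10-class digit model (assumes scikit-learn is available):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

y_pred = np.argmax(model.predict(x_test), axis=1)  # predicted digit per test image
y_true = np.argmax(y_test, axis=1)                 # undo the one-hot encoding
cm = confusion_matrix(y_true, y_pred)              # rows: true digit, columns: predicted digit
print(cm)
```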
3.2 Comparison of results
The CNN is then compared with the four-layer fully connected model (the MLP built in Section 2).
The results are as follows:
In conclusion, after continuous parameter tuning, a model with a classification accuracy of about 95% was finally trained, and the experiments show that the model is highly robust.
3.3 Model predictions
Predictions are made for a single image:
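A minimal single-image sketch (the index 0 is illustrative; the CNN expects both a batch dimension and a channel dimension):

```python
import numpy as np

img = x_test[0].reshape(1, 28, 28, 1)  # one test image, shaped as a batch of size 1
probs = model.predict(img)             # softmax probabilities over the 10 digits
print('Predicted digit:', np.argmax(probs))
```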
4. Summary
By analyzing the research process of convolutional neural networks, this paper presents a complete convolutional neural network pipeline for MNIST handwriting recognition and raises the classification accuracy on this dataset to the 95% level. Secondly, the model constructed here is broadly applicable: with minor modifications it can be applied to feature extraction and classification on different datasets. Thirdly, computational resources and time cost were taken into account when constructing the model, and the convolutional neural network built in this paper can be trained on an ordinary personal laptop. Overall, this work has practical applicability and generalizability, and thus high practical value.