Implementation approach and detailed explanation
1. Load and preprocess the Fashion MNIST data
(1) This project uses the Fashion MNIST dataset, which contains 70,000 grayscale images of clothing in 10 categories. Each image shows a single item of clothing at a low resolution of 28x28 pixels (in practice, a 28*28 matrix of integers). Some sample images are shown below:
In this project we split the dataset into a training set and a test set, using 60,000 images to train the model and 10,000 images to evaluate how accurately it classifies clothing images.
The dataset is downloaded directly from the web through TensorFlow's built-in interface. (train_images, train_labels) are the images and labels of the training set, and (test_images, test_labels) are the images and labels of the test set.
import tensorflow as tf
fashion = tf.keras.datasets.fashion_mnist
(train_images, train_labels), (test_images, test_labels) = fashion.load_data()
(2) Here we use a list to hold the names of the ten clothing categories in the dataset. The position of each name in the list matches the integer label used in the data.
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat', 'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']
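(3) Define a helper function that displays a given image.
import matplotlib.pyplot as plt
def showImage(image):
    plt.figure()
    plt.imshow(image, cmap=plt.cm.binary)  # grayscale colormap (assumed)
    plt.colorbar()
    plt.grid(False)
    plt.show()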
(4) Here we show the first raw image in the training set. Each image is a low-resolution 28x28 picture (in fact a 28*28 matrix of integers), and each pixel is a number between 0 and 255.
showImage(train_images[0])
The effect is as follows:
Since each pixel of each image is an integer between 0 and 255, we normalize all images to the range 0 to 1 so that model training converges faster.
train_images = train_images / 255.0
test_images = test_images / 255.0
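To make the effect of this division concrete, here is a minimal NumPy sketch. It uses a randomly generated stand-in image (the name fake_image is hypothetical, not part of the dataset), so it runs without downloading anything:

```python
import numpy as np

# Hypothetical stand-in for one Fashion MNIST image: a 28x28 array of
# unsigned 8-bit integers (the real images come from load_data()).
rng = np.random.default_rng(0)
fake_image = rng.integers(0, 256, size=(28, 28), dtype=np.uint8)

# Dividing by the float 255.0 promotes the array to floating point and
# maps the range [0, 255] onto [0.0, 1.0].
scaled = fake_image / 255.0

print(fake_image.dtype, fake_image.min(), fake_image.max())
print(scaled.dtype, scaled.min(), scaled.max())
```

The shape of the array is unchanged; only the value range and the data type differ.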
Here we show the first training image again after normalization. Each pixel is now a decimal between 0 and 1.
showImage(train_images[0])
The effect is as follows:
2. Build a model with TensorFlow 2.1
(1) Each input image is 28*28 pixels, so the first layer is a Flatten layer with input_shape=(28, 28). Its output is a 784-dimensional vector: the layer simply lays the rows of the two-dimensional input end to end to form a one-dimensional array.
(2) The second and third layers are fully connected (Dense) layers, each producing a 64-dimensional vector that is passed through the nonlinear ReLU activation function. The number of layers, the activation function, and the output dimension of each layer can all be adjusted, and these choices affect the final evaluation metrics. A more complex structure can fit the training data better, but it trains more slowly and is more prone to overfitting, so the right balance has to be found by adjusting the structure and watching the output metrics.
(3) The fourth layer is a fully connected layer that outputs a 10-dimensional vector, one score for each of the ten categories. These scores are unnormalized logits rather than probabilities; they are converted into a probability distribution by Softmax in the prediction step.
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(10)
])
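To see what these layers do to the data, the following NumPy sketch pushes a small batch through the same sequence of shapes. The weights are random and untrained, and the helper name dense is our own, so this is only an illustration of the layer arithmetic, not of how Keras implements it:

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(x, w, b, relu=False):
    """Fully connected layer: x @ w + b, optionally followed by ReLU."""
    z = x @ w + b
    return np.maximum(z, 0.0) if relu else z

# Randomly initialised weights purely for illustration (not trained).
w1, b1 = rng.normal(size=(784, 64)) * 0.01, np.zeros(64)
w2, b2 = rng.normal(size=(64, 64)) * 0.01, np.zeros(64)
w3, b3 = rng.normal(size=(64, 10)) * 0.01, np.zeros(10)

batch = rng.random(size=(5, 28, 28))          # 5 fake normalized images
flat = batch.reshape(len(batch), -1)          # Flatten: (5, 28, 28) -> (5, 784)
hidden1 = dense(flat, w1, b1, relu=True)      # (5, 64)
hidden2 = dense(hidden1, w2, b2, relu=True)   # (5, 64)
logits = dense(hidden2, w3, b3)               # (5, 10), unnormalized scores

# Count every weight and bias in the three dense layers.
n_params = sum(a.size for a in (w1, b1, w2, b2, w3, b3))
print(flat.shape, hidden1.shape, logits.shape, n_params)
```

With these layer sizes the network has 784*64+64 + 64*64+64 + 64*10+10 = 55,050 trainable parameters, all of them in the fully connected layers.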
3. Configure and compile the model
(1) Here we choose the Adam optimizer, a mature and widely used optimizer.
(2) For the loss function we choose the common cross-entropy loss SparseCategoricalCrossentropy, which accepts integer class labels directly. Setting from_logits=True tells it that the model outputs raw scores rather than probabilities.
(3) For the evaluation metric we choose the most commonly used one, accuracy.
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
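As a rough illustration of what this loss computes, the sketch below re-derives sparse categorical cross-entropy from raw scores in NumPy. The function name is our own and the code is a simplified sketch of the formula, not the TensorFlow implementation:

```python
import numpy as np

def sparse_categorical_crossentropy_from_logits(logits, labels):
    """Mean cross-entropy for integer labels and unnormalized scores."""
    # Subtract the row maximum before exponentiating for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Pick the log-probability of the true class for each sample.
    picked = log_probs[np.arange(len(labels)), labels]
    return -picked.mean()

# Two samples with identical scores for all 10 classes: the model is
# maximally unsure, so the loss equals ln(10), about 2.3026.
uniform_logits = np.zeros((2, 10))
labels = np.array([3, 7])
uniform_loss = sparse_categorical_crossentropy_from_logits(uniform_logits, labels)

# Raising the score of the true class drives the loss toward 0.
confident_logits = uniform_logits.copy()
confident_logits[np.arange(2), labels] = 20.0
confident_loss = sparse_categorical_crossentropy_from_logits(confident_logits, labels)

print(uniform_loss, confident_loss)
```

The value ln(10) is a useful reference point: a 10-class model whose loss is near 2.3 is doing no better than guessing uniformly.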
4. Train the model
(1) We train the model on the training images and labels with epochs set to 5, meaning the whole training set is passed through the model five times from beginning to end. If the model has not converged, epochs can be set to a larger value. With more epochs and a more complex network structure, the model should eventually be able to exceed 98% accuracy in the training phase; the five-epoch run shown here reaches about 89%.
(2) During training, the command line shows the overall loss and the accuracy metric for each epoch. The output format is produced by TensorFlow's internal functions, and you can also write your own code to change it.
model.fit(train_images, train_labels, epochs=5)
The output of the training process is shown below:
Train on 60000 samples
Epoch 1/5
60000/60000 [==============================] - 5s 78us/sample - loss: 0.5140 - accuracy: 0.8180
Epoch 2/5
60000/60000 [==============================] - 4s 73us/sample - loss: 0.3724 - accuracy: 0.8654
Epoch 3/5
60000/60000 [==============================] - 4s 74us/sample - loss: 0.3388 - accuracy: 0.8763
Epoch 4/5
60000/60000 [==============================] - 4s 70us/sample - loss: 0.3165 - accuracy: 0.8831
Epoch 5/5
60000/60000 [==============================] - 4s 74us/sample - loss: 0.2985 - accuracy: 0.8902
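To clarify what one epoch means, here is a small conceptual sketch of mini-batch iteration. The generator name iterate_minibatches is hypothetical, and batch_size=32 is the Keras default for fit(), which the call above does not override. It is not how Keras is implemented internally:

```python
import numpy as np

def iterate_minibatches(n_samples, batch_size, epochs, seed=0):
    """Yield (epoch, indices) so each epoch covers every sample exactly once."""
    rng = np.random.default_rng(seed)
    for epoch in range(epochs):
        order = rng.permutation(n_samples)  # reshuffle at the start of each epoch
        for start in range(0, n_samples, batch_size):
            yield epoch, order[start:start + batch_size]

# 60,000 training samples as in the article, 5 epochs, batches of 32.
batches = list(iterate_minibatches(n_samples=60000, batch_size=32, epochs=5))
steps_per_epoch = sum(1 for epoch, _ in batches if epoch == 0)
print(steps_per_epoch, len(batches))
```

Under these assumptions each epoch consists of 60000 / 32 = 1875 weight updates, so five epochs perform 9375 updates in total.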
5. Evaluate the model
(1) Here we evaluate the model on the test data, using the accuracy metric specified earlier.
(2) verbose=2 only controls the form of the printed result; you can choose any one of 0, 1, or 2.
loss, acc = model.evaluate(test_images, test_labels, verbose=2)
print('Test accuracy:%f, Test loss:%f' % (acc, loss))
The output is shown below. The trained model reaches an accuracy of 0.8725 on the test set, slightly below the 0.8902 it reached on the training set:
10000/1 - 1s - loss: 0.2279 - accuracy: 0.8725
Test accuracy:0.872500, Test loss:0.352148
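The accuracy reported here is simply the fraction of test images whose highest-scoring class matches the true label. A minimal sketch with made-up scores (the arrays below are hypothetical and unrelated to the trained model):

```python
import numpy as np

def accuracy(scores, labels):
    """Fraction of samples whose highest-scoring class equals the true label."""
    return float(np.mean(np.argmax(scores, axis=1) == labels))

# Hypothetical scores for 4 samples over 3 classes (not from the article's model).
demo_scores = np.array([[2.0, 0.1, 0.3],
                        [0.2, 1.5, 0.1],
                        [0.9, 0.8, 0.7],
                        [0.1, 0.2, 3.0]])
demo_labels = np.array([0, 1, 2, 2])

# Predicted classes are 0, 1, 0, 2, so three of the four are correct.
acc_demo = accuracy(demo_scores, demo_labels)
print(acc_demo)
```

Because argmax depends only on the ordering of the scores, the same accuracy is obtained whether it is computed from logits or from Softmax probabilities.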
6. Use the model for prediction
(1) For each image, the model above outputs 10 unnormalized floating-point scores (logits). To make the output easier to interpret, we append a Softmax layer to the model.
(2) Softmax transforms these 10 scores into 10 probabilities, each between 0 and 1 and together summing to 1. We then take the clothing category with the highest probability as the prediction.
model_by_softmax = tf.keras.Sequential([model, tf.keras.layers.Softmax()])
predictions = model_by_softmax.predict(test_images)
predict_for_one_image = predictions[3]
predict_for_one_image
The output probability distribution is shown below:
array([4.1939876e-07, 9.9996161e-01, 8.4085507e-08, 3.7719459e-05,
3.1557637e-08, 1.0500006e-13, 1.5945717e-07, 1.3569163e-13,
1.8028586e-09, 4.5183642e-11], dtype=float32)
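For reference, the Softmax transformation itself can be sketched in a few lines of NumPy. The input values below are hypothetical logits chosen for illustration, not the actual outputs of the model:

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax along the last axis."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

# Hypothetical logits for one image (illustrative values only).
example_logits = np.array([-3.0, 12.0, -5.0, 2.0, -6.0,
                           -18.0, -4.0, -18.0, -9.0, -12.0])
example_probs = softmax(example_logits)

# The probabilities sum to 1 and keep the same ordering as the logits.
print(example_probs.sum(), example_probs.argmax())
```

Subtracting the maximum before exponentiating does not change the result, but it avoids overflow when the scores are large.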
(3) Looking at the probability distribution of this test image across the 10 clothing categories, we find that the largest value is 9.9996161e-01, at index 1. The image is therefore most likely to belong to the second category.
(4) By looking up the category at the index of the maximum probability, we find that the predicted clothing category for this image is Trouser.
import numpy as np
class_names[np.argmax(predict_for_one_image)]
The output is as follows:
'Trouser'
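The lookup can be double-checked without the model by copying the probabilities printed above into an array. This is only a verification sketch; the variable names probs and best are our own:

```python
import numpy as np

class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

# Probabilities copied from the output shown earlier in this section.
probs = np.array([4.1939876e-07, 9.9996161e-01, 8.4085507e-08, 3.7719459e-05,
                  3.1557637e-08, 1.0500006e-13, 1.5945717e-07, 1.3569163e-13,
                  1.8028586e-09, 4.5183642e-11], dtype=np.float32)

best = int(np.argmax(probs))  # index of the largest probability
print(best, class_names[best], float(probs.sum()))
```

The index of the maximum is 1, which maps to 'Trouser', and the ten values sum to 1 up to float32 rounding.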
(5) We plot the original image and find that it is indeed a pair of pants, indicating that the model prediction is correct.
showImage(test_images[3])
The display effect is as follows: