Three ways to load and preprocess images in TensorFlow 2.4, explained in detail

Preamble

In this article, we introduce three ways to load and preprocess image data using the CPU version of TensorFlow 2.4.

Make sure that TensorFlow is version 2.4 or above and Python is version 3.8 or above, because some of the built-in functions used here are unavailable in older versions. Also install pillow and tensorflow_datasets in advance; both are needed for the data loading and processing steps that follow.

Since this article only describes how to load and process the data and makes no claims about model quality, the model is trained only briefly.

Data preparation

First, we prepare the image data used in this article. Here we use a TensorFlow built-in function to download a flower photo data set from the network. You can also download the archive yourself from the URL in the code below, for example with a download manager such as Xunlei (Thunder).

The data directory contains 5 subdirectories, one per class: daisy, dandelion, roses, sunflowers and tulips. There are 3670 images in total.

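import pathlib
import numpy as np
import os
import PIL
import PIL.Image
import tensorflow as tf
import tensorflow_datasets as tfds

# NOTE: the host part of this URL is missing in the source text.
dataset_url = "//example_images/flower_photos.tgz"
# Download and unpack the archive into the Keras cache directory.
data_dir = tf.keras.utils.get_file(origin=dataset_url, fname='flower_photos', untar=True)
data_dir = pathlib.Path(data_dir)
# Count every .jpg file one level below the class subdirectories.
image_count = len(list(data_dir.glob('*/*.jpg')))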
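The one-subdirectory-per-class layout described above can be inspected with the standard library alone. The sketch below is only illustrative: count_images_per_class is a hypothetical helper, not a TensorFlow function, and it is run on a tiny throwaway directory rather than on the real data set.

```python
import pathlib
import tempfile


def count_images_per_class(root):
    """Map each class subdirectory name to its number of .jpg files."""
    root = pathlib.Path(root)
    return {
        sub.name: len(list(sub.glob("*.jpg")))
        for sub in sorted(root.iterdir())
        if sub.is_dir()
    }


# Build a miniature stand-in for the flower_photos layout.
with tempfile.TemporaryDirectory() as tmp:
    for name, n in [("daisy", 2), ("roses", 3)]:
        class_dir = pathlib.Path(tmp) / name
        class_dir.mkdir()
        for i in range(n):
            (class_dir / f"{i}.jpg").touch()
    demo_counts = count_images_per_class(tmp)

print(demo_counts)
```

Called on the real data_dir, the same helper should list the five flower classes, with counts that add up to image_count.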

Read and process disk data using built-in functions

(1) Use the Keras built-in function image_dataset_from_directory to load the data from local disk. We first set batch_size to 32 and resize every image to (64, 64, 3), that is, 64 pixels in both height and width. You can change the height and width yourself: larger values keep more detail but make all later computation more expensive, while smaller values make the images blurrier but cheaper to process.

Each pixel is a 3-dimensional RGB color vector. The label of each image is the integer index of its flower category; the matching category names are stored in class_names.

We use image_dataset_from_directory to select 80% of the images (2936) for training and the remaining 20% (734) for validating the model.

The 5 image classes are daisy, dandelion, roses, sunflowers and tulips; we save them in class_names.

batch_size = 32
height = 64
width = 64
# Use the same seed in both calls so that the two subsets do not overlap.
train_datas = tf.keras.preprocessing.image_dataset_from_directory(
    data_dir, validation_split=0.2, subset="training", seed=0,
    image_size=(height, width), batch_size=batch_size)
val_datas = tf.keras.preprocessing.image_dataset_from_directory(
    data_dir, validation_split=0.2, subset="validation", seed=0,
    image_size=(height, width), batch_size=batch_size)
class_names = train_datas.class_names  # sorted subdirectory names
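The split sizes quoted above follow from simple arithmetic, and the same numbers explain the 92 steps per epoch that appear in the training log. The following is a small check in plain Python, using only the figures stated in this article (3670 images, a 20% validation split, batch size 32); the variable names are chosen for illustration.

```python
import math

image_count = 3670
validation_split_percent = 20
batch_size = 32

# Integer arithmetic avoids any floating-point rounding surprises.
val_count = image_count * validation_split_percent // 100
train_count = image_count - val_count

# The last batch of each subset may be smaller than batch_size.
train_steps = math.ceil(train_count / batch_size)
val_steps = math.ceil(val_count / batch_size)

print(train_count, val_count, train_steps, val_steps)
```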

(2) In a conventional training loop, we load one batch of data from disk, train the model on it, load the next batch, and repeat. Preparing the data can be very time-consuming, however, so a lot of time is spent before every training step just getting the data ready. During that time the training computation can only wait for data, which wastes computing resources and time.

(3) Once the images have been loaded from disk, cache() keeps them in memory so that later epochs can fetch them quickly. If the data set is too large to fit in memory, cache() can instead be given a file name to create an on-disk cache.

(4) The prefetch() method lets the Dataset prepare a number of samples in the background while the model is training. Each training step can then consume data that is already available, which avoids the wait and improves training efficiency.

AUTOTUNE = tf.data.AUTOTUNE  # let tf.data choose the buffer size at run time
train_datas = train_datas.cache().prefetch(buffer_size=AUTOTUNE)
val_datas = val_datas.cache().prefetch(buffer_size=AUTOTUNE)
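The idea behind prefetch() can be illustrated without TensorFlow. The sketch below is not how tf.data is implemented; it is a minimal stand-in, assuming a background thread and a bounded queue, that shows a producer preparing items ahead of the consumer. The function name and the buffer_size default are chosen for illustration.

```python
import queue
import threading


def prefetch(iterable, buffer_size=2):
    """Yield items from `iterable`, preparing up to `buffer_size` ahead."""
    buffer = queue.Queue(maxsize=buffer_size)
    done = object()  # sentinel marking the end of the stream

    def producer():
        for item in iterable:
            buffer.put(item)  # blocks while the buffer is full
        buffer.put(done)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = buffer.get()  # blocks only if nothing is ready yet
        if item is done:
            return
        yield item


batches = list(prefetch(range(5), buffer_size=2))
print(batches)
```

While the consumer is busy with one item, the producer thread is free to prepare the next ones, which is the overlap that prefetch() provides between data preparation and training.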

(5) The last step is to build, compile and train the model.

The first layer is a Rescaling layer that normalizes the RGB values. Each pixel holds three color values in the range 0-255; after this layer they lie between 0 and 1, which can speed up the convergence of the model.
The second, third and fourth layers are convolution layers with a 3x3 kernel and 32 output channels, each using the relu activation function as its nonlinear transformation and each followed by a max pooling layer.
The fifth layer flattens the three-dimensional convolution output of each photo into a one-dimensional vector.
The sixth layer is a fully connected layer with 128 outputs and a relu activation.
The seventh layer, the output layer, is a fully connected layer with 5 outputs, one unnormalized score (logit) per category. The loss function is created with from_logits=True, so it applies softmax internally to turn these scores into a probability distribution over the 5 categories.
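To see what the layers do to the data, the tensor shape can be traced by hand. The sketch below assumes the Keras defaults for the layers named above: Conv2D with stride 1 and no padding, and MaxPooling2D with a 2x2 window. The helper names conv_out and pool_out are made up for this illustration.

```python
def conv_out(size, kernel=3):
    """Output length of a stride-1 convolution without padding."""
    return size - kernel + 1


def pool_out(size, pool=2):
    """Output length of non-overlapping max pooling without padding."""
    return size // pool


side, channels = 64, 3
shapes = [(side, side, channels)]
for _ in range(3):  # three convolution + max pooling stages
    side, channels = conv_out(side), 32
    shapes.append((side, side, channels))
    side = pool_out(side)
    shapes.append((side, side, channels))

# Size of the vector produced by the flatten layer.
flat_features = side * side * channels
print(shapes)
print(flat_features)
```

Under these assumptions a 64x64x3 image shrinks to 6x6x32 after the third pooling layer, so the flatten layer hands 1152 values to the 128-unit fully connected layer.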

Optimizer: Adam

Loss function: SparseCategoricalCrossentropy

Evaluation metric: accuracy
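The loss used here takes an integer class label and the raw scores (logits) of the output layer. A plain-Python sketch of that computation for a single sample is given below; it mirrors the mathematical definition only and is not the Keras implementation.

```python
import math


def softmax(logits):
    """Convert raw scores (logits) into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sparse_categorical_crossentropy(label, logits):
    """Negative log-probability of the true class, given an integer label."""
    return -math.log(softmax(logits)[label])


# Five equal scores: the model has no preference among the classes.
uniform_loss = sparse_categorical_crossentropy(2, [0.0] * 5)
# A large score on the true class gives a much smaller loss.
confident_loss = sparse_categorical_crossentropy(2, [0.0, 0.0, 5.0, 0.0, 0.0])
print(uniform_loss, confident_loss)
```

A five-class model that guesses uniformly scores ln 5, about 1.61, which gives a sense of scale for the loss values in the training log.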

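model = tf.keras.Sequential([
    tf.keras.layers.experimental.preprocessing.Rescaling(1./255),  # scale RGB to [0, 1]
    tf.keras.layers.Conv2D(32, 3, activation='relu'), tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, activation='relu'), tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, activation='relu'), tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(5),  # one logit per class
])
model.compile(
    optimizer='adam',
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=['accuracy'])
model.fit(train_datas, validation_data=val_datas, epochs=5)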

Output results:

Epoch 1/5
92/92 [==============================] - 10s 101ms/step - loss: 1.5019 - accuracy: 0.3167 - val_loss: 1.1529 - val_accuracy: 0.5177
Epoch 2/5
92/92 [==============================] - 6s 67ms/step - loss: 1.1289 - accuracy: 0.5244 - val_loss: 1.0833 - val_accuracy: 0.5736
...
Epoch 5/5
92/92 [==============================] - 6s 65ms/step - loss: 0.8412 - accuracy: 0.6795 - val_loss: 1.0528 - val_accuracy: 0.6172

Read and process disk data in a customized way

(1) The steps above rely on built-in tools to process the data directly, which is convenient but may not be flexible enough. Here we do the same work by hand so that the data can be processed exactly the way we want.

(2) First read the absolute paths of all flower images under the specified directory on disk. Only the path strings are read at this point. For example, the absolute path of the first image on my computer is

C:\Users\QJFY-VR\.keras\datasets\flower_photos\roses\24781114_bc83aa811e_n.jpg

The paths are then shuffled, and 20% are taken as the validation set and the remaining 80% as the training set.

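datas = tf.data.Dataset.list_files(str(data_dir/'*/*'), shuffle=False)
# Shuffle once and keep that order, so take() and skip() below never overlap.
datas = datas.shuffle(image_count, reshuffle_each_iteration=False)
val_size = int(image_count * 0.2)
train_datas = datas.skip(val_size)
val_datas = datas.take(val_size)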
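The shuffle-then-split step can be mimicked with ordinary lists. In the sketch below, slicing plays the role of take() and skip(), and a seeded random.Random stands in for the one-time shuffle; the file names are placeholders. It illustrates why the order must stay fixed after shuffling (reshuffle_each_iteration=False in the code above): both subsets are cut from the same ordering, so they cannot share an element.

```python
import random

paths = [f"img_{i}.jpg" for i in range(10)]  # placeholder file names

shuffled = paths[:]
random.Random(0).shuffle(shuffled)  # shuffle once, then keep this order

val_size = int(len(shuffled) * 0.2)
val_paths = shuffled[:val_size]    # plays the role of take(val_size)
train_paths = shuffled[val_size:]  # plays the role of skip(val_size)

print(len(train_paths), len(val_paths))
```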

(3) Every element of the training and validation sets is processed to obtain the image content and the corresponding label:

The label is extracted from the absolute path of the image. The path is split into a list on the path separator (\ on Windows), and the second-to-last element is the category name. Comparing it with class_names gives a one-hot vector, and the position of its single true entry is the integer label.

The image content is obtained by reading the file at the absolute path, decoding it, and resizing the result to the specified height and width in pixels.

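def get_label(file_path):
    # The parent directory name is the class name.
    parts = tf.strings.split(file_path, os.path.sep)
    # One-hot comparison against class_names, reduced to an integer index.
    return tf.argmax(parts[-2] == class_names)

def decode_img(img):
    # Decode the JPEG bytes into an RGB tensor and resize it.
    return tf.image.resize(tf.io.decode_jpeg(img, channels=3), [height, width])

def process_path(file_abs_path):
    label = get_label(file_abs_path)
    img = decode_img(tf.io.read_file(file_abs_path))
    return img, label

train_datas = train_datas.map(process_path, num_parallel_calls=AUTOTUNE)
val_datas = val_datas.map(process_path, num_parallel_calls=AUTOTUNE)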
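The label extraction can be tried out on a single path in plain Python. The sketch below uses pathlib instead of TensorFlow string ops, and label_from_path is a hypothetical name; the class list is the one given earlier in this article. Because SparseCategoricalCrossentropy expects an integer label, the sketch also reduces the one-hot vector to an index.

```python
import pathlib

class_names = ["daisy", "dandelion", "roses", "sunflowers", "tulips"]


def label_from_path(file_path):
    """Return (one_hot, index) for the class directory that holds the file."""
    class_dir = pathlib.PurePath(file_path).parts[-2]  # parent directory name
    one_hot = [name == class_dir for name in class_names]
    return one_hot, one_hot.index(True)  # position of the single True entry


one_hot, index = label_from_path("flower_photos/roses/24781114_bc83aa811e_n.jpg")
print(one_hot, index)
```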

(4) The resulting training and validation sets are kept in memory with cache(), and the data about to be used is loaded ahead of time with prefetch(). The data is shuffled with shuffle(), and batch() returns batch_size samples at a time.

(5) Train for 5 epochs on the training data and evaluate the metrics on the validation set. The model has already been trained on this data in the previous section, so val_accuracy is high from the very first epoch. This validation split is drawn independently of the earlier one, so some of its images were most likely seen during the earlier training, which makes the validation accuracy here optimistic.

def configure_for_performance(ds):
    ds = ds.cache().prefetch(buffer_size=AUTOTUNE)
    ds = ds.shuffle(buffer_size=1000).batch(batch_size)
    return ds

train_datas = configure_for_performance(train_datas)
val_datas = configure_for_performance(val_datas)
# Continue training the model built in the previous section.
model.fit(train_datas, validation_data=val_datas, epochs=5)

Output results:

Epoch 1/5
92/92 [==============================] - 11s 118ms/step - loss: 0.1068 - accuracy: 0.9680 - val_loss: 0.1332 - val_accuracy: 0.9537
Epoch 2/5
92/92 [==============================] - 10s 113ms/step - loss: 0.0893 - accuracy: 0.9721 - val_loss: 0.0996 - val_accuracy: 0.9673
...
Epoch 5/5
92/92 [==============================] - 10s 112ms/step - loss: 0.0328 - accuracy: 0.9939 - val_loss: 0.1553 - val_accuracy: 0.9550
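The shuffle() and batch() steps used in configure_for_performance can be pictured with two small generators. This is a simplified plain-Python sketch of buffer-based shuffling (fill a fixed-size buffer, emit a random element, replace it with the next incoming one) followed by batching; the function names and the seed are illustrative assumptions, not the tf.data internals.

```python
import random


def shuffle_buffer(items, buffer_size, seed=0):
    """Yield items in a locally shuffled order using a fixed-size buffer."""
    rng = random.Random(seed)
    buffer = []
    for item in items:
        if len(buffer) < buffer_size:
            buffer.append(item)  # still filling the buffer
        else:
            i = rng.randrange(buffer_size)
            yield buffer[i]      # emit a random buffered element
            buffer[i] = item     # and replace it with the new one
    rng.shuffle(buffer)          # drain what is left at the end
    yield from buffer


def batch(items, batch_size):
    """Group items into lists of batch_size; the last one may be shorter."""
    current = []
    for item in items:
        current.append(item)
        if len(current) == batch_size:
            yield current
            current = []
    if current:
        yield current


batches = list(batch(shuffle_buffer(range(10), buffer_size=4), batch_size=3))
print(batches)
```

With a buffer smaller than the data set, an element can only move a limited distance from its original position, which is why the paths were fully shuffled once before the split.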

Downloading data from the web

Both approaches above read data from local disk. We can also obtain data from the network and process it. tfds provides many kinds of ready-made data sets, including audio, text, image, video and translation data, and its built-in function downloads the specified data set from the network. Here we download tf_flowers, which is actually the same flower data that we read from disk above.

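(train_datas, val_datas, test_datas), metadata = tfds.load(
    'tf_flowers', split=['train[:70%]', 'train[70%:90%]', 'train[90%:]'], with_info=True, as_supervised=True)
def resize_img(img, label):  # tf_flowers images differ in size; batch() needs one shape
    return tf.image.resize(img, [height, width]), label
train_datas = configure_for_performance(train_datas.map(resize_img, num_parallel_calls=AUTOTUNE))
val_datas = configure_for_performance(val_datas.map(resize_img, num_parallel_calls=AUTOTUNE))
test_datas = configure_for_performance(test_datas.map(resize_img, num_parallel_calls=AUTOTUNE))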
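The percentage slices in the split argument translate into index ranges over the train split. The sketch below works them out in plain Python, assuming the split holds the same 3670 images as the directory used earlier; percent_slices is a hypothetical helper, and tfds applies its own rounding rules, which do not matter here because both cut points fall on whole numbers.

```python
def percent_slices(total, boundaries):
    """Turn percentage cut points into (start, stop) index ranges."""
    cuts = [0] + [total * p // 100 for p in boundaries] + [total]
    return list(zip(cuts[:-1], cuts[1:]))


ranges = percent_slices(3670, [70, 90])
sizes = [stop - start for start, stop in ranges]
print(ranges, sizes)
```

Under that assumption the three subsets hold 2569, 734 and 367 images for training, validation and testing respectively.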

Once the data is loaded, you can process it in whatever way you choose; the steps are very similar to the two approaches above.

That covers the three ways to load and preprocess images with TensorFlow 2.4 in detail.