As something of a beginner, it was quite rewarding to get hands-on experience with facial recognition.
Operating environment
- python3.7
- tensorflow 2.2.0
- opencv-python 4.4.0.40
- Keras 2.4.3
- numpy 1.18.5

The specific installation process and environment setup are omitted here; they are easy to find online.
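For reference, assuming pip is used for package management, the pinned versions above can be installed in one line:

```
pip install tensorflow==2.2.0 opencv-python==4.4.0.40 Keras==2.4.3 numpy==1.18.5
```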
Specific goal
Train a convolutional neural network on your own data so that it can successfully recognize you.
The implementation steps are shown below.
1. Face data acquisition and reading
1.1 Data acquisition
This dataset was collected by opening the camera with OpenCV and grabbing frames from it, supplemented with pictures of certain people gathered from external sources; information on 5 people was collected in total. (Faces are not cropped directly from the camera stream here, which makes it easier to process the externally collected pictures the same way. There are 800 pictures of each person, and each person's pictures are placed in the same folder, with filenames beginning with that person's initials.)
The relevant code is listed below:
```python
import cv2
import time

# Only still-image capture is implemented here
path = './images/'

# Face sampling, wrapped in a function; path is where the images are saved
def cy(path):
    # Open the laptop's built-in camera (parameter 0); for other cameras try 1, 2, ...
    cap = cv2.VideoCapture(0)
    # Tag an id for the face that is about to be recorded
    face_id = input('\n User face information entry, enter user name (preferably in English): \n')
    # count is used to count the number of samples
    count = 0
    while True:
        # Read a frame from the camera
        success, img = cap.read()
        count += 1
        # Save the frame to the target folder
        cv2.imwrite(path + str(face_id) + '.' + str(count) + '.jpg', img)
        # Display the frame
        cv2.imshow('image', img)
        # waitKey keeps the window responsive; press Esc (27) to exit early
        k = cv2.waitKey(1)
        if k == 27:
            break
        # Or exit once enough samples are collected; adjust the amount to the
        # actual situation (in practice 800 per person gave satisfactory results)
        elif count >= 500:
            time.sleep(2)
            success, img = cap.read()
            break
    # Turn off the camera and free up resources
    cap.release()
    cv2.destroyAllWindows()

# Call the function to sample faces
cy(path)
```
Next, extract the face region of each person; a face-detection cascade classifier is used here, and the cropped images are saved to a specific folder.
```python
import cv2
import os

# Process the images; the input is not converted to grayscale, to make it
# easier to handle the externally collected images the same way
CASE_PATH = "haarcascade_frontalface_default.xml"
RAW_IMAGE_DIR = 'images/'
DATASET_DIR = 'hh/'
path = 'D:\\pythonlx\\test\\images\\'

# Face classifier
face_cascade = cv2.CascadeClassifier(CASE_PATH)

# Crop the face region out of the image and save it
def save_faces(img, name, x, y, width, height):
    image = img[y:y + height, x:x + width]
    cv2.imwrite(name, image)

# List all files in the raw image folder
image_list = os.listdir(RAW_IMAGE_DIR)
for image_path in range(len(image_list)):
    gh = path + image_list[image_path]
    image = cv2.imread(gh)
    # gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(image,
                                          scaleFactor=1.2,
                                          minNeighbors=5,
                                          minSize=(5, 5))
    for (x, y, width, height) in faces:
        # Extend the crop upwards a little to keep more of the face
        save_faces(image, '%ss%d.jpg' % (DATASET_DIR, image_path + 1),
                   x, y - 30, width, height + 30)
```
1.2 Data reading
The image dataset is converted to a four-dimensional array and normalized, the labels are vectorized using one-hot encoding, and the data is randomly split into an 80% training set and a 20% test set.
```python
import os
import cv2
import numpy as np
from keras.utils import np_utils
from sklearn.model_selection import train_test_split

# Read the pictures
def read_image():
    data_x, data_y = [], []
    image_list = os.listdir('mine/')
    for i in range(len(image_list)):
        try:
            im = cv2.imread('mine/{}'.format(image_list[i]))
            im = resize_without_deformation(im)
            # uint8 keeps pixel values 0-255 intact
            data_x.append(np.asarray(im, dtype=np.uint8))
            # Derive the label from the filename prefix
            a = image_list[i].split('.')[0]
            if a == 's2':
                data_y.append(0)
            elif a == 's4':
                data_y.append(1)
            elif a == 's5':
                data_y.append(2)
            elif a == 's6':
                data_y.append(3)
            elif a == 's7':
                data_y.append(4)
        except IOError as e:
            print(e)
        except:
            print('Unknown Error!')
    return data_x, data_y

# Read all images and labels
raw_images, raw_labels = read_image()
# Samples per label:
# raw_labels.count(0)  # 583
# raw_labels.count(1)  # 621
# raw_labels.count(2)  # 717
# raw_labels.count(3)  # 765
# raw_labels.count(4)  # 698

# Convert to floating-point / integer arrays
raw_images = np.asarray(raw_images, dtype=np.float32)
raw_labels = np.asarray(raw_labels, dtype=np.int32)
# Convert the labels to one-hot encoding
ont_hot_labels = np_utils.to_categorical(raw_labels)
# Split the dataset: 80% training set, 20% test set
train_input, valid_input, train_output, valid_output = train_test_split(
    raw_images, ont_hot_labels, test_size=0.2)
# Normalize the data
train_input /= 255.0
valid_input /= 255.0
```
2. Image pre-processing
The collected picture samples may vary in size, so every picture must be resized to 100 x 100. To prevent deformation, the shorter side of the picture is padded with black so that the picture first matches the aspect ratio of the target size and is only then resized; this preserves the face information of the original image while avoiding distortion. Finally, histogram equalization is applied to the grayscale image to enhance detail and contrast and improve the recognition rate.
```python
import cv2

def resize_without_deformation(image, size=(100, 100)):
    height, width, _ = image.shape
    # Find the longest edge when the edges are of unequal length
    longest_edge = max(height, width)
    # Pad the border with 0 (black)
    top, bottom, left, right = 0, 0, 0, 0
    # Work out how many pixels to add to the short side to make it equal to the long side
    if height < longest_edge:
        height_diff = longest_edge - height
        top = int(height_diff // 2)
        bottom = height_diff - top
    elif width < longest_edge:
        width_diff = longest_edge - width
        left = int(width_diff // 2)
        right = width_diff - left
    # Add a border to the image; cv2.BORDER_CONSTANT fills it with the colour given by value
    image_with_border = cv2.copyMakeBorder(image, top, bottom, left, right,
                                           cv2.BORDER_CONSTANT, value=[0, 0, 0])
    resized_image = cv2.resize(image_with_border, size)
    # Convert the padded image to grayscale
    resize_image = cv2.cvtColor(resized_image, cv2.COLOR_BGR2GRAY)
    # Histogram equalization
    hist = cv2.equalizeHist(resize_image)
    img2 = hist.reshape((100, 100, 1))
    return img2
```
3. Model building and training
The convolutional network model is built according to the role each constituent layer plays in a convolutional neural network, and the parameters are then tuned to optimize the model and improve the training results.
Modeling framework:
Model parameters:
Building Convolutional Networks and Training:
Since the images in the dataset may be too homogeneous and show little variation, data augmentation is added and the model is trained using a generator.
```python
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense
from keras.preprocessing.image import ImageDataGenerator
import matplotlib.pyplot as plt

# Build the convolutional neural network as a Sequential model
model = Sequential()
# Convolutional layer: 32 kernels of size 3x3, stride 1,
# input shape (100, 100, 1) where 1 is the channel count, ReLU activation
model.add(Conv2D(filters=32, kernel_size=(3, 3), padding='valid',
                 strides=(1, 1), input_shape=(100, 100, 1),
                 activation='relu'))                                    # 1
model.add(Conv2D(filters=32, kernel_size=(3, 3), padding='valid',
                 strides=(1, 1), activation='relu'))                    # 2
# Pooling layer
model.add(MaxPooling2D(pool_size=(2, 2)))                               # 3
# Dropout layer
model.add(Dropout(0.25))                                                # 4
# Convolutional layers
model.add(Conv2D(64, (3, 3), padding='valid', strides=(1, 1),
                 activation='relu'))                                    # 5
model.add(Conv2D(64, (3, 3), padding='valid', strides=(1, 1),
                 activation='relu'))                                    # 6
# Pooling layer
model.add(MaxPooling2D(pool_size=(2, 2)))                               # 7
model.add(Dropout(0.25))                                                # 8
# Fully connected layers
model.add(Flatten())                                                    # 9
model.add(Dense(512, activation='relu'))                                # 10
model.add(Dropout(0.25))                                                # 11
# Output layer: one neuron per label class, sigmoid activation
model.add(Dense(len(ont_hot_labels[0]), activation='sigmoid'))          # 12

# Optimize the model: Adam optimizer, cross-entropy loss
# (an SGD optimizer with learning rate 1, decay 1e-6, momentum 0.8 and
# Nesterov momentum was also tried)
model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])
# Print the model parameters
model.summary()

# Define a data generator for augmentation; each call yields one batch
# (sequential generation, a Python generator), which saves memory
datagen = ImageDataGenerator(
    featurewise_center=False,             # set input mean to 0 over the dataset
    samplewise_center=False,              # set each sample's mean to 0
    featurewise_std_normalization=False,  # divide inputs by the dataset std
    samplewise_std_normalization=False,   # divide each sample by its own std
    zca_whitening=False,                  # apply ZCA whitening
    rotation_range=20,                    # random rotation angle (0 to 180)
    width_shift_range=0.2,                # horizontal shift (fraction of width, 0 to 1)
    height_shift_range=0.2,               # same as above, but vertical
    horizontal_flip=True,                 # random horizontal flips
    vertical_flip=False)                  # random vertical flips

# Compute statistics over the whole training set (needed for featurewise
# normalization, ZCA whitening, etc.)
datagen.fit(train_input)
# Train the model with the generator
history = model.fit_generator(
    datagen.flow(train_input, train_output, batch_size=50),
    epochs=10,
    validation_data=(valid_input, valid_output))
# Model validation
print(model.evaluate(valid_input, valid_output, verbose=2))

# Plot the accuracy curves
plt.plot(history.history['accuracy'], label='accuracy')
plt.plot(history.history['val_accuracy'], label='val_accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.ylim([0.5, 1])
plt.legend(loc='lower right')
plt.show()

# Save the model
MODEL_PATH = 'face_model.h5'
model.save(MODEL_PATH)
```
4. Identification and verification
The saved model is reloaded and the camera is turned on for recognition; it recognizes me as well as other people very accurately (the prediction output was changed to "self" versus "not self", since the identities of the other people cannot easily be disclosed). When running prediction, each frame obtained from the camera must be processed and converted to the appropriate format, otherwise the input shape will not match.
The prediction is the confidence for each label, and returning the column index of the maximum value gives the corresponding label, which tells you who it is.
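As a reference, here is a minimal sketch of such a recognition loop. It assumes the model was saved as face_model.h5 as above and reuses resize_without_deformation from section 2; the labels list is hypothetical and must match the order of the labels used during training.

```python
import cv2
import numpy as np
from keras.models import load_model

# Reload the saved model and the face detector
model = load_model('face_model.h5')
face_cascade = cv2.CascadeClassifier('haarcascade_frontalface_default.xml')
# Hypothetical mapping of the 5 trained classes to "me" / "not me";
# which index is "me" depends on your own training labels
labels = ['not me', 'not me', 'me', 'not me', 'not me']

cap = cv2.VideoCapture(0)
while True:
    success, frame = cap.read()
    if not success:
        break
    faces = face_cascade.detectMultiScale(frame, scaleFactor=1.2, minNeighbors=5)
    for (x, y, w, h) in faces:
        face = frame[y:y + h, x:x + w]
        # Same preprocessing as training, plus a batch dimension,
        # otherwise the shape will not match (None, 100, 100, 1)
        face = resize_without_deformation(face)
        face = np.asarray(face, dtype=np.float32).reshape(1, 100, 100, 1) / 255.0
        probabilities = model.predict(face)
        # Column index of the maximum confidence is the predicted label
        name = labels[np.argmax(probabilities)]
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
        cv2.putText(frame, name, (x, y - 10),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 255, 0), 2)
    cv2.imshow('recognition', frame)
    if cv2.waitKey(1) == 27:  # Esc to quit
        break
cap.release()
cv2.destroyAllWindows()
```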
The results are shown below.
- Recognizing yourself:
- Recognizing yourself versus someone who is not you:
5. Summary
In the first training attempt there was no detailed image processing and no data augmentation. Although the model accuracy was very high, recognition through the camera was sometimes inaccurate, so detailed image processing and data augmentation were added later; the prediction performance then met expectations, and the model reached a maximum accuracy of 99.4% with a loss of 0.015.
Regarding the dataset: because it was shot directly with the camera, some angles were not captured completely, or the shots were affected by environmental factors. During face recognition the features can therefore only be matched at the captured angles and will not be recognized from a different angle.
In this example, face localization directly uses the cascade classifier that ships with OpenCV, which can be found in the installed OpenCV folder; we did not write our own detection algorithm.
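As a side note, if copying the XML file around is inconvenient, opencv-python also exposes the bundled cascades through the cv2.data module; a small sketch (the print is just a load check):

```python
import cv2

# Load the bundled frontal-face cascade directly from the opencv-python package
cascade_path = cv2.data.haarcascades + 'haarcascade_frontalface_default.xml'
face_cascade = cv2.CascadeClassifier(cascade_path)
print(face_cascade.empty())  # False means the cascade loaded successfully
```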
The above is based on my personal experience; I hope it can serve as a useful reference.