
Python neural network learning: data augmentation and preprocessing examples in detail

Preface

For training, it is sometimes possible to feed the original images directly to the network (as with the familiar MNIST handwritten digits), but most real images do not share the same width and height, so simply resizing them can easily cause problems.

Besides the resizing problem, there are also times when there is simply not enough data; in that case we need data augmentation.

This post documents some of the data preprocessing and augmentation methods I have collected recently.

Handling images of different widths and heights

Many classification and object detection algorithms require input images of a fixed size, such as 224x224 or 416x416.

If you resize directly, the image will be distorted.

Instead, we can use the following code to resize without distortion by padding the empty areas (letterboxing).

from PIL import Image

def letterbox_image(image, size):
    # Resize the image without distortion and pad the remaining area with gray
    iw, ih = image.size
    w, h = size
    scale = min(w/iw, h/ih)
    nw = int(iw*scale)
    nh = int(ih*scale)
    image = image.resize((nw, nh), Image.BICUBIC)
    new_image = Image.new('RGB', (w, h), (128, 128, 128))
    new_image.paste(image, ((w-nw)//2, (h-nh)//2))
    return new_image

img = Image.open("2007_000039.jpg")
new_image = letterbox_image(img, [416, 416])
new_image.show()

The resulting image looks like this:
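Note that the letterboxed result is still a PIL image. Before it goes into a network, it usually has to be converted to a normalized array; the following is only a minimal sketch, and the scaling to [0, 1] and the batch dimension are assumptions that depend on your model:

import numpy as np

arr = np.array(new_image, dtype=np.float32) / 255.0  # (416, 416, 3), values in [0, 1]
arr = np.expand_dims(arr, 0)                          # add a batch dimension -> (1, 416, 416, 3)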

Data augmentation

1. Data augmentation within the dataset

The idea here is to augment the dataset by directly generating additional images. The main function used for this is Keras's ImageDataGenerator:

ImageDataGenerator(featurewise_center=False,  
                    samplewise_center=False, 
                    featurewise_std_normalization=False, 
                    samplewise_std_normalization=False, 
                    zca_whitening=False, 
                    zca_epsilon=1e-06, 
                    rotation_range=0, 
                    width_shift_range=0.0, 
                    height_shift_range=0.0, 
                    brightness_range=None, 
                    shear_range=0.0, 
                    zoom_range=0.0, 
                    channel_shift_range=0.0, 
                    fill_mode='nearest', 
                    cval=0.0, 
                    horizontal_flip=False, 
                    vertical_flip=False, 
                    rescale=None, 
                    preprocessing_function=None, 
                    data_format=None, 
                    validation_split=0.0, 
                    dtype=None)

The settings I commonly use are as follows:

datagen = ImageDataGenerator(
        rotation_range=10,
        width_shift_range=0.1,
        height_shift_range=0.1,
        shear_range=0.2,
        zoom_range=0.1,
        horizontal_flip=False,
        brightness_range=[0.1, 2],
        fill_mode='nearest')

The meaning of each parameter is:

1. rotation_range: range of random rotations, in degrees

2. width_shift_range: range of random horizontal shifts

3. height_shift_range: range of random vertical shifts

4. shear_range: float, intensity of the shear transformation (shear angle in degrees)

5. zoom_range: range of random zoom

6. horizontal_flip: whether to randomly flip images horizontally

7. brightness_range: random brightness adjustment; given as a list of two floats, the brightness factor is sampled between those two values

8. fill_mode: one of 'constant', 'nearest', 'reflect' or 'wrap'; points that fall outside the image boundaries after a transform are filled according to this mode

In practice, the following code can be used to generate augmented images:

from keras.preprocessing.image import ImageDataGenerator, array_to_img, img_to_array, load_img
import os

datagen = ImageDataGenerator(
        rotation_range=10,
        width_shift_range=0.1,
        height_shift_range=0.1,
        shear_range=0.2,
        zoom_range=0.1,
        horizontal_flip=False,
        brightness_range=[0.1, 2],
        fill_mode='nearest')

trains = os.listdir("./train/")
for index, train in enumerate(trains):
    img = load_img("./train/" + train)
    x = img_to_array(img)
    x = x.reshape((1,) + x.shape)  # add a batch dimension
    i = 0
    # flow() yields augmented images endlessly, so stop after 20 per input image
    for batch in datagen.flow(x, batch_size=1,
                              save_to_dir='./train_out', save_prefix=str(index), save_format='jpg'):
        i += 1
        if i > 20:
            break

The generated results look like this:

2. Data augmentation when reading images

ImageDataGenerator is a very convenient augmentation tool, but if you do not want to generate and store a large number of images in advance, you can also augment each image on the fly as it is read.
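If Keras is already part of the pipeline, one option is to call the generator's random_transform method on each image as it is loaded, so nothing has to be written to disk. This is only a minimal sketch: `datagen` is assumed to be the generator configured above, and the file path is a placeholder.

from keras.preprocessing.image import load_img, img_to_array, array_to_img

img = load_img("./train/example.jpg")      # placeholder path
x = img_to_array(img)                      # 3D array (height, width, channels)
x = datagen.random_transform(x)            # apply one random augmentation
img_aug = array_to_img(x)                  # back to a PIL image if needed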

In this post, however, we use the ImageEnhance module in PIL, which provides:

1. Brightness enhancement: ImageEnhance.Brightness(image)

2. Color enhancement: ImageEnhance.Color(image)

3. Contrast enhancement: ImageEnhance.Contrast(image)

4. Sharpness enhancement: ImageEnhance.Sharpness(image)

In the following code, different augmentations can be switched on or off by changing the parameters of the Enhance function.

import os
import numpy as np
from PIL import Image
from PIL import ImageEnhance

def Enhance_Brightness(image):
    # Brightness: a factor of 0.0 gives a black image, 1.0 keeps the original image
    enh_bri = ImageEnhance.Brightness(image)
    brightness = np.random.uniform(0.6, 1.6)
    image_brightened = enh_bri.enhance(brightness)
    return image_brightened

def Enhance_Color(image):
    # Color (saturation): a factor of 1.0 keeps the original image
    enh_col = ImageEnhance.Color(image)
    color = np.random.uniform(0.4, 2.6)
    image_colored = enh_col.enhance(color)
    return image_colored

def Enhance_contrasted(image):
    # Contrast: a factor of 1.0 keeps the original image
    enh_con = ImageEnhance.Contrast(image)
    contrast = np.random.uniform(0.6, 1.6)
    image_contrasted = enh_con.enhance(contrast)
    return image_contrasted

def Enhance_sharped(image):
    # Sharpness: a factor of 1.0 keeps the original image
    enh_sha = ImageEnhance.Sharpness(image)
    sharpness = np.random.uniform(0.4, 4)
    image_sharped = enh_sha.enhance(sharpness)
    return image_sharped

def Add_pepper_salt(image):
    # Add salt-and-pepper noise at 500 to 1000 random pixel positions
    img = np.array(image)
    rows, cols, _ = img.shape
    random_int = np.random.randint(500, 1000)
    for _ in range(random_int):
        x = np.random.randint(0, rows)
        y = np.random.randint(0, cols)
        if np.random.randint(0, 2):
            img[x, y, :] = 255  # salt (white)
        else:
            img[x, y, :] = 0    # pepper (black)
    img = Image.fromarray(img)
    return img

def Enhance(image_path, change_bri=1, change_color=1, change_contras=1, change_sha=1, add_noise=1):
    # Read the image and apply the selected augmentations
    image = Image.open(image_path)
    if change_bri == 1:
        image = Enhance_Brightness(image)
    if change_color == 1:
        image = Enhance_Color(image)
    if change_contras == 1:
        image = Enhance_contrasted(image)
    if change_sha == 1:
        image = Enhance_sharped(image)
    if add_noise == 1:
        image = Add_pepper_salt(image)
    image.save("enhanced.jpg")  # output filename used here only as an example

Enhance("2007_000039.jpg")

Original image:

The effect is as follows:

3. Data augmentation for object detection

When augmenting data for object detection, it is not enough to transform the image alone; the positions of the bounding boxes must also be corrected to match the transformed image.

In other words, the boxes have to move together with the image.
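Concretely, when the original iw x ih image is rescaled to nw x nh and pasted at an offset (dx, dy), every box coordinate has to be mapped with the same scale and offset (and mirrored if the image is flipped). A tiny worked example with made-up numbers:

# Made-up numbers for illustration only
iw, ih = 500, 375                  # original image size
nw, nh = 400, 300                  # size after random rescaling
dx, dy = 8, 58                     # offset at which the rescaled image is pasted
x_min, y_min = 100, 80             # one corner of an original box

new_x_min = x_min * nw / iw + dx   # 100 * 400/500 + 8 = 88
new_y_min = y_min * nh / ih + dy   # 80 * 300/375 + 58 = 122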

Original image:

After augmentation:

from PIL import Image, ImageDraw
import numpy as np
from matplotlib.colors import rgb_to_hsv, hsv_to_rgb

def rand(a=0, b=1):
    return np.random.rand()*(b-a) + a

def get_random_data(annotation_line, input_shape, random=True, max_boxes=20, jitter=.3, hue=.1, sat=1.5, val=1.5, proc_img=True):
    '''random preprocessing for real-time data augmentation'''
    # each annotation line is an image path followed by space-separated boxes
    # of the form x_min,y_min,x_max,y_max,class_id
    line = annotation_line.split()
    image = Image.open(line[0])
    iw, ih = image.size
    h, w = input_shape
    box = np.array([np.array(list(map(int, box.split(',')))) for box in line[1:]])

    # resize image with a random aspect-ratio jitter and scale
    new_ar = w/h * rand(1-jitter,1+jitter)/rand(1-jitter,1+jitter)
    scale = rand(.7, 1.3)
    if new_ar < 1:
        nh = int(scale*h)
        nw = int(nh*new_ar)
    else:
        nw = int(scale*w)
        nh = int(nw/new_ar)
    image = image.resize((nw,nh), Image.BICUBIC)

    # place the resized image at a random offset on a gray canvas
    dx = int(rand(0, w-nw))
    dy = int(rand(0, h-nh))
    new_image = Image.new('RGB', (w,h), (128,128,128))
    new_image.paste(image, (dx, dy))
    image = new_image

    # flip image or not
    flip = rand()<.5
    if flip: image = image.transpose(Image.FLIP_LEFT_RIGHT)

    # distort image in HSV space (hue, saturation, value)
    hue = rand(-hue, hue)
    sat = rand(1, sat) if rand()<.5 else 1/rand(1, sat)
    val = rand(1, val) if rand()<.5 else 1/rand(1, val)
    x = rgb_to_hsv(np.array(image)/255.)
    x[..., 0] += hue
    x[..., 0][x[..., 0]>1] -= 1
    x[..., 0][x[..., 0]<0] += 1
    x[..., 1] *= sat
    x[..., 2] *= val
    x[x>1] = 1
    x[x<0] = 0
    image_data = hsv_to_rgb(x) # numpy array, 0 to 1

    # correct boxes so that they follow the transformed image
    box_data = np.zeros((max_boxes,5))
    if len(box)>0:
        np.random.shuffle(box)
        box[:, [0,2]] = box[:, [0,2]]*nw/iw + dx
        box[:, [1,3]] = box[:, [1,3]]*nh/ih + dy
        if flip: box[:, [0,2]] = w - box[:, [2,0]]
        box[:, 0:2][box[:, 0:2]<0] = 0
        box[:, 2][box[:, 2]>w] = w
        box[:, 3][box[:, 3]>h] = h
        box_w = box[:, 2] - box[:, 0]
        box_h = box[:, 3] - box[:, 1]
        box = box[np.logical_and(box_w>1, box_h>1)] # discard invalid boxes
        if len(box)>max_boxes: box = box[:max_boxes]
        box_data[:len(box)] = box
    return image_data, box_data

if __name__ == "__main__":
    line = r"F:\Collection\yolo_Collection\keras-yolo3-master\VOCdevkit/VOC2007/JPEGImages/ 738,279,815,414,0"
    image_data, box_data = get_random_data(line, [416,416])
    left, top, right, bottom = box_data[0][0:4]
    img = Image.fromarray((image_data*255).astype(np.uint8))
    draw = ImageDraw.Draw(img)
    draw.rectangle([left, top, right, bottom])
    img.show()

The above is the detailed content of these Python neural network data augmentation and preprocessing examples. For more on data augmentation and preprocessing for neural networks in Python, please see my other related articles!