
Python Image Processing Geometric Transformations

I. Geometric transformations of images

A geometric transformation does not change an image's pixel values; it relocates pixels on the image plane. Proper geometric transformations can minimize the negative effects of geometric distortions caused by imaging angles, perspective relationships and even the lens itself. Geometric transformations are often used as a preprocessing step in image processing applications and are one of the core tasks of image normalization [1].

A geometric transformation involves two operations:

Spatial transformation: operations such as translation, scaling, rotation and orthographic projection, which define the pixel mapping relationship between the output image and the input image.

Gray-scale interpolation: under the transformation relation, pixels of the output image may map to non-integer coordinates of the input image, so their gray values must be interpolated [2].

Image geometric transformation establishes a mapping between the pixels of the original image and the pixels of the transformed image, through which the coordinate position of a pixel on one side can be calculated from the other side. Mapping input-image coordinates to the output is usually called forward mapping; conversely, mapping output-image coordinates back to the input is called backward mapping. Backward mapping is more common in practice because it avoids the incomplete and overlapping mappings that forward mapping can produce.

Figure 6-1 shows an example of image enlargement with forward mapping: only the four coordinates (0,0), (0,2), (2,0) and (2,2) in the right-hand image find corresponding pixels in the original image under the mapping relationship, while the remaining 12 coordinates receive no valid value [3].
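
To see how backward mapping avoids these holes, here is a minimal NumPy sketch (illustrative only, not from the original article) that enlarges an image 2x by iterating over output coordinates and sampling backward into the input with nearest-neighbor rounding, so every output pixel receives a valid value:

# A minimal backward-mapping sketch (illustrative): enlarge an
# image 2x with nearest-neighbor sampling. Every output pixel maps
# back to a valid input pixel, so no "holes" appear.
import numpy as np

def enlarge_2x(img):
    rows, cols = img.shape[:2]
    out = np.zeros((rows * 2, cols * 2) + img.shape[2:], dtype=img.dtype)
    for y1 in range(rows * 2):
        for x1 in range(cols * 2):
            # Backward mapping: output (x1, y1) -> input (x1/2, y1/2),
            # rounded to the nearest integer coordinate and clamped
            x0 = min(int(round(x1 / 2)), cols - 1)
            y0 = min(int(round(y1 / 2)), rows - 1)
            out[y1, x1] = img[y0, x0]
    return out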

For digital images, pixel coordinates are discrete non-negative integers, but the transformation process can produce floating-point coordinate values, which are invalid in image processing. Interpolation algorithms are used to solve this problem. The common algorithms are listed below (a minimal bilinear sketch follows the list):

  • nearest neighbor interpolation
  • bilinear interpolation
  • bicubic interpolation
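
As a concrete example of the second algorithm, the following is a minimal bilinear-interpolation sketch (for illustration only; OpenCV's built-in implementation is optimized and handles borders differently). It samples a single-channel image at a floating-point coordinate by blending the four surrounding pixels:

# A minimal bilinear-interpolation sketch (illustrative only):
# sample a single-channel image at a floating-point coordinate.
import numpy as np

def bilinear_sample(img, x, y):
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1 = min(x0 + 1, img.shape[1] - 1)
    y1 = min(y0 + 1, img.shape[0] - 1)
    dx, dy = x - x0, y - y0
    # Blend the four neighbors, weighted by distance
    top = (1 - dx) * img[y0, x0] + dx * img[y0, x1]
    bottom = (1 - dx) * img[y1, x0] + dx * img[y1, x1]
    return (1 - dy) * top + dy * bottom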

Image transformations are based on matrix operations, through which correspondences can be found quickly. In this article we will introduce common image geometric transformations, including image translation, image scaling, image rotation, image mirroring, image affine transformation, image perspective transformation and so on.
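
For example (a quick NumPy illustration, independent of any particular library API), a 3x3 transformation matrix in homogeneous coordinates maps an input coordinate to an output coordinate with a single matrix multiplication:

# Mapping a pixel coordinate with a 3x3 homogeneous transformation
# matrix: here a translation by (dx, dy) = (100, 50).
import numpy as np

T = np.array([[1, 0, 100],
              [0, 1, 50],
              [0, 0, 1]], dtype=np.float32)
p0 = np.array([20, 30, 1], dtype=np.float32)  # (x0, y0) = (20, 30)
p1 = T @ p0
print(p1[:2])  # [120.  80.] -> (x1, y1)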

II. Image translation

Image translation (panning) moves all the pixels of an image horizontally or vertically by a given offset. Suppose the position of an original pixel is (x0, y0); after applying the translation (Δx, Δy), its coordinates become (x1, y1), as shown in Figure 6-2 [3-5].

Expressed mathematically, translation is equation (6-1):

x1 = x0 + Δx
y1 = y0 + Δy        (6-1)

In homogeneous coordinates, the matrix representation is equation (6-2):

[x1]   [1  0  Δx] [x0]
[y1] = [0  1  Δy] [y0]      (6-2)
[1 ]   [0  0   1] [1 ]

In the equation, the matrix is called the translation transformation matrix (or factor), and Δx and Δy are the translation amounts. In OpenCV only the first two rows of this matrix are needed: image translation first defines a 2×3 translation matrix M and then calls the warpAffine() function to perform the translation. The core functions are as follows:

M = np.float32([[1, 0, x], [0, 1, y]])

- M denotes the translation matrix, where x denotes the horizontal translation and y denotes the vertical translation

shifted = cv2.warpAffine(src, M, dsize[, dst[, flags[, borderMode[, borderValue]]]])

- src represents the original image

- M denotes the translation matrix

- dsize denotes the size of the transformed output image

- dst is the output image, its size is dsize, and it is of the same type as src

- flags denotes the interpolation method, optionally combined with other flags

- borderMode denotes the pixel extrapolation method; when borderMode = BORDER_TRANSPARENT, the pixels in the destination image that correspond to "outliers" in the source image are left unmodified

- borderValue is the value used for a constant border; it defaults to 0 (a short sketch of these border parameters follows this list)
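
As a quick illustration of the two border parameters (a sketch using a synthetic image, since no file path is given here), the call below shifts an image right and down and fills the exposed border with a constant green color:

# Translate a synthetic gray image 100 pixels right and 50 down,
# filling the exposed border with constant green (a sketch of
# borderMode/borderValue; a real image works the same way).
import cv2
import numpy as np

src = np.full((300, 400, 3), 128, np.uint8)   # synthetic gray image
M = np.float32([[1, 0, 100], [0, 1, 50]])
shifted = cv2.warpAffine(src, M, (400, 300),
                         borderMode=cv2.BORDER_CONSTANT,
                         borderValue=(0, 255, 0))  # BGR green fill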

The following code is a simple case of image translation: it defines the translation matrix M and then calls the warpAffine() function to translate the original image 50 pixels down and 100 pixels to the right.

# -*- coding:utf-8 -*-
# By:Eastmount
import cv2
import numpy as np

# Read the picture
src = cv2.imread('')

# Image translation matrix: 100 pixels right, 50 pixels down
M = np.float32([[1, 0, 100], [0, 1, 50]])

# Get the number of rows and columns of the original image
rows, cols = src.shape[:2]

# Image translation
result = cv2.warpAffine(src, M, (cols, rows))

# Display images
cv2.imshow("original", src)
cv2.imshow("result", result)

# Wait for a key press, then close the windows
cv2.waitKey(0)
cv2.destroyAllWindows()

The output is shown in Figure 6-3:

The following case pans the image down, up, right and left in turn, and then calls the matplotlib plotting library to draw the four results.

# -*- coding:utf-8 -*-
# By:Eastmount
import cv2
import numpy as np
import matplotlib.pyplot as plt

# Read the picture and convert it from BGR to RGB for matplotlib
img = cv2.imread('')
image = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

# Image translation
# Vertical translation down 100
M = np.float32([[1, 0, 0], [0, 1, 100]])
img1 = cv2.warpAffine(image, M, (image.shape[1], image.shape[0]))

# Vertical translation up 100
M = np.float32([[1, 0, 0], [0, 1, -100]])
img2 = cv2.warpAffine(image, M, (image.shape[1], image.shape[0]))

# Horizontal translation right 100
M = np.float32([[1, 0, 100], [0, 1, 0]])
img3 = cv2.warpAffine(image, M, (image.shape[1], image.shape[0]))

# Horizontal translation left 100
M = np.float32([[1, 0, -100], [0, 1, 0]])
img4 = cv2.warpAffine(image, M, (image.shape[1], image.shape[0]))

# Display the four results in a 2x2 grid
titles = ['Image1', 'Image2', 'Image3', 'Image4']
images = [img1, img2, img3, img4]
for i in range(4):
    plt.subplot(2, 2, i + 1), plt.imshow(images[i], 'gray')
    plt.title(titles[i])
    plt.xticks([]), plt.yticks([])
plt.show()

The output is shown in Figure 6-4: the image is translated in each of the four directions, and the subplot() function draws the four results together.

III. Image scaling

Image scaling is the process of resizing a digital image. In Python with OpenCV, image scaling is mainly implemented by calling the resize() function, whose prototype is as follows:

result = cv2.resize(src, dsize[, dst[, fx[, fy[, interpolation]]]])

- src represents the original image

- dsize denotes the output image size, given as (width, height)

- dst is the optional output image

- fx denotes the scaling factor along the x-axis (width)

- fy denotes the scaling factor along the y-axis (height)

- interpolation denotes the interpolation method; in the cv2 API the options are cv2.INTER_NEAREST (nearest-neighbor interpolation), cv2.INTER_LINEAR (bilinear interpolation, the default), cv2.INTER_AREA (resampling using the pixel area relation, which avoids ripples when an image is shrunk and behaves like cv2.INTER_NEAREST when it is enlarged) and cv2.INTER_CUBIC (bicubic interpolation). A short comparison sketch follows this list.
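
The following sketch passes these constants to resize() (again with a synthetic image; any loaded image works the same way):

# Shrinking an image with different interpolation methods
# (sketch with a synthetic image).
import cv2
import numpy as np

src = np.random.randint(0, 256, (400, 400, 3), np.uint8)
small_nn = cv2.resize(src, (100, 100), interpolation=cv2.INTER_NEAREST)
small_area = cv2.resize(src, (100, 100), interpolation=cv2.INTER_AREA)
small_cubic = cv2.resize(src, (100, 100), interpolation=cv2.INTER_CUBIC)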

Two common ways of calling image scaling are shown below: the first resizes the original image to (160, 160) pixels, and the second shrinks the original image to 0.5 times its size; a short sketch after the list shows that the two forms agree.

  • result = cv2.resize(src, (160,160))
  • result = cv2.resize(src, None, fx=0.5, fy=0.5)
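
The two forms are interchangeable: dsize=(width, height) gives an absolute output size, while fx and fy give relative factors. A small sketch (synthetic image assumed) shows they agree:

# dsize and fx/fy produce the same result when they describe the
# same target size (sketch with a synthetic image).
import cv2
import numpy as np

src = np.zeros((100, 200, 3), np.uint8)       # rows=100, cols=200
a = cv2.resize(src, (100, 50))                # dsize = (width, height)
b = cv2.resize(src, None, fx=0.5, fy=0.5)     # relative factors
print(a.shape, b.shape)                       # (50, 100, 3) (50, 100, 3)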

Let (x1, y1) be the coordinates after scaling, (x0, y0) the coordinates before scaling, and sx, sy the scaling factors; then the image scaling formula (6-3) is:

x1 = x0 · sx
y1 = y0 · sy        (6-3)

Here is the Python code that implements image scaling, which shrinks the landscape image it reads.

# -*- coding:utf-8 -*-
# By:Eastmount
import cv2
import numpy as np

# Read the picture
src = cv2.imread('')

# Image scaling: dsize is (width, height)
result = cv2.resize(src, (200, 100))
print(result.shape)

# Display images
cv2.imshow("original", src)
cv2.imshow("result", result)

# Wait for a key press, then close the windows
cv2.waitKey(0)
cv2.destroyAllWindows()

The output is shown in Figure 6-5, with the image reduced to shape (100, 200, 3). Note that the code calls cv2.resize(src, (200,100)), which sets the number of columns (width) of the new image to 200 and the number of rows (height) to 100.

Another way of scaling an image is explained below: the transformed size is obtained by multiplying the original image's dimensions by scaling factors. The code is as follows:

# -*- coding:utf-8 -*-
# By:Eastmount
import cv2
import numpy as np

# Read the picture
src = cv2.imread('')
rows, cols = src.shape[:2]
print(rows, cols)

# Image scaling: dsize is (columns, rows)
result = cv2.resize(src, (int(cols * 0.6), int(rows * 1.2)))

# Display images
cv2.imshow("src", src)
cv2.imshow("result", result)
cv2.waitKey(0)
cv2.destroyAllWindows()

For the example picture, the value of rows is 384 and the value of cols is 512; the width is then scaled by a factor of 0.6 and the height by a factor of 1.2. The before-and-after comparison is shown in Figure 6-6.

Finally, the fx and fy parameters of resize() can be used to set scaling factors that enlarge or shrink the original image. The following code shrinks the original image to 0.3 times its size in both the fx and fy directions.

# -*- coding:utf-8 -*-
# By:Eastmount
import cv2
import numpy as np

# Read the picture
src = cv2.imread('')
rows, cols = src.shape[:2]
print(rows, cols)

# Image scaling by relative factors
result = cv2.resize(src, None, fx=0.3, fy=0.3)

# Display images
cv2.imshow("src", src)
cv2.imshow("result", result)

# Wait for a key press, then close the windows
cv2.waitKey(0)
cv2.destroyAllWindows()

The output is shown in Figure 6-7, with the image scaled to 0.3 times its size in each direction.

IV. Image Rotation

Image rotation is the process of rotating an image by a certain angle around a certain point to form a new image. The rotation is performed around a rotation center, which is usually the center of the image, and the size of the image generally changes after rotation. Figure 6-8 shows the coordinates (x0, y0) of the original image being rotated to (x1, y1).

The rotation formula is shown in (6-4), where (m,n) is the center of rotation, a is the angle of rotation, and (left,top) are the coordinates of the upper left corner of the rotated image.

Image rotation is mainly implemented by calling the getRotationMatrix2D() and warpAffine() functions to rotate around the image center. The function prototypes are as follows:

M = cv2.getRotationMatrix2D(center, angle, scale)

- center means the center of rotation, usually set to (cols/2, rows/2).

- angle indicates the angle of rotation, positive values indicate counterclockwise rotation, and the coordinate origin is designated as the upper left corner.

- scale indicates the scale factor

rotated = cv2.warpAffine(src, M, (cols, rows))

- src represents the original image

- M denotes the rotation parameter, which is the result of the getRotationMatrix2D() function definition

- (cols, rows) represents the width and height of the original image

The implementation code is shown below:

# -*- coding:utf-8 -*-
# By:Eastmount
import cv2
import numpy as np

# Read the picture
src = cv2.imread('')

# Height, width and number of channels of the source image
rows, cols, channel = src.shape

# Rotate around the center of the image
# Function parameters: rotation center, rotation angle, scale
M = cv2.getRotationMatrix2D((cols / 2, rows / 2), 30, 1)

# Function parameters: original image, rotation matrix, output width and height
rotated = cv2.warpAffine(src, M, (cols, rows))

# Display images
cv2.imshow("src", src)
cv2.imshow("rotated", rotated)

# Wait for a key press, then close the windows
cv2.waitKey(0)
cv2.destroyAllWindows()

The display effect is shown in Figure 6-9: the image is rotated 30 degrees counterclockwise around its center point.
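
Because warpAffine() keeps the original (cols, rows) output size here, the corners of the rotated image are cropped in Figure 6-9. A common remedy, sketched below with a synthetic image (this is the standard "rotate bound" trick, not code from the original article), is to enlarge the output canvas and shift the rotation matrix accordingly:

# Rotate without cropping: compute the bounding box of the rotated
# image from the rotation matrix and shift the transform so the
# whole result fits (synthetic image used as a stand-in).
import cv2
import numpy as np

src = np.zeros((300, 400, 3), np.uint8)
rows, cols = src.shape[:2]
M = cv2.getRotationMatrix2D((cols / 2, rows / 2), 30, 1)

# New canvas size from the matrix's cosine/sine entries
cos, sin = abs(M[0, 0]), abs(M[0, 1])
new_w = int(rows * sin + cols * cos)
new_h = int(rows * cos + cols * sin)

# Shift the image center to the center of the new canvas
M[0, 2] += new_w / 2 - cols / 2
M[1, 2] += new_h / 2 - rows / 2

rotated = cv2.warpAffine(src, M, (new_w, new_h))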

V. Summary

This chapter explained image geometric transformations with Python and OpenCV, introducing image translation, image scaling and image rotation in detail. These operations are common algorithms in desktop and mobile image-processing applications, and readers can try combining them to build a small image-processing tool of their own.
