SoFunction
Updated on 2024-11-20

PyTorch: converting image data into tensors with transforms

Abstract:

In image recognition, the usual pipeline is to read an image, convert the image data into tensor format, and then feed the tensor to the network. This article introduces how to convert an image into a tensor.

I. Data conversion

To convert an image into a torch tensor, you generally use transforms.ToTensor(). Let's illustrate with an example: first read an image with OpenCV, then convert it. One thing to note: the format OpenCV uses to store an image differs from the one torch uses. OpenCV stores images as (H, W, C), while torch stores them as (C, H, W).

import torchvision.transforms as transforms
import cv2 as cv

img = cv.imread('image/')  # file name truncated in the original
print(img.shape)  # numpy array format, (H, W, C)
transf = transforms.ToTensor()
img_tensor = transf(img)  # torch tensor format, (C, H, W)
print(img_tensor.size())

Note: be careful when using ToTensor(): its constructor takes no arguments, so the following usage raises an error:

img_tensor = transforms.ToTensor(img)  # wrong: the image is passed to the constructor

The correct usage is to instantiate the conversion first, then call it with the image as the argument:

img = cv.imread('image/')  # file name truncated in the original
transf = transforms.ToTensor()
img_tensor = transf(img)

II. Normalization during conversion

When transforms.ToTensor() converts image data, the pixel values are normalized: pixels are usually stored as 8-bit values, so their decimal range is [0, 255], and ToTensor() divides each pixel value by 255, mapping it into the range [0.0, 1.0]. An example helps to see this:

import torchvision.transforms as transforms
import cv2 as cv

img = cv.imread('image/')  # file name truncated in the original
transf = transforms.ToTensor()
img_tensor = transf(img)
print('opencv', img)
print('torch', img_tensor)

III. Modifying the normalization range yourself

You can change the normalization range with transforms.Normalize(); here is an example that normalizes to [-1.0, 1.0].

transf2 = transforms.Compose(
  [
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5))
  ]
)
img_tensor2 = transf2(img)
print(img_tensor2)

The calculation is:

C = (C - mean) / std

Here C stands for the pixel values of each channel; a color image has three channels (BGR when read with OpenCV), so mean and std are each arrays of three numbers.

The values are already normalized to [0.0, 1.0] by ToTensor(), so (0.0 - 0.5) / 0.5 = -1.0 and (1.0 - 0.5) / 0.5 = 1.0, which normalizes the range to [-1.0, 1.0].

ADDED: a note on the difference between converting an image to a tensor directly and using to_tensor()

from PIL import Image
import numpy as np
import torch
import torchvision.transforms.functional as F

img = Image.open(img_path).convert("RGB")
img2 = F.to_tensor(img)
print(img2)
img1 = torch.from_numpy(np.array(img))  # direct conversion: no permute, no division by 255
print(img1)

Comparing the two printed results: not only are the shapes different, the values differ as well.

The explanation is as follows:

tensor = torch.from_numpy(np.array(Image.open(path))).permute(2, 0, 1).float() / 255
tensor = F.to_tensor(Image.open(path))  # the two lines are equivalent

Image.open() returns data in HWC format; converting it directly with numpy keeps the (h, w, c) layout and the raw 0-255 values, while to_tensor() yields (c, h, w) with the values already divided by 255.

byte() is equivalent to to(torch.uint8), and numpy() converts the tensor back to a numpy array.
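To illustrate byte() and numpy(), here is a sketch (my own small synthetic tensor, not from the original) that converts a [0.0, 1.0] tensor back to an 8-bit numpy array, i.e. roughly the reverse of ToTensor()'s scaling:

```python
import numpy as np
import torch

# A CHW float tensor in [0.0, 1.0], as ToTensor() would produce; shape (1, 2, 2)
t = torch.tensor([[[0.0, 0.5], [1.0, 0.25]]])

back = (t * 255).byte()  # byte() == to(torch.uint8); fractional parts are truncated
arr = back.numpy()       # numpy() converts the tensor to a numpy array

print(back.dtype)  # torch.uint8
print(arr)         # [[[  0 127]
                   #   [255  63]]]
```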

Note that both PIL and OpenCV read images in HWC format, while model training generally uses the CHW format; H is the vertical (Y-axis) dimension and W is the horizontal (X-axis) dimension.

Both transforms.ToTensor() and F.to_tensor() apply these transform operations to whatever input they receive.

The above is my personal experience; I hope it can serve as a reference. If anything is wrong or incomplete, please feel free to point it out.