Comparison of Python's performance in reading image files

This article uses Python to read a video file stored on a local hard disk. The file is in raw RGBA format, that is, uncompressed video. At first we processed the data in pure Python and displayed it in a Matplotlib window, but playback turned out to be much slower than C++ code with the same processing logic. After trying several approaches, we finally managed to read and display the video in Python at more than 120 FPS.

Read a frame of image data and display it on the window

The simplest approach is to read the file directly in Python and assign the RGB values pixel by pixel, displaying the result with the pyplot component of matplotlib.

Some constants used:

FILE_NAME = "I:/"
WIDTH = 2096
HEIGHT = 150
CHANNELS = 4
PACK_SIZE = WIDTH * HEIGHT * CHANNELS

Each frame is 2096 pixels wide and 150 pixels high, and CHANNELS refers to the four RGBA channels, so PACK_SIZE is the number of bytes occupied by one frame.
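As a quick check of the arithmetic, the sketch below computes the frame size from the constants above and the raw data rate. The 60 frames per second figure is the rate given for the video in this article; the names FPS and bytes_per_second are introduced here for illustration only.

```python
WIDTH = 2096
HEIGHT = 150
CHANNELS = 4  # R, G, B, A, one byte each
PACK_SIZE = WIDTH * HEIGHT * CHANNELS  # bytes per frame

FPS = 60  # assumed playback rate, for illustration
bytes_per_second = PACK_SIZE * FPS

print("bytes per frame:", PACK_SIZE)
print("MiB per frame: %.2f" % (PACK_SIZE / 1024 / 1024))
print("MiB per second: %.1f" % (bytes_per_second / 1024 / 1024))
```

At this data rate, even a short clip is too large to load comfortably in one read, which is why the file is processed one frame at a time.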

First we need to read the file. Since the video is not compressed, about 70 s of video (about 1.2 MB per frame, 60 frames per second) takes up about 4 GB, so we can't simply read the whole file into memory. Instead, we can use the partial function from Python's functools module to read a small part of the file at a time. Wrapping the partial in iter gives an iterator that returns one frame per read, and next fetches the next frame. The code below first uses this approach to read a single frame from the file and display it in the window.

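from functools import partial

import cv2
import numpy as np
import matplotlib.pyplot as plt

with open( FILE_NAME, 'rb' ) as f:
  e1 = cv2.getTickCount()
  records = iter( partial( f.read, PACK_SIZE ), b'' ) # Generate an iterator
  frame = next( records ) # Read a frame of data
  img = np.zeros( ( HEIGHT, WIDTH, CHANNELS ), dtype = np.uint8 )
  # Copy R, G and B byte by byte; this is the slow part
  for y in range( 0, HEIGHT ):
    for x in range( 0, WIDTH ):
      pos = (y * WIDTH + x) * CHANNELS
      for i in range( 0, CHANNELS - 1 ):
        img[y][x][i] = frame[ pos + i ]
      img[y][x][3] = 255 # Force the 4th channel to fully opaque
  plt.imshow( img )
  plt.tight_layout()
  plt.subplots_adjust(left=0, right=1, top=1, bottom=0)
  plt.xticks([])
  plt.yticks([])
  e2 = cv2.getTickCount()
  elapsed = ( e2 - e1 ) / cv2.getTickFrequency() # Ticks to seconds
  print("Time Used: ", elapsed )
  plt.show()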
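The chunked-read idiom used above can be tried in isolation on a small temporary file. The following is only a sketch: the helper name iter_frames, the 8-byte frame size, and the choice to drop a trailing partial frame are assumptions made for this example, not part of the original program.

```python
import os
import tempfile
from functools import partial

def iter_frames(path, frame_size):
    """Yield successive frame_size-byte chunks from a binary file."""
    with open(path, "rb") as f:
        # iter(callable, sentinel) calls f.read(frame_size) until it returns b""
        for chunk in iter(partial(f.read, frame_size), b""):
            if len(chunk) < frame_size:
                break  # ignore a trailing partial frame
            yield chunk

# Demo on a small temporary file: 3 "frames" of 8 bytes each.
FRAME_SIZE = 8
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(bytes(range(3 * FRAME_SIZE)))
    demo_path = tmp.name

frames = list(iter_frames(demo_path, FRAME_SIZE))
os.remove(demo_path)
print(len(frames), frames[0])
```

Only one chunk is held in memory at a time while iterating, so the same pattern works for files far larger than RAM.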

Note that the 4th channel in the saved file stores transparency and therefore has a value of 0, whereas a matplotlib (or OpenCV) window generally treats the 4th channel as opacity. To display the image properly, I set the 4th channel directly to 255.
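The effect of the alpha fix can be seen on a tiny array. This sketch assumes a 2x3 RGBA image made up for the example; it shows two options: setting the 4th channel to 255, or dropping it and keeping only RGB.

```python
import numpy as np

# A tiny 2x3 RGBA image whose alpha channel is 0, as in the file described above.
img = np.zeros((2, 3, 4), dtype=np.uint8)
img[..., :3] = 200   # some visible gray in R, G and B

opaque = img.copy()
opaque[..., 3] = 255  # option 1: make every pixel fully opaque

rgb = img[..., :3]    # option 2: drop the alpha channel entirely

print(opaque[0, 0], rgb.shape)
```

Either result can be passed to an image display call that accepts RGBA or RGB arrays.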

This displays one image in the window. Because the image is so much wider than it is tall, the matplotlib window has to be enlarged considerably before the image can be seen clearly.

To make later performance comparisons easier, all timings use the getTickCount method provided by OpenCV. The console shows that displaying one image takes about 1.21 s from reading the file to the final display. If we measure only the three nested loops, we find that 0.8 s is spent in the loops.

Reading and displaying a frame took 1.21s

0.8s on processing loop
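The same measurement pattern can be sketched with the standard library's time.perf_counter in place of OpenCV's tick counter (a substitution made here so the example has no third-party dependency). The helpers time_call and copy_pixels are hypothetical names; copy_pixels only counts the assignments the nested loops would perform.

```python
import time

def time_call(func, *args):
    """Return (result, elapsed_seconds) for one call of func(*args)."""
    start = time.perf_counter()
    result = func(*args)
    elapsed = time.perf_counter() - start
    return result, elapsed

def copy_pixels(width, height, channels):
    """Stand-in for the nested per-pixel loops; it only counts assignments."""
    count = 0
    for y in range(height):
        for x in range(width):
            for c in range(channels - 1):
                count += 1
    return count

count, elapsed = time_call(copy_pixels, 2096, 150, 4)
print("assignments: %d, time used: %.3f s" % (count, elapsed))
```

Nearly a million interpreted loop iterations per frame is what makes the pure-Python version slow, independent of the display library.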

The same logic performs perfectly well in C++, but not in Python: with 0.8 s spent in the loops alone, the frame rate cannot exceed about 1.2 FPS. We set optimization aside for now and first display multiple frames dynamically in the window to achieve video playback.

Continuously read and display pictures

Next we keep reading the file and displaying frames in the window. To display images dynamically, we can use FuncAnimation from matplotlib's animation module, and the previous program needs to be changed accordingly.

from functools import partial

import cv2
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation

fig = plt.figure()
ax1 = fig.add_subplot(1, 1, 1)
f = open( FILE_NAME, 'rb' )
try:
  img = np.zeros( ( HEIGHT, WIDTH, CHANNELS ), dtype = np.uint8 )
  records = iter( partial( f.read, PACK_SIZE ), b'' )

  def animateFromData(i):
    e1 = cv2.getTickCount()
    frame = next( records ) # Read one frame of data
    for y in range( 0, HEIGHT ):
      for x in range( 0, WIDTH ):
        pos = (y * WIDTH + x) * CHANNELS
        for c in range( 0, CHANNELS - 1 ):
          img[y][x][c] = frame[ pos + c ]
        img[y][x][3] = 255
    ax1.clear()
    ax1.imshow( img )
    e2 = cv2.getTickCount()
    elapsed = ( e2 - e1 ) / cv2.getTickFrequency()
    print( "FPS: %.2f, Used time: %.3f" % (1 / elapsed, elapsed ))

  a = animation.FuncAnimation( fig, animateFromData, interval=30 ) # Keep the reference: don't omit the assignment a = here
  plt.tight_layout()
  plt.subplots_adjust(left=0, right=1, top=1, bottom=0)
  plt.xticks([])
  plt.yticks([])
  plt.show()
except StopIteration:
  pass
finally:
  f.close()

Slightly different from the first part, the code that displays each frame now lives in the animateFromData function, which FuncAnimation calls once for every frame (passing interval=30 doesn't help, because the processing can't keep up). It's also worth noting that the assignment in a = animation.FuncAnimation( fig, animateFromData, interval=30 ) must not be omitted. I'm not quite clear how it works, but when I deleted the a =, the program inexplicably stopped working. The likely reason is that the animation object is garbage-collected when no reference to it is kept, which the matplotlib documentation warns about.

The processing speed displayed in the console:

Since I don't know matplotlib well, at first I thought matplotlib was too slow at displaying images and was holding the frame rate down, but after printing the timings I realized matplotlib was not the problem. I also tried PyQt5 to display the images, and the result was still only 1 to 2 frames per second. Because I only switched the display to a Qt interface and the processing logic stayed the same as in the methods above, there is no essential difference. The code for displaying images with Qt comes from a matplotlib issue on GitHub, with some adaptations of my own.

Using NumPy's array processing API

We now know that displaying images is slow because Python spends a lot of time in the 2096 * 150 two-level loop. Next, we switch to NumPy's reshape method to load the pixel data read from the file into an array. Note that reshape takes an ndarray object. My approach of creating a new ndarray for every frame may run the risk of a memory leak; it's actually possible to call the reshape method directly on an existing ndarray object, but I won't go into that here.

We redefine the function for dynamically displaying images as optAnimateFromData and pass it to FuncAnimation:

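def optAnimateFromData(i):
  e1 = cv2.getTickCount()
  frame = next( records ) # Read one frame of data
  # Convert the bytes to an array and reshape it to (HEIGHT, WIDTH, CHANNELS)
  img = np.reshape( np.array( list( frame ), dtype = np.uint8 ), ( HEIGHT, WIDTH, CHANNELS ) )
  img[ : , : , 3] = 255 # Set the whole 4th channel in one operation
  ax1.clear()
  ax1.imshow( img )
  e2 = cv2.getTickCount()
  elapsed = ( e2 - e1 ) / cv2.getTickFrequency()
  print( "FPS: %.2f, Used time: %.3f" % (1 / elapsed, elapsed ))

a = animation.FuncAnimation( fig, optAnimateFromData, interval=30 )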

The result is shown below. After switching to NumPy's reshape method, the processing time drops significantly and the frame rate reaches 8 to 9 FPS. However, even the optimized version is still slow:

Optimized code execution results
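One variation that may be worth trying at this stage, not used in the original program, is to skip the intermediate list and let NumPy interpret the bytes directly with np.frombuffer. The sketch below uses small stand-in dimensions and a fake all-zero frame; both are assumptions for the example.

```python
import numpy as np

HEIGHT, WIDTH, CHANNELS = 4, 6, 4         # small stand-in sizes
frame = bytes(HEIGHT * WIDTH * CHANNELS)  # fake frame: all zero bytes

# frombuffer gives a read-only view of the bytes; copy() makes it writable
img = np.frombuffer(frame, dtype=np.uint8).reshape(HEIGHT, WIDTH, CHANNELS).copy()
img[:, :, 3] = 255  # set the alpha channel for the whole image at once

print(img.shape, img.dtype, img[0, 0])
```

Because no Python-level list of integers is built, this conversion should cost far less per frame than list( frame ), although the actual gain would need to be measured on the real data.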

Using NumPy's memmap

While doing machine learning in Python, I found that many computationally heavy programs could be written entirely in Python, so I was convinced that I could solve my problem in Python too. In the course of my efforts I found NumPy's memmap API, which maps a file on disk to an array in memory. The program becomes a bit simpler with this API:

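import os
import time

import cv2
import numpy as np

SHAPE = ( HEIGHT, WIDTH, CHANNELS ) # Assumed: one frame per mapping
FILE_SIZE = os.path.getsize( FILE_NAME )

cv2.namedWindow("file")
count = 0
start = time.time()
try:
  number = 1
  # Stop at the end of the file: memmap never raises StopIteration,
  # and with mode="r+" it would extend the file instead of failing
  while count + PACK_SIZE <= FILE_SIZE:
    e1 = cv2.getTickCount()
    img = np.memmap(filename=FILE_NAME, dtype=np.uint8, shape=SHAPE, mode="r+", offset=count )
    count += PACK_SIZE
    cv2.imshow( "file", img )
    e2 = cv2.getTickCount()
    elapsed = ( e2 - e1 ) / cv2.getTickFrequency()
    print("FPS: %.2f Used time: %.3f" % (number / elapsed, elapsed ))
    key = cv2.waitKey(20)
    if key == 27: # exit on ESC
      break
finally:
  end = time.time()
  print( 'File Data read: {:.2f}GB'.format( count / 1024 / 1024 / 1024), ' time used: {:.2f}s'.format( end - start ) )
  cv2.destroyAllWindows()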

The img read through memmap is displayed directly in the window with cv2.imshow( "file", img ). The time taken to display each frame is printed, and at the end the total time and the amount of data read are shown:

Results of the most efficient implementation

Reading is very fast, taking only a few milliseconds per frame, which is more than enough for 60 FPS.
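A possible variation on the memmap approach is to map the whole file once as a 4-D array and index frames by number, deriving the frame count from the file size. This is a sketch under assumptions: it runs on a small temporary file with stand-in dimensions, opens the mapping read-only, and the names FRAME_SIZE, n_frames and video are chosen for this example.

```python
import os
import tempfile
import numpy as np

HEIGHT, WIDTH, CHANNELS = 4, 6, 4
FRAME_SIZE = HEIGHT * WIDTH * CHANNELS

# Write a small temporary file holding 5 frames; frame k is filled with value k.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    for k in range(5):
        tmp.write(bytes([k]) * FRAME_SIZE)
    demo_path = tmp.name

# Whole frames only; any trailing partial frame is ignored.
n_frames = os.path.getsize(demo_path) // FRAME_SIZE
video = np.memmap(demo_path, dtype=np.uint8, mode="r",
                  shape=(n_frames, HEIGHT, WIDTH, CHANNELS))

# video[k] is frame k; data is paged in from disk only when it is accessed.
first_values = [int(video[k, 0, 0, 0]) for k in range(n_frames)]
print(n_frames, first_values)

del video            # release the mapping before deleting the file
os.remove(demo_path)
```

With a single mapping, the loop bound is known up front and there is no per-frame mapping call, at the cost of needing enough virtual address space for the whole file.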

Summary

Python is a very easy language to write programs in, but native Python code does not execute as efficiently as C++, although it is still faster than JS. When developing performance-critical programs in Python, you can either use libraries like NumPy or write your own C library for Python to call. During my experiments, I also used Flask to read the file and send it to the browser as a stream so that JS could display it there, but again there were serious performance issues and memory leaks. I'll save that for later.

The code for this article can be found on GitHub.

References

functools

partial

opencv

matplotlib animation

numpy

numpy reshape

memmap

matplotlib issue on github

C Language Extensions
