SoFunction
Updated on 2024-11-14

How to display a real-time progress bar when downloading videos with a Python crawler

Preface:

When crawling and downloading a video from a web page, a real-time progress bar helps us visualize the video's download progress.

I. Complete code

from contextlib import closing
from requests import get
url = '/57cdd29ee3a718825bf7b1b14d63955b/615d475f/video/tos/cn/tos-cn-ve-15/72c47fb481464cfda3d415b9759aade7/?a=6383&br=2192&bt=2192&cd=0%7C0%7C0&ch=26&cr=0&cs=0&cv=1&dr=0&ds=4&er=&ft=jal9wj--bz7ThWG4S1ct&l=021633499366600fdbddc0200fff0030a92169a000000490f5507&lr=all&mime_type=video_mp4&net=0&pl=0&qs=0&rc=ank7OzU6ZnRkNjMzNGkzM0ApNmY4aGU8MzwzNzo3ZjNpZWdiYXBtcjQwLXNgLS1kLTBzczYtNS0tMmE1Xi82Yy9gLTE6Yw%3D%3D&vl=&vr='
with closing(get(url, stream=True)) as response:
    chunk_size = 1024  # Maximum number of bytes per chunk
    # response.headers['content-length'] is a str, so convert it to int
    content_size = int(response.headers['content-length'])  # Total file size in bytes
    data_count = 0  # Number of bytes downloaded so far
    with open('filename.mp4', "wb") as file:
        for data in response.iter_content(chunk_size=chunk_size):
            file.write(data)
            # Update the number of bytes downloaded so far
            data_count = data_count + len(data)
            # Number of filled blocks in the 50-character bar
            done_block = int((data_count / content_size) * 50)
            # Download progress as a percentage
            now_jd = (data_count / content_size) * 100
            # %% prints a literal %
            print("\r [%s%s] %d%% " % (done_block * '█', ' ' * (50 - done_block), now_jd), end=" ")

Note: The URL above has expired; you need to find a video URL on a web page yourself.
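The bar drawing itself can be tried without any download. Below is a minimal sketch, assuming a helper called render_bar (a hypothetical name, not part of the original code) that builds the same kind of bar from the number of bytes received and the total size:

```python
def render_bar(data_count, content_size, width=50):
    """Return a progress bar string such as '[███   ] 50%'."""
    fraction = data_count / content_size
    done_block = int(fraction * width)  # number of filled blocks
    return "[%s%s] %d%%" % ('█' * done_block, ' ' * (width - done_block), fraction * 100)

# Half of a 1024-byte file, drawn on a 10-character bar
bar = render_bar(512, 1024, width=10)
print(bar)
```

Keeping the drawing in its own function makes it easy to check the bar at 0%, 50% and 100% before wiring it into a real download.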

II. Explanations

1. closing

We often use the statement with open() as f: when reading files.

But the with statement has a requirement: only an object that correctly implements context management can be used with it. Context management is implemented through two methods, __enter__ and __exit__.

with usage (no context management implemented)

class Door():
    def open(self):
        print('Door is opened')
    def close(self):
        print('Door is closed')
with Door() as d:
    d.open()
    d.close()

Running this raises an error, because Door does not implement __enter__ and __exit__.
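The exact exception depends on the Python version (older versions raise AttributeError, newer ones raise TypeError). As a rough sketch, the failure can be captured like this; try_with is a hypothetical helper used only for illustration:

```python
class Door:
    def open(self):
        print('Door is opened')

    def close(self):
        print('Door is closed')


def try_with(obj):
    """Return the name of the exception raised by `with obj`, or None if it works."""
    try:
        with obj as d:
            d.open()
    except (AttributeError, TypeError) as exc:
        return type(exc).__name__
    return None


# Door has no __enter__/__exit__, so the with statement fails before open() runs
error_name = try_with(Door())
print(error_name)
```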

with usage (context management implemented)

Use __enter__ and __exit__ to implement context management

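class Door():
    def __enter__(self):
        print('Begin')
        return self
    def __exit__(self, exc_type, exc_value, traceback):
        # exc_type is None when the with block finished without an exception
        print('Error' if exc_type else 'End')
    def open(self):
        print('Door is opened')
    def close(self):
        print('Door is closed')
with Door() as d:
    d.open()
    d.close()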

This time no error is reported.
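To see when the two methods are called, here is a small sketch that records each step in a list. The events list and the 'jammed' error are made up for illustration and are not part of the original example:

```python
events = []


class Door:
    def __enter__(self):
        events.append('enter')
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # exc_type is None when the block finished normally
        events.append('exit with error' if exc_type else 'exit')
        return False  # do not swallow the exception

    def open(self):
        events.append('open')


# Normal case: __enter__, then the body, then __exit__
with Door() as d:
    d.open()

# Error case: __exit__ still runs, then the exception propagates
try:
    with Door() as d:
        raise RuntimeError('jammed')
except RuntimeError:
    events.append('caught')

print(events)
```

The point is that __exit__ runs in both cases, which is what makes the with statement suitable for cleanup.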

closing usage (perfect solution to the above problem)

If an object does not implement context management, we cannot use it in a with statement. In that case, closing() from contextlib helps.

closing() turns the object into a context manager: it requires the object to have a close() method, and calls that method when the with block exits.

from contextlib import closing

class Door():
    def open(self):
        print('Door is opened')
    def close(self):
        print('Door is closed')
# closing(Door()) returns the Door on entry and calls its close() on exit
with closing(Door()) as d:
    d.open()

For example, closing() lets us use the response returned by get(url) from requests in a with statement.

That is what the code in this article does: it uses with closing() to download the video from the web page.
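To confirm that closing() really calls close() on exit, here is a small sketch. The closed attribute is added only for illustration and is not part of the original Door class:

```python
from contextlib import closing


class Door:
    def __init__(self):
        self.closed = False  # illustrative flag so the effect can be checked

    def open(self):
        print('Door is opened')

    def close(self):
        self.closed = True
        print('Door is closed')


door = Door()
with closing(door) as d:
    d.open()  # d is the same object as door

# After the block, closing() has already called door.close()
print(door.closed)
```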

2. File stream

Compare reading a file to pumping water into a pool. A synchronous read blocks the program, and an asynchronous read still has to wait for the whole result. What if the pool is very large?

With file streaming, it is like pumping and drawing water at the same time: you do not have to wait for the pool to fill up before you use the water.

So for large files (such as a video of several gigabytes), the stream=True parameter is generally used. It also works for small files.
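The same idea can be tried without a network connection. The sketch below uses io.BytesIO as a stand-in for the response body and for the output file; copy_in_chunks is a hypothetical helper, not a requests function:

```python
import io


def copy_in_chunks(source, target, chunk_size=1024):
    """Copy source to target one chunk at a time; return the number of bytes copied."""
    copied = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # an empty bytes object means the source is exhausted
            break
        target.write(chunk)
        copied += len(chunk)
    return copied


source = io.BytesIO(b'\x00' * 5000)  # pretend this is a 5000-byte video
target = io.BytesIO()
copied = copy_in_chunks(source, target)
print(copied)
```

Only one chunk is held in memory at a time, which is the reason streaming suits large files.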

3. response.headers['content-length']

This gets the total size of the file in bytes. The value is a str rather than an int, so it has to be converted.
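Some servers do not send a Content-Length header at all (for example, with chunked transfer encoding), in which case the lookup fails. A defensive sketch, assuming a hypothetical helper parse_content_length and using a plain dict in place of response.headers:

```python
def parse_content_length(headers):
    """Return the Content-Length as an int, or None if it is missing or invalid."""
    value = headers.get('content-length')
    if value is None:
        return None
    try:
        return int(value)  # the header value is a str such as '2500'
    except ValueError:
        return None


size = parse_content_length({'content-length': '2500'})
missing = parse_content_length({})
print(size, missing)
```

When the result is None, the percentage cannot be computed, so the crawler could fall back to printing only the number of bytes received.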

4. response.iter_content()

This method is generally used to download files and pages from the web (it is called on the response returned by get(url)).

chunk_size is the maximum number of bytes returned in each chunk.
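Because the last chunk is usually smaller than chunk_size, the counter is advanced by len(data) rather than by chunk_size. A small sketch with made-up chunk lengths (progress_steps is a hypothetical helper) shows how the percentage grows:

```python
def progress_steps(chunk_lengths, content_size):
    """Return the integer percentage reached after each chunk."""
    data_count = 0
    steps = []
    for length in chunk_lengths:
        data_count += length  # same role as data_count + len(data)
        steps.append(int(data_count / content_size * 100))
    return steps


# A 2500-byte file read with chunk_size=1024 arrives as 1024 + 1024 + 452 bytes
steps = progress_steps([1024, 1024, 452], 2500)
print(steps)
```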

5. \r and %

\r is a carriage return (it moves the cursor back to the beginning of the line)

% introduces a placeholder in a format string

In the case of %%, the first % acts as an escape, so a single percent sign % is output
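Both pieces can be tried on their own. A minimal sketch (the values 75 and 0 to 100 are arbitrary):

```python
# %d is a placeholder for an integer; %% prints one literal percent sign
percent_text = "%d%%" % 75
print(percent_text)

# \r moves the cursor back to the start of the line, so each print
# overwrites the previous one instead of starting a new line
for now_jd in (0, 25, 50, 75, 100):
    print("\r%d%% " % now_jd, end="")
print()  # finish with a newline once the loop is done
```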

III. Presentation of results

IV. Summary

I have looked at many progress bars before. They can move, but they do not advance according to the size of the file being downloaded (their parameters are either all fixed or unrelated to the file size), so they are not truly interactive. The progress bar in this article tracks the actual download well, so give it a try!

The progress bar here is for downloading a single URL. You can add the code to your crawler's loop to show a real-time progress bar for each video as you crawl it!
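One possible way to arrange this is sketched below. The function names download and download_all and the file names video_1.mp4, video_2.mp4 and so on are assumptions made for illustration; the download logic is the same as in the complete code above, and it needs the requests package only when download is actually called.

```python
def download(url, filename, chunk_size=1024):
    """Download one URL to filename while printing a progress bar."""
    from contextlib import closing
    from requests import get  # imported here so the module loads without requests
    with closing(get(url, stream=True)) as response:
        content_size = int(response.headers['content-length'])
        data_count = 0
        with open(filename, 'wb') as file:
            for data in response.iter_content(chunk_size=chunk_size):
                file.write(data)
                data_count += len(data)
                done_block = int(data_count / content_size * 50)
                now_jd = data_count / content_size * 100
                print("\r [%s%s] %d%% " % ('█' * done_block, ' ' * (50 - done_block), now_jd), end=" ")
    print()  # move to a new line before the next video starts


def download_all(video_urls, download_one=download):
    """Download every URL in turn; return the list of file names written."""
    filenames = []
    for index, url in enumerate(video_urls, start=1):
        filename = 'video_%d.mp4' % index  # hypothetical naming scheme
        download_one(url, filename)
        filenames.append(filename)
    return filenames
```

In a crawler, download_all would be called with the list of video URLs collected from the page. Passing the per-file function as download_one also makes it possible to try the loop without touching the network.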
