Hands-on Python example: crawling material images from a stock-image site

[I. Project background]

Finding the right picture on a material-image site means paging through the results one page at a time. With a little Python, a program can download all the pictures first, and you can then pick the right ones at your leisure.

[II. Project objectives]

1. Fetch the page source for a given URL.

2. Extract the image addresses from the page source with XPath.

3. Download the material images from the extracted addresses.

[III. Libraries and websites involved]

1. Website:

https:///

2. Libraries: requests, lxml

[IV. Project analysis]

The first problem is how to request the URL of the next page. Click the next-page button a few times and watch how the URL changes:

https:///so-sucai/
https:///so-sucai/1789243/p_2/
https:///so-sucai/1789243/p_3/

The page URLs follow the pattern 1789243/p_{}/, where the number after p_ is the page number.
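A minimal sketch of turning that pattern into page URLs. It makes two assumptions. First, the article elides the domain, so `example.com` is a placeholder. Second, page 1 is taken to be the bare search URL without a `p_` suffix. The name `page_url` is hypothetical.

```python
# Placeholder domain: the article elides the real one.
BASE_URL = "https://example.com/so-sucai/1789243/"


def page_url(page: int) -> str:
    """Return the listing URL for a 1-based page number."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    # Assumption: page 1 is the bare search URL; pages 2+ append p_<n>/
    if page == 1:
        return BASE_URL
    return f"{BASE_URL}p_{page}/"


# Build the URLs for the first three pages
urls = [page_url(n) for n in range(1, 4)]
```

If the site also accepts `p_1/`, the special case for page 1 can be dropped.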

[V. Project implementation]

1. Open the site and search for the kind of picture you want (this example uses Year of the Rat material).

[Figure: search results for the Year of the Rat keyword]

2. Based on the URL analysis above, define a class named ImageSpider. It contains an initialization function, a function that sends a request and returns the response data, a parsing function, and a main function. The initialization function prepares the URL and the request headers, as shown below.

[Figure: code screenshot of the initialization function]
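Since the original code survives only as a screenshot, here is a rough sketch of what the initialization step might hold. It is not the author's code, and it makes three assumptions: `example.com` is a placeholder for the elided domain, the User-Agent string is an arbitrary browser-like value, and the `keyword_id` and `save_dir` parameters are hypothetical.

```python
class ImageSpider:
    """Holds the crawler's configuration (sketch of the __init__ step only)."""

    def __init__(self, keyword_id="1789243", save_dir="images"):
        # Placeholder domain; keyword_id is the number seen in the search URLs
        base = f"https://example.com/so-sucai/{keyword_id}/"
        # Template for pages 2, 3, ...; fill with self.url.format(page)
        self.url = base + "p_{}/"
        # Browser-like User-Agent; some sites reject the default python-requests one
        self.headers = {
            "User-Agent": (
                "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                "AppleWebKit/537.36 (KHTML, like Gecko) "
                "Chrome/120.0 Safari/537.36"
            )
        }
        # Local folder where downloaded images will be written
        self.save_dir = save_dir


spider = ImageSpider()
print(spider.url.format(2))  # https://example.com/so-sucai/1789243/p_2/
```

Keeping the URL as a template lets the other methods fill in the page number with `str.format`.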

3. Define the function that sends the request and returns the response data.

[Figure: code screenshot of the request function]
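A possible shape for the request step using requests, written as standalone functions so the sketch runs on its own. Inside ImageSpider they would become methods that read `self.headers`. The function names and the 10-second default timeout are assumptions, not taken from the article.

```python
import requests


def get_html(url: str, headers: dict, timeout: float = 10) -> str:
    """Send a GET request and return the response body as text."""
    response = requests.get(url, headers=headers, timeout=timeout)
    # Fail loudly on 4xx/5xx instead of parsing an error page
    response.raise_for_status()
    return response.text


def get_bytes(url: str, headers: dict, timeout: float = 10) -> bytes:
    """Fetch raw bytes, e.g. an image file, with the same checks."""
    response = requests.get(url, headers=headers, timeout=timeout)
    response.raise_for_status()
    return response.content
```

Listing pages are parsed as text, but images must be written as bytes, which is why the sketch has two helpers.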

4. Parse the data: use XPath to get the secondary-page links, and finally save the images to a folder. Opening Chrome's developer tools (or pressing F12) shows that the image addresses we need are in the src attribute of the img tags, so we extract them with lxml's XPath support.

[Figure: code screenshot of the parsing function]
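A sketch of the parsing and saving step with lxml. The real page's XPath isn't recoverable from the article, so this one makes three assumptions. It uses the generic `//img/@src`, which you would narrow to the gallery container you find in DevTools. It names files after the last segment of the image URL. The function names are hypothetical.

```python
import os
from urllib.parse import urljoin, urlparse

from lxml import etree


def parse_image_urls(html: str, page_url: str) -> list:
    """Return absolute URLs of all non-empty <img src> values on a page."""
    tree = etree.HTML(html)
    # Generic XPath; narrow it to the gallery container you find in DevTools
    srcs = tree.xpath("//img/@src")
    # urljoin makes relative ("/a.jpg") and protocol-relative ("//cdn/a.jpg") links absolute
    return [urljoin(page_url, s.strip()) for s in srcs if s.strip()]


def save_image(content: bytes, img_url: str, save_dir: str = "images") -> str:
    """Write image bytes into save_dir, named after the URL's last path segment."""
    os.makedirs(save_dir, exist_ok=True)
    name = os.path.basename(urlparse(img_url).path) or "unnamed.jpg"
    path = os.path.join(save_dir, name)
    with open(path, "wb") as f:
        f.write(content)
    return path
```

Image sites often lazy-load thumbnails and put the real address in another attribute. If the saved files turn out to be placeholders, check the img tag in DevTools and adjust the XPath.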

5. The main function, shown below:

[Figure: code screenshot of the main function]
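Putting the pieces together, a self-contained sketch of a main loop might look like the following. It is not the author's program. It rests on the same assumptions as the earlier sketches: a placeholder domain, page 1 without a `p_` suffix, and a generic XPath. The 1-second `DELAY` is an arbitrary choice to keep the load on the server low.

```python
import os
import time
from urllib.parse import urljoin, urlparse

import requests
from lxml import etree

# Placeholder domain and keyword id (the article elides the real domain)
BASE_URL = "https://example.com/so-sucai/1789243/"
HEADERS = {"User-Agent": "Mozilla/5.0"}
SAVE_DIR = "images"
DELAY = 1.0  # seconds between requests, to keep server load low


def page_urls(pages: int) -> list:
    """URLs for pages 1..pages; page 1 is assumed to have no p_ suffix."""
    return [BASE_URL if n == 1 else f"{BASE_URL}p_{n}/" for n in range(1, pages + 1)]


def crawl_page(session: requests.Session, url: str) -> int:
    """Download every <img src> on one listing page; return how many were saved."""
    resp = session.get(url, headers=HEADERS, timeout=10)
    resp.raise_for_status()
    saved = 0
    for src in etree.HTML(resp.text).xpath("//img/@src"):
        img_url = urljoin(url, src.strip())
        name = os.path.basename(urlparse(img_url).path)
        if not name:
            continue  # skip empty src or URLs without a file name
        img = session.get(img_url, headers=HEADERS, timeout=10)
        if img.ok:
            with open(os.path.join(SAVE_DIR, name), "wb") as f:
                f.write(img.content)
            saved += 1
        time.sleep(DELAY)
    return saved


def main() -> None:
    pages = int(input("Number of pages to crawl: "))
    os.makedirs(SAVE_DIR, exist_ok=True)
    # A Session reuses TCP connections across the many image requests
    with requests.Session() as session:
        for url in page_urls(pages):
            print(f"{url}: saved {crawl_page(session, url)} images")
            time.sleep(DELAY)
```

To run it, save the code as a script and call `main()` at the bottom, usually under an `if __name__ == "__main__":` guard. It prompts for the page count in the console, matching the behavior described in the next section.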

[VI. Results]

1. Run the program and enter the number of pages to crawl in the console, as shown below.

[Figure: console prompt for the number of pages]

2. The downloaded images appear in the local folder, as shown below.

[Figure: downloaded images in the local folder]

[VII. Summary]

1. Don't crawl too much data, since it puts load on the server. A small sample is enough.

2. I hope this program helps you download material images.

3. This article used a Python web crawler and crawler libraries to collect material images. You will always run into problems while implementing something like this. Don't just aim high without practicing; write the code yourself, and your understanding will go deeper.
