Example: crawling a wallpaper website with multithreaded Python

Basic development environment

· Python 3.6

· PyCharm

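Libraries to be imported

· requests (sends the HTTP requests)

· parsel (parses the HTML with CSS selectors)

· threading (standard library; runs one thread per list page)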

Target Page Analysis

The site is static: the data is present in the page's HTML source and is not encrypted, so it can be crawled directly.

Overall approach:

1. On each list page, get the address of every wallpaper details page.

2. On each wallpaper details page, get the real HD url of the wallpaper.

3. Download the image from that url and save it.
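The first two steps can be tried offline before touching the real site. The following is a minimal sketch under stated assumptions: the sample markup, the `example.com` host, and the helper names `AttrCollector` and `collect` are made up for illustration, and the standard library's `html.parser` stands in for parsel so the snippet runs without extra packages. It also shows `urljoin`, which helps if the extracted `href` or `src` values turn out to be relative.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AttrCollector(HTMLParser):
    """Collect one attribute's value from every occurrence of one tag."""

    def __init__(self, wanted_tag, wanted_attr):
        super().__init__()
        self.wanted_tag = wanted_tag
        self.wanted_attr = wanted_attr
        self.values = []

    def handle_starttag(self, tag, attrs):
        value = dict(attrs).get(self.wanted_attr)
        if tag == self.wanted_tag and value:
            self.values.append(value)


def collect(html, tag, attr):
    """Return the values of `attr` for every `tag` element in `html`."""
    parser = AttrCollector(tag, attr)
    parser.feed(html)
    return parser.values


# Made-up sample markup standing in for the real pages.
LIST_PAGE = ('<dl><dd><a href="/show/1.html">A</a></dd>'
             '<dd><a href="/show/2.html">B</a></dd></dl>')
DETAIL_PAGE = '<div class="wb_showpic_main"><img src="/img/1_hd.jpg"></div>'
LIST_URL = "https://example.com/min/list-1.html"  # hypothetical address

# Step 1: details page addresses found on the list page.
detail_urls = [urljoin(LIST_URL, href) for href in collect(LIST_PAGE, "a", "href")]
# Step 2: the HD image address found on a details page.
img_url = urljoin(detail_urls[0], collect(DETAIL_PAGE, "img", "src")[0])
# Step 3 would request img_url and write the returned bytes to a file.
print(detail_urls)
print(img_url)
```
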

Code implementation

Simulate a browser by sending browser-like request headers, request each web page, and get the page data.
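"Simulating the browser" here mostly means sending a `User-Agent` header that looks like a real browser's. The sketch below only builds a request and does not send it, so it runs without network access. It uses the standard library's `urllib.request.Request` in place of requests, and the `User-Agent` value, the `example.com` address, and the `build_request` helper are assumptions for illustration.

```python
from urllib.request import Request

# Assumed example value; copy the User-Agent string from your own browser.
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}


def build_request(url):
    """Build (but do not send) a GET request carrying browser-like headers."""
    return Request(url, headers=headers)


req = build_request("https://example.com/min/list-1.html")  # hypothetical address
print(req.get_method(), req.full_url)
# urllib stores header names capitalized as "User-agent".
print(req.get_header("User-agent"))
```

With requests, the same dictionary is passed as the `headers` argument of `requests.get`.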

Only the first 10 list pages are crawled here.
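The ten list page addresses come from filling a page number into a url template. A minimal sketch, assuming a placeholder host (`BASE_URL` is hypothetical; the real domain is not given in this article):

```python
BASE_URL = "https://example.com"  # hypothetical host, replace with the real site

# range(1, 11) yields 1 through 10; the end value is exclusive.
page_urls = [BASE_URL + "/min/list-{}.html".format(page) for page in range(1, 11)]

print(len(page_urls))
print(page_urls[0])
print(page_urls[-1])
```
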

The code is as follows:

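import os
import threading
import time

import parsel
import requests

# Browser-like request headers. The User-Agent value is a placeholder that is
# not part of the original article; copy the string from your own browser.
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'
}

# Start time, used to report how long each thread took
s_time = time.time()


def get_html(html_url):
    '''
    Get the source code of a web page
    :param html_url: page url
    :return: response object
    '''
    response = requests.get(url=html_url, headers=headers)
    return response


def get_par(html_data):
    '''
    Convert to selector object, parse and extract data
    :param html_data: page source code (text)
    :return: selector object
    '''
    selector = parsel.Selector(html_data)
    return selector


def download(img_url, title):
    '''
    Save data
    :param img_url: image address
    :param title: title of the image
    :return: None
    '''
    content = get_html(img_url).content
    # Create the output folder if it does not exist yet
    os.makedirs('Wallpaper', exist_ok=True)
    path = os.path.join('Wallpaper', title + '.jpg')
    with open(path, mode='wb') as f:
        f.write(content)
        print('Saving', title)


def main(url):
    '''
    Main function
    :param url: list page url
    :return: None
    '''
    html_data = get_html(url).text
    selector = get_par(html_data)
    # Addresses of the wallpaper details pages on this list page
    lis = selector.css('.wb_listbox div dl dd a::attr(href)').getall()
    for li in lis:
        img_data = get_html(li).text
        img_selector = get_par(img_data)
        img_url = img_selector.css('.wb_showpic_main img::attr(src)').get()
        title = img_selector.css('.wb_pictitle::text').get().strip()
        download(img_url, title)
    # Seconds elapsed since the script started
    end_time = time.time() - s_time
    print(end_time)


if __name__ == '__main__':
    for page in range(1, 11):
        # The site's domain is missing from this url in the article;
        # prepend it before running
        url = '/min/list-{}.html'.format(page)
        # One thread per list page
        main_thread = threading.Thread(target=main, args=(url,))
        main_thread.start()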
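The script starts one thread per list page and lets each thread print its own elapsed time. If a single total is wanted instead, one option is to keep the thread objects and `join` them before reading the clock. The sketch below illustrates that pattern only: `crawl_page` and `run_all` are hypothetical names, and a short `time.sleep` stands in for the network requests.

```python
import threading
import time


def crawl_page(page, results, lock):
    """Stand-in for main(url): pretend to crawl one list page."""
    time.sleep(0.1)  # simulates waiting on the network
    with lock:  # protect the shared list while appending
        results.append(page)


def run_all(pages):
    """Crawl all pages in parallel and return (finished pages, seconds)."""
    results, lock = [], threading.Lock()
    threads = [
        threading.Thread(target=crawl_page, args=(page, results, lock))
        for page in pages
    ]
    start = time.time()
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()  # wait until every page has finished
    return sorted(results), time.time() - start


pages_done, elapsed = run_all(range(1, 11))
print(pages_done, round(elapsed, 2))
```

Because the threads spend their time waiting rather than computing, the ten simulated pages finish in roughly the time of one, which is the reason for using threads in a crawler like this.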

That concludes this example of crawling a wallpaper website with multithreaded Python. For more on crawling wallpaper websites with Python, see my other related articles.