The URL used for this crawl is: / (The Other Side Desktop). The site has plenty of good-looking wallpapers, all of them downloadable in lossless HD, which makes it a nice place to practice, so I used it for this exercise.
As a beginner, regardless of code quality, simply getting a program to run correctly from start to finish is enough to make you happy. It is the same as with games: we stay interested when we get positive feedback quickly.
The same is true of learning: when our studying gives us feedback in the short term, our desire to learn stays strong.
For a rookie, finishing this crawler from start to end was already a great reward, but I actually gained much more than that along the way.
Good code should have the following characteristics:
- Meets the most critical requirements
- Easy to understand
- Fully commented
- Uses standardized naming
- No obvious security issues
- Fully tested
Take adequate testing as an example. Anyone who writes code regularly knows that even if your code seems bug-free most of the time, that only means it is stable most of the time; under certain conditions it will still go wrong (hard-to-reach error paths, logic problems, and so on). That is for sure, and the causes differ from program to program. If a program could be perfected in a single pass, the software we use would not be updated so often. I won't go into all the other reasons here. A sketch of what a test could look like follows.
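To make "fully tested" a little more concrete, here is a minimal sketch of my own (not part of the original program) showing how the link-matching regular expression used later in fillUnivList could be checked with pytest; the sample hrefs are invented:

```python
# A minimal pytest sketch (my own illustration): verify the regular
# expression that fillUnivList uses to pick out image-page links.
import re

DESK_PATTERN = r'/desk/[1-9]\d{4}.htm'  # same pattern as in fillUnivList


def test_matches_valid_desk_link():
    # A five-digit /desk/ link should be recognized (sample href is made up)
    assert re.findall(DESK_PATTERN, '/desk/23145.htm') == ['/desk/23145.htm']


def test_rejects_other_links():
    # Too few digits, or a different page entirely, should not match
    assert re.findall(DESK_PATTERN, '/desk/123.htm') == []
    assert re.findall(DESK_PATTERN, '/index_2.htm') == []
```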
I have long known the five characteristics that good code generally possesses:
1. Easy to maintain
2. Reusable
3. Extensible
4. Flexible
5. Robust
After running my code I found that its time complexity is fairly high, so that is one place I will improve, but it is not the only one: there are also plenty of spots where resources are used wastefully. I will keep polishing these shortcomings bit by bit! One possible direction is sketched below.
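For instance, one speed-up I have in mind is to reuse a single connection and download several images at once. This is only a sketch of the idea, not code from the program below; save_one and download_all are names I made up:

```python
# A rough sketch of a possible speed-up (my own idea, not in the program
# below): reuse one TCP connection via requests.Session and fetch several
# images concurrently with a thread pool.
from concurrent.futures import ThreadPoolExecutor

import requests

session = requests.Session()  # keeps connections alive between requests


def save_one(img_url, img_path):
    """Fetch a single image and write it to disk."""
    try:
        data = session.get(url=img_url, timeout=10).content
    except requests.RequestException:
        return
    with open(img_path, 'wb') as fp:
        fp.write(data)


def download_all(jobs):
    """jobs is a list of (img_url, img_path) pairs to download in parallel."""
    with ThreadPoolExecutor(max_workers=8) as pool:
        for img_url, img_path in jobs:
            pool.submit(save_one, img_url, img_path)
```

This does not change the time complexity itself, but overlapping the network waits should cut the wall-clock time considerably.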
Any experienced developers passing by are welcome to leave their valuable suggestions for changing the code.
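As an example of the kind of change I mean: the program below names each file after the image's alt text, which may contain characters that Windows does not allow in file names. A small helper like this (hypothetical, not in the code below) would make the saving step more robust:

```python
# Hypothetical helper (not part of the program below): strip characters
# that are invalid in Windows file names before using alt text as a name.
import re


def safe_filename(name):
    """Replace characters Windows forbids in file names with underscores."""
    return re.sub(r'[\\/:*?"<>|]', '_', name).strip()


# e.g. safe_filename('sunset: over the sea?') -> 'sunset_ over the sea_'
```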
The complete code is as follows:
```python
import os
import re
import time
import bs4
import requests
from bs4 import BeautifulSoup


def getHTMLText(url, headers):
    """Make a request to the target server and return the parsed response."""
    try:
        r = requests.get(url=url, headers=headers)
        r.encoding = r.apparent_encoding
        soup = BeautifulSoup(r.text, "html.parser")
        return soup
    except:
        return ""


def CreateFolder():
    """Create the folder that will store the downloaded data."""
    flag = True
    while flag:
        file = input("Please enter the name of the folder where the data is saved:")
        if not os.path.exists(file):
            os.mkdir(file)
            flag = False
        else:
            print('This file already exists, please re-enter')
            flag = True
    # os.path.abspath(file) gets the absolute path of the folder
    path = os.path.abspath(file) + "\\"
    return path


def fillUnivList(ulist, soup):
    """Collect the page link for each image."""
    # [0] makes the obtained div a <class 'bs4.element.Tag'> object
    div = soup.find_all('div', 'list')[0]
    for a in div('a'):
        if isinstance(a, bs4.element.Tag):
            hr = a.attrs['href']
            href = re.findall(r'/desk/[1-9]\d{4}.htm', hr)
            if bool(href) == True:
                ulist.append(href[0])
    return ulist


def DownloadPicture(left_url, list, path):
    """Visit each image page and save the full-size picture."""
    for right in list:
        url = left_url + right
        r = requests.get(url=url, timeout=10)
        r.encoding = r.apparent_encoding
        soup = BeautifulSoup(r.text, "html.parser")
        tag = soup.find_all("p")
        # Use the alt attribute of the img tag to name the saved image
        name = tag[0].a.img.attrs['alt']
        img_name = name + ".jpg"
        # Get the address of the picture itself
        img_src = tag[0].a.img.attrs['src']
        try:
            img_data = requests.get(url=img_src)
        except:
            continue
        img_path = path + img_name
        with open(img_path, 'wb') as fp:
            fp.write(img_data.content)
        print(img_name, " ****** Download complete!")


def PageNumurl(urls):
    """Build the link for every listing page to be crawled."""
    num = int(input("Please enter the number of page numbers crawled to:"))
    for i in range(2, num + 1):
        u = "/index_" + str(i) + ".htm"
        urls.append(u)
    return urls


if __name__ == "__main__":
    uinfo = []
    left_url = ""
    urls = ["/"]
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.90 Safari/537.36"
    }
    start = time.time()
    # 1. Create a folder to store the data
    path = CreateFolder()
    # 2. Determine the number of pages to crawl and build a link for each page
    PageNumurl(urls)
    n = int(input("Starting page of the visit:"))
    for i in urls[n - 1:]:
        # 3. Get the listing-page text for each page
        soup = getHTMLText(i, headers)
        # 4. Collect the links to the pages where the original images live
        page_list = fillUnivList(uinfo, soup)
        # 5. Download the original images
        DownloadPicture(left_url, page_list, path)
    print("All downloads complete!", "A total of " + str(len(os.listdir(path))) + " pictures.")
    end = time.time()
    print("Total time consumed: " + str(end - start) + " seconds.")
```
Run the program. Some of the results are shown below:
That is the whole of this example of crawling a wallpaper site with Python. For more on crawling websites with Python, please check out my other related articles!