The URL used for this crawl is: / (The Other Side Desktop). The site has plenty of good-looking wallpapers, all of them downloadable in lossless HD, which makes it a nice place to practice, so I used it for this exercise.
As a beginner, regardless of code quality, simply getting a program to run correctly from start to finish is enough to make you happy. It is the same as with games: we stay interested when we get positive feedback quickly.
The same is true of learning: when our studying gives us feedback in the short term, our desire to learn stays strong.
For a rookie, finishing this crawler from start to end was already a great reward, but I actually gained much more than that along the way.
Good code should have the following characteristics:
- Meets the most critical requirements
- Easy to understand
- Fully commented
- Uses standardized naming
- No obvious security issues
- Fully tested
Take adequate testing as an example. Anyone who writes code regularly knows that even if your code seems bug-free most of the time, that only means it is stable most of the time; under certain conditions it will still go wrong (hard-to-reach error paths, logic problems, and so on). That is for sure, and the causes differ from program to program. If a program could be perfected in a single pass, the software we use would not be updated so often. I won't go into all the other reasons here. A sketch of what a test could look like follows.
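To make "fully tested" a little more concrete, here is a minimal sketch of my own (not part of the original program) showing how the link-matching regular expression used later in fillUnivList could be checked with pytest; the sample hrefs are invented:

```python
# A minimal pytest sketch (my own illustration): verify the regular
# expression that fillUnivList uses to pick out image-page links.
import re

DESK_PATTERN = r'/desk/[1-9]\d{4}.htm'  # same pattern as in fillUnivList


def test_matches_valid_desk_link():
    # A five-digit /desk/ link should be recognized (sample href is made up)
    assert re.findall(DESK_PATTERN, '/desk/23145.htm') == ['/desk/23145.htm']


def test_rejects_other_links():
    # Too few digits, or a different page entirely, should not match
    assert re.findall(DESK_PATTERN, '/desk/123.htm') == []
    assert re.findall(DESK_PATTERN, '/index_2.htm') == []
```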
I have long known the five characteristics that good code generally possesses:
1. Easy to maintain
2. Reusable
3. Extensible
4. Flexible
5. Robust
After running my code I found that its time complexity is fairly high, so that is one place I will improve, but it is not the only one: there are also plenty of spots where resources are used wastefully. I will keep polishing these shortcomings bit by bit! One possible direction is sketched below.
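For instance, one speed-up I have in mind is to reuse a single connection and download several images at once. This is only a sketch of the idea, not code from the program below; save_one and download_all are names I made up:

```python
# A rough sketch of a possible speed-up (my own idea, not in the program
# below): reuse one TCP connection via requests.Session and fetch several
# images concurrently with a thread pool.
from concurrent.futures import ThreadPoolExecutor

import requests

session = requests.Session()  # keeps connections alive between requests


def save_one(img_url, img_path):
    """Fetch a single image and write it to disk."""
    try:
        data = session.get(url=img_url, timeout=10).content
    except requests.RequestException:
        return
    with open(img_path, 'wb') as fp:
        fp.write(data)


def download_all(jobs):
    """jobs is a list of (img_url, img_path) pairs to download in parallel."""
    with ThreadPoolExecutor(max_workers=8) as pool:
        for img_url, img_path in jobs:
            pool.submit(save_one, img_url, img_path)
```

This does not change the time complexity itself, but overlapping the network waits should cut the wall-clock time considerably.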
Any experienced developers passing by are welcome to leave their valuable suggestions for changing the code.
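As an example of the kind of change I mean: the program below names each file after the image's alt text, which may contain characters that Windows does not allow in file names. A small helper like this (hypothetical, not in the code below) would make the saving step more robust:

```python
# Hypothetical helper (not part of the program below): strip characters
# that are invalid in Windows file names before using alt text as a name.
import re


def safe_filename(name):
    """Replace characters Windows forbids in file names with underscores."""
    return re.sub(r'[\\/:*?"<>|]', '_', name).strip()


# e.g. safe_filename('sunset: over the sea?') -> 'sunset_ over the sea_'
```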
The complete code is as follows:
```python
import os
import re
import time
import bs4
import requests
from bs4 import BeautifulSoup


def getHTMLText(url, headers):
    """Make a request to the target server and return the parsed response."""
    try:
        r = requests.get(url=url, headers=headers)
        r.encoding = r.apparent_encoding
        soup = BeautifulSoup(r.text, "html.parser")
        return soup
    except:
        return ""


def CreateFolder():
    """Create the folder that will store the downloaded data."""
    flag = True
    while flag:
        file = input("Please enter the name of the folder where the data is saved:")
        if not os.path.exists(file):
            os.mkdir(file)
            flag = False
        else:
            print('This file already exists, please re-enter')
            flag = True
    # os.path.abspath(file) gets the absolute path of the folder
    path = os.path.abspath(file) + "\\"
    return path


def fillUnivList(ulist, soup):
    """Collect the page link for each image."""
    # [0] makes the obtained div a <class 'bs4.element.Tag'> object
    div = soup.find_all('div', 'list')[0]
    for a in div('a'):
        if isinstance(a, bs4.element.Tag):
            hr = a.attrs['href']
            href = re.findall(r'/desk/[1-9]\d{4}.htm', hr)
            if bool(href) == True:
                ulist.append(href[0])
    return ulist


def DownloadPicture(left_url, list, path):
    """Visit each image page and save the full-size picture."""
    for right in list:
        url = left_url + right
        r = requests.get(url=url, timeout=10)
        r.encoding = r.apparent_encoding
        soup = BeautifulSoup(r.text, "html.parser")
        tag = soup.find_all("p")
        # Use the alt attribute of the img tag to name the saved image
        name = tag[0].a.img.attrs['alt']
        img_name = name + ".jpg"
        # Get the address of the picture itself
        img_src = tag[0].a.img.attrs['src']
        try:
            img_data = requests.get(url=img_src)
        except:
            continue
        img_path = path + img_name
        with open(img_path, 'wb') as fp:
            fp.write(img_data.content)
        print(img_name, " ****** Download complete!")


def PageNumurl(urls):
    """Build the link for every listing page to be crawled."""
    num = int(input("Please enter the number of page numbers crawled to:"))
    for i in range(2, num + 1):
        u = "/index_" + str(i) + ".htm"
        urls.append(u)
    return urls


if __name__ == "__main__":
    uinfo = []
    left_url = ""
    urls = ["/"]
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.90 Safari/537.36"
    }
    start = time.time()
    # 1. Create a folder to store the data
    path = CreateFolder()
    # 2. Determine the number of pages to crawl and build a link for each page
    PageNumurl(urls)
    n = int(input("Starting page of the visit:"))
    for i in urls[n - 1:]:
        # 3. Get the listing-page text for each page
        soup = getHTMLText(i, headers)
        # 4. Collect the links to the pages where the original images live
        page_list = fillUnivList(uinfo, soup)
        # 5. Download the original images
        DownloadPicture(left_url, page_list, path)
    print("All downloads complete!", "A total of " + str(len(os.listdir(path))) + " pictures.")
    end = time.time()
    print("Total time consumed: " + str(end - start) + " seconds.")
```
Run the program. Some of the results are shown below:
That is the whole of this example of crawling a wallpaper site with Python. For more on crawling websites with Python, please check out my other related articles!