SoFunction
Updated on 2024-11-12

Using Python to find QQ screenshot files uploaded to a blog

When I wrote blog posts in the past, I didn't pay attention to picture file names. Some pictures were captured with QQ's screenshot tool, so their file names looked like QQ screenshot 20120926174732-300×. Yesterday, while backing up my website files over FTP, I noticed that these Chinese names showed up garbled in FlashFXP, which was hard to look at. So I wrote a small Python script that crawls the entire site, collects the picture names on each article page, and checks whether each one resembles the QQ screenshot 20120926174732-300× form. For every match it prints the picture address and saves it, together with the address of the corresponding article, to a file, which I then work through to fix the names one by one.

Okay, here's the program code:

import urllib2
from bs4 import BeautifulSoup
import re
import sys
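# Python 2 only: make UTF-8 the default encoding so str() works on Chinese file names
reload(sys)
sys.setdefaultencoding('utf-8')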
 
baseurl = "https:///"
# Note: the starting address is the address of the first post to check. From that post's page,
# the BeautifulSoup module can be used to get the address of the previous post.
 
file = open(r"E:\","a")
 
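def pageloop(url):
  # Download the page and collect every <img> tag on it
  page = urllib2.urlopen(url)
  soup = BeautifulSoup(page)
  img = soup.findAll(['img'])
  if img == []:
    print "There are no pictures on the current page."
    return
  else:
    for myimg in img:
      link = myimg.get('src')
      print link
      
      # Match picture names of the form "QQ...png" left by the QQ screenshot tool
      pattern = re.compile(r'QQ\S*[0-9]*png')
      badimg = pattern.findall(str(link))
      if badimg:
        print url
        # Save the picture address, then the address of the article that uses it
        file.write(link + "\n")
        file.write(url+"\n")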
      
 
 
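def getthenextpage(url):
  # Check the current post, then follow its "previous post" link recursively
  pageloop(url)
  page = urllib2.urlopen(url)
  soup = BeautifulSoup(page)
  for spanclass in soup.findAll(attrs={"class" : "article-nav-prev"}):
    #print spanclass
    if str(spanclass).find('article-nav-prev') != -1:
      # Pull the address of the previous post out of the navigation element
      pattern = re.compile(r'https:///\S*html')
      pageurl = pattern.findall(str(spanclass))
      for i in pageurl:
        #print i
        getthenextpage(i)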
       
      
 
getthenextpage(baseurl)
 
 
 
print "the end!"
file.close()

Finally, a word to anyone who is just starting to build a site: it is best to name pictures using digits, English, or Pinyin, because renaming them later is a lot of trouble. Develop good habits from the very beginning and name your articles and pictures according to a proper naming convention, and things will go much more smoothly.