Continuing with crawlers: today I am posting a script that downloads the original (full-size) images under the "Beauty" tag on dot com.
# -*- coding: utf-8 -*-
#---------------------------------------
# Program: Dot Pretty Picture Crawler
# Version: 0.2
# By zippera
# Date: 2013-07-26
# Language: Python 2.7
# Description: Can set the number of pages to download
#---------------------------------------
import urllib2
import urllib
import re

# Capture the image URL on the line right after each feed-big-img div
pat = re.compile('<div class="feed-big-img">\n.*?imgsrc="(ht.*?)\".*?')
# The site's host is missing from this URL as posted; prepend it before running
nexturl1 = "/tag/%E7%BE%8E%E5%A5%B3?page="
count = 1
while count < 2:  # raise the upper bound to download more pages
    print "Page " + str(count) + "\n"
    myurl = nexturl1 + str(count)
    myres = urllib2.urlopen(myurl)
    mypage = myres.read()
    ucpage = mypage.decode("utf-8")  # Transcoding
    mat = pat.findall(ucpage)
    if len(mat):
        cnt = 1
        for item in mat:
            print "Page" + str(count) + " No." + str(cnt) + " url: " + item + "\n"
            cnt += 1
            # File name: the 10 characters before the extension, plus the extension
            fnp = re.compile('(\w{10}\.\w+)$')
            fnr = fnp.findall(item)
            if fnr:
                fname = fnr[0]
                urllib.urlretrieve(item, fname)
    else:
        print "no data"
    count += 1
How to use: create a new folder, save the code there as a .py file, and run it with Python 2.7 from inside that folder. The images are downloaded into the same folder.
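If you want to see what the two patterns in the script (`pat` and `fnp`) actually match before hitting the network, here is a small offline sketch. The HTML fragment, the example.com URL, and the helper names `extract_image_urls` and `filename_for` are made up for illustration; the real page markup may differ.

```python
import re

# The same two patterns as in the script.
IMG_PAT = re.compile('<div class="feed-big-img">\n.*?imgsrc="(ht.*?)\".*?')
NAME_PAT = re.compile(r'(\w{10}\.\w+)$')


def extract_image_urls(html):
    """Return the imgsrc URL found on the line after each feed-big-img div."""
    return IMG_PAT.findall(html)


def filename_for(url):
    """Return the last 10 word characters plus extension, or None if no match."""
    found = NAME_PAT.findall(url)
    return found[0] if found else None


# Hypothetical page fragment, invented for this example only.
sample = (
    '<div class="feed-big-img">\n'
    '<a href="#"><img imgsrc="http://example.com/pics/abcdefghij.jpg" /></a>\n'
    '</div>\n'
)

urls = extract_image_urls(sample)
for url in urls:
    print("%s -> %s" % (url, filename_for(url)))
```

Note that `.` does not match a newline, so `pat` only finds an image whose `imgsrc` attribute sits on the line directly after the `div`. Likewise, `fnp` needs at least 10 word characters before the extension, and URLs that do not fit are skipped rather than downloaded.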