Environment: Python 2.7 + Windows 10
Tools: Fiddler, Postman, an Android emulator
First of all, open Fiddler. Fiddler serves as the HTTP/HTTPS capture tool here; I won't introduce it in detail.
Configure it to decrypt HTTPS traffic.
Configure it to allow remote connections, i.e. turn on its HTTP proxy.
Computer IP: 192.168.1.110
Then make sure the phone and the computer are on the same LAN and can reach each other. Since I don't have an Android phone at hand, I used an Android emulator instead; it works the same way.
Open the mobile browser and go to 192.168.1.110:8888 (the proxy address) and install Fiddler's certificate so that HTTPS traffic can be captured.
After installing the certificate, edit the network in the WiFi settings to manually specify the HTTP proxy (host 192.168.1.110, port 8888).
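As a side note, once Fiddler's proxy is listening you can also route a Python script's own traffic through it for inspection. A minimal sketch, assuming the same 192.168.1.110:8888 address as above (Python 3's `urllib` shown for brevity):

```python
import urllib.request

# Build an opener that sends all traffic through the Fiddler proxy.
# The address is the one from this setup; adjust it to your machine.
proxy = urllib.request.ProxyHandler({
    "http": "http://192.168.1.110:8888",
    "https": "http://192.168.1.110:8888",
})
opener = urllib.request.build_opener(proxy)
# opener.open("http://example.com")  # requests now show up in Fiddler
```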
After saving, Fiddler can capture the app's traffic. Open the app and refresh, and a long list of HTTP requests comes in; the feed interface address is easy to spot. It is an HTTP POST request and the response is JSON; expanding it shows a total of 20 video entries. To make sure it is correct, pick one video link and open it.
OK, it plays, it's clean, and there are no watermarks.
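Before writing the full script, it helps to see how the response JSON is walked. A minimal sketch with a trimmed stand-in body (the field names `feeds`, `caption`, `photo_id`, and `main_mv_urls` are the ones visible in the captured response):

```python
import json

# A trimmed stand-in for the captured response body
body = '''{"feeds": [
  {"caption": "demo", "photo_id": "abc123",
   "main_mv_urls": [{"url": "http://example.com/v.mp4"}]}
]}'''

result = json.loads(body)
videos = [(f["caption"], f["photo_id"], f["main_mv_urls"][0]["url"])
          for f in result["feeds"]]
print(videos[0][2])  # the direct, watermark-free mp4 link
```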
So now open Postman and replay this POST to see which parameters are actually validated.
There are quite a few parameters, and I assumed client_key and sig would be checked server-side... but it turns out I was wrong: nothing is validated, so I just submitted the request as-is.
Submitting as form-data results in an error.
Switching to raw gives a different error message, so try adding the captured headers (User-Agent: kwai-android, Content-Type: application/x-www-form-urlencoded).
Nice, it returns data. I tried a few more times and found that each call returns a different set of 20 videos even though the POST carries a fixed page parameter; it behaves like the endless pull-to-refresh feed on the phone rather than real pagination. Anyway, it doesn't matter, as long as the returned data isn't duplicated.
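The request that finally worked is just a form-encoded POST body plus those two headers. A minimal sketch of how the body is built (Python 3's `urllib.parse` shown for illustration; the parameter values are the ones captured above):

```python
from urllib.parse import urlencode

params = {
    "type": 7, "page": 2, "coldStart": "false", "count": 20,
    "pv": "false", "id": 5, "refreshTimes": 4, "pcursor": 1,
    "os": "android", "client_key": "3c2cd3f3",
    "sig": "22769f2f5c0045381203fc57d1b5ad9b",
}
headers = {
    "User-Agent": "kwai-android",
    "Content-Type": "application/x-www-form-urlencoded",
}

body = urlencode(params)      # a raw urlencoded body, not multipart form-data
print("page=2" in body)       # True
```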
Here's the code.
# -*- coding: utf-8 -*-
# author : Corleone
import urllib2, urllib
import json, os, re, socket, time, sys
import Queue
import threading
import logging

# Log module
logger = logging.getLogger("AppName")
formatter = logging.Formatter('%(asctime)s %(levelname)-5s: %(message)s')
console_handler = logging.StreamHandler(sys.stdout)
console_handler.formatter = formatter
logger.addHandler(console_handler)
logger.setLevel(logging.INFO)

video_q = Queue.Queue()  # Video queue

def get_video():
    url = "http://101.251.217.210/rest/n/feed/hot?app=0&lon=121.372027&c=BOYA_BAIDU_PINZHUAN&sys=ANDROID_4.1.2&mod=HUAWEI(HUAWEI%20C8813Q)&did=ANDROID_e0e0ef947bbbc243&ver=5.4&net=WIFI&country_code=cn&iuid=&appver=5.4.7.5559&max_memory=128&oc=BOYA_BAIDU_PINZHUAN&ftt=&ud=0&language=zh-cn&lat=31.319303"
    data = {
        'type': 7,
        'page': 2,
        'coldStart': 'false',
        'count': 20,
        'pv': 'false',
        'id': 5,
        'refreshTimes': 4,
        'pcursor': 1,
        'os': 'android',
        'client_key': '3c2cd3f3',
        'sig': '22769f2f5c0045381203fc57d1b5ad9b'
    }
    req = urllib2.Request(url)
    req.add_header("User-Agent", "kwai-android")
    req.add_header("Content-Type", "application/x-www-form-urlencoded")
    params = urllib.urlencode(data)
    try:
        html = urllib2.urlopen(req, params).read()
    except urllib2.URLError:
        logger.warning(u"Network is unstable. Retrying.")
        html = urllib2.urlopen(req, params).read()
    result = json.loads(html)
    reg = re.compile(u"[\u4e00-\u9fa5]+")  # Match Chinese characters only
    for x in result['feeds']:
        try:
            title = x['caption'].replace("\n", "")
            name = " ".join(reg.findall(title))
            video_q.put([name, x['photo_id'], x['main_mv_urls'][0]['url']])
        except KeyError:
            pass

def download(video_q):
    path = u"D:\\kuaishou"  # download directory
    while True:
        data = video_q.get()
        name = data[0].replace("\n", "")
        id = data[1]
        url = data[2]
        file = os.path.join(path, name + ".mp4")
        logger.info(u"Downloading: %s" % name)
        try:
            urllib.urlretrieve(url, file)
        except IOError:
            # The caption may make an invalid filename -- fall back to the photo id
            file = os.path.join(path, u"%s.mp4" % id)
            try:
                urllib.urlretrieve(url, file)
            except IOError:
                logger.info(u"Request dropped. Sleeping for 2 seconds.")
                time.sleep(2)
                urllib.urlretrieve(url, file)
        logger.info(u"Download complete: %s" % name)
        video_q.task_done()

def main():
    # Usage help
    try:
        threads = int(sys.argv[1])
    except (IndexError, ValueError):
        print u"\nusage: " + sys.argv[0] + u" [number of threads, e.g. 10]\n"
        print u"example: " + sys.argv[0] + u" 10  -- crawl with 10 threads, about 2000 videos per run"
        return False
    # Make sure the download directory exists
    if os.path.exists(u"D:\\kuaishou") == False:
        os.makedirs(u"D:\\kuaishou")
    # Fetch the video list
    logger.info(u"Crawling the feed.")
    for x in range(1, 100):
        logger.info(u"Request %s." % x)
        get_video()
    num = video_q.qsize()
    logger.info(u"%s videos in total" % num)
    # Multi-threaded download
    for y in range(threads):
        t = threading.Thread(target=download, args=(video_q,))
        t.setDaemon(True)
        t.start()
    video_q.join()
    logger.info(u"----------- All done ---------------")

main()
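One detail worth calling out from the script: captions can contain emoji and punctuation that are illegal in Windows filenames, so it keeps only Chinese characters when building a name. Isolated, the trick looks like this:

```python
import re

# Keep only CJK characters from a caption so it is safe as a filename
reg = re.compile(u"[\u4e00-\u9fa5]+")

def safe_name(caption):
    return " ".join(reg.findall(caption))

print(safe_name(u"\u4f60\u597d world! \u4e16\u754c"))  # -> 你好 世界
```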
Tested below.
The multi-threaded download fetches about 2000 videos per run; by default they are saved to D:\kuaishou.
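The download half of the script is the standard queue-plus-daemon-workers idiom. Stripped of the network calls, it reduces to this sketch (Python 3 module names; the download step is replaced by a list append):

```python
import queue
import threading

video_q = queue.Queue()
done = []

def download(q):
    # Worker: pull items forever; task_done() lets q.join() track progress
    while True:
        name, url = q.get()
        done.append(name)          # stand-in for urlretrieve(url, file)
        q.task_done()

for i in range(5):
    video_q.put(("video%d" % i, "http://example.com/%d.mp4" % i))

for _ in range(3):                 # three worker threads
    t = threading.Thread(target=download, args=(video_q,))
    t.daemon = True                # let the process exit when the queue drains
    t.start()

video_q.join()                     # block until every item is processed
print(len(done))                   # 5
```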
Okay, that's it for this one. It's actually quite simple; I can't believe Kuaishou doesn't sign or encrypt anything, because when I crawled Douyin I ran into problems.....
Summary
That's my brief introduction to crawling and multi-threaded downloading of Kuaishou videos with a Python crawler. I hope it helps; if you have any questions, leave me a message and I will reply in time. Thanks a lot for your support of my website!