If you come across a Douyin vlogger whose videos you are particularly interested in and want to download all of them, how do you do it? The following describes how to use Python to export the information of all videos posted by a specific user.
Packet analysis
Tool used: Chrome Developer Tools
In the Douyin app, copy the vlogger's profile share link, for example `/kGcU4y/`. Then open Chrome on a PC, turn on device emulation in Developer Tools to simulate a phone (here we choose iPhone), paste the profile link into the address bar and visit it; the page redirects to `/share/user/110677980134`.
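If you would rather script this step, the numeric user ID can be read from the path the share link redirects to. A minimal sketch, assuming the redirect target always contains `/share/user/<digits>` as in the example above; `user_id_from_url` is a hypothetical helper name:
```python
import re
from typing import Optional
from urllib.parse import urlparse

# Assumption: the profile URL path has the form /share/user/<numeric id>.
_SHARE_USER_RE = re.compile(r"/share/user/(\d+)")


def user_id_from_url(url: str) -> Optional[str]:
    """Return the numeric user ID from a profile URL or path, or None if absent."""
    path = urlparse(url).path  # ignore any query string or fragment
    match = _SHARE_USER_RE.search(path)
    return match.group(1) if match else None


print(user_id_from_url("/share/user/110677980134"))  # 110677980134
```
If the share link performs an ordinary HTTP redirect, the final URL is available as `requests.get(share_link).url` (requests follows redirects for GET by default) and can be passed straight to this helper.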
Scroll down the profile page, open the Network => XHR tab, and you will see a request like this:
```text
:authority:
:method: GET
:path: /web/api/v2/aweme/post/?user_id=110677980134&sec_uid=&count=21&max_cursor=1561112910000&aid=1128&_signature=3Xf-nxAQgGfUO4SKisB.&dytk=061ae6e81229e178146aa674327eba89
:scheme: https
accept: application/json
accept-encoding: gzip, deflate, br
accept-language: zh-CN,zh;q=0.9,en;q=0.8,ja;q=0.7,zh-TW;q=0.6,da;q=0.5
cookie: tt_webid=6690145457198417412; _ga=GA1.2.605400954.1557670882; _ba=BA0.2-20181226-5199e-GIJXgXk9ajNkyFhmv7Wy; _gid=GA1.2.1914501522.1562857517
referer: /share/user/110677980134
user-agent: Mozilla/5.0 (iPhone; CPU iPhone OS 11_0 like Mac OS X) AppleWebKit/604.1.38 (KHTML, like Gecko) Version/11.0 Mobile/15A372 Safari/604.1
x-requested-with: XMLHttpRequest
```
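To replay this request outside the browser, the captured headers can be copied into a dict. This is a sketch that assumes the mobile Safari user-agent and the headers below are enough; the cookie is left out, and `MOBILE_HEADERS` and `dict_merge` are illustrative names (the article's own code uses `CHROME_HEADER` and a `dict_merge` helper whose definitions are not shown):
```python
from typing import Dict

# Assumption: these headers, copied from the captured request, are enough to
# replay it. HTTP/2 pseudo-headers such as :method and :path are not sent as
# ordinary headers, and the cookie is left out here.
MOBILE_HEADERS: Dict[str, str] = {
    "Accept": "application/json",
    "Accept-Language": "zh-CN,zh;q=0.9,en;q=0.8,ja;q=0.7,zh-TW;q=0.6,da;q=0.5",
    "User-Agent": (
        "Mozilla/5.0 (iPhone; CPU iPhone OS 11_0 like Mac OS X) "
        "AppleWebKit/604.1.38 (KHTML, like Gecko) Version/11.0 "
        "Mobile/15A372 Safari/604.1"
    ),
    "X-Requested-With": "XMLHttpRequest",
}


def dict_merge(base: Dict[str, str], extra: Dict[str, str]) -> Dict[str, str]:
    """Return a new dict where keys in extra override those in base."""
    merged = dict(base)
    merged.update(extra)
    return merged
```
Keys should use one consistent capitalization, because a plain dict treats `Accept` and `accept` as different keys even though HTTP does not.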
(Screenshot of the returned data)
Analyzing the URL of the Ajax request `/web/api/v2/aweme/post/?user_id=110677980134&sec_uid=&count=21&max_cursor=1559299764000&aid=1128&_signature=3Xf-nxAQgGfUO4SKisB.&dytk=061ae6e81229e178146aa674327eba89`, the main request parameters are:
Field | Type | Description |
---|---|---|
user_id | int | Douyin user ID |
count | int | Number of items returned per request; use the default value of 21 |
max_cursor | int | Pagination cursor; each request passes the max_cursor returned by the previous request |
aid | int | Use the default value of 1128 |
_signature | string | Request signature, required on each request |
dytk | string | Token required on each request |
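The table can be checked against the captured request by parsing its query string with the standard library's `urllib.parse`. This sketch also shows that the next page only changes `max_cursor`; the new cursor value is the one that appears in the second captured URL:
```python
from urllib.parse import parse_qs, urlencode, urlsplit

captured = (
    "/web/api/v2/aweme/post/?user_id=110677980134&sec_uid=&count=21"
    "&max_cursor=1561112910000&aid=1128"
    "&_signature=3Xf-nxAQgGfUO4SKisB.&dytk=061ae6e81229e178146aa674327eba89"
)

parts = urlsplit(captured)
# keep_blank_values=True keeps the empty sec_uid parameter
params = {k: v[0] for k, v in parse_qs(parts.query, keep_blank_values=True).items()}
print(params["user_id"], params["count"], params["aid"])  # 110677980134 21 1128

# The next page reuses the same parameters with a new cursor.
next_params = dict(params, max_cursor="1559299764000")
next_path = parts.path + "?" + urlencode(next_params)
```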
How to obtain the parameters:
`dytk`: the profile page `/share/user/110677980134` contains the following inline script:
```javascript
(function() {
    $(function(){
        __M.require('douyin_falcon:page/reflow_user/index').init({
            uid: "110677980134",
            dytk: '061ae6e81229e178146aa674327eba89'
        });
    });
})();
```
so `dytk` can be extracted from the page HTML with a regular expression.
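A minimal regex sketch, assuming the profile page embeds the values exactly as in the script above; `extract_uid_and_dytk` is a hypothetical helper name:
```python
import re
from typing import Optional, Tuple

# Assumption: the page embeds the values as in the inline script above,
# e.g. uid: "110677980134", dytk: '061ae6e8...'. \b keeps "uid" from matching
# inside a longer key such as "sec_uid".
_UID_RE = re.compile(r"""\buid:\s*["'](\d+)["']""")
_DYTK_RE = re.compile(r"""\bdytk:\s*["']([0-9a-fA-F]+)["']""")


def extract_uid_and_dytk(html: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (uid, dytk) found in the profile page HTML; None where absent."""
    uid = _UID_RE.search(html)
    dytk = _DYTK_RE.search(html)
    return (uid.group(1) if uid else None, dytk.group(1) if dytk else None)
```
In practice the HTML would come from fetching the profile page, for example with `requests.get(profile_url, headers=...).text`.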
- `_signature` is harder to obtain. Douyin obfuscates and minifies its front-end JS, so analyzing the signing algorithm directly is difficult; instead, you can execute the signing code itself and take the signature it returns.
- The JS can be executed with Node.js or with Selenium WebDriver. Selenium WebDriver is recommended here: the signature computed in the Node.js environment differs from the one computed in the browser and cannot be verified, whereas Selenium WebDriver drives a local browser, so the signature it computes matches the one the browser computes when visiting the page directly.
- After formatting the JS code, you can see that the signature is produced by calling the JS method `_bytedAcrawler.sign("110677980134")` with the user ID.
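A minimal sketch of the Selenium approach. It assumes the driver has already loaded the profile page, so the page's own script has defined `_bytedAcrawler`; `compute_signature` is a hypothetical helper, and it accepts any object with Selenium WebDriver's `execute_script(script, *args)` method so it can be exercised without a browser:
```python
from typing import Any, Protocol


class ScriptRunner(Protocol):
    """Anything with Selenium WebDriver's execute_script(script, *args) method."""

    def execute_script(self, script: str, *args: Any) -> Any: ...


# Selenium exposes extra execute_script arguments to the JS as arguments[0], [1], ...
SIGN_JS = "return _bytedAcrawler.sign(arguments[0]);"


def compute_signature(driver: ScriptRunner, user_id: Any) -> str:
    """Call the page's own signing function for user_id and return the result."""
    sign = driver.execute_script(SIGN_JS, str(user_id))
    if not sign:
        raise RuntimeError("signing script returned no value; is the profile page loaded?")
    return str(sign)
```
With Selenium installed, usage would be roughly `driver = webdriver.Chrome()`, `driver.get(<profile page URL>)`, then `compute_signature(driver, "110677980134")`. Passing the user ID through `arguments[0]` rather than formatting it into the script string avoids quoting problems.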
Code to export the profile's video list:
```python
import json

import requests


def get_user_video_list_by_uid(user_id, cursor=0):
    url = '/web/api/v2/aweme/post/?'
    # signature() is the author's helper: it returns the _signature and dytk values
    sign, dytk = signature(user_id)
    tk_logger.info("sign:%s,dytk:%s" % (sign, dytk))
    if sign is None or dytk is None:
        tk_logger.warning("sign [%s] or dytk [%s] is none" % (sign, dytk))
        return None
    headers = dict_merge(CHROME_HEADER, {
        "Accept": "application/json",
        "X-Requested-With": "XMLHttpRequest",
    })
    params = {
        "user_id": user_id,
        "count": "21",
        "max_cursor": cursor,  # 0 for the first page, then the previous response's max_cursor
        "aid": "1128",
        "_signature": sign,
        "dytk": dytk,
    }
    res = requests.get(url, headers=headers, params=params)
    tk_logger.info("request url: %s" % res.url)
    content = res.content.decode("utf8")
    jsn = json.loads(content)
    return jsn
```
(Screenshot of the retrieved video list)
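Since each request takes the `max_cursor` returned by the previous one, collecting the whole list is a loop over pages. A sketch, assuming the JSON response carries the videos in `aweme_list` and the paging state in `max_cursor` and `has_more` (these field names are assumptions to check against the returned data); `fetch_all_videos` is a hypothetical helper, and `fetch_page` stands in for a function such as `get_user_video_list_by_uid`:
```python
from typing import Any, Callable, Dict, List, Optional

Page = Dict[str, Any]


def fetch_all_videos(fetch_page: Callable[[int], Optional[Page]],
                     max_pages: int = 1000) -> List[Any]:
    """Follow max_cursor from page to page until has_more is false."""
    videos: List[Any] = []
    cursor = 0  # the first request starts at cursor 0
    for _ in range(max_pages):  # hard stop in case the cursor never advances
        page = fetch_page(cursor)
        if not page:
            break
        videos.extend(page.get("aweme_list") or [])
        next_cursor = page.get("max_cursor")
        if not page.get("has_more") or next_cursor in (None, cursor):
            break
        cursor = next_cursor
    return videos
```
Usage would look like `fetch_all_videos(lambda c: get_user_video_list_by_uid("110677980134", c))`. Stopping when the cursor does not change guards against looping forever on a misbehaving response.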
Code snippet to get video details:
```python
import requests


def get_video_detail_by_id(video_id):
    url = "/aweme/v1/aweme/detail/?version_code=6.5.0&pass-region=1&pass-route=1&js_sdk_version=1.16.2.7&app_name=aweme&vid=9D5F078E-A1A9-4F64-81C7-F89CA6A3B1DC&app_version=6.5.0&device_id=34712926793&channel=App%20Store&mcc_mnc=46011&aid=1128&screen_width=750&openudid=263bd93f02801d126ca004edccbff8f6e1b19f51&os_api=18&ac=WIFI&os_version=12.3.1&device_platform=iphone&build_number=65014&device_type=iPhone9,1&iid=74239983401&idfa=F39B285A-4B4F-4874-9D7E-C728A892BF6D"
    data = {"aweme_id": video_id}
    # Headers captured from the iOS app; token and cookie values belong to the capturing session
    headers = {
        "sdk-version": "1",
        "x-Tt-Token": "00fc1e7950db67b5f43a312e9265cdfee513ea70c36d918c871f3bb553347f3db50ffca143b8722327b345816a75efca071d",
        "User-Agent": "Aweme 6.5.0 rv:65014 (iPhone; iOS 12.3.1; en_CN) Cronet",
        "Content-Type": "application/x-www-form-urlencoded",
        "Cookie": "tt_webid=6636348554880222728; __tea_sdk__user_unique_id=6636348554880222728; odin_tt=76d9b82d6e6f2ddfc99719a5b5d44a7d703cf977f0f7bddf8537f93920d57cb9ec33162ee47868b760f6b09e69209bb2f90bad220b75678af850a0dfa9f056e2; install_id=74239983401; ttreq=1$dab0516952a4157c0c11d4993533c09d6e45fc94; sid_guard=fc1e7950db67b5f43a312e9265cdfee5%7C1559955316%7C5184000%7CWed%2C+07-Aug-2019+00%3A55%3A16+GMT; uid_tt=0afcb06309f632d872799ec0ac3b2c80; sid_tt=fc1e7950db67b5f43a312e9265cdfee5; sessionid=fc1e7950db67b5f43a312e9265cdfee5",
        "X-Khronos": "1559956401",
        "X-Gorgon": "8300000000002e40eee38cad71d14037bd1385d18bc973f094f5",
    }
    ret = {}
    # Form-encoded body, so the request is sent as a POST
    res = requests.post(url, data=data, headers=headers)
    if res.status_code == 200:
        # tk_logger.info("video detail raw:%s" % res.content.decode("utf8"))
        jsn = res.json()
        detail = jsn.get("aweme_detail", {})
        video_info = get_video_info(detail)
        user_info = get_user_info(detail)
        play_addr = get_play_address(detail)
        video_cover = get_video_cover(detail)
        ret["video_info"] = video_info
        ret["user_info"] = user_info
        ret["play_addr"] = play_addr
        ret["video_cover"] = video_cover
    else:
        raise TKException("get video detail failed [%s][%d]" % (url, res.status_code))
    return ret
```
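The helpers `get_video_info`, `get_user_info`, `get_play_address` and `get_video_cover` are not shown in the article. Judging from how the download code reads the result (`play_addr` then `url_list`, `user_info` then `uid`, `video_info` then `statistics` then `aweme_id`), they could look like the following sketch. The layout of `aweme_detail` assumed here (`desc`, `statistics`, `author`, `video.play_addr`, `video.cover`) is an assumption to verify against a real response:
```python
from typing import Any, Dict

Json = Dict[str, Any]


def get_video_info(detail: Json) -> Json:
    # Assumed: description and statistics (which carries aweme_id) are top-level keys.
    return {"desc": detail.get("desc", ""), "statistics": detail.get("statistics", {})}


def get_user_info(detail: Json) -> Json:
    # Assumed: the uploader's profile is under "author".
    author = detail.get("author", {})
    return {"uid": author.get("uid"), "nickname": author.get("nickname")}


def get_play_address(detail: Json) -> Json:
    # Assumed: video.play_addr is a dict holding a url_list of candidate URLs.
    return detail.get("video", {}).get("play_addr", {})


def get_video_cover(detail: Json) -> Json:
    # Assumed: video.cover has the same url_list structure as play_addr.
    return detail.get("video", {}).get("cover", {})
```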
Code snippet to download the video:
```python
import os

import requests


def download_video(detail):
    # detail is the dict returned by get_video_detail_by_id()
    url = detail.get("play_addr", {}).get("url_list", [])
    if len(url) == 0:
        raise TKException("cannot get video url list [%s]" % detail)
    url = url[0]
    folder = os.path.join(DOWNLOAD_DIR, str(detail.get('user_info', {}).get("uid", "unknown")))
    if not os.path.exists(folder):
        os.makedirs(folder)
    video_id = detail.get('video_info', {}).get('statistics', {}).get('aweme_id')
    # filename = "%s/%s" % (folder, detail.get("video_info", {}).get("desc", video_id) + ".mp4")
    filename = os.path.join(folder, "%s.mp4" % video_id)
    tk_logger.info("download video %s" % url)
    if os.path.exists(filename):
        # Skip the download if the local file already has the remote size
        file_size = get_remote_file_size(url)
        if file_size == os.path.getsize(filename):
            tk_logger.info("file already downloaded, skip ...")
            return
        else:
            tk_logger.info("download file, file size:%d" % file_size)
    # stream=True lets iter_content read the body in chunks instead of all at once
    res = requests.get(url, headers=IOS_HEADER, stream=True)
    if res.status_code == 200:
        with open(filename, "wb") as fp:
            for chunk in res.iter_content(chunk_size=1024):
                fp.write(chunk)
    else:
        raise TKException("download video [%s] failed [%d]" % (url, res.status_code))


# Usage
detail = get_video_detail_by_id(video_id)
download_video(detail)
```
(Screenshot of the downloaded videos)
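`get_remote_file_size`, used by the download code to skip files that are already complete, is also not shown. One standard-library way to write it is a HEAD request that reads `Content-Length`. This is a sketch assuming the video server answers HEAD requests and reports a length; it returns -1 otherwise, so the size comparison fails and the file is simply downloaded again:
```python
import urllib.request
from typing import Dict, Optional


def get_remote_file_size(url: str, headers: Optional[Dict[str, str]] = None,
                         timeout: float = 10.0) -> int:
    """Return the Content-Length reported for url, or -1 if it cannot be determined."""
    req = urllib.request.Request(url, headers=headers or {}, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            length = resp.headers.get("Content-Length")
    except OSError:  # URLError, HTTPError and socket timeouts all subclass OSError
        return -1
    if length is None or not length.isdigit():
        return -1
    return int(length)
```
Passing the same headers as the download request (for example the `IOS_HEADER` used above) keeps the HEAD request consistent with the GET that follows.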
Summary
The above is a brief introduction to crawling Douyin video list information with Python. I hope it helps; if you have any questions, please leave me a message and I will reply promptly. Many thanks for your support of this website!
If you find this article helpful, feel free to reprint it, but please credit the source. Thank you!