A detailed walkthrough of downloading Douyin big-V videos with Python

The following article is from Python Seven, by somenzz.

Last time I wrote about batch downloading Zhihu videos with Python. This time I'll share how to batch download all the watermark-free videos from a Douyin personal homepage with Python. The focus of this article is not to hand over a finished script but to explain how to write one. As the saying goes, it is better to teach someone to fish than to give them a fish, and crawlers all follow basically the same routine.

Approach

First, the idea. To download videos in a batch, first get a single download working and make sure the result has no watermark, then write a loop to download the rest.

The difficulty: downloading one video is simple, but downloading many is slightly more complicated because you need the url of every video. Douyin has anti-crawling measures: the video list on a personal homepage is visible only in the mobile app, not on the desktop web page. So you need to capture the phone's https packets, and BurpSuite is used for that here.

Since BurpSuite is used here, I have put the BurpSuite 2.1.06 Professional build I normally use on a net disk; reply "burp" to the "Python Seven" public account to get it. After downloading, run start_burp.bat or sh start_burp.sh to start it with one click, with no license purchase needed.

Crawling a single video

Find a Douyin video, tap Share, and copy the link. Open the link in a browser on your computer, then open the developer tools and select the Network tab.
Refresh the page, look through the requests, and find the one whose response contains the playback address.

The response contains a play_addr field, which in turn contains a url_list. Copy url_list[0] and open it in the browser: the site redirects to the real playback address, and a download button appears.

I downloaded this video and found that it has a watermark. How do you download it without one? A search online turned up the method: change playwm to play in url_list[0] above.
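As a minimal sketch of that substitution: the helper below takes one item of the interface's JSON response, picks the first entry of play_addr's url_list, and swaps playwm for play. The function name and the example.com address are made up for illustration; only the field names and the playwm/play swap come from the article.

```python
def no_watermark_url(item: dict) -> str:
    """Return the first playback URL of an item with 'playwm' replaced by 'play'."""
    first_url = item["video"]["play_addr"]["url_list"][0]
    return first_url.replace("playwm", "play")


# Hypothetical response fragment; a real response has many more fields.
sample_item = {
    "desc": "demo title",
    "video": {
        "play_addr": {
            "url_list": ["https://example.com/aweme/v1/playwm/?video_id=abc"]
        }
    },
}

print(no_watermark_url(sample_item))
```

Whether the rewritten address still serves a watermark-free file depends on the site and may change over time, so it is worth checking one download by hand first.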

Now write the code to get url_list[0] and download the video:

import re

import requests


def get(share_url) -> list:
    """
    share_url -> Douyin video share URL
    Return format: [{'url': '', 'title': '', 'format': ''}, ...]
    """
    data = []
    headers = {
        'accept': 'application/json',
        'user-agent': 'Mozilla/5.0 (iPhone; CPU iPhone OS 11_0 like Mac OS X) AppleWebKit/604.1.38 (KHTML, like Gecko) Version/11.0 Mobile/15A372 Safari/604.1'
    }
    # NOTE: the scheme and host of this endpoint are not given here; supply them before running.
    api = "/web/api/v2/aweme/iteminfo/?item_ids={item_id}"

    # Request the share link; the final URL (after redirects) contains the video id.
    rep = requests.get(share_url, headers=headers, timeout=10)
    if rep.ok:
        # item_id
        item_id = re.findall(r'video/(\d+)', rep.url)
        if item_id:
            item_id = item_id[0]
            # video info
            rep = requests.get(api.format(item_id=item_id), headers=headers, timeout=10)
            if rep.ok and rep.json()["status_code"] == 0:
                info = rep.json()["item_list"][0]
                tmp = {}
                tmp["title"] = info["desc"]

                # de-watermarked video link
                play_url = info["video"]["play_addr"]["url_list"][0].replace('playwm', 'play')
                tmp["url"] = play_url
                tmp["format"] = 'mp4'
                data.append(tmp)

    return data


if __name__ == '__main__':
    # NOTE: the scheme and host of the share URL are likewise not given here.
    videos = get('/share/video/6920538027345415431/?region=&mid=6920538030852885262&u_code=48&titleType=title&did=0&iid=0')
    for video in videos:
        # download() is the helper from the earlier Zhihu article (not repeated here)
        download(video['url'], video['title'], video['format'], './download')

The download function here is the same as the one in the previous article on downloading Zhihu videos, so its code is not repeated.
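For readers who do not have the earlier article at hand, here is a rough stand-in, not the author's original function. It assumes the same four arguments (url, title, format, target directory) and uses the standard library's urllib instead of requests so that it runs without extra packages. The names download, safe_filename, and MOBILE_UA are chosen for this sketch.

```python
import os
import re
import shutil
import urllib.request

# Same mobile user agent as in the script above.
MOBILE_UA = (
    "Mozilla/5.0 (iPhone; CPU iPhone OS 11_0 like Mac OS X) "
    "AppleWebKit/604.1.38 (KHTML, like Gecko) Version/11.0 "
    "Mobile/15A372 Safari/604.1"
)


def safe_filename(title: str, fmt: str) -> str:
    """Replace characters that are awkward in file names with underscores."""
    cleaned = re.sub(r'[\\/:*?"<>|\s]+', "_", title).strip("_")
    return f"{cleaned or 'untitled'}.{fmt}"


def download(url: str, title: str, fmt: str, save_dir: str) -> str:
    """Stream the content at `url` into save_dir and return the saved path."""
    os.makedirs(save_dir, exist_ok=True)
    path = os.path.join(save_dir, safe_filename(title, fmt))
    request = urllib.request.Request(url, headers={"User-Agent": MOBILE_UA})
    # Copy in chunks so a large video is never held in memory all at once.
    with urllib.request.urlopen(request, timeout=30) as response, open(path, "wb") as out:
        shutil.copyfileobj(response, out)
    return path
```

Video titles often contain spaces, emoji, or slashes, which is why the title is cleaned before it is used as a file name.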

Getting the video links from a personal homepage

The steps so far give a watermark-free download of a single Douyin video. All that remains is to find a large number of such links and loop over them.
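The loop itself can be sketched as below. This is an illustration under assumptions rather than the author's code: batch_download is a hypothetical name, and the two callables stand for the get function above and a download helper, passed in as parameters so the sketch runs on its own. One failing link is recorded and skipped instead of stopping the whole batch.

```python
def batch_download(share_urls, fetch, save, save_dir="./download"):
    """Fetch video info for every share URL and save each video.

    fetch(share_url) -> list of {'url', 'title', 'format'} dicts
    save(url, title, fmt, save_dir) -> stores one video
    Returns (saved_titles, failures).
    """
    saved, failed = [], []
    for share_url in share_urls:
        try:
            for video in fetch(share_url):
                save(video["url"], video["title"], video["format"], save_dir)
                saved.append(video["title"])
        except Exception as exc:  # keep going when one link fails
            failed.append((share_url, str(exc)))
    return saved, failed
```

With the functions from this article it would be called as batch_download(list_of_share_urls, get, download).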

Open a big V's personal homepage, tap Share, copy the link, and open it in a browser: not a single video is shown, while the Douyin app shows them all.

[Screenshot: the personal homepage in a browser]

[Screenshot: the personal homepage in the Douyin app]

This means Douyin restricts browsers from seeing the information for multiple videos. So you need to learn to capture packets from the mobile app, see how the phone issues its http requests, and then simulate them in a program.

I've been using BurpSuite (Burp for short), which works very well, so I'll also share how to use it here:

1. Run Burp

After downloading, run start_burp.bat or sh start_burp.sh to start Burp, then open the proxy settings and bind the listener to the IP of the machine running Burp.

Be careful not to set the IP to 127.0.0.1. With that setting only requests from the local machine can use the proxy, and the phone cannot connect to it.
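If you are unsure which address to bind, one common trick is to ask the operating system which local address it would use for outbound traffic. The sketch below is only a convenience and not part of Burp; lan_ip is a hypothetical helper, and the probe address is arbitrary because a UDP connect sends no packets.

```python
import socket


def lan_ip(probe_host: str = "10.255.255.255", probe_port: int = 1) -> str:
    """Best-effort guess of this machine's LAN address."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # connect() on a UDP socket only selects a route; nothing is sent.
        sock.connect((probe_host, probe_port))
        return sock.getsockname()[0]
    except OSError:
        # No usable route: fall back to loopback, which the phone cannot reach.
        return "127.0.0.1"
    finally:
        sock.close()


print(lan_ip())
```

If this prints 127.0.0.1, the machine has no usable network route, and the phone will not be able to reach the proxy until that is fixed.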

2. Set the proxy on the phone

If the phone and the computer are on the same Wi-Fi, the steps on an iPhone are: go to Settings -> WLAN, tap the information icon to the right of that Wi-Fi network, scroll down, tap Configure Proxy, and enter the same IP and port that BurpSuite is listening on. The settings on an Android phone are similar. BurpSuite can now capture the phone's http traffic.

3. Download Burp's certificate on the phone and trust it

Open http://burp in the phone's browser and click CA to download the certificate. Then go to Settings -> General -> Profiles -> PortSwigger CA and install it. Finally, go to Settings -> General -> About -> Certificate Trust Settings and turn on trust for BurpSuite's certificate.

After this, the https packets sent from the phone can be captured.

4. Turn on BurpSuite's intercept

Once intercept is on, requests from the phone are held here. You can forward a request unchanged, modify the packet and then forward it, or send it to Repeater to replay it later. This is also why requests coming from a front end can never be trusted.

Now open the Douyin app on the phone. A large number of requests will be held here; forward them and the data in the app loads step by step. Navigate quickly to the video list on the big V's personal homepage and send the request issued at that point to Repeater.

Then open BurpSuite's Repeater tab to see the request that was just sent. Replay it, look at the returned data, and decide whether this is the interface we need.

This interface turns out to meet the need. Here you can see the interface's url and the parameters in its headers. The User-Agent header is an important marker that tells the server whether the client is a browser or the app. With this information you can write code to simulate the request and obtain the links needed for batch download.
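One way to turn a captured request into code is to paste its raw text from Repeater and parse it. The sketch below assumes a plain GET request and an https endpoint. The sample request, host, and app user agent are invented placeholders, and parse_raw_request and build_request are hypothetical helper names. It uses the standard library's urllib and only builds the request object without sending it.

```python
import urllib.request

# Placeholder for text copied from Repeater; every value here is made up.
RAW_REQUEST = """\
GET /some/path?user_id=123&count=21 HTTP/1.1
Host: example.com
User-Agent: HypotheticalApp/1.0 (iPhone; iOS 14.0)
Accept: application/json
"""


def parse_raw_request(raw: str):
    """Split a raw HTTP request into (method, url, headers)."""
    lines = raw.strip().splitlines()
    method, target, _version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        if not line.strip():
            break  # a blank line ends the header section
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    url = "https://" + headers["Host"] + target
    return method, url, headers


def build_request(raw: str) -> urllib.request.Request:
    """Build an unsent urllib request that mirrors the captured one."""
    method, url, headers = parse_raw_request(raw)
    headers.pop("Host", None)  # urllib derives Host from the URL
    return urllib.request.Request(url, headers=headers, method=method)


request = build_request(RAW_REQUEST)
print(request.get_method(), request.full_url)
```

Passing the result to urllib.request.urlopen would replay it. Real app requests may carry signed or short-lived parameters, so a replay can stop working after a while.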

The url carries many parameters; some are fixed, while others change with each person's homepage. If the script is only for your own use, you can simply extract the video links from the response with a regular expression and then download them in a batch.
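A minimal sketch of that regular-expression shortcut, assuming the response body has been saved as text from Repeater. The pattern, the "/play" filter, and the function name are illustrative choices, not something the interface guarantees. JSON bodies often escape slashes as \/, so those are undone before matching.

```python
import re

# Matches http(s) URLs up to the next quote, whitespace, or angle bracket.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+")


def extract_play_urls(text: str, keyword: str = "/play") -> list:
    """Return the unique URLs in `text` that contain `keyword`, in order."""
    text = text.replace("\\/", "/")  # undo JSON-style escaped slashes
    found = []
    for url in URL_PATTERN.findall(text):
        if keyword in url and url not in found:
            found.append(url)
    return found
```

The returned list can then be fed to the download loop, after the same playwm to play substitution if the captured links carry a watermark.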

Writing a good script for others to use takes more work. You need to look at more APIs to work out how each parameter in the url and headers is obtained or generated, and then automate that process in the script. In some cases this also involves encryption, obfuscation, and other anti-crawling measures. I won't expand on that here; interested readers can explore it on their own.

Final words

The key to crawling a video is finding its playback address. With the playback address you can download the video in a browser without writing any code. Finding the playback address is not enough, though; you also need to consider whether the watermark can be removed. For batch downloads you need to know how to get the links of many videos. If they cannot be crawled from the browser, consider capturing the phone's traffic with BurpSuite and then extracting the interface data or simulating the phone's requests. For anyone doing crawler work, BurpSuite is a Swiss army knife and very practical.

If this article was helpful to you, please like or share it. Thanks for the support.
