SoFunction
Updated on 2024-11-17

Batch downloading Youku videos with Python

Some time ago I needed to collect video data, so I wrote a program that downloads Youku videos in batches. It is simple but quite practical, so I am sharing it here.

Version: Python 2.7 + BeautifulSoup 3.2.1

```python
# -*- coding: utf-8 -*-
import urllib, urllib2, os
from BeautifulSoup import BeautifulSoup
import itertools, re

url_i = 1
pic_num = 1

# Self-defined quote conversion: replace each ASCII quote (' or ") with
# alternating Chinese quotes, because " is not allowed in Windows file names
def _en_to_cn(s):
    obj = itertools.cycle([u'“', u'”'])
    _obj = lambda match: obj.next()
    return re.sub(r"['\"]", _obj, s)

if __name__ == '__main__':
    # Download videos from 3 consecutive listing pages
    while url_i <= 3:
        # NOTE: the host part of this URL is missing from the original listing
        webContent = urllib2.urlopen("/focus/index/_page26716_" + str(url_i) + ".html")
        data = webContent.read()
        # Parse the video listing page with BeautifulSoup
        soup = BeautifulSoup(data)
        print "-------------------------Page " + str(url_i) + "-------------------------"
        # Get the thumbnail entries and title entries of the current page
        tag_list_thumb = soup.findAll('li', 'v_thumb')
        tag_list = soup.findAll('li', 'v_title')
        for item in tag_list:
            # Follow the href of each title link to the video playback page
            web_video_play = urllib2.urlopen(item.a['href'])
            data_vp = web_video_play.read()
            # Parse the playback page with BeautifulSoup
            soup_vp = BeautifulSoup(data_vp)
            # Find the "Download" link
            tag_vp_list = soup_vp.findAll('a', id='fn_download')
            for item_vp in tag_vp_list:
                # Save the download link, wrapped in quotes for the shell, to url_dw
                url_dw = '"' + item_vp['_href'] + '"'
                print item.a['title'] + ": " + url_dw
                # Call iku on the command line to download the video;
                # iku must be added to the PATH environment variable
                os.system("iku " + url_dw)
        # Save the thumbnail of each video
        for item_thumb in tag_list_thumb:
            urllib.urlretrieve(item_thumb.img['src'],
                               "E:\\Download video\\thumbnails\\" + str(pic_num) + "." +
                               _en_to_cn(item_thumb.img['title']) + ".jpg")
            pic_num += 1
        print "--------------------------------------------------------------"
        print "--------Page " + str(url_i) + "'s video thumbnails have been saved!"
        url_i += 1
```
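The quote conversion exists because video titles are used as thumbnail file names, and a straight double quote is not a legal character in a Windows file name. Below is a minimal Python 3 sketch of the same idea, not the original code. The names to_curly_quotes and thumbnail_path are hypothetical. Building the path with ntpath.join instead of string concatenation is my own choice; ntpath produces Windows-style paths on any platform.

```python
import itertools
import ntpath
import re


def to_curly_quotes(text):
    """Replace each ASCII quote (' or ") with alternating Chinese quotes."""
    quotes = itertools.cycle(["\u201c", "\u201d"])  # opening, then closing, repeating
    return re.sub(r"['\"]", lambda _match: next(quotes), text)


def thumbnail_path(base_dir, pic_num, title):
    """Build a Windows path of the form <base_dir>\\<pic_num>.<title>.jpg (hypothetical helper)."""
    file_name = "{}.{}.jpg".format(pic_num, to_curly_quotes(title))
    return ntpath.join(base_dir, file_name)


if __name__ == "__main__":
    print(to_curly_quotes('The "Best" clip'))
    print(thumbnail_path("E:\\Download video\\thumbnails", 1, 'The "Best" clip'))
```

A fresh cycle is created on every call, so each title starts with an opening quote. This mirrors the original, where the cycle is also created inside the function.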

The idea behind the program is simple. It parses the listing page to find the link to each video's playback page, then parses that playback page to find the download link, as shown in the following figure:
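The two-level lookup can also be sketched without BeautifulSoup, using only Python 3's html.parser. The markup assumed here is inferred from the selectors in the program above: an a tag inside an li with class v_title on the listing page, and an a tag with id fn_download carrying a _href attribute on the playback page. Youku's real pages may differ. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class TitleLinkParser(HTMLParser):
    """Collect (href, title) of <a> tags inside <li class="v_title"> on a listing page."""

    def __init__(self):
        super().__init__()
        self.in_title_li = False
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "li":
            classes = (attrs.get("class") or "").split()
            self.in_title_li = "v_title" in classes
        elif tag == "a" and self.in_title_li and attrs.get("href"):
            self.links.append((attrs["href"], attrs.get("title") or ""))

    def handle_endtag(self, tag):
        if tag == "li":
            self.in_title_li = False


class DownloadLinkParser(HTMLParser):
    """Collect the _href attribute of <a id="fn_download"> on a playback page."""

    def __init__(self):
        super().__init__()
        self.download_links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("id") == "fn_download" and attrs.get("_href"):
            self.download_links.append(attrs["_href"])


def parse_listing(html):
    """Step 1: listing page -> list of (playback page URL, title)."""
    parser = TitleLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links


def parse_download_links(html):
    """Step 2: playback page -> list of download links."""
    parser = DownloadLinkParser()
    parser.feed(html)
    parser.close()
    return parser.download_links


if __name__ == "__main__":
    listing = ('<ul><li class="v_thumb"><img src="t.jpg"></li>'
               '<li class="v_title"><a href="/v/1" title="Clip one">Clip one</a></li></ul>')
    print(parse_listing(listing))
```

Fetching the pages themselves is left out of the sketch so that the parsing logic can be tried on plain strings.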

The download links taken from the page data can only be downloaded with Youku's own client, iku. Working this out took me a while, but by luck I found that iku's command line is very simple: just run iku download_link. So the easiest approach is to call iku from Python through the command line. Also note that iku must already be running before you start the program; otherwise you will have to start it again after a video has been downloaded.
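The program builds a quoted string and passes it to os.system. Below is a minimal Python 3 sketch of the same call that uses subprocess with an argument list instead, so the link needs no manual quoting. This is a swapped-in technique, not the author's code. It assumes, as the article does, that iku accepts the download link as its only argument and is on PATH. The helper names and the injectable runner parameter, which exists so the call can be faked, are hypothetical.

```python
import shutil
import subprocess


def iku_available():
    """Return True if an executable named iku can be found on PATH."""
    return shutil.which("iku") is not None


def build_iku_command(download_link):
    """Argument list for 'iku <download_link>'; no shell, so no quoting needed."""
    return ["iku", download_link]


def download_with_iku(download_link, runner=subprocess.call):
    """Run iku for one link and return its exit code (runner is hypothetical, for testing)."""
    return runner(build_iku_command(download_link))


if __name__ == "__main__":
    if iku_available():
        download_with_iku("http://example.com/video-link")  # placeholder link
    else:
        print("iku is not on PATH")
```

As the article notes, iku itself should already be running before downloads start.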

PS: While downloading, you will notice that these domestic video pages are not very carefully built. They contain many duplicate links and dead links, which I find a little annoying.
