Introduction
I remember writing a few video downloaders for China University MOOC (icourse163) a long time ago, but they seem to have stopped working. I happened to need one recently, so I rewrote it and am sharing it here. Over the winter vacation you can use it to download some courses and get ahead on your studies:
Without further ado, let's get started~
Development Tools
Python version: 3.7.8
Related modules:
DecryptLogin module;
tqdm module;
click module;
argparse module;
and some modules from Python's standard library.
Environment Setup
Install Python, add it to the PATH environment variable, and then pip install the modules listed above.
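To see which of the modules listed above are still missing before the first run, a small check like the one below can help. This is a minimal sketch that is not part of the original tool; it assumes the import names are `DecryptLogin`, `tqdm` and `click` (argparse ships with Python).

```python
import importlib.util


def missing_modules(names):
    """Return the names from `names` that cannot be found as importable modules."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == '__main__':
    # Modules used by the downloader (argparse is part of the standard library)
    required = ['DecryptLogin', 'tqdm', 'click', 'argparse']
    missing = missing_modules(required)
    if missing:
        print('Missing modules, install them with pip:', ' '.join(missing))
    else:
        print('All required modules are installed.')
```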
First Look
How to run:
python <script>.py --url <course URL>
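The course link is passed through a `--url` option. A minimal sketch of such an entry point using argparse (one of the modules listed above) could look like this; `build_parser` and `main` are hypothetical names, and the body of `main` is only a placeholder for the real login-and-download logic.

```python
import argparse


def build_parser():
    """Build a command-line parser with a required --url option (hypothetical sketch)."""
    parser = argparse.ArgumentParser(description='China University MOOC course downloader')
    parser.add_argument('--url', required=True, help='course homepage URL')
    return parser


def main(argv=None):
    """Parse arguments; argv=None makes argparse read sys.argv."""
    args = build_parser().parse_args(argv)
    # Placeholder: the real tool would log in here and download args.url
    print(f'Course URL: {args.url}')
    return args.url
```

Calling `main()` with no arguments reads `sys.argv`, so the command shown above would work once `main()` is wired up as the script's entry point.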
The result looks like this:
[Screenshot: moocdl]
I picked a course at random for testing. Its videos turned out to be in m3u8 format, so the download was a bit slow. By default, the tool also downloads all the courseware and other materials into the corresponding directories.
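The slowness comes from the format: an m3u8 file is an HLS playlist that lists many short media segments, and each one has to be fetched separately and joined. A minimal sketch of the first step, turning a playlist into absolute segment URLs, is shown below. It ignores encryption (`#EXT-X-KEY`) and master playlists, and `segment_urls` is a hypothetical helper, not the tool's own `m3u8download`.

```python
from urllib.parse import urljoin


def segment_urls(playlist_text, playlist_url):
    """Return absolute URLs of the media segments listed in an m3u8 playlist.

    Lines starting with '#' are tags or comments; every other non-empty line
    is a segment URI, which may be relative to the playlist's own URL.
    """
    urls = []
    for line in playlist_text.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            continue
        urls.append(urljoin(playlist_url, line))
    return urls


sample = """#EXTM3U
#EXT-X-TARGETDURATION:10
#EXTINF:10.0,
seg0.ts
#EXTINF:10.0,
seg1.ts
#EXT-X-ENDLIST
"""
print(segment_urls(sample, 'https://example.com/video/index.m3u8'))
# ['https://example.com/video/seg0.ts', 'https://example.com/video/seg1.ts']
```

For an unencrypted stream, downloading these segments in order and concatenating their bytes yields the full video, which is why a long lecture means hundreds of small requests.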
How It Works
First, we need to simulate logging in to China University MOOC so that we can download the course materials. Here we simply use DecryptLogin, the open-source package I released earlier:
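# Assumes an older DecryptLogin release that exposes login.Login(); the interface may differ in newer versions.
# The module is aliased so it is not confused with the login() method below.
from DecryptLogin import login as decrypt_login

# Method of the downloader class
def login(self, username, password):
    """Log in to China University MOOC and return (infos_return, session)."""
    lg = decrypt_login.Login()
    infos_return, session = lg.icourse163(username, password)
    return infos_return, session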
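The login result is needed by every later request: `infos_return['results']['mob-token']` goes into each form, and the session carries the cookies. One way to keep both on the downloader object is sketched below. The class name `MoocDownloader`, the injected `login_func` parameter, the `mob_token` property and the `form` helper are all hypothetical, and a stub stands in for DecryptLogin so the example runs offline.

```python
class MoocDownloader:
    """Hypothetical skeleton that stores the login result for later requests."""

    def __init__(self, username, password, login_func):
        # login_func is expected to return (infos_return, session),
        # like the login() method above
        self.infos_return, self.session = login_func(username, password)

    @property
    def mob_token(self):
        # Token that the mobile API endpoints expect in every form
        return self.infos_return['results']['mob-token']

    def form(self, **fields):
        """Build request form data that always includes the mob-token."""
        return {'mob-token': self.mob_token, **fields}


# Offline usage with a stub instead of a real login
def fake_login(username, password):
    return {'results': {'mob-token': 'TOKEN'}}, object()


dl = MoocDownloader('user', 'pass', fake_login)
print(dl.form(tid='123'))  # {'mob-token': 'TOKEN', 'tid': '123'}
```

Injecting the login function keeps the class testable without network access; in the real tool it would simply call the DecryptLogin-based `login()`.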
Next, a brief explanation of how to download the course materials. First, we need some basic information about the course, which can be found directly in the page returned when you open the course homepage:
The code that extracts the course information we need is as follows:
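import re

# Get information from the course homepage
# (self.session is assumed to be the session returned by login();
#  host names in the URLs were lost from the original listing and are left as-is)
url = url.replace('learn/', 'course/')
response = self.session.get(url)
term_id = re.findall(r'termId : "(\d+)"', response.text)[0]
course_name = ' - '.join(re.findall(r'name:"(.+)"', response.text))
course_name = sanitize_filename(course_name)  # hypothetical helper: strips characters not allowed in file names
course_id = re.findall(r'https?:///(course|learn)/\w+-(\d+)', url)[0][1]  # findall returns (type, id) tuples
print(f'The information obtained from the course homepage is as follows:\n\t[course name]: {course_name}, [course ID]: {course_id}, [TID]: {term_id}')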
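Because the extraction is just a URL rewrite plus a few regular expressions, it can be pulled into a pure function and checked against a hand-written page snippet. In the sketch below, the function names, the file-name sanitizing rule, the host-agnostic course-ID pattern, and the sample page and `example.org` URL are all made up for illustration. The patterns assume the homepage still embeds `termId : "..."` and `name:"..."` the way the code above expects.

```python
import re

BAD_CHARS = r'[\\/:*?"<>|\r\n]'  # characters not allowed in Windows file names


def sanitize(name):
    """Replace characters that cannot appear in file names (hypothetical rule)."""
    return re.sub(BAD_CHARS, '_', name).strip()


def parse_course_page(html, course_url):
    """Extract (course_name, course_id, term_id) from a course homepage."""
    term_id = re.findall(r'termId : "(\d+)"', html)[0]
    course_name = sanitize(' - '.join(re.findall(r'name:"(.+)"', html)))
    # Host-agnostic variant of the course-ID pattern; group 2 is the numeric ID
    course_id = re.findall(r'https?://[^/]+/(course|learn)/\w+-(\d+)', course_url)[0][1]
    return course_name, course_id, term_id


page = 'window.termDto = { termId : "1001", name:"Python Basics" };'
course_url = 'https://example.org/learn/ABC-12345'.replace('learn/', 'course/')
print(parse_course_page(page, course_url))  # ('Python Basics', '12345', '1001')
```

Note that `re.findall` returns a tuple per match when the pattern has several groups, which is why the ID is taken with `[0][1]`.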
This information is then used to fetch the course's resource list:
import json
import os

# Get the resource list
# (host names in the URLs were lost from the original listing and are left as-is;
#  sanitize_filename is a hypothetical helper that strips characters not allowed in file names)
resource_list = []
data = {
    'tid': term_id,
    'mob-token': self.infos_return['results']['mob-token'],
}
response = self.session.post('https:///mob/course/courseLearn/v1', data=data)
course_info = response.json()
file_types = [1, 3, 4]  # contentType: 1 = video, 3 = PDF courseware, 4 = rich text / attachment
chapters = course_info.get('results', {}).get('termDto', {}).get('chapters', [])
for chapter in chapters:
    for lesson in chapter.get('lessons') or []:
        for unit in lesson.get('units', []):
            if unit['contentType'] not in file_types:
                continue
            # One directory per course / chapter / lesson / unit
            savedir = course_name
            for item in [sanitize_filename(chapter['name']), sanitize_filename(lesson['name']), sanitize_filename(unit['name'])]:
                savedir = os.path.join(savedir, item)
            os.makedirs(savedir, exist_ok=True)
            if unit['contentType'] == file_types[0]:
                resource_list.append({
                    'savedir': savedir,
                    'savename': sanitize_filename(unit['name']) + '.mp4',
                    'type': 'video',
                    'contentId': unit['contentId'],
                    'id': unit['id'],
                })
            elif unit['contentType'] == file_types[1]:
                resource_list.append({
                    'savedir': savedir,
                    'savename': sanitize_filename(unit['name']) + '.pdf',
                    'type': 'pdf',
                    'contentId': unit['contentId'],
                    'id': unit['id'],
                })
            elif unit['contentType'] == file_types[2] and unit.get('jsonContent'):
                json_content = json.loads(unit['jsonContent'])  # safer than eval()
                resource_list.append({
                    'savedir': savedir,
                    'savename': sanitize_filename(json_content['fileName']),
                    'type': 'rich_text',
                    'jsonContent': json_content,
                })
print(f'Successfully fetched the resource list: {len(resource_list)} items')
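The nested loop walks chapters, then lessons, then units, and keeps only three content types. Separating the walk from the HTTP call makes it easy to see what it produces. The sketch below is a pure-function restatement of that loop; `build_resource_list` is a hypothetical name, directory creation and name sanitizing are left out, and the small `sample` dict is invented data shaped like the fields the code reads, not a real API response.

```python
import json
import os

# contentType -> (resource type, file extension); type 4 takes its name from jsonContent
TYPES = {1: ('video', '.mp4'), 3: ('pdf', '.pdf')}


def build_resource_list(course_info, course_name):
    """Flatten chapters/lessons/units into a list of downloadable resources."""
    resources = []
    chapters = course_info.get('results', {}).get('termDto', {}).get('chapters', [])
    for chapter in chapters:
        for lesson in chapter.get('lessons') or []:
            for unit in lesson.get('units') or []:
                ctype = unit['contentType']
                savedir = os.path.join(course_name, chapter['name'], lesson['name'], unit['name'])
                if ctype in TYPES:
                    kind, ext = TYPES[ctype]
                    resources.append({'savedir': savedir, 'savename': unit['name'] + ext,
                                      'type': kind, 'contentId': unit['contentId'], 'id': unit['id']})
                elif ctype == 4 and unit.get('jsonContent'):
                    content = json.loads(unit['jsonContent'])
                    resources.append({'savedir': savedir, 'savename': content['fileName'],
                                      'type': 'rich_text', 'jsonContent': content})
    return resources


sample = {'results': {'termDto': {'chapters': [{
    'name': 'Ch1', 'lessons': [{'name': 'L1', 'units': [
        {'name': 'Intro', 'contentType': 1, 'contentId': 11, 'id': 101},
        {'name': 'Slides', 'contentType': 3, 'contentId': 12, 'id': 102},
        {'name': 'Quiz', 'contentType': 5, 'contentId': 13, 'id': 103},
        {'name': 'Notes', 'contentType': 4, 'jsonContent': '{"fileName": "notes.docx"}'},
    ]}]}]}}}
for r in build_resource_list(sample, 'Course'):
    print(r['type'], r['savename'])
```

Here the quiz unit (content type 5) is skipped, so the loop prints the video, the PDF and the rich-text attachment.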
Finally, download each resource according to its type:
import os
import random
import time
from urllib.parse import urlencode

from tqdm import tqdm

# Download the resources
# (host names and some paths in the URLs were lost from the original listing and are left as-is;
#  self.download_file is a hypothetical name for the original plain-download helper)
mob_token = self.infos_return['results']['mob-token']
pbar = tqdm(resource_list)
for resource in pbar:
    pbar.set_description(f'downloading {resource["savename"]}')
    # -- Download video
    if resource['type'] == 'video':
        data = {
            'bizType': '1',
            'mob-token': mob_token,
            'bizId': resource['id'],
            'contentType': '1',
        }
        # Poll until the server returns the video signature
        while True:
            response = self.session.post('https:///mob/j/v1/', data=data)
            if response.json()['results'] is not None:
                break
            time.sleep(0.5 + random.random())
        signature = response.json()['results']['videoSignDto']['signature']
        data = {
            'enVersion': '1',
            'clientType': '2',
            'mob-token': mob_token,
            'signature': signature,
            'videoId': resource['contentId'],
        }
        response = self.session.post('./mob/api/v1/vod/videoByNative', data=data)
        video_info = response.json()['results']['videoInfo']
        # ---- download video: prefer quality 3, then 2, then 1
        video_url = None
        for resolution in [3, 2, 1]:
            video_url = next((v['videoUrl'] for v in video_info['videos'] if v['quality'] == resolution), None)
            if video_url is not None:
                break
        if video_url is None:
            continue  # no playable video found for this unit
        target = {'download_url': video_url, 'savedir': resource['savedir'], 'savename': resource['savename']}
        if '.m3u8' in video_url:
            self.m3u8download(target)
        else:
            self.download_file(target)
        # ---- download subtitles
        for srt_item in video_info.get('srtCaptions') or []:
            srt_name = os.path.splitext(resource['savename'])[0] + '_' + srt_item['languageCode'] + '.srt'
            response = self.session.get(srt_item['url'])
            with open(os.path.join(resource['savedir'], srt_name), 'wb') as fp:
                fp.write(response.content)
    # -- Download PDF
    elif resource['type'] == 'pdf':
        data = {
            't': '3',
            'cid': resource['contentId'],
            'unitId': resource['id'],
            'mob-token': mob_token,
        }
        response = self.session.post('http:///mob/course/learn/v1', data=data)
        pdf_url = response.json()['results']['learnInfo']['textOrigUrl']
        self.download_file({'download_url': pdf_url, 'savedir': resource['savedir'], 'savename': resource['savename']})
    # -- Download rich text
    elif resource['type'] == 'rich_text':
        download_url = 'http:///mob/course/?' + urlencode(resource['jsonContent'])
        self.download_file({'download_url': download_url, 'savedir': resource['savedir'], 'savename': resource['savename']})
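Two pieces of the video branch are worth isolating: polling the signature endpoint until it returns a result, and choosing the best available quality. The sketch below restates both as standalone functions. `poll_until` and `pick_video_url` are hypothetical names, the `max_tries` cap is an addition (the loop above retries forever), and the sample `videos` list is invented for illustration.

```python
import random
import time


def poll_until(fetch, is_ready, max_tries=10, base_delay=0.5):
    """Call fetch() until is_ready(result) is true, sleeping a jittered delay between tries."""
    for _ in range(max_tries):
        result = fetch()
        if is_ready(result):
            return result
        time.sleep(base_delay + random.random())
    raise TimeoutError(f'no result after {max_tries} attempts')


def pick_video_url(videos, preference=(3, 2, 1)):
    """Return the videoUrl of the first preferred quality that is available, else None."""
    by_quality = {}
    for v in videos:
        # Keep the first URL seen for each quality, like the nested loop above
        by_quality.setdefault(v['quality'], v['videoUrl'])
    for quality in preference:
        if quality in by_quality:
            return by_quality[quality]
    return None


videos = [{'quality': 1, 'videoUrl': 'low.mp4'}, {'quality': 2, 'videoUrl': 'mid.m3u8'}]
print(pick_video_url(videos))  # mid.m3u8 (quality 3 is missing, so 2 wins)

answers = iter([{'results': None}, {'results': {'ok': True}}])
print(poll_until(lambda: next(answers), lambda r: r['results'] is not None, base_delay=0))
# {'results': {'ok': True}}
```

The random jitter spreads out retries so repeated requests do not hit the server at a fixed rhythm; the cap turns a stuck endpoint into a clear error instead of an endless loop.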
OK, that's it. This write-up is brief because I have other things to do tonight. The endpoints above come from the mobile app, so you can try capturing its network traffic on your phone yourself; it's very simple~
That concludes this article on building a MOOC course downloader with Python.