Making a MOOC Open Course Downloader with Python

Introduction

A long time ago I wrote a few video downloaders for Chinese University MOOC, but they seem to have fallen into disrepair. I happened to need one again recently, so I rewrote it and am sharing it here. You can also use it over the winter vacation to download a few courses and get ahead on your studies:

Without further ado, let's get started~

Development Tools

Python version: 3.7.8

Related modules:

DecryptLogin module;

tqdm module;

click module;

argparse module;

and some modules from Python's standard library.

Environment Setup

Install Python, add it to your PATH environment variable, and pip install the modules listed above.

A Quick Preview

How to run:

python  --url Course Links

The result looks like this:

[Screenshot: moocdl]

The course I picked at random for testing turned out to serve its videos in m3u8 format, so the download was a bit slow. By default, the courseware and other materials are also downloaded into the corresponding directories.
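An m3u8 link points to a playlist of many small segments rather than a single file, which is why such downloads take longer. As a rough illustration (not the downloader's own m3u8download code), the sketch below turns a playlist into a list of absolute segment URLs using only the standard library. The function name `list_segments`, the sample playlist and the example.com URL are all made up for this example.

```python
from urllib.parse import urljoin


def list_segments(playlist_text, playlist_url):
    """Return absolute URLs of the media segments listed in an m3u8 playlist."""
    segments = []
    for line in playlist_text.splitlines():
        line = line.strip()
        # Blank lines and lines starting with '#' are tags or comments, not segments.
        if not line or line.startswith('#'):
            continue
        # Segment URIs may be relative to the playlist location.
        segments.append(urljoin(playlist_url, line))
    return segments


# Hypothetical playlist, for illustration only.
SAMPLE_PLAYLIST = """#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:10
#EXTINF:10.0,
seg_000.ts
#EXTINF:10.0,
seg_001.ts
#EXT-X-ENDLIST
"""

if __name__ == '__main__':
    for segment_url in list_segments(SAMPLE_PLAYLIST, 'https://example.com/video/index.m3u8'):
        print(segment_url)
```

Each segment then has to be fetched and appended to the output file in order, so a long lecture can easily mean hundreds of requests.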

How It Works

First, we need to simulate logging in to Chinese University MOOC so that we can download the course materials. The DecryptLogin package that was open-sourced earlier takes care of this:

# requires: from DecryptLogin import login
def login(self, username, password):
    '''Login'''
    lg = login.Login()
    infos_return, session = lg.icourse163(username, password)
    return infos_return, session

Next, a brief look at how the course materials are downloaded. First we need some basic information about the course, which can be found directly in the page returned when you open the course homepage:

The code that extracts the course information we need looks like this:

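# Get information from the main course page
# (needs `import re`; self.session is the session returned by login();
#  the host name inside the URL regex was lost from the source text and is left as it appears)
url = url.replace('learn/', 'course/')
response = self.session.get(url)
term_id = re.findall(r'termId : "(\d+)"', response.text)[0]
course_name = ' - '.join(re.findall(r'name:"(.+)"', response.text))
# filterBadCharacter: helper name assumed; it strips characters that are invalid in file names
course_name = self.filterBadCharacter(course_name)
# With two groups, findall returns (group1, group2) tuples; the ID is the second group
course_id = re.findall(r'https?:///(course|learn)/\w+-(\d+)', url)[0][1]
print(f'The information obtained from the main course page is as follows:\n\t[course name]: {course_name}, [course ID]: {course_id}, [TID]: {term_id}')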
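To see what those regular expressions pick up, here is a small self-contained sketch that runs the same patterns against a page fragment. The fragment, the example.com URL and the function name `parse_course_page` are invented for illustration; the real page may be laid out differently.

```python
import re


def parse_course_page(html, url):
    """Extract (term_id, course_name, course_id) from page text and course URL."""
    term_id = re.findall(r'termId : "(\d+)"', html)[0]
    course_name = ' - '.join(re.findall(r'name:"(.+)"', html))
    # The course ID is the number after the hyphen in the last URL path segment.
    match = re.search(r'/(?:course|learn)/\w+-(\d+)', url)
    course_id = match.group(1) if match else None
    return term_id, course_name, course_id


# Hypothetical page fragment and URL, for illustration only.
SAMPLE_HTML = '''
window.courseDto = {
name:"Intro to Python",
};
window.termDto = {
termId : "1234567890",
};
'''
SAMPLE_URL = 'https://example.com/course/ABC-1001'

if __name__ == '__main__':
    print(parse_course_page(SAMPLE_HTML, SAMPLE_URL))
```

Using a non-capturing group `(?:course|learn)` keeps the course ID as the only captured group, so there is no tuple to unpack.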

This information is then used to fetch the corresponding resource list:

# Get a list of resources
# (needs `import os` and `import json`; the host name in the URL was lost
#  from the source text and is left as it appears)
resource_list = []
data = {
    'tid': term_id,
    'mob-token': self.infos_return['results']['mob-token'],
}
response = self.session.post('https:///mob/course/courseLearn/v1', data=data)
course_info = response.json()
# contentType values handled below: 1 = video, 3 = PDF, 4 = rich text
file_types = [1, 3, 4]
for chapter_num, chapter in enumerate(course_info.get('results', {}).get('termDto', {}).get('chapters', [])):
    # 'lessons' may be present but null, hence `or []`
    for lesson_num, lesson in enumerate(chapter.get('lessons') or []):
        for unit_num, unit in enumerate(lesson.get('units', [])):
            if unit['contentType'] not in file_types: continue
            # Build <course>/<chapter>/<lesson>/<unit>, creating each level as we go
            # (filterBadCharacter: helper name assumed, strips invalid file-name characters)
            savedir = course_name
            os.makedirs(savedir, exist_ok=True)
            for item in [self.filterBadCharacter(chapter['name']), self.filterBadCharacter(lesson['name']), self.filterBadCharacter(unit['name'])]:
                savedir = os.path.join(savedir, item)
                os.makedirs(savedir, exist_ok=True)
            if unit['contentType'] == file_types[0]:
                savename = self.filterBadCharacter(unit['name']) + '.mp4'
                resource_list.append({
                    'savedir': savedir,
                    'savename': savename,
                    'type': 'video',
                    'contentId': unit['contentId'],
                    'id': unit['id'],
                })
            elif unit['contentType'] == file_types[1]:
                savename = self.filterBadCharacter(unit['name']) + '.pdf'
                resource_list.append({
                    'savedir': savedir,
                    'savename': savename,
                    'type': 'pdf',
                    'contentId': unit['contentId'],
                    'id': unit['id'],
                })
            elif unit['contentType'] == file_types[2]:
                if unit.get('jsonContent'):
                    # json.loads instead of eval: never execute text received from the server
                    json_content = json.loads(unit['jsonContent'])
                    savename = self.filterBadCharacter(json_content['fileName'])
                    resource_list.append({
                        'savedir': savedir,
                        'savename': savename,
                        'type': 'rich_text',
                        'jsonContent': json_content,
                    })
print(f'Successfully fetched the resource list: {len(resource_list)} items')
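Chapter, lesson and unit names end up as directory and file names, so they have to be cleaned before use. The following is a minimal sketch of such a cleaning step, assuming the goal is simply to replace characters that Windows and most file systems reject. `sanitize_name` and `build_savedir` are hypothetical names for this example, not the downloader's own helpers.

```python
import os
import re


def sanitize_name(name):
    """Replace characters that are unsafe in file or directory names."""
    # Path separators, Windows-reserved characters and control whitespace.
    cleaned = re.sub(r'[\\/:*?"<>|\r\n\t]', '_', name)
    # Trailing spaces and dots also cause trouble on Windows.
    return cleaned.strip().rstrip('.') or 'untitled'


def build_savedir(course_name, chapter, lesson, unit):
    """Join the sanitized names into course/chapter/lesson/unit."""
    parts = [sanitize_name(p) for p in (course_name, chapter, lesson, unit)]
    return os.path.join(*parts)


if __name__ == '__main__':
    print(build_savedir('Intro to Python', '1. Basics', '1.1 What/Why?', 'Lecture: Hello'))
```

Creating the directory is then a single `os.makedirs(path, exist_ok=True)` call, which also creates any missing parent levels.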

Finally, download each resource according to its type:

# Download the corresponding resources
# (needs `import os, time, random`, `from tqdm import tqdm` and `from urllib.parse import urlencode`;
#  URL hosts and some paths were lost from the source text and are left as they appear)
pbar = tqdm(resource_list)
for resource in pbar:
    pbar.set_description(f'downloading {resource["savename"]}')
    # --Download video
    if resource['type'] == 'video':
        data = {
            'bizType': '1',
            'mob-token': self.infos_return['results']['mob-token'],
            'bizId': resource['id'],
            'contentType': '1',
        }
        # Poll until the server returns a result, sleeping a randomized interval between tries
        while True:
            response = self.session.post('https:///mob/j/v1/', data=data)
            if response.json()['results'] is not None: break
            time.sleep(0.5 + random.random())
        signature = response.json()['results']['videoSignDto']['signature']
        data = {
            'enVersion': '1',
            'clientType': '2',
            'mob-token': self.infos_return['results']['mob-token'],
            'signature': signature,
            'videoId': resource['contentId'],
        }
        response = self.session.post('./mob/api/v1/vod/videoByNative', data=data)
        # ---- download video
        videos = response.json()['results']['videoInfo']['videos']
        # Try quality levels in the order 3, 2, 1 and take the first one available
        resolutions, video_url = [3, 2, 1], None
        for resolution in resolutions:
            for video in videos:
                if video['quality'] == resolution:
                    video_url = video["videoUrl"]
                    break
            if video_url is not None: break
        # No matching quality level: nothing to download for this unit
        if video_url is None: continue
        if '.m3u8' in video_url:
            self.m3u8download({
                'download_url': video_url,
                'savedir': resource['savedir'],
                'savename': resource['savename'],
            })
        else:
            # defaultdownload: helper name assumed (the original name was lost); plain HTTP download
            self.defaultdownload({
                'download_url': video_url,
                'savedir': resource['savedir'],
                'savename': resource['savename'],
            })
        # ---- download subtitles
        srt_info = response.json()['results']['videoInfo']['srtCaptions']
        if srt_info:
            for srt_item in srt_info:
                srt_name = os.path.splitext(resource['savename'])[0] + '_' + srt_item['languageCode'] + '.srt'
                srt_url = srt_item['url']
                response = self.session.get(srt_url)
                with open(os.path.join(resource['savedir'], srt_name), 'wb') as fp:
                    fp.write(response.content)
    # --Download PDF
    elif resource['type'] == 'pdf':
        data = {
            't': '3',
            'cid': resource['contentId'],
            'unitId': resource['id'],
            'mob-token': self.infos_return['results']['mob-token'],
        }
        response = self.session.post('http:///mob/course/learn/v1', data=data)
        pdf_url = response.json()['results']['learnInfo']['textOrigUrl']
        self.defaultdownload({
            'download_url': pdf_url,
            'savedir': resource['savedir'],
            'savename': resource['savename'],
        })
    # --Download Rich Text
    elif resource['type'] == 'rich_text':
        download_url = 'http:///mob/course/?' + urlencode(resource['jsonContent'])
        self.defaultdownload({
            'download_url': download_url,
            'savedir': resource['savedir'],
            'savename': resource['savename'],
        })
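The quality-selection step in the video branch can be pulled out into a small function, which makes it easier to test on its own. This is only a sketch: `pick_video_url` is a hypothetical name, the sample data is invented, and it assumes (as the 3, 2, 1 ordering in the loop suggests) that a larger `quality` value means a better video.

```python
def pick_video_url(videos, preferred=(3, 2, 1)):
    """Return the URL of the first video matching the preferred quality order."""
    by_quality = {}
    for video in videos:
        # setdefault keeps the first entry seen for each quality level.
        by_quality.setdefault(video['quality'], video['videoUrl'])
    for quality in preferred:
        if quality in by_quality:
            return by_quality[quality]
    # No entry matched any preferred quality level.
    return None


# Hypothetical response data, for illustration only.
SAMPLE_VIDEOS = [
    {'quality': 1, 'videoUrl': 'https://example.com/v/low.mp4'},
    {'quality': 2, 'videoUrl': 'https://example.com/v/mid.m3u8'},
]

if __name__ == '__main__':
    print(pick_video_url(SAMPLE_VIDEOS))
```

Returning `None` when nothing matches lets the caller skip the unit instead of failing on the `'.m3u8' in video_url` check.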

OK, that's it. I kept this write-up short because I have other things to do tonight. You can try capturing the network traffic on your phone yourself; it's quite simple~
