
Python Concurrent Asynchronous Data Crawling (asyncio + aiohttp) Example

Crawling a novel asynchronously with asyncio+aiohttp

Reasoning

The task involves reading and writing files asynchronously, so the aiofiles library is needed. When sending a request, a dictionary has to be converted to a string, and when receiving the response, the string has to be converted back into a dictionary, so the json library is involved as well. The request for the chapter list (the cid and title of each chapter, which make up the download links) is a single synchronous operation, so the requests library is used for that part. A minimal sketch of the json round-trip is shown after this paragraph.
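For example, the dictionary-to-string conversion works like this (a minimal sketch of the json round-trip described above; the payload values are taken from the commented example URL in the script below):

import json

# Build the request payload as a dictionary
payload = {"book_id": "4306063500", "cid": "4306063500|1569782244", "need_bookinfo": 1}
# dumps turns the dictionary into a string so it can be embedded in the URL
data = json.dumps(payload)
print(data)   # '{"book_id": "4306063500", "cid": "4306063500|1569782244", "need_bookinfo": 1}'
# loads turns a JSON response string back into a dictionary
dic = json.loads(data)
print(dic["book_id"])   # 4306063500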

Coding

import requests
import aiohttp
import asyncio
import json
import aiofiles
# url = '/api/pc/getCatalog?data={%22book_id%22:%224306063500%22}'
# bookid = '/api/pc/getChapterContent?data={%22book_id%22:%224306063500%22,%22cid%22:%224306063500|1569782244%22,%22need_bookinfo%22:1'
async def downloadNovel(cid,title,bid):
    data2 = {
        "book_id": bid,
        "cid": f"{bid}|{cid}",
        "need_bookinfo": 1
    }
    # Convert the dictionary to a JSON string
    data2 = json.dumps(data2)
    # Build the request URL
    bookurl = f'/api/pc/getChapterContent?data={data2}'
    # Easy to forget the parentheses on ClientSession() here
    async with aiohttp.ClientSession() as session:
        async with session.get(bookurl) as resp:
            # Wait for the response and parse it into a dictionary
            dic = await resp.json()
            # Set the encoding format encoding='utf-8'
            async with aiofiles.open(f'./articles/{title}', mode='w', encoding='utf-8') as f:
                # Asynchronously write the chapter content to the file
                await f.write(dic['data']['novel']['content'])
async def getCataList(url):
    # Synchronized crawling of all chapter-related information
    resp = requests.get(url)
    # Convert the returned text into dictionary form
    dic = resp.json()
    # print(dic)
    # Create an empty object for storing asynchronous tasks
    tasks = []
    # Loop to create asynchronous tasks and add them to tasks
    for item in dic['data']['novel']['items']:
        title = item['title']
        cid = item['cid']
        tasks.append(asyncio.create_task(downloadNovel(cid, title, bid)))
        print(title,cid)
    # Run all the asynchronous tasks concurrently
    await asyncio.gather(*tasks)
if __name__ == '__main__':
    bid = "4306063500"
    url = '/api/pc/getCatalog?data={"book_id":"' + bid + '"}'
    print(url)
    asyncio.run(getCataList(url))
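Note that the script writes each chapter to ./articles/{title}, so the articles directory must exist before the download tasks run; a minimal sketch of creating it first (the os import is an addition for illustration, not part of the original script):

import os

# Make sure the output directory exists before the tasks write to it
os.makedirs('./articles', exist_ok=True)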

The effect is as follows: the data is crawled successfully.

This concludes the example of Python concurrent asynchronous data crawling with asyncio + aiohttp. For more on concurrent asynchronous crawling in Python, please see my other related articles!