Crawling a novel asynchronously with asyncio+aiohttp
Approach
The task involves reading and writing files asynchronously, which is handled by the aiofiles library. Sending a request involves converting a dictionary to a string, and the response has to be converted back into a dictionary, so the json library is involved as well. The cid and title needed to build each chapter's download link come from the catalog request, whose response is fetched synchronously, making that part a synchronous operation using the requests library.
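To make the two conversions concrete, here is a minimal, self-contained sketch of the json round trip and an asynchronous file write with aiofiles; the file name and payload here are made up for illustration:

import json
import asyncio
import aiofiles

async def demo():
    params = {"book_id": "4306063500", "need_bookinfo": 1}
    # Outgoing: serialize the dictionary to a string before embedding it in a URL
    payload = json.dumps(params)
    # Incoming: parse a JSON string back into a dictionary
    parsed = json.loads(payload)
    # Write asynchronously with aiofiles (demo.txt is a hypothetical file name)
    async with aiofiles.open('./demo.txt', mode='w', encoding='utf-8') as f:
        await f.write(json.dumps(parsed))

asyncio.run(demo())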
Code
import os
import json
import asyncio
import aiohttp
import aiofiles
import requests

# url = '/api/pc/getCatalog?data={%22book_id%22:%224306063500%22}'
# bookid = '/api/pc/getChapterContent?data={%22book_id%22:%224306063500%22,%22cid%22:%224306063500|1569782244%22,%22need_bookinfo%22:1}'
# (the host portion of these URLs is omitted in the original post)

async def downloadNovel(cid, title, bid):
    data2 = {
        "book_id": bid,
        "cid": f"{bid}|{cid}",
        "need_bookinfo": 1
    }
    # Convert the dictionary to a string
    data2 = json.dumps(data2)
    # Build the request link for the chapter content
    bookurl = f'/api/pc/getChapterContent?data={data2}'
    # Note: don't forget the parentheses here ()
    async with aiohttp.ClientSession() as session:
        async with session.get(bookurl) as resp:
            # Wait for the response and parse it into a dictionary
            dic = await resp.json()
            # Asynchronously write the content to a file, with utf-8 encoding
            async with aiofiles.open(f'./articles/{title}', mode='w', encoding='utf-8') as f:
                await f.write(dic['data']['novel']['content'])

async def getCataList(url):
    # Synchronously crawl all chapter-related information
    resp = requests.get(url)
    # Convert the returned text into dictionary form
    dic = json.loads(resp.text)
    # print(dic)
    # Create an empty list for storing the asynchronous tasks
    tasks = []
    # Loop to create asynchronous tasks and add them to tasks
    for item in dic['data']['novel']['items']:
        title = item['title']
        cid = item['cid']
        tasks.append(asyncio.create_task(downloadNovel(cid, title, bid)))
        print(title, cid)
    # Run the asynchronous tasks
    await asyncio.wait(tasks)

if __name__ == '__main__':
    bid = "4306063500"
    url = '/api/pc/getCatalog?data={"book_id":"' + bid + '"}'
    print(url)
    os.makedirs('./articles', exist_ok=True)  # make sure the output directory exists
    asyncio.run(getCataList(url))
The effect is as follows: the data is crawled successfully, and each chapter is written to its own file under ./articles/.
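As a possible refinement (not part of the original post), all chapter downloads could share a single ClientSession instead of opening one per chapter, with a semaphore capping how many requests run at once; a minimal sketch under the same assumed API shape, where fetch_chapter and the limit of 10 are illustrative choices:

import asyncio
import aiohttp

async def fetch_chapter(session, sem, url):
    # The semaphore limits how many requests are in flight concurrently
    async with sem:
        async with session.get(url) as resp:
            return await resp.json()

async def crawl(urls):
    sem = asyncio.Semaphore(10)
    # One shared session reuses connections across all requests
    async with aiohttp.ClientSession() as session:
        return await asyncio.gather(*(fetch_chapter(session, sem, u) for u in urls))

Reusing one session avoids the overhead of opening a new connection pool per chapter, and asyncio.gather both preserves result order and propagates any exception from a failed download.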
The above covers the details of this example of concurrent asynchronous data crawling in Python with asyncio and aiohttp.