I. Database design and operation
1. Analysis of data
Bilibili's follow-list API is at
/x/relation/followings?vmid=UID&pn=1&ps=50&order=desc&order_type=attention
where vmid is the user's UID, pn the page number and ps the page size; a page holds at most 50 entries.
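The paging parameters can be made explicit with a small helper. This is a minimal sketch that builds the query for each page with the standard library's `urlencode`. The helper name `followings_url` is hypothetical, and only the path shown above is used because the host part of the address is not given.

```python
from urllib.parse import urlencode

# Path as given above; the host was elided in the article, so it is omitted here too.
FOLLOWINGS_PATH = "/x/relation/followings"


def followings_url(vmid: int, pn: int, ps: int = 50) -> str:
    """Return the path and query string for page pn of vmid's follow list."""
    query = urlencode({
        "vmid": vmid,             # UID of the user whose follow list is read
        "pn": pn,                 # page number, starting at 1
        "ps": ps,                 # page size; at most 50 entries per page
        "order": "desc",
        "order_type": "attention",
    })
    return f"{FOLLOWINGS_PATH}?{query}"


# The first five pages for one user (the UID of the sample entry below).
pages = [followings_url(672353429, pn) for pn in range(1, 6)]
print(pages[0])  # /x/relation/followings?vmid=672353429&pn=1&ps=50&order=desc&order_type=attention
```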
Here is a rough look at what it returns:
{ "code": 0, "message": "0", "ttl": 1, "data": { "list": [{……
First, the entries themselves are in data.list.
Second, each entry in the list carries the following information:
{
  "mid": 672353429,
  "attribute": 2,
  "mtime": 1630510107,
  "tag": null,
  "special": 0,
  "contract_info": { "is_contractor": false, "ts": 0, "is_contract": false, "user_attr": 0 },
  "uname": "Bella Kira.",
  "face": "/bfs/face/",
  "sign": "Energetic A-SOUL lead dancer, reporting in~ Aiming for TOP IDOL, let's go for it!",
  "official_verify": { "type": 0, "desc": "Artist affiliated with the virtual idol group A-SOUL." },
  "vip": {
    "vipType": 2,
    "vipDueDate": 1674576000000,
    "dueRemark": "",
    "accessStatus": 0,
    "vipStatus": 1,
    "vipStatusWarn": "",
    "themeType": 0,
    "label": {
      "path": "",
      "text": "Annual Big Member",
      "label_theme": "annual_vip",
      "text_color": "#FFFFFF",
      "bg_style": 1,
      "bg_color": "#FB7299",
      "border_color": ""
    },
    "avatar_subscript": 1,
    "nickname_color": "#FB7299",
    "avatar_subscript_url": "/bfs/vip/icon_Certification_big_member_22_3x.png"
  }
}
Here mid is the user's unique UID, and mtime is the time of the follow as a Unix timestamp. vipType is 0 for no membership, 1 for a Big Member (Bilibili's paid membership) and 2 for an annual Big Member. In official_verify, type 0 means the account is officially verified and -1 means it is not.
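The field meanings above can be captured in a small parsing step. This is a sketch that assumes only the fields listed here; `parse_user`, `describe` and the label strings are hypothetical, and codes other than those described are shown as "unknown".

```python
# Labels for the codes described above; other values are reported as "unknown".
VIP_TYPES = {0: "none", 1: "Big Member", 2: "annual Big Member"}
VERIFY_TYPES = {-1: "no official verification", 0: "official verification"}


def parse_user(item: dict) -> dict:
    """Flatten one entry of data.list into the fields the crawler keeps."""
    verify = item["official_verify"]
    return {
        "uid": item["mid"],                  # unique user ID
        "name": item["uname"],
        "sign": item["sign"],
        "vipType": item["vip"]["vipType"],
        "verifyType": verify["type"],
        # Only verified accounts carry a meaningful description.
        "verifyDesc": None if verify["type"] == -1 else verify["desc"],
        "followTime": item["mtime"],         # stored as followTime in the article
    }


def describe(user: dict) -> str:
    """Render the membership and verification codes as text."""
    vip = VIP_TYPES.get(user["vipType"], "unknown")
    verify = VERIFY_TYPES.get(user["verifyType"], "unknown")
    return f'{user["uid"]}: {vip}, {verify}'


# Trimmed copy of the sample entry above.
sample = {
    "mid": 672353429, "mtime": 1630510107, "uname": "Bella Kira.", "sign": "",
    "official_verify": {"type": 0,
                        "desc": "Artist affiliated with the virtual idol group A-SOUL."},
    "vip": {"vipType": 2},
}
print(describe(parse_user(sample)))  # 672353429: annual Big Member, official verification
```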
We also found that if the user has made their follow list private, the API returns:
{"code":-400,"message":"Request error.","ttl":1}
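A response can therefore be checked by its code before data.list is touched. This is a minimal sketch that assumes the response body is available as a string. `list_or_none` is a hypothetical name, and treating every non-zero code (not only -400) as "no list" is an assumption.

```python
import json
from typing import Optional


def list_or_none(body: str) -> Optional[list]:
    """Return data.list from a followings response body, or None if there is none.

    code -400 is what the article observed for a private follow list; treating
    every other non-zero code the same way is an assumption of this sketch.
    """
    resp = json.loads(body)
    if resp.get("code") != 0:
        return None
    data = resp.get("data") or {}   # guard against a missing or null data field
    return data.get("list")


locked = '{"code":-400,"message":"Request error.","ttl":1}'
ok = '{"code":0,"message":"0","ttl":1,"data":{"list":[{"mid":672353429}]}}'
print(list_or_none(locked))  # None
print(list_or_none(ok))      # [{'mid': 672353429}]
```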
2. Database design
Based on this, the database has two tables: a user table that stores each user's basic attributes, and a relation table that stores who follows whom.
import sqlite3

DB_PATH = "follow.db"  # hypothetical file name; the original path was elided

def createDB():
    link = sqlite3.connect(DB_PATH)
    print("database open success")
    UserTableDDL = '''
        create table if not exists user(
            UID int PRIMARY KEY NOT NULL,
            NAME varchar NOT NULL,
            SIGN varchar DEFAULT NULL,
            vipType int NOT NULL,
            verifyType int NOT NULL,
            verifyDesc varchar DEFAULT NULL)
    '''
    # One foreign key per column: both follower and following are UIDs in user
    RelationTableDDL = '''
        create table if not exists relation(
            follower int NOT NULL,
            following int NOT NULL,
            followTime int NOT NULL,
            PRIMARY KEY (follower, following),
            FOREIGN KEY (follower) REFERENCES user(UID),
            FOREIGN KEY (following) REFERENCES user(UID))
    '''
    # create user table
    link.execute(UserTableDDL)
    # create relation table
    link.execute(RelationTableDDL)
    print("database create success")
    link.commit()
    link.close()
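Two details of this schema are easy to miss. SQLite expresses "both follower and following are UIDs in user" as one foreign key per column; a composite key against user(UID,UID) would need a matching composite unique key in user. SQLite also checks foreign keys only after `PRAGMA foreign_keys = ON`, which this code never issues, so here the constraints document intent rather than enforce it. The sketch below uses an in-memory database and trims the user table to two columns for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Off by default in SQLite: without this, FOREIGN KEY clauses are not checked.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("create table user(UID int PRIMARY KEY NOT NULL, NAME varchar NOT NULL)")
# One foreign key per column: both ends of a follow must be rows in user.
conn.execute("""
    create table relation(
        follower int NOT NULL,
        following int NOT NULL,
        followTime int NOT NULL,
        PRIMARY KEY (follower, following),
        FOREIGN KEY (follower) REFERENCES user(UID),
        FOREIGN KEY (following) REFERENCES user(UID))
""")
conn.execute("insert into user values (1, 'a'), (2, 'b')")
conn.execute("insert into relation values (1, 2, 1630510107)")  # both UIDs exist

try:
    conn.execute("insert into relation values (3, 2, 1630510107)")  # UID 3 is unknown
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(rejected)  # True
```

If enforcement were switched on in the crawler, the root user would also have to be inserted into user before its relations, since only followed users are added to that table.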
3. Database operation
Next comes inserting new users. The idea: after crawling one person's follow list, pass the whole list to this function. It checks which users are not yet in the database, inserts them, and returns their UIDs as starting points for the next round of crawling.
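def insertUser(infos):
    conn = sqlite3.connect(DB_PATH)  # DB_PATH: database file (its name was elided in the original)
    link = conn.cursor()
    InsertCmd = "insert into user (UID,NAME,vipType,verifyType,SIGN,verifyDesc) values (?,?,?,?,?,?);"
    # Parameterized query instead of formatting the UID into the SQL string
    ExistCmd = "select count(UID) from user where UID=?;"
    newID = []
    for info in infos:
        link.execute(ExistCmd, (info['uid'],))
        exist_ID = link.fetchone()[0]
        if exist_ID == 0:
            # A new user: remember it as a starting point for the next crawl
            newID.append(info['uid'])
            link.execute(InsertCmd, (info['uid'], info['name'], info['vipType'],
                                     info['verifyType'], info['sign'], info['verifyDesc']))
    conn.commit()
    conn.close()
    return newID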
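insertUser runs a SELECT per user before deciding to insert. A swapped-in alternative (not the article's method) lets SQLite do the check: `INSERT OR IGNORE` skips rows whose primary key already exists, and the cursor's `rowcount` tells whether a row was written. `insert_new_users` and `demo_user` are hypothetical names, and the user table is created in memory for the example.

```python
import sqlite3


def insert_new_users(conn: sqlite3.Connection, infos: list) -> list:
    """Insert the users in infos; return the UIDs that were not stored before.

    INSERT OR IGNORE skips a row whose primary key (UID) already exists, and the
    cursor's rowcount is 1 for an inserted row and 0 for a skipped one.
    """
    cmd = ("insert or ignore into user (UID, NAME, SIGN, vipType, verifyType, verifyDesc) "
           "values (?, ?, ?, ?, ?, ?)")
    new_ids = []
    for info in infos:
        cur = conn.execute(cmd, (info["uid"], info["name"], info["sign"],
                                 info["vipType"], info["verifyType"], info["verifyDesc"]))
        if cur.rowcount == 1:
            new_ids.append(info["uid"])
    conn.commit()
    return new_ids


def demo_user(uid: int) -> dict:
    # Hypothetical placeholder record with the keys the article's code uses.
    return {"uid": uid, "name": f"user{uid}", "sign": "", "vipType": 0,
            "verifyType": -1, "verifyDesc": None}


conn = sqlite3.connect(":memory:")
conn.execute("""create table user(UID int PRIMARY KEY NOT NULL, NAME varchar NOT NULL,
                SIGN varchar, vipType int NOT NULL, verifyType int NOT NULL,
                verifyDesc varchar)""")
first = insert_new_users(conn, [demo_user(1), demo_user(2)])
second = insert_new_users(conn, [demo_user(2), demo_user(3)])
print(first, second)  # [1, 2] [3]
```

One trade-off: OR IGNORE also silently skips rows that violate NOT NULL, so a malformed record disappears instead of raising an error.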
Next is the function that inserts follow relationships, which is simpler:
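def insertFollowing(uid: int, subscribe):
    conn = sqlite3.connect(DB_PATH)  # DB_PATH: database file (its name was elided in the original)
    link = conn.cursor()
    InsertCmd = "insert into relation (follower,following,followTime) values (?,?,?);"
    # subscribe holds (followed UID, follow time) pairs
    for follow in subscribe:
        link.execute(InsertCmd, (uid, follow[0], follow[1]))
    conn.commit()
    conn.close()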
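A plain INSERT raises IntegrityError when a (follower, following) pair is already stored, for example when a user is crawled twice. Below is a sketch of a batched variant that assumes the same relation table: `executemany` sends all rows at once, `INSERT OR IGNORE` skips duplicates, and the connection's `total_changes` counter reports how many rows were actually added. `insert_followings` is a hypothetical name.

```python
import sqlite3


def insert_followings(conn: sqlite3.Connection, uid: int, subscribe: list) -> int:
    """Store (followed UID, follow time) pairs for follower uid; return rows added.

    INSERT OR IGNORE skips pairs already present instead of raising
    IntegrityError on the (follower, following) primary key.
    """
    rows = [(uid, following, follow_time) for following, follow_time in subscribe]
    before = conn.total_changes
    conn.executemany(
        "insert or ignore into relation (follower, following, followTime) values (?, ?, ?)",
        rows,
    )
    conn.commit()
    return conn.total_changes - before


conn = sqlite3.connect(":memory:")
conn.execute("""create table relation(follower int NOT NULL, following int NOT NULL,
                followTime int NOT NULL, PRIMARY KEY (follower, following))""")
added_first = insert_followings(conn, 100, [(1, 1630510107), (2, 1630510108)])
added_again = insert_followings(conn, 100, [(2, 1630510108), (3, 1630510109)])
print(added_first, added_again)  # 2 1
```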
II. Crawlers
By observation, we found that Bilibili (nicknamed "Uncle Rui" by its users) only shows the first 5 pages of another user's follow list; even browsing by hand you can only reach 5 pages. There is no way around that, so the crawler fetches just those 5 pages.
import json
import requests

def getFollowingList(uid: int):
    # % (UID, page number); the host part of this URL was elided in the original
    url = "/x/relation/followings?vmid=%d&pn=%d&ps=50&order=desc&order_type=attention&jsonp=jsonp"
    infos = []      # user records for the user table
    subscribe = []  # (followed UID, follow time) pairs for the relation table
    for i in range(1, 6):
        html = requests.get(url % (uid, i))
        if html.status_code != 200:
            print("GET ERROR!")
            break
        dic = json.loads(html.text)
        if dic['code'] == -400:  # the follow list is private
            break
        for usr in dic['data']['list']:
            info = {}
            info['uid'] = usr['mid']
            info['name'] = usr['uname']
            info['vipType'] = usr['vip']['vipType']
            info['verifyType'] = usr['official_verify']['type']
            info['sign'] = usr['sign']
            if info['verifyType'] == -1:
                info['verifyDesc'] = None  # stored as SQL NULL
            else:
                info['verifyDesc'] = usr['official_verify']['desc']
            subscribe.append((usr['mid'], usr['mtime']))
            infos.append(info)
    newID = insertUser(infos)
    insertFollowing(uid, subscribe)
    return newID
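The paging logic can be separated from the HTTP and database work so it can be tried offline. This sketch assumes a hypothetical `fetch_page(uid, pn)` callback that returns data.list for one page or None on failure; `iter_followings` and `fake_fetch` are also hypothetical. It also stops at the first empty page, whereas getFollowingList keeps requesting until page 5.

```python
from typing import Callable, Iterator, Optional

MAX_PAGES = 5  # how many pages of another user's follow list are visible


def iter_followings(uid: int,
                    fetch_page: Callable[[int, int], Optional[list]]) -> Iterator[dict]:
    """Yield follow-list entries of uid page by page, visiting at most MAX_PAGES pages.

    fetch_page(uid, pn) stands in for the HTTP request: it should return data.list
    for page pn, or None if the request failed or the list is private.
    """
    for pn in range(1, MAX_PAGES + 1):
        page = fetch_page(uid, pn)
        if not page:  # None (error / private list) or an empty page: stop early
            return
        yield from page


def fake_fetch(uid: int, pn: int) -> list:
    # Offline stand-in: every page holds exactly one entry.
    return [{"mid": uid * 100 + pn}]


mids = [entry["mid"] for entry in iter_followings(7, fake_fetch)]
print(mids)  # [701, 702, 703, 704, 705]
```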
III. Complete code
#by concyclics
# -*- coding:UTF-8 -*-
import sqlite3
import json
import requests

DB_PATH = "follow.db"  # hypothetical file name; the original path was elided
# % (UID, page number); the host part of this URL was elided in the original
URL = "/x/relation/followings?vmid=%d&pn=%d&ps=50&order=desc&order_type=attention&jsonp=jsonp"


def createDB():
    link = sqlite3.connect(DB_PATH)
    print("database open success")
    UserTableDDL = '''
        create table if not exists user(
            UID int PRIMARY KEY NOT NULL,
            NAME varchar NOT NULL,
            SIGN varchar DEFAULT NULL,
            vipType int NOT NULL,
            verifyType int NOT NULL,
            verifyDesc varchar DEFAULT NULL)
    '''
    RelationTableDDL = '''
        create table if not exists relation(
            follower int NOT NULL,
            following int NOT NULL,
            followTime int NOT NULL,
            PRIMARY KEY (follower, following),
            FOREIGN KEY (follower) REFERENCES user(UID),
            FOREIGN KEY (following) REFERENCES user(UID))
    '''
    link.execute(UserTableDDL)      # create user table
    link.execute(RelationTableDDL)  # create relation table
    print("database create success")
    link.commit()
    link.close()


def insertUser(infos):
    # Insert users not yet stored; return their UIDs as the next crawl's starting points
    conn = sqlite3.connect(DB_PATH)
    link = conn.cursor()
    InsertCmd = "insert into user (UID,NAME,vipType,verifyType,SIGN,verifyDesc) values (?,?,?,?,?,?);"
    ExistCmd = "select count(UID) from user where UID=?;"
    newID = []
    for info in infos:
        link.execute(ExistCmd, (info['uid'],))
        if link.fetchone()[0] == 0:
            newID.append(info['uid'])
            link.execute(InsertCmd, (info['uid'], info['name'], info['vipType'],
                                     info['verifyType'], info['sign'], info['verifyDesc']))
    conn.commit()
    conn.close()
    return newID


def insertFollowing(uid: int, subscribe):
    # subscribe holds (followed UID, follow time) pairs
    conn = sqlite3.connect(DB_PATH)
    link = conn.cursor()
    InsertCmd = "insert into relation (follower,following,followTime) values (?,?,?);"
    for follow in subscribe:
        try:
            link.execute(InsertCmd, (uid, follow[0], follow[1]))
        except sqlite3.IntegrityError:  # e.g. the pair is already stored
            print((uid, follow[0], follow[1]))
    conn.commit()
    conn.close()


def getFollowingList(uid: int):
    # Crawl up to 5 pages of uid's follow list, store users and relations, return new UIDs
    infos = []
    subscribe = []
    for i in range(1, 6):
        html = requests.get(URL % (uid, i))
        if html.status_code != 200:
            print("GET ERROR!")
            return []
        dic = json.loads(html.text)
        if dic['code'] == -400:  # the follow list is private
            return []
        try:
            users = dic['data']['list']
        except (KeyError, TypeError):
            return []
        for usr in users:
            info = {
                'uid': usr['mid'],
                'name': usr['uname'],
                'vipType': usr['vip']['vipType'],
                'verifyType': usr['official_verify']['type'],
                'sign': usr['sign'],
            }
            if info['verifyType'] == -1:
                info['verifyDesc'] = None  # stored as SQL NULL
            else:
                info['verifyDesc'] = usr['official_verify']['desc']
            subscribe.append((usr['mid'], usr['mtime']))
            infos.append(info)
    newID = insertUser(infos)
    insertFollowing(uid, subscribe)
    return newID


def getFollowingUid(uid: int):
    # Like getFollowingList, but only returns the followed UIDs without storing anything
    IDs = []  # collected across all pages
    for i in range(1, 6):
        html = requests.get(URL % (uid, i))
        if html.status_code != 200:
            print("GET ERROR!")
            return []
        dic = json.loads(html.text)
        if dic['code'] == -400:
            return []
        try:
            users = dic['data']['list']
        except (KeyError, TypeError):
            return []
        for usr in users:
            IDs.append(usr['mid'])
    return IDs


def work(root):
    # Breadth-first crawl: each round crawls the new users found in the previous round
    IDlist = root
    while len(IDlist) != 0:
        tmplist = []
        for ID in IDlist:
            print(ID)
            tmplist += getFollowingList(ID)
        IDlist = tmplist


def rework():
    # Re-crawl every stored user; return followed UIDs not yet in the user table
    conn = sqlite3.connect(DB_PATH)
    link = conn.cursor()
    IDs = [row[0] for row in link.execute("select uid from user;")]
    conn.close()
    print(IDs)
    known = set(IDs)
    newID = []
    for ID in IDs:
        for fid in getFollowingUid(ID):
            if fid not in known:
                known.add(fid)
                newID.append(fid)
    return newID


if __name__ == "__main__":
    createDB()
    #work([**put root UID here**,])
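work() expands the crawl one round at a time and relies on insertUser's database lookup to skip users already stored, so it stops only when a round turns up no new users. For comparison, here is a sketch of the same idea as a plain breadth-first search with an in-memory visited set and an explicit size limit. `crawl_bfs`, `get_following` and `max_users` are hypothetical, and the graph is toy data rather than crawled results.

```python
from collections import deque
from typing import Callable, Iterable, List


def crawl_bfs(roots: Iterable[int],
              get_following: Callable[[int], List[int]],
              max_users: int = 1000) -> List[int]:
    """Breadth-first traversal of the follow graph; returns UIDs in visit order.

    get_following(uid) stands in for the crawler (a function returning the
    UIDs that uid follows); max_users bounds the crawl.
    """
    queue = deque(dict.fromkeys(roots))  # drop duplicate roots, keep their order
    seen = set(queue)
    order = []
    while queue and len(order) < max_users:
        uid = queue.popleft()
        order.append(uid)
        for nxt in get_following(uid):
            if nxt not in seen:  # each UID is queued at most once
                seen.add(nxt)
                queue.append(nxt)
    return order


graph = {1: [2, 3], 2: [3, 4], 3: [1], 4: []}
print(crawl_bfs([1], lambda uid: graph.get(uid, [])))               # [1, 2, 3, 4]
print(crawl_bfs([1], lambda uid: graph.get(uid, []), max_users=2))  # [1, 2]
```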
IV. Project repository
/Concyclics/BiliBiliFollowSpider