SoFunction
Updated on 2024-11-18

Python: crawling a Bilibili follow list, and designing and operating the database

I. Database design and operation

1. Analysis of data

A Bilibili user's follow list is served by the endpoint

/x/relation/followings?vmid=UID&pn=1&ps=50&order=desc&order_type=attention

which returns at most 50 entries per page (ps=50), with pn selecting the page number.
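As a sketch of how that query string is put together, the helper below assembles the same parameters with `urllib.parse.urlencode`. The function name `followings_url` is hypothetical, and it only builds the path shown above; the API host is not included.

```python
from urllib.parse import urlencode

# Path as given in the article; the API host is not part of it.
FOLLOWINGS_PATH = "/x/relation/followings"

def followings_url(uid: int, page: int, page_size: int = 50) -> str:
    """Build the path and query for one page of a user's follow list (sketch)."""
    params = {
        "vmid": uid,             # whose follow list to fetch
        "pn": page,              # page number, starting at 1
        "ps": page_size,         # entries per page; the article observes at most 50
        "order": "desc",
        "order_type": "attention",
    }
    return FOLLOWINGS_PATH + "?" + urlencode(params)

print(followings_url(672353429, 2))
# /x/relation/followings?vmid=672353429&pn=2&ps=50&order=desc&order_type=attention
```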

Let's take a rough look at the response it returns:

{
	"code": 0,
	"message": "0",
	"ttl": 1,
	"data": {
		"list": [{……

First, the entries are stored under data.list.

Second, each item in the list carries the following fields:

			"mid": 672353429,
			"attribute": 2,
			"mtime": 1630510107,
			"tag": null,
			"special": 0,
			"contract_info": {
				"is_contractor": false,
				"ts": 0,
				"is_contract": false,
				"user_attr": 0
			},
			"uname": "Bella Kira.",
			"face": "/bfs/face/",
			"sign": "Genki's A-SOUL Dance Stretcher is attending~ Target TOP IDOL, let's go for it!",
			"official_verify": {
				"type": 0,
				"desc": "Artist affiliated with the virtual idol group A-SOUL."
			},
			"vip": {
				"vipType": 2,
				"vipDueDate": 1674576000000,
				"dueRemark": "",
				"accessStatus": 0,
				"vipStatus": 1,
				"vipStatusWarn": "",
				"themeType": 0,
				"label": {
					"path": "",
					"text": "Grand Member of the Year",
					"label_theme": "annual_vip",
					"text_color": "#FFFFFF",
					"bg_style": 1,
					"bg_color": "#FB7299",
					"border_color": ""
				},
				"avatar_subscript": 1,
				"nickname_color": "#FB7299",
				"avatar_subscript_url": "/bfs/vip/icon_Certification_big_member_22_3x.png"
			}

Here, mid is the user's unique UID. vipType gives the membership level: 0 means no membership, 1 a regular Big Member (Bilibili's paid membership), and 2 an annual Big Member. In official_verify, type 0 means the account has official verification and -1 means it has none.
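To make the field meanings concrete, here is a small sketch that parses a trimmed copy of the sample response and reduces each item to the fields discussed above. The `SAMPLE` string, `summarize()` and the label wording in `VIP_LABELS` are illustrative assumptions, not part of the original project.

```python
import json

# Trimmed copy of one response page, keeping only the fields discussed above.
SAMPLE = '''{"code": 0, "message": "0", "ttl": 1,
 "data": {"list": [{"mid": 672353429, "mtime": 1630510107, "uname": "Bella Kira.",
   "official_verify": {"type": 0, "desc": "Artist affiliated with the virtual idol group A-SOUL."},
   "vip": {"vipType": 2}}]}}'''

# Labels follow the article's description of vipType (wording assumed).
VIP_LABELS = {0: "none", 1: "Big Member", 2: "annual Big Member"}

def summarize(item: dict) -> dict:
    """Reduce one follow-list item to the fields discussed above."""
    verify = item["official_verify"]
    verified = verify["type"] != -1                  # -1 means no official verification
    return {
        "uid": item["mid"],                          # unique user ID
        "name": item["uname"],
        "vip": VIP_LABELS.get(item["vip"]["vipType"], "unknown"),
        "verified": verified,
        "verify_desc": verify["desc"] if verified else None,
        "followed_at": item["mtime"],                # used as followTime in this article
    }

page = json.loads(SAMPLE)
for item in page["data"]["list"]:
    print(summarize(item))
```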

We also found that if the user has made their follow list private, the request returns

{"code":-400,"message":"Request error.","ttl":1}
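Because this error is reported in the `code` field of the JSON body, the crawler has to inspect that field and not just the HTTP status. A minimal sketch, assuming that any non-zero `code` (not only -400) should be treated as "no data"; `extract_list` is a hypothetical name.

```python
import json
from typing import Optional

def extract_list(body: str) -> Optional[list]:
    """Return data.list from a followings response body, or None if there is no data.

    The article observed code -400 for private follow lists; treating every
    non-zero code the same way is an assumption of this sketch.
    """
    dic = json.loads(body)
    if dic.get("code") != 0:
        return None
    data = dic.get("data") or {}
    return data.get("list") or []   # a missing or null list counts as empty

print(extract_list('{"code":-400,"message":"Request error.","ttl":1}'))  # None
```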

2. Database design

Based on this, we design a database with two tables: a user table holding each user's basic attributes, and a relation table recording who follows whom.

import sqlite3

def createDB():
    link=sqlite3.connect('follow.db')  # hypothetical file name
    print("database open success")
    UserTableDDL='''
                create table if not exists user(
                UID int PRIMARY KEY     NOT NULL,
                NAME varchar            NOT NULL,
                SIGN varchar            DEFAULT NULL,
                vipType int             NOT NULL,
                verifyType int          NOT NULL,
                verifyDesc varchar      DEFAULT NULL)
                '''
    # follower and following each reference user(UID) on their own;
    # SQLite only enforces foreign keys when PRAGMA foreign_keys=ON
    RelationTableDDL='''
                create table if not exists relation(
                follower int           NOT NULL,
                following int          NOT NULL,
                followTime int         NOT NULL,
                PRIMARY KEY (follower,following),
                FOREIGN KEY(follower) REFERENCES user(UID),
                FOREIGN KEY(following) REFERENCES user(UID)
                )
                '''
    # create user table
    link.execute(UserTableDDL)
    # create relation table
    link.execute(RelationTableDDL)
    print("database create success")
    link.commit()
    link.close()
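To see how the two tables work together, the following sketch builds the schema in an in-memory database, inserts a few made-up users and follow records, and joins relation back to user to list whom one user follows, most recent follow first. It declares one foreign key per column (follower and following each reference user(UID)); all sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # throwaway database for the demonstration
conn.executescript("""
    create table user(
        UID int PRIMARY KEY NOT NULL,
        NAME varchar NOT NULL,
        SIGN varchar DEFAULT NULL,
        vipType int NOT NULL,
        verifyType int NOT NULL,
        verifyDesc varchar DEFAULT NULL);
    create table relation(
        follower int NOT NULL,
        following int NOT NULL,
        followTime int NOT NULL,
        PRIMARY KEY (follower, following),
        FOREIGN KEY(follower) REFERENCES user(UID),
        FOREIGN KEY(following) REFERENCES user(UID));
""")
# Hypothetical rows: user 1 follows users 2 and 3.
conn.executemany("insert into user (UID, NAME, vipType, verifyType) values (?,?,?,?)",
                 [(1, "alice", 0, -1), (2, "bob", 2, 0), (3, "carol", 1, -1)])
conn.executemany("insert into relation values (?,?,?)",
                 [(1, 2, 1630510107), (1, 3, 1630000000)])

# Everyone user 1 follows, newest follow first.
rows = conn.execute("""
    select u.UID, u.NAME from relation r
    join user u on u.UID = r.following
    where r.follower = ?
    order by r.followTime desc""", (1,)).fetchall()
print(rows)   # [(2, 'bob'), (3, 'carol')]
conn.close()
```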

3. Database operation

Next comes inserting users. My approach: after crawling one person's entire follow list, pass the whole list to this function, which checks each user against the database. Any user not yet stored is inserted and also returned, so it can serve as a starting point for the next round of crawling.

import sqlite3

def insertUser(infos):
    conn=sqlite3.connect('follow.db')  # hypothetical file name
    link=conn.cursor()
    InsertCmd="insert into user (UID,NAME,vipType,verifyType,sign,verifyDesc) values (?,?,?,?,?,?);"
    ExistCmd="select count(UID) from user where UID=?;"
    newID=[]
    for info in infos:
        # count(UID) is 0 when this user has not been stored yet
        exist_ID=link.execute(ExistCmd,(info['uid'],)).fetchone()[0]
        if exist_ID==0:
            newID.append(info['uid'])
            link.execute(InsertCmd,(info['uid'],info['name'],info['vipType'],info['verifyType'],info['sign'],info['verifyDesc']))
    conn.commit()
    conn.close()
    return newID
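An alternative sketch (not the author's method) lets SQLite do the existence check: because UID is the primary key, `INSERT OR IGNORE` silently skips users that are already stored, and the cursor's `rowcount` shows whether a row was actually added. `insert_new_users` is a hypothetical name, and unlike insertUser it takes an open connection instead of opening one itself.

```python
import sqlite3

def insert_new_users(conn: sqlite3.Connection, infos: list) -> list:
    """Insert users not yet in the user table and return the UIDs that were new.

    INSERT OR IGNORE relies on UID being the primary key: rowcount is 1 when
    the row was inserted and 0 when an existing UID caused it to be skipped.
    """
    new_ids = []
    for info in infos:
        cur = conn.execute(
            "insert or ignore into user (UID,NAME,vipType,verifyType,SIGN,verifyDesc) "
            "values (?,?,?,?,?,?)",
            (info["uid"], info["name"], info["vipType"], info["verifyType"],
             info["sign"], info["verifyDesc"]))
        if cur.rowcount == 1:
            new_ids.append(info["uid"])
    conn.commit()
    return new_ids
```

It is called the same way as insertUser: hand it the list built from one follow list and feed the returned UIDs into the next round.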

Next is the function that inserts the follow relationships, which is comparatively simple:

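import sqlite3

def insertFollowing(uid:int,subscribe):
    conn=sqlite3.connect('follow.db')  # hypothetical file name
    link=conn.cursor()
    InsertCmd="insert into relation (follower,following,followTime) values (?,?,?);"
    # subscribe holds (followed UID, follow time) pairs
    for follow in subscribe:
        link.execute(InsertCmd,(uid,follow[0],follow[1]))
    conn.commit()
    conn.close()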
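If the same user is crawled twice, a plain INSERT hits the (follower, following) primary key and raises `sqlite3.IntegrityError`. One way around that, sketched here as an assumption rather than the original design, is `INSERT OR IGNORE` combined with `executemany`; `insert_followings` is a hypothetical name.

```python
import sqlite3

def insert_followings(conn: sqlite3.Connection, uid: int, subscribe: list) -> int:
    """Record that `uid` follows each (mid, mtime) pair in `subscribe`.

    Pairs already stored under the (follower, following) primary key are
    skipped instead of raising IntegrityError. Returns how many rows were added.
    """
    before = conn.total_changes
    conn.executemany(
        "insert or ignore into relation (follower,following,followTime) values (?,?,?)",
        [(uid, mid, mtime) for mid, mtime in subscribe])
    conn.commit()
    return conn.total_changes - before
```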
 

II. Crawlers

Observation shows that Bilibili ("Uncle Rui" in fan slang) only exposes the first 5 pages of a follow list, which at 50 entries per page means at most 250 followings.

Even browsing by hand you can reach only those 5 pages, so there is no way around it: we crawl 5 pages.
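The page limit translates into a simple loop: request pages 1 to 5 and stop as soon as a page is empty or the list turns out to be private. In this sketch the network request and JSON parsing are replaced by a hypothetical `fetch_page` callback so the control flow can be run offline.

```python
from typing import Callable, List, Optional

MAX_PAGES = 5   # only the first 5 pages are reachable, as observed above

def crawl_pages(fetch_page: Callable[[int], Optional[list]]) -> List[dict]:
    """Collect items from pages 1..MAX_PAGES.

    fetch_page(n) stands in for requesting page n and parsing it: it returns the
    page's item list, or None when the follow list is private. The loop stops early
    on None or on an empty page, since nothing follows an empty page.
    """
    items: List[dict] = []
    for page in range(1, MAX_PAGES + 1):
        batch = fetch_page(page)
        if not batch:          # None (private) or [] (past the last page)
            break
        items.extend(batch)
    return items

# Fake data source: a user who follows 120 accounts, served 50 per page.
fake = [{"mid": i} for i in range(120)]
print(len(crawl_pages(lambda n: fake[(n - 1) * 50 : n * 50])))   # 120
```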

import json
import requests

def getFollowingList(uid:int):
    # only the path is given here; the API host must be prepended before requesting
    url="/x/relation/followings?vmid=%d&pn=%d&ps=50&order=desc&order_type=attention&jsonp=jsonp"# % (UID, Page Number)
    infos=[]
    subscribe=[]
    for i in range(1,6):  # pages 1 to 5
        html=requests.get(url%(uid,i))
        if html.status_code!=200:
            print("GET ERROR!")
            break
        text=html.text
        dic=json.loads(text)
        if dic['code']==-400:  # the follow list is private
            break
        users=dic['data']['list']
        for usr in users:
            info={}
            info['uid']=usr['mid']
            info['name']=usr['uname']
            info['vipType']=usr['vip']['vipType']
            info['verifyType']=usr['official_verify']['type']
            info['sign']=usr['sign']
            if info['verifyType']==-1:
                info['verifyDesc']=None  # stored as SQL NULL
            else:
                info['verifyDesc']=usr['official_verify']['desc']
            subscribe.append((usr['mid'],usr['mtime']))
            infos.append(info)
    newID=insertUser(infos)
    insertFollowing(uid,subscribe)
    return newID

III. Complete code

#by concyclics
# -*- coding:UTF-8 -*-
import sqlite3
import json
import requests
def createDB():
    link=sqlite3.connect('follow.db')  # hypothetical file name
    print("database open success")
    UserTableDDL='''
                create table if not exists user(
                UID int PRIMARY KEY     NOT NULL,
                NAME varchar            NOT NULL,
                SIGN varchar            DEFAULT NULL,
                vipType int             NOT NULL,
                verifyType int          NOT NULL,
                verifyDesc varchar      DEFAULT NULL)
                '''
    # follower and following each reference user(UID) on their own;
    # SQLite only enforces foreign keys when PRAGMA foreign_keys=ON
    RelationTableDDL='''
                create table if not exists relation(
                follower int           NOT NULL,
                following int          NOT NULL,
                followTime int         NOT NULL,
                PRIMARY KEY (follower,following),
                FOREIGN KEY(follower) REFERENCES user(UID),
                FOREIGN KEY(following) REFERENCES user(UID)
                )
                '''
    # create user table
    link.execute(UserTableDDL)
    # create relation table
    link.execute(RelationTableDDL)
    print("database create success")
    link.commit()
    link.close()
def insertUser(infos):
    conn=sqlite3.connect('follow.db')  # hypothetical file name
    link=conn.cursor()
    InsertCmd="insert into user (UID,NAME,vipType,verifyType,sign,verifyDesc) values (?,?,?,?,?,?);"
    ExistCmd="select count(UID) from user where UID=?;"
    newID=[]
    for info in infos:
        # count(UID) is 0 when this user has not been stored yet
        exist_ID=link.execute(ExistCmd,(info['uid'],)).fetchone()[0]
        if exist_ID==0:
            newID.append(info['uid'])
            link.execute(InsertCmd,(info['uid'],info['name'],info['vipType'],info['verifyType'],info['sign'],info['verifyDesc']))
    conn.commit()
    conn.close()
    return newID
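def insertFollowing(uid:int,subscribe):
    conn=sqlite3.connect('follow.db')  # hypothetical file name
    link=conn.cursor()
    InsertCmd="insert into relation (follower,following,followTime) values (?,?,?);"
    for follow in subscribe:
        try:
            link.execute(InsertCmd,(uid,follow[0],follow[1]))
        except sqlite3.IntegrityError:
            # this (follower, following) pair is already stored
            print((uid,follow[0],follow[1]))
    conn.commit()
    conn.close()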
def getFollowingList(uid:int):
    # only the path is given here; the API host must be prepended before requesting
    url="/x/relation/followings?vmid=%d&pn=%d&ps=50&order=desc&order_type=attention&jsonp=jsonp"# % (UID, Page Number)
    infos=[]
    subscribe=[]
    for i in range(1,6):  # pages 1 to 5
        html=requests.get(url%(uid,i))
        if html.status_code!=200:
            print("GET ERROR!")
            return []
        text=html.text
        dic=json.loads(text)
        if dic['code']==-400:  # the follow list is private
            return []
        try:
            users=dic['data']['list']
        except (KeyError,TypeError):
            return []
        for usr in users:
            info={}
            info['uid']=usr['mid']
            info['name']=usr['uname']
            info['vipType']=usr['vip']['vipType']
            info['verifyType']=usr['official_verify']['type']
            info['sign']=usr['sign']
            if info['verifyType']==-1:
                info['verifyDesc']=None  # stored as SQL NULL
            else:
                info['verifyDesc']=usr['official_verify']['desc']
            subscribe.append((usr['mid'],usr['mtime']))
            infos.append(info)
    newID=insertUser(infos)
    insertFollowing(uid,subscribe)
    return newID
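def getFollowingUid(uid:int):
    # same requests as getFollowingList, but only collects UIDs and writes nothing
    url="/x/relation/followings?vmid=%d&pn=%d&ps=50&order=desc&order_type=attention&jsonp=jsonp"# % (UID, Page Number)
    IDs=[]  # created once, so UIDs from every page are kept
    for i in range(1,6):
        html=requests.get(url%(uid,i))
        if html.status_code!=200:
            print("GET ERROR!")
            return []
        text=html.text
        dic=json.loads(text)
        if dic['code']==-400:
            return []
        try:
            users=dic['data']['list']
        except (KeyError,TypeError):
            return []
        for usr in users:
            IDs.append(usr['mid'])
    return IDs  # returned after all pages, not after the first one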
def work(root):
    IDlist=root
    tmplist=[]
    while len(IDlist)!=0:
        tmplist=[]
        for ID in IDlist:
            print(ID)
            tmplist+=getFollowingList(ID)
        IDlist=tmplist
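def rework():
    # re-crawl every stored user and return followed UIDs not yet in the user table
    conn=sqlite3.connect('follow.db')  # hypothetical file name
    link=conn.cursor()
    SelectCmd="select uid from user;"
    IDs=[row[0] for row in link.execute(SelectCmd)]
    conn.close()
    print(IDs)
    known=set(IDs)  # set lookups are O(1), unlike "in" on a list
    newID=[]
    for ID in IDs:
        for fid in getFollowingUid(ID):
            if fid not in known:
                known.add(fid)  # report each new UID only once
                newID.append(fid)
    return newID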
if __name__=="__main__":
    createDB()
    #work([**put root UID here**,])
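work() performs a breadth-first crawl: each round expands the users discovered in the previous round, and insertUser only returns UIDs that were not yet in the database, which is what keeps the crawl from looping. The in-memory sketch below shows the same level-by-level idea with an explicit visited set and a cap on the number of users visited; `crawl_bfs`, `get_followings` and `limit` are hypothetical names, and the cap is an addition not present in work().

```python
from collections import deque
from typing import Callable, Dict, Iterable, List

def crawl_bfs(roots: Iterable[int],
              get_followings: Callable[[int], List[int]],
              limit: int = 1000) -> List[int]:
    """Breadth-first walk over a follow graph; returns UIDs in visiting order.

    get_followings(uid) stands in for getFollowingList/getFollowingUid.
    The visited set ensures each UID is expanded at most once, and `limit`
    caps the number of UIDs visited so the walk always terminates.
    """
    visited = set()
    order: List[int] = []
    queue = deque(roots)
    while queue and len(order) < limit:
        uid = queue.popleft()
        if uid in visited:
            continue
        visited.add(uid)
        order.append(uid)
        for nxt in get_followings(uid):
            if nxt not in visited:
                queue.append(nxt)
    return order

# Tiny hypothetical follow graph containing a cycle (1 -> 2 -> 1).
graph: Dict[int, List[int]] = {1: [2, 3], 2: [1, 4], 3: [4], 4: []}
print(crawl_bfs([1], lambda u: graph.get(u, [])))   # [1, 2, 3, 4]
```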

IV. Project repository

/Concyclics/BiliBiliFollowSpider
