SoFunction
Updated on 2024-11-12

A small Python exercise: crawling Squid Game reviews to generate a word cloud

Preamble

What is Squid Game? I think most of us have at least heard of it. Although I haven't actually watched the show, I was still curious about its Douban reviews, and since I just learned Selenium, this makes a good practice project. So come on, let's crawl.


Analyze page

As usual, brothers, open Chrome and press F12 to bring up the developer tools and enter happy crawler mode.
Then navigate to the comments page and inspect one of the comments in the Elements panel.


It turns out the page is very simple: each comment is wrapped in a span tag whose class is "short", so writing the XPath for it is straightforward.


That gives us one page of comments; the next step is paging.
Here there is a small trick: we don't need to write the XPath ourselves, because Chrome can generate it for us.


Right-click the next-page button in the Elements panel and choose Copy → Copy XPath to get the button's XPath; then we only need to click it with Selenium. That completes the analysis, so let's start writing the code.


Key code

Opening the Douban short-review page with Selenium

# Page to be opened (the site prefix is elided in the original)
url = '/subject/34812928/comments?limit=20&status=P&sort=new_score'
# Evade automation detection
option = webdriver.ChromeOptions()
# option.headless = True
option.add_experimental_option('excludeSwitches', ['enable-automation'])
option.add_experimental_option('useAutomationExtension', False)
driver = webdriver.Chrome(options=option)
# Hide the webdriver flag before any page script runs
driver.execute_cdp_cmd('Page.addScriptToEvaluateOnNewDocument',
                       {'source': 'Object.defineProperty(navigator, "webdriver", {get: () => undefined})'
                        })
# Open the page
driver.get(url)
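As a side note, the query string on the comments URL above controls page size and ordering. If you are curious what it contains, the standard library can unpack it (this is illustration only, not part of the original script):

```python
from urllib.parse import urlparse, parse_qs

# The comments URL used above (site prefix elided in the original)
url = '/subject/34812928/comments?limit=20&status=P&sort=new_score'

# parse_qs maps each parameter name to a list of values
params = parse_qs(urlparse(url).query)
# limit: comments per page; status / sort: filter and ordering flags
print(params)  # {'limit': ['20'], 'status': ['P'], 'sort': ['new_score']}
```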

Getting the comment text via XPath

The XPath for the comments:

//span[@class="short"]
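To see what this kind of attribute-predicate XPath matches, here is a minimal stdlib sketch against a made-up HTML fragment (ElementTree only supports a limited XPath subset, but it covers this pattern; the fragment is invented for illustration):

```python
import xml.etree.ElementTree as ET

# A tiny, made-up stand-in for the comments page markup
html = """
<div>
  <span class="short">Great show!</span>
  <span class="vote">123</span>
  <span class="short">A bit overrated.</span>
</div>
"""

root = ET.fromstring(html)
# Select only the spans whose class attribute is "short"
comments = [s.text for s in root.findall(".//span[@class='short']")]
print(comments)  # ['Great show!', 'A bit overrated.']
```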

Get Comment Code

# By comes from selenium.webdriver.common.by
options = driver.find_elements(By.XPATH, '//span[@class="short"]')
for i in options:
    text = text + i.text

Jump to next page

Next-page button XPath (the id-based selector copied from Chrome was lost from the original text; the selector below is a reconstruction targeting Douban's "next" pagination link):

//*[@id="paginator"]/a[@class="next"]

Jump Button Click Code

# XPath reconstructed; the original selector was lost in extraction
nextpage = driver.find_element(By.XPATH, '//*[@id="paginator"]/a[@class="next"]')
nextpage.click()
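Instead of clicking, pagination can also be expressed directly in the URL: listing pages like this one are usually addressed with a start offset alongside limit. A small sketch of that idea (the start parameter is an assumption about the page's addressing scheme, not something shown in the original):

```python
def page_urls(base, pages, limit=20):
    """Build the URLs for the first `pages` comment pages, 20 comments each."""
    return [f'{base}?start={n * limit}&limit={limit}&status=P&sort=new_score'
            for n in range(pages)]

# Base path as in the original (site prefix elided there as well)
urls = page_urls('/subject/34812928/comments', 3)
print(urls[1])  # /subject/34812928/comments?start=20&limit=20&status=P&sort=new_score
```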

Full Code

Word Cloud Generator

# -*- coding: utf-8 -*-
# @Time : 2021/10/9 20:54
# @Author : xiaow
# @File :
# @Software : PyCharm


from wordcloud import WordCloud
from PIL import Image as image
import numpy as np

import jieba


def trans_CN(text):
    # Cut the string into words
    word_list = jieba.cut(text)
    # Join the words with spaces so WordCloud can split them apart again
    result = " ".join(word_list)
    return result


def getWordCloud(text):
    # print(text)
    text = trans_CN(text)
    # Word cloud background image (file name elided in the original)
    mask = np.array(image.open("E://file//pics//"))
    wordcloud = WordCloud(
        mask=mask,
        # Font file supporting Chinese characters (file name elided in the original)
        font_path="C:\\Windows\\Fonts\\",
        background_color='white'
    ).generate(text)
    image_produce = wordcloud.to_image()
    image_produce.show()
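Two details of the generator are worth unpacking: WordCloud splits its input on whitespace, which is why trans_CN joins the jieba tokens with spaces, and it then sizes each word by how often it occurs. Both steps can be sketched with the standard library alone (the tokens are invented for illustration, standing in for jieba.cut output):

```python
from collections import Counter

# Hypothetical tokens, standing in for jieba.cut(...) output
word_list = ['游戏', '好看', '游戏', '紧张', '好看', '游戏']

# What trans_CN does: space-join so WordCloud can split the words back out
joined = ' '.join(word_list)
print(joined)  # 游戏 好看 游戏 紧张 好看 游戏

# Conceptually what WordCloud does next: build a word-frequency table
freq = Counter(joined.split())
print(freq.most_common(2))  # [('游戏', 3), ('好看', 2)]
```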

Comment Fetch Code

# -*- coding: utf-8 -*-
# @Time : 2021/6/27 22:29
# @Author : xiaow
# @File :
# @Software : PyCharm
import time

from selenium import webdriver
from selenium.webdriver.common.by import By
from api import wordcloudutil

if __name__ == '__main__':
    # Page to open (the site prefix is elided in the original)
    url = '/subject/34812928/comments?limit=20&status=P&sort=new_score'
    # Evade automation detection
    option = webdriver.ChromeOptions()
    # option.headless = True
    option.add_experimental_option('excludeSwitches', ['enable-automation'])
    option.add_experimental_option('useAutomationExtension', False)
    driver = webdriver.Chrome(options=option)
    # Hide the webdriver flag before any page script runs
    driver.execute_cdp_cmd('Page.addScriptToEvaluateOnNewDocument',
                           {'source': 'Object.defineProperty(navigator, "webdriver", {get: () => undefined})'
                            })
    driver.get(url)
    text = ''
    # Accumulate comments page by page
    j = 0
    while 1:
        # Wait for the page, then focus the (only) window
        time.sleep(1)
        driver.switch_to.window(driver.window_handles[0])

        # Collect all comments on the current page
        options = driver.find_elements(By.XPATH, '//span[@class="short"]')
        for i in options:
            text = text + i.text
        time.sleep(2)
        # Click the next-page link (XPath reconstructed; lost in the original)
        nextpage = driver.find_element(By.XPATH, '//*[@id="paginator"]/a[@class="next"]')
        nextpage.click()
        j = j + 1
        if j > 10:
            break
    print(text)
    wordcloudutil.getWordCloud(text)

Results

The word cloud generated from the crawled comments is shown below.

(word cloud image)

And that's the end of it. Still pretty easy.

That concludes this small exercise in crawling Squid Game reviews with Python to generate a word cloud. For more on crawling with Python, please search my earlier articles or keep browsing the related articles below, and I hope you will continue to support me!