Preamble
What is Squid Game? Most of us are familiar with it by now. Although I haven't actually watched the show, I was still a little curious about its Douban reviews, and since I recently learned Selenium, this makes a nice practice project. So come on, let's get crawling.
Analyze page
As usual, first open our favorite Google Chrome browser and press F12 to open DevTools and enter happy crawler mode.
To reach the comments page, click through the steps shown below.
Looking at the page, we find it is very simple: each comment is wrapped in a span tag whose class is short, so we can start writing the XPath, as follows.
This gets us one page of comments; the next step is moving to the next page.
There is a small trick here: we don't need to write the XPath ourselves, because Chrome can generate it for us, as follows.
Click Copy XPath to get the XPath of the next-page button, then all that's left is to perform the click. With that the analysis is complete, so let's start writing code.
Key Code
Opening the Douban short-review page with Selenium
```python
# Page to be opened (the domain was omitted in the original post)
url = '/subject/34812928/comments?limit=20&status=P&sort=new_score'
# Evade automation detection
option = webdriver.ChromeOptions()
# option.headless = True
option.add_experimental_option('excludeSwitches', ['enable-automation'])
option.add_experimental_option('useAutomationExtension', False)
driver = webdriver.Chrome(options=option)
driver.execute_cdp_cmd('Page.addScriptToEvaluateOnNewDocument', {
    'source': 'Object.defineProperty(navigator, "webdriver", {get: () => undefined})'
})
# Open the page
driver.get(url)
```
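It may help to look at what the query string in that URL controls. A quick standard-library check of the parameters (illustrative only; the meanings in the comments are my reading of Douban's conventions, e.g. status=P appears to select "watched" reviews):

```python
from urllib.parse import urlparse, parse_qs

url = '/subject/34812928/comments?limit=20&status=P&sort=new_score'
params = parse_qs(urlparse(url).query)
print(params['limit'])   # comments per page, e.g. ['20']
print(params['status'])  # 'P' seems to mean reviews from people who watched it
print(params['sort'])    # 'new_score' is the default "hot" ordering
```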
Getting the content of a comment based on an xpath
The XPath for grabbing the comments:
```
//span[@class="short"]
```
Get Comment Code
```python
options = driver.find_elements(By.XPATH, '//span[@class="short"]')
for i in options:
    text = text + i.text
```
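The loop above grows the string by repeated concatenation, which works but is not the most idiomatic pattern; collecting the pieces and joining once is usually preferred. A self-contained sketch with stand-in elements (FakeElement is hypothetical, just mimicking a WebElement's .text attribute, so no browser is needed):

```python
# Stand-in for selenium WebElement objects: anything with a .text attribute.
class FakeElement:
    def __init__(self, text):
        self.text = text

elements = [FakeElement('Great show'), FakeElement('A bit bloody')]

# Collect each element's text and join once at the end.
text = ''.join(el.text for el in elements)
print(text)  # -> 'Great showA bit bloody'
```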
Jump to next page
Next page button xpath
```
//*[@id="paginator"]/a
```
Jump Button Click Code
```python
nextpage = driver.find_element(By.XPATH, '//*[@id="paginator"]/a')
nextpage.click()
```
Full Code
Word Cloud Generator
```python
# -*- coding: utf-8 -*-
# @Time : 2021/10/9 20:54
# @Author : xiaow
# @File :
# @Software : PyCharm
from wordcloud import WordCloud
import PIL.Image as image
import numpy as np
import jieba


def trans_CN(text):
    # Segment the received string into words
    word_list = jieba.cut(text)
    # Put spaces between the segmented words
    result = " ".join(word_list)
    return result


def getWordCloud(text):
    # print(text)
    text = trans_CN(text)
    # Word cloud background image (file name omitted in the original)
    mask = np.array(image.open("E://file//pics//"))
    wordcloud = WordCloud(
        mask=mask,
        # Font file (file name omitted in the original)
        font_path="C:\\Windows\\Fonts\\",
        background_color='white'
    ).generate(text)
    image_produce = wordcloud.to_image()
    image_produce.show()
```
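The reason trans_CN joins the jieba tokens with spaces is that WordCloud's generate() was designed around space-separated (English-style) text, so Chinese has to be segmented first. A minimal illustration of that join step, using a hard-coded token list in place of jieba.cut's output:

```python
# jieba.cut(text) would return an iterator of tokens; here we fake its output.
word_list = ['鱿鱼', '游戏', '好看']

# This is the same join trans_CN performs before handing text to WordCloud.
result = ' '.join(word_list)
print(result)  # -> '鱿鱼 游戏 好看'
```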
Comment Fetch Code
```python
# -*- coding: utf-8 -*-
# @Time : 2021/6/27 22:29
# @Author : xiaow
# @File :
# @Software : PyCharm
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from api import wordcloudutil

if __name__ == '__main__':
    url = '/subject/34812928/comments?limit=20&status=P&sort=new_score'
    # Evade automation detection
    option = webdriver.ChromeOptions()
    # option.headless = True
    option.add_experimental_option('excludeSwitches', ['enable-automation'])
    option.add_experimental_option('useAutomationExtension', False)
    driver = webdriver.Chrome(options=option)
    driver.execute_cdp_cmd('Page.addScriptToEvaluateOnNewDocument', {
        'source': 'Object.defineProperty(navigator, "webdriver", {get: () => undefined})'
    })
    driver.get(url)
    text = ''
    j = 0
    while 1:
        # Locate the newly loaded page
        time.sleep(1)
        driver.switch_to.window(driver.window_handles[0])
        # Get all comment elements
        options = driver.find_elements(By.XPATH, '//span[@class="short"]')
        for i in options:
            text = text + i.text
        time.sleep(2)
        # Click through to the next page
        nextpage = driver.find_element(By.XPATH, '//*[@id="paginator"]/a')
        nextpage.click()
        j = j + 1
        if j > 10:
            break
    print(text)
    wordcloudutil.getWordCloud(text)
```
Results
The word cloud generated from the crawled comments is shown below.
And that's the end of it. Still pretty easy.
That wraps up this little Python exercise on crawling Squid Game reviews and generating a word cloud. For more on crawling with Python, please search my earlier articles or continue browsing the related articles below. I hope you'll keep supporting me!