Preamble
This article is organized step by step. If you have a specific need, go straight to the relevant section; the Summary at the end contains the complete script, and anything unclear there is explained in the corresponding section. The sections cover: referenced libraries, reading the text, splitting words and setting stop words, setting the PNG mask, font settings, and generating the word cloud. Thank you for reading, and good luck.
I. Referenced libraries
from wordcloud import WordCloud, STOPWORDS
import matplotlib.pyplot as plt
import numpy as np
import jieba.posseg as pseg
from collections import Counter
import PIL.Image as Image
from matplotlib import colors
Make sure the libraries above are installed; otherwise the script will fail with import errors at runtime.
# To install a library you can use Tsinghua's mirror site (the address may change; check the official website for the current one)
pip install -i /simple some-package
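Before running the script, it can help to confirm which packages are actually importable. The following is only a minimal sketch: the list of import names is an assumption based on the libraries used in this article, and find_missing is a helper name made up for this illustration.

```python
import importlib.util

# Import names assumed from the libraries used in this article
# (note that the Pillow package is imported as "PIL").
REQUIRED = ["wordcloud", "matplotlib", "numpy", "jieba", "PIL"]


def find_missing(modules):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```

Anything reported as missing can then be installed with pip before continuing.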
II. Read the text (load it into Python)
# Read the text (set the path to wherever your text file is located)
text = open("", encoding="utf-8").read()
# Split the text into (word, part-of-speech flag) pairs
words = pseg.cut(text)
Replace the empty "" with the actual location of your text file.
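A wrong path is the most common reason this step fails. As a hedged sketch (read_text_file is a hypothetical helper, not part of the original script), the read can be wrapped so that a bad path produces a clear message:

```python
from pathlib import Path


def read_text_file(path, encoding="utf-8"):
    """Read a whole text file, raising a clear error if the path is wrong."""
    file_path = Path(path)
    if not file_path.is_file():
        raise FileNotFoundError(f"Text file not found: {file_path}")
    return file_path.read_text(encoding=encoding)
```

The returned string can be passed to the word-splitting step in place of the text variable.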
III. Splitting words and setting stop words
# Extract words by length and part of speech
report_words = []
for word, flag in words:
    if (len(word) >= 2) and ('n' in flag):  # Set the minimum word length here
        report_words.append(word)

# Set up stop words
stopwords = set(STOPWORDS)
stopwords.update(["the", "Thank you.", "I represent", "Above.", "Report.", "Expresses its sincere gratitude", "Strategy"])

# Remove stop words
report_words = [word for word in report_words if word not in stopwords]

# Count high-frequency words
result = Counter(report_words).most_common(200)  # Number of words to keep

# Build the word-frequency dictionary
content = dict(result)

# Print word-frequency statistics (at most the top 50)
for word, count in result[:50]:
    print("{0:<10}{1:>5}".format(word, count))
len(word) >= 2 sets the minimum word length: use 2 to keep words of two or more characters, 3 for three or more, and so on. The check 'n' in flag keeps only words whose part-of-speech flag contains "n" (nouns).
In result = Counter(report_words).most_common(200), the 200 means that the 200 most frequent words are kept for plotting; set it as required.
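To see the filtering and counting in isolation, here is a small sketch that runs without a tokenizer. The (word, flag) pairs are hand-written stand-ins for the tagger's output, and top_words is a helper name invented for this example.

```python
from collections import Counter

# Hand-written (word, flag) pairs standing in for the tagger's output;
# flags containing "n" are noun-like, "v" is a verb, "r" is a pronoun.
tagged = [("经济", "n"), ("发展", "vn"), ("经济", "n"), ("我", "r"),
          ("报告", "n"), ("提高", "v"), ("发展", "vn"), ("经济", "n")]


def top_words(pairs, min_len=2, flag_key="n", stopwords=(), limit=200):
    """Filter by length, part-of-speech flag and stop words, then count."""
    kept = [word for word, flag in pairs
            if len(word) >= min_len and flag_key in flag and word not in stopwords]
    return Counter(kept).most_common(limit)


result = top_words(tagged, stopwords={"报告"})
for word, count in result:
    print("{0:<10}{1:>5}".format(word, count))
```

Here "我" is dropped for being too short, "提高" for not being noun-like, and "报告" as a stop word, which leaves "经济" (3) and "发展" (2).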
IV. Setting the png mask
# Set the png mask (replace the path with your actual one)
background = Image.open("")
mask = np.array(background)
Replace "" with the actual path of your image.
If the output is still rectangular (square), the png image itself is probably the "problem"; you can try the following:
1. Use image-editing software to make the shape solid black (other approaches may also work; I haven't tried them).
2. Use the following code to give the transparent background a white color value (the alpha stays 0), since white is the color the word cloud treats as masked out.
# If the image already has a bit depth of 32, the conversion to RGBA is unnecessary, but harmless.
# Convert from RGB (24-bit) mode to RGBA (32-bit) mode
img = Image.open("").convert('RGBA')
W, L = img.size
transparent_pixel = (0, 0, 0, 0)  # Fully transparent pixel (black, alpha 0)
for h in range(W):
    for i in range(L):
        if img.getpixel((h, i)) == transparent_pixel:
            img.putpixel((h, i), (255, 255, 255, 0))  # White, still transparent
img.save("yourfile_new.png")  # Set your own save path
Two parameters need to be changed here:
the path of the input image (the empty "") and yourfile_new.png (the modified image); replace both according to your actual paths.
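The pixel-by-pixel loop can be slow on large images. Below is a vectorized sketch with NumPy, run on a tiny synthetic RGBA array instead of a real file. whiten_transparent is a hypothetical helper name, and the final check assumes (as I understand the wordcloud library) that a pixel whose R, G and B are all 255 is treated as masked out. Unlike the loop, this version whitens every pixel with alpha 0, not only exact (0, 0, 0, 0) pixels.

```python
import numpy as np


def whiten_transparent(rgba):
    """Return a copy in which every fully transparent pixel gets white RGB (alpha stays 0)."""
    out = rgba.copy()
    transparent = out[..., 3] == 0   # boolean map of pixels with alpha == 0
    out[transparent, :3] = 255       # set R, G, B to white for those pixels
    return out


# 2x2 RGBA image: top row is transparent background, bottom row is an opaque black shape.
rgba = np.array([[[0, 0, 0, 0], [0, 0, 0, 0]],
                 [[0, 0, 0, 255], [0, 0, 0, 255]]], dtype=np.uint8)

fixed = whiten_transparent(rgba)

# Pixels whose RGB is pure white are the ones expected to be masked out.
masked_out = np.all(fixed[..., :3] == 255, axis=-1)
print(masked_out)
```

With a real image, the input array would come from np.array(Image.open(path).convert('RGBA')), and Image.fromarray(fixed) can be saved back to a png.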
V. Font Settings
# Set the font file path (append your font file name to this directory)
font_path = r"C:\Windows\Fonts\"
# Set the font size
max_font_size = 200
min_font_size = 10
# Build the color list; change or add colors as needed
color_list = ['#FF274B']
# Build the colormap from the color list
colormap = colors.ListedColormap(color_list)
Font style: fonts are generally found in this directory; you can change the path or download the font you want. font_path must point to a font file, so add the file name after the directory.
Font size: set the maximum and minimum as required.
Font color: you can leave these lines out (the default colormap is then used), or set one or more colors as needed (I've only set one here).
VI. Generate word cloud map
# Generate the word cloud
wordcloud = WordCloud(scale=4,                       # Output sharpness
                      font_path=font_path,           # Font file path
                      colormap=colormap,             # Font color
                      width=1600,                    # Output image width (ignored when a mask is set)
                      height=900,                    # Output image height (ignored when a mask is set)
                      background_color='white',      # Image background color
                      stopwords=stopwords,           # Stop words
                      mask=mask,                     # Mask
                      max_font_size=max_font_size,   # Maximum font size
                      min_font_size=min_font_size)   # Minimum font size
wordcloud.generate_from_frequencies(content)

# Use matplotlib to display the word cloud
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis('off')
plt.show()

# Save the word cloud image
wordcloud.to_file("")
If you set the earlier parameters the same way I did, you can copy and paste this block as is; only the output path in to_file("") needs to be filled in.
Summary
from wordcloud import WordCloud, STOPWORDS
import matplotlib.pyplot as plt
import numpy as np
import jieba.posseg as pseg
from collections import Counter
import PIL.Image as Image
from matplotlib import colors

# Read the text (set the path to wherever your text file is located)
text = open("", encoding="utf-8").read()
words = pseg.cut(text)

# Extract words by length and part of speech
report_words = []
for word, flag in words:
    if (len(word) >= 2) and ('n' in flag):  # Set the minimum word length here
        report_words.append(word)

# Set up stop words
stopwords = set(STOPWORDS)
stopwords.update(["the", "Thank you.", "I represent", "Above.", "Report.", "Expresses its sincere gratitude", "Strategy"])

# Remove stop words
report_words = [word for word in report_words if word not in stopwords]

# Count high-frequency words
result = Counter(report_words).most_common(200)  # Number of words to keep

# Build the word-frequency dictionary
content = dict(result)

# Print word-frequency statistics (at most the top 50)
for word, count in result[:50]:
    print("{0:<10}{1:>5}".format(word, count))

# Set the png mask (replace the path with your actual one)
background = Image.open("").convert('RGB')
mask = np.array(background)

'''
# If the image already has a bit depth of 32, the conversion to RGBA is unnecessary, but harmless.
# Convert from RGB (24-bit) mode to RGBA (32-bit) mode
img = Image.open("").convert('RGBA')
W, L = img.size
transparent_pixel = (0, 0, 0, 0)  # Fully transparent pixel (black, alpha 0)
for h in range(W):
    for i in range(L):
        if img.getpixel((h, i)) == transparent_pixel:
            img.putpixel((h, i), (255, 255, 255, 0))  # White, still transparent
img.save("yourfile_new.png")  # Set your own save path
'''

# Set the font file path (append your font file name to this directory)
font_path = r"C:\Windows\Fonts\"
# Set the font size
max_font_size = 200
min_font_size = 10
# Build the color list; change or add colors as needed
color_list = ['#FF274B']
# Build the colormap from the color list
colormap = colors.ListedColormap(color_list)

# Generate the word cloud
wordcloud = WordCloud(scale=4,                       # Output sharpness
                      font_path=font_path,           # Font file path
                      colormap=colormap,             # Font color
                      width=1600,                    # Output image width (ignored when a mask is set)
                      height=900,                    # Output image height (ignored when a mask is set)
                      background_color='white',      # Image background color
                      stopwords=stopwords,           # Stop words
                      mask=mask,                     # Mask
                      max_font_size=max_font_size,   # Maximum font size
                      min_font_size=min_font_size)   # Minimum font size
wordcloud.generate_from_frequencies(content)

# Use matplotlib to display the word cloud
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis('off')
plt.show()

# Save the word cloud image
wordcloud.to_file("")
Generated example
That concludes this example of drawing a custom-shaped word cloud in Python.