SoFunction
Updated on 2024-11-13

Drawing custom-shaped word clouds in Python: an example

Preamble

This article follows a general-to-specific structure. Readers with a particular need can jump straight to the corresponding section. The complete code is collected in the summary at the end, and anything unclear there is explained in the individual sections. The sections cover the referenced libraries, reading the text, splitting words and setting stop words, setting the PNG mask, font settings, and generating the word cloud. Thank you for reading, and good luck.

I. Referenced libraries

from wordcloud import WordCloud, STOPWORDS
import matplotlib.pyplot as plt
import numpy as np
import jieba.posseg as pseg
from collections import Counter
import PIL.Image as Image
from matplotlib import colors

Make sure the libraries above are installed; otherwise the code will fail with import errors at run time.

# Libraries can be installed from the Tsinghua mirror (the address may change; check the official site for the current one)

pip install -i https://pypi.tuna.tsinghua.edu.cn/simple some-package
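Before running the rest of the code, it can help to check which libraries are actually importable. The following is a minimal sketch using only the standard library; the module list is an assumption based on the imports in this article (jieba and Pillow in particular are inferred from the pseg and Image aliases), and find_missing is a hypothetical helper name.

```python
import importlib.util

# Import names assumed to be needed by this article's code. The import name
# can differ from the pip package name (for example, "PIL" is installed as "Pillow").
REQUIRED_MODULES = ["wordcloud", "matplotlib", "numpy", "jieba", "PIL"]


def find_missing(modules):
    """Return the names in `modules` that cannot be located for import."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are available.")
```

Any name reported as missing can then be installed with pip as shown above.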

II. Reading the text (so Python can process it)

# Read the text (set the path to where your text file is located)
text = open("", encoding="utf-8").read()
words = pseg.cut(text)

Set the path inside open("") according to the actual location of your text file.
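The one-line open(...).read() above leaves the file handle to be closed by the interpreter. As an alternative, here is a small sketch that reads the file through pathlib, which opens and closes the file itself. The helper name read_text and the file name in the usage note are hypothetical.

```python
from pathlib import Path


def read_text(path, encoding="utf-8"):
    """Read a whole text file and return its contents as a string."""
    # Path.read_text opens the file, reads it, and closes it again.
    return Path(path).read_text(encoding=encoding)
```

For example, text = read_text("report.txt") would load a file named report.txt from the current directory, after which the text can be passed to the word splitter as before.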

III. Splitting words and setting stop words

# Extract words by minimum length and part of speech
report_words = []
for word, flag in words:
    if (len(word) >= 2) and ('n' in flag):  # Set the minimum word length here
        report_words.append(word)
# Set up stop words
stopwords = set(STOPWORDS)
stopwords.update(["the", "Thank you.", "I represent", "Above.", "Report.", "Expresses its sincere gratitude", "Strategy"])
# Remove stop words
report_words = [word for word in report_words if word not in stopwords]
# Count the high-frequency words
result = Counter(report_words).most_common(200)  # Number of words to keep
# Build the word-frequency dictionary
content = dict(result)
# Print the word-frequency statistics (top 50; slicing avoids an error if there are fewer)
for word, count in result[:50]:
    print("{0:<10}{1:>5}".format(word, count))

len(word) >= 2 sets the minimum word length: use 2 to keep words of two or more characters, 3 for three or more, and so on.

In result = Counter(report_words).most_common(200), the 200 means that the 200 most frequent words are kept for plotting; adjust it as required.
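To see how the length filter, the part-of-speech filter, and the counter interact, the sketch below runs the same logic on a handful of made-up (word, flag) pairs instead of real segmenter output. The sample data and the helper name count_nouns are hypothetical. Note that 'n' in flag matches any flag containing the letter n, such as "vn", not only plain "n".

```python
from collections import Counter

# Hypothetical (word, flag) pairs standing in for the segmenter's output.
sample_pairs = [
    ("development", "n"), ("development", "n"), ("economy", "n"),
    ("improve", "v"), ("a", "n"), ("economy", "n"), ("development", "vn"),
]


def count_nouns(pairs, min_len=2, top_n=200, stopwords=frozenset()):
    """Count words that are long enough, have 'n' in their flag, and are not stop words."""
    kept = [
        word for word, flag in pairs
        if len(word) >= min_len and "n" in flag and word not in stopwords
    ]
    return Counter(kept).most_common(top_n)


result = count_nouns(sample_pairs, stopwords={"economy"})
# Slicing keeps the loop safe when fewer than 50 words were found.
for word, count in result[:50]:
    print("{0:<10}{1:>5}".format(word, count))
```

Here "a" is dropped by the length filter, "improve" by the flag filter, and "economy" by the stop-word set, so only "development" remains with a count of 3.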

IV. Setting the png mask

# Set the PNG mask (replace "" with the actual path)
background = Image.open("")
mask = np.array(background)

Replace "" with the actual path to your PNG image.

If the output is still rectangular (square), the PNG image is probably the "problem": WordCloud leaves empty only the pure white areas of the mask, so a background that is not white is drawn over like the rest of the image. You can try the following:

1. Use image-editing software to make the shape solid black (other approaches may also work; I have not tried them)

2. Use the following code to rewrite the fully transparent background pixels so that their color channels are white (the pixels stay transparent)

# If the image is already 32-bit, the conversion to RGBA is not needed, but it does no harm.
# Convert from RGB (24-bit) mode to RGBA (32-bit) mode
img = Image.open("").convert('RGBA')
W, L = img.size
transparent_pixel = (0, 0, 0, 0)  # Fully transparent pixel with black color channels
for h in range(W):
    for i in range(L):
        if img.getpixel((h, i)) == transparent_pixel:
            img.putpixel((h, i), (255, 255, 255, 0))  # Still transparent, but the color channels are now white
img.save("yourfile_new.png")  # Set your own save path

Two paths need to be changed here: the empty path "" of the image being opened, and yourfile_new.png, which is where the modified image is saved. Replace both according to your actual paths.
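The reason this fix helps can be illustrated without any image library. The WordCloud documentation states that white mask entries are treated as masked out; the sketch below assumes, as my reading of the library's behavior rather than something stated in this article, that for four-channel pixels only the three color channels are compared and alpha is ignored. All names here are hypothetical, and the pixel grid is a toy stand-in for a real image.

```python
TRANSPARENT_BLACK = (0, 0, 0, 0)
TRANSPARENT_WHITE = (255, 255, 255, 0)


def is_masked_out(pixel):
    """True if the color channels are pure white (alpha, if present, is ignored)."""
    return tuple(pixel[:3]) == (255, 255, 255)


def whiten_transparent(pixels):
    """Return a copy of a 2-D pixel grid with transparent black turned into transparent white."""
    return [
        [TRANSPARENT_WHITE if p == TRANSPARENT_BLACK else p for p in row]
        for row in pixels
    ]


def masked_fraction(pixels):
    """Fraction of pixels that would be treated as masked out."""
    flat = [p for row in pixels for p in row]
    return sum(is_masked_out(p) for p in flat) / len(flat)


# A tiny 2x2 "image": one opaque black shape pixel and three transparent pixels.
grid = [
    [(0, 0, 0, 255), (0, 0, 0, 0)],
    [(0, 0, 0, 0), (0, 0, 0, 0)],
]
print(masked_fraction(grid))                      # 0.0: nothing reads as white
print(masked_fraction(whiten_transparent(grid)))  # 0.75: the background now reads as white
```

A masked fraction of zero means no pixel is excluded, which matches the symptom of a rectangular word cloud.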

V. Font Settings

# Set the font style path
font_path = r"C:\Windows\Fonts\"
# Set the font size
max_font_size = 200
min_font_size = 10
# Create a color list (edit it to change the colors)
color_list = ['#FF274B']
# Build a colormap from the color list
colormap = colors.ListedColormap(color_list)

Font style: fonts are usually found under this path, and font_path must point to a specific font file there. You can switch fonts or download the one you want as needed.

Font size: set the maximum and minimum as required.

Font color: you can omit these lines together with the colormap argument to use the default colors, or set one or more colors as needed (I have set only one here).
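Because a wrong font path is an easy mistake to make, it may be worth verifying the path before building the word cloud. This is a minimal sketch with a hypothetical helper, check_font_path, which only confirms that the path names an existing file rather than a folder; it does not check that the file is a valid font or that it contains the glyphs your text needs.

```python
from pathlib import Path


def check_font_path(font_path):
    """Return font_path as a string if it names an existing file, else raise."""
    path = Path(font_path)
    if not path.is_file():
        # A folder such as the fonts directory itself ends up here as well.
        raise FileNotFoundError(f"Font file not found: {path}")
    return str(path)
```

Calling font_path = check_font_path(font_path) before constructing the word cloud makes a bad path fail early with a clear message.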

VI. Generating the word cloud

# Generate word clouds
wordcloud = WordCloud(scale=4,                         # Output sharpness (scaling factor)
                      font_path=font_path,             # Font file path
                      colormap=colormap,               # Font color
                      width=1600,                      # Output image width (ignored when a mask is set)
                      height=900,                      # Output image height (ignored when a mask is set)
                      background_color='white',        # Image background color
                      stopwords=stopwords,             # Stop words (ignored by generate_from_frequencies)
                      mask=mask,                       # Mask
                      max_font_size=max_font_size,     # Maximum font size
                      min_font_size=min_font_size)     # Minimum font size
wordcloud.generate_from_frequencies(content)
# Use matplotlib to display the word cloud
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis('off')
plt.show()
# Save the word cloud image
wordcloud.to_file("")

If you set the earlier parameters the same way I did, you can copy and paste this block as is.
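One point worth keeping in mind: according to the WordCloud documentation, the stopwords argument is ignored when the cloud is built with generate_from_frequencies, so stop words have to be removed from the frequency data beforehand. The sketch below shows one way to do that on a plain dictionary; the helper name drop_stopwords and the sample counts are hypothetical.

```python
def drop_stopwords(frequencies, stopwords):
    """Return a copy of a {word: count} dict without any stop words."""
    return {word: count for word, count in frequencies.items() if word not in stopwords}


content = {"development": 12, "Report.": 9, "economy": 7}  # hypothetical counts
stopwords = {"Report.", "Strategy"}
content = drop_stopwords(content, stopwords)
print(content)
```

The filtered dictionary can then be passed to generate_from_frequencies in place of the unfiltered one.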

Summary

from wordcloud import WordCloud, STOPWORDS
import matplotlib.pyplot as plt
import numpy as np
import jieba.posseg as pseg
from collections import Counter
import PIL.Image as Image
from matplotlib import colors
# Read the text (set the path to where your text file is located)
text = open("", encoding="utf-8").read()
words = pseg.cut(text)
# Extract words by minimum length and part of speech
report_words = []
for word, flag in words:
    if (len(word) >= 2) and ('n' in flag):  # Set the minimum word length here
        report_words.append(word)
# Set up stop words
stopwords = set(STOPWORDS)
stopwords.update(["the", "Thank you.", "I represent", "Above.", "Report.", "Expresses its sincere gratitude", "Strategy"])
# Remove stop words
report_words = [word for word in report_words if word not in stopwords]
# Count the high-frequency words
result = Counter(report_words).most_common(200)  # Number of words to keep
# Build the word-frequency dictionary
content = dict(result)
# Print the word-frequency statistics (top 50; slicing avoids an error if there are fewer)
for word, count in result[:50]:
    print("{0:<10}{1:>5}".format(word, count))
# Set the PNG mask (replace "" with the actual path)
background = Image.open("").convert('RGB')
mask = np.array(background)
'''
# If the image is already 32-bit, the conversion to RGBA is not needed, but it does no harm.
# Convert from RGB (24-bit) mode to RGBA (32-bit) mode
img = Image.open("").convert('RGBA')
W, L = img.size
transparent_pixel = (0, 0, 0, 0)  # Fully transparent pixel with black color channels
for h in range(W):
    for i in range(L):
        if img.getpixel((h, i)) == transparent_pixel:
            img.putpixel((h, i), (255, 255, 255, 0))  # Still transparent, but the color channels are now white
img.save("yourfile_new.png")  # Set your own save path
'''
# Set the font style path
font_path = r"C:\Windows\Fonts\"
# Set the font size
max_font_size = 200
min_font_size = 10
# Create a color list (edit it to change the colors)
color_list = ['#FF274B']
# Build a colormap from the color list
colormap = colors.ListedColormap(color_list)
# Generate word clouds
wordcloud = WordCloud(scale=4,                         # Output sharpness (scaling factor)
                      font_path=font_path,             # Font file path
                      colormap=colormap,               # Font color
                      width=1600,                      # Output image width (ignored when a mask is set)
                      height=900,                      # Output image height (ignored when a mask is set)
                      background_color='white',        # Image background color
                      stopwords=stopwords,             # Stop words (ignored by generate_from_frequencies)
                      mask=mask,                       # Mask
                      max_font_size=max_font_size,     # Maximum font size
                      min_font_size=min_font_size)     # Minimum font size
wordcloud.generate_from_frequencies(content)
# Use matplotlib to display the word cloud
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis('off')
plt.show()
# Save the word cloud map
wordcloud.to_file("")

Generate Example

That concludes this example of drawing a custom-shaped word cloud in Python.