SoFunction
Updated on 2024-12-15

Generating a tag cloud with Python 3: a code walkthrough

This article walks through generating a tag cloud with Python 3. The sample code is explained in detail and should be a useful reference for study or work.

Tag clouds are a popular way to present results in big-data work, and the same effect can be achieved in Python 3. The steps are as follows:

------------------- Main text ---------------------

Start by installing the following libraries:

#!/usr/bin/python3.4
# -*- coding: utf-8 -*-
# /~gohlke/pythonlibs/#cx_freeze
# download pygame from the general-purpose library page above
# install simplejson with pip3

Then install the most important library:

pip3 install pytagcloud

Or download it from the official page:

/pypi/pytagcloud/

Once it is installed, try the example from the official page:

from pytagcloud import create_tag_image, make_tags
from pytagcloud.lang.counter import get_tag_counts

YOUR_TEXT = "A tag cloud is a visual representation for text data, typically \
used to depict keyword metadata on websites, or to visualize free form text."

tags = make_tags(get_tag_counts(YOUR_TEXT), maxsize=120)

create_tag_image(tags, 'cloud_large.png', size=(900, 600), fontname='Lobster')

Sure enough, it raised an error:

Traceback (most recent call last):
 File "D:/code/pythonwork/", line 96, in <module>
  tags = make_tags(get_tag_counts(YOUR_TEXT), maxsize=120)
 File "C:\Python34\lib\site-packages\pytagcloud\lang\", line 25, in get_tag_counts
  return sorted(counted.iteritems(), key=itemgetter(1), reverse=True)
AttributeError: 'dict' object has no attribute 'iteritems'

I looked into it and found that the problem is inside the library:

# 
return sorted(counted.iteritems(), key=itemgetter(1), reverse=True)

It turns out that Python 3.4 no longer supports this method. The difference is as follows:

In Python 2, items() returns a copy of the list of all items (key/value pairs) in dictionary D, which takes up extra memory.

iteritems() returns an iterator over all items (key/value pairs) in D, which takes no extra memory.

In Python 3, iteritems() and viewitems() no longer exist, and items() returns a view object, the same result that viewitems() gave in Python 2. Replacing iteritems() with items() therefore still works for iterating in a for loop.
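The behavior described above can be checked with a small sketch. The dictionary `counted` and its contents are made up for illustration and are not taken from pytagcloud; the snippet only uses the standard library and assumes Python 3.

```python
from operator import itemgetter

# Illustrative data only (an assumption, not output from pytagcloud)
counted = {"cloud": 3, "words": 2, "code": 1}

# In Python 3, dict.items() returns a lightweight view, not a copied list.
view = counted.items()
has_iteritems = hasattr(counted, "iteritems")  # False on Python 3

# The view reflects later changes to the dictionary.
counted["tag"] = 5
view_sees_update = ("tag", 5) in view

# sorted() accepts any iterable, so the items can be ranked by count directly.
ranked = sorted(counted.items(), key=itemgetter(1), reverse=True)
print(ranked)
```

Because sorted() always builds a new list, the ranked result is a plain list of (key, value) tuples regardless of which dictionary method supplied the items.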

But after I changed it to:

# 
return sorted(counted.items(), key=itemgetter(1), reverse=True)

It ran without errors but still did not generate the tag cloud. After printing intermediate values over and over, I finally found the problem:

from pytagcloud import create_tag_image

The input that ultimately feeds this call has to be a list of (word, count) tuples like the following:

# counts =[('cloud', 3),
# ('words', 2),
# ('code', 1),
# ('word', 1),
# ('appear', 1)]

But under Python 3 the library did not produce such a list for me, even with items(), so I wrote that step myself.

Read a txt file and split each line on whitespace into the elements of a list:

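arr = []
# Read the whole file; splitlines() handles both '\n' and '\r\n' line endings
with open('../tagcloud/tag_file.txt', 'r') as file:
    data = file.read().splitlines()
for content in data:
    # Strip illegal characters, then split the line on whitespace
    contents = validatecontent(content).split()
    for word in contents:
        arr.append(word)
print(arr)

The resulting list:
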
['BAISC', 'Python', 'BASICA', 'GVBASIC', 'GWBASIC', 'Python', 'ETBASIC', 'QBASIC', 'Quick', 'Basic', 'Turbo', 'Basic', 'True', 'Python', 'java', 'Basic', 'Visual', 'Basic', 'Visual', 'Basic', 'Net', 'Power', 'Basic', 'Python', 'java', 'SQL', 'VB', 'Small', 'Basic', 'Free', 'Basic', 'DarkBASIC', 'VBScript', 'Visual', 'Basic', 'For', 'ApplicationsVBA', 'REALbasic', 'C', 'C', 'Turbo', 'C', 'Python', 'java', 'SQL', 'VB', 'PHP', 'HTML', 'Borland', 'C', 'C', 'Builder', 'CCLI', 'Python', 'java', 'ObjectiveC', 'C#', 'Microsoft', 'Visual', 'C', 'Pascal', 'Delphi', 'Turbo', 'Python', 'java', 'SQL', 'VB', 'PHP', 'HTML', 'Pascal', 'Object', 'Pascal', 'Free', 'Pascal', 'Lazarus', 'FORTRAN', 'MATLAB', 'Scilab', 'GNU', 'Octave', 'R', 'SPlus', 'Mathematica', 'Maple', 'Python', 'java', 'SQL', 'VB', 'PHP', 'HTML', 'Julia', 'xBaseClipper', 'Visual', 'FoxPro', 'SQLPLSQL', 'TSQL', 'SQLPSM', 'LINQ', 'Xquer', 'Lua', 'Python', 'java', 'SQL', 'VB', 'Perl', 'PHP', 'Python', 'Ruby', 'ASP', 'JSP', 'TclTk', 'VBScript', 'AppleScript', 'AAuto', 'ActionScript', 'DMDScript', 'ECMAScript', 'JavaScript', 'JScript', 'TypeScript', 'sh', 'bash', 'Python', 'java', 'SQL', 'VB', 'PHP', 'HTML', 'sed', 'awk', 'PowerShell', 'csh', 'tcsh', 'ksh', 'zsh', 'XMLSVG', 'XML', 'Schema', 'Python', 'java', 'XSLT', 'XHTML', 'MathML', 'XAML', 'SSML', 'SGML', 'HTML', 'Python', 'java', 'SQL', 'VB', 'Curl', 'SVG', 'XML', 'Schema', 'XSLT', 'XHTML', 'MathML', 'XAML', 'SSML', 'Java', 'Jython', 'JRuby', 'JScheme', 'Groovy', 'Kawa', 'Scala', 'Clojure', 'ALGOL', 'APLJ', 'Ada', 'Falcon', 'Forth', 'Io', 'MUMPS', 'PLI', 'PostScript', 'REXX', 'SAC', 'Self', 'Simula', 'Swift', 'IronPython', 'IronRuby', 'COBOL', 'Python', 'java', 'SQL', 'VB', 'PHP', 'HTML']

where validatecontent is a function that strips illegal characters:

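import re

# Remove illegal characters from content (Windows)
def validatecontent(content):
  # Characters such as '/\:*?"<>|' plus common punctuation
  rstr = r"[\/\\\:\*\?\"\<\>\|\.\*\+\-\(\)\"\'\(\)\!\?\“\”\,\。\;\:\{\}\{\}\=\%\*\~\·]"
  # Replace every character matched by the class with an empty string
  new_content = re.sub(rstr, "", content)
  return new_content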

Count the occurrences of each element:

from collections import Counter
counts = Counter(arr).items()
print(counts)

The result:

dict_items([('For', 1), ('SQL', 8), ('JRuby', 1), ('Builder', 1), ('HTML', 6), ('LINQ', 1), ('BAISC', 1), ('BASICA', 1), ('PHP', 6), ('Octave', 1), ('csh', 1), ('PostScript', 1), ('awk', 1), ('Ruby', 1), ('AppleScript', 1), ('Object', 1), ('java', 11), ('TclTk', 1), ('Xquer', 1), ('ksh', 1), ('zsh', 1), ('ETBASIC', 1), ('AAuto', 1), ('Borland', 1), ('SVG', 1), ('Jython', 1), ('Simula', 1), ('IronPython', 1), ('Python', 14), ('Microsoft', 1), ('ActionScript', 1), ('XHTML', 2), ('REXX', 1), ('COBOL', 1), ('Scilab', 1), ('Ada', 1), ('Basic', 9), ('GVBASIC', 1), ('ECMAScript', 1), ('TypeScript', 1), ('Falcon', 1), ('Clojure', 1), ('ASP', 1), ('ALGOL', 1), ('XMLSVG', 1), ('GWBASIC', 1), ('VBScript', 2), ('CCLI', 1), ('Lazarus', 1), ('Julia', 1), ('JSP', 1), ('PowerShell', 1), ('IronRuby', 1), ('Power', 1), ('FORTRAN', 1), ('Self', 1), ('Perl', 1), ('Small', 1), ('FoxPro', 1), ('REALbasic', 1), ('GNU', 1), ('Mathematica', 1), ('True', 1), ('Visual', 5), ('JScheme', 1), ('Maple', 1), ('Quick', 1), ('Turbo', 3), ('SAC', 1), ('JScript', 1), ('APLJ', 1), ('sh', 1), ('Kawa', 1), ('Pascal', 4), ('TSQL', 1), ('SPlus', 1), ('C', 6), ('xBaseClipper', 1), ('tcsh', 1), ('SQLPSM', 1), ('ApplicationsVBA', 1), ('SSML', 2), ('R', 1), ('Groovy', 1), ('XSLT', 2), ('MUMPS', 1), ('bash', 1), ('DarkBASIC', 1), ('SGML', 1), ('XAML', 2), ('VB', 8), ('Curl', 1), ('Schema', 2), ('MATLAB', 1), ('MathML', 2), ('Lua', 1), ('Net', 1), ('ObjectiveC', 1), ('JavaScript', 1), ('Java', 1), ('Io', 1), ('Free', 2), ('Delphi', 1), ('sed', 1), ('XML', 2), ('Forth', 1), ('C#', 1), ('SQLPLSQL', 1), ('QBASIC', 1), ('DMDScript', 1), ('Swift', 1), ('Scala', 1), ('PLI', 1)])
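As a compact illustration of the same idea, the cleanup, splitting, and counting steps can be combined into one helper. This is only a sketch: the name `count_words` is hypothetical, and the cleanup regex is a simplified stand-in (an assumption) that drops every character other than word characters and whitespace, so it would also strip the `#` from a token such as `C#`. `Counter.most_common()` is used because it returns a list of (word, count) tuples already sorted by descending count.

```python
import re
from collections import Counter

def count_words(text):
    """Return (word, count) tuples sorted by descending count."""
    # Simplified cleanup (assumption): keep only word characters and whitespace.
    cleaned = re.sub(r"[^\w\s]", "", text)
    words = cleaned.split()
    # most_common() returns a list of (element, count) tuples, highest count first.
    return Counter(words).most_common()

sample = "Python java SQL Python, java; Python!"
counts = count_words(sample)
print(counts)
```

A list in this shape has the same structure as the commented `counts` example shown earlier in the article, with the most frequent words first.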

Finally, pass the counts straight in:

tags = make_tags(counts, maxsize=120)
create_tag_image(tags, 'cloud_large.png', size=(900, 600), fontname='Lobster')

Take some time to tune the details yourself, such as text size, image size, and background color.

At this point the tag cloud is complete, but it does not support Chinese, because no suitable TTF font file is available. Prepare a Chinese TTF font (Microsoft YaHei in this example) and move its file to:

# C:\Python34\Lib\site-packages\pytagcloud\fonts

Then edit the JSON font list in that directory and add an entry in the same style as the existing ones:

  {
    "name": "MicrosoftYaHei",
    "ttf": "",
    "web": "none"
  }

Mind the commas that separate this entry from its neighbors. Finally, change the font name in the code:

create_tag_image(tags, 'cloud_large.png', size=(900, 600), fontname='MicrosoftYaHei')
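Hand-editing the font list makes it easy to misplace a comma, so the entry could also be added with the json module. The following is a rough sketch under stated assumptions: the font list is taken to be a JSON array of objects with "name", "ttf", and "web" keys, as the entry above suggests; the helper name `register_font`, the file `demo_fonts.json`, and both .ttf file names are hypothetical placeholders. The demo writes to a temporary directory rather than to the real library directory.

```python
import json
import os
import tempfile

def register_font(font_list_path, name, ttf_file):
    """Append a font entry to a JSON font list unless the name already exists."""
    with open(font_list_path, "r", encoding="utf-8") as f:
        fonts = json.load(f)
    if not any(font["name"] == name for font in fonts):
        fonts.append({"name": name, "ttf": ttf_file, "web": "none"})
        # Rewrite the file so the JSON stays well-formed (commas included).
        with open(font_list_path, "w", encoding="utf-8") as f:
            json.dump(fonts, f, indent=2, ensure_ascii=False)
    return fonts

# Demo on a throwaway file; all names below are placeholders.
demo_path = os.path.join(tempfile.mkdtemp(), "demo_fonts.json")
with open(demo_path, "w", encoding="utf-8") as f:
    json.dump([{"name": "Lobster", "ttf": "placeholder.ttf", "web": "none"}], f)

fonts = register_font(demo_path, "MicrosoftYaHei", "example-chinese-font.ttf")
names = [font["name"] for font in fonts]
print(names)
```

Calling the helper a second time with the same name leaves the list unchanged, which avoids duplicate entries if the script is run more than once.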

Run it and you are done. The resulting Chinese tag cloud:
