Python 2.7 does not handle Chinese encoding well, and while crawling data over the past few days I kept running into Chinese encoding problems. I do not understand the principles behind encoding and have not had time to study them, so this is only a summary from a practical point of view.
1. Setting the default encoding
If Chinese appears anywhere in Python source code, running it reports an error. Adding an encoding declaration on the first line of the file, stating that the source is utf-8, solves the usual Chinese-related errors. Of course, specific problems met while programming still need to be analyzed case by case.
    #encoding:utf-8
    # or: # -*- coding: utf-8 -*-
    import sys
    reload(sys)
    sys.setdefaultencoding('utf8')  # Set the default encoding to 'utf-8'
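As a hedged alternative to changing the default encoding, byte strings can also be converted explicitly wherever data enters or leaves the program. The sketch below is only an illustration: the variable names are made up, and the two Chinese characters are written as escapes so the snippet does not depend on the source file's own encoding.

```python
# Sketch: explicit decode/encode instead of relying on the default encoding.
raw = b'\xe4\xb8\xad\xe6\x96\x87'   # UTF-8 bytes for two Chinese characters
text = raw.decode('utf-8')          # byte string -> unicode text

# The same text can then be encoded for whatever consumer needs it.
gbk_bytes = text.encode('gbk')      # e.g. for a tool that expects GBK
utf8_bytes = text.encode('utf-8')   # round-trips back to the original bytes
```

The point of the sketch is that the unicode object is the common form in the middle; only the bytes at the edges differ by encoding.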
2. File reading and writing
When Chinese appears while reading or writing files, usually no error is reported, but the result is displayed as garbled text, which makes later processing inconvenient.
2.1 Reading files
When reading a file whose path or name contains Chinese, use the unicode function to decode the 'utf-8' byte string into a unicode object; the file can then be read normally. Taking the pandas read_csv function, which I use often, as an example, the following code successfully reads the csv file named "POI Summary Table" into poi_list, a DataFrame.
    import pandas as pd
    inpath = 'C:\\POI Summary Table.csv'
    path = unicode(inpath, 'utf-8')  # decode the byte-string path to unicode
    poi_list = pd.read_csv(path)
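The same idea can be sketched with only the standard library. Everything specific here is an assumption made for illustration: the temporary directory, the file name (two Chinese characters written as escapes) and the sample rows are invented, and the file system is assumed to accept Unicode names.

```python
import io
import os
import sys
import tempfile

# Hypothetical file name containing Chinese characters, as a unicode string.
file_name = u'\u4e2d\u6587.csv'

tmp_dir = tempfile.mkdtemp()
if isinstance(tmp_dir, bytes):
    # Python 2 returns a byte string; decode it so the whole path is unicode.
    tmp_dir = tmp_dir.decode(sys.getfilesystemencoding() or 'utf-8')
path = os.path.join(tmp_dir, file_name)

# Write a small sample file, then read it back through the unicode path.
with io.open(path, 'w', encoding='utf-8') as f:
    f.write(u'id,name\n1,\u8bed\u6587\n')

with io.open(path, 'r', encoding='utf-8') as f:
    rows = [line.rstrip(u'\n').split(u',') for line in f]
```

Keeping the path as a unicode object from start to finish avoids mixing byte strings and unicode strings when the pieces are joined.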
2.2 Writing files
Chinese in the file name: the file name is garbled
When saving a program's results to a text file whose name contains Chinese, the file name comes out garbled if nothing is done about it. Converting the name with the unicode function solves this: unicode('Chinese.csv','utf-8')
Chinese in the file content: the content is garbled when opened in Excel
If results containing Chinese are written to a csv file, which is usually opened with Excel by default, the content appears garbled there, while a text editor shows it correctly. This is because Excel's default encoding is 'GBK', while the text editor's default is 'utf-8'. Adding the statement f.write(codecs.BOM_UTF8), from the codecs package, right after creating the file solves the problem.
    name='Languages'  # stands in for a Chinese name
    f = open(name+'.csv','w')
    f.write('123, language')
    f.close()

    # Modified code
    import codecs
    f = open(unicode(name+'.csv','utf-8'),'w')  # File name is no longer garbled
    f.write(codecs.BOM_UTF8)  # Key statement: lets Excel open the content without garbling
    f.write('123, language')
    f.close()
Output results:
    # Before the change
    # File name: garbled
    # Opened in Excel: content garbled
    # Opened in a text editor: 123, language

    # After the change
    # File name: Languages.csv
    # Opened in Excel: 123 Language
    # Opened in a text editor: 123, language
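The BOM approach can be checked by reading the raw bytes back. In this sketch the file names and content are placeholders, and the second option, the standard-library 'utf-8-sig' codec, is an alternative I am swapping in rather than something the code above uses; it emits the same three-byte mark automatically.

```python
import codecs
import io
import os
import tempfile

out_dir = tempfile.mkdtemp()
manual_path = os.path.join(out_dir, 'manual_bom.csv')   # placeholder name
sig_path = os.path.join(out_dir, 'sig_bom.csv')         # placeholder name
content = u'123,\u8bed\u6587\n'

# Option 1: write the BOM by hand, then the UTF-8 encoded content.
with open(manual_path, 'wb') as f:
    f.write(codecs.BOM_UTF8)
    f.write(content.encode('utf-8'))

# Option 2: let the 'utf-8-sig' codec emit the BOM on the first write.
with io.open(sig_path, 'w', encoding='utf-8-sig', newline='') as f:
    f.write(content)

# Read both files back as raw bytes to compare them.
with open(manual_path, 'rb') as f:
    manual_bytes = f.read()
with open(sig_path, 'rb') as f:
    sig_bytes = f.read()
```

Both files should begin with the bytes EF BB BF, which is the marker Excel relies on to recognize the file as UTF-8.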
That is everything I have to share on solving garbled Chinese when reading and writing files in Python 2.7. I hope it serves as a useful reference, and thank you for your support.