Solving garbled Chinese text when reading and writing files in Python 2.7

Python 2.7 does not handle Chinese text encoding well, and I have run into Chinese encoding problems repeatedly while crawling data over the past few days. I do not fully understand the principles behind character encoding and have not had time to study them, so this post only summarizes what works in practice.

1. Setting the default encoding

If Chinese characters appear anywhere in a Python source file, Python reports an error when it compiles the file. Adding an encoding declaration on the first line of the file, stating explicitly that the file is UTF-8 encoded, resolves this error in the common case. Specific problems encountered while programming still need to be analyzed case by case.

#encoding:utf-8
or
# -*- coding: utf-8 -*-
import sys
reload(sys)
sys.setdefaultencoding('utf8') # Set the default encoding to 'utf-8'
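
A quick way to check whether the default-encoding setting took effect is to query sys.getdefaultencoding(), which is part of the standard library. The helper name default_encoding_is_utf8 below is hypothetical and only for illustration. On an unmodified Python 2.7 interpreter the default is 'ascii', so the helper would be expected to return False until the setting above has been applied.

```python
import sys

def default_encoding_is_utf8():
    """Return True if the interpreter's default string encoding is UTF-8.

    Hypothetical helper for illustration; it only reads the current setting.
    """
    # Normalise spellings such as 'utf8', 'utf-8' and 'UTF-8'.
    name = sys.getdefaultencoding().lower().replace('-', '').replace('_', '')
    return name == 'utf8'

print(sys.getdefaultencoding())
print(default_encoding_is_utf8())
```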

2. File reading and writing

When Chinese text is involved in reading or writing files, usually no error is reported, but the result is displayed as garbled characters, which complicates any later processing.

2.1 Reading files

When reading a file whose path or file name contains Chinese characters, use the unicode function to decode the path from UTF-8 into a unicode object; the file can then be read normally. Taking the pandas read_csv function, which I use often, as an example, the following code successfully reads the CSV file named "POI Summary Table" and stores it in the DataFrame poi_list.

import pandas as pd
inpath = 'C:\\POI Summary Table.csv'
path = unicode(inpath, 'utf-8')  # decode the UTF-8 byte string into a unicode path
poi_list = pd.read_csv(path)
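
To make the conversion step more concrete: in Python 2.7, unicode(s, 'utf-8') gives the same result as s.decode('utf-8'), turning a UTF-8 byte string into a unicode object. The sketch below uses a made-up file name (the UTF-8 bytes of a two-character Chinese name plus ".csv") purely as an assumption for illustration, and is written with escapes so that it also runs on Python 3.

```python
# -*- coding: utf-8 -*-

# UTF-8 bytes of a hypothetical Chinese file name (two characters + '.csv').
raw = b'\xe8\xaf\xad\xe8\xa8\x80.csv'

# Decoding turns the byte string into a unicode object; this is what
# unicode(raw, 'utf-8') does in Python 2.7.
text = raw.decode('utf-8')

# Each Chinese character occupies 3 bytes in UTF-8 but is 1 unicode character.
print(len(raw))   # number of bytes
print(len(text))  # number of characters
```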

2.2 Writing files

Chinese file name: the file name is garbled

When saving the results of a program to a text file, a Chinese file name appears garbled if it is not handled. Converting the name with the unicode function solves this: unicode('Chinese.csv','utf-8')

Chinese file content: the content is garbled when opened in Excel

If results containing Chinese are written to a CSV file, the file is usually opened in Excel by default, and its content appears garbled, whereas the same file opened in a text editor displays correctly. This is because Excel assumes 'GBK' encoding by default, while text editors default to 'utf-8'. Writing codecs.BOM_UTF8 (from the codecs package) to the file right after creating it solves the problem.

name = 'Languages'
f = open(name + '.csv', 'w')
f.write('123, language')
f.close()
# Modified code
import codecs
f = open(unicode(name + '.csv', 'utf-8'), 'w')  # the file name is no longer garbled
f.write(codecs.BOM_UTF8)  # key statement: lets Excel open the content without garbling
f.write('123, language')
f.close()

Output results:

#File name:  water.csv
#Excel Open 123  Water
# Text editor to open 123, language
# After recoding
# File name: language.csv
# Excel open 123 Language
# Text editor to open 123, language
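
The BOM approach described above can also be sketched in a form that runs on both Python 2.7 and Python 3, by opening the file in binary mode and encoding the text explicitly. This is only a minimal sketch: the function name write_csv_with_bom, the temporary directory, and the sample row are assumptions for illustration, not part of the original program.

```python
# -*- coding: utf-8 -*-
import codecs
import io
import os
import tempfile

def write_csv_with_bom(path, text):
    """Write unicode text to path as UTF-8, prefixed with a UTF-8 BOM.

    Hypothetical helper: the BOM is the marker that lets Excel
    recognise the file as UTF-8 instead of falling back to GBK.
    """
    with io.open(path, 'wb') as f:
        f.write(codecs.BOM_UTF8)        # three bytes: EF BB BF
        f.write(text.encode('utf-8'))   # the actual content

# Illustrative usage with a throwaway directory and a sample row.
tmp_dir = tempfile.mkdtemp()
csv_path = os.path.join(tmp_dir, 'languages.csv')
write_csv_with_bom(csv_path, u'123,\u8bed\u8a00\n')

# Read the raw bytes back to confirm the BOM comes first.
with io.open(csv_path, 'rb') as f:
    raw = f.read()
print(raw.startswith(codecs.BOM_UTF8))
```

When reading such a file back, the 'utf-8-sig' codec strips the BOM automatically, for example io.open(csv_path, encoding='utf-8-sig').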

That is everything I have to share about solving garbled Chinese text when reading and writing files in Python 2.7. I hope it serves as a useful reference, and I appreciate your continued support.