Python: file encoding format causes CSV read errors
This article documents two problems I ran into today as a Python beginner, one with pandas.read_csv and one with the csv module:
pandas module: "CParserError: Error tokenizing data. C error: Expected 1 fields in line 4, saw 2"
csv module: "line contains NULL byte"
Today I was careless while processing some data. Too lazy to copy the data into an xlsx file and save it, I simply changed the file's extension to .csv and tried to read it. The problem appeared as soon as the algorithm tried to read the data.
import pandas as pd
path = ''
df = pd.read_csv(path)
Note: the last two lines can also be written as df = pd.read_csv('').
But since read_csv has a lot of parameters (though none are used here), it is clearer to keep the path in its own variable.
This raised an error: CParserError: Error tokenizing data. C error: Expected 1 fields in line 4, saw 2
I looked up a variety of solutions online. Because read_csv has so many parameters, every answer suggested something different, and my case was presumably only one of many possible causes, so I searched for a long time to no avail. Eventually I read an explanation based on the source code of the csv module (_csv.c): the file cannot contain "\0", so the csv file cannot be saved as "Unicode", but ANSI works.
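The "\0" restriction can be checked before any parsing is attempted. The following is a minimal sketch rather than anything from the post I read: the helper name contains_nul_byte and the 64 KB sample size are assumptions made for illustration. It reads the start of the file in binary mode and reports whether a NUL byte is present. UTF-16 text (which Windows Notepad labels "Unicode") stores a zero byte next to every ASCII character, so such a file would be flagged, as would most binary files.

```python
def contains_nul_byte(path, sample_size=65536):
    """Return True if the first sample_size bytes of the file contain a NUL byte."""
    # Binary mode: we want the raw bytes, not decoded text.
    with open(path, "rb") as f:
        sample = f.read(sample_size)
    return b"\x00" in sample
```

If this returns True for a file that is supposed to be a CSV, the encoding or the file format is worth checking before changing any read_csv parameters.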
Because I had only changed the suffix, opening that .csv by double-clicking it brought up a prompt.
In other words, changing the suffix did not actually change the file format. So I chose "Save As" and changed the file format to CSV.
After that, the read no longer reports an error.
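One way to confirm that a renamed file is not really a CSV is to look at its first bytes. This is only a sketch under an assumption: that the renamed file was an Excel workbook, which is one plausible reading of what happened here. The function name looks_like_excel_workbook is hypothetical. It relies on two well-known signatures: .xlsx files are ZIP archives and start with "PK\x03\x04", while legacy .xls files start with the OLE2 container signature. Any other ZIP archive would match too, so a True result is a hint, not proof.

```python
ZIP_MAGIC = b"PK\x03\x04"                          # .xlsx files are ZIP archives
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"   # legacy .xls container


def looks_like_excel_workbook(path):
    """Return True if the file starts with a ZIP or OLE2 signature."""
    with open(path, "rb") as f:
        header = f.read(8)
    return header.startswith(ZIP_MAGIC) or header.startswith(OLE2_MAGIC)
```

A file that passes this check needs to be re-saved as CSV from the spreadsheet program (or read with an Excel reader); renaming it will not help.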
Note: one question remains unresolved. When I opened the .csv that I got by directly changing the suffix in Notepad, the encoding shown was ANSI, so I do not know why the error occurred. Still, the problem is solved for now.
The data is now read in as a structured table (read_csv returns a DataFrame).
In addition, the csv module's "line contains NULL byte" error has the same cause and the same solution as the problem above. For example:
# Python 2 syntax (file() and the print statement)
import csv
csvfile = file('', 'rb')
reader = csv.reader(csvfile)
for line in reader:
    print line
csvfile.close()
Error: line contains NULL byte
After the fix, each row is read in as a list, as follows:
['1', '2', '2', '1', '2']
['1', '1', '1', '2', '2']
['1', '2', '1', '1', '1']
['1', '1', '1', '1', '2']
['1', '1', '1', '2', '2']
['1', '1', '1', '2', '2']
['0.697', '0.744', '0.634', '0.403', '0.481']
['0.46', '0.376', '0.264', '0.237', '0.149']
['1', '1', '1', '1', '1']
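Re-saving the file is the fix used here. For a file that really is UTF-16 text, another option is to decode it explicitly while reading. The sketch below is written for Python 3, unlike the Python 2 snippet above, and the helper name read_csv_rows is hypothetical. It assumes the file is UTF-16 encoded text; it will not help if the file is actually a renamed workbook or some other binary format.

```python
import csv


def read_csv_rows(path, encoding="utf-16"):
    """Read all rows of a CSV file as lists of strings, decoding with the given encoding."""
    # newline="" lets the csv module handle line endings itself.
    with open(path, newline="", encoding=encoding) as f:
        return list(csv.reader(f))
```

Each returned row is a list of strings, the same shape as the output listed above.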
Common pandas read_csv errors and solutions
1) The first error
Error message:
Error tokenizing data. C error: Expected 1 fields in line 121, saw 2
Solution:
import pandas as pd
# inputfile is the path to the data file
data = pd.read_csv(inputfile, encoding='utf-8', header=None, sep='\t')
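An "Expected 1 fields ... saw 2" message usually means the parser is splitting on a separator that the file does not actually use, which is why passing sep='\t' helps for tab-separated data. When the separator is unknown, the standard library's csv.Sniffer can make a guess from a sample of the file. This is a rough sketch: the helper name guess_delimiter, the candidate list and the sample size are all assumptions, and the sniffer is a heuristic that can guess wrong on irregular data.

```python
import csv


def guess_delimiter(path, candidates=",\t;|", sample_size=4096, encoding="utf-8"):
    """Guess the field delimiter from the first sample_size characters of the file."""
    with open(path, newline="", encoding=encoding) as f:
        sample = f.read(sample_size)
    # csv.Sniffer raises csv.Error if it cannot decide on a delimiter.
    return csv.Sniffer().sniff(sample, delimiters=candidates).delimiter
```

The returned character can then be passed as the sep argument of read_csv.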
2) The second error
Error message:
Error tokenizing data. C error: EOF inside string starting at line 15945
Solution:
import pandas as pd
import csv
# csvfile is the path to the data file
df = pd.read_csv(csvfile, quoting=csv.QUOTE_NONE, encoding='utf-8')
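"EOF inside string" means a quoted field was opened and never closed before the end of the file. quoting=csv.QUOTE_NONE works around this by treating quote characters as ordinary text. To find the offending rows instead, a simple heuristic is to list lines with an odd number of quote characters. This is only a sketch: the function name lines_with_odd_quotes is hypothetical, and legitimate quoted fields that span several lines will be flagged as well.

```python
def lines_with_odd_quotes(path, quotechar='"', encoding="utf-8"):
    """Return the 1-based numbers of lines holding an odd count of quotechar."""
    suspects = []
    with open(path, encoding=encoding) as f:
        for number, line in enumerate(f, start=1):
            if line.count(quotechar) % 2 == 1:
                suspects.append(number)
    return suspects
```

The line number in the error message (15945 in the example above) is a good place to start comparing against the returned list.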
The above is my personal experience. I hope it serves as a useful reference.