SoFunction
Updated on 2024-12-13

CSV read errors caused by file encoding in Python (csv module, pandas.read_csv)

File encoding causes CSV read errors in Python

This article documents two problems I ran into today as a Python beginner, one with the csv module and one with pandas.read_csv:

pandas module: "CParserError: Error tokenizing data. C error: Expected 1 fields in line 4, saw 2" error

csv module "line contains NULL byte" error

Today I was careless while processing data. I was too lazy to copy the data out of the xlsx file and save it properly, so I simply changed the file extension to .csv and tried to read it. The problem showed up as soon as the algorithm tried to read the data.

import pandas as pd
path = ''
df=pd.read_csv(path)

Note: the last two lines can also be written as one: df=pd.read_csv('').

But since read_csv takes many parameters (none are used here), keeping the path in its own variable is cleaner.

Running this raised an error: CParserError: Error tokenizing data. C error: Expected 1 fields in line 4, saw 2

I searched online and found a wide range of suggested fixes. They all differ because read_csv has so many parameters, and my case could only be one of them, so for a long time nothing worked. Then I read a post whose author had looked at the csv module's C source (_csv.c) and found that the file must not contain "\0" bytes. That means the csv file cannot be saved in what Windows Notepad labels "Unicode" (UTF-16, which writes a zero byte for every ASCII character). ANSI works.
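To see why a "Unicode" file trips over the "\0" check while an ANSI file does not, here is a minimal sketch. It assumes that "Unicode" means UTF-16 and that "ANSI" means a single-byte code page such as cp1252. The sample text is made up for illustration.

```python
# Hypothetical sample content; any plain ASCII CSV text behaves the same way.
text = "a,b\n1,2\n"

utf16_bytes = text.encode("utf-16")  # what Notepad calls "Unicode" (assumption)
ansi_bytes = text.encode("cp1252")   # a typical Windows "ANSI" code page (assumption)

# UTF-16 stores each ASCII character as two bytes, one of which is zero.
utf16_has_nul = b"\x00" in utf16_bytes
ansi_has_nul = b"\x00" in ansi_bytes

print(utf16_has_nul, ansi_has_nul)
```

The first flag comes out true and the second false, which matches the observation that the ANSI version reads cleanly.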

Because I had only changed the extension, double-clicking that .csv to open it brought up a prompt.


In other words, changing the extension did not actually convert the file to the right format. So I used "Save As" to save the file in a proper CSV format.

After that, reading the file no longer raises an error.

Note: one question remains unresolved. When I opened the .csv that I got by directly changing the extension in Notepad, the encoding shown was ANSI, so I do not know why the error occurred. For now, though, the problem is solved.
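One plausible explanation for this open question, offered as an assumption rather than something confirmed in this article: an .xlsx file is a ZIP archive, so renaming it to .csv leaves binary ZIP data inside, and Notepad's "ANSI" label is probably just its fallback guess. The sketch below builds a tiny stand-in archive in memory (not a real workbook; the member name is hypothetical) and checks it for the ZIP signature and for NUL bytes.

```python
import io
import zipfile

def sniff(data: bytes) -> dict:
    """Report whether raw bytes start with the ZIP signature and contain NUL bytes."""
    return {
        "has_zip_signature": data[:4] == b"PK\x03\x04",
        "has_nul": b"\x00" in data,
    }

# Stand-in for a renamed .xlsx: a ZIP archive with one hypothetical member.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("xl/workbook.xml", "<workbook/>")
fake_xlsx = sniff(buf.getvalue())

# A genuine comma-separated text file for comparison.
real_csv = sniff(b"1,2,2,1,2\n1,1,1,2,2\n1,2,1,1,1\n")

print(fake_xlsx, real_csv)
```

Running the same check on the first few bytes of a suspicious .csv is a quick way to tell a renamed workbook from real text.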

The data now reads in as a DataFrame structure.

The csv module's "line contains NULL byte" error has the same cause and the same solution. For example:

import csv
# Python 2 syntax: file() and the print statement
csvfile = file('', 'rb')
reader = csv.reader(csvfile)
for line in reader:
    print line
csvfile.close()

Error: line contains NULL byte
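As an alternative to re-saving the file, and assuming the NUL bytes come from UTF-16 text rather than from binary data, the file can be decoded explicitly before the csv module sees it. This is a Python 3 sketch, not the approach used in this article. The temporary file and its contents are made up for illustration.

```python
import csv
import os
import tempfile

# Write a small UTF-16 encoded CSV to a temporary file (illustrative data).
fd, path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "w", encoding="utf-16", newline="") as f:
    f.write("1,2,2\n1,1,1\n")

# Decode with the matching encoding so csv.reader receives text without NUL characters.
with open(path, encoding="utf-16", newline="") as csvfile:
    rows = list(csv.reader(csvfile))

os.remove(path)
print(rows)
```

This only helps when the file really is text in a known encoding. A renamed binary file still needs to be exported as CSV first.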

After the fix, each row is read in as a list, as follows:

['1', '2', '2', '1', '2']
['1', '1', '1', '2', '2']
['1', '2', '1', '1', '1']
['1', '1', '1', '1', '2']
['1', '1', '1', '2', '2']
['1', '1', '1', '2', '2']
['0.697', '0.744', '0.634', '0.403', '0.481']
['0.46', '0.376', '0.264', '0.237', '0.149']
['1', '1', '1', '1', '1']

Common pandas read_csv errors and solutions

1) The first error

Error message:

Error tokenizing data. C error: Expected 1 fields in line 121, saw 2

Solution:

import pandas as pd
data = pd.read_csv(inputfile, encoding='utf-8',header=None,sep = '\t')
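The solution above works when the file is actually tab-separated and a comma appears inside some value. A minimal sketch of that situation follows. The data is made up, and io.StringIO stands in for the real input file.

```python
import io
import pandas as pd

# Made-up tab-separated data; the comma on the third line is part of a value.
raw = "x\ty\n1\t2\n3,5\t4\n"

# With the default sep=',' the first line has one field but the third has two,
# which is the kind of mismatch behind "Expected 1 fields ..., saw 2".
try:
    pd.read_csv(io.StringIO(raw))
    default_failed = False
except pd.errors.ParserError:
    default_failed = True

# Telling pandas the real delimiter resolves the mismatch.
data = pd.read_csv(io.StringIO(raw), header=None, sep="\t")
print(default_failed, data.shape)
```

If the file uses a different delimiter, such as a semicolon, the same idea applies with that character passed as sep.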

2) The second error

Error message:

Error tokenizing data. C error: EOF inside string starting at line 15945

Solution:

import pandas as pd
import csv
df = pd.read_csv(csvfile, quoting=csv.QUOTE_NONE, encoding='utf-8')
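This error usually means the parser found an opening quote and reached the end of the file without a closing one. The sketch below reproduces the fix on made-up data with an unbalanced quote. io.StringIO stands in for the real file.

```python
import csv
import io
import pandas as pd

# Made-up data with an opening quote that is never closed.
raw = 'a,b\n1,"unclosed\n2,3\n'

# QUOTE_NONE makes the parser treat the quote character as ordinary text,
# so it no longer waits for a closing quote that never arrives.
df = pd.read_csv(io.StringIO(raw), quoting=csv.QUOTE_NONE)
print(df)
```

The trade-off is that quote characters stay in the values, and commas inside quoted fields will then split into extra columns, so it is worth checking the result.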

The above is my personal experience. I hope it serves as a useful reference.