SoFunction
Updated on 2024-11-20

Processing CSV files in Python with pandas: a worked example

Python has many convenient libraries for data processing. NumPy and pandas in particular, combined with the Matplotlib plotting library, make a very powerful toolkit.

A CSV (Comma-Separated Values) file stores tabular data as plain text. Excel can open such files, but the amount of data it can handle is limited (a worksheet holds at most 1,048,576 rows), so it is a poor fit for very large files. pandas handles huge CSV files much more easily.

I previously captured data with my own hardware tools. The hardware environment was built on Linux, and the script printed its data directly to the terminal when it ran. Because the amount of data was very large, I used output redirection under Linux to save all of it to a text file, which became the local CSV file.

Reading a local CSV file with pandas into a DataFrame (pandas' tabular data structure)

import pandas as pd
import numpy as np
# 'filename' is the path to the CSV file; it can be a full path starting from the drive letter.
# header=None means the file has no header row; sep=' ' means the fields are separated by spaces.
# If the separator is a comma, use sep=',' instead.
df = pd.read_csv('filename', header=None, sep=' ')
print(df.head())
print(df.tail())
# For example, this prints the first 5 rows and the last 5 rows of the CSV file.
# 5 is the pandas default; pass a different number to head() or tail() to print as many rows as needed.
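As a minimal, self-contained sketch of the same steps, the snippet below reads space-separated text from an in-memory buffer instead of a file on disk. The sample rows and the variable names (sample, first_three, last_two) are made up for illustration and are not the author's data.

```python
import io

import pandas as pd

# Hypothetical sample data: three space-separated columns, no header row.
sample = io.StringIO(
    "Mon 1 10.5\n"
    "Tue 2 11.0\n"
    "Wed 3 9.8\n"
    "Thu 4 12.1\n"
    "Fri 5 10.9\n"
    "Sat 6 11.4\n"
    "Sun 7 10.2\n"
)

# header=None: the first line is data, not column names.
# sep=" ": fields are separated by a single space.
df = pd.read_csv(sample, header=None, sep=" ")

# head() and tail() return 5 rows by default; pass a number to change that.
first_three = df.head(3)
last_two = df.tail(2)

print(first_three)
print(last_two)
```

Passing a row count to head() or tail() is usually the simplest way to take a quick look at a large file without printing all of it.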

Data Reading Example

The picture shows the first 5 rows and the last 5 rows of my local data. The unlabeled first column is the row index. There are 13 columns of data in total, labeled 0 to 12. A single line is too narrow to show them all, so the display wraps after the 9th column and marks the break with a backslash "\".
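If the wrapped display is inconvenient, pandas display options can be adjusted so that all columns stay on one line. The sketch below is one possible approach, not part of the original workflow; the 13-column frame named wide is a hypothetical stand-in for the real data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the 13-column data described above.
wide = pd.DataFrame(np.arange(26).reshape(2, 13))

# Temporarily change display options inside the "with" block only:
# - display.max_columns=None shows every column instead of truncating
# - display.expand_frame_repr=False stops pandas from wrapping wide frames
with pd.option_context(
    "display.max_columns", None,
    "display.expand_frame_repr", False,
):
    text = repr(wide)

print(text)
```

Using option_context keeps the change local, so the default display settings are restored as soon as the block ends.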

Updated April 28, 2017

When pandas reads a local CSV file with header=None, the column labels default to integers starting from 0. To give the columns meaningful names, pass them through the names parameter:

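import pandas as pd
import numpy as np
# names supplies one label per column, in order, replacing the default labels 0 to 12.
df = pd.read_csv('filename', header=None, sep=' ', names=['week', 'month', 'date', 'time', 'year', 'name1', 'freq1', 'name2', 'freq2', 'name3', 'data1', 'name4', 'data2'])
print(df)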

Printing the DataFrame now shows the renamed column labels in place of the default numbers.
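The renaming can be tried without the original file. The sketch below uses two made-up rows covering only the first five columns (week, month, date, time, year). It also shows an alternative, assigning to df.columns after reading, which is an option pandas offers rather than the method used above.

```python
import io

import pandas as pd

# Hypothetical rows shaped like the first five columns of the data above.
raw = "Fri Apr 28 10:15:02 2017\nFri Apr 28 10:15:03 2017\n"
labels = ["week", "month", "date", "time", "year"]

# Option 1: name the columns while reading.
named = pd.read_csv(io.StringIO(raw), header=None, sep=" ", names=labels)

# Option 2: read with the default integer labels, then replace them.
relabeled = pd.read_csv(io.StringIO(raw), header=None, sep=" ")
relabeled.columns = labels

print(named)
# Columns can now be selected by name instead of by number.
print(named["time"])
```

Both options give the same labels. Passing names to read_csv is the more direct choice when the column layout is known in advance.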

That is everything I have to share about this example of processing CSV files with pandas in Python.