The following is a compilation of the ways in which Python can read data files.
1. Python built-in methods (read, readline, readlines)
- read(): reads the whole file at once. For large files, read(size) is recommended instead, since each call reads at most size characters (or bytes in binary mode); the larger the size, the more memory and time each call takes.
- readline(): reads one line at a time. Useful when there is not enough memory to hold the whole file.
- readlines(): reads the entire file at once and returns it as a list of lines, which can then be traversed.
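The three methods above can be compared side by side. The following is a minimal sketch, assuming a small throwaway text file created with tempfile; the file contents and the variable names (sample_path, whole, chunk, and so on) are illustrative and not from the original article.

```python
import os
import tempfile

# Create a small throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("first\nsecond\nthird\n")
    sample_path = tmp.name

with open(sample_path, "r") as f:
    whole = f.read()            # the entire file as one string

with open(sample_path, "r") as f:
    chunk = f.read(5)           # at most 5 characters

with open(sample_path, "r") as f:
    first_line = f.readline()   # one line, trailing newline included

with open(sample_path, "r") as f:
    all_lines = f.readlines()   # a list with one string per line

# Iterating over the file object reads lazily, one line at a time,
# which keeps memory use low for large files.
with open(sample_path, "r") as f:
    stripped = [line.rstrip("\n") for line in f]

os.remove(sample_path)
```

For large files, iterating over the file object is usually a convenient alternative to calling readline() in a loop.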
2. Built-in modules (csv)
Python has a built-in csv module for reading and writing CSV files. CSV files are comma-delimited text files and one of the most common data storage formats in data science.
The csv module can handle reads and writes across a wide range of data volumes, although for large volumes the code itself needs to be optimized.
Reading files with the csv module

    # Read a csv file
    import csv

    with open('', 'r', newline='') as myFile:
        lines = csv.reader(myFile)
        for line in lines:
            print(line)
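Writing files with the csv module

    import csv

    with open('', 'w+', newline='') as myFile:
        myWriter = csv.writer(myFile)
        # writerow writes one row at a time
        myWriter.writerow([7, 8, 9])
        myWriter.writerow([8, 'h', 'f'])
        # writerows writes multiple rows at once
        myList = [[1, 2, 3], [4, 5, 6]]
        myWriter.writerows(myList)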
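For large data volumes, one simple code-level optimization is to process rows as they are read instead of collecting them all in a list first. The sketch below is only an illustration under that assumption: the sample rows, the temporary file and the names csv_path, header and total are hypothetical.

```python
import csv
import os
import tempfile

# Hypothetical sample data: a header row followed by three records.
rows = [["id", "value"], [1, 10], [2, 20], [3, 30]]

# Write the rows to a temporary csv file.
with tempfile.NamedTemporaryFile("w", suffix=".csv", newline="", delete=False) as tmp:
    csv.writer(tmp).writerows(rows)
    csv_path = tmp.name

# Stream the file row by row; only one row is held in memory at a time.
total = 0
with open(csv_path, "r", newline="") as f:
    reader = csv.reader(f)
    header = next(reader)        # consume the header row
    for row in reader:
        total += int(row[1])     # csv.reader yields strings, so convert

os.remove(csv_path)
```

Because csv.reader is an iterator, the memory footprint stays roughly constant regardless of how many rows the file contains.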
3. Using the numpy library (loadtxt, load, fromfile)
loadtxt method
loadtxt reads text files (including txt, csv, etc.) as well as compressed files in .gz or .bz2 format, provided that every line of the file contains the same number of values.

    import numpy as np

    # The dtype parameter of loadtxt() defaults to float.
    # It is set to str here for display purposes.
    np.loadtxt('', dtype=str)
    # out: array(['1,2,3', '4,5,6', '7,8,9'], dtype='<U5')
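In the output above each line comes back as a single string because no delimiter was given, so loadtxt splits on whitespace only. A minimal sketch of reading the same comma-separated values as numbers follows; it assumes the data is supplied through an in-memory io.StringIO object rather than a file on disk, and the variable names are illustrative.

```python
import io
import numpy as np

# In-memory stand-in for a csv file with three values per line.
text = io.StringIO("1,2,3\n4,5,6\n7,8,9\n")

# delimiter="," splits each line into separate values;
# dtype is left at its default, float.
data = np.loadtxt(text, delimiter=",")

# The same call with dtype=int returns integers instead.
data_int = np.loadtxt(io.StringIO("1,2,3\n4,5,6\n"), delimiter=",", dtype=int)
```

Here data is a 3x3 float array and data_int is a 2x3 integer array.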
load method
load reads numpy-specific .npy and .npz files, as well as pickled files.

    import numpy as np

    # First generate an npy file
    np.save('', np.array([[1, 2, 3], [4, 5, 6]]))
    # Use load to load the npy file
    np.load('')
    '''
    out: array([[1, 2, 3],
                [4, 5, 6]])
    '''
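The .npz format mentioned above stores several arrays in one archive. The following sketch assumes the archive is written to an in-memory io.BytesIO buffer instead of a file, and the array names first and second are made up for illustration.

```python
import io
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])
b = np.arange(4)

# Save both arrays into one .npz archive held in memory.
buffer = io.BytesIO()
np.savez(buffer, first=a, second=b)
buffer.seek(0)                      # rewind before reading

# np.load returns a dict-like archive; arrays are looked up by name.
with np.load(buffer) as archive:
    names = sorted(archive.files)
    restored = archive["first"]
```

Each array is only read from the archive when its name is accessed, which can help when an archive contains many large arrays.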
fromfile method
The fromfile method reads simple text data or binary data, such as the binary data saved by the tofile method. The file stores neither the element type nor the shape, so the user must specify the element type when reading and then reshape the array as needed.

    import numpy as np

    x = np.arange(9).reshape(3, 3)
    x.tofile('')
    # dtype must match the dtype the data was written with
    np.fromfile('', dtype=x.dtype)
    # out: array([0, 1, 2, 3, 4, 5, 6, 7, 8])
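Since fromfile returns a flat array, the original shape has to be restored by hand. A minimal round-trip sketch is shown here, assuming an explicit int64 element type and a temporary file; the names bin_path, flat and restored are illustrative.

```python
import os
import tempfile
import numpy as np

# Fix the element type explicitly so writing and reading agree.
x = np.arange(9, dtype=np.int64).reshape(3, 3)

# Write the raw bytes to a temporary file.
fd, bin_path = tempfile.mkstemp(suffix=".bin")
os.close(fd)
x.tofile(bin_path)

# fromfile returns a 1-D array; dtype must match what was written.
flat = np.fromfile(bin_path, dtype=np.int64)

# Restore the original 3x3 shape manually.
restored = flat.reshape(3, 3)

os.remove(bin_path)
```

Reading with a different dtype would not raise an error but would silently reinterpret the bytes, which is why np.save and np.load are usually the safer choice for persistence.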
4. Using the pandas library (read_csv, read_excel, etc.)
pandas is one of the most commonly used libraries for data processing and analysis. It can read data files in a variety of formats and generally returns a DataFrame.
Supported sources include txt, csv, excel, json, clipboard, database, html, hdf, parquet, pickled files, sas, stata, etc.
read_csv method
The read_csv method reads a file in csv format and returns it as a DataFrame.

    import pandas as pd

    pd.read_csv('')
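read_csv accepts many optional parameters. The sketch below shows three commonly used ones (sep, usecols and dtype), assuming a small semicolon-delimited table supplied through io.StringIO; the column names and values are invented for the example.

```python
import io
import pandas as pd

# In-memory stand-in for a semicolon-delimited file.
csv_text = io.StringIO("name;age;city\nAnn;31;Oslo\nBob;45;Rome\n")

df = pd.read_csv(
    csv_text,
    sep=";",                     # field delimiter (default is ",")
    usecols=["name", "age"],     # load only these columns
    dtype={"age": "int64"},      # force the column type
)
```

Restricting usecols and setting dtype up front can reduce memory use noticeably on wide or large files.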
read_excel method
Reads Excel files, including the xlsx, xls and xlsm formats.

    import pandas as pd

    pd.read_excel('')
read_table method
Reads any delimited text file; the delimiter is controlled with the sep parameter (tab by default).
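As a small illustration of the sep parameter, the following sketch reads the same table once with the default tab delimiter and once with a pipe delimiter. The data is supplied through io.StringIO and is purely hypothetical.

```python
import io
import pandas as pd

tab_text = io.StringIO("x\ty\n1\t2\n3\t4\n")
pipe_text = io.StringIO("x|y\n1|2\n3|4\n")

# With no sep argument, read_table splits on tabs.
df_tab = pd.read_table(tab_text)

# Any other delimiter is passed explicitly through sep.
df_pipe = pd.read_table(pipe_text, sep="|")
```

Both calls produce the same two-column DataFrame.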
read_json method
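Reads data in json format.

    import io
    import pandas as pd

    df = pd.DataFrame([['a', 'b'], ['c', 'd']],
                      index=['row 1', 'row 2'],
                      columns=['col 1', 'col 2'])
    j = df.to_json(orient='split')
    # Wrap the JSON string in StringIO; recent pandas versions
    # deprecate passing a literal JSON string directly
    pd.read_json(io.StringIO(j), orient='split')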
read_html method
Reading html tables
read_clipboard method
Read the contents of the clipboard
read_pickle method
Read pickled persistence files
read_sql method
Reads data from a database: connect to the database and pass in the SQL statement
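A minimal sketch of this workflow is given below. It assumes an in-memory SQLite database from the standard library's sqlite3 module so that no server is needed; the table name scores and its contents are hypothetical.

```python
import sqlite3
import pandas as pd

# Connect to a throwaway in-memory database and add some rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (name TEXT, score INTEGER)")
conn.executemany("INSERT INTO scores VALUES (?, ?)", [("Ann", 90), ("Bob", 72)])
conn.commit()

# Pass the SQL statement and the connection; the result is a DataFrame.
# params fills the "?" placeholder safely instead of string formatting.
df = pd.read_sql("SELECT name, score FROM scores WHERE score > ?", conn, params=(80,))

conn.close()
```

For databases other than SQLite, pandas generally expects a SQLAlchemy connection or engine rather than a raw driver connection.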
read_hdf method
Read hdf5 files, suitable for large file reading
read_parquet method
Reading a parquet file
read_sas method
Reading sas files
read_stata method
Read stata file
read_gbq method
Read google bigquery data
5. Reading and writing Excel files (xlrd, xlwt, openpyxl, etc.)
There are many Python libraries for reading and writing Excel files. In addition to pandas, mentioned above, there are xlrd, xlwt, openpyxl, xlwings, and so on.
Main modules:
- xlrd: reads data from Excel files; supports xls (versions before 2.0 also read xlsx, but xlrd 2.0 and later support xls only)
- xlwt: writes Excel files in xls format; does not support the xlsx format
- xlutils: builds on xlrd and xlwt to modify an existing file
- openpyxl: reads and edits Excel files, mainly in xlsx format
- xlwings: reads, writes and changes the formatting of xlsx, xls and xlsm files
- xlsxwriter: generates Excel workbooks, inserts data, inserts charts and performs other writing operations; does not support reading
- Microsoft Excel API: requires pywin32 and communicates directly with the Excel process, so it can do anything Excel itself can do, but it is slower
6. Working with databases (pymysql, cx_Oracle, etc.)
Python supports interaction with almost all databases. After connecting to a database, you can use SQL statements to insert, delete, update and query data.
Main Module:
- pymysql: for interaction with mysql database
- sqlalchemy: a SQL toolkit and ORM that works with mysql and many other relational databases
- cx_Oracle: for interaction with oracle database
- sqlite3: built-in library for interacting with sqlite databases
- pymssql: for interaction with sql server databases
- pymongo: for interaction with mongodb non-relational databases
- redis, pyredis: for interaction with redis non-relational databases
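The connect-then-run-SQL pattern looks broadly similar across these drivers. The sketch below uses the built-in sqlite3 module with an in-memory database, so it runs without a server; the users table and its rows are hypothetical, and other drivers differ in connection arguments and placeholder style.

```python
import sqlite3

# Connect (here to an in-memory database) and get a cursor.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Create a table and insert rows using "?" placeholders.
cur.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
cur.executemany("INSERT INTO users (name) VALUES (?)", [("Ann",), ("Bob",), ("Cid",)])

# Update and delete rows.
cur.execute("UPDATE users SET name = ? WHERE name = ?", ("Bobby", "Bob"))
cur.execute("DELETE FROM users WHERE name = ?", ("Cid",))
conn.commit()

# Query what is left.
cur.execute("SELECT id, name FROM users ORDER BY id")
remaining = cur.fetchall()

conn.close()
```

Passing values through placeholders rather than building the SQL string by hand avoids quoting problems and SQL injection.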