Updated on 2024-11-20

6 common ways to read and write Excel and other data files in Python (summary)

The following is a summary of the ways Python can read and write data files.

1. Python built-in methods (read, readline, readlines)

  • read(): reads the whole file at once. For large files, read(size) is recommended so the file is read in chunks; the larger the size, the more memory and time each call takes.
  • readline(): reads one line at a time. Useful when there is not enough memory to hold the whole file.
  • readlines(): reads the entire file at once and returns a list of lines that can then be traversed.
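
The three methods above can be compared side by side. The following is a minimal sketch, assuming a small temporary file named sample.txt (a hypothetical name chosen for illustration) and an arbitrary chunk size of 4 characters.

```python
import os
import tempfile

# Write a small sample file so the example is self-contained.
# "sample.txt" is a hypothetical name used only for this sketch.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("alpha\nbeta\ngamma\n")

# read(): the whole file as a single string.
with open(path, encoding="utf-8") as f:
    whole = f.read()

# read(size): fixed-size chunks, so memory use stays bounded.
chunks = []
with open(path, encoding="utf-8") as f:
    while True:
        chunk = f.read(4)
        if not chunk:  # an empty string signals end of file
            break
        chunks.append(chunk)

# readline(): one line per call.
with open(path, encoding="utf-8") as f:
    first_line = f.readline()

# readlines(): a list with one string per line, newlines included.
with open(path, encoding="utf-8") as f:
    lines = f.readlines()
```

For very large files, iterating over the file object directly (for line in f) is usually a reasonable alternative to calling readline() in a loop.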

2. Built-in modules (csv)

Python has a built-in csv module for reading and writing csv files. These are comma-delimited files and one of the most common data storage formats in data science.
The csv module handles read and write operations for data of most sizes, although code working with very large volumes needs to be optimized.

Reading a file with the csv module

# Read a csv file; replace '' with the path to your file
import csv
with open('', 'r', newline='') as myFile:
    lines = csv.reader(myFile)
    for line in lines:
        print(line)

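Writing a file with the csv module

import csv
with open('', 'w+', newline='') as myFile:
    myWriter = csv.writer(myFile)
    # writerow writes one row at a time
    myWriter.writerow([7, 8, 9])
    myWriter.writerow([8, 'h', 'f'])
    # writerows writes multiple rows at once
    myList = [[1, 2, 3], [4, 5, 6]]
    myWriter.writerows(myList)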
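
When the file has a header row, the csv module's DictWriter and DictReader classes may be more convenient than plain lists. The sketch below is one possible round trip, assuming a temporary file named people.csv and made-up sample rows (both hypothetical).

```python
import csv
import os
import tempfile

# "people.csv" and the rows below are hypothetical sample data.
path = os.path.join(tempfile.mkdtemp(), "people.csv")
rows = [{"name": "Ann", "age": 31}, {"name": "Bob", "age": 27}]

# newline="" lets the csv module control line endings itself.
with open(path, "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["name", "age"])
    writer.writeheader()    # writes the "name,age" header row
    writer.writerows(rows)  # writes one line per dict

# DictReader uses the header row as keys; every value comes back as a string.
with open(path, newline="", encoding="utf-8") as f:
    loaded = list(csv.DictReader(f))
```

Note that the csv module does no type conversion, so numeric columns have to be converted by the caller.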

3. Using the numpy library (loadtxt, load, fromfile)

loadtxt method

loadtxt reads text files (txt, csv, etc.), including files compressed in .gz or .bz2 format, provided that every line of the data has the same number of values.

import numpy as np
# The dtype parameter of loadtxt() defaults to float
# It is set to str here for display purposes
np.loadtxt('', dtype=str)
# out: array(['1,2,3', '4,5,6', '7,8,9'], dtype='<U5')
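
In the output above each line is kept as one string because no delimiter was given. A minimal sketch of parsing the values into a numeric 2-D array, assuming comma-separated data with one header line (the in-memory text below is made-up sample data standing in for a file):

```python
from io import StringIO

import numpy as np

# In-memory stand-in for a csv file; the contents are hypothetical.
text = StringIO("a,b,c\n1,2,3\n4,5,6\n7,8,9\n")

# delimiter="," splits each line into columns; skiprows=1 skips the header.
data = np.loadtxt(text, delimiter=",", skiprows=1)
```

With the default dtype the result is a float array with one row per data line.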

load method

load is used to read numpy-specific .npy, .npz or pickled persistent files.

import numpy as np
# First create an npy file
np.save('', np.array([[1, 2, 3], [4, 5, 6]]))
# Use load to read the npy file back
np.load('')
'''
out:array([[1, 2, 3],
       [4, 5, 6]])
'''
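
The same load function also reads .npz archives, which bundle several named arrays in one file. The following sketch assumes a temporary file named arrays.npz and two arbitrary arrays stored under the keys a and b (all hypothetical names).

```python
import os
import tempfile

import numpy as np

# "arrays.npz" and the key names "a" and "b" are hypothetical.
path = os.path.join(tempfile.mkdtemp(), "arrays.npz")
a = np.array([[1, 2, 3], [4, 5, 6]])
b = np.linspace(0.0, 1.0, 5)

# savez stores each keyword argument as a separate array in the archive.
np.savez(path, a=a, b=b)

# load returns a dict-like object; arrays are read when a key is accessed.
with np.load(path) as archive:
    names = sorted(archive.files)
    a_loaded = archive["a"]
    b_loaded = archive["b"]
```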

fromfile method

The fromfile method reads simple text data or binary data, typically data saved with the tofile method. Such files store no type or shape information, so the user must specify the element type when reading and then reshape the array as needed.

import numpy as np
x = np.arange(9).reshape(3, 3)
x.tofile('')
# dtype must match the type the data was written with
np.fromfile('', dtype=x.dtype)
# out: array([0, 1, 2, 3, 4, 5, 6, 7, 8])
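
As the output shows, fromfile returns a flat array, so the original shape has to be restored by hand. A minimal round-trip sketch, assuming a temporary file named raw.bin (hypothetical) and an explicit int32 element type:

```python
import os
import tempfile

import numpy as np

# "raw.bin" is a hypothetical file name used only for this sketch.
path = os.path.join(tempfile.mkdtemp(), "raw.bin")

# Fix the dtype explicitly so the reader knows exactly what was written.
x = np.arange(9, dtype=np.int32).reshape(3, 3)
x.tofile(path)  # raw bytes only: no dtype or shape is stored

# Read back with the same dtype, then restore the 3x3 shape manually.
flat = np.fromfile(path, dtype=np.int32)
restored = flat.reshape(3, 3)
```

Because nothing about the layout is saved, np.save and np.load are generally the safer choice when the file needs to be shared or kept for a long time.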

4. Using the pandas library (read_csv, read_excel, etc.)

pandas is one of the most commonly used libraries for data processing and analysis. It can read data files in many formats and generally returns a DataFrame.
Supported sources include txt, csv, excel, json, the clipboard, databases, html, hdf, parquet, pickled files, sas, stata, and more.

read_csv method

The read_csv method reads a file in csv format and returns a DataFrame.

import pandas as pd
pd.read_csv('')
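
read_csv accepts many optional parameters. The sketch below shows two commonly used ones, usecols and dtype, assuming an in-memory csv string with made-up columns (name, age, city) in place of a real file.

```python
from io import StringIO

import pandas as pd

# In-memory stand-in for a csv file; the contents are hypothetical.
csv_text = "name,age,city\nAnn,31,Oslo\nBob,27,Lima\n"

df = pd.read_csv(
    StringIO(csv_text),
    usecols=["name", "age"],  # load only the columns that are needed
    dtype={"age": "int64"},   # set the column type explicitly
)
```

Restricting the columns and fixing the types up front can reduce memory use on large files.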

read_excel method

Reads Excel files, including the xlsx, xls, and xlsm formats, and returns a DataFrame.

import pandas as pd
pd.read_excel('')

read_table method

Reads any delimited text file; the delimiter is controlled by the sep parameter.
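
A minimal sketch of read_table with a custom delimiter, assuming pipe-separated sample text (the column names id and score are made up for illustration):

```python
from io import StringIO

import pandas as pd

# Pipe-delimited sample text; the contents are hypothetical.
text = "id|score\n1|9.5\n2|7.0\n"

# read_table defaults to a tab delimiter; sep overrides it.
df = pd.read_table(StringIO(text), sep="|")
```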

read_json method

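Read JSON-format data.

import pandas as pd
from io import StringIO
df = pd.DataFrame([['a', 'b'], ['c', 'd']], index=['row 1', 'row 2'], columns=['col 1', 'col 2'])
j = df.to_json(orient='split')
# Wrap the JSON string in StringIO; recent pandas versions deprecate passing a literal string
pd.read_json(StringIO(j), orient='split')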

read_html method

Reading html tables

read_clipboard method

Read the contents of the clipboard

read_pickle method

Read pickled persistence files

read_sql method

Reads data from a database: connect to the database, then pass in the SQL statement together with the connection.
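
As an illustration, the sketch below uses the built-in sqlite3 module with an in-memory database so that it runs without a database server. The table name scores and its rows are hypothetical; with other databases the connection object would come from the corresponding driver or from SQLAlchemy.

```python
import sqlite3

import pandas as pd

# An in-memory SQLite database stands in for a real database server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (name TEXT, score INTEGER)")  # hypothetical table
conn.executemany(
    "INSERT INTO scores VALUES (?, ?)",
    [("Ann", 90), ("Bob", 75)],
)
conn.commit()

# Pass the SQL statement and the open connection; params fills the ? placeholder.
df = pd.read_sql(
    "SELECT name, score FROM scores WHERE score > ?",
    conn,
    params=(80,),
)
conn.close()
```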

read_hdf method

Read hdf5 files, suitable for large file reading

read_parquet method

Reading a parquet file

read_sas method

Reading sas files

read_stata method

Read stata file

read_gbq method

Read Google BigQuery data

5. Reading and writing Excel files (xlrd, xlwt, openpyxl, etc.)

Python has many libraries for reading and writing Excel files. In addition to pandas, mentioned above, there are xlrd, xlwt, openpyxl, xlwings, and others.

Main modules:

  • xlrd: reads data from Excel files; supports xls (versions before 2.0 also read xlsx)
  • xlwt: writes Excel files in xls format; does not support the xlsx format
  • xlutils: builds on xlwt and xlrd to modify existing files
  • openpyxl: mainly for reading and editing Excel files in xlsx format
  • xlwings: reading, writing, format changes, and other operations on xlsx, xls, and xlsm files
  • xlsxwriter: generates Excel files, inserts data, inserts charts, and performs other table operations; does not support reading
  • Microsoft Excel API: requires pywin32; communicates directly with the Excel process, so it can do anything Excel can do, but it is slower
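
As one illustration of the libraries listed above, here is a minimal openpyxl sketch that writes an xlsx file and reads it back. It assumes openpyxl is installed; the file name demo.xlsx, the sheet title data, and the rows are hypothetical.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook

# "demo.xlsx" is a hypothetical file name used only for this sketch.
path = os.path.join(tempfile.mkdtemp(), "demo.xlsx")

# Write: create a workbook, name the active sheet, append rows, save.
wb = Workbook()
ws = wb.active
ws.title = "data"
ws.append(["name", "score"])
ws.append(["Ann", 90])
wb.save(path)

# Read: reopen the file and collect each row as a tuple of cell values.
wb2 = load_workbook(path)
rows = list(wb2["data"].iter_rows(values_only=True))
wb2.close()
```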

6. Working with databases (pymysql, cx_Oracle, etc.)

Python can interact with almost all databases. After connecting to a database, you can use SQL statements to insert, delete, update, and query data.
Main modules:

  • pymysql: for interaction with mysql databases
  • sqlalchemy: a SQL toolkit and ORM that works with mysql and many other relational databases
  • cx_Oracle: for interaction with oracle databases
  • sqlite3: built-in library for interacting with sqlite databases
  • pymssql: for interaction with sql server databases
  • pymongo: for interaction with mongodb non-relational databases
  • redis, pyredis: for interaction with redis non-relational databases
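
The connect, execute, and fetch pattern is broadly similar across the relational drivers above. The following sketch uses the built-in sqlite3 module so it runs without a database server; the users table and its rows are hypothetical, and other drivers may use a different placeholder style than the ? shown here.

```python
import sqlite3

# An in-memory database keeps the example self-contained.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Create a hypothetical table and insert a few rows.
cur.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
cur.executemany(
    "INSERT INTO users (name) VALUES (?)",
    [("Ann",), ("Bob",), ("Cy",)],
)

# Delete one row; parameters are passed separately from the SQL text.
cur.execute("DELETE FROM users WHERE name = ?", ("Bob",))
conn.commit()

# Retrieve what is left.
cur.execute("SELECT id, name FROM users ORDER BY id")
remaining = cur.fetchall()
conn.close()
```

Passing values as parameters rather than formatting them into the SQL string helps avoid SQL injection.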
