In most cases, NumPy or Pandas is used to import the data, so run the following before you start:

import numpy as np
import pandas as pd
Two ways to get help
Often it is not clear how a function or method works; Python provides help information so that you can quickly learn how to use a Python object.
Using the info method in NumPy

np.info(np.ndarray.dtype)
Using Python's built-in help function
help(pd.read_csv)
I. Text documents
1. Plain text files
filename = ''
file = open(filename, mode='r')  # Open the file for reading
text = file.read()               # Read the contents of the file
print(file.closed)               # Check whether the file is closed
file.close()                     # Close the file
print(text)
Using the Context Manager -- with
with open('', 'r') as file:
    print(file.readline())  # Read one line at a time
    print(file.readline())
    print(file.readline())
2. Tabular data: Flat files
Reading flat files with NumPy
NumPy's built-in functions process data at C-language speed.
A flat file is a file containing records with no relational structure (Excel, CSV, and tab-delimited files are supported).
1. Files with one data type
The parameters below specify the string used to separate values, skip the first two lines, read only the first and third columns, and set the type of the resulting array.
filename = ''
data = np.loadtxt(filename,
                  delimiter=',',   # string used to separate values
                  skiprows=2,      # skip the first two lines
                  usecols=[0, 2],  # read the first and third columns
                  dtype=str)       # type of the resulting array
2. Files with mixed data types
Two hard requirements:
- Skip header information
- Distinguish between horizontal and vertical coordinates
filename = ''
data = np.genfromtxt(filename,
                     delimiter=',',  # string used to separate values
                     names=True,     # take column names from the header row
                     dtype=None)     # infer the data type of each column
Reading Flat Files with Pandas
filename = ''
data = pd.read_csv(filename,
                   nrows=5,         # number of rows of the file to read
                   header=None,     # row number to use as the column names
                   sep='\t',        # delimiter to use
                   comment='#',     # character that marks comments
                   na_values=[""])  # strings to recognize as NA/NaN
II. Excel spreadsheets
ExcelFile() in Pandas
ExcelFile() is a very convenient and fast class in pandas for reading Excel spreadsheet files; it is especially handy when working with Excel files that contain multiple sheets.
file = ''
data = pd.ExcelFile(file)
df_sheet2 = data.parse(sheet_name='1960-1966',
                       skiprows=[0],
                       names=['Country', 'AAM: War(2002)'])
df_sheet1 = pd.read_excel(data,
                          sheet_name=0,
                          usecols=[0],  # parse_cols in older pandas versions
                          skiprows=[0],
                          names=['Country'])
Use the sheet_names property to get the names of the sheets to read:
data.sheet_names
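As a minimal sketch of the multi-sheet case (assuming a hypothetical workbook 'urbanpop.xlsx' with several sheets), you can loop over sheet_names and parse each sheet into its own DataFrame:

import pandas as pd

data = pd.ExcelFile('urbanpop.xlsx')   # hypothetical multi-sheet workbook
frames = {name: data.parse(name)       # one DataFrame per sheet
          for name in data.sheet_names}
print(frames.keys())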
III. SAS files
SAS (Statistical Analysis System) is a modular, integrated, large-scale application software system. The files it saves are SAS statistical analysis files.
from sas7bdat import SAS7BDAT

with SAS7BDAT('demo.sas7bdat') as file:
    df_sas = file.to_data_frame()
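As an alternative, pandas itself ships a SAS reader; a minimal sketch using the same (hypothetical) 'demo.sas7bdat' file:

import pandas as pd

df_sas = pd.read_sas('demo.sas7bdat')  # pandas' built-in SAS reader
print(df_sas.head())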
IV. Stata files
Stata is a complete, integrated statistical software package that provides its users with data analysis, data management, and professional charting. It saves files with the .dta suffix.
Reading a Stata file:
data = pd.read_stata('')
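For completeness, the corresponding write operation is DataFrame.to_stata(); a minimal sketch with toy data and a hypothetical output path:

import pandas as pd

df = pd.DataFrame({'country': ['A', 'B'], 'pop': [1.0, 2.0]})  # toy data
df.to_stata('demo_out.dta')           # write a .dta file (hypothetical path)
data = pd.read_stata('demo_out.dta')  # read it back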
V. Pickled files
Almost all Python data types (lists, dictionaries, sets, classes, and so on) can be serialized with pickle. Python's pickle module implements basic data serialization and deserialization. With the pickle module's serialization operations we can save the objects produced by a running program to a file for permanent storage, and with its deserialization operations we can recreate those saved objects from the file in a later run.
import pickle

with open('pickled_demo.pkl', 'rb') as file:
    pickled_data = pickle.load(file)  # load the data from the opened file
The corresponding write operation is the pickle.dump() method.
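A minimal sketch of the write side, assuming a hypothetical object and output path:

import pickle

obj = {'a': 1, 'b': [2, 3]}  # hypothetical object to persist
with open('pickled_demo.pkl', 'wb') as file:
    pickle.dump(obj, file)   # serialize the object to the file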
VI. HDF5 files
HDF5 is a common cross-platform data storage file format that can store different types of image and numeric data and can be transferred across different types of machines; there are also libraries that handle this file format in a unified way.
HDF5 files generally use .h5 or .hdf5 as the suffix, and specialized software is required to open and preview the file's contents.
import h5py

filename = 'H-H1_LOSC_4_v1-815411200-4096.hdf5'
data = h5py.File(filename, 'r')
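Once the file is open you can also walk its hierarchy from Python; a minimal sketch (the group and dataset names depend on the particular file):

# Print each top-level entry and the group/dataset object it refers to
for key in data.keys():
    print(key, data[key])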
VII. Matlab files
MATLAB stores the data in its workspace as files with the .mat suffix.
import scipy.io

filename = ''
mat = scipy.io.loadmat(filename)
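The object returned by scipy.io.loadmat is a dictionary whose keys are the MATLAB variable names; a minimal sketch of inspecting it (the file name and variable names are hypothetical):

import scipy.io

mat = scipy.io.loadmat('workspace.mat')  # hypothetical .mat file
print(mat.keys())                        # MATLAB variable names plus metadata keys
for name, value in mat.items():
    if not name.startswith('__'):        # skip '__header__', '__version__', '__globals__'
        print(name, type(value))         # each variable is loaded as a NumPy array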
VIII. Relational databases
from sqlalchemy import create_engine

engine = create_engine('sqlite://')
Use the table_names() method to get a list of table names:
table_names = engine.table_names()
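Engine.table_names() is deprecated in newer SQLAlchemy releases (and removed in 2.0); a small sketch of the equivalent call through the inspection API, reusing the engine created above:

from sqlalchemy import inspect

table_names = inspect(engine).get_table_names()  # replacement for engine.table_names()
print(table_names)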
1. Direct query of relational databases
con = engine.connect()
rs = con.execute("SELECT * FROM Orders")
df = pd.DataFrame(rs.fetchall())
df.columns = rs.keys()
con.close()
Using the Context Manager -- with
with engine.connect() as con:
    rs = con.execute("SELECT OrderID FROM Orders")
    df = pd.DataFrame(rs.fetchmany(size=5))
    df.columns = rs.keys()
2. Using Pandas to query relational databases
df = pd.read_sql_query("SELECT * FROM Orders", engine)
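pandas can also read an entire table without writing SQL; a minimal sketch, assuming the database contains a table named Orders and reusing the engine created above:

df = pd.read_sql_table('Orders', engine)  # read the whole table into a DataFrame
print(df.head())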
Data Exploration
After the data is imported, it should be explored briefly, for example by checking the data type, size, length, and other basic information. Here is a brief summary.
1. NumPy Arrays
data_array.dtype  # data type of the array elements
data_array.shape  # array dimensions
len(data_array)   # length of the array
2. Pandas DataFrames
df.head()     # return the first few rows of the DataFrame (default 5)
df.tail()     # return the last few rows of the DataFrame (default 5)
df.index      # return the DataFrame's index
df.columns    # return the DataFrame's column names
df.info()     # return basic information about the DataFrame
data_array = df.values  # convert the DataFrame to a NumPy array
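In addition, df.describe() is useful at this stage; a minimal sketch, assuming df is a DataFrame that has just been imported:

df.describe()  # summary statistics (count, mean, std, quartiles) for numeric columns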
The above is a detailed summary of the eight ways of importing data in Python. For more information about importing data in Python, please see my other related articles!