In most cases, NumPy or Pandas is used to import the data, so run the following before you start:

import numpy as np
import pandas as pd
Two ways to get help
Often it is not clear how a function or method works; Python provides help information so that you can quickly learn how to use a Python object.
Using the info method in NumPy

np.info(np.ndarray.dtype)
Using Python's built-in help function
help(pd.read_csv)
I. Text documents
1. Plain text files
filename = ''
file = open(filename, mode='r')  # Open the file for reading
text = file.read()               # Read the contents of the file
print(file.closed)               # Check whether the file is closed
file.close()                     # Close the file
print(text)
Using the Context Manager -- with
with open('', 'r') as file:
    print(file.readline())  # Read one line at a time
    print(file.readline())
    print(file.readline())
2. Tabular data: Flat files
Reading flat files with NumPy
NumPy's built-in functions process data at C-language speed.
A flat file is a file containing records with no relational structure (Excel, CSV, and tab-delimited files are supported).
1. Files with one data type
The parameters below specify the string used to separate values, skip the first two lines, read only the first and third columns, and set the type of the resulting array.
filename = ''
data = np.loadtxt(filename,
                  delimiter=',',   # string used to separate values
                  skiprows=2,      # skip the first two lines
                  usecols=[0, 2],  # read the first and third columns
                  dtype=str)       # type of the resulting array
2. Files with mixed data types
Two hard requirements:
- Skip header information
- Distinguish between horizontal and vertical coordinates
filename = ''
data = np.genfromtxt(filename,
                     delimiter=',',  # string used to separate values
                     names=True,     # take column names from the header row
                     dtype=None)     # infer the data type of each column
Reading Flat Files with Pandas
filename = ''
data = pd.read_csv(filename,
                   nrows=5,         # number of rows of the file to read
                   header=None,     # row number to use as the column names
                   sep='\t',        # delimiter to use
                   comment='#',     # character that marks comments
                   na_values=[""])  # strings to recognize as NA/NaN
II. Excel spreadsheets
ExcelFile() in Pandas
ExcelFile() is a very convenient and fast class in pandas for reading Excel spreadsheet files; it is especially handy when working with Excel files that contain multiple sheets.
file = ''
data = pd.ExcelFile(file)
df_sheet2 = data.parse(sheet_name='1960-1966',
                       skiprows=[0],
                       names=['Country', 'AAM: War(2002)'])
df_sheet1 = pd.read_excel(data,
                          sheet_name=0,
                          usecols=[0],  # parse_cols in older pandas versions
                          skiprows=[0],
                          names=['Country'])
Use the sheet_names property to get the names of the sheets to read:
data.sheet_names
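As a minimal sketch of the multi-sheet case (assuming a hypothetical workbook 'urbanpop.xlsx' with several sheets), you can loop over sheet_names and parse each sheet into its own DataFrame:

import pandas as pd

data = pd.ExcelFile('urbanpop.xlsx')   # hypothetical multi-sheet workbook
frames = {name: data.parse(name)       # one DataFrame per sheet
          for name in data.sheet_names}
print(frames.keys())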
III. SAS files
SAS (Statistical Analysis System) is a modular, integrated, large-scale application software system. The files it saves are SAS statistical analysis files.
from sas7bdat import SAS7BDAT

with SAS7BDAT('demo.sas7bdat') as file:
    df_sas = file.to_data_frame()
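As an alternative, pandas itself ships a SAS reader; a minimal sketch using the same (hypothetical) 'demo.sas7bdat' file:

import pandas as pd

df_sas = pd.read_sas('demo.sas7bdat')  # pandas' built-in SAS reader
print(df_sas.head())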
IV. Stata files
Stata is a complete, integrated statistical software package that provides its users with data analysis, data management, and professional charting. It saves files with the .dta suffix.
Reading a Stata file:
data = pd.read_stata('')
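For completeness, the corresponding write operation is DataFrame.to_stata(); a minimal sketch with toy data and a hypothetical output path:

import pandas as pd

df = pd.DataFrame({'country': ['A', 'B'], 'pop': [1.0, 2.0]})  # toy data
df.to_stata('demo_out.dta')           # write a .dta file (hypothetical path)
data = pd.read_stata('demo_out.dta')  # read it back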
V. Pickled files
Almost all Python data types (lists, dictionaries, sets, classes, and so on) can be serialized with pickle. Python's pickle module implements basic data serialization and deserialization. With the pickle module's serialization operations we can save the objects produced by a running program to a file for permanent storage, and with its deserialization operations we can recreate those saved objects from the file in a later run.
import pickle

with open('pickled_demo.pkl', 'rb') as file:
    pickled_data = pickle.load(file)  # load the data from the opened file
The corresponding write operation is the pickle.dump() method.
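A minimal sketch of the write side, assuming a hypothetical object and output path:

import pickle

obj = {'a': 1, 'b': [2, 3]}  # hypothetical object to persist
with open('pickled_demo.pkl', 'wb') as file:
    pickle.dump(obj, file)   # serialize the object to the file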
VI. HDF5 files
HDF5 is a common cross-platform data storage file format that can store different types of image and numeric data and can be transferred across different types of machines; there are also libraries that handle this file format in a unified way.
HDF5 files generally use .h5 or .hdf5 as the suffix, and specialized software is required to open and preview the file's contents.
import h5py

filename = 'H-H1_LOSC_4_v1-815411200-4096.hdf5'
data = h5py.File(filename, 'r')
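Once the file is open you can also walk its hierarchy from Python; a minimal sketch (the group and dataset names depend on the particular file):

# Print each top-level entry and the group/dataset object it refers to
for key in data.keys():
    print(key, data[key])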
VII. Matlab files
MATLAB stores the data in its workspace as files with the .mat suffix.
import scipy.io

filename = ''
mat = scipy.io.loadmat(filename)
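The object returned by scipy.io.loadmat is a dictionary whose keys are the MATLAB variable names; a minimal sketch of inspecting it (the file name and variable names are hypothetical):

import scipy.io

mat = scipy.io.loadmat('workspace.mat')  # hypothetical .mat file
print(mat.keys())                        # MATLAB variable names plus metadata keys
for name, value in mat.items():
    if not name.startswith('__'):        # skip '__header__', '__version__', '__globals__'
        print(name, type(value))         # each variable is loaded as a NumPy array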
VIII. Relational databases
from sqlalchemy import create_engine

engine = create_engine('sqlite://')
Use the table_names() method to get a list of table names:
table_names = engine.table_names()
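Engine.table_names() is deprecated in newer SQLAlchemy releases (and removed in 2.0); a small sketch of the equivalent call through the inspection API, reusing the engine created above:

from sqlalchemy import inspect

table_names = inspect(engine).get_table_names()  # replacement for engine.table_names()
print(table_names)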
1. Direct query of relational databases
con = engine.connect()
rs = con.execute("SELECT * FROM Orders")
df = pd.DataFrame(rs.fetchall())
df.columns = rs.keys()
con.close()
Using the Context Manager -- with
with engine.connect() as con:
    rs = con.execute("SELECT OrderID FROM Orders")
    df = pd.DataFrame(rs.fetchmany(size=5))
    df.columns = rs.keys()
2. Using Pandas to query relational databases
df = pd.read_sql_query("SELECT * FROM Orders", engine)
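pandas can also read an entire table without writing SQL; a minimal sketch, assuming the database contains a table named Orders and reusing the engine created above:

df = pd.read_sql_table('Orders', engine)  # read the whole table into a DataFrame
print(df.head())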
Data Exploration
After the data is imported, it should be explored briefly, for example by checking the data type, size, length, and other basic information. Here is a brief summary.
1. NumPy Arrays
data_array.dtype  # data type of the array elements
data_array.shape  # array dimensions
len(data_array)   # length of the array
2. Pandas DataFrames
df.head()     # return the first few rows of the DataFrame (default 5)
df.tail()     # return the last few rows of the DataFrame (default 5)
df.index      # return the DataFrame's index
df.columns    # return the DataFrame's column names
df.info()     # return basic information about the DataFrame
data_array = df.values  # convert the DataFrame to a NumPy array
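In addition, df.describe() is useful at this stage; a minimal sketch, assuming df is a DataFrame that has just been imported:

df.describe()  # summary statistics (count, mean, std, quartiles) for numeric columns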
The above is a detailed summary of the eight ways of importing data in Python. For more information about importing data in Python, please see my other related articles!