This article describes, with examples, how to use the pandas module for data analysis in Python. It is shared here for your reference:
pandas
For a 10-minute introduction to pandas, check out the official website: 10 minutes to pandas
For more advanced recipes, also see the pandas Cookbook.
- pandas is a powerful data analysis package built on NumPy that provides higher-level data structures and tools. Just as NumPy's core is the ndarray, pandas is built around two core data structures, Series and DataFrame, which correspond to one-dimensional sequences and two-dimensional tables respectively.
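A minimal sketch of the two core structures side by side, using small hand-written data (the variable names `s` and `df` and the values are illustrative only):

```python
import pandas as pd

# Illustrative data (hypothetical values)
# One-dimensional: a Series holds values plus an index of labels
s = pd.Series([1.5, 2.5, 3.5], index=["x", "y", "z"])

# Two-dimensional: a DataFrame holds rows and columns; each column is a Series
df = pd.DataFrame({"price": [1.5, 2.5, 3.5], "qty": [10, 20, 30]}, index=["x", "y", "z"])

print(s.ndim, s.shape)    # 1 (3,)
print(df.ndim, df.shape)  # 2 (3, 2)
print(type(df["qty"]))    # a single column comes back as a Series
```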
Creating objects
The conventional imports are:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
Series
- A Series can be thought of as a fixed-length ordered dictionary: a one-dimensional labeled array capable of holding any data type (integers, strings, floats, Python objects, etc.).
- A Series object has two main attributes: index and values.
- The data can be a Python dictionary, ndarray, scalar value (e.g. 5), etc.
- If no index is passed at creation time, a default integer index (0 to n-1) is created; if an index array is passed, that index is used instead.
- You can also set the name attribute at creation time and change it later with rename().
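A small sketch of the creation rules listed above, under the assumption of a recent pandas version; the variable names are hypothetical. It shows the default index, an explicit index, dict and scalar input, and setting then changing the name:

```python
import numpy as np
import pandas as pd

# Hypothetical example Series
# No index passed: a default integer index 0..n-1 is created
s_default = pd.Series([10, 20, 30])

# Explicit index: the given labels are used instead
s_labeled = pd.Series([10, 20, 30], index=["a", "b", "c"])

# From a dict: the keys become the index
s_dict = pd.Series({"a": 1, "b": 2})

# From a scalar: the value is repeated for every label in the index
s_scalar = pd.Series(5, index=["a", "b", "c"])

# From an ndarray, with a name set at creation time
s_named = pd.Series(np.arange(3), name="counts")

# rename() with a scalar returns a copy with a new name; the original is unchanged
s_renamed = s_named.rename("totals")

print(s_labeled.index.tolist(), s_labeled.values.tolist())  # ['a', 'b', 'c'] [10, 20, 30]
print(s_default.index.tolist())      # [0, 1, 2]
print(s_scalar.tolist())             # [5, 5, 5]
print(s_named.name, s_renamed.name)  # counts totals
```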
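ser1 = pd.Series(range(10, 15), index=list('ABCDE'))
print(ser1)
# Label and position lookups return the same element here
print(ser1['A'])
print(ser1[0])  # positional lookup; recent pandas warns here, ser1.iloc[0] is the explicit form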
Output:
A 10
B 11
C 12
D 13
E 14
dtype: int64
10
10
When slicing a range of consecutive elements, a position (integer) slice excludes the end position, while a label slice includes the end label.
print(ser1['A':'D'])
print(ser1[0:3])
Output:
A 10
B 11
C 12
D 13
dtype: int64
A 10
B 11
C 12
dtype: int64
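The same two slices can be written with the explicit .loc (labels) and .iloc (positions) accessors covered later in this article, which makes it unambiguous whether the bounds are labels or positions. A minimal sketch; `by_label` and `by_position` are illustrative names:

```python
import pandas as pd

ser1 = pd.Series(range(10, 15), index=list("ABCDE"))

# Label slice: the end label 'D' is included
by_label = ser1.loc["A":"D"]

# Position slice: the end position 3 is excluded (like Python lists)
by_position = ser1.iloc[0:3]

print(by_label.tolist())     # [10, 11, 12, 13]
print(by_position.tolist())  # [10, 11, 12]
```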
Fetching multiple non-consecutive elements and conditional filtering (Boolean indexing)
# Note: a list goes inside the brackets
print(ser1[[0, 1, 3]])
# Boolean index
print(ser1[(ser1 > 12) & (ser1 < 15)])
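A sketch of why the Boolean condition is written with `&` and parentheses: each comparison produces a Boolean Series, `&` combines them element-wise, and the parentheses are needed because `&` binds more tightly than `>` and `<`. Python's `and` cannot combine two Series and raises an error. The names `mask`, `picked` and `and_failed` are illustrative:

```python
import pandas as pd

ser1 = pd.Series(range(10, 15), index=list("ABCDE"))

# Element-wise AND of two Boolean Series; parentheses are required
mask = (ser1 > 12) & (ser1 < 15)
filtered = ser1[mask]

# Selecting several non-consecutive elements by label with a list
picked = ser1[["A", "B", "D"]]

# Python's `and` asks for the truth value of a whole Series -> ValueError
try:
    ser1[(ser1 > 12) and (ser1 < 15)]
    and_failed = False
except ValueError:
    and_failed = True

print(filtered.index.tolist(), filtered.tolist())  # ['D', 'E'] [13, 14]
print(picked.tolist())                             # [10, 11, 13]
```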
DataFrame
A DataFrame is a two-dimensional labeled data structure with columns of potentially different types. You can think of it as a spreadsheet or SQL table, or as a dict of Series objects. It is generally the most commonly used pandas object. Like a Series, a DataFrame accepts many different kinds of input:
- Dict of 1D ndarrays, lists, dicts, or Series
- 2-D numpy.ndarray
- Structured or record ndarray
- A Series
- Another DataFrame
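A brief sketch constructing a DataFrame from several of the input kinds listed above; the data and variable names are hypothetical. Note how a dict of Series is aligned on the union of the indexes, with missing entries filled with NaN:

```python
import numpy as np
import pandas as pd

# 2-D ndarray: row labels default to 0..n-1 unless index is given
from_array = pd.DataFrame(np.arange(6).reshape(2, 3), columns=["a", "b", "c"])

# Dict of Series: rows are aligned on the union of the indexes; gaps become NaN
from_series = pd.DataFrame({
    "x": pd.Series([1, 2], index=["r1", "r2"]),
    "y": pd.Series([3], index=["r2"]),
})

# Structured (record) ndarray: field names become column names
records = np.array([(1, 2.0), (3, 4.0)], dtype=[("id", "i4"), ("score", "f8")])
from_records = pd.DataFrame(records)

# A single Series: its name becomes the column label
from_one_series = pd.DataFrame(pd.Series([1, 2], name="v"))

print(from_array.shape)               # (2, 3)
print(from_series.loc["r1", "y"])     # nan
print(from_records.columns.tolist())  # ['id', 'score']
```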
df1 = pd.DataFrame(np.random.randint(10, 50, (3, 4)), index=list('ABC'), columns=list('abcd'))
- index is the row index and columns is the column index.
- When creating from a dictionary, the keys become the column labels and the values can be lists; a scalar value is automatically broadcast to fill the whole column.
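A small sketch of dictionary construction, assuming a recent pandas version: the keys become column labels, `index` supplies the row labels, and a scalar value is broadcast down its column. The data is illustrative:

```python
import pandas as pd

# Hypothetical data; the scalar 0 is broadcast to every row of column "c"
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "c": 0}, index=list("ABC"))

print(df.columns.tolist())  # ['a', 'b', 'c']
print(df.index.tolist())    # ['A', 'B', 'C']
print(df["c"].tolist())     # [0, 0, 0]
```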
Fetching a single column or a single value
# Fetching a column returns a Series object
print(df1['a'])
print(df1['a'].values)
# Fetch a single value: column 'a', row 'B'
print(df1['a']['B'])
# These two are the same (position 1 is row 'B')
print(df1['a'][1])
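Chained lookups such as df1['a']['B'] work for reading, but the value can also be read in a single step with .loc or .at (.at is intended for scalar access). A sketch with a deterministic stand-in for the random df1 above; the values are illustrative:

```python
import pandas as pd

# Deterministic stand-in for the random df1 (hypothetical values)
df1 = pd.DataFrame({"a": [11, 12, 13], "b": [21, 22, 23]}, index=list("ABC"))

col = df1["a"]             # a column is a Series indexed by the row labels
arr = df1["a"].values      # the underlying NumPy array

# Three ways to read the value in row 'B', column 'a'
v_chained = df1["a"]["B"]  # chained: first the column, then the row label
v_loc = df1.loc["B", "a"]  # row label and column label in one call
v_at = df1.at["B", "a"]    # scalar access by label

print(v_chained, v_loc, v_at)  # 12 12 12
```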
Fetching multiple non-consecutive columns (a range of consecutive columns cannot be sliced with [] by default; it requires the advanced indexing described below)
# Fetch multiple non-consecutive columns
print(df1[['a', 'c']])
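Row indexing: rows can be sliced directly with [], but a single row cannot be fetched with a bare label by default (df1['A'] looks for a column named 'A'), so use a one-row slice instead. Integer position slices work the same way.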
print('Row index fetch ##############')
print(df1['A':'A'])
# To take multiple consecutive rows, use df1['A':'C']
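A sketch summarizing how plain [] behaves on a DataFrame: a list selects columns, a slice selects rows, and a bare label is looked up among the columns. The frame and variable names are hypothetical:

```python
import pandas as pd

# Hypothetical 3 x 4 frame
df1 = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9], "d": [0, 0, 0]},
                   index=list("ABC"))

cols = df1[["a", "c"]]  # a list inside [] selects columns
one_row = df1["A":"A"]  # a label slice inside [] selects rows (end included)
rows = df1["A":"C"]     # consecutive rows by label
first_two = df1[0:2]    # consecutive rows by position (end excluded)

# A bare label inside [] is looked up among the columns, so 'A' is not found
try:
    df1["A"]
    found = True
except KeyError:
    found = False

print(cols.columns.tolist())        # ['a', 'c']
print(rows.shape, first_two.shape)  # (3, 4) (2, 4)
```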
Advanced Indexing (Fancy Indexing)
These indexers are generally used with DataFrames, so Series examples are omitted here.
loc: label-based indexing
df1 = pd.DataFrame(np.random.randint(10, 50, (5, 4)), index=list('ABCDE'), columns=list('abcd'))
# Take a single row; the result is a Series
print(df1.loc['A'])
print(type(df1.loc['A']))
# Take multiple consecutive rows; the result is a DataFrame
print(df1.loc['A':'C'])
# Take consecutive rows and consecutive columns (without a custom index, the default integer labels are used the same way)
print(df1.loc['A':'D', 'a':'c'])
# Take discrete rows and discrete columns
print(df1.loc[['A', 'C'], ['a', 'c']])
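A sketch of the loc selections above with a deterministic stand-in for the random frame, so the result types and shapes can be checked. The Boolean-mask row selection at the end goes slightly beyond the original examples; the names are illustrative:

```python
import numpy as np
import pandas as pd

# Deterministic stand-in for the random frame: 5 rows x 4 columns, values 0..19
df1 = pd.DataFrame(np.arange(20).reshape(5, 4), index=list("ABCDE"), columns=list("abcd"))

row = df1.loc["A"]                        # Series
block = df1.loc["A":"D", "a":"c"]         # DataFrame, both ends included
picked = df1.loc[["A", "C"], ["a", "c"]]  # DataFrame of discrete rows/columns
value = df1.loc["C", "b"]                 # scalar

# loc also accepts a Boolean mask for the rows
filtered = df1.loc[df1["a"] > 8, "d"]

print(type(row).__name__, block.shape, picked.shape, value)  # Series (4, 3) (2, 2) 9
```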
iloc: position-based indexing
iloc is used the same way as loc, but with integer positions: position slices are left-closed and right-open (the end is excluded), whereas loc slices include the end label.
# Rows at positions 0 and 1, column at position 0
print(df1.iloc[0:2, 0])
# Note the difference from df1.loc['A':'C', 'a'], which includes row 'C'
print(df1.loc['A':'C', 'a'])
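A sketch contrasting the two endpoints with a deterministic frame, plus iloc's support for lists of positions and negative positions; the frame and names are illustrative:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(20).reshape(5, 4), index=list("ABCDE"), columns=list("abcd"))

by_pos = df1.iloc[0:2, 0]         # positions 0 and 1; end position 2 excluded
by_label = df1.loc["A":"C", "a"]  # labels 'A' through 'C'; end label included

# iloc also accepts lists of positions and negative positions
corner = df1.iloc[[0, -1], [0, -1]]

print(by_pos.index.tolist())    # ['A', 'B']
print(by_label.index.tolist())  # ['A', 'B', 'C']
```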
ix: mixed label and position indexing
In pandas 0.24.2, the version the author is using, .ix is deprecated (it emits a warning but still works), so it is best avoided; it was removed entirely in pandas 1.0.
- ix combines the two approaches above: it accepts either integer positions or custom labels, depending on the situation.
- If the index mixes numbers and letters, this approach is not recommended, because it easily becomes unclear whether a number is a label or a position.
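Since .ix is gone from current pandas, a mixed lookup is usually written with loc or iloc plus a conversion between labels and positions. A sketch of three common patterns (not a one-to-one replacement for every ix call); the frame and names are illustrative:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(20).reshape(5, 4), index=list("ABCDE"), columns=list("abcd"))

# Goal: the first two rows (by position) of column 'a' (by label)
# Option 1: turn the positions into labels, then use loc
opt1 = df1.loc[df1.index[0:2], "a"]

# Option 2: turn the label into a position, then use iloc
opt2 = df1.iloc[0:2, df1.columns.get_loc("a")]

# Option 3: chain the two accessors (fine for reading)
opt3 = df1.iloc[0:2]["a"]

print(opt1.tolist(), opt2.tolist(), opt3.tolist())  # [0, 4] [0, 4] [0, 4]
```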
Add data
Operation | Example |
---|---|
Add a row of data | 1. ['D'] = [1,2,3,4,5] 2. ['D'] = [(10,20)] |
Add a column of data | df1. |
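A sketch of adding a row and a column, assuming a recent pandas version; it is not a reconstruction of the author's exact table entries. Assigning to a new label with loc enlarges the frame by one row, and assigning to a new column name adds a column; the labels 'F', 'e' and 'first' are hypothetical:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(20).reshape(5, 4), index=list("ABCDE"), columns=list("abcd"))

# Add a row: the list length must match the number of columns (4 here)
df1.loc["F"] = [1, 2, 3, 4]

# Add a column: the list length must match the number of rows (6 now)
df1["e"] = [10, 20, 30, 40, 50, 60]

# insert() places a new column at a given position instead of at the end;
# the scalar 0 is broadcast to every row
df1.insert(0, "first", 0)

print(df1.shape)             # (6, 6)
print(df1.columns.tolist())  # ['first', 'a', 'b', 'c', 'd', 'e']
```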
I hope that what I have said in this article will help you in Python programming.