SoFunction
Updated on 2024-11-13

Python data analysis: pandas module usage explained with examples

This article walks through the basic usage of the pandas module for data analysis in Python, with examples:

pandas

For a 10-minute introduction to pandas, check out the official website: 10 minutes to pandas

Also check out the more advanced Cookbook.

  • pandas is a powerful data analysis package built on top of NumPy that provides higher-level data structures and tools. Just as NumPy is centered on the ndarray, pandas is built around two core data structures, Series and DataFrame, which correspond to one-dimensional sequences and two-dimensional tables.

Creating objects

Conventional import method:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

Series

  • A Series can be thought of as a fixed-length ordered dictionary: it is a one-dimensional labeled array capable of holding any data type (integers, strings, floats, Python objects, etc.).
  • The Series object contains two main properties: index and values.
  • The data can be a Python dictionary, ndarray, scalar value (e.g. 5), etc.
  • If no index is passed at creation time, a default integer index (0, 1, 2, ...) is generated; if an index is passed, it is used as the labels.
  • You can also set the name attribute at creation time and change it later with rename.
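ser1 = pd.Series(range(10,15),index=list('ABCDE'))
print(ser1)
# A label and the matching position return the same element
print(ser1['A'])
# Positional access through [] is deprecated in recent pandas releases; ser1.iloc[0] is the explicit form
print(ser1[0])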

Output:

A    10
B    11
C    12
D    13
E    14
dtype: int64
10
10
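
The points above about Series inputs, the default index and the name attribute can be sketched as follows. This is a minimal illustration rather than the article's own code; the variable names and values are made up for the example.

```python
import numpy as np
import pandas as pd

# From a dict: the keys become the index labels.
from_dict = pd.Series({'A': 10, 'B': 11, 'C': 12})

# From an ndarray without an index: a default 0..n-1 integer index is created.
from_array = pd.Series(np.array([1.5, 2.5, 3.5]))

# From a scalar: the value is repeated for every label in the given index.
from_scalar = pd.Series(5, index=['x', 'y', 'z'])

# The name can be set at creation time; rename returns a renamed copy.
named = pd.Series([1, 2, 3], name='counts')
renamed = named.rename('totals')

print(from_dict.index.tolist())
print(from_array.index.tolist())
print(from_scalar.tolist())
print(named.name, renamed.name)
```

Note that rename does not modify the original Series; `named` keeps its old name here.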

When fetching several consecutive elements, a positional slice excludes the end position, whereas a label slice includes the end label.

print(ser1['A':'D'])
print(ser1[0:3])

Output:

A    10
B    11
C    12
D    13
dtype: int64
A    10
B    11
C    12
dtype: int64

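Fetching several non-consecutive elements and conditional filtering (Boolean indexing)

# Note that the positions are passed as a list inside the brackets
# (in recent pandas releases ser1.iloc[[0,1,3]] is the explicit positional form)
print(ser1[[0,1,3]])
# Boolean indexing: keep the elements greater than 12 and less than 15
print(ser1[(ser1>12)&(ser1<15)])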
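
Since no output is shown for this step, here is a small self-contained sketch of the same selections with their results. It uses the explicit `iloc` and `loc` accessors instead of plain brackets; the variable names are illustrative.

```python
import pandas as pd

ser = pd.Series(range(10, 15), index=list('ABCDE'))

# Elements at positions 0, 1 and 3 (explicitly positional).
by_position = ser.iloc[[0, 1, 3]]

# The same elements selected by label.
by_label = ser.loc[['A', 'B', 'D']]

# Boolean mask: keep values strictly between 12 and 15.
mask = (ser > 12) & (ser < 15)
filtered = ser[mask]

print(by_position.tolist())  # [10, 11, 13]
print(by_label.tolist())     # [10, 11, 13]
print(filtered.to_dict())    # {'D': 13, 'E': 14}
```

The parentheses around each comparison are required because `&` binds more tightly than `>` and `<` in Python.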

DataFrame

A DataFrame is a two-dimensional labeled data structure. You can think of it as a spreadsheet or SQL table, or as a dict of Series objects. It is generally the most commonly used pandas object. Like a Series, a DataFrame accepts many different kinds of input:

  • Dict of 1D ndarrays, lists, dicts, or Series
  • 2-D numpy.ndarray
  • Structured or record ndarray
  • A Series
  • Another DataFrame
df1 = pd.DataFrame(np.random.randint(10,50,(3,4)), index=list('ABC'),columns=list('abcd'))
  • index is a row index and columns is a column index.
  • When creating from a dictionary, the keys become the column labels and each value can be a list holding that column's data; pandas completes the rest automatically (for example, a default integer row index when none is given).
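
A few of the input types listed above can be sketched as follows. This is an illustrative example with made-up names and fixed values (`np.arange` instead of random numbers) so the results are reproducible.

```python
import numpy as np
import pandas as pd

# Dict of lists: keys become column labels, a default integer row index is used.
from_lists = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

# Dict of Series: rows are aligned on the union of the Series indexes,
# and positions with no value are filled with NaN.
from_series = pd.DataFrame({
    'a': pd.Series([1, 2], index=['A', 'B']),
    'b': pd.Series([3, 4], index=['B', 'C']),
})

# 2-D ndarray with explicit row and column labels.
from_array = pd.DataFrame(np.arange(12).reshape(3, 4),
                          index=list('ABC'), columns=list('abcd'))

print(from_lists)
print(from_series)
print(from_array)
```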

Fetching a single column and a single value

# Fetch a column; the result is a Series object
print(df1['a'])
print(df1['a'].values)
# Fetch a single value: column 'a', then row 'B'
print(df1['a']['B']) # These two return the same value
print(df1['a'][1]) # Positional access through [] is deprecated in recent pandas releases; df1['a'].iloc[1] is the explicit form

Fetching several non-consecutive columns and several consecutive columns (plain [] indexing does not support slicing consecutive columns; that requires advanced indexing)

# Fetch discontinuous multiple columns
print(df1[['a','c']])
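
For consecutive columns, one option is to slice the column axis with `loc` or `iloc`. The sketch below assumes a small DataFrame `df` with fixed values, made up for the example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(3, 4),
                  index=list('ABC'), columns=list('abcd'))

# Non-consecutive columns: pass a list of labels to [].
picked = df[['a', 'c']]

# Consecutive columns by label: the end label 'c' is included.
by_label = df.loc[:, 'a':'c']

# Consecutive columns by position: the end position 3 is excluded.
by_position = df.iloc[:, 0:3]

print(picked.columns.tolist())       # ['a', 'c']
print(by_label.columns.tolist())     # ['a', 'b', 'c']
print(by_position.columns.tolist())  # ['a', 'b', 'c']
```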

Rows can be sliced directly by row label, but plain [] indexing cannot select non-consecutive rows; positional slices behave the same way.

print('Row index fetch ##############')
print(df1['A':'A'])
# Multiple consecutive rows: df1['A':'C'] (the end label is included)

Advanced Indexing (Fancy Indexing)

Advanced indexing is mostly used with DataFrames, so Series are not covered here.

loc: label-based indexing

df1 = pd.DataFrame(np.random.randint(10,50,(5,4)), index=list('ABCDE'),columns=list('abcd'))
# Take a single row; the result is a Series
print(df1.loc['A'])
print(type(df1.loc['A']))
# Take multiple consecutive rows; the result is a DataFrame
print(df1.loc['A':'C'])
# Take multiple consecutive rows and columns by slicing both axes
print(df1.loc['A':'D','a':'c'])
# Take discrete rows and discrete columns
print(df1.loc[['A','C'],['a','c']])
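
Because the data above is random, the results differ on every run. The following sketch repeats the same `loc` selections on a DataFrame with fixed values so the shapes and values can be checked; `df` and its contents are assumptions made for the example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(20).reshape(5, 4),
                  index=list('ABCDE'), columns=list('abcd'))

row = df.loc['A']                            # single row -> Series
rows = df.loc['A':'C']                       # label slice, 'C' included -> DataFrame
block = df.loc['A':'D', 'a':'c']             # consecutive rows and columns
scattered = df.loc[['A', 'C'], ['a', 'c']]   # discrete rows and columns

print(type(row).__name__)          # Series
print(rows.shape)                  # (3, 4)
print(block.shape)                 # (4, 3)
print(scattered.values.tolist())   # [[0, 2], [8, 10]]
```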

iloc: position-based indexing

iloc is used the same way as loc but takes integer positions instead of labels. Position slices are left-closed and right-open, whereas loc slices include the end label.

# DataFrame
print(df1.iloc[0:2, 0]) # Note the difference from df1.loc['A':'C', 'a']
print(df1.loc['A':'C', 'a'])
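
The difference in end-point handling is easier to see with fixed values. A minimal sketch, assuming an illustrative DataFrame `df` filled with `np.arange`:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(20).reshape(5, 4),
                  index=list('ABCDE'), columns=list('abcd'))

# Rows at positions 0 and 1 (position 2 is excluded), first column.
by_position = df.iloc[0:2, 0]

# Rows 'A' through 'C' (label 'C' is included), column 'a'.
by_label = df.loc['A':'C', 'a']

print(by_position.tolist())  # [0, 4]
print(by_label.tolist())     # [0, 4, 8]
```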

ix: mixed label and position indexing

The pandas version used for this article, 0.24.2, has already deprecated .ix (it emits a warning but still works), and pandas 1.0 removed it entirely, so it is only described briefly here.

  • ix combines the two approaches above: it accepts both integer positions and custom labels, depending on the situation.
  • If the index contains both numbers and letters, this approach is not recommended because it easily leads to confusion about what is being selected.
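
On pandas versions without .ix, mixed selection (rows by position, column by label) can be expressed with `loc` or `iloc` alone. This is one possible sketch, not the article's own method; `df` and its values are assumptions for the example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(20).reshape(5, 4),
                  index=list('ABCDE'), columns=list('abcd'))

# Option 1: translate the row positions to labels, then use loc.
via_loc = df.loc[df.index[0:2], 'a']

# Option 2: translate the column label to a position, then use iloc.
via_iloc = df.iloc[0:2, df.columns.get_loc('a')]

print(via_loc.tolist())   # [0, 4]
print(via_iloc.tolist())  # [0, 4]
```

Either form makes it explicit which axis is addressed by label and which by position, which avoids the ambiguity described above.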

Add data

Add a row of data:
  1. ['D'] = [1,2,3,4,5]
  2. ['D'] = [(10,20)]
Add a column of data: df1.
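
A minimal sketch of one common way to add a row and a column, not necessarily the exact calls the article intended. The DataFrame `df`, the labels 'D' and 'e', and the values are illustrative assumptions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(3, 4),
                  index=list('ABC'), columns=list('abcd'))

# Add a row: assigning to a new label through loc appends it.
# The list needs one value per existing column.
df.loc['D'] = [100, 101, 102, 103]

# Add a column: assigning to a new column label creates it.
# The list needs one value per existing row.
df['e'] = [1, 2, 3, 4]

print(df.shape)  # (4, 5)
print(df)
```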


I hope this article helps you with your Python programming.