Python pandas: grouping and aggregation
1. Environment
- Python 3.9
- Windows 10 64-bit
- pandas==1.2.1
The groupby method is how pandas groups data. Calling groupby on a DataFrame returns a DataFrameGroupBy object, and a grouping operation is generally followed by an aggregation operation.
2. Grouping
import pandas as pd
import numpy as np
pd.set_option('display.notebook_repr_html', False)
# Data preparation
df = pd.DataFrame({'A': [1, 1, 2, 2], 'B': [1, 2, 3, 4], 'C': [6, 8, 1, 9]})
df

   A  B  C
0  1  1  6
1  1  2  8
2  2  3  1
3  2  4  9
Grouping by column A produces a grouped DataFrame. A grouped DataFrame is an iterable object that can be traversed in a loop. Each element yielded by the loop is a tuple: the first element is the group key and the second element is the DataFrame holding the rows of that group.
# Group by column A
g_df = df.groupby('A')
# Type of the grouped DataFrame
type(g_df)

# Loop over the groups
for i in g_df:
    print(i, type(i), end='\n\n')

(1,    A  B  C
0  1  1  6
1  1  2  8) <class 'tuple'>

(2,    A  B  C
2  2  3  1
3  2  4  9) <class 'tuple'>
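As a small sketch of the iteration described above, the tuple can be unpacked directly in the loop header. The variable names `keys` and `group_1` are only illustrative, and `get_group` is shown as one way to fetch a single group by its key.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2, 2], 'B': [1, 2, 3, 4], 'C': [6, 8, 1, 9]})
g_df = df.groupby('A')

# Unpack each (key, sub-DataFrame) tuple in the loop header
keys = []
for key, group in g_df:
    keys.append(int(key))
    print(key, group.shape)

# Fetch the rows of a single group by its key
group_1 = g_df.get_group(1)
```

Unpacking this way avoids indexing into the tuple inside the loop body.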
The aggregation method agg can be called directly on a grouped DataFrame. It computes the given statistical function for each column within each group.
# Sum by group
df.groupby('A').agg('sum')

   B   C
A
1  3  14
2  7  10
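For common statistics, passing the function name as a string to agg should give the same result as calling the method of that name on the grouped object. A minimal check, using the same data as above (the variable names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2, 2], 'B': [1, 2, 3, 4], 'C': [6, 8, 1, 9]})

# Aggregate through agg with a string name
by_agg = df.groupby('A').agg('sum')
# Aggregate through the dedicated method
by_method = df.groupby('A').sum()

# Both routes should produce identical DataFrames
same = by_agg.equals(by_method)
print(same)
```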
3. Sequence grouping
A DataFrame can also be grouped by a sequence defined outside the DataFrame. Note that the length of the sequence must equal the number of rows in the DataFrame.
# Define the grouping labels
label = ['a', 'a', 'b', 'b']
# Sum by group
df.groupby(label).agg('sum')

   A  B   C
a  2  3  14
b  4  7  10
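The external sequence does not have to be a list. The sketch below assumes a NumPy array of labels and checks the length requirement explicitly before grouping; the name `labels` is illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2, 2], 'B': [1, 2, 3, 4], 'C': [6, 8, 1, 9]})

# One label per row; the lengths must match
labels = np.array(['a', 'a', 'b', 'b'])
assert len(labels) == len(df)

# Rows are assigned to groups by position in the label array
result = df.groupby(labels).sum()
print(result)
```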
4. Multi-column grouping
Dataframes can be grouped based on multiple columns of the dataframe.
# Data preparation
df = pd.DataFrame({'A': [1, 1, 2, 2], 'B': [3, 4, 3, 3], 'C': [6, 8, 1, 9]})
df

   A  B  C
0  1  3  6
1  1  4  8
2  2  3  1
3  2  3  9
Group by columns A and B, then sum.
# Sum grouped by multiple columns
df.groupby(['A', 'B']).agg('sum')

      C
A B
1 3   6
  4   8
2 3  10
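Grouping by several columns puts the group keys into a multi-level index. If ordinary columns are preferred, one option (not covered in the text above, shown here as a sketch) is as_index=False, or equivalently calling reset_index on the result.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2, 2], 'B': [3, 4, 3, 3], 'C': [6, 8, 1, 9]})

# Keep the group keys as regular columns
flat = df.groupby(['A', 'B'], as_index=False).agg('sum')

# Alternative: aggregate first, then move the index levels back to columns
also_flat = df.groupby(['A', 'B']).agg('sum').reset_index()

print(flat)
```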
5. Index Grouping
A DataFrame can also be grouped by its index. To do so, set the level parameter.
# Data preparation
df = pd.DataFrame({'A': [1, 1, 2, 2], 'B': [3, 4, 3, 3], 'C': [6, 8, 1, 9]}, index=['a', 'a', 'b', 'b'])
df

   A  B  C
a  1  3  6
a  1  4  8
b  2  3  1
b  2  3  9
This DataFrame has only one index level, so set level=0.
# Sum grouped by index
df.groupby(level=0).agg('sum')

   A  B   C
a  2  7  14
b  4  6  10
When there are multiple levels of data frame indexes, the level parameter can also be set on demand to accomplish group aggregation.
# Data preparation
mi = pd.MultiIndex.from_arrays([[1, 1, 2, 2], [3, 4, 3, 3]], names=['id1', 'id2'])
df = pd.DataFrame(dict(value=[4, 7, 2, 9]), index=mi)
df

         value
id1 id2
1   3        4
    4        7
2   3        2
    3        9
To group by the first index level, i.e. id1, set level=0 or level='id1' and then aggregate.
# Sum grouped by the first index level, selected by position
df.groupby(level=0).agg('sum')

     value
id1
1       11
2       11

# Sum grouped by the first index level, selected by name
df.groupby(level='id1').agg('sum')

     value
id1
1       11
2       11
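The same parameter can select other levels as well. The sketch below compares selection by position and by name, then groups by the second level and by both levels at once; the result variable names are illustrative.

```python
import pandas as pd

mi = pd.MultiIndex.from_arrays([[1, 1, 2, 2], [3, 4, 3, 3]], names=['id1', 'id2'])
df = pd.DataFrame({'value': [4, 7, 2, 9]}, index=mi)

# Position and name refer to the same level here
by_pos = df.groupby(level=0).agg('sum')
by_name = df.groupby(level='id1').agg('sum')

# Group by the second level only
by_second = df.groupby(level='id2').agg('sum')

# Group by both levels by passing a list
by_both = df.groupby(level=['id1', 'id2']).agg('sum')

print(by_second)
print(by_both)
```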
7. Aggregation
Grouping is generally followed by an aggregation operation, which is performed with the agg method.
# Data preparation
df = pd.DataFrame({'A': [1, 1, 2, 2], 'B': [3, 4, 3, 3], 'C': [6, 8, 1, 9], 'D': [2, 5, 4, 8]})
df

   A  B  C  D
0  1  3  6  2
1  1  4  8  5
2  2  3  1  4
3  2  3  9  8
8. Single function on multiple columns
Aggregation of the grouped dataframes is performed using a single function. The single aggregation function performs calculations on each column and then merges them back. The aggregation function is passed as a string.
# Sum all columns by group
df.groupby('A').agg('sum')

   B   C   D
A
1  7  14   7
2  6  10  12
You can select specific columns of the grouped data before aggregating. Note that the selected columns need to be wrapped in [].
# Sum the specified columns by group
df.groupby('A')[['B', 'C']].agg('sum')

   B   C
A
1  7  14
2  6  10
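The brackets also determine the type of the result. As a brief illustration, selecting with a list returns a DataFrame even for one column, while selecting with a bare column name returns a Series:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2, 2], 'B': [3, 4, 3, 3],
                   'C': [6, 8, 1, 9], 'D': [2, 5, 4, 8]})

# A list of column names yields a DataFrame
as_frame = df.groupby('A')[['B']].agg('sum')

# A single column name yields a Series
as_series = df.groupby('A')['B'].agg('sum')

print(type(as_frame).__name__, type(as_series).__name__)
```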
A custom anonymous function can also be passed as the aggregation function.
# Sum by group with an anonymous function
df.groupby('A').agg(lambda x: sum(x))

   B   C   D
A
1  7  14   7
2  6  10  12
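A custom function is most useful when no built-in name covers the calculation. As a hypothetical example, the following sketch computes the spread (maximum minus minimum) of each column within each group:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2, 2], 'B': [3, 4, 3, 3],
                   'C': [6, 8, 1, 9], 'D': [2, 5, 4, 8]})

# The lambda receives one column of one group at a time as a Series
spread = df.groupby('A').agg(lambda x: x.max() - x.min())
print(spread)
```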
9. Multiple functions on multiple columns
Aggregate functions can be multiple functions. When aggregating, multiple aggregation functions will perform calculations on each column and then merge them to return. Aggregate functions are passed in as a list.
# Aggregate all columns with multiple functions
df.groupby('A').agg(['sum', 'mean'])

    B        C        D
  sum mean sum mean sum mean
A
1   7  3.5  14    7   7  3.5
2   6  3.0  10    5  12  6.0
The columns of the aggregated result have a two-level index: the first level holds the names of the aggregated columns, and the second level holds the names of the aggregation functions. To rename the function names in the result, pass each function as a tuple whose first element is the new name and whose second element is the aggregation function.
# Rename the aggregation functions
df.groupby('A').agg([('SUM', 'sum'), ('MEAN', 'mean')])

    B        C        D
  SUM MEAN SUM MEAN SUM MEAN
A
1   7  3.5  14    7   7  3.5
2   6  3.0  10    5  12  6.0
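If a flat set of column names is easier to work with than the two-level column index, the levels can be joined into single strings. This is one possible approach, sketched with an underscore separator chosen for illustration:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2, 2], 'B': [3, 4, 3, 3],
                   'C': [6, 8, 1, 9], 'D': [2, 5, 4, 8]})

out = df.groupby('A').agg(['sum', 'mean'])

# Each column label is a (column, function) tuple; join it into one string
out.columns = ['_'.join(col) for col in out.columns]
print(out.columns.tolist())
```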
Similarly, anonymous functions can be passed in.
# Pass an anonymous function and rename it
df.groupby('A').agg([('SUM', 'sum'), ('MAX', lambda x: max(x))])

    B       C       D
  SUM MAX SUM MAX SUM MAX
A
1   7   4  14   8   7   5
2   6   3  10   9  12   8
If different columns need different aggregation calculations, pass the functions as a dictionary that maps column names to aggregation functions.
# Different aggregation functions for different columns
df.groupby('A').agg({'B': ['sum', 'mean'], 'C': 'mean'})

    B         C
  sum mean mean
A
1   7  3.5    7
2   6  3.0    5
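With a dictionary, only the columns named as keys appear in the result. The sketch below inspects the resulting column labels and confirms that column D, which is not in the dictionary, is left out:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2, 2], 'B': [3, 4, 3, 3],
                   'C': [6, 8, 1, 9], 'D': [2, 5, 4, 8]})

out = df.groupby('A').agg({'B': ['sum', 'mean'], 'C': 'mean'})

# Column labels are (column, function) tuples
labels = out.columns.tolist()
print(labels)

# Select a single result column by its tuple label
c_mean = out[('C', 'mean')]
```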
You can also rename the columns of the aggregated result. Note that this is only valid when a single aggregation function is passed for a column.
# Rename the columns after aggregation
df.groupby('A').agg(B_sum=('B', 'sum'), C_mean=('C', 'mean'))

   B_sum  C_mean
A
1      7       7
2      6       5
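The (column, function) tuples used for renaming can also be written with pd.NamedAgg, which spells out the two fields by name. A minimal sketch that should be equivalent to the tuple form:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2, 2], 'B': [3, 4, 3, 3],
                   'C': [6, 8, 1, 9], 'D': [2, 5, 4, 8]})

# Each keyword becomes an output column name
out = df.groupby('A').agg(
    B_sum=pd.NamedAgg(column='B', aggfunc='sum'),
    C_mean=pd.NamedAgg(column='C', aggfunc='mean'),
)
print(out)
```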