SoFunction
Updated on 2024-11-18

Python pandas grouping and aggregation in detail

Python pandas grouping and aggregation

1. Environment

  • Python 3.9
  • Windows 10 64-bit
  • pandas==1.2.1

groupby is the grouping method in pandas. Calling groupby on a DataFrame returns a DataFrameGroupBy object, and a grouping operation is generally followed by an aggregation operation.

2. Grouping

import pandas as pd
import numpy as np
pd.set_option('display.notebook_repr_html',False)
# Data preparation
df = pd.DataFrame({'A': [1, 1, 2, 2],'B': [1, 2, 3, 4],'C':[6,8,1,9]})
df

   A  B  C
0  1  1  6
1  1  2  8
2  2  3  1
3  2  4  9

Grouping by column A produces a grouped DataFrame (a DataFrameGroupBy object). A grouped DataFrame is iterable and can be traversed in a loop; each element yielded by the loop is a tuple. The first element of the tuple is the group key and the second element is the DataFrame holding the rows of that group.

# Grouping
g_df=df.groupby('A')
# Type of the grouped DataFrame
type(g_df)

pandas.core.groupby.generic.DataFrameGroupBy

# Loop over the grouped data
for i in g_df:
    print(i,type(i),end='\n\n')

(1,    A  B  C
0  1  1  6
1  1  2  8) <class 'tuple'>

(2,    A  B  C
2  2  3  1
3  2  4  9) <class 'tuple'>
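Because every element is a (key, DataFrame) tuple, the loop can unpack it directly. The following is a minimal sketch using the same data; the names group_sizes and group_1 are only illustrative and do not come from the original article.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2, 2], 'B': [1, 2, 3, 4], 'C': [6, 8, 1, 9]})
g_df = df.groupby('A')

# Unpack each (key, sub-DataFrame) tuple in the loop header
group_sizes = {}
for key, group in g_df:
    group_sizes[key] = len(group)

# get_group returns the sub-DataFrame of a single key without looping
group_1 = g_df.get_group(1)
print(group_sizes)
print(group_1)
```

Unpacking is usually easier to read than indexing the tuple with i[0] and i[1].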

The aggregation method agg can be called directly on a grouped DataFrame. It computes the given statistical function for each column of every group.

# Summing in groups
df.groupby('A').agg('sum')
   B   C
A       
1  3  14
2  7  10
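For common statistics the grouped object also offers shortcut methods such as sum. The sketch below, with the illustrative names via_agg and via_method, checks that both spellings give the same result on this data.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2, 2], 'B': [1, 2, 3, 4], 'C': [6, 8, 1, 9]})

# The string 'sum' passed to agg and the sum() shortcut method
via_agg = df.groupby('A').agg('sum')
via_method = df.groupby('A').sum()

# Compare the two results
same = via_agg.equals(via_method)
print(same)
```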

3. Sequence grouping

A DataFrame can be grouped by a sequence that is defined outside the DataFrame. Note that the length of the sequence must equal the number of rows in the DataFrame.

# Define the list of group labels
label=['a','a','b','b']
# Summing in groups
df.groupby(label).agg('sum')
   A  B   C
a  2  3  14
b  4  7  10
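The external sequence does not have to be a list. As a sketch, the example below uses a NumPy array, which pandas cannot mistake for a list of column names, and makes the length requirement explicit. The length check is an addition for illustration, not part of the original code.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2, 2], 'B': [1, 2, 3, 4], 'C': [6, 8, 1, 9]})
label = np.array(['a', 'a', 'b', 'b'])

# The external labels must supply exactly one label per row
if len(label) != len(df):
    raise ValueError('label length must match the number of rows')

result = df.groupby(label).agg('sum')
print(result)
```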

4. Multi-column grouping

A DataFrame can be grouped by several of its columns at once.

# Data preparation
df = pd.DataFrame({'A': [1, 1, 2, 2],'B': [3, 4, 3, 3],'C':[6,8,1,9]})
df


   A  B  C
0  1  3  6
1  1  4  8
2  2  3  1
3  2  3  9


Group by columns A and B, then sum.

# Sums based on multiple columns
df.groupby(['A','B']).agg('sum')
      C
A B    
1 3   6
  4   8
2 3  10
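By default the group keys become a multi-level index of the result. If ordinary columns are preferred, one option is as_index=False, and another is reset_index on the default result. A minimal sketch, with flat and flat_alt as illustrative names:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2, 2], 'B': [3, 4, 3, 3], 'C': [6, 8, 1, 9]})

# as_index=False keeps A and B as regular columns
flat = df.groupby(['A', 'B'], as_index=False).agg('sum')

# reset_index on the default result gives the same flat layout
flat_alt = df.groupby(['A', 'B']).agg('sum').reset_index()
print(flat)
```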

5. Index Grouping

A DataFrame can also be grouped by its index. In that case the level parameter must be set.

# Data preparation
df = pd.DataFrame({'A': [1, 1, 2, 2],'B': [3, 4, 3, 3],'C':[6,8,1,9]},index=['a','a','b','b'])
df


   A  B  C
a  1  3  6
a  1  4  8
b  2  3  1
b  2  3  9


This DataFrame has only one index level, so set the parameter level=0.

# Summing by indexed groups
df.groupby(level=0).agg('sum')
   A  B   C
a  2  7  14
b  4  6  10


When the DataFrame index has multiple levels, the level parameter can be set as needed to perform the grouped aggregation.

# Data preparation
mi=pd.MultiIndex.from_arrays([[1,1,2,2],[3,4,3,3]],names=['id1','id2'])
df=pd.DataFrame(dict(value=[4,7,2,9]),index=mi)
df


         value
id1 id2       
1   3        4
    4        7
2   3        2
    3        9


Set the level parameter. To group by the first index level, i.e. id1, use either level=0 or level='id1' to complete the grouped aggregation.

# Summed in groups based on the first level of indexing
df.groupby(level=0).agg('sum')


     value
id1       
1       11
2       11


# Sum by group on the first index level, selected by name
df.groupby(level='id1').agg('sum')


     value
id1       
1       11
2       11
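The level parameter is not limited to the first level. As a sketch on the same data, the example below groups by the second level and by both levels together; by_id2 and by_both are illustrative names.

```python
import pandas as pd

mi = pd.MultiIndex.from_arrays([[1, 1, 2, 2], [3, 4, 3, 3]], names=['id1', 'id2'])
df = pd.DataFrame(dict(value=[4, 7, 2, 9]), index=mi)

# Group by the second index level only
by_id2 = df.groupby(level='id2').agg('sum')

# Group by both index levels; a list of level names or numbers is accepted
by_both = df.groupby(level=['id1', 'id2']).agg('sum')
print(by_id2)
print(by_both)
```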

7. Aggregation

Grouping is generally followed by an aggregation operation, which is performed with the agg method.

# Data preparation
df = pd.DataFrame({'A': [1, 1, 2, 2],'B': [3, 4, 3, 3],'C':[6,8,1,9],'D':[2,5,4,8]})
df


   A  B  C  D
0  1  3  6  2
1  1  4  8  5
2  2  3  1  4
3  2  3  9  8

8. Single function on multiple columns

The grouped DataFrame is aggregated with a single function. The function is applied to every column of each group and the results are combined into one DataFrame. The aggregation function is passed as a string.

# Sum all columns in groups
df.groupby('A').agg('sum')


   B   C   D
A           
1  7  14   7
2  6  10  12


You can select specific columns of the grouped data to aggregate. Note that the column subset must be wrapped in [], i.e. passed as a list.

# Sum the specified columns in groups
df.groupby('A')[['B','C']].agg('sum')


   B   C
A       
1  7  14
2  6  10
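The brackets also decide the type of the result. In the sketch below, selecting with a list returns a DataFrame even for one column, while selecting with a bare column name returns a Series. The names as_frame and as_series are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2, 2], 'B': [3, 4, 3, 3],
                   'C': [6, 8, 1, 9], 'D': [2, 5, 4, 8]})

# A list of column names yields a DataFrame
as_frame = df.groupby('A')[['B']].agg('sum')

# A single column name yields a Series
as_series = df.groupby('A')['B'].agg('sum')
print(type(as_frame).__name__, type(as_series).__name__)
```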


A custom anonymous function can also be passed as the aggregation function.

# Group and sum with an anonymous function
df.groupby('A').agg(lambda x:sum(x))


   B   C   D
A           
1  7  14   7
2  6  10  12
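A named function works the same way as a lambda and is handy when the calculation has no built-in string name. As a sketch, the hypothetical helper value_range below computes the spread (maximum minus minimum) of each column within a group.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2, 2], 'B': [3, 4, 3, 3],
                   'C': [6, 8, 1, 9], 'D': [2, 5, 4, 8]})

def value_range(col):
    # col is the Series of one column restricted to one group
    return col.max() - col.min()

spread = df.groupby('A').agg(value_range)
print(spread)
```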

9. Multiple functions on multiple columns

Several aggregation functions can be applied at once. Each function is applied to every column of each group and the results are combined into one DataFrame. The aggregation functions are passed as a list.

# All columns multi-function aggregation
df.groupby('A').agg(['sum','mean'])


    B        C        D     
  sum mean sum mean sum mean
A                           
1   7  3.5  14    7   7  3.5
2   6  3.0  10    5  12  6.0
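The result has two header rows because its columns form a two-level index. The sketch below shows one way to pick values out of such a result; c_stats and b_mean are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2, 2], 'B': [3, 4, 3, 3],
                   'C': [6, 8, 1, 9], 'D': [2, 5, 4, 8]})

result = df.groupby('A').agg(['sum', 'mean'])

# Selecting by the first level returns a DataFrame with columns sum and mean
c_stats = result['C']

# Selecting by a (column, function) tuple returns a single Series
b_mean = result[('B', 'mean')]
print(c_stats)
print(b_mean)
```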


The columns of the aggregated result have a two-level index: the first level holds the names of the aggregated columns and the second level holds the names of the aggregation functions used. To rename the function labels in the result, pass each function as a tuple whose first element is the new name and whose second element is the aggregation function.

# Aggregate function renaming
df.groupby('A').agg([('SUM','sum'),('MEAN','mean')])

    B        C        D     
  SUM MEAN SUM MEAN SUM MEAN
A                           
1   7  3.5  14    7   7  3.5
2   6  3.0  10    5  12  6.0


Similarly, anonymous functions can be passed in.

# Pass an anonymous function and give it a name
df.groupby('A').agg([('SUM','sum'),('MAX',lambda x:max(x))])


    B       C       D    
  SUM MAX SUM MAX SUM MAX
A                        
1   7   4  14   8   7   5
2   6   3  10   9  12   8


To apply different aggregations to different columns, pass a dictionary that maps each column name to its function or list of functions.

# Different aggregation functions for different columns
df.groupby('A').agg({'B':['sum','mean'],'C':'mean'})


    B         C
  sum mean mean
A              
1   7  3.5    7
2   6  3.0    5
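If single-level column names are easier to work with downstream, the two levels can be joined after the aggregation. This is one possible approach, sketched with an underscore as an arbitrary separator:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2, 2], 'B': [3, 4, 3, 3],
                   'C': [6, 8, 1, 9], 'D': [2, 5, 4, 8]})

result = df.groupby('A').agg({'B': ['sum', 'mean'], 'C': 'mean'})

# Each column label is a (column, function) tuple; join it into one string
result.columns = ['_'.join(col) for col in result.columns]
print(result)
```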


You can rename the columns of the aggregated result. Note that this is only valid when a single aggregation function is applied to each column.

# Rename column names after aggregation
df.groupby('A').agg(B_sum=('B','sum'),C_mean=('C','mean'))


   B_sum  C_mean
A               
1      7       7
2      6       5
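The (column, function) tuples can also be written with pd.NamedAgg, which spells out the meaning of each element. A minimal sketch that should be equivalent to the call above:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2, 2], 'B': [3, 4, 3, 3],
                   'C': [6, 8, 1, 9], 'D': [2, 5, 4, 8]})

# Each keyword is the output column name; NamedAgg names the source column
# and the aggregation function explicitly
result = df.groupby('A').agg(
    B_sum=pd.NamedAgg(column='B', aggfunc='sum'),
    C_mean=pd.NamedAgg(column='C', aggfunc='mean'),
)
print(result)
```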
