SoFunction
Updated on 2024-11-18

pandas group aggregation in detail

I. Preface

With grouping and iteration covered, the basic pandas series is nearly complete. The author has already used pandas to process real data and found it works quite well.

Knowledge Seeker (inheriting the spirit of open source, spreading technical knowledge)

II. Grouping

2.1 Data preparation

# -*- coding: utf-8 -*-

import pandas as pd
import numpy as np

frame = pd.DataFrame({
    'user': ['zszxz', 'craler', 'rose', 'zszxz', 'rose'],
    'hobby': ['reading', 'running', 'hiking', 'reading', 'hiking'],
    # Random values, so the numbers change on every run
    'price': np.random.randn(5),
    'number': np.random.randn(5)
})
print(frame)

Output

     user    hobby     price    number
0   zszxz  reading  0.275752 -0.075841
1  craler  running -1.410682  0.259869
2    rose   hiking -0.353269 -0.392659
3   zszxz  reading  1.484604  0.659274
4    rose   hiking -1.348315  2.492047
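
Because price and number are drawn at random, the figures printed in the later sections differ from run to run and from one listing to the next. If reproducible output is wanted, one option is a seeded generator. The sketch below assumes NumPy's default_rng; the seed 42 is an arbitrary choice and is not something the article itself uses.

```python
import numpy as np
import pandas as pd

# A seeded generator returns the same "random" numbers on every run.
rng = np.random.default_rng(42)  # 42 is an arbitrary, hypothetical seed

frame = pd.DataFrame({
    'user': ['zszxz', 'craler', 'rose', 'zszxz', 'rose'],
    'hobby': ['reading', 'running', 'hiking', 'reading', 'hiking'],
    'price': rng.standard_normal(5),
    'number': rng.standard_normal(5),
})
print(frame)
```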

2.2 Grouping to find the mean

Select the price column of the DataFrame, group it by the hobby column, and then take the mean of each group:

# groupby returns a SeriesGroupBy object; nothing is computed until an aggregation is called
group = frame['price'].groupby(frame['hobby'])
# Find the mean of each group
print(group.mean())

Output

hobby
hiking    -0.850792
reading    0.880178
running   -1.410682
Name: price, dtype: float64

Tip: read this as "group by hobby, then query price". The queried column must be numeric; otherwise computing the mean raises an exception.
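
To make the grouping step less abstract, here is a minimal sketch that computes the same per-hobby mean twice: once by hand (split, apply, combine) and once with groupby. The fixed price values are made up for illustration so the result is predictable; they are not the article's random data.

```python
import pandas as pd

frame = pd.DataFrame({
    'hobby': ['reading', 'running', 'hiking', 'reading', 'hiking'],
    'price': [1.0, 2.0, 3.0, 4.0, 5.0],  # illustrative fixed values
})

manual = {}
for hobby in frame['hobby'].unique():
    # Split: pick out the prices that belong to this hobby
    prices = frame.loc[frame['hobby'] == hobby, 'price']
    # Apply: reduce that piece to a single number
    manual[hobby] = prices.mean()

# groupby performs the split, apply, and combine steps in one call
grouped = frame['price'].groupby(frame['hobby']).mean()

print(manual)
print(grouped)
```

Both approaches give 4.0 for hiking, 2.5 for reading, and 2.0 for running; the groupby result is additionally sorted by group name.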

To group by several columns, pass them to groupby as a list and then call mean. The result is indexed by the grouping columns and holds the mean of each group:

group = frame['price'].groupby([frame['hobby'], frame['user']])
print(group.mean())

Output

hobby    user 
hiking   rose      0.063972
reading  zszxz     0.393164
running  craler   -1.395186
Name: price, dtype: float64
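
The result of a multi-column grouping carries a two-level index. The following sketch, using made-up fixed prices, shows how a single value can be looked up by its (hobby, user) pair and how unstack can pivot the inner level into columns. Passing column names to groupby, as done here, is assumed to be an acceptable shorthand for passing the column Series.

```python
import pandas as pd

frame = pd.DataFrame({
    'user': ['zszxz', 'craler', 'rose', 'zszxz', 'rose'],
    'hobby': ['reading', 'running', 'hiking', 'reading', 'hiking'],
    'price': [1.0, 2.0, 3.0, 4.0, 5.0],  # illustrative fixed values
})

means = frame.groupby(['hobby', 'user'])['price'].mean()

# The index has one level per grouping column
print(list(means.index.names))

# Look up one group with a tuple key
print(means.loc[('reading', 'zszxz')])

# Pivot the inner level (user) into columns; missing pairs become NaN
table = means.unstack()
print(table)
```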

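To group the entire DataFrame, call groupby on the frame itself; there is no need to select a column first:

group = frame.groupby(frame['hobby'])
# pandas 2.0 and later need numeric_only=True to skip the string column 'user'
print(group.mean(numeric_only=True))

Output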

            price    number
hobby                      
hiking  -0.116659 -0.316222
reading -0.651365  0.856299
running -0.282676 -0.585124

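Tip: only numeric columns can be averaged. Older pandas versions silently dropped non-numeric columns; since pandas 2.0 you must pass numeric_only=True (or select the numeric columns yourself), otherwise mean raises a TypeError.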
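
As a hedged illustration of the two ways to keep string columns out of the average, the sketch below compares numeric_only=True with an explicit column selection. The fixed values are invented for the example.

```python
import pandas as pd

frame = pd.DataFrame({
    'user': ['zszxz', 'craler', 'rose', 'zszxz', 'rose'],
    'hobby': ['reading', 'running', 'hiking', 'reading', 'hiking'],
    'price': [1.0, 2.0, 3.0, 4.0, 5.0],   # illustrative fixed values
    'number': [10, 20, 30, 40, 50],       # illustrative fixed values
})

# Option 1: let pandas skip every non-numeric column
numeric_means = frame.groupby('hobby').mean(numeric_only=True)

# Option 2: name the numeric columns explicitly
explicit = frame.groupby('hobby')[['price', 'number']].mean()

print(numeric_means)
print(explicit)
```

Naming the columns explicitly is slightly more verbose, but it documents which columns are being averaged and does not depend on the pandas version.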

2.3 Counting group sizes

Counting the rows in each group is one of the most common uses of grouping. The example below groups the DataFrame by hobby and calls size() to count the rows per group:

group = frame.groupby(frame['hobby'])
print(group.size())

Output

hobby
hiking     2
reading    2
running    1
dtype: int64
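
A related method, count(), is easy to confuse with size(). The sketch below, using made-up data with two missing prices, shows the difference: size() counts rows, while count() counts only non-missing values.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({
    'hobby': ['reading', 'running', 'hiking', 'reading', 'hiking'],
    'price': [1.0, np.nan, 3.0, 4.0, np.nan],  # illustrative values with gaps
})

grouped = frame.groupby('hobby')

# size(): number of rows in each group, missing values included
sizes = grouped.size()

# count(): number of non-missing price values in each group
counts = grouped['price'].count()

print(sizes)
print(counts)
```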

2.4 Iterating over groups

When groupby is given a single column (here, hobby), you can iterate over the result as (key, value) pairs, where key is the group name and value is the data in that group:

group = frame['price'].groupby(frame['hobby'])
for key, data in group:
    print(key)
    print(data)

Output

hiking
2   -0.669410
4   -0.246816
Name: price, dtype: float64
reading
0    1.362191
3   -0.052538
Name: price, dtype: float64
running
1    0.8963
Name: price, dtype: float64

When grouping by several columns, each key is a tuple with one element per grouping column. Unpack it into as many variables as there are columns; the variable names are arbitrary as long as they are distinct.

group = frame['price'].groupby([frame['hobby'], frame['user']])
for (key1, key2), data in group:
    print(key1, key2)
    print(data)

Output

hiking rose
2   -0.019423
4   -2.642912
Name: price, dtype: float64
reading zszxz
0    0.405016
3    0.422182
Name: price, dtype: float64
running craler
1   -0.724752
Name: price, dtype: float64
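
Iteration is mostly useful when each group needs custom handling. As a small sketch with invented fixed prices, the loop below collects a per-group total into a plain dictionary keyed by the (hobby, user) tuple.

```python
import pandas as pd

frame = pd.DataFrame({
    'user': ['zszxz', 'craler', 'rose', 'zszxz', 'rose'],
    'hobby': ['reading', 'running', 'hiking', 'reading', 'hiking'],
    'price': [1.0, 2.0, 3.0, 4.0, 5.0],  # illustrative fixed values
})

totals = {}
group = frame['price'].groupby([frame['hobby'], frame['user']])
for (hobby, user), prices in group:
    # prices is the Series of price values for this (hobby, user) pair
    totals[(hobby, user)] = prices.sum()

print(totals)
```

For a plain sum, calling group.sum() directly is shorter; the loop form pays off when the per-group logic does not fit a built-in aggregation.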

2.5 Converting groups to a dictionary

The grouped data can be converted to a dictionary that maps each group name to its DataFrame:

dic = dict(list(frame.groupby(frame['hobby'])))
print(dic)

Output

{'hiking':    user   hobby     price    number
2  rose  hiking  0.351633  0.523272
4  rose  hiking  0.800039  0.331646,
'reading':     user    hobby     price    number
0  zszxz  reading -0.074857 -0.928798
3  zszxz  reading  0.666925  0.606706,
'running':      user    hobby     price    number
1  craler  running -2.525633  0.895776}

Look up a group by its key:

print(dic['hiking'])

Output

   user   hobby     price    number
2  rose  hiking  0.382225 -0.242055
4  rose  hiking  1.055785 -0.328943
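
When only one group is needed, building the whole dictionary may be unnecessary. The sketch below, with invented fixed prices, uses get_group and the groups attribute of the groupby object as alternatives.

```python
import pandas as pd

frame = pd.DataFrame({
    'user': ['zszxz', 'craler', 'rose', 'zszxz', 'rose'],
    'hobby': ['reading', 'running', 'hiking', 'reading', 'hiking'],
    'price': [1.0, 2.0, 3.0, 4.0, 5.0],  # illustrative fixed values
})

grouped = frame.groupby('hobby')

# Fetch a single group without materialising all of them
hiking = grouped.get_group('hiking')

# The dictionary approach returns the same rows
by_dict = dict(list(grouped))['hiking']

# groups maps each group name to the row labels it contains
hiking_labels = list(grouped.groups['hiking'])

print(hiking)
print(hiking_labels)
```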

2.6 Selecting values from groups

Group frame by hobby and then take the mean of the price column. The result is a Series:

mean = frame.groupby('hobby')['price'].mean()
print(type(mean))
print(mean)

Output

<class 'pandas.core.series.Series'>
hobby
hiking     0.973211
reading   -1.393790
running   -0.286236
Name: price, dtype: float64

Tip: frame.groupby('hobby')['price'] is equivalent to frame['price'].groupby(frame['hobby']).

To get a DataFrame back instead, select the column with a list:

mean = frame.groupby('hobby')[['price']].mean()
print(type(mean))
print(mean)

Output

<class 'pandas.core.frame.DataFrame'>
            price
hobby           
hiking   0.973211
reading -1.393790
running -0.286236
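
The difference between single and double brackets can be checked directly. The sketch below uses invented fixed prices; the as_index=False variant is an extra illustration, not part of the original article, for cases where hobby should stay an ordinary column.

```python
import pandas as pd

frame = pd.DataFrame({
    'hobby': ['reading', 'running', 'hiking', 'reading', 'hiking'],
    'price': [1.0, 2.0, 3.0, 4.0, 5.0],  # illustrative fixed values
})

# Single brackets select one column, so the result is a Series
as_series = frame.groupby('hobby')['price'].mean()

# A list of columns keeps the result two-dimensional
as_frame = frame.groupby('hobby')[['price']].mean()

# as_index=False keeps hobby as a regular column instead of the index
flat = frame.groupby('hobby', as_index=False)['price'].mean()

print(type(as_series).__name__)
print(type(as_frame).__name__)
print(flat)
```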

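2.7 Series as a Grouping

You can also pass a Series as the grouping key for the DataFrame. The Series is aligned with the frame by index, so rows without a matching label are left out:

ser = pd.Series(['hiking', 'reading', 'running'])
# numeric_only=True skips the string columns on pandas 2.0 and later
data = frame.groupby(ser).mean(numeric_only=True)
print(data)

Output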

            price    number
hiking   1.233396  0.313839
reading -0.298887  0.982853
running -0.797734 -1.230811

Tip: the grouping key is essentially an array of labels. Besides a Series, you can group by a dictionary, a list, an array, or a function.
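
A short sketch of three of these key types follows, using an invented five-row frame. The group names ('a', 'b', 'even', 'odd', 'x', 'y') are hypothetical labels chosen for the example.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({'price': [1.0, 2.0, 3.0, 4.0, 5.0]})  # illustrative values

# Dictionary: maps each index label to a group name
by_dict = frame.groupby({0: 'a', 1: 'b', 2: 'a', 3: 'b', 4: 'a'})['price'].sum()

# Function: called once per index label; its return value is the group name
by_func = frame.groupby(lambda label: 'even' if label % 2 == 0 else 'odd')['price'].sum()

# Array: one group label per row, in row order
by_array = frame.groupby(np.array(['x', 'y', 'x', 'y', 'x']))['price'].sum()

print(by_dict)
print(by_func)
print(by_array)
```

All three keys split the rows the same way here (rows 0, 2, 4 versus rows 1, 3), so each result contains the totals 9.0 and 6.0 under different group names.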

2.8 Grouping by Index Level

For a hierarchical index, you can group by passing the name of a level:

# Build a two-level column index and name the levels
columns = pd.MultiIndex.from_arrays([['Python', 'Java', 'Python', 'Java', 'Python'],
                                     ['a', 'b', 'a', 'b', 'c']], names=['language', 'alpha'])
frame = pd.DataFrame(np.random.randint(1, 10, (5, 5)), columns=columns)
print(frame)

# Group the columns by language. groupby(axis=1) is deprecated since pandas 2.1,
# so transpose, group the rows, and transpose back
print(frame.T.groupby(level='language').sum().T)
# Group the columns by alpha
print(frame.T.groupby(level='alpha').sum().T)

The frame output is as follows

language Python Java Python Java Python
alpha         a    b      a    b      c
0             9    9      7    4      5
1             3    4      7    6      6
2             6    6      3    9      1
3             1    1      8    5      2
4             6    5      9    5      4

The language groupings are as follows

language  Java  Python
0           13      21
1           10      16
2           15      10
3            6      11
4           10      19

The alpha groupings are as follows

alpha   a   b  c
0      16  13  5
1      10  10  6
2       9  15  1
3       9   6  2
4      15  10  4
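
The level argument works the same way when the hierarchy sits on the row index rather than the columns. The sketch below assumes a small Series with invented values and the same level names as above.

```python
import pandas as pd

# A two-level row index with named levels
index = pd.MultiIndex.from_arrays(
    [['Python', 'Java', 'Python', 'Java'], ['a', 'b', 'a', 'b']],
    names=['language', 'alpha'],
)
scores = pd.Series([1, 2, 3, 4], index=index)  # illustrative fixed values

# Sum the values that share the same label on each level
by_language = scores.groupby(level='language').sum()
by_alpha = scores.groupby(level='alpha').sum()

print(by_language)
print(by_alpha)
```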
