I. Preface
Once you have learned grouping and iteration in pandas, you have covered most of the basic pandas series. Knowledge Seeker has already used pandas to process some real data, and it works quite well.
Knowledge Seeker (inheriting the spirit of open source, spreading technical knowledge)
II. Grouping
2.1 Data preparation
# -*- coding: utf-8 -*-
import pandas as pd
import numpy as np

# price and number are filled with 5 random normal values each
frame = pd.DataFrame({
    'user': ['zszxz', 'craler', 'rose', 'zszxz', 'rose'],
    'hobby': ['reading', 'running', 'hiking', 'reading', 'hiking'],
    'price': np.random.randn(5),
    'number': np.random.randn(5)
})
print(frame)
Output:
user hobby price number
0 zszxz reading 0.275752 -0.075841
1 craler running -1.410682 0.259869
2 rose hiking -0.353269 -0.392659
3 zszxz reading 1.484604 0.659274
4 rose hiking -1.348315 2.492047
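Because price and number are random, the numbers differ on every run, which is why the outputs in the following sections do not match each other. A minimal sketch of a reproducible variant, assuming a seeded NumPy generator (the seed 42 and the name `seeded` are arbitrary choices for illustration, not part of the original example):

```python
import numpy as np
import pandas as pd

# Assumption: a fixed seed so every run produces the same random values
rng = np.random.default_rng(42)

seeded = pd.DataFrame({
    'user': ['zszxz', 'craler', 'rose', 'zszxz', 'rose'],
    'hobby': ['reading', 'running', 'hiking', 'reading', 'hiking'],
    'price': rng.standard_normal(5),
    'number': rng.standard_normal(5),
})
print(seeded)
```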
2.2 Grouping to find the mean
Extract the price column from the DataFrame, group it by the hobby column, and then compute the mean of each group:
# groupby returns a lazy SeriesGroupBy object; nothing is computed yet
group = frame['price'].groupby(frame['hobby'])
# Find the mean of each group
print(group.mean())
Output:
hobby
hiking -0.850792
reading 0.880178
running -1.410682
Name: price, dtype: float64
Tip: read this as "group by hobby, then query price". The column being averaged must be numeric; otherwise mean() raises an exception.
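To make the arithmetic easy to check by hand, here is a small sketch with fixed prices instead of random ones (the values and the names `demo` and `price_mean` are assumptions chosen for illustration):

```python
import pandas as pd

# Fixed values (illustrative) so the means can be verified by hand
demo = pd.DataFrame({
    'hobby': ['reading', 'running', 'hiking', 'reading', 'hiking'],
    'price': [1.0, 2.0, 3.0, 5.0, 7.0],
})

# Group the price column by hobby and average each group
price_mean = demo['price'].groupby(demo['hobby']).mean()
print(price_mean)
# hiking:  (3.0 + 7.0) / 2 = 5.0
# reading: (1.0 + 5.0) / 2 = 3.0
# running: 2.0
```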
To group by several columns, pass a list of keys to groupby and then call mean(); the result is indexed by the grouping columns and holds the mean for each combination:
group = frame['price'].groupby([frame['hobby'], frame['user']])
print(group.mean())
Output:
hobby user
hiking rose 0.063972
reading zszxz 0.393164
running craler -1.395186
Name: price, dtype: float64
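The result of a multi-key grouping carries a two-level index. A small sketch with fixed prices (illustrative values; `demo` and `by_two` are made-up names) shows how to read one entry back out:

```python
import pandas as pd

demo = pd.DataFrame({
    'user': ['zszxz', 'craler', 'rose', 'zszxz', 'rose'],
    'hobby': ['reading', 'running', 'hiking', 'reading', 'hiking'],
    'price': [1.0, 2.0, 3.0, 5.0, 7.0],
})

# Two grouping keys produce a result with a (hobby, user) MultiIndex
by_two = demo['price'].groupby([demo['hobby'], demo['user']]).mean()

print(list(by_two.index.names))
# Select one combination with a tuple of (hobby, user)
print(by_two.loc[('hiking', 'rose')])
```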
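When grouping the entire DataFrame, there is no need to extract a column first:
group = frame.groupby(frame['hobby'])
# pandas 2.0+ needs numeric_only=True to skip the non-numeric user column
print(group.mean(numeric_only=True))
Output:
price number
hobby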
hiking -0.116659 -0.316222
reading -0.651365 0.856299
running -0.282676 -0.585124
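Tip: only numeric columns can be averaged. Older pandas versions silently ignored non-numeric columns; since pandas 2.0 you must pass numeric_only=True (or select the numeric columns yourself), otherwise mean() raises a TypeError.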
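A minimal sketch of the two ways to average only the numeric columns on pandas 2.0 and later, using fixed illustrative values (the names `demo`, `skipped`, and `selected` are assumptions):

```python
import pandas as pd

demo = pd.DataFrame({
    'user': ['zszxz', 'craler', 'rose', 'zszxz', 'rose'],
    'hobby': ['reading', 'running', 'hiking', 'reading', 'hiking'],
    'price': [1.0, 2.0, 3.0, 5.0, 7.0],
    'number': [10.0, 20.0, 30.0, 50.0, 70.0],
})

# Option 1: let mean() skip the non-numeric user column
skipped = demo.groupby('hobby').mean(numeric_only=True)

# Option 2: select the numeric columns explicitly before aggregating
selected = demo.groupby('hobby')[['price', 'number']].mean()

print(skipped)
```

Selecting the columns explicitly makes the intent visible in the code and does not depend on how a given pandas version treats non-numeric columns.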
2.3 Counting group sizes
Counting the rows in each group is one of the most common uses of grouping. The following example groups the DataFrame by hobby and calls size() to get the number of rows per group:
group = frame.groupby(frame['hobby'])
print(group.size())
Output:
hobby
hiking 2
reading 2
running 1
dtype: int64
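size() counts every row in a group, including rows with missing values, whereas count() counts only the non-missing entries of each column. A short sketch with one missing price (the data and the names `demo`, `sizes`, and `counts` are illustrative assumptions):

```python
import numpy as np
import pandas as pd

demo = pd.DataFrame({
    'hobby': ['reading', 'running', 'hiking', 'reading', 'hiking'],
    'price': [1.0, 2.0, np.nan, 5.0, 7.0],  # one hiking price is missing
})

grouped = demo.groupby('hobby')

sizes = grouped.size()             # rows per group, NaN included
counts = grouped['price'].count()  # non-missing prices per group

print(sizes)
print(counts)
```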
2.4 Iterating over groups
When grouping by a single column (here, hobby), you can iterate over the grouped data as (key, value) pairs, where key is the group name and value is the data in that group:
group = frame['price'].groupby(frame['hobby'])
for key, data in group:
    print(key)
    print(data)
Output:
hiking
2 -0.669410
4 -0.246816
Name: price, dtype: float64
reading
0 1.362191
3 -0.052538
Name: price, dtype: float64
running
1 0.8963
Name: price, dtype: float64
When grouping by multiple columns, each key is a tuple with one element per grouping column; unpack it into as many variable names as there are columns (any distinct names will do):
group = frame['price'].groupby([frame['hobby'], frame['user']])
for (key1, key2), data in group:
    print(key1, key2)
    print(data)
Output:
hiking rose
2 -0.019423
4 -2.642912
Name: price, dtype: float64
reading zszxz
0 0.405016
3 0.422182
Name: price, dtype: float64
running craler
1 -0.724752
Name: price, dtype: float64
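The same loop can collect results instead of printing them. A sketch with fixed prices (illustrative values; `demo` and `seen` are made-up names) that records each key pair together with the prices in that group:

```python
import pandas as pd

demo = pd.DataFrame({
    'user': ['zszxz', 'craler', 'rose', 'zszxz', 'rose'],
    'hobby': ['reading', 'running', 'hiking', 'reading', 'hiking'],
    'price': [1.0, 2.0, 3.0, 5.0, 7.0],
})

seen = []
# Each key is a (hobby, user) tuple; each value is the matching price Series
for (hobby, user), prices in demo['price'].groupby([demo['hobby'], demo['user']]):
    seen.append((hobby, user, prices.tolist()))

for row in seen:
    print(row)
```

Groups are visited in sorted key order, because groupby sorts the group keys by default.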
2.5 Converting groups to a dictionary
The grouped data can be converted to a dictionary that maps each group name to its sub-DataFrame:
dic = dict(list(frame.groupby(frame['hobby'])))
print(dic)
Output:
{'hiking': user hobby price number
2 rose hiking 0.351633 0.523272
4 rose hiking 0.800039 0.331646,
'reading': user hobby price number
0 zszxz reading -0.074857 -0.928798
3 zszxz reading 0.666925 0.606706,
'running': user hobby price number
1 craler running -2.525633 0.895776}
Look up a single group by its key:
print(dic['hiking'])
Output:
user hobby price number
2 rose hiking 0.382225 -0.242055
4 rose hiking 1.055785 -0.328943
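If only one group is needed, building the whole dictionary is optional: the GroupBy object also offers get_group. A sketch comparing the two, with fixed illustrative data (`demo`, `pieces`, and `hiking` are assumed names):

```python
import pandas as pd

demo = pd.DataFrame({
    'user': ['zszxz', 'craler', 'rose', 'zszxz', 'rose'],
    'hobby': ['reading', 'running', 'hiking', 'reading', 'hiking'],
    'price': [1.0, 2.0, 3.0, 5.0, 7.0],
})

grouped = demo.groupby('hobby')

# Materialize every group as a dict of sub-DataFrames
pieces = dict(list(grouped))

# Fetch a single group directly, without building the dict
hiking = grouped.get_group('hiking')

print(sorted(pieces))
print(hiking)
```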
2.6 Selecting columns from a grouping
Group frame by hobby, then select the price column and compute its mean; the result is a Series:
mean = frame.groupby('hobby')['price'].mean()
print(type(mean))
print(mean)
Output:
<class 'pandas.core.series.Series'>
hobby
hiking 0.973211
reading -1.393790
running -0.286236
Name: price, dtype: float64
Tip: frame.groupby('hobby')['price'] is equivalent to frame['price'].groupby(frame['hobby']).
To get a DataFrame back instead, select the column with a list:
mean = frame.groupby('hobby')[['price']].mean()
print(type(mean))
print(mean)
Output:
<class 'pandas.core.frame.DataFrame'>
price
hobby
hiking 0.973211
reading -1.393790
running -0.286236
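The difference between the two selections comes down to the brackets: a single label yields a Series, a list of labels yields a DataFrame. A compact sketch with fixed illustrative values (`demo`, `as_series`, and `as_frame` are assumed names):

```python
import pandas as pd

demo = pd.DataFrame({
    'hobby': ['reading', 'running', 'hiking', 'reading', 'hiking'],
    'price': [1.0, 2.0, 3.0, 5.0, 7.0],
})

as_series = demo.groupby('hobby')['price'].mean()   # single label
as_frame = demo.groupby('hobby')[['price']].mean()  # list of labels

print(type(as_series).__name__)
print(type(as_frame).__name__)
```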
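2.7 Using a Series as the grouping key
You can also pass a Series as the grouping key for a DataFrame. The Series is aligned with the DataFrame by index, so here rows 0, 1 and 2 each get a label, and rows 3 and 4, which have no label, are dropped:
ser = pd.Series(['hiking', 'reading', 'running'])
# pandas 2.0+ needs numeric_only=True to skip the non-numeric columns
data = frame.groupby(ser).mean(numeric_only=True)
print(data)
Output: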
price number
hiking 1.233396 0.313839
reading -0.298887 0.982853
running -0.797734 -1.230811
Tip: grouping keys are essentially arrays of labels. Besides a Series, you can use dictionaries, lists, arrays, and functions as grouping keys.
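A sketch of three of the other key types, using a small frame indexed by name (all data, labels, and variable names here are assumptions for illustration). An array supplies one label per row by position, a dictionary maps index labels to group names, and a function is called on each index label:

```python
import numpy as np
import pandas as pd

scores = pd.DataFrame(
    {'price': [1.0, 2.0, 3.0, 4.0]},
    index=['ann', 'bob', 'cy', 'dee'],
)

# Array: one group label per row, matched by position
by_array = scores.groupby(np.array(['x', 'y', 'x', 'y'])).sum()

# Dictionary: maps each index label to a group name
teams = {'ann': 'team1', 'bob': 'team1', 'cy': 'team2', 'dee': 'team2'}
by_dict = scores.groupby(teams).sum()

# Function: called on each index label; here, the length of the name
by_func = scores.groupby(len).sum()

print(by_array)
print(by_dict)
print(by_func)
```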
2.8 Grouping by index level
For a hierarchical index, you can group by passing the name of a level:
# Create a two-level column index and name the levels
columns = pd.MultiIndex.from_arrays(
    [['Python', 'Java', 'Python', 'Java', 'Python'],
     ['a', 'b', 'a', 'b', 'c']],
    names=['language', 'alpha'])
frame = pd.DataFrame(np.random.randint(1, 10, (5, 5)), columns=columns)
print(frame)
# Older pandas accepted groupby(level=..., axis=1); that form is deprecated,
# so transpose, group the rows by level, and transpose back
# Group by the language level
print(frame.T.groupby(level='language').sum().T)
# Group by the alpha level
print(frame.T.groupby(level='alpha').sum().T)
The frame prints as follows:
language Python Java Python Java Python
alpha a b a b c
0 9 9 7 4 5
1 3 4 7 6 6
2 6 6 3 9 1
3 1 1 8 5 2
4 6 5 9 5 4
The language groupings are as follows
language Java Python
0 13 21
1 10 16
2 15 10
3 6 11
4 10 19
The alpha groupings are as follows
alpha a b c
0 16 13 5
1 10 10 6
2 9 15 1
3 9 6 2
4 15 10 4
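The same column-level grouping can be checked by hand on a smaller frame with fixed values. This sketch assumes the transpose approach, which avoids the deprecated axis=1 argument; the data and the names `wide` and `by_language` are illustrative:

```python
import pandas as pd

columns = pd.MultiIndex.from_arrays(
    [['Python', 'Java', 'Python'], ['a', 'b', 'a']],
    names=['language', 'alpha'],
)
wide = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=columns)

# Transpose so the column levels become row levels, group, then transpose back
by_language = wide.T.groupby(level='language').sum().T

print(by_language)
# Python: 1 + 3 = 4 and 4 + 6 = 10; Java: 2 and 5
```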