PyCharm tip: hover the mouse over a function and press Ctrl+Q for a quick look at its documentation, or Ctrl+P to see its parameters.
apply(), applymap() and map()
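apply() and applymap() are DataFrame methods, while map() is a Series method. (In pandas 2.1 and later, DataFrame.applymap() has been renamed DataFrame.map(); the old name still works but emits a deprecation warning.)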
apply() operates on one row or column of a DataFrame at a time, applymap() operates on every element of a DataFrame, and map() operates on every element of a Series.
Use apply() to batch-process the contents of a DataFrame; this is more concise than, and typically faster than, an explicit Python loop. Its signature is apply(func, axis=0, ...), where func is the function to apply: axis=0 applies func to each column, and axis=1 applies it to each row.
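To make the axis argument concrete, here is a minimal sketch on a small made-up DataFrame (the columns a and b are hypothetical): with axis=0 the function receives one column at a time, with axis=1 one row at a time.

```python
import pandas as pd

# Hypothetical DataFrame used only for illustration
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# axis=0 (the default): func receives each column as a Series
col_range = df.apply(lambda col: col.max() - col.min(), axis=0)

# axis=1: func receives each row as a Series
row_sum = df.apply(lambda row: row["a"] + row["b"], axis=1)

print(col_range)  # a: 2, b: 20
print(row_sum)    # 11, 22, 33
```

Note that the result is indexed by column labels in the first case and by row labels in the second.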
Series.map() behaves like Python's built-in map(): it applies a function to every element, e.g. df['one'].map(sqrt).
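A minimal sketch of Series.map(), assuming the sqrt in the example is math.sqrt and using a hypothetical column named one. Besides a function, map() also accepts a dict (or Series) as a lookup table; values with no matching key become NaN.

```python
import math
import pandas as pd

# Hypothetical DataFrame with a column named 'one'
df = pd.DataFrame({"one": [1.0, 4.0, 9.0]})

# Function argument: applied to every element, like the built-in map()
roots = df["one"].map(math.sqrt)

# Dict argument: used as a lookup table; 9.0 has no key, so it becomes NaN
labels = df["one"].map({1.0: "one", 4.0: "four"})

print(roots.tolist())   # [1.0, 2.0, 3.0]
print(labels.tolist())  # ['one', 'four', nan]
```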
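import numpy as np
from pandas import Series, DataFrame

frame = DataFrame(np.random.randn(4, 3), columns=list('bde'),
                  index=['Utah', 'Ohio', 'Texas', 'Oregon'])
print(frame)
print(np.abs(frame))           # NumPy ufuncs also work element-wise on a DataFrame

f = lambda x: x.max() - x.min()
print(frame.apply(f))          # one result per column (axis=0)
print(frame.apply(f, axis=1))  # one result per row

def f(x):
    # Returning a Series gives one row per returned label ('min', 'max')
    return Series([x.min(), x.max()], index=['min', 'max'])
print(frame.apply(f))

print('applymap and map')
_format = lambda x: '%.2f' % x
print(frame.applymap(_format))  # every element of the DataFrame (DataFrame.map() in pandas >= 2.1)
print(frame['e'].map(_format))  # every element of a single column (a Series)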
Groupby
groupby() is the most commonly used and most effective grouping method in pandas. It is usually combined with sum(), count(), mean(), and other statistical functions.
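A minimal sketch of groupby() followed by the statistical functions named above. Deterministic integers stand in for the random data used elsewhere in this post, so the results can be checked by hand.

```python
import pandas as pd

# Hypothetical data: integers instead of random numbers
df = pd.DataFrame({
    "key1": ["a", "a", "b", "b", "a"],
    "data1": [1, 2, 3, 4, 5],
})

grouped = df.groupby("key1")["data1"]
print(grouped.sum())    # a: 8, b: 7
print(grouped.count())  # a: 3, b: 2
print(grouped.mean())   # a: 2.666667, b: 3.5
```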
The DataFrameGroupBy object returned by groupby() does not actually hold the grouped data; it only records intermediate information about how df['key1'] divides the rows. Only when you apply a function or an aggregation to the grouped data does pandas split df into chunks according to that information and return the result.
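The following sketch peeks at what the GroupBy object records before any aggregation is requested. It uses a small hypothetical DataFrame; ngroups, groups, and iteration over the object are standard GroupBy features, and iterating shows the chunks pandas works with when it finally computes a result.

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({
    "key1": ["a", "a", "b", "b", "a"],
    "data1": [1, 2, 3, 4, 5],
})

grouped = df.groupby(df["key1"])

# No aggregation has been computed yet; the object knows how rows map to groups
print(type(grouped).__name__)  # DataFrameGroupBy
print(grouped.ngroups)         # 2
print(grouped.groups)          # group label -> row index labels

# Iterating yields (group label, sub-DataFrame) pairs
for name, chunk in grouped:
    print(name, chunk["data1"].tolist())
```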
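df = DataFrame({'key1': ['a', 'a', 'b', 'b', 'a'],
                'key2': ['one', 'two', 'one', 'two', 'one'],
                'data1': np.random.randn(5),
                'data2': np.random.randn(5)})
grouped = df.groupby(df['key1'])
# Select the numeric columns: in recent pandas, mean() on the string column key2 raises an error
print(grouped[['data1', 'data2']].mean())
# Grouping by a function: it is called on each index label (0-4 here)
print(df.groupby(lambda x: 'even' if x % 2 == 0 else 'odd')[['data1', 'data2']].mean())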
Aggregation: agg()
For one or more columns of the grouped data, calling agg(func) applies func to each group. For example, grouped['data1'].agg('mean') computes the mean of the data1 column within each group. You can also aggregate several columns at once and use several functions at the same time.
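A minimal sketch of both cases described above, on hypothetical integer data: one column with one function, then several columns with several functions (which yields two-level column labels).

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({
    "key1": ["a", "a", "b", "b", "a"],
    "data1": [1, 2, 3, 4, 5],
    "data2": [10, 20, 30, 40, 50],
})
grouped = df.groupby("key1")

# One column, one function: mean of data1 within each group
mean_data1 = grouped["data1"].agg("mean")
print(mean_data1)

# Several columns and several functions at once
stats = grouped[["data1", "data2"]].agg(["min", "max"])
print(stats)
```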
df = DataFrame({'key1': ['a', 'a', 'b', 'b', 'a'],
                'key2': ['one', 'two', 'one', 'two', 'one'],
                'data1': np.random.randn(5),
                'data2': np.random.randn(5)})
grouped = df.groupby('key1')
print(grouped[['data1', 'data2']].agg('mean'))

         data1     data2
key1
a     0.749117  0.220249
b    -0.567971 -0.126922
apply() and agg() are functionally similar. apply() is often used to fill missing data differently in each group or to compute the top N rows of each group, and its result may carry a hierarchical index.
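Both uses of apply() mentioned above can be sketched on hypothetical data with one missing value per group. The fill step passes group_keys=False so the filled values keep their original row labels; the top-N step keeps the default, so its result carries a (key1, original row) hierarchical index.

```python
import numpy as np
import pandas as pd

# Hypothetical data: two groups, each with one missing value
df = pd.DataFrame({
    "key1": ["a", "a", "a", "b", "b", "b"],
    "data1": [1.0, np.nan, 7.0, 3.0, np.nan, 9.0],
})

# 1) Per-group fill: each NaN gets the mean of its own group
filled = (
    df.groupby("key1", group_keys=False)["data1"]
    .apply(lambda s: s.fillna(s.mean()))
    .sort_index()
)
df["filled"] = filled

# 2) Top N per group: the two largest values in each group
top2 = df.groupby("key1")["filled"].apply(lambda s: s.nlargest(2))

print(filled.tolist())  # [1.0, 4.0, 7.0, 3.0, 6.0, 9.0]
print(top2)             # two-level index: (key1, original row label)
```

For the fill case alone, groupby(...).transform() is a common alternative that always returns a result aligned with the original rows.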
agg(), by contrast, can take several functions at once, and can apply different functions to different columns.
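One way to apply different functions to different columns is to pass agg() a dict mapping each column name to a function name. A minimal sketch on hypothetical integer data:

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({
    "key1": ["a", "a", "b", "b", "a"],
    "data1": [1, 2, 3, 4, 5],
    "data2": [10, 20, 30, 40, 50],
})

# Column -> function: sum of data1 and max of data2, per group
result = df.groupby("key1").agg({"data1": "sum", "data2": "max"})
print(result)
#       data1  data2
# key1
# a         8     50
# b         7     40
```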
df = DataFrame({'key1': ['a', 'a', 'b', 'b', 'a'],
                'key2': ['one', 'two', 'one', 'two', 'one'],
                'data1': np.random.randn(5),
                'data2': np.random.randn(5)})
grouped = df.groupby('key1')
print(grouped[['data1', 'data2']].agg(['sum', 'mean']))
# apply() works the same way here, except that it takes only one function;
# for a simple aggregation like this the two are interchangeable
print(grouped[['data1', 'data2']].apply(lambda g: g.sum()))

         data1               data2
           sum      mean       sum      mean
key1
a     2.780273  0.926758 -1.561696 -0.520565
b    -0.308320 -0.154160 -1.382162 -0.691081

         data1     data2
key1
a     2.780273 -1.561696
b    -0.308320 -1.382162
apply() and agg() do broadly similar work, but agg() is more convenient when you need several functions at once.
apply() itself is very flexible: it is useful when grouping is not immediately followed by an aggregation, for example when you just want to look at some summary observations for each group.
print(grouped[['data1', 'data2']].apply(lambda x: x.describe()))

               data1     data2
key1
a    count  3.000000  3.000000
     mean  -0.887893 -1.042878
     std    0.777515  1.551220
     min   -1.429440 -2.277311
     25%   -1.333350 -1.913495
     50%   -1.237260 -1.549679
     75%   -0.617119 -0.425661
     max    0.003021  0.698357
b    count  2.000000  2.000000
     mean  -0.078983  0.106752
     std    0.723929  0.064191
     min   -0.590879  0.061362
     25%   -0.334931  0.084057
     50%   -0.078983  0.106752
     75%    0.176964  0.129447
     max    0.432912  0.152142
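In addition, apply() can change the dimension of the returned data: in the describe() example, each group expands into eight rows of statistics under a two-level row index.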
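The shape of apply()'s result follows what the function returns. A minimal sketch on hypothetical integer data: a scalar per group gives one value per group, while a Series per group (here from describe()) gives a result stacked under a two-level index.

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({
    "key1": ["a", "a", "b", "b", "a"],
    "data1": [1, 2, 3, 4, 5],
})
grouped = df.groupby("key1")["data1"]

# Function returns a scalar -> one value per group, indexed by key1
per_group = grouped.apply(lambda s: s.max() - s.min())

# Function returns a Series -> results stacked under a (key1, statistic) index
summary = grouped.apply(lambda s: s.describe())

print(per_group)  # a: 4, b: 1
print(summary)    # 2 groups x 8 statistics = 16 rows
```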
/pandas-docs/stable/
pandas also provides pivot tables (pivot_table) and cross-tabulations (crosstab), but I haven't used them.
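For completeness, here is a minimal sketch of both, on hypothetical data reusing the key1/key2 layout from above. pivot_table aggregates a value column over a grid of row and column keys; crosstab counts how often each key combination occurs.

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({
    "key1": ["a", "a", "b", "b", "a"],
    "key2": ["one", "two", "one", "two", "one"],
    "data1": [1, 2, 3, 4, 5],
})

# Rows from key1, columns from key2, cells hold the mean of data1
table = df.pivot_table(values="data1", index="key1", columns="key2", aggfunc="mean")

# Frequency of each (key1, key2) combination
counts = pd.crosstab(df["key1"], df["key2"])

print(table)
print(counts)
```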