SoFunction
Updated on 2024-11-15

A detailed explanation of grouped counting in Python pandas

First, the scenario this article addresses: we have a dataset df and want to count how many times each element occurs in a given column. Our earlier article, "how to draw a histogram", already covered this: value_counts() does the job (refer back to that article).
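As a quick reminder of the single-column case, here is a minimal sketch using value_counts(). The small DataFrame is illustrative sample data assumed for this sketch, not data taken from the earlier article.

```python
import pandas as pd

# Illustrative sample data (an assumption for this sketch)
df = pd.DataFrame(
    [[1, 2, 2], [1, 4, 5], [1, 2, 4], [1, 6, 3],
     [2, 3, 1], [2, 4, 1], [2, 3, 5], [3, 1, 1]],
    columns=['A', 'B', 'C'],
)

# Count how many times each distinct value appears in column A
counts_a = df['A'].value_counts()
print(counts_a)
```

The result is a Series indexed by the distinct values of column A, ordered from the most frequent value to the least frequent.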

Now consider another scenario: what if we want to count how many times each combination of values in two columns occurs? For example:


In the df dataset, suppose we want to count the occurrences of each combination of values in columns A and B, that is, obtain a table with one row per (A, B) pair together with its count.


The last column of the result holds the counts: in columns A and B, the pair 1 2 occurs twice, 1 4 occurs once, 1 6 occurs once, 2 3 occurs twice, 2 4 occurs once, and 3 1 occurs once.

The code that implements this:

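import pandas as pd

# Build the sample dataset with three columns
df = pd.DataFrame([[1,2,2],[1,4,5],[1,2,4],[1,6,3],[2,3,1],[2,4,1],[2,3,5],[3,1,1]], columns=['A','B','C'])
# Group by columns A and B, then count the rows in each group
gp = df.groupby(by=['A','B'])
gp.size()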
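To check the result against the counts listed above, the following sketch converts the grouped sizes to a plain dictionary. It also shows DataFrame.value_counts() with a list of columns as an alternative; that method is assumed to be available, which requires a reasonably recent pandas (1.1 or later).

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 2], [1, 4, 5], [1, 2, 4], [1, 6, 3],
     [2, 3, 1], [2, 4, 1], [2, 3, 5], [3, 1, 1]],
    columns=['A', 'B', 'C'],
)

# Number of rows for each (A, B) pair, as a dict keyed by (A, B) tuples
sizes = df.groupby(by=['A', 'B']).size()
pair_counts = sizes.to_dict()
print(pair_counts)

# Alternative (assumes pandas >= 1.1): count (A, B) pairs directly.
# The counts are the same, but rows come back ordered by frequency.
alt_counts = df.value_counts(['A', 'B']).to_dict()
print(alt_counts == pair_counts)
```

groupby().size() keeps the groups in key order, while value_counts() sorts by count, so the choice mostly depends on which ordering is more convenient.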

So, to count over more columns, just add their names to the by parameter of groupby(). For example, to count over 3 columns:

gp = df.groupby(by=['A','B','C'])
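A minimal sketch of the three-column case, using the same sample values as the two-column example. Note that in this particular data every (A, B, C) combination is unique, so each group has size 1.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 2], [1, 4, 5], [1, 2, 4], [1, 6, 3],
     [2, 3, 1], [2, 4, 1], [2, 3, 5], [3, 1, 1]],
    columns=['A', 'B', 'C'],
)

# Group by all three columns and count the rows in each group
sizes3 = df.groupby(by=['A', 'B', 'C']).size()
print(sizes3)

# Every row is its own group here, so there are as many groups as rows
print(len(sizes3) == len(df))
```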

size() returns a Series with a MultiIndex.
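The sketch below inspects that result to show what a MultiIndex Series means in practice: the grouping columns become index levels, and a count can be looked up by an (A, B) tuple. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 2], [1, 4, 5], [1, 2, 4], [1, 6, 3],
     [2, 3, 1], [2, 4, 1], [2, 3, 5], [3, 1, 1]],
    columns=['A', 'B', 'C'],
)

sizes = df.groupby(by=['A', 'B']).size()

# The result is a Series whose index has one level per grouping column
is_series = isinstance(sizes, pd.Series)
is_multi = isinstance(sizes.index, pd.MultiIndex)
level_names = list(sizes.index.names)
print(is_series, is_multi, level_names)

# Look up the count for a single (A, B) pair
count_1_2 = sizes.loc[(1, 2)]
print(count_1_2)

# Selecting on the first level only gives the counts for A == 2, indexed by B
counts_for_a2 = sizes.loc[2]
print(counts_for_a2)
```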

Next, we convert this structure into a DataFrame.

newdf = gp.size()
# reset_index() returns a new object, so assign the result back
newdf = newdf.reset_index(name='times')

The name parameter sets the name of the last column (the counts), "times" in this case.

After this assignment, newdf is a DataFrame.
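Putting the steps together, here is a self-contained sketch of the conversion. The final sort is an optional extra added for illustration and is not part of the original steps.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 2], [1, 4, 5], [1, 2, 4], [1, 6, 3],
     [2, 3, 1], [2, 4, 1], [2, 3, 5], [3, 1, 1]],
    columns=['A', 'B', 'C'],
)

# MultiIndex Series of group sizes
sizes = df.groupby(by=['A', 'B']).size()

# Move the index levels A and B back into columns and
# name the column of counts 'times'
newdf = sizes.reset_index(name='times')
print(type(newdf).__name__)
print(newdf.columns.tolist())

# Optional: show the most frequent pairs first
print(newdf.sort_values('times', ascending=False))
```

Because newdf is an ordinary DataFrame with columns A, B, and times, it can be filtered, sorted, or merged like any other table.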

That is all for this article.