SoFunction
Updated on 2024-07-15

Understanding the four pivot-related functions in Pandas

Hello, I'm Ding Xiaojie!

Today I'm sharing four general-purpose, pivot-related functions in Pandas. They cover most of the reshaping needs you will run into during data processing.

pandas.melt()

The main purpose of the melt function is to convert a DataFrame from wide format to long format.

pandas.melt(frame, id_vars=None, value_vars=None, var_name=None, value_name='value', col_level=None, ignore_index=True)

Parameter Meaning

  • id_vars: tuple, list, or ndarray, optional; the column(s) to use as identifier variables
  • value_vars: tuple, list, or ndarray, optional; the column(s) to unpivot. If not specified, all columns not set as id_vars are used.
  • var_name: scalar, default None; the name of the variable column. If None, the column is named 'variable' (or takes the name of the original columns index, if it has one).
  • value_name: scalar, default 'value'; the name of the value column
  • col_level: int or str, optional; if the columns are a multi-level index, melt is applied at the specified level
  • ignore_index: bool, default True; the original index is discarded and the result is renumbered from zero. If False, the original index is retained and its labels are repeated as needed.

Look at an example first:

import pandas as pd

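# Wide format: one row per area, one column per year
df = pd.DataFrame(
    {'Area': ['A', 'B', 'C'],
     '2020': [80, 60, 40],
     '2021': [800, 600, 400],
     '2022': [8000, 6000, 4000]})

# Unpivot the year columns into 'variable' / 'value' columns
pd.melt(df,
        id_vars=['Area'],
        value_vars=['2020', '2021', '2022'])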

Set var_name and value_name:

# Same reshape, with explicit names for the two new columns
df = pd.melt(df,
             id_vars=['Area'],
             value_vars=['2020', '2021', '2022'],
             var_name='Year',
             value_name='Sales')
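
As a small self-contained sketch of two options from the parameter list, the snippet below omits value_vars and compares the default ignore_index=True with ignore_index=False. The names wide, long_default and long_keep are chosen here for illustration only; ignore_index assumes pandas 1.1 or later.

```python
import pandas as pd

wide = pd.DataFrame({
    'Area': ['A', 'B', 'C'],
    '2020': [80, 60, 40],
    '2021': [800, 600, 400],
    '2022': [8000, 6000, 4000],
})

# value_vars omitted: every column not listed in id_vars is unpivoted
long_default = pd.melt(wide, id_vars=['Area'],
                       var_name='Year', value_name='Sales')

# ignore_index=False keeps the original row labels, so they repeat per year
long_keep = pd.melt(wide, id_vars=['Area'],
                    var_name='Year', value_name='Sales',
                    ignore_index=False)

print(long_default.shape)       # 3 areas x 3 years -> (9, 3)
print(list(long_keep.index))    # original labels 0, 1, 2 repeated three times
```

Keeping the original index can be handy when the long rows need to be traced back to their source row, at the cost of a non-unique index.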

pandas.pivot()

The pivot function reshapes a DataFrame according to the given index and column values.

pandas.pivot(data, index=None, columns=None, values=None)

Parameter Meaning

  • data: the DataFrame to reshape
  • index: optional; the column(s) used to build the new DataFrame's index
  • columns: the column(s) used to build the new DataFrame's columns
  • values: optional; the column(s) used to populate the new DataFrame's values

Use the results above as an example:

df.pivot(index='Year',
         columns='Area',
         values='Sales')

It can also be written in the following format.

df.pivot(index='Year', columns='Area')['Sales']
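
To check that the two spellings give the same table, here is a minimal sketch that rebuilds the long-format data by hand (long_df, by_values and by_select are illustrative names, not from the original article) and compares the results.

```python
import pandas as pd

# Long-format data equivalent to the melted result used in this section
long_df = pd.DataFrame({
    'Area': ['A', 'B', 'C'] * 3,
    'Year': ['2020'] * 3 + ['2021'] * 3 + ['2022'] * 3,
    'Sales': [80, 60, 40, 800, 600, 400, 8000, 6000, 4000],
})

# Form 1: name the value column up front
by_values = long_df.pivot(index='Year', columns='Area', values='Sales')

# Form 2: pivot every remaining column, then select 'Sales' afterwards
by_select = long_df.pivot(index='Year', columns='Area')['Sales']

print((by_values.values == by_select.values).all())
print(by_values.loc['2021', 'B'])   # sales for area B in 2021
```

Both forms return one row per year and one column per area, which is the wide layout that melt started from, only transposed.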

Add a sales volume column and pass both columns to values. This turns the resulting columns into a multi-level index.

df['Sales volume'] = df['Sales']/10
df.pivot(index='Year',
         columns='Area',
         values=['Sales', 'Sales volume'])
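
The multi-level columns can be inspected and sliced as in the following sketch. It rebuilds the data by hand so it runs on its own; long_df and wide are illustrative names.

```python
import pandas as pd

long_df = pd.DataFrame({
    'Area': ['A', 'B', 'C'] * 3,
    'Year': ['2020'] * 3 + ['2021'] * 3 + ['2022'] * 3,
    'Sales': [80, 60, 40, 800, 600, 400, 8000, 6000, 4000],
})
long_df['Sales volume'] = long_df['Sales'] / 10

wide = long_df.pivot(index='Year', columns='Area',
                     values=['Sales', 'Sales volume'])

# The columns now have two levels: the value name, then the area
print(wide.columns.nlevels)

# Select one top-level block, or a single (value, area) column
print(wide['Sales volume'])
print(wide[('Sales', 'A')])
```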

Add a month column and specify two index columns:

# Label the nine rows 'Month 1', 'Month 2', 'Month 3', repeated three times
df['Month'] = [f'Month {m}' for m in range(1, 4)]*3
df.pivot(index=['Year', 'Month'],
         columns='Area',
         values='Sales')

When using pivot, note that a ValueError is raised if any index/columns combination appears more than once.

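# ('A', '2020') appears twice, so the reshape is ambiguous
df = pd.DataFrame(
        {'Area': ['A', 'A', 'B', 'C'],
         'Year': ['2020', '2020', '2021', '2022'],
         'Sales': [800, 600, 400, 200]})

df.pivot(index='Area',
         columns='Year',
         values='Sales')
# ValueError: Index contains duplicate entries, cannot reshape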

pandas.pivot_table()

This function was covered separately in an earlier article; see "Pandas Playing with Pivot Tables". Compared with pivot, pivot_table offers greater flexibility.
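
One place where that flexibility shows is the duplicate case that makes pivot fail. The sketch below is an illustration rather than a recap of the earlier article: it catches the error from pivot and then lets pivot_table aggregate the duplicates. Choosing aggfunc='sum' is an assumption made here; the default aggregation is the mean.

```python
import pandas as pd

dup = pd.DataFrame({
    'Area': ['A', 'A', 'B', 'C'],
    'Year': ['2020', '2020', '2021', '2022'],
    'Sales': [800, 600, 400, 200],
})

# pivot cannot decide which of the two ('A', '2020') rows to keep
try:
    dup.pivot(index='Area', columns='Year', values='Sales')
except ValueError as exc:
    print('pivot failed:', exc)

# pivot_table aggregates duplicates instead of raising
table = dup.pivot_table(index='Area', columns='Year',
                        values='Sales', aggfunc='sum')
print(table)
```

Combinations that never occur in the data, such as area B in 2020, come out as NaN in the result.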

pandas.crosstab()

The crosstab function computes a simple cross-tabulation of two (or more) arrays. By default it computes a frequency table of the elements.

pandas.crosstab(index, columns, values=None, rownames=None, colnames=None, aggfunc=None, margins=False, margins_name='All', dropna=True, normalize=False)

Look at the example below:

The frequency is calculated here by default.

import numpy as np
array_A = np.array(["one", "two", "two", "three", "three", "three"], dtype=object)
array_B = np.array(["Python", "Python", "Python", "C", "C", "C"], dtype=object)
array_C = np.array(["Y", "Y", "Y", "N", "N", "N"])
# Rows come from array_A; columns are the (array_B, array_C) combinations
pd.crosstab(array_A,
            [array_B, array_C],
            rownames=['array_A'],
            colnames=['array_B', 'array_C'])

Create a new array to pass as values, and compute the sum:

array_D = np.array([1, 4, 9, 16, 25, 36])
# Sum array_D within each cell instead of counting occurrences
pd.crosstab(index=array_A,
            columns=[array_B, array_C],
            rownames=['array_A'],
            colnames=['array_B', 'array_C'],
            values=array_D,
            aggfunc='sum')
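
The signature also lists margins and normalize, which the examples above do not use. As a hedged sketch with a simplified two-array version of the same data (counts and shares are illustrative names), margins=True adds row and column totals and normalize='index' turns each row into proportions.

```python
import numpy as np
import pandas as pd

array_A = np.array(["one", "two", "two", "three", "three", "three"], dtype=object)
array_B = np.array(["Python", "Python", "Python", "C", "C", "C"], dtype=object)

# Frequency table with an extra 'All' row and column holding the totals
counts = pd.crosstab(array_A, array_B,
                     rownames=['array_A'], colnames=['array_B'],
                     margins=True)

# Each row rescaled so that it sums to 1
shares = pd.crosstab(array_A, array_B,
                     rownames=['array_A'], colnames=['array_B'],
                     normalize='index')

print(counts)
print(shares)
```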
