Hello, I'm Ding Xiaojie!
Today I'm sharing four general-purpose, pivot-related functions in Pandas. Whenever this kind of reshaping requirement comes up during data processing, they handle it well.
pandas.melt()
The melt function mainly converts a DataFrame from wide format to long format.
pandas.melt(frame, id_vars=None, value_vars=None, var_name=None, value_name='value', col_level=None, ignore_index=True)
Parameters:
- frame: the DataFrame to convert
- id_vars: tuple, list, or ndarray, optional. Column(s) to use as identifier variables.
- value_vars: tuple, list, or ndarray, optional. Column(s) to unpivot. If not specified, all columns not set as id_vars are used.
- var_name: scalar, default None. Name of the column that holds the former column labels; if None, frame.columns.name is used, or 'variable' if that is also None.
- value_name: scalar, default 'value'. Name of the column that holds the values.
- col_level: int or str, optional. If the columns are a MultiIndex, melt is applied at this level.
- ignore_index: bool, default True. If True, the original index is discarded and the result is numbered from zero; if False, the original index is kept and its labels are repeated.
Look at an example first:
import pandas as pd

df = pd.DataFrame({'Area': ['A', 'B', 'C'],
                   '2020': [80, 60, 40],
                   '2021': [800, 600, 400],
                   '2022': [8000, 6000, 4000]})
pd.melt(df, id_vars=['Area'], value_vars=['2020', '2021', '2022'])
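Because value_vars defaults to every column not listed in id_vars, the call above can drop it. The minimal sketch below rebuilds the same example table and checks that both forms agree; the variable names `explicit` and `implicit` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'Area': ['A', 'B', 'C'],
                   '2020': [80, 60, 40],
                   '2021': [800, 600, 400],
                   '2022': [8000, 6000, 4000]})

explicit = pd.melt(df, id_vars=['Area'], value_vars=['2020', '2021', '2022'])
# Leaving out value_vars melts every column that is not in id_vars
implicit = pd.melt(df, id_vars=['Area'])

print(explicit.equals(implicit))    # True
print(explicit.columns.tolist())    # ['Area', 'variable', 'value']
print(explicit.shape)               # (9, 3): 3 areas x 3 years
```

With var_name and value_name left at their defaults, the new columns are called 'variable' and 'value'.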
Set var_name and value_name:
df = pd.melt(df, id_vars=['Area'], value_vars=['2020', '2021', '2022'], var_name='Year', value_name='Sales')
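The ignore_index parameter only matters when the input has a meaningful index. This is a sketch on an assumed table whose index labels 'x', 'y', 'z' are arbitrary; note that ignore_index requires pandas 1.1 or later.

```python
import pandas as pd

# Illustrative wide table with a custom index ('x', 'y', 'z' are arbitrary labels)
wide = pd.DataFrame({'Area': ['A', 'B', 'C'],
                     '2020': [80, 60, 40],
                     '2021': [800, 600, 400]},
                    index=['x', 'y', 'z'])

# Default ignore_index=True: the result is renumbered from zero
reset = pd.melt(wide, id_vars=['Area'], var_name='Year', value_name='Sales')
# ignore_index=False: original labels survive, repeated once per melted column
kept = pd.melt(wide, id_vars=['Area'], var_name='Year', value_name='Sales',
               ignore_index=False)

print(reset.index.tolist())  # [0, 1, 2, 3, 4, 5]
print(kept.index.tolist())   # ['x', 'y', 'z', 'x', 'y', 'z']
```

Keeping the index is handy when you later want to join the long table back to the original rows.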
pandas.pivot()
The pivot function reshapes a DataFrame, using the values of the given index and columns to build the new frame.
pandas.pivot(data, index=None, columns=None, values=None)
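Parameters:
- data: the DataFrame to reshape
- index: optional. Column(s) used to build the new DataFrame's index; if not given, the existing index is used.
- columns: column(s) used to build the new DataFrame's columns.
- values: optional. Column(s) used to fill the new DataFrame; if not given, all remaining columns are used and the result has multi-level columns.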
Using the result above as an example:
df.pivot(index='Year', columns='Area', values='Sales')
It can also be written like this:
df.pivot(index='Year', columns='Area')['Sales']
Add a sales volume column and pass both columns as values; this turns the columns into a multi-level index.
df['Sales volume'] = df['Sales'] / 10
df.pivot(index='Year', columns='Area', values=['Sales', 'Sales volume'])
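To see how the multi-level columns are laid out and selected, here is a sketch that rebuilds the long table from the melt step directly (an assumption for self-containment) and indexes into the result.

```python
import pandas as pd

# Long table equivalent to the melt output above
df_long = pd.DataFrame({'Area': ['A', 'B', 'C'] * 3,
                        'Year': ['2020'] * 3 + ['2021'] * 3 + ['2022'] * 3,
                        'Sales': [80, 60, 40, 800, 600, 400, 8000, 6000, 4000]})
df_long['Sales volume'] = df_long['Sales'] / 10

wide = df_long.pivot(index='Year', columns='Area', values=['Sales', 'Sales volume'])

# Outer column level = value column name, inner level = Area
print(wide.columns.nlevels)                   # 2
print(wide.columns.tolist()[:3])              # [('Sales', 'A'), ('Sales', 'B'), ('Sales', 'C')]
# Select a whole block by its outer label, then a single cell
print(wide['Sales volume'].loc['2021', 'B'])  # 60.0
# Or select one column with a (value, Area) tuple
print(wide[('Sales', 'C')].tolist())
```

Selecting by the outer label alone returns an ordinary single-level DataFrame, which is usually the easiest way to work with one measure at a time.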
Add a month column and specify two index columns:
df['Month'] = [f'Month{m}' for m in range(1, 4)] * 3
df.pivot(index=['Year', 'Month'], columns='Area', values='Sales')
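With two index columns the result gets a two-level row index, and any (Year, Month, Area) combination that never occurs is filled with NaN. This sketch assumes the same long table and month labels as above.

```python
import pandas as pd

df_long = pd.DataFrame({'Area': ['A', 'B', 'C'] * 3,
                        'Year': ['2020'] * 3 + ['2021'] * 3 + ['2022'] * 3,
                        'Sales': [80, 60, 40, 800, 600, 400, 8000, 6000, 4000]})
# Within each year, A gets Month1, B gets Month2, C gets Month3
df_long['Month'] = [f'Month{m}' for m in range(1, 4)] * 3

wide = df_long.pivot(index=['Year', 'Month'], columns='Area', values='Sales')

print(wide.index.nlevels)                 # 2
print(wide.shape)                         # (9, 3)
# Each (Year, Month) row has exactly one Area, so 27 - 9 = 18 cells are missing
print(wide.isna().sum().sum())            # 18
print(wide.loc[('2021', 'Month2'), 'B'])  # 600.0
```

The NaNs also force the value columns to float, which is why 600 comes back as 600.0.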
When using pivot, note that a ValueError is raised if any index/columns combination occurs more than once:
df = pd.DataFrame({'Area': ['A', 'A', 'B', 'C'],
                   'Year': ['2020', '2020', '2021', '2022'],
                   'Sales': [800, 600, 400, 200]})
df.pivot(index='Area', columns='Year', values='Sales')  # ValueError: ('A', '2020') appears twice
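One way to deal with this, sketched below under the assumption that summing duplicates is the desired behavior, is to find the offending rows with duplicated(), then aggregate them with groupby before calling pivot. The flag `pivot_failed` is just an illustrative helper.

```python
import pandas as pd

df = pd.DataFrame({'Area': ['A', 'A', 'B', 'C'],
                   'Year': ['2020', '2020', '2021', '2022'],
                   'Sales': [800, 600, 400, 200]})

# pivot needs every (index, columns) pair to be unique; keep=False marks all copies
dupes = df.duplicated(subset=['Area', 'Year'], keep=False)
print(df[dupes])  # both ('A', '2020') rows

try:
    df.pivot(index='Area', columns='Year', values='Sales')
    pivot_failed = False
except ValueError as err:
    pivot_failed = True
    print('pivot failed:', err)

# Aggregate the duplicates first, then pivot the now-unique pairs
summed = df.groupby(['Area', 'Year'], as_index=False)['Sales'].sum()
wide = summed.pivot(index='Area', columns='Year', values='Sales')
print(wide.loc['A', '2020'])  # 1400.0
```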
pandas.pivot_table()
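This function was covered separately in an earlier article, "Pandas Playing with Pivot Tables". Compared with pivot, pivot_table is more flexible: instead of raising an error on duplicate index/columns combinations, it aggregates them with aggfunc (the mean by default).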
pandas.crosstab()
The crosstab function computes a simple cross-tabulation of two (or more) factors. By default it computes a frequency table of the factors, unless an array of values and an aggregation function are passed.
pandas.crosstab(index, columns, values=None, rownames=None, colnames=None, aggfunc=None, margins=False, margins_name='All', dropna=True, normalize=False)
Look at the example below; here crosstab computes frequencies by default:
import numpy as np

array_A = np.array(["one", "two", "two", "three", "three", "three"], dtype=object)
array_B = np.array(["Python", "Python", "Python", "C", "C", "C"], dtype=object)
array_C = np.array(["Y", "Y", "Y", "N", "N", "N"])
pd.crosstab(array_A, [array_B, array_C], rownames=['array_A'], colnames=['array_B', 'array_C'])
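To make explicit what the frequency table counts, this sketch rebuilds it by hand: group rows by every (A, B, C) combination, count them with size(), and move B and C into the columns. The DataFrame `df` and the name `by_hand` are introduced only for this comparison.

```python
import numpy as np
import pandas as pd

array_A = np.array(["one", "two", "two", "three", "three", "three"], dtype=object)
array_B = np.array(["Python", "Python", "Python", "C", "C", "C"], dtype=object)
array_C = np.array(["Y", "Y", "Y", "N", "N", "N"])

freq = pd.crosstab(array_A, [array_B, array_C],
                   rownames=['array_A'], colnames=['array_B', 'array_C'])

# Same counts by hand: size of each (A, B, C) group, with B and C unstacked into columns
df = pd.DataFrame({'array_A': array_A, 'array_B': array_B, 'array_C': array_C})
by_hand = (df.groupby(['array_A', 'array_B', 'array_C']).size()
             .unstack(['array_B', 'array_C'], fill_value=0)
             .sort_index(axis=1))  # sort columns to match crosstab's ordering

print((freq.values == by_hand.values).all())  # True
# Each row total is simply how often that label occurs in array_A
print(freq.sum(axis=1).to_dict())
```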
Create a new values array and aggregate it with sum:
array_D = np.array([1, 4, 9, 16, 25, 36])
pd.crosstab(index=array_A, columns=[array_B, array_C], rownames=['array_A'], colnames=['array_B', 'array_C'], values=array_D, aggfunc='sum')
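When values and aggfunc are given, crosstab behaves much like pivot_table on a DataFrame built from the arrays. The sketch below assumes such a DataFrame (`df` is illustrative) and checks that both produce the same table; combinations with no data come back as NaN rather than 0.

```python
import numpy as np
import pandas as pd

array_A = np.array(["one", "two", "two", "three", "three", "three"], dtype=object)
array_B = np.array(["Python", "Python", "Python", "C", "C", "C"], dtype=object)
array_C = np.array(["Y", "Y", "Y", "N", "N", "N"])
array_D = np.array([1, 4, 9, 16, 25, 36])

summed = pd.crosstab(index=array_A, columns=[array_B, array_C],
                     rownames=['array_A'], colnames=['array_B', 'array_C'],
                     values=array_D, aggfunc='sum')

# The same numbers with pivot_table on an explicit DataFrame
df = pd.DataFrame({'array_A': array_A, 'array_B': array_B,
                   'array_C': array_C, 'array_D': array_D})
table = pd.pivot_table(df, index='array_A', columns=['array_B', 'array_C'],
                       values='array_D', aggfunc='sum')

print(summed.equals(table))              # True
print(summed.loc['three', ('C', 'N')])   # 16 + 25 + 36
print(summed.loc['one', ('C', 'N')])     # nan: no 'one' rows have B='C'
```

If zeros are preferred over NaN for empty cells, calling fillna(0) on the result is a simple option.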