SoFunction
Updated on 2024-11-16

How to use the merge function in pandas

merge()

import pandas as pd
pd.merge(DataFrame1, DataFrame2, on=' ', how=' ')

merge is the pandas function for combining data. Unlike concat, which joins DataFrames along rows or columns, merge connects data according to a specific field (a key column) in the data.
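To make the difference concrete, here is a minimal sketch. The frames `left` and `right` and their columns `key`, `x`, and `y` are hypothetical toy data, not part of the example used later in this article.

```python
import pandas as pd

# Hypothetical toy data, used only to contrast the two functions.
left = pd.DataFrame({"key": ["a", "b", "c"], "x": [1, 2, 3]})
right = pd.DataFrame({"key": ["b", "c", "d"], "y": [20, 30, 40]})

# concat stacks the frames along an axis; it does not match rows by value.
# Columns missing from one frame are filled with NaN.
stacked = pd.concat([left, right], axis=0, ignore_index=True)

# merge pairs up rows whose "key" values are equal (inner join by default).
merged = pd.merge(left, right, on="key")

print(stacked)
print(merged)
```

Here concat simply returns all six rows, while merge returns only the rows for keys "b" and "c", which appear in both frames.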

The meaning of each parameter is explained below with examples.

An example

Let's start by creating two DataFrames

import pandas as pd

df_1 = pd.DataFrame({'Name': ["Little Ming.", "Little Red.", "Kong."],
                     'Age': [10, 9, 12],
                     'The City': ['Shanghai', 'Beijing', 'Shenzhen']})
df_1

	Name	Age	The City
0	Little Ming.	10	Shanghai
1	Little Red.	9	Beijing
2	Kong.	12	Shenzhen

df_2 = pd.DataFrame({'pocket money': [50, 200, 600, 400, 80],
                     'The City': ['Suzhou', 'Beijing', 'Shanghai', 'Guangzhou', 'Chongqing']})
df_2

	pocket money	The City
0	50	Suzhou
1	200	Beijing
2	600	Shanghai
3	400	Guangzhou
4	80	Chongqing

on specifies the column used to match rows between the two DataFrames

# Both DataFrames have a 'The City' column with some values in common, so rows can be joined on those matching values.
result = pd.merge(df_1, df_2, on='The City')
result

	Name	Age	The City	pocket money
0	Little Ming.	10	Shanghai	600
1	Little Red.	9	Beijing	200

If on is omitted, merge automatically uses the columns that the two DataFrames have in common

# No on is given, so merge joins on the shared column 'The City'
result = pd.merge(df_1, df_2)
result

	Name	Age	The City	pocket money
0	Little Ming.	10	Shanghai	600
1	Little Red.	9	Beijing	200
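As a rough check of this behavior, the sketch below computes the column names the two DataFrames share and confirms that passing them explicitly gives the same result as leaving on out. The variable names `common`, `implicit`, and `explicit` are only illustrative.

```python
import pandas as pd

df_1 = pd.DataFrame({'Name': ["Little Ming.", "Little Red.", "Kong."],
                     'Age': [10, 9, 12],
                     'The City': ['Shanghai', 'Beijing', 'Shenzhen']})
df_2 = pd.DataFrame({'pocket money': [50, 200, 600, 400, 80],
                     'The City': ['Suzhou', 'Beijing', 'Shanghai', 'Guangzhou', 'Chongqing']})

# Column names present in both frames; merge joins on these when on is omitted.
common = df_1.columns.intersection(df_2.columns).tolist()

implicit = pd.merge(df_1, df_2)             # on left out
explicit = pd.merge(df_1, df_2, on=common)  # the same columns passed explicitly

print(common)
print(implicit.equals(explicit))
```

If the two DataFrames share more than one column name, all of the shared columns are used as keys, so passing on explicitly is usually the safer habit.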

how specifies the way the two DataFrames are joined

  • how = 'outer': outer join; keeps every key from both DataFrames and fills the missing values with NaN
  • how = 'right': right join; keeps every row of the right DataFrame and fills unmatched left-hand columns with NaN
  • how = 'left': left join; keeps every row of the left DataFrame and fills unmatched right-hand columns with NaN
  • how = 'inner': inner join (the default); keeps only the rows whose key appears in both DataFrames
# Outer join: keep every key from both DataFrames
result = pd.merge(df_1, df_2, on='The City', how='outer')
result

	Name	Age	The City	pocket money
0	Little Ming.	10.0	Shanghai	600.0
1	Little Red.	9.0	Beijing	200.0
2	Kong.	12.0	Shenzhen	NaN
3	NaN	NaN	Suzhou	50.0
4	NaN	NaN	Guangzhou	400.0
5	NaN	NaN	Chongqing	80.0

# Right join: keep every row of the right DataFrame
result = pd.merge(df_1, df_2, on='The City', how='right')
result

	Name	Age	The City	pocket money
0	Little Ming.	10.0	Shanghai	600
1	Little Red.	9.0	Beijing	200
2	NaN	NaN	Suzhou	50
3	NaN	NaN	Guangzhou	400
4	NaN	NaN	Chongqing	80

# Left join: keep every row of the left DataFrame
result = pd.merge(df_1, df_2, on='The City', how='left')
result

	Name	Age	The City	pocket money
0	Little Ming.	10	Shanghai	600.0
1	Little Red.	9	Beijing	200.0
2	Kong.	12	Shenzhen	NaN

# Inner join: keep only the rows whose key appears in both DataFrames
result = pd.merge(df_1, df_2, on='The City', how='inner')
result

	Name	Age	The City	pocket money
0	Little Ming.	10	Shanghai	600
1	Little Red.	9	Beijing	200
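The four join types can also be compared side by side. The sketch below is one possible way to do it: it loops over the how values and uses the indicator parameter of merge, which is not covered above, to add a _merge column recording whether each row's key was found in the left DataFrame, the right DataFrame, or both. The dictionary `row_counts` is just an illustrative helper.

```python
import pandas as pd

df_1 = pd.DataFrame({'Name': ["Little Ming.", "Little Red.", "Kong."],
                     'Age': [10, 9, 12],
                     'The City': ['Shanghai', 'Beijing', 'Shenzhen']})
df_2 = pd.DataFrame({'pocket money': [50, 200, 600, 400, 80],
                     'The City': ['Suzhou', 'Beijing', 'Shanghai', 'Guangzhou', 'Chongqing']})

row_counts = {}
for how in ["inner", "left", "right", "outer"]:
    # indicator=True adds a "_merge" column with the values
    # "left_only", "right_only", or "both" for each row.
    result = pd.merge(df_1, df_2, on='The City', how=how, indicator=True)
    row_counts[how] = len(result)
    print(how, len(result), result["_merge"].value_counts().to_dict())

print(row_counts)
```

With this data only Shanghai and Beijing appear in both DataFrames, so the inner join has 2 rows, the left join 3, the right join 5, and the outer join 6.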

The above is based on my personal experience; I hope it serves as a useful reference.