SoFunction
Updated on 2024-11-16

pandas data merging and reshaping: merge in detail

pandas provides three methods for merging and reshaping data: merge, join, and concat. This article covers merge first.

1. Parameter overview

Parameter      Description
how            Join type: "left", "right", "inner", or "outer"; the default is "inner"
on             Name of the column used as the join key (the column must exist in both tables)
left_on        Name of the column in the left table used as the join key
right_on       Name of the column in the right table used as the join key
left_index     True means the index of the left table is used as the join key
right_index    True means the index of the right table is used as the join key
suffixes       Suffixes appended to column names that appear in both tables, to tell them apart
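
As a rough illustration of how several of these parameters combine, the sketch below joins a column of one table against the index of another and uses suffixes to separate a column name that appears in both. The tables and their values are made up for this example and do not come from the article's files.

```python
import pandas as pd

# Hypothetical sample tables (not from the article's Excel files).
left = pd.DataFrame({"Name": ["Ann", "Bob", "Cid"], "Score": [90, 85, 70]})
right = pd.DataFrame({"Score": [88, 60]}, index=["Ann", "Bob"])

# Match the "Name" column of the left table against the index of the right table.
# "Score" exists in both tables, so suffixes is used to tell the two columns apart.
merged = pd.merge(
    left,
    right,
    left_on="Name",
    right_index=True,
    how="inner",
    suffixes=("_left", "_right"),
)
print(merged)
```

Only Ann and Bob appear in the result, because the inner join keeps keys found on both sides, and the two Score columns come out as Score_left and Score_right.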

2. Left, right, inner, and outer joins explained

Left, right, inner, and outer joins in pandas work roughly the same way as in MySQL, so they are easier to understand if you already know MySQL.

Suppose there are two tables: Table 1 and Table 2.

(1) Left join

Table 1 is the reference table.

Rows in Table 2 whose join key matches a row in Table 1 are merged into Table 1; rows in Table 2 with no match in Table 1 are discarded.

All the data in Table 1 is preserved in this process. Rows of Table 1 with no match in Table 2 get missing values (NaN) in the columns that come from Table 2.
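
A minimal sketch of this behavior, assuming two small made-up tables (table1 and table2 are hypothetical stand-ins for Table 1 and Table 2):

```python
import pandas as pd

# Hypothetical stand-ins for Table 1 and Table 2.
table1 = pd.DataFrame({"Name": ["Ann", "Bob", "Cid"], "Age": [25, 31, 40]})
table2 = pd.DataFrame({"Name": ["Ann", "Bob", "Dee"], "City": ["Oslo", "Rome", "Lima"]})

# Keep every row of table1; attach City where the Name matches.
left_joined = pd.merge(table1, table2, on="Name", how="left")
print(left_joined)
```

Cid has no match in table2, so the City value for that row is NaN; Dee exists only in table2 and does not appear in the result.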

(2) Right join

The opposite of a left join: Table 2 is the reference table.

Rows in Table 1 whose join key matches a row in Table 2 are merged into Table 2; rows in Table 1 with no match in Table 2 are discarded.

All the data in Table 2 is preserved in this process.
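
One way to see the symmetry is to compare a right join with a left join in which the two tables are swapped. The sketch below assumes the same kind of made-up tables (table1 and table2 are hypothetical names):

```python
import pandas as pd

# Hypothetical stand-ins for Table 1 and Table 2.
table1 = pd.DataFrame({"Name": ["Ann", "Bob", "Cid"], "Age": [25, 31, 40]})
table2 = pd.DataFrame({"Name": ["Ann", "Bob", "Dee"], "City": ["Oslo", "Rome", "Lima"]})

# Keep every row of table2; attach Age where the Name matches.
right_joined = pd.merge(table1, table2, on="Name", how="right")

# The same rows result from a left join with the arguments swapped;
# only the column order differs.
swapped = pd.merge(table2, table1, on="Name", how="left")

print(right_joined)
print(swapped)
```

Dee has no match in table1, so the Age value for that row is NaN, while Cid is dropped.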

(3) Inner join

Only the rows of Table 1 and Table 2 whose join keys match on both sides are merged.

In this process Table 1 keeps only the rows that have a match in Table 2, and Table 2 keeps only the rows that have a match in Table 1, similar to an intersection in mathematics.
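
The intersection analogy can be checked directly on a small example. This is only a sketch with made-up data (table1 and table2 are hypothetical names):

```python
import pandas as pd

# Hypothetical stand-ins for Table 1 and Table 2.
table1 = pd.DataFrame({"Name": ["Ann", "Bob", "Cid"], "Age": [25, 31, 40]})
table2 = pd.DataFrame({"Name": ["Ann", "Bob", "Dee"], "City": ["Oslo", "Rome", "Lima"]})

# Keep only the rows whose Name appears in both tables.
inner_joined = pd.merge(table1, table2, on="Name", how="inner")

# The keys that survive are the set intersection of the two key columns.
shared_names = set(table1["Name"]) & set(table2["Name"])

# how defaults to "inner", so omitting it gives the same result.
default_joined = pd.merge(table1, table2, on="Name")

print(inner_joined)
print(shared_names)
```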

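(4) Outer join

The rows of Table 1 and Table 2 are merged according to the join key, whether or not they have a match.

In this process all the data in both Table 1 and Table 2 is kept, which is equivalent to a union in mathematics. Where a row has no match in the other table, the columns from that table are filled with missing values (NaN).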
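
A small sketch of an outer join, again with made-up data (table1 and table2 are hypothetical names). It also passes indicator=True, a merge option not covered in the parameter list above, which adds a _merge column showing where each row came from:

```python
import pandas as pd

# Hypothetical stand-ins for Table 1 and Table 2.
table1 = pd.DataFrame({"Name": ["Ann", "Bob", "Cid"], "Age": [25, 31, 40]})
table2 = pd.DataFrame({"Name": ["Ann", "Bob", "Dee"], "City": ["Oslo", "Rome", "Lima"]})

# Keep every row from both tables; indicator=True adds a "_merge" column
# with the values "both", "left_only", or "right_only".
outer_joined = pd.merge(table1, table2, on="Name", how="outer", indicator=True)
print(outer_joined)
```

All four names appear in the result: Ann and Bob are marked both, Cid is left_only with a missing City, and Dee is right_only with a missing Age.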

3. Data merging

First, read the data from both tables:

import pandas as pd

# Paths to the two Excel files
adress1 = "D:/pandas exercise file/"
adress2 = "D:/pandas exercise file/"

# Load each file into a DataFrame
data1 = pd.read_excel(adress1)
data2 = pd.read_excel(adress2)

(1) When the join key column has the same name in both tables (a left join in this example)

A. Usage:

pd.merge(data1, data2, on="", how="")

all_data = pd.merge(data1, data2, on="Name", how="left")

B. Comparison of the data before and after merging:
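
As an illustration of what such a comparison might look like, the sketch below builds two small DataFrames in place of the Excel files. The column names Math and English and all values are made up; only the names data1, data2, and all_data and the "Name" key follow the article.

```python
import pandas as pd

# Made-up data standing in for the two Excel files.
data1 = pd.DataFrame({"Name": ["Li", "Wang", "Zhao"], "Math": [90, 78, 85]})
data2 = pd.DataFrame({"Name": ["Li", "Wang", "Chen"], "English": [88, 92, 75]})

all_data = pd.merge(data1, data2, on="Name", how="left")

# Before merging: two separate tables with 3 rows and 2 columns each.
print(data1.shape, data2.shape)

# After merging: still 3 rows (one per row of data1), now with 3 columns.
print(all_data.shape)
print(all_data)
```

The shared "Name" column appears once in the result, and Zhao, who is missing from data2, gets NaN in the English column.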

(2) When the join key columns have different names in the two tables (the default inner join in this example)

A. Usage:

pd.merge(data1, data2, left_on="", right_on="")

all_data = pd.merge(data1, data2, left_on="Name 1", right_on="Name 2")

B. Comparison of the data before and after merging:
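
A comparable sketch for this case, again with made-up data in place of the Excel files. Only the key column names "Name 1" and "Name 2" follow the article; Math, English, and all values are assumptions.

```python
import pandas as pd

# Made-up data standing in for the two Excel files.
data1 = pd.DataFrame({"Name 1": ["Li", "Wang", "Zhao"], "Math": [90, 78, 85]})
data2 = pd.DataFrame({"Name 2": ["Li", "Wang", "Chen"], "English": [88, 92, 75]})

# No how argument is given, so this is an inner join.
all_data = pd.merge(data1, data2, left_on="Name 1", right_on="Name 2")
print(all_data)

# Both key columns are kept; one can be dropped if it is redundant.
tidy = all_data.drop(columns="Name 2")
print(tidy)
```

Because the key columns have different names, both "Name 1" and "Name 2" are kept in the result, and only Li and Wang, who appear in both tables, remain.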

Summary

The above is based on my personal experience. I hope it serves as a useful reference, and I appreciate your continued support.