pandas provides three methods for merging and reshaping data: merge, join, and concat. This article starts with merge.
1. Parameter overview
Parameter | Description |
how | Join type: "left", "right", "inner", or "outer". The default is "inner". |
on | Name of the column used as the join key (the column must have the same name in both tables) |
left_on | Name of the column in the left table used as the join key |
right_on | Name of the column in the right table used as the join key |
left_index | True means the index of the left table is used as the join key |
right_index | True means the index of the right table is used as the join key |
suffixes | Suffixes appended to column names that appear in both tables, to tell them apart |
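The less obvious parameters in the table (left_on, right_index, and suffixes) can be seen together in a small sketch. The tables and the column names Name and Score are made up for illustration and do not come from the article's data.

```python
import pandas as pd

# Hypothetical sample data, not from the article.
left = pd.DataFrame({"Name": ["Ann", "Bob"], "Score": [90, 80]})
right = pd.DataFrame({"Score": [75, 60]}, index=["Ann", "Cid"])

# left_on picks the key column in the left table; right_index=True uses
# the index of the right table as its key. Both tables have a "Score"
# column, so suffixes tells the two apart in the result.
merged = pd.merge(
    left,
    right,
    left_on="Name",
    right_index=True,
    how="left",
    suffixes=("_left", "_right"),
)
print(merged)
```

The result has the columns Name, Score_left, and Score_right. Bob has no match in the right table, so his Score_right is NaN.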
2. Left, right, inner, and outer joins explained
Left, right, inner, and outer joins in pandas work roughly the same way as in MySQL, so they are easier to understand if you already know MySQL.
Suppose there are two tables: Table 1 and Table 2.
(1) Left join
Table 1 is the reference table.
Rows in Table 2 that match Table 1 on the join key are merged into Table 1, and rows in Table 2 with no match in Table 1 are discarded.
All the data in Table 1 is preserved in this process.
(2) Right join
The opposite of a left join: Table 2 is the reference table.
Rows in Table 1 that match Table 2 on the join key are merged into Table 2, and rows in Table 1 with no match in Table 2 are discarded.
All the data in Table 2 is preserved in this process.
(3) Inner join
Only the rows whose join key appears in both Table 1 and Table 2 are merged.
In this process, Table 1 keeps only the rows that match Table 2, and Table 2 keeps only the rows that match Table 1, similar to an intersection in mathematics.
(4) Outer join
The data in Tables 1 and 2 is merged on the join key.
In this process, all the data in Table 1 and Table 2 is kept, which is equivalent to a union in mathematics.
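The four join types can be compared side by side on two small tables. This is a minimal sketch; table1, table2, and their contents are hypothetical and were chosen so that each table has one key the other lacks.

```python
import pandas as pd

# Hypothetical sample tables, not from the article.
table1 = pd.DataFrame({"Name": ["Ann", "Bob", "Cid"], "Age": [30, 25, 41]})
table2 = pd.DataFrame(
    {"Name": ["Bob", "Cid", "Dee"], "City": ["Oslo", "Rome", "Lima"]}
)

# For each join type, collect the key values that survive the merge.
names = {
    how: sorted(pd.merge(table1, table2, on="Name", how=how)["Name"])
    for how in ["left", "right", "inner", "outer"]
}

for how, kept in names.items():
    print(how, kept)
# left  ['Ann', 'Bob', 'Cid']
# right ['Bob', 'Cid', 'Dee']
# inner ['Bob', 'Cid']
# outer ['Ann', 'Bob', 'Cid', 'Dee']
```

Ann exists only in table1 and Dee only in table2, so they show which side each join type keeps.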
3. Merging data
First, read the data from both tables:
import pandas as pd

# Paths to the two Excel files
adress1 = "D:/pandas exercise file/"
adress2 = "D:/pandas exercise file/"

# Load each file into a DataFrame
data1 = pd.read_excel(adress1)
data2 = pd.read_excel(adress2)
(1) When the join key column has the same name in both tables (a left join in this example)
A. Usage:
pd.merge(data1, data2, on=" ", how=" ")
all_data = pd.merge(data1, data2, on="Name", how="left")
B. Comparison of the data before and after merging:
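The before-and-after comparison can be reproduced with small in-memory tables standing in for the Excel files. This is a sketch only: the rows and the Math and English columns are invented for illustration, and the merge is written as pd.merge with on="Name" and how="left".

```python
import pandas as pd

# Hypothetical stand-ins for the two Excel tables.
data1 = pd.DataFrame({"Name": ["Ann", "Bob", "Cid"], "Math": [90, 80, 70]})
data2 = pd.DataFrame({"Name": ["Bob", "Cid", "Dee"], "English": [85, 75, 65]})

# Left join on the shared "Name" column: every row of data1 is kept.
all_data = pd.merge(data1, data2, on="Name", how="left")

print(data1)     # before: left table
print(data2)     # before: right table
print(all_data)  # after: merged result
```

Ann appears only in data1, so her English value is NaN after the merge. Dee appears only in data2 and is dropped.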
(2) When the join key columns have different names in the two tables (the default inner join in this example)
A. Usage:
pd.merge(data1, data2, left_on="", right_on="")
all_data = pd.merge(data1, data2, left_on="Name 1", right_on="Name 2")
B. Comparison of the data before and after merging:
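A small sketch of this case, again with invented rows in place of the Excel data. The column names Name 1 and Name 2 follow the usage line, and the Math and English columns are hypothetical.

```python
import pandas as pd

# Hypothetical stand-ins for the two Excel tables.
data1 = pd.DataFrame({"Name 1": ["Ann", "Bob", "Cid"], "Math": [90, 80, 70]})
data2 = pd.DataFrame({"Name 2": ["Bob", "Cid", "Dee"], "English": [85, 75, 65]})

# No "how" argument, so pandas performs an inner join:
# only names present in both tables are kept.
all_data = pd.merge(data1, data2, left_on="Name 1", right_on="Name 2")
print(all_data)

# Both key columns appear in the result; drop one if it is redundant.
tidy = all_data.drop(columns="Name 2")
print(tidy)
```

Only Bob and Cid appear in both tables, so the merged result has two rows. Because the key columns have different names, both are kept until one is dropped explicitly.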
Summary
The above is based on my personal experience. I hope it serves as a useful reference, and I appreciate your support.