merge()
1. Routine consolidation
①Method 1
Specify a reference column against which to merge other columns.
import pandas as pd df1 = ({'id': ['001', '002', '003'], 'num1': [120, 101, 104], 'num2': [110, 102, 121], 'num3': [105, 120, 113]}) df2 = ({'id': ['001', '002', '003'], 'num4': [80, 86, 79]}) print(df1) print("=======================================") print(df2) print("=======================================") df_merge = (df1, df2, on='id') print(df_merge)
②Method 2
To achieve this merge, you can also merge by index, i.e. by index column. Set both left_index and right_index to True
(left_index and right_index are both False by default. (Both left_index and right_index default to False. left_index means that the left table is based on the index of the data in the left table, and right_index means that the right table is based on the index of the data in the right table.)
import pandas as pd df1 = ({'id': ['001', '002', '003'], 'num1': [120, 101, 104], 'num2': [110, 102, 121], 'num3': [105, 120, 113]}) df2 = ({'id': ['001', '002', '003'], 'num4': [80, 86, 79]}) print(df1) print("=======================================") print(df2) print("=======================================") df_merge = (df1, df2, left_index=True, right_index=True) print(df_merge)
Compared to method ①, the difference is that, as shown in the figure, there are duplicate columns in the data merged by method ②.
important parameter
(right,how=‘inner’, on=“None”, left_on=“None”, right_on=“None”, left_index=False, right_index=False )
parameters | descriptive |
---|---|
left | Left Table, Merge Object, DataFrame or Series |
right | Right Table, Merge Object, DataFrame or Series |
how | Merge method, can be left (left merge), right (right merge), outer (outer merge), inner (inner merge) |
on | Column name of the base column |
left_on | Left table base column column names |
right_on | Right-hand table base column column names |
left_index | Whether the left column is index-based, default False, no |
right_index | Whether the right column is index-based, default False, no |
Among them, left_index and right_index cannot be specified at the same time as on.
Merge method left right outer inner
Prepare data'
New data set prepared.
import pandas as pd df1 = ({'id': ['001', '002', '003'], 'num1': [120, 101, 104], 'num2': [110, 102, 121], 'num3': [105, 120, 113]}) df2 = ({'id': ['001', '004', '003'], 'num4': [80, 86, 79]}) print(df1) print("=======================================") print(df2) print("=======================================")
inner (default)
Use the intersection of keys from two datasets
df_merge = (df1, df2, on='id') print(df_merge)
outer
Using a concatenation of keys from two datasets
df_merge = (df1, df2, on='id', how="outer") print(df_merge)
left
Use keys from the left dataset
df_merge = (df1, df2, on='id', how='left') print(df_merge)
right
Use keys from the right dataset
df_merge = (df1, df2, on='id', how='right') print(df_merge)
2. Many-to-one merger
import pandas as pd df1 = ({'id': ['001', '002', '003'], 'num1': [120, 101, 104], 'num2': [110, 102, 121], 'num3': [105, 120, 113]}) df2 = ({'id': ['001', '001', '003'], 'num4': [80, 86, 79]}) print(df1) print("=======================================") print(df2) print("=======================================")
As shown in the figure, df2 has data with duplicate id1.
incorporation
df_merge = (df1, df2, on='id') print(df_merge)
The merged result is shown in the figure:
Still follows the default Inner approach, using the intersection of keys from both datasets. And rows with duplicate keys will be reflected as multiple rows in the merge result.
3. Many-to-many merger
For example, there are multiple rows with duplicate ids in both Chart 1 and Table 2.
import pandas as pd df1 = ({'id': ['001', '002', '002', '002', '003'], 'num1': [120, 101, 104, 114, 123], 'num2': [110, 102, 121, 113, 126], 'num3': [105, 120, 113, 124, 128]}) df2 = ({'id': ['001', '001', '002', '003', '001'], 'num4': [80, 86, 79, 88, 93]}) print(df1) print("=======================================") print(df2) print("=======================================")
df_merge = (df1, df2, on='id') print(df_merge)
concat()
(objs, axis=0, join=‘outer’, ignore_index:bool=False,keys=None,levels=None,names=None, verify_integrity:bool=False,sort:bool=False,copy:bool=True)
parameters | descriptive |
---|---|
objs | Series or mapping of Series, DataFrame or Panel objects |
axis | Defaults to 0 for columns. A value of 1 indicates a row. |
join | Defaults to "outer", but can also be "inner". |
ignore_index | The default is False, which means the index is preserved (not ignored). Set to True to ignore indexes. |
Other important parameters are illustrated by examples.
1. The first table with the same field is connected to the first table
First prepare three sets of DataFrame data:
import pandas as pd df1 = ({'id': ['001', '002', '003'], 'num1': [120, 114, 123], 'num2': [110, 102, 121], 'num3': [113, 124, 128]}) df2 = ({'id': ['004', '005'], 'num1': [120, 101], 'num2': [113, 126], 'num3': [105, 128]}) df3 = ({'id': ['007', '008', '009'], 'num1': [120, 101, 125], 'num2': [113, 126, 163], 'num3': [105, 128, 114]}) print(df1) print("=======================================") print(df2) print("=======================================") print(df3)
incorporation
dfs = [df1, df2, df3] result = (dfs) print(result)
If you want to mark which tables or categories of data you want to merge from, you can also add the concatParameter keys 。
result = (dfs, keys=['table1', 'table2', 'table3']) print(result)
At this point, the added keys form a tuple with the original index and together become the new index.
print()
2. Horizontal table consolidation (row alignment)
Prepare two sets of DataFrame data:
import pandas as pd df1 = ({'num1': [120, 114, 123], 'num2': [110, 102, 121], 'num3': [113, 124, 128]}, index=['001', '002', '003']) df2 = ({'num3': [117, 120, 101, 126], 'num5': [113, 125, 126, 133], 'num6': [105, 130, 128, 128]}, index=['002', '003', '004', '005']) print(df1) print("=======================================") print(df2)
When axis is the default value of 0:
result = ([df1, df2]) print(result)
Horizontal merge requires axis to be set to 1 :
result = ([df1, df2], axis=1) print(result)
Compare the output differences above.
- When axis=0, i.e. the default vertical merge, if there are duplicate rows, they will be reflected in the result at the same time.
- When axis=1, i.e. when merging horizontally, if there are duplicate columns, they will be reflected in the results at the same time.
3. Cross-merger
result = ([df1, df2], axis=1, join='inner') print(result)
summarize
To this DataFrame data merge merge() and concat() method in python article is introduced to this, more related python data merge merge() and concat() method content, please search for my previous articles or continue to browse the following related articles I hope that you will support me in the future more!