SoFunction
Updated on 2024-11-21

DataFrame data merge merge and concat methods in python details

merge()

1. Routine consolidation

①Method 1

Specify a reference column against which to merge other columns.

import pandas as pd

df1 = ({'id': ['001', '002', '003'],
                    'num1': [120, 101, 104],
                    'num2': [110, 102, 121],
                    'num3': [105, 120, 113]})
df2 = ({'id': ['001', '002', '003'],
                    'num4': [80, 86, 79]})
print(df1)
print("=======================================")
print(df2)
print("=======================================")
df_merge = (df1, df2, on='id')
print(df_merge)

②Method 2

To achieve this merge, you can also merge by index, i.e. by index column. Set both left_index and right_index to True
(left_index and right_index are both False by default. (Both left_index and right_index default to False. left_index means that the left table is based on the index of the data in the left table, and right_index means that the right table is based on the index of the data in the right table.)

import pandas as pd

df1 = ({'id': ['001', '002', '003'],
                    'num1': [120, 101, 104],
                    'num2': [110, 102, 121],
                    'num3': [105, 120, 113]})
df2 = ({'id': ['001', '002', '003'],
                    'num4': [80, 86, 79]})
print(df1)
print("=======================================")
print(df2)
print("=======================================")

df_merge = (df1, df2, left_index=True, right_index=True)
print(df_merge)

Compared to method ①, the difference is that, as shown in the figure, there are duplicate columns in the data merged by method ②.

important parameter

(right,how=‘inner’, on=“None”, left_on=“None”, right_on=“None”, left_index=False, right_index=False )

parameters descriptive
left Left Table, Merge Object, DataFrame or Series
right Right Table, Merge Object, DataFrame or Series
how Merge method, can be left (left merge), right (right merge), outer (outer merge), inner (inner merge)
on Column name of the base column
left_on Left table base column column names
right_on Right-hand table base column column names
left_index Whether the left column is index-based, default False, no
right_index Whether the right column is index-based, default False, no

Among them, left_index and right_index cannot be specified at the same time as on.

Merge method left right outer inner

Prepare data'

New data set prepared.

import pandas as pd

df1 = ({'id': ['001', '002', '003'],
                    'num1': [120, 101, 104],
                    'num2': [110, 102, 121],
                    'num3': [105, 120, 113]})
df2 = ({'id': ['001', '004', '003'],
                    'num4': [80, 86, 79]})
print(df1)
print("=======================================")
print(df2)
print("=======================================")

inner (default)

Use the intersection of keys from two datasets

df_merge = (df1, df2, on='id')
print(df_merge)

outer

Using a concatenation of keys from two datasets

df_merge = (df1, df2, on='id', how="outer")
print(df_merge)

left

Use keys from the left dataset

df_merge = (df1, df2, on='id', how='left')
print(df_merge)

right

Use keys from the right dataset

df_merge = (df1, df2, on='id', how='right')
print(df_merge)

2. Many-to-one merger

import pandas as pd

df1 = ({'id': ['001', '002', '003'],
                    'num1': [120, 101, 104],
                    'num2': [110, 102, 121],
                    'num3': [105, 120, 113]})
df2 = ({'id': ['001', '001', '003'],
                    'num4': [80, 86, 79]})
print(df1)
print("=======================================")
print(df2)
print("=======================================")

As shown in the figure, df2 has data with duplicate id1.

incorporation

df_merge = (df1, df2, on='id')
print(df_merge)

The merged result is shown in the figure:

Still follows the default Inner approach, using the intersection of keys from both datasets. And rows with duplicate keys will be reflected as multiple rows in the merge result.

3. Many-to-many merger

For example, there are multiple rows with duplicate ids in both Chart 1 and Table 2.

import pandas as pd
df1 = ({'id': ['001', '002', '002', '002', '003'],
                    'num1': [120, 101, 104, 114, 123],
                    'num2': [110, 102, 121, 113, 126],
                    'num3': [105, 120, 113, 124, 128]})
df2 = ({'id': ['001', '001', '002', '003', '001'],
                    'num4': [80, 86, 79, 88, 93]})
print(df1)
print("=======================================")
print(df2)
print("=======================================")

df_merge = (df1, df2, on='id')
print(df_merge)

concat()

(objs, axis=0, join=‘outer’, ignore_index:bool=False,keys=None,levels=None,names=None, verify_integrity:bool=False,sort:bool=False,copy:bool=True)

parameters descriptive
objs Series or mapping of Series, DataFrame or Panel objects
axis Defaults to 0 for columns. A value of 1 indicates a row.
join Defaults to "outer", but can also be "inner".
ignore_index The default is False, which means the index is preserved (not ignored). Set to True to ignore indexes.

Other important parameters are illustrated by examples.

1. The first table with the same field is connected to the first table

First prepare three sets of DataFrame data:

import pandas as pd
df1 = ({'id': ['001', '002', '003'],
                    'num1': [120, 114, 123],
                    'num2': [110, 102, 121],
                    'num3': [113, 124, 128]})
df2 = ({'id': ['004', '005'],
                    'num1': [120, 101],
                    'num2': [113, 126],
                    'num3': [105, 128]})
df3 = ({'id': ['007', '008', '009'],
                    'num1': [120, 101, 125],
                    'num2': [113, 126, 163],
                    'num3': [105, 128, 114]})


print(df1)
print("=======================================")
print(df2)
print("=======================================")
print(df3)

incorporation

dfs = [df1, df2, df3]
result = (dfs)
print(result)

If you want to mark which tables or categories of data you want to merge from, you can also add the concatParameter keys

result = (dfs, keys=['table1', 'table2', 'table3'])
print(result)

At this point, the added keys form a tuple with the original index and together become the new index.

print()

2. Horizontal table consolidation (row alignment)

Prepare two sets of DataFrame data:

import pandas as pd
df1 = ({'num1': [120, 114, 123],
                    'num2': [110, 102, 121],
                    'num3': [113, 124, 128]}, index=['001', '002', '003'])
df2 = ({'num3': [117, 120, 101, 126],
                    'num5': [113, 125, 126, 133],
                    'num6': [105, 130, 128, 128]}, index=['002', '003', '004', '005'])

print(df1)
print("=======================================")
print(df2)

When axis is the default value of 0:

result = ([df1, df2])
print(result)

Horizontal merge requires axis to be set to 1

result = ([df1, df2], axis=1)
print(result)

Compare the output differences above.

  • When axis=0, i.e. the default vertical merge, if there are duplicate rows, they will be reflected in the result at the same time.
  • When axis=1, i.e. when merging horizontally, if there are duplicate columns, they will be reflected in the results at the same time.

3. Cross-merger

result = ([df1, df2], axis=1, join='inner')
print(result)

summarize

To this DataFrame data merge merge() and concat() method in python article is introduced to this, more related python data merge merge() and concat() method content, please search for my previous articles or continue to browse the following related articles I hope that you will support me in the future more!