Pandas uses the main table to update subtable columns

1. Preface

A small tip for working, use pandas to read the main table and subtable, use the designated columns of the main table, and update the designated columns of the subtable.

Case:

Main table:
uid name
0 101 Xiaobai
1 102 Xiaohong
2 103 Xiaolan

Sub-table:
name zb_uid
0 Xiaobai None
1 Xiaohong None
2 Little Green None

Requirements: When the name column data of the main table is the same as the name column data of the subtable name column data, the corresponding uid of the main table is assigned to the zb_uid of the subtable

2. Basic cases

1. Create the main table data

import pandas as pd

#Main representation datadata_sheet = {
    'uid': [101, 102, 103],
    'name': ['noob', 'Little Red', 'Little Blue'],
}
df_sheet = (data_sheet)
print('Main Table:')
print(df_sheet)

data_sheetis a dictionary where the keys'uid'Complied with a list containing three integers, indicating the user's unique identity; key'name'Corresponding to a list containing three strings, indicating the user's name.
(data_sheet)Bundledata_sheetDictionary conversion topandasofDataFrameObjectdf_sheet。DataFrameis a two-dimensional tabular data structure similar to a spreadsheet or SQL table.
Finally passedprintThe function prints out the contents of the main table.

2. Create a mapping dictionary

# Create a mapping dictionary, map name to uidname_uid_map = df_sheet.set_index('name')['uid'].to_dict()
print('Mapping Dictionary:',name_uid_map)

df_sheet.set_index('name')Bundledf_sheetof'name'Set the column to index, so that the corresponding row can be positioned by name.
['uid']Selectuidcolumn data.
to_dict()Convert the selected data into a dictionary. The key of the dictionary is the name and the value is the corresponding one.uid。
Finally passedprintThe function prints out this mapping dictionary.

3. Create subtable data

# Sub-expression datadata_sheet1 = {
    'name': ['noob', 'Little Red', 'Little Green'],
    'zb_uid': [None,None,None]
}
df_sheet1 = (data_sheet1)
print('Subtable:')
print(df_sheet1)

data_sheet1is a dictionary, key'name'Complied with a list containing three strings, indicating the user's name; key'zb_uid'The corresponding one contains threeNoneThe list of values, herezb_uidThe initial values are all empty and will be updated later.
(data_sheet1)Willdata_sheet1Dictionary conversion toDataFrameObjectdf_sheet1。
Finally passedprintThe function prints out the contents of the subtable.

4. Update the zb_uid column of the subtable

# Update the zb_uid column of the subtabledf_sheet1['zb_uid'] = df_sheet1['name'].map(name_uid_map).fillna(df_sheet1['zb_uid'])

df_sheet1['name'].map(name_uid_map)Will be based onname_uid_mapDictionary, willdf_sheet1middle'name'Each value of the column is mapped to the correspondinguid. if'name'The column's value isname_uid_mapIf it does not exist in the dictionary, it will be mapped asNaN。
fillna(df_sheet1['zb_uid'])Put the mapping resultNaNValue fordf_sheet1The original one of Zhongyuan'zb_uid'Column value padding. Here becausezb_uidThe initial value isNone,existpandas It will be considered as NaN, so the actual operation is to keep the originalNaNValue.
Finally assign the updated value todf_sheet1of'zb_uid'List.

5. Complete code

import pandas as pd


#Main representation datadata_sheet = {
    'uid': [101, 102, 103],
    'name': ['noob', 'Little Red', 'Little Blue'],
}
df_sheet = (data_sheet)
print('Main Table:')
print(df_sheet)
# Create a mapping dictionary, map name to uidname_uid_map = df_sheet.set_index('name')['uid'].to_dict()
print('Mapping Dictionary:',name_uid_map)


# Sub-expression datadata_sheet1 = {
    'name': ['noob', 'Little Red', 'Little Green'],
    'zb_uid': [None,None,None]
}
df_sheet1 = (data_sheet1)
print('Subtable:')
print(df_sheet1)

# Update the zb_uid column of the subtabledf_sheet1['zb_uid'] = df_sheet1['name'].map(name_uid_map).fillna(df_sheet1['zb_uid'])

print("Updated subtable:")
print(df_sheet1)

Running results:

Main table:
uid name
0 101 Xiaobai
1 102 Xiaohong
2 103 Xiaolan
Sub-table:
name zb_uid
0 Xiaobai None
1 Xiaohong None
2 Little Green None
Mapping dictionary: {'Xiaobai': 101, 'Xiaohong': 102, 'Xiaolan': 103}
Updated subtable:
name zb_uid
0 Xiaobai 101.0
1 Xiaohong 102.0
2 Little Green NaN

6. Summary

The main function of this code is based on the name anduidThe corresponding relationship of is updated in the subtablezb_uidThe value of the column. If the name in the subtable exists in the main table, use the correspondinguidFillzb_uid; If not, keepzb_uidis empty.

3. Upgrade cases

Based on the basic case, update the name fields of the main table and the subtable according to the mapping dictionary of names and pinyin:

import pandas as pd


def get_namePingyingMap():
	// The following method is to manually build a mapping dictionary. Of course, if you have a database, you can also read it from the database and then build a mapping dictionary.    name_pingying_map = {'noob':'xiaobai','Little Red':'xiaohong','Little Blue':'xiaolan','Little Green':'xiaolù'}
    print('name_pingying_map map dictionary:', name_pingying_map)

    return name_pingying_map


#Main representation datadata_sheet = {
    'uid': [101, 102, 103],
    'name': ['noob', 'Little Red', 'Little Blue'],
}
df_sheet = (data_sheet)
print('Main Table:')
print(df_sheet)
# Update the name column of the main tablename_pingying_map = get_namePingyingMap()
df_sheet['name'] = df_sheet['name'].map(name_pingying_map).fillna(df_sheet['name'])
print("Updated main table:")
print(df_sheet)
# Create a mapping dictionary, map name to uidname_uid_map = df_sheet.set_index('name')['uid'].to_dict()
print('name_uid_map map dictionary:',name_uid_map)

# Sub-expression datadata_sheet1 = {
    'name': ['noob', 'Little Red', 'Little Green'],
    'zb_uid': [None,None,None]
}
df_sheet1 = (data_sheet1)
print('Subtable:')
print(df_sheet1)

# Update the name column of the subtabledf_sheet1['name'] = df_sheet1['name'].map(name_pingying_map).fillna(df_sheet1['name'])
# Update the zb_uid column of the subtabledf_sheet1['zb_uid'] = df_sheet1['name'].map(name_uid_map).fillna(df_sheet1['zb_uid'])

print("Updated subtable:")
print(df_sheet1)

Running results:

Main table:
uid name
0 101 Xiaobai
1 102 Xiaohong
2 103 Xiaolan
name_pingying_map mapping dictionary: {'Xiaobai': 'xiaobai', 'Xiaohong': 'xiaohong', 'Xiaolan': 'xiaolan', 'Xiaogreen': 'xiaolù'}
Updated main table:
uid name
0 101 xiaobai
1 102 xiaohong
2 103 xiaolan
name_uid_map map dictionary: {'xiaobai': 101, 'xiaohong': 102, 'xiaolan': 103}
Sub-table:
name zb_uid
0 Xiaobai None
1 Xiaohong None
2 Little Green None
Updated subtable:
name zb_uid
0 xiaobai 101.0
1 xiaohong 102.0
2 xiaolù NaN

This is the article about Pandas' tips for updating sub-table columns using the main table. For more related Pandas to update sub-table columns, please search for my previous articles or continue browsing the related articles below. I hope everyone will support me in the future!