How to split a pandas column of strings into multiple columns
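pandas offers two string methods for this:
- str.split(): split by delimiter
- str.extract(): split by regular expression
Both are accessed through the str accessor, so they apply to a pandas.Series or to a single DataFrame column (which is itself a Series).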
str.split(): split by delimiter
To split by a delimiter, use the string method str.split().
Take the following Series as an example.
import pandas as pd

s_org = pd.Series(['aaa@xxx.com', 'bbb@yyy.com', 'ccc@zzz.com', 'ddd'], index=['A', 'B', 'C', 'D'])
print(s_org)
print(type(s_org))
# A    aaa@xxx.com
# B    bbb@yyy.com
# C    ccc@zzz.com
# D            ddd
# dtype: object
# <class 'pandas.core.series.Series'>
Pass the delimiter as the first argument. Each element becomes a list of the split strings.
s = s_org.str.split('@')
print(s)
print(type(s))
# A    [aaa, xxx.com]
# B    [bbb, yyy.com]
# C    [ccc, zzz.com]
# D             [ddd]
# dtype: object
# <class 'pandas.core.series.Series'>
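When only one piece of each list is needed, the lists can be indexed without expanding them into columns. A minimal sketch using the same sample data: Series.str.get() and str[...] indexing work on list elements, and a position that does not exist in a list (row D has no domain part) comes back as NaN.

```python
import pandas as pd

# Same sample data as above.
s = pd.Series(['aaa@xxx.com', 'bbb@yyy.com', 'ccc@zzz.com', 'ddd'],
              index=['A', 'B', 'C', 'D'])

parts = s.str.split('@')      # Series whose elements are lists
local = parts.str.get(0)      # first piece of each list
domain = parts.str[1]         # second piece; missing position -> NaN
print(local.tolist())
print(domain.tolist())
```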
Specifying expand=True splits the strings into separate columns and returns a DataFrame. The default is expand=False.
Where an element yields fewer pieces than the others, the missing cells are filled with None.
df = s_org.str.split('@', expand=True)
print(df)
print(type(df))
#      0        1
# A  aaa  xxx.com
# B  bbb  yyy.com
# C  ccc  zzz.com
# D  ddd     None
# <class 'pandas.core.frame.DataFrame'>
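With expand=True the widest element decides how many columns appear. If only the first split matters, the n argument of str.split() limits the number of splits per element. A small sketch on made-up strings (not the article's data):

```python
import pandas as pd

s = pd.Series(['aaa.bbb.ccc', 'ddd.eee', 'fff'])

# Split on every '.': three columns, shorter rows padded with missing values.
all_parts = s.str.split('.', expand=True)
# n=1: at most one split per element, so at most two columns.
first_split = s.str.split('.', n=1, expand=True)
print(all_parts)
print(first_split)
```

str.rsplit() accepts the same n argument when the split should start from the right instead.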
You can name the resulting columns by assigning a list to the columns attribute.
df.columns = ['local', 'domain']
print(df)
#   local   domain
# A   aaa  xxx.com
# B   bbb  yyy.com
# C   ccc  zzz.com
# D   ddd     None
Replacing one column with the several columns produced by a split takes a few steps. The approach below works, although there may be a better way.
Take the DataFrame created above as an example.
print(df)
#   local   domain
# A   aaa  xxx.com
# B   bbb  yyy.com
# C   ccc  zzz.com
# D   ddd     None
Apply str.split() to the column you want to split.
print(df['domain'].str.split('.', expand=True))
#       0     1
# A   xxx   com
# B   yyy   com
# C   zzz   com
# D  None  None
Use pd.concat() to join the result with the original DataFrame, then remove the original column with the drop() method.
df2 = pd.concat([df, df['domain'].str.split('.', expand=True)], axis=1).drop('domain', axis=1)
print(df2)
#   local     0     1
# A   aaa   xxx   com
# B   bbb   yyy   com
# C   ccc   zzz   com
# D   ddd  None  None
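A shorter route, assuming a reasonably recent pandas version, is to assign the expanded result straight into new columns named by a list, then drop the source column. The column names second_LD and TLD are taken from the renaming step later in this article; the rows are assigned by index and the columns by position.

```python
import pandas as pd

# Same shape as the df used above (local / domain columns).
df = pd.DataFrame({'local': ['aaa', 'bbb', 'ccc', 'ddd'],
                   'domain': ['xxx.com', 'yyy.com', 'zzz.com', None]},
                  index=['A', 'B', 'C', 'D'])

# The two split columns are written into two new columns in one step.
df[['second_LD', 'TLD']] = df['domain'].str.split('.', expand=True)
df = df.drop(columns='domain')
print(df)
```

This only works when the split yields exactly as many columns as names in the list; otherwise pandas raises an error.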
If only a few columns remain, it is simpler to pass just the columns you need to pd.concat().
df3 = pd.concat([df['local'], df['domain'].str.split('.', expand=True)], axis=1)
print(df3)
#   local     0     1
# A   aaa   xxx   com
# B   bbb   yyy   com
# C   ccc   zzz   com
# D   ddd  None  None
To rename specific columns, use the rename() method.
df3.rename(columns={0: 'second_LD', 1: 'TLD'}, inplace=True)
print(df3)
#   local  second_LD   TLD
# A   aaa        xxx   com
# B   bbb        yyy   com
# C   ccc        zzz   com
# D   ddd       None  None
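The concat, drop and rename steps can be bundled into one function. The sketch below uses a hypothetical helper, split_column(), that is not part of pandas; it also keeps the new columns at the position of the column they replace, which concat followed by drop does not do. The extra note column exists only to show the ordering.

```python
import pandas as pd

def split_column(df, col, sep, names):
    """Hypothetical helper: replace `col` with the pieces of splitting it
    on `sep`, labelled with `names`, at the position `col` occupied."""
    parts = df[col].str.split(sep, expand=True)
    parts.columns = names            # assumes exactly len(names) pieces
    pos = df.columns.get_loc(col)    # integer position of the source column
    left = df.iloc[:, :pos]
    right = df.iloc[:, pos + 1:]
    return pd.concat([left, parts, right], axis=1)

df = pd.DataFrame({'local': ['aaa', 'bbb', 'ccc', 'ddd'],
                   'domain': ['xxx.com', 'yyy.com', 'zzz.com', None],
                   'note': ['x', 'y', 'z', 'w']},
                  index=['A', 'B', 'C', 'D'])
df3 = split_column(df, 'domain', '.', ['second_LD', 'TLD'])
print(df3)
```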
Reference article: Changing the row and column names of a pandas.DataFrame
str.extract(): split by regular expression
To split with a regular expression, use the string method str.extract().
Take the same Series as an example.
import pandas as pd

s_org = pd.Series(['aaa@xxx.com', 'bbb@yyy.com', 'ccc@zzz.com', 'ddd'], index=['A', 'B', 'C', 'D'])
print(s_org)
# A    aaa@xxx.com
# B    bbb@yyy.com
# C    ccc@zzz.com
# D            ddd
# dtype: object
Pass the regular expression as the first argument. The pattern is searched for in each string (it does not have to match the whole string), and the text captured by each group enclosed in () becomes a separate column.
When the pattern has more than one group, a DataFrame is returned regardless of the expand argument.
Rows with no match are filled with NaN.
df = s_org.str.extract(r'(.+)@(.+)\.(.+)', expand=True)
print(df)
#      0    1    2
# A  aaa  xxx  com
# B  bbb  yyy  com
# C  ccc  zzz  com
# D  NaN  NaN  NaN

df = s_org.str.extract(r'(.+)@(.+)\.(.+)', expand=False)
print(df)
#      0    1    2
# A  aaa  xxx  com
# B  bbb  yyy  com
# C  ccc  zzz  com
# D  NaN  NaN  NaN
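The points above can be checked directly. This sketch runs the same three-group pattern with both expand settings, confirms that each result is a DataFrame, and uses the NaN rows to find which strings did not match. The pattern is a raw string so that \. reaches the regex engine unchanged.

```python
import pandas as pd

s = pd.Series(['aaa@xxx.com', 'bbb@yyy.com', 'ccc@zzz.com', 'ddd'],
              index=['A', 'B', 'C', 'D'])
pattern = r'(.+)@(.+)\.(.+)'

# Three groups: a DataFrame comes back whatever expand is set to.
results = {e: s.str.extract(pattern, expand=e) for e in (True, False)}
for e, out in results.items():
    print(e, type(out).__name__, out.shape)

# A row where every group is non-missing is a row where the pattern matched.
matched = results[True].notna().all(axis=1)
print(matched.tolist())
```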
If the pattern has only one group, a DataFrame is returned when expand=True and a Series when expand=False.
df_single = s_org.str.extract(r'(\w+)', expand=True)
print(df_single)
print(type(df_single))
#      0
# A  aaa
# B  bbb
# C  ccc
# D  ddd
# <class 'pandas.core.frame.DataFrame'>

s = s_org.str.extract(r'(\w+)', expand=False)
print(s)
print(type(s))
# A    aaa
# B    bbb
# C    ccc
# D    ddd
# dtype: object
# <class 'pandas.core.series.Series'>
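The single-group case is handy for adding one new column. In this sketch (patterns and the email/local/tld column names are illustrative), expand=False yields a Series that can be assigned directly, while with expand=True the one-column DataFrame's column 0 is selected first. Strings without a match, such as 'ddd', become NaN.

```python
import pandas as pd

df = pd.DataFrame({'email': ['aaa@xxx.com', 'bbb@yyy.com', 'ccc@zzz.com', 'ddd']},
                  index=['A', 'B', 'C', 'D'])

# One group + expand=False -> Series, assignable as a single column.
df['local'] = df['email'].str.extract(r'([^@]+)@', expand=False)
# One group + expand=True -> one-column DataFrame; take column 0 as a Series.
df['tld'] = df['email'].str.extract(r'\.(\w+)$', expand=True)[0]
print(df)
```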
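In pandas 0.22.0, the current version when this article was written, omitting expand behaves like expand=False and emits the warning below. Later versions did make expand=True the default, so passing expand explicitly keeps the result type predictable.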
FutureWarning: currently extract(expand=None) means expand=False (return Index/Series/DataFrame)
but in a future version of pandas this will be changed to expand=True (return DataFrame)
If the pattern uses named groups (?P<name>...), the group names are used as the column names.
df_name = s_org.str.extract(r'(?P<local>.*)@(?P<second_LD>.*)\.(?P<TLD>.*)', expand=True)
print(df_name)
#   local  second_LD  TLD
# A   aaa        xxx  com
# B   bbb        yyy  com
# C   ccc        zzz  com
# D   NaN        NaN  NaN
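Named and unnamed groups can also be mixed. In this sketch the named groups give their names to the columns, while the unnamed middle group is labelled with its zero-based position (1), which is how pandas labels unnamed groups.

```python
import pandas as pd

s = pd.Series(['aaa@xxx.com', 'bbb@yyy.com', 'ccc@zzz.com', 'ddd'],
              index=['A', 'B', 'C', 'D'])

# Groups 0 and 2 are named; group 1 is not.
out = s.str.extract(r'(?P<local>[^@]+)@([^.]+)\.(?P<TLD>\w+)', expand=True)
print(out.columns.tolist())
print(out)
```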
To replace a specific column with the extracted columns, proceed as in the str.split() example above: join the result to the original with pd.concat() and remove the original column with the drop() method.
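Put together, the extract-then-concat workflow might look like the sketch below. The DataFrame, its id and email columns, and the stricter pattern ([^@]+, [^.]+) are illustrative assumptions. Because the named groups already supply column names, no rename() step is needed.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3, 4],
                   'email': ['aaa@xxx.com', 'bbb@yyy.com', 'ccc@zzz.com', 'ddd']},
                  index=['A', 'B', 'C', 'D'])

pattern = r'(?P<local>[^@]+)@(?P<second_LD>[^.]+)\.(?P<TLD>\w+)'
parts = df['email'].str.extract(pattern, expand=True)
# Align on the shared index, then remove the source column.
df2 = pd.concat([df, parts], axis=1).drop('email', axis=1)
print(df2)
```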