1. dict into DataFrame
The conversion method depends on the form of the dict. The main tool is DataFrame.from_dict; the examples below assume pandas has been imported with import pandas as pd. The official documentation reads as follows:
pandas.DataFrame.from_dict
classmethod DataFrame.from_dict(data, orient='columns', dtype=None, columns=None)
Construct DataFrame from dict of array-like or dicts.
Creates DataFrame object from dictionary by columns or by index allowing dtype specification.

Parameters
- data [dict] Of the form {field : array-like} or {field : dict}.
- orient [{'columns', 'index'}, default 'columns'] The "orientation" of the data. If the keys of the passed dict should be the columns of the resulting DataFrame, pass 'columns' (default). Otherwise if the keys should be rows, pass 'index'.
- dtype [dtype, default None] Data type to force, otherwise infer.
- columns [list, default None] Column labels to use when orient='index'. Raises a ValueError if used with orient='columns'.

Returns
- DataFrame
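As a quick check of the dtype and columns parameters described above, here is a minimal sketch. The dict named data is a made-up example, not taken from the rest of this article.

```python
import pandas as pd

# Hypothetical sample data, used only for this illustration.
data = {'a': [1, 2], 'b': [3, 4]}

# dtype forces a type on the result instead of letting pandas infer one.
df = pd.DataFrame.from_dict(data, dtype='float64')
print(df.dtypes)

# columns is only accepted together with orient='index';
# combining it with orient='columns' raises a ValueError.
try:
    pd.DataFrame.from_dict(data, orient='columns', columns=['x', 'y'])
    raised = False
except ValueError as exc:
    raised = True
    print('ValueError:', exc)
```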
1.1 The values of the dict are non-iterable objects (scalars)
1. Using from_dict
With from_dict you must set orient='index'. Otherwise pandas raises an error: scalar values carry no row index, so the dict keys cannot be used as columns.
import pandas as pd
dic = {'name': 'abc', 'age': 18, 'job': 'teacher'}
df = pd.DataFrame.from_dict(dic, orient='index')
print(df)
Out:
0
name abc
age 18
job teacher
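The error mentioned above can be reproduced with a small sketch like the following. The one-row alternative at the end (wrapping the dict in a list) is an addition for illustration, not something the original example uses.

```python
import pandas as pd

dic = {'name': 'abc', 'age': 18, 'job': 'teacher'}

# With the default orient='columns', all-scalar values give pandas
# no row index to work with, so the call fails with a ValueError.
try:
    pd.DataFrame.from_dict(dic)
    failed = False
except ValueError as exc:
    failed = True
    print('ValueError:', exc)

# If the keys should become columns, one option is to wrap the dict
# in a list, which yields a single-row DataFrame.
row = pd.DataFrame([dic])
print(row)
```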
2. Conversion via a Series
The dict is first converted to a Series, the Series is converted to a DataFrame, and then the index is reset and the columns are renamed.
dic = {'name': 'abc', 'age': 18, 'job': 'teacher'}
df = pd.DataFrame(pd.Series(dic), columns=['value'])
df = df.reset_index().rename(columns={'index': 'key'})
print(df)
Out:
key value
0 name abc
1 age 18
2 job teacher
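The same key/value layout can also be reached without the intermediate Series. The sketch below builds the frame directly from the dict's items; this is an alternative shown for comparison, not the method used above.

```python
import pandas as pd

dic = {'name': 'abc', 'age': 18, 'job': 'teacher'}

# Each (key, value) pair becomes one row; the column names are given explicitly.
df2 = pd.DataFrame(list(dic.items()), columns=['key', 'value'])
print(df2)
```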
1.2 The values of the dict are lists
1.2.1 When orient is not specified, the dict keys are used as column names by default (column-wise)
dic = {'color': ['blue', 'green', 'orange', 'yellow'], 'size': [15, 20, 20, 25]}
df = pd.DataFrame.from_dict(dic)
print(df)
Out:
color size
0 blue 15
1 green 20
2 orange 20
3 yellow 25
1.2.2 When orient='index' is specified, the dict keys are used as row labels (row-wise)
dic = {'color': ['blue', 'green', 'orange', 'yellow'], 'size': [15, 20, 20, 25]}
df = pd.DataFrame.from_dict(dic, orient='index', columns=list('ABCD'))
print(df)
Out:
A B C D
color blue green orange yellow
size 15 20 20 25
Summary:
Whatever orient specifies is what the dict keys become.
If orient='columns' (the default), the dict keys become the column names; if orient='index', the dict keys become the row index.
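One way to see the relationship between the two orientations is to compare shapes and check that one result is the transpose of the other. This is a small sketch using the same dict as above; the variable names by_columns and by_index are chosen here for illustration.

```python
import pandas as pd

dic = {'color': ['blue', 'green', 'orange', 'yellow'], 'size': [15, 20, 20, 25]}

by_columns = pd.DataFrame.from_dict(dic)                # keys -> columns
by_index = pd.DataFrame.from_dict(dic, orient='index')  # keys -> rows

print(by_columns.shape)  # (4, 2)
print(by_index.shape)    # (2, 4)

# The row-oriented frame holds the same values as the transposed
# column-oriented frame.
same = by_index.values.tolist() == by_columns.T.values.tolist()
print(same)
```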
1.3 The values of the dict are dicts
1.3.1 With the default orient, the outer keys are used as columns
dic = {'Jack': {'hobby': 'football', 'age': 19}, 'Tom': {'hobby': 'basketball', 'age': 24}, 'Lucy': {'hobby': 'swimming', 'age': 20}, 'Lily': {'age': 21}}
df = pd.DataFrame.from_dict(dic)
print(df)
Out:
Jack Tom Lucy Lily
age 19 24 20 21.0
hobby football basketball swimming NaN
This is the nested-dict form: the keys of the outer dict become the columns, the keys of the inner dicts become the row labels, and missing entries are filled with NaN.
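To confirm where the NaN comes from, the following sketch counts the missing values per column for the same nested dict. Only Lily lacks a 'hobby' entry, so only that column reports a gap.

```python
import pandas as pd

dic = {'Jack': {'hobby': 'football', 'age': 19},
       'Tom': {'hobby': 'basketball', 'age': 24},
       'Lucy': {'hobby': 'swimming', 'age': 20},
       'Lily': {'age': 21}}

df = pd.DataFrame.from_dict(dic)

# isna() marks missing cells; summing gives the count per column.
missing = df.isna().sum()
print(missing)

# The single gap is Lily's hobby.
print(pd.isna(df.loc['hobby', 'Lily']))
```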
1.3.2 When orient='index' is specified, the inner keys are the columns and the outer keys are the index
Changing orient from its default 'columns' to 'index' makes the inner keys the columns of the DataFrame and the outer keys its index.
dic = {'Jack': {'hobby': 'football', 'age': 19}, 'Tom': {'hobby': 'basketball', 'age': 24}, 'Lucy': {'hobby': 'swimming', 'age': 20}, 'Lily': {'age': 21}}
df = pd.DataFrame.from_dict(dic, orient='index')
print(df)
Out:
hobby age
Jack football 19
Lily NaN 21
Lucy swimming 20
Tom basketball 24
Note:
With a dict of dicts and orient='index', the columns parameter can no longer be used to rename the columns. If you set columns, it only selects from the columns that already exist in the DataFrame; a name that does not exist produces a column of NaN.
dic = {'Jack': {'hobby': 'football', 'age': 19}, 'Tom': {'hobby': 'basketball', 'age': 24}, 'Lucy': {'hobby': 'swimming', 'age': 20}, 'Lily': {'age': 21}}
df = pd.DataFrame.from_dict(dic, orient='index', columns=['age', 'A'])
print(df)
Out:
age A
Jack 19 NaN
Lily 21 NaN
Lucy 20 NaN
Tom 24 NaN
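If the goal really is to rename the columns, one option is to build the frame first and rename afterwards. This is a sketch using DataFrame.rename; the new names 'A' and 'B' are arbitrary placeholders.

```python
import pandas as pd

dic = {'Jack': {'hobby': 'football', 'age': 19},
       'Tom': {'hobby': 'basketball', 'age': 24},
       'Lucy': {'hobby': 'swimming', 'age': 20},
       'Lily': {'age': 21}}

df = pd.DataFrame.from_dict(dic, orient='index')

# rename maps existing column names to new ones and keeps the data.
renamed = df.rename(columns={'hobby': 'A', 'age': 'B'})
print(renamed)
```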
2. DataFrame into dict
DataFrame.to_dict official documentation:
pandas.DataFrame.to_dict
DataFrame.to_dict(orient='dict', into=<class 'dict'>)
Convert the DataFrame to a dictionary.
The type of the key-value pairs can be customized with the parameters (see below).

Parameters
- orient [str {'dict', 'list', 'series', 'split', 'records', 'index'}] Determines the type of the values of the dictionary.
  • 'dict' (default) : dict like {column -> {index -> value}}
  • 'list' : dict like {column -> [values]}
  • 'series' : dict like {column -> Series(values)}
  • 'split' : dict like {'index' -> [index], 'columns' -> [columns], 'data' -> [values]}
  • 'records' : list like [{column -> value}, ... , {column -> value}]
  • 'index' : dict like {index -> {column -> value}}
  Abbreviations are allowed. s indicates series and sp indicates split.
- into [class, default dict] The mapping subclass used for all Mappings in the return value. Can be the actual class or an empty instance of the mapping type you want. If you want a collections.defaultdict, you must pass it initialized.

Returns
- dict, list or Mapping. Return a Mapping object representing the DataFrame. The resulting transformation depends on the orient parameter.
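The into parameter is not used anywhere else in this article, so here is a minimal sketch of it, using the same sample frame as the examples below. It passes OrderedDict as a class and defaultdict as an initialized instance, as the documentation requires.

```python
from collections import OrderedDict, defaultdict

import pandas as pd

df = pd.DataFrame({'col_1': [5, 6, 7], 'col_2': [0.35, 0.96, 0.55]},
                  index=['row1', 'row2', 'row3'])

# A mapping class can be passed directly; it is used for every
# mapping in the result, including the nested ones.
ordered = df.to_dict(into=OrderedDict)
print(type(ordered), type(ordered['col_1']))

# A defaultdict has to be passed as an initialized instance.
records = df.to_dict(orient='records', into=defaultdict(list))
print(type(records[0]))
```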
In practice the only parameter you usually need to set is orient. Different orient values build the result in different ways. The official documentation lists six, one of which returns a list rather than a dict:
- orient='dict', the default, gives a dict of the form {column (column name) : {index (row name) : value}};
- orient='list' gives a dict of the form {column (column name) : [values]};
- orient='series' gives a dict of the form {column (column name) : Series(values)};
- orient='split' gives a dict of the form {'index' : [index], 'columns' : [columns], 'data' : [values]};
- orient='records' gives a list of the form [{column (column name) : value}, ..., {column : value}];
- orient='index' gives a dict of the form {index (row name) : {column (column name) : value}}.
Note: In the above, value represents the value in the data table, column represents the column name, index represents the row name.
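The six forms listed above can be compared side by side with a short loop. This sketch only prints the type of each result; the name result_types is introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'col_1': [5, 6, 7], 'col_2': [0.35, 0.96, 0.55]},
                  index=['row1', 'row2', 'row3'])

# Collect the type name of the return value for each orient.
result_types = {}
for orient in ['dict', 'list', 'series', 'split', 'records', 'index']:
    result = df.to_dict(orient=orient)
    result_types[orient] = type(result).__name__
    print(orient, '->', result_types[orient])
```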
df = pd.DataFrame({'col_1': [5, 6, 7], 'col_2': [0.35, 0.96, 0.55]}, index=['row1', 'row2', 'row3'])
print(df)
Out:
col_1 col_2
row1 5 0.35
row2 6 0.96
row3 7 0.55
2.1 orient='list'
{column (column name) : [values]}
Generates a dict of lists, in which each key is a column name and each value is the list of values in that column.
d = df.to_dict(orient='list')
print(d)
Out:
{'col_1': [5, 6, 7], 'col_2': [0.35, 0.96, 0.55]}
2.2 orient='dict'
{column (column name) : {index (row name) : value}}
d = df.to_dict(orient='dict')
print(d)
Out:
{'col_1': {'row1': 5, 'row2': 6, 'row3': 7}, 'col_2': {'row1': 0.35, 'row2': 0.96, 'row3': 0.55}}
2.3 orient='series'
{column (column name) : Series(values)}
The only difference between orient='series' and orient='list' is that each value here is a Series, whereas with orient='list' it is a list.
d = df.to_dict(orient='series')
print(d)
Out:
{'col_1': row1 5
row2 6
row3 7
Name: col_1, dtype: int64, 'col_2': row1 0.35
row2 0.96
row3 0.55
Name: col_2, dtype: float64}
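Because each value is a Series, the usual Series methods remain available on the result. The following sketch shows two such calls on the sample frame.

```python
import pandas as pd

df = pd.DataFrame({'col_1': [5, 6, 7], 'col_2': [0.35, 0.96, 0.55]},
                  index=['row1', 'row2', 'row3'])

d = df.to_dict(orient='series')

# Each value keeps the row labels and supports Series operations.
total = d['col_1'].sum()
largest_row = d['col_2'].idxmax()
print(total, largest_row)
```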
2.4 orient='split'
{'index' : [index], 'columns' : [columns], 'data' : [values]}
orient='split' returns three key-value pairs, one each for the row labels, the column names, and the values; all three are lists.
d = df.to_dict(orient='split')
print(d)
Out:
{'index': ['row1', 'row2', 'row3'], 'columns': ['col_1', 'col_2'], 'data': [[5, 0.35], [6, 0.96], [7, 0.55]]}
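The three keys match the data, index, and columns arguments of the DataFrame constructor, so the split form can be turned back into a DataFrame. This is a sketch of that round trip; the name rebuilt is chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'col_1': [5, 6, 7], 'col_2': [0.35, 0.96, 0.55]},
                  index=['row1', 'row2', 'row3'])

d = df.to_dict(orient='split')

# Feed the three parts back into the constructor.
rebuilt = pd.DataFrame(data=d['data'], index=d['index'], columns=d['columns'])
print(rebuilt)
```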
2.5 orient='records'
[{column : value}, {column : value}, ..., {column : value}]
Note that orient='records' returns a list rather than a dict. Each element maps every column name to the value in one row.
records = df.to_dict(orient='records')
print(records)
Out:
[{'col_1': 5, 'col_2': 0.35}, {'col_1': 6, 'col_2': 0.96}, {'col_1': 7, 'col_2': 0.55}]
The advantage of this form is that it is easy to get a {column : value} dict for a particular row. For example, to get the data for the row at position 1 (the second row):
print(df.to_dict('records')[1])
Out:
{'col_1': 6, 'col_2': 0.96}
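The records form is also convenient to loop over, but it does not carry the row labels. The sketch below iterates over the records and then rebuilds a DataFrame from them to show that the index falls back to 0, 1, 2.

```python
import pandas as pd

df = pd.DataFrame({'col_1': [5, 6, 7], 'col_2': [0.35, 0.96, 0.55]},
                  index=['row1', 'row2', 'row3'])

records = df.to_dict(orient='records')

# Each record is a plain dict for one row.
for rec in records:
    print(rec['col_1'], rec['col_2'])

# Rebuilding from records restores the columns but not the row labels.
rebuilt = pd.DataFrame(records)
print(rebuilt.index.tolist())
```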
2.6 orient='index'
{index : {column : value}}
orient='index' is the inverse of orient='dict': the outer keys are the row labels and the inner keys are the column names. It gives the mapping of column names to values for each row, similar to orient='records' but keyed by row label:
print(df.to_dict('index'))
Out:
{'row1': {'col_1': 5, 'col_2': 0.35}, 'row2': {'col_1': 6, 'col_2': 0.96}, 'row3': {'col_1': 7, 'col_2': 0.55}}
Query the {column : value} dict for the row named row1:
print(df.to_dict('index')['row1'])
Out:
{'col_1': 5, 'col_2': 0.35}
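This form is the same nested-dict shape that from_dict accepts with orient='index' (section 1.3.2), so the two methods can be chained into a round trip. A minimal sketch, with the name rebuilt chosen for illustration:

```python
import pandas as pd

df = pd.DataFrame({'col_1': [5, 6, 7], 'col_2': [0.35, 0.96, 0.55]},
                  index=['row1', 'row2', 'row3'])

d = df.to_dict(orient='index')

# The outer keys become the index again, the inner keys the columns.
rebuilt = pd.DataFrame.from_dict(d, orient='index')
print(rebuilt)
```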