SoFunction
Updated on 2024-11-18

Converting between DataFrames and dictionaries in pandas

1. Converting a dict into a DataFrame

The right conversion method depends on the shape of the dict. The main tool is DataFrame.from_dict; its official documentation reads:

  • pandas.DataFrame.from_dict
    • classmethod DataFrame.from_dict(data, orient='columns', dtype=None, columns=None)
    • Construct DataFrame from dict of array-like or dicts.
    • Creates DataFrame object from dictionary by columns or by index allowing dtype specification.
    • Parameters
      • data [dict] Of the form {field : array-like} or {field : dict}.
      • orient [{'columns', 'index'}, default 'columns'] The "orientation" of the data. If the
        keys of the passed dict should be the columns of the resulting DataFrame, pass 'columns' (default). Otherwise if the keys should be rows, pass 'index'.
      • dtype [dtype, default None] Data type to force, otherwise infer.
      • columns [list, default None] Column labels to use when orient='index'. Raises
        a ValueError if used with orient='columns'.
    • Returns
      • DataFrame
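A minimal sketch of the two remaining parameters from the signature above, using a small hypothetical dict named `data`: `dtype` forces one type on every column, and `columns` is rejected when orient='columns', exactly as the documentation states.

```python
import pandas as pd

# Hypothetical sample data: keys become columns under the default orient='columns'
data = {'a': [1, 2], 'b': [3, 4]}

# dtype forces the element type of every column (the ints become float64 here)
df_float = pd.DataFrame.from_dict(data, dtype='float64')
print(df_float.dtypes)

# columns is only valid with orient='index'; with orient='columns' it raises ValueError
try:
    pd.DataFrame.from_dict(data, columns=['x', 'y'])
except ValueError as exc:
    print('ValueError:', exc)
```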

1.1 The dict values are scalars (non-iterable objects)

1. Using from_dict

When the values are scalars, from_dict must be called with orient='index'. With the default orient='columns' the keys would have to become column labels, but scalar values give pandas no row index to build, so it raises a ValueError.

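import pandas as pd

dic = {'name': 'abc', 'age': 18, 'job': 'teacher'}
# Keys become the row index; the values fill a single column labelled 0
df = pd.DataFrame.from_dict(dic, orient='index')
print(df)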

Out:

            0
name      abc
age        18
job   teacher
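The sketch below, reusing the same `dic`, puts the failing default call and the working call side by side. The exact wording of the error message may differ between pandas versions, so it is only printed, not relied on.

```python
import pandas as pd

dic = {'name': 'abc', 'age': 18, 'job': 'teacher'}

# Default orient='columns': every value is a scalar, so there is no row index to use
try:
    pd.DataFrame.from_dict(dic)
except ValueError as exc:
    print('ValueError:', exc)

# orient='index': keys become row labels, values fill one column labelled 0
df = pd.DataFrame.from_dict(dic, orient='index')
print(df.shape)
```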

2. Indirect conversion through a Series

First convert the dict to a Series, then convert the Series to a DataFrame, and finally reset the index and rename the resulting 'index' column to 'key'.

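import pandas as pd

dic = {'name': 'abc', 'age': 18, 'job': 'teacher'}
# dict -> Series (keys become the index) -> one-column DataFrame named 'value'
df = pd.DataFrame(pd.Series(dic), columns=['value'])
df = df.reset_index().rename(columns={'index': 'key'})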
print(df)

Out:

    key    value
0  name      abc
1   age       18
2   job  teacher
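The same table can also be built in one chain with `Series.rename_axis` and the `name` argument of `Series.reset_index`. This is an alternative idiom offered as a sketch, not the method used above.

```python
import pandas as pd

dic = {'name': 'abc', 'age': 18, 'job': 'teacher'}

# Name the Series index 'key', then move it into a column; the values column is named 'value'
df = pd.Series(dic).rename_axis('key').reset_index(name='value')
print(df)
```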

1.2 The dict values are lists

1.2.1 When orient is not specified, the keys become the column names by default (column orientation)

import pandas as pd

dic = {'color': ['blue', 'green', 'orange', 'yellow'], 'size': [15, 20, 20, 25]}
df = pd.DataFrame.from_dict(dic)  # each list becomes one column
print(df)

Out:

    color  size
0    blue    15
1   green    20
2  orange    20
3  yellow    25
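With the default column orientation every list becomes one column, so all lists must have the same length. A minimal sketch with a hypothetical `ragged` dict whose lists differ in length:

```python
import pandas as pd

# Hypothetical input: 'size' has one value fewer than 'color'
ragged = {'color': ['blue', 'green', 'orange', 'yellow'], 'size': [15, 20, 20]}

try:
    pd.DataFrame.from_dict(ragged)
except ValueError as exc:
    # pandas refuses to build columns of different lengths
    print('ValueError:', exc)
```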

1.2.2 When orient='index' is specified, the keys become the row labels (row orientation)

import pandas as pd

dic = {'color': ['blue', 'green', 'orange', 'yellow'], 'size': [15, 20, 20, 25]}
# Each list becomes one row; columns names the positions A-D
df = pd.DataFrame.from_dict(dic, orient='index', columns=list('ABCD'))
print(df)

Out:

          A      B       C       D
color  blue  green  orange  yellow
size     15     20      20      25

Summary
The orient argument decides what the dict keys become: with the default orient='columns' the keys become column labels, and with orient='index' they become the row index.

1.3 The dict values are dicts (nested dicts)

1.3.1 With the default orient, the outer keys become the columns

import pandas as pd

dic = {'Jack': {'hobby': 'football', 'age': 19},
       'Tom': {'hobby': 'basketball', 'age': 24},
       'Lucy': {'hobby': 'swimming', 'age': 20},
       'Lily': {'age': 21}}
df = pd.DataFrame.from_dict(dic)
print(df)

Out:

           Jack         Tom      Lucy  Lily
age          19          24        20  21.0
hobby  football  basketball  swimming   NaN

With a nested dict, the outer keys become the columns and the inner keys become the row labels. Any missing combination (Lily has no 'hobby') is filled with NaN.
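With the default orient, current pandas versions pass the nested dict straight on to the DataFrame constructor, so the two calls in this sketch should produce the same frame. It also checks the NaN cell for Lily's missing 'hobby'.

```python
import pandas as pd

dic = {'Jack': {'hobby': 'football', 'age': 19},
       'Tom': {'hobby': 'basketball', 'age': 24},
       'Lucy': {'hobby': 'swimming', 'age': 20},
       'Lily': {'age': 21}}

via_from_dict = pd.DataFrame.from_dict(dic)
via_constructor = pd.DataFrame(dic)

print(via_from_dict.equals(via_constructor))
# Lily has no 'hobby' key, so that cell is missing
print(pd.isna(via_from_dict.loc['hobby', 'Lily']))
# Lily's column holds a number and a NaN, so it is stored as float64
print(via_from_dict['Lily'].dtype)
```

The last line explains the 21.0 in the output above: a NaN forces Lily's otherwise integer column to float64.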

1.3.2 With orient='index', the inner keys become the columns and the outer keys become the index

Changing orient from its default 'columns' to 'index' swaps the roles: the inner keys become the DataFrame's columns and the outer keys become its index.

import pandas as pd

dic = {'Jack': {'hobby': 'football', 'age': 19},
       'Tom': {'hobby': 'basketball', 'age': 24},
       'Lucy': {'hobby': 'swimming', 'age': 20},
       'Lily': {'age': 21}}
df = pd.DataFrame.from_dict(dic, orient='index')
print(df)

Out:

           hobby  age
Jack    football   19
Lily         NaN   21
Lucy    swimming   20
Tom   basketball   24
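orient='index' is essentially the transpose of the default layout. The sketch below compares the two on labels and on a sample cell rather than on exact layout, because the order pandas picks for the union of inner keys may vary between versions.

```python
import pandas as pd

dic = {'Jack': {'hobby': 'football', 'age': 19},
       'Tom': {'hobby': 'basketball', 'age': 24},
       'Lucy': {'hobby': 'swimming', 'age': 20},
       'Lily': {'age': 21}}

# Inner keys -> columns, outer keys -> index
by_index = pd.DataFrame.from_dict(dic, orient='index')
# Default layout, then transposed so the people become rows
by_columns_t = pd.DataFrame.from_dict(dic).T

# Same labels on both axes (order may differ) and the same cell values
print(sorted(by_index.index) == sorted(by_columns_t.index))
print(sorted(by_index.columns) == sorted(by_columns_t.columns))
print(by_index.loc['Tom', 'hobby'], by_columns_t.loc['Tom', 'hobby'])
print(by_index['age'].dtype, by_columns_t['age'].dtype)
```

One practical difference shows in the last line: transposing a mixed-type frame leaves every column as object dtype, whereas orient='index' lets 'age' be its own integer column.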

Note
With a nested dict and orient='index', the columns argument cannot rename the columns. Instead it selects columns by label: labels that already exist (such as 'age') are kept, and labels that do not exist (such as 'A') are added as columns filled with NaN.

import pandas as pd

dic = {'Jack': {'hobby': 'football', 'age': 19},
       'Tom': {'hobby': 'basketball', 'age': 24},
       'Lucy': {'hobby': 'swimming', 'age': 20},
       'Lily': {'age': 21}}
df = pd.DataFrame.from_dict(dic, orient='index', columns=['age', 'A'])
print(df)

Out:

      age    A
Jack   19  NaN
Lily   21  NaN
Lucy   20  NaN
Tom    24  NaN
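Because `columns` only selects labels here, renaming has to happen after construction. A minimal sketch using `DataFrame.rename`; the new names 'Age' and 'Hobby' are purely illustrative.

```python
import pandas as pd

dic = {'Jack': {'hobby': 'football', 'age': 19},
       'Tom': {'hobby': 'basketball', 'age': 24},
       'Lucy': {'hobby': 'swimming', 'age': 20},
       'Lily': {'age': 21}}

# Step 1: columns selects (and orders) existing inner keys
df = pd.DataFrame.from_dict(dic, orient='index', columns=['age', 'hobby'])
# Step 2: rename the selected columns (illustrative names)
df = df.rename(columns={'age': 'Age', 'hobby': 'Hobby'})
print(df)
```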

2. Converting a DataFrame into a dict

The official documentation for DataFrame.to_dict:

  • pandas.DataFrame.to_dict

    • DataFrame.to_dict(orient='dict', into=<class 'dict'>)
    • Convert the DataFrame to a dictionary.
    • The type of the key-value pairs can be customized with the parameters (see below).
    • Parameters
      • orient [str {'dict', 'list', 'series', 'split', 'records', 'index'}] Determines the type of the
        values of the dictionary.
        • 'dict' (default) : dict like {column -> {index -> value}}
        • 'list' : dict like {column -> [values]}
        • 'series' : dict like {column -> Series(values)}
        • 'split' : dict like {'index' -> [index], 'columns' -> [columns], 'data' -> [values]}
        • 'records' : list like [{column -> value}, . . . , {column -> value}]
        • 'index' : dict like {index -> {column -> value}}
        Abbreviations are allowed. s indicates series and sp indicates split.
      • into [class, default dict] The collections.abc.Mapping subclass used for all Mappings
        in the return value. Can be the actual class or an empty instance of the mapping
        type you want. If you want a collections.defaultdict, you must pass it initialized.
    • Returns: dict, list or collections.abc.Mapping. Return a collections.abc.Mapping object representing the DataFrame. The resulting transformation depends on the orient parameter.
  • In practice only the orient parameter needs to be set, and each value builds the result in a different way. The documentation lists six options, one of which ('records') returns a list instead of a dict:

    • orient='dict' (the default): {column (column name): {index (row label): value}};
    • orient='list': {column (column name): [values]};
    • orient='series': {column (column name): Series(values)};
    • orient='split': {'index': [index], 'columns': [columns], 'data': [values]};
    • orient='records': a list, [{column (column name): value}, ..., {column: value}];
    • orient='index': {index (row label): {column (column name): value}};
  • Note: above, value is a value in the table, column is a column name, and index is a row label.
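To see the return types side by side, this sketch calls to_dict with each of the six orient values listed above on a small frame (the same one used in the examples that follow) and records the type of each result.

```python
import pandas as pd

df = pd.DataFrame({'col_1': [5, 6, 7], 'col_2': [0.35, 0.96, 0.55]},
                  index=['row1', 'row2', 'row3'])

result_types = {}
for orient in ['dict', 'list', 'series', 'split', 'records', 'index']:
    # Only the container type is recorded; the contents differ per orient
    result_types[orient] = type(df.to_dict(orient=orient)).__name__
    print(orient, '->', result_types[orient])
```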

import pandas as pd

df = pd.DataFrame({'col_1': [5, 6, 7], 'col_2': [0.35, 0.96, 0.55]}, index=['row1', 'row2', 'row3'])
print(df)

Out:

      col_1  col_2
row1      5   0.35
row2      6   0.96
row3      7   0.55
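The `into` parameter from the documentation above controls which mapping class is returned. A sketch on the same sample frame: a class such as `OrderedDict` can be passed directly, while a `defaultdict` has to be passed as an initialized instance.

```python
from collections import OrderedDict, defaultdict

import pandas as pd

df = pd.DataFrame({'col_1': [5, 6, 7], 'col_2': [0.35, 0.96, 0.55]},
                  index=['row1', 'row2', 'row3'])

# A mapping class can be passed as-is; it is used for the outer and inner mappings
ordered = df.to_dict(into=OrderedDict)
# defaultdict needs its default_factory, so an initialized instance is passed
records = df.to_dict(orient='records', into=defaultdict(list))

print(type(ordered).__name__, type(ordered['col_1']).__name__)
print(type(records).__name__, type(records[0]).__name__)
```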

2.1 orient='list'

{column (column name): [values]}
Produces a dict in which each key is a column name and each value is the list of that column's values.

result = df.to_dict(orient='list')  # a new name keeps df available for the next sections
print(result)

Out:

{'col_1': [5, 6, 7], 'col_2': [0.35, 0.96, 0.55]}
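The 'list' output drops the row labels, so turning it back into the same DataFrame requires passing the index again. A short round-trip sketch:

```python
import pandas as pd

df = pd.DataFrame({'col_1': [5, 6, 7], 'col_2': [0.35, 0.96, 0.55]},
                  index=['row1', 'row2', 'row3'])

as_lists = df.to_dict(orient='list')
# The row labels are not part of the 'list' output, so pass them back explicitly
restored = pd.DataFrame(as_lists, index=df.index)
print(restored.equals(df))
```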

2.2 orient='dict'

{column (column name): {index (row label): value}}

result = df.to_dict(orient='dict')
print(result)

Out:

{'col_1': {'row1': 5, 'row2': 6, 'row3': 7}, 'col_2': {'row1': 0.35, 'row2': 0.96, 'row3': 0.55}}
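Because the 'dict' layout is exactly the nested-dict input described in section 1.3.1, from_dict with its default orient should rebuild the original frame. A round-trip sketch:

```python
import pandas as pd

df = pd.DataFrame({'col_1': [5, 6, 7], 'col_2': [0.35, 0.96, 0.55]},
                  index=['row1', 'row2', 'row3'])

as_dict = df.to_dict(orient='dict')
# Outer keys become columns and inner keys become the index
restored = pd.DataFrame.from_dict(as_dict)
print(restored.equals(df))
```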

2.3 orient='series'

{column (column name): Series(values)}
The only difference from orient='list' is that each value is a pandas Series (which keeps the row labels and the dtype) instead of a plain list.

result = df.to_dict(orient='series')
print(result)

Out:

{'col_1': row1    5
row2    6
row3    7
Name: col_1, dtype: int64, 'col_2': row1    0.35
row2    0.96
row3    0.55
Name: col_2, dtype: float64}
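Since each value is a full Series, the row labels and dtypes survive, and the dict can be passed straight back to the DataFrame constructor. A sketch:

```python
import pandas as pd

df = pd.DataFrame({'col_1': [5, 6, 7], 'col_2': [0.35, 0.96, 0.55]},
                  index=['row1', 'row2', 'row3'])

as_series = df.to_dict(orient='series')
col = as_series['col_1']
# Each value keeps its index and dtype
print(type(col).__name__, col.dtype, list(col.index))

restored = pd.DataFrame(as_series)
print(restored.equals(df))
```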

2.4 orient='split'

{'index': [index], 'columns': [columns], 'data': [values]}. orient='split' produces three key-value pairs, one each for the row labels, the column names and the values; all three values are lists, and 'data' holds one list per row.

result = df.to_dict(orient='split')
print(result)

Out:

{'index': ['row1', 'row2', 'row3'], 'columns': ['col_1', 'col_2'], 'data': [[5, 0.35], [6, 0.96], [7, 0.55]]}
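The three keys of the 'split' output happen to match the `data`, `index` and `columns` keyword arguments of the DataFrame constructor, so the dict can be unpacked directly. A round-trip sketch:

```python
import pandas as pd

df = pd.DataFrame({'col_1': [5, 6, 7], 'col_2': [0.35, 0.96, 0.55]},
                  index=['row1', 'row2', 'row3'])

as_split = df.to_dict(orient='split')
# 'index', 'columns' and 'data' map onto DataFrame(data=..., index=..., columns=...)
restored = pd.DataFrame(**as_split)
print(restored)
```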

2.5 orient='records'

[{column: value}, {column: value}, ..., {column: value}]. Note that orient='records' returns a list rather than a dict: it holds one dict per row, mapping every column name to that row's value.

result = df.to_dict(orient='records')
print(result)

Out:

[{'col_1': 5, 'col_2': 0.35}, {'col_1': 6, 'col_2': 0.96}, {'col_1': 7, 'col_2': 0.55}]

The advantage of this layout is that it makes it easy to get the {column: value} dict for a particular row; for example, for the row at position 1 (the second row, row2):

print(df.to_dict('records')[1])

Out:

{'col_1': 6, 'col_2': 0.96}
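The list of row dicts is convenient for iterating row by row, but it drops the row labels. A sketch that loops over the records and then rebuilds the frame, passing the original index back in:

```python
import pandas as pd

df = pd.DataFrame({'col_1': [5, 6, 7], 'col_2': [0.35, 0.96, 0.55]},
                  index=['row1', 'row2', 'row3'])

records = df.to_dict(orient='records')
for position, row in enumerate(records):
    print(position, row['col_1'], row['col_2'])

# Without index=..., the rebuilt frame would get a fresh 0..n-1 index
restored = pd.DataFrame(records, index=df.index)
print(restored)
```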

2.6 orient='index'

{index: {column: value}}

orient='index' is the transpose of orient='dict': the outer keys are row labels, and each inner dict maps the column names to that row's values. Looking up a row therefore works much like with orient='records', except that rows are keyed by label instead of by position:

print(df.to_dict('index'))

Out:

{'row1': {'col_1': 5, 'col_2': 0.35}, 'row2': {'col_1': 6, 'col_2': 0.96}, 'row3': {'col_1': 7, 'col_2': 0.55}}

To get the {column: value} dict for the row labelled row1:

print(df.to_dict('index')['row1'])

Out:

{'col_1': 5, 'col_2': 0.35}
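This output is the nested-dict input of section 1.3.2, so from_dict with orient='index' should turn it back into the original frame. A round-trip sketch:

```python
import pandas as pd

df = pd.DataFrame({'col_1': [5, 6, 7], 'col_2': [0.35, 0.96, 0.55]},
                  index=['row1', 'row2', 'row3'])

as_index = df.to_dict(orient='index')
# Outer keys are row labels, so orient='index' is needed to rebuild the frame
restored = pd.DataFrame.from_dict(as_index, orient='index')
print(restored)
```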
