SoFunction
Updated on 2024-11-10

How to Iterate over a DataFrame Using itertuples

Iterating over a DataFrame with itertuples

Recently, while working on a recommender system exercise, I needed to generate an item matrix and a user-item matrix, and found that the DataFrame method itertuples is a very convenient way to traverse the data.

The related iteration methods are:

  • iterrows(): iterates over the rows as (index, Series) pairs
  • iteritems(): iterates over the columns as (column name, Series) pairs (removed in pandas 2.0; use items() instead)
  • itertuples(): iterates over the rows as namedtuples
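To make the difference between the three iterators concrete, here is a minimal sketch. The DataFrame and its column names (`user`, `item`) are hypothetical sample data, not taken from the original post, and `items()` is used because `iteritems()` no longer exists in pandas 2.x.

```python
import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
df = pd.DataFrame({"user": ["u1", "u2"], "item": ["a", "b"]})

# iterrows(): one (index, Series) pair per row
rows = [(idx, row["user"], row["item"]) for idx, row in df.iterrows()]

# items(): one (column name, Series) pair per column
cols = [(name, list(col)) for name, col in df.items()]

# itertuples(): one namedtuple per row, with the index as the first field
tuples = [tuple(t) for t in df.itertuples()]

print(rows)
print(cols)
print(tuples)
```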

Examples are shown below:

The getattr() function retrieves the value of a named field from the tuple directly.
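The original example is not available, so the following is only a sketch of how getattr() might be used with itertuples(). The sample data and the `wanted` list are assumptions made for illustration; getattr() is mostly useful when the column name is held in a variable.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"name": ["alice", "bob"], "account": ["a01", "b02"]})

wanted = ["name", "account"]  # column names stored as strings

values = []
for row in df.itertuples():
    # getattr(row, "name") is equivalent to row.name
    values.append([getattr(row, col) for col in wanted])

print(values)
```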

Here the corresponding value is accessed by its position in the tuple.
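Assuming this refers to positional access, a minimal sketch could look like the following (the sample data is hypothetical). With the default settings, position 0 holds the row index and the columns start at position 1.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"name": ["alice", "bob"], "account": ["a01", "b02"]})

pairs = []
for row in df.itertuples():
    # row[0] is the index; row[1] and row[2] are the first two columns
    pairs.append((row[1], row[2]))

print(pairs)
```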

Here the row's index value is retrieved through the Index field.
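As a sketch of the Index field, assuming a DataFrame with a hypothetical string index: the first field of every namedtuple produced by itertuples() is called `Index` and carries the row label.

```python
import pandas as pd

# Hypothetical sample data with an explicit string index.
df = pd.DataFrame({"name": ["alice", "bob"]}, index=["x", "y"])

# The row label is available as the Index field...
labels = [row.Index for row in df.itertuples()]

# ...and is also the element at position 0 of each tuple.
first_fields = [row[0] for row in df.itertuples()]

print(labels)
```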

Pandas: performance analysis of several common DataFrame row traversal methods

pandas is a major Python data analysis tool used by most data analysts. I happened to overhear a colleague complain that DataFrames are so slow! That instantly got my attention as a data person, so I went over to take a look. It turned out that the method being used was itself inefficient.

In daily work, traversing data row by row is a very common scenario. Especially since I moved from writing SQL to data analysis, I keep wanting to run

 select * from table1;

to take a quick look at the data as a whole. pandas has a few main ways to do this:

1. iterrows()

iterrows() converts each row of the DataFrame into a Series and yields it together with its index. Building a Series for every row requires type checking, so it takes a long time. (Not recommended)

for index, row in df.iterrows():
    # Dictionary-style access by column name
    print(index, row['c1'], row['c2'])
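One visible side effect of building a Series per row is that a row has a single dtype, so mixed numeric columns get upcast. The sketch below illustrates this with hypothetical data, reusing the `c1` and `c2` column names from the snippet above.

```python
import pandas as pd

# Hypothetical data: c1 holds integers, c2 holds floats.
df = pd.DataFrame({"c1": [1, 2], "c2": [1.5, 2.5]})

# Take the first (index, Series) pair produced by iterrows().
_, first_row = next(df.iterrows())

# The row Series has one dtype for all values, so the integer is upcast.
print(df["c1"].dtype, first_row.dtype)

# itertuples() keeps the value of each column as it is.
first_tuple = next(df.itertuples())
print(first_row["c1"], first_tuple.c1)
```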

2. itertuples()

itertuples() yields each row of the DataFrame as a namedtuple. Because a tuple is a lightweight, immutable object, this avoids the per-row Series construction and type checking that iterrows() performs. (High efficiency, recommended)

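for row in df.itertuples():
    # print(row)
    # Attribute access
    print(row.Index, row.name, row.account, row.pwd)
    # Equivalent access with getattr()
    print(row.Index, getattr(row, 'name'), getattr(row, 'account'), getattr(row, 'pwd'))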
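itertuples() also accepts two optional parameters, `index` and `name`. The following sketch shows their effect on hypothetical data that reuses the column names above; the tuple name "User" is an arbitrary choice for illustration.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"name": ["alice"], "account": ["a01"], "pwd": ["secret"]})

# index=False drops the Index field; name sets the namedtuple's type name.
named = list(df.itertuples(index=False, name="User"))

# name=None yields plain tuples instead of namedtuples.
plain = list(df.itertuples(index=False, name=None))

print(named[0])
print(plain[0])
```

Plain tuples are the closest itertuples() gets to the for + zip approach, since no namedtuple class has to be created.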

3. for + zip

This method zips the required columns together to build plain tuples by hand, ignoring the index data. (High efficiency, recommended)

for A, B in zip(df['A'], df['B']):
    print(A, B)
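To check the relative cost of the three approaches on your own machine, a rough timing sketch like the one below can help. The data size, the function names, and the number of repetitions are all arbitrary assumptions, and the absolute numbers will vary with hardware and pandas version.

```python
import timeit

import pandas as pd

# Hypothetical test data: two integer columns with 5,000 rows.
N = 5_000
df = pd.DataFrame({"A": range(N), "B": range(N)})


def with_iterrows():
    # One Series is built per row.
    return sum(row["A"] + row["B"] for _, row in df.iterrows())


def with_itertuples():
    # One namedtuple is built per row.
    return sum(row.A + row.B for row in df.itertuples())


def with_zip():
    # Plain tuples built directly from the two columns.
    return sum(a + b for a, b in zip(df["A"], df["B"]))


for fn in (with_iterrows, with_itertuples, with_zip):
    seconds = timeit.timeit(fn, number=3)
    print(f"{fn.__name__}: {seconds:.4f} s for 3 runs")
```

All three functions compute the same total, so only the traversal cost differs between them.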

Summary

The above is my personal experience. I hope it serves as a useful reference, and thank you for your support.