Iterating over a DataFrame with itertuples
Recently, while working on a recommender system exercise, I needed to build the item-item matrix and the user-item matrix, and found that the DataFrame method itertuples() is a very convenient way to traverse rows.
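The related methods are:
- iterrows(): iterates over the DataFrame as (index, Series) pairs, one per row
- iteritems(): iterates over the DataFrame as (column name, Series) pairs, one per column (deprecated in pandas 1.5 and removed in 2.0; use items() instead)
- itertuples(): iterates over the DataFrame rows as namedtuples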
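To make the difference between the three methods concrete, here is a minimal sketch of what each one yields. The DataFrame contents and the column names user and item are made up for illustration, and items() is used in place of iteritems() so that the snippet also runs on pandas 2.x.

```python
import pandas as pd

# Hypothetical example data, not taken from the original post.
df = pd.DataFrame({"user": ["u1", "u2"], "item": ["i1", "i2"]}, index=["a", "b"])

# iterrows(): one (index label, Series) pair per row.
rows = [(label, row["user"]) for label, row in df.iterrows()]

# items(): one (column name, Series) pair per column.
columns = [name for name, column in df.items()]

# itertuples(): one namedtuple per row; the index label is the first field.
tuples = [tuple(t) for t in df.itertuples()]

print(rows)
print(columns)
print(tuples)
```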
Values in each tuple returned by itertuples() can be accessed in several ways:
The getattr() function retrieves a field's value from the tuple by name.
A value can also be accessed by its positional index in the tuple.
The row's index label is available through the Index field.
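The three access styles described above can be sketched as follows. The column names user and rating and the data are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical example data, not taken from the original post.
df = pd.DataFrame({"user": ["u1", "u2"], "rating": [4, 5]})

by_name, by_position, labels = [], [], []
for row in df.itertuples():
    # Access a field by name with getattr().
    by_name.append(getattr(row, "rating"))
    # Access a field by position: 0 is the index label, 1 is the first column.
    by_position.append(row[1])
    # Access the row's index label through the Index field.
    labels.append(row.Index)

print(by_name, by_position, labels)
```

getattr() is mostly useful when the column name is held in a variable; when the name is known in advance, row.rating is equivalent.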
Pandas - Performance analysis of several common DataFrame row traversal methods
pandas is a major Python data analysis tool, used by most data analysts. By chance, I overheard a colleague complain that DataFrames are so slow. As a data person that caught my attention right away, so I went over to take a look, and it turned out that the method being used was itself inefficient.
In our daily work, traversing data row by row is a very common scenario. Especially since I moved from writing SQL to data analysis, I keep reaching for the equivalent of
select * from table1;
to take a quick look at the data as a whole. pandas has a few main ways to do this:
1. iterrows()
iterrows() converts each row of the DataFrame into a Series and yields it together with its index label. Building a Series for every row involves type checking and object construction, so it is slow. (Not recommended)
for index, row in df.iterrows():
    # Dictionary-style access by column name
    print(index, row['c1'], row['c2'])
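One side effect of turning every row into a Series is that a row has a single dtype, so values from differently typed columns are upcast. The sketch below illustrates this with the c1 and c2 column names from the loop above; the data itself is made up.

```python
import pandas as pd

# Hypothetical data: c1 holds integers, c2 holds floats.
df = pd.DataFrame({"c1": [1, 2], "c2": [1.5, 2.5]})

# iterrows() packs each row into one Series, so the integer in c1
# is upcast to float to share a dtype with c2.
first_label, first_row = next(df.iterrows())
print(first_row.dtype, first_row["c1"])

# itertuples() keeps each column's values separate, so c1 stays an integer.
first_tuple = next(df.itertuples())
print(first_tuple.c1)
```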
2. itertuples()
itertuples() converts each row of the DataFrame into a namedtuple and yields it. Because tuples are immutable, this avoids the type checking needed to build a Series. (Efficient, recommended)
for row in df.itertuples():
    # print(row)
    # Attribute access
    print(row.Index, row.name, row.account, row.pwd)
    # Equivalent access with getattr()
    print(row.Index, getattr(row, 'name'), getattr(row, 'account'), getattr(row, 'pwd'))
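itertuples() also accepts two optional parameters, index and name, which control whether the index label is included and what the tuple type is called. A minimal sketch, reusing the name, account and pwd columns from the loop above with made-up values (the tuple name User is a hypothetical choice):

```python
import pandas as pd

# Hypothetical example data, not taken from the original post.
df = pd.DataFrame({
    "name": ["alice", "bob"],
    "account": ["a01", "b02"],
    "pwd": ["x", "y"],
})

# index=False drops the index label; name sets the namedtuple's type name.
named = list(df.itertuples(index=False, name="User"))
print(named[0].name, named[0].account)

# name=None yields plain tuples instead of namedtuples.
plain = list(df.itertuples(index=False, name=None))
print(plain[0])
```

Plain tuples skip the namedtuple construction, at the cost of losing access by field name.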
3. for + zip
This approach builds plain tuples directly from the selected columns and ignores the index. (Efficient, recommended)
for A, B in zip(df['A'], df['B']):
    print(A, B)
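To check the relative speed of the three approaches on your own machine, a rough timing sketch along these lines can help. The DataFrame size, the helper function names and the repetition count are arbitrary assumptions, and the absolute numbers will vary with hardware and pandas version.

```python
import timeit

import pandas as pd

# Hypothetical test data: 2,000 rows with two integer columns.
N = 2_000
df = pd.DataFrame({"A": range(N), "B": range(N)})

def sum_iterrows():
    total = 0
    for _, row in df.iterrows():
        total += row["A"] + row["B"]
    return total

def sum_itertuples():
    total = 0
    for row in df.itertuples(index=False):
        total += row.A + row.B
    return total

def sum_zip():
    total = 0
    for a, b in zip(df["A"], df["B"]):
        total += a + b
    return total

# Time each approach over a few runs and print the elapsed seconds.
for func in (sum_iterrows, sum_itertuples, sum_zip):
    seconds = timeit.timeit(func, number=3)
    print(f"{func.__name__}: {seconds:.4f} s for 3 runs")
```

All three functions compute the same sum, so any difference in the timings comes from the iteration method alone.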
Summary
The above is based on my personal experience; I hope it serves as a useful reference.