Pandas: A detailed explanation of DataFrame.iterrows()

Pandas 2.2 DataFrame

Indexing, iteration

Method	Description
DataFrame.head([n])	Returns the first n rows of the DataFrame
DataFrame.at	Accesses or modifies a single value by row label and column label
DataFrame.iat	Accesses or modifies a single value by integer row and column position
DataFrame.loc	Accesses or modifies data in a DataFrame by labels (row labels and column labels)
DataFrame.iloc	Accesses or modifies data in a DataFrame by integer positions (row and column numbers)
DataFrame.insert(loc, column, value[, …])	Inserts a new column at the specified location in the DataFrame
DataFrame.__iter__()	Iterates over the column names of the DataFrame
DataFrame.items()	Iterates over the DataFrame as (column name, column data) pairs
DataFrame.keys()	Returns the column names of the DataFrame
DataFrame.iterrows()	Iterates over the DataFrame row by row
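To make the methods listed above more concrete, here is a brief sketch of a few of them on a small DataFrame. The sample data and the new column name 'D' are assumptions made up for illustration only.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]},
                  index=['row1', 'row2', 'row3'])

first_two = df.head(2)            # first 2 rows
by_label = df.at['row2', 'B']     # single value by row/column label
by_position = df.iat[1, 1]        # the same value by integer position
row_by_label = df.loc['row1']     # row 'row1' as a Series
row_by_position = df.iloc[0]      # first row as a Series

# insert() modifies df in place: add a new column 'D' at position 1.
df.insert(1, 'D', [10, 20, 30])

column_names = list(df.keys())    # same labels as df.columns
columns_via_items = [name for name, column in df.items()]
print(column_names)
```

Note that insert() changes the DataFrame in place and returns nothing, whereas the other calls shown here only read from it.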

pandas.DataFrame.iterrows()

The pandas.DataFrame.iterrows() method iterates over a DataFrame row by row. Each iteration returns a tuple containing the row index and the row data.

The row data is returned as a Series object, whose index is the column names and whose values are that row's values in the corresponding columns.
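One consequence of packing each row into a single Series is that the row has one common dtype, so per-column dtypes are not necessarily preserved. The following is a minimal sketch of that behavior; the column names 'qty' and 'price' are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical data: one integer column and one float column.
df = pd.DataFrame({'qty': [1, 2], 'price': [1.5, 2.5]})

# Take only the first (index, row) pair from the iterator.
index, row = next(df.iterrows())

print(type(row).__name__)       # the row is a Series
print(list(row.index))          # its index holds the column names
print(df['qty'].dtype)          # dtype of the column in the DataFrame
print(row.dtype)                # single dtype shared by the whole row
print(row['qty'])               # the integer was upcast to fit that dtype
```

If the original dtypes matter, itertuples() is usually a better fit, since it returns each row as a namedtuple of the individual values.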

Syntax:

for index, row in df.iterrows():
    # Process row index and row data

Example:

Suppose we have a DataFrame as follows:

import pandas as pd

data = {
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]
}

df = pd.DataFrame(data, index=['row1', 'row2', 'row3'])
print(df)

Output:

      A  B  C
row1  1  4  7
row2  2  5  8
row3  3  6  9

Iterate over row indexes and row data

Use the iterrows() method to iterate over the DataFrame row by row:

for index, row in df.iterrows():
    print(f"Index: {index}")
    print(f"Row: {row}")
    print()

Output:

Index: row1
Row: A    1
B    4
C    7
Name: row1, dtype: int64

Index: row2
Row: A    2
B    5
C    8
Name: row2, dtype: int64

Index: row3
Row: A    3
B    6
C    9
Name: row3, dtype: int64

Access values for specific columns

While iterating over the rows, you can access the value of a specific column by its name:

for index, row in df.iterrows():
    print(f"Index: {index}, A: {row['A']}, B: {row['B']}, C: {row['C']}")

Output:

Index: row1, A: 1, B: 4, C: 7
Index: row2, A: 2, B: 5, C: 8
Index: row3, A: 3, B: 6, C: 9

Notes:

Performance: iterrows() performs poorly on large DataFrames because it converts each row to a Series object, which adds overhead on every iteration. For performance-sensitive operations, itertuples() or vectorized operations are recommended.
Modifying data: Modifying the DataFrame while iterating over it may lead to unpredictable results. Depending on the data types, the row returned by iterrows() can be a copy rather than a view, in which case writing to it has no effect on the DataFrame. If you need to modify the data, create a copy first or use another method.
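The alternatives mentioned in these notes can be sketched as follows, reusing the example DataFrame from above. The 'total' column and the condition on column A are hypothetical and serve only as illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]},
    index=['row1', 'row2', 'row3'],
)

# itertuples() yields lightweight namedtuples instead of Series objects.
sums_from_tuples = [t.A + t.B + t.C for t in df.itertuples()]

# Vectorized version: no Python-level loop at all.
sums_vectorized = df[['A', 'B', 'C']].sum(axis=1)

# To change data, work on a copy and assign through the DataFrame itself
# rather than through the row object returned by iterrows().
result = df.copy()
result['total'] = sums_vectorized
result.loc[result['A'] > 1, 'B'] = 0

print(result)
```

Both approaches compute the same row sums. The vectorized form is generally the one to try first, with itertuples() as a fallback when the logic really does need a per-row loop.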

Summary

The iterrows() method provides a way to iterate over a DataFrame row by row. Each iteration returns a tuple containing the row index and the row data.

Although it is easy to use, pay attention to its performance when dealing with large data sets. For scenarios where data genuinely needs to be processed row by row, iterrows() is a useful tool.

The above is based on my personal experience. I hope it serves as a useful reference.