Pandas 2.2 DataFrame
Indexing, iteration
Method | Description |
---|---|
`DataFrame.head([n])` | Returns the first n rows of the DataFrame |
`DataFrame.at` | Accesses or modifies a single value by row label and column label |
`DataFrame.iat` | Accesses or modifies a single value by integer row and column position |
`DataFrame.loc` | Accesses or modifies data by label (row labels and column labels) |
`DataFrame.iloc` | Accesses or modifies data by integer position (row and column numbers) |
`DataFrame.insert(loc, column, value[, …])` | Inserts a new column at the specified location in the DataFrame |
`DataFrame.__iter__()` | Iterates over the column names of the DataFrame |
`DataFrame.items()` | Iterates over the column names and column data of the DataFrame |
`DataFrame.keys()` | Returns the column names of the DataFrame |
`DataFrame.iterrows()` | Iterates over the DataFrame row by row |
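As a quick orientation, the sketch below exercises the accessors and iterators from the table on a small DataFrame. The data, the column name `X`, and the variable names are made up for illustration only.

```python
import pandas as pd

# Illustrative data (not from the article)
df = pd.DataFrame(
    {"A": [1, 2, 3], "B": [4, 5, 6]},
    index=["row1", "row2", "row3"],
)

first_two = df.head(2)              # first 2 rows
value_by_label = df.at["row2", "B"]  # single value by row/column label
value_by_pos = df.iat[1, 1]          # same cell by integer position
row_by_label = df.loc["row1"]        # one row (a Series) selected by label
row_by_pos = df.iloc[0]              # the same row selected by position

# Insert a new column named "X" at column position 1
df.insert(1, "X", [10, 20, 30])

column_names = list(df)              # iterating a DataFrame yields column names
keys = list(df.keys())               # keys() returns the same column labels
pairs = [(name, col.tolist()) for name, col in df.items()]  # (name, data) pairs

print(column_names)
print(pairs)
```

Note that `at` and `iat` address a single cell, while `loc` and `iloc` can also select whole rows, columns, or slices.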
DataFrame.iterrows()
The DataFrame.iterrows() method iterates over a DataFrame row by row. Each iteration returns a tuple containing the row index and the row data.
The row data is returned as a Series object whose index is the column names and whose values are the values of that row in each column.
- Syntax:

    for index, row in DataFrame.iterrows():
        ...  # Process row index and row data

- Example:
Suppose we have a DataFrame as follows:

    import pandas as pd

    data = {
        'A': [1, 2, 3],
        'B': [4, 5, 6],
        'C': [7, 8, 9]
    }
    # Build the DataFrame with custom row labels
    df = pd.DataFrame(data, index=['row1', 'row2', 'row3'])
    print(df)

Output:

          A  B  C
    row1  1  4  7
    row2  2  5  8
    row3  3  6  9

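To make the "tuple of row index and Series" description concrete, here is a small sketch that pulls only the first item out of the iterator and inspects it. It rebuilds the same example DataFrame so that it runs on its own; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]},
    index=["row1", "row2", "row3"],
)

# iterrows() returns an iterator; next() fetches its first item
first_item = next(df.iterrows())
index, row = first_item

print(type(first_item).__name__)  # tuple
print(index)                      # row1
print(type(row).__name__)         # Series
print(list(row.index))            # ['A', 'B', 'C']
print(row.name)                   # row1
```

The Series takes the row label as its `name` and the column names as its index, which is why `row['A']` works inside the loop.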
Iterate over row indexes and row data
Use the iterrows() method to iterate over the DataFrame row by row:

    for index, row in df.iterrows():
        print(f"Index: {index}")
        print(f"Row: {row}")
        print()

Output:

    Index: row1
    Row: A    1
    B    4
    C    7
    Name: row1, dtype: int64

    Index: row2
    Row: A    2
    B    5
    C    8
    Name: row2, dtype: int64

    Index: row3
    Row: A    3
    B    6
    C    9
    Name: row3, dtype: int64

Access values for specific columns
While iterating over the rows, you can access the value of a specific column by its name:

    for index, row in df.iterrows():
        print(f"Index: {index}, A: {row['A']}, B: {row['B']}, C: {row['C']}")

Output:
Index: row1, A: 1, B: 4, C: 7
Index: row2, A: 2, B: 5, C: 8
Index: row3, A: 3, B: 6, C: 9
Notes:
- Performance issues: iterrows() performs poorly on large DataFrames because it converts each row to a Series object, which adds overhead. For performance-sensitive operations, itertuples() or vectorized operations are recommended.
- Modify data: Modifying the DataFrame during iteration may lead to unpredictable results. If you need to modify the data, it is recommended to create a copy first or use another method.
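The two alternatives named in the performance note can be sketched as follows. The task (summing the three columns of each row) is an illustrative assumption chosen only to compare the two styles; no timings are claimed here.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]},
    index=["row1", "row2", "row3"],
)

# itertuples() yields one namedtuple per row instead of a Series;
# the row label is available as .Index and columns as attributes.
sums_itertuples = [t.A + t.B + t.C for t in df.itertuples()]
labels = [t.Index for t in df.itertuples()]

# Vectorized version: operate on whole columns at once, no Python loop.
sums_vectorized = df["A"] + df["B"] + df["C"]

print(labels)
print(sums_itertuples)
print(sums_vectorized.tolist())
```

When the computation can be written in terms of whole columns, the vectorized form is usually the first thing to try; itertuples() is a fallback for logic that genuinely needs a per-row loop.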
Summary
The iterrows() method provides a way to iterate over a DataFrame row by row. Each iteration returns a tuple containing the row index and the row data.
Although it is easy to use, pay attention to its performance when dealing with large data sets. For scenarios where data needs to be processed row by row, iterrows() is a useful tool.
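One possible pattern for row-by-row processing that avoids modifying the DataFrame while iterating is to collect the results in a list and attach them to a copy afterwards. The `classify` helper, the threshold, and the `label` column below are hypothetical and serve only as an illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]},
    index=["row1", "row2", "row3"],
)

def classify(row):
    # Hypothetical per-row rule, used only for this sketch
    return "high" if row["A"] + row["B"] > 7 else "low"

# Read each row during iteration, but do not write to df inside the loop
labels = []
for index, row in df.iterrows():
    labels.append(classify(row))

# Attach the results to a copy once the loop has finished
result = df.copy()
result["label"] = labels

print(result)
```

The original `df` stays unchanged, and the new column is created in a single assignment after the iteration is complete.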
The above is based on personal experience. I hope it serves as a useful reference.