When working with a Pandas DataFrame, we often need to filter its data and keep only the rows that meet specific conditions. With a large amount of data, traversing the DataFrame row by row is inefficient. Here is a specific example:
Date        #1  #2  #3  #4
1/1/1999     4   2   4   5
1/2/1999     5   2   3   3
1/3/1999     5   2   3   8
1/4/1999     6   4   2   6
1/5/1999     8   3   4   7
1/6/1999     3   2   3   8
1/7/1999     1   3   4   1
We want to select the rows that meet the following condition:
- The value in column #1 of the previous row plus the value in column #4 of the row two rows earlier is greater than or equal to 6.
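To make the condition concrete, the sketch below lines up, for each row, the previous row's #1 value and the #4 value from two rows earlier, using shift() in the same way as the solutions further down. The helper column names prev_1, prev2_4, total and meets are illustrative only and not part of the original data.

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['1/1/1999', '1/2/1999', '1/3/1999', '1/4/1999', '1/5/1999', '1/6/1999', '1/7/1999'],
    '#1': [4, 5, 5, 6, 8, 3, 1],
    '#2': [2, 2, 2, 4, 3, 2, 3],
    '#3': [4, 3, 3, 2, 4, 3, 4],
    '#4': [5, 3, 8, 6, 7, 8, 1],
})

# shift(n) moves a column down n rows, so row i sees the value from row i - n;
# the first n rows have nothing above them and become NaN
view = pd.DataFrame({
    'Date': df['Date'],
    'prev_1': df['#1'].shift(1),   # hypothetical name: column #1 of the previous row
    'prev2_4': df['#4'].shift(2),  # hypothetical name: column #4 two rows earlier
})
view['total'] = view['prev_1'] + view['prev2_4']
view['meets'] = view['total'] >= 6  # NaN >= 6 evaluates to False
print(view)
```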
For each condition, we can create a Boolean mask array in which True means the condition is met and False means it is not. We can then use these masks to filter the DataFrame, keeping only the rows that satisfy all the criteria.
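When there are several conditions, one way to require all of them at once, as an alternative to chaining & by hand, is to stack the individual masks side by side and keep the rows where every mask is True. A minimal sketch using the condition above plus, as an example second condition, column #2 greater than 2 (the extra condition used in approach 2 below); the list name masks is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['1/1/1999', '1/2/1999', '1/3/1999', '1/4/1999', '1/5/1999', '1/6/1999', '1/7/1999'],
    '#1': [4, 5, 5, 6, 8, 3, 1],
    '#2': [2, 2, 2, 4, 3, 2, 3],
    '#3': [4, 3, 3, 2, 4, 3, 4],
    '#4': [5, 3, 8, 6, 7, 8, 1],
})

# One Boolean Series per condition, all aligned on df's index
masks = [
    df['#1'].shift(1) + df['#4'].shift(2) >= 6,
    df['#2'] > 2,
]

# Stack the masks as columns and keep rows where every condition holds
all_met = pd.concat(masks, axis=1).all(axis=1)
newdf = df[all_met]
print(newdf)
```

Adding another condition then only means appending one more Series to the list.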
Solution
1) Use a mask array to filter the DataFrame
Pandas offers an efficient, vectorized way to filter a DataFrame: index it with a Boolean mask, which keeps the rows where the mask is True. To look at earlier rows, the mask below uses shift(n), which moves a column down by n rows so that each row sees the value from n rows above it. The first n rows have nothing above them and become NaN, and a comparison with NaN evaluates to False, so those rows are never selected.
import pandas as pd

# Create a DataFrame
df = pd.DataFrame({
    'Date': ['1/1/1999', '1/2/1999', '1/3/1999', '1/4/1999', '1/5/1999', '1/6/1999', '1/7/1999'],
    '#1': [4, 5, 5, 6, 8, 3, 1],
    '#2': [2, 2, 2, 4, 3, 2, 3],
    '#3': [4, 3, 3, 2, 4, 3, 4],
    '#4': [5, 3, 8, 6, 7, 8, 1],
})

# Mask: column #1 of the previous row plus column #4 two rows earlier >= 6
# (rows 0 and 1 get NaN from shift(), which compares as False)
mask = (df['#1'].shift(1) + df['#4'].shift(2) >= 6)

# Filter the DataFrame using the mask
newdf = df[mask]

# Print the filtered DataFrame
print(newdf)
The output is:
       Date  #1  #2  #3  #4
2  1/3/1999   5   2   3   8
3  1/4/1999   6   4   2   6
4  1/5/1999   8   3   4   7
5  1/6/1999   3   2   3   8
6  1/7/1999   1   3   4   1
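Indexing with df[mask] selects the same rows as df.loc[mask]; the loc form additionally accepts a column selection in the same step. A small sketch reusing the mask above and keeping only the Date and #1 columns, which is an arbitrary choice for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['1/1/1999', '1/2/1999', '1/3/1999', '1/4/1999', '1/5/1999', '1/6/1999', '1/7/1999'],
    '#1': [4, 5, 5, 6, 8, 3, 1],
    '#2': [2, 2, 2, 4, 3, 2, 3],
    '#3': [4, 3, 3, 2, 4, 3, 4],
    '#4': [5, 3, 8, 6, 7, 8, 1],
})
mask = df['#1'].shift(1) + df['#4'].shift(2) >= 6

# df[mask] and df.loc[mask] select the same rows
same_rows = df[mask].equals(df.loc[mask])

# loc can filter rows and pick columns at once
subset = df.loc[mask, ['Date', '#1']]
print(same_rows)
print(subset)
```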
2) Use logical operators to combine conditions
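We can combine multiple conditions with the element-wise logical operators & (and) and | (or) to form a new Boolean mask, and then use that mask to filter the DataFrame. Wrap each condition in parentheses, because & and | bind more tightly than comparison operators such as >= and >. The example below adds a second condition, that column #2 is greater than 2, and keeps only the rows that satisfy both.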
import pandas as pd

# Create a DataFrame
df = pd.DataFrame({
    'Date': ['1/1/1999', '1/2/1999', '1/3/1999', '1/4/1999', '1/5/1999', '1/6/1999', '1/7/1999'],
    '#1': [4, 5, 5, 6, 8, 3, 1],
    '#2': [2, 2, 2, 4, 3, 2, 3],
    '#3': [4, 3, 3, 2, 4, 3, 4],
    '#4': [5, 3, 8, 6, 7, 8, 1],
})

# Combine both conditions with & (each one wrapped in parentheses)
mask = (df['#1'].shift(1) + df['#4'].shift(2) >= 6) & (df['#2'] > 2)

# Filter the DataFrame using the mask
newdf = df[mask]

# Print the filtered DataFrame
print(newdf)
The output is:
       Date  #1  #2  #3  #4
3  1/4/1999   6   4   2   6
4  1/5/1999   8   3   4   7
6  1/7/1999   1   3   4   1
We can see that three rows satisfy both conditions.
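The example above only uses &. The same pattern works with | (at least one condition holds) and ~ (negation), and each comparison still needs its own parentheses or its own variable. The sketch below also shows why Python's plain and cannot replace &: it tries to turn a whole Series into a single True/False and raises ValueError. The names cond_sum and cond_col2 are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['1/1/1999', '1/2/1999', '1/3/1999', '1/4/1999', '1/5/1999', '1/6/1999', '1/7/1999'],
    '#1': [4, 5, 5, 6, 8, 3, 1],
    '#2': [2, 2, 2, 4, 3, 2, 3],
    '#3': [4, 3, 3, 2, 4, 3, 4],
    '#4': [5, 3, 8, 6, 7, 8, 1],
})

cond_sum = df['#1'].shift(1) + df['#4'].shift(2) >= 6
cond_col2 = df['#2'] > 2

either = df[cond_sum | cond_col2]     # at least one condition holds
not_col2 = df[cond_sum & ~cond_col2]  # first condition holds, second does not

# `and` asks for the truth value of a whole Series, which is ambiguous
try:
    df[cond_sum and cond_col2]
except ValueError as exc:
    print('ValueError:', exc)

print(either)
print(not_col2)
```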
3) Use the query() method to filter the DataFrame
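Pandas also provides a query() method for filtering a DataFrame. It lets us write the filtering condition as a string expression that evaluates to a Boolean for each row: rows where it is True are kept, and rows where it is False are dropped. query() evaluates a restricted expression syntax in which calls such as shift() are not generally supported, and names like #1 are not valid Python identifiers (# starts a comment in Python syntax), so the example below first builds a helper DataFrame whose columns have plain identifier names and queries that instead.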
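import pandas as pd

# Create a DataFrame
df = pd.DataFrame({
    'Date': ['1/1/1999', '1/2/1999', '1/3/1999', '1/4/1999', '1/5/1999', '1/6/1999', '1/7/1999'],
    '#1': [4, 5, 5, 6, 8, 3, 1],
    '#2': [2, 2, 2, 4, 3, 2, 3],
    '#3': [4, 3, 3, 2, 4, 3, 4],
    '#4': [5, 3, 8, 6, 7, 8, 1],
})

# Helper frame with identifier-friendly column names (prev_sum and col2 are
# names chosen for this example), since the query string cannot call shift()
# or cleanly reference columns whose names contain '#'
helpers = pd.DataFrame({
    'prev_sum': df['#1'].shift(1) + df['#4'].shift(2),
    'col2': df['#2'],
})

# Filter with a string expression, then select the matching rows from df
newdf = df.loc[helpers.query('(prev_sum >= 6) & (col2 > 2)').index]

# Print the filtered DataFrame
print(newdf)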
The output is:
       Date  #1  #2  #3  #4
3  1/4/1999   6   4   2   6
4  1/5/1999   8   3   4   7
6  1/7/1999   1   3   4   1
We can see that the same three rows satisfy both conditions.
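Another option, sketched here under the assumption that renaming the columns is acceptable, is to give the # columns valid Python identifiers once, add the shifted sum as a temporary column with assign(), and then write the whole condition in the query string. The new names c1 to c4 and prev_sum are hypothetical, and the threshold is passed in as a local variable with the @ prefix, which query() supports.

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['1/1/1999', '1/2/1999', '1/3/1999', '1/4/1999', '1/5/1999', '1/6/1999', '1/7/1999'],
    '#1': [4, 5, 5, 6, 8, 3, 1],
    '#2': [2, 2, 2, 4, 3, 2, 3],
    '#3': [4, 3, 3, 2, 4, 3, 4],
    '#4': [5, 3, 8, 6, 7, 8, 1],
})

# Hypothetical renaming so the query string can name every column directly
renamed = df.rename(columns={'#1': 'c1', '#2': 'c2', '#3': 'c3', '#4': 'c4'})

threshold = 6  # local variable, referenced inside the string as @threshold

newdf = (
    renamed
    .assign(prev_sum=renamed['c1'].shift(1) + renamed['c4'].shift(2))  # temporary helper column
    .query('prev_sum >= @threshold and c2 > 2')
    .drop(columns='prev_sum')
)
print(newdf)
```

This keeps the filter readable as a single expression, at the cost of working with renamed columns afterwards.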
4) Use the iterrows() method to filter the DataFrame
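We can use the iterrows() method to traverse the DataFrame row by row and keep the rows that meet the conditions. Because iterrows() yields one row at a time as a Series, values from earlier rows have to be looked up in the DataFrame itself. This is also the slowest approach, since the test runs in Python for every row, so it is best kept for small DataFrames or for logic that cannot be vectorized.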
import pandas as pd

# Create a DataFrame
df = pd.DataFrame({
    'Date': ['1/1/1999', '1/2/1999', '1/3/1999', '1/4/1999', '1/5/1999', '1/6/1999', '1/7/1999'],
    '#1': [4, 5, 5, 6, 8, 3, 1],
    '#2': [2, 2, 2, 4, 3, 2, 3],
    '#3': [4, 3, 3, 2, 4, 3, 4],
    '#4': [5, 3, 8, 6, 7, 8, 1],
})

# Collect the index labels of rows that meet the criteria
keep = []

# Traverse the DataFrame row by row
for index, row in df.iterrows():
    # Rows 0 and 1 have no row two places earlier, so they cannot qualify
    if index < 2:
        continue
    # Look up earlier rows in df; this relies on the default RangeIndex (0, 1, 2, ...)
    prev_sum = df.at[index - 1, '#1'] + df.at[index - 2, '#4']
    if prev_sum >= 6 and row['#2'] > 2:
        keep.append(index)

# Select the kept rows (df.loc preserves the original column dtypes)
newdf = df.loc[keep]

# Print the filtered DataFrame
print(newdf)
The output is:
       Date  #1  #2  #3  #4
3  1/4/1999   6   4   2   6
4  1/5/1999   8   3   4   7
6  1/7/1999   1   3   4   1
We can see that the row-by-row loop selects the same three rows as the vectorized approaches.
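The introduction's point that row-by-row traversal is inefficient can be checked with a rough timing sketch like the one below. The 2,000-row random DataFrame and the function names vectorized and row_by_row are hypothetical; absolute timings depend on the machine and pandas version, but the vectorized mask is typically much faster than the iterrows() loop.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical test data: random integers in the same '#1'..'#4' column layout
rng = np.random.default_rng(0)
n = 2_000
df = pd.DataFrame({f'#{i}': rng.integers(1, 10, size=n) for i in range(1, 5)})


def vectorized(frame):
    # Whole-column operations: build one mask, index once
    mask = (frame['#1'].shift(1) + frame['#4'].shift(2) >= 6) & (frame['#2'] > 2)
    return frame[mask]


def row_by_row(frame):
    # The iterrows() approach; assumes the default RangeIndex
    keep = []
    for index, row in frame.iterrows():
        if index < 2:
            continue
        if frame.at[index - 1, '#1'] + frame.at[index - 2, '#4'] >= 6 and row['#2'] > 2:
            keep.append(index)
    return frame.loc[keep]


# Both approaches should select exactly the same rows
assert vectorized(df).equals(row_by_row(df))

print('vectorized:', timeit.timeit(lambda: vectorized(df), number=3))
print('iterrows:  ', timeit.timeit(lambda: row_by_row(df), number=3))
```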