Efficiently accessing rows in a pandas DataFrame that meet specific conditions

When working with a pandas DataFrame, we often need to filter the data and keep only the rows that meet specific conditions. If the data volume is large, traversing the DataFrame row by row is inefficient. Here is a specific example:

Date      #1  #2  #3  #4
1/1/1999   4   2   4   5
1/2/1999   5   2   3   3
1/3/1999   5   2   3   8
1/4/1999   6   4   2   6
1/5/1999   8   3   4   7
1/6/1999   3   2   3   8
1/7/1999   1   3   4   1

We want to select the rows that meet the following condition:

The value of column #1 in the previous row plus the value of column #4 two rows earlier is greater than or equal to 6.

For each condition, we can create a Boolean mask array in which True indicates that the condition is met and False indicates that it is not. We can then use these mask arrays to filter the DataFrame, selecting only the rows that satisfy all the criteria.

Solution

1) Use mask array to filter DataFrame

Pandas provides a very efficient way to filter data in a DataFrame: indexing with a boolean array. Passing the mask to df[...] keeps the rows where the mask is True and drops the rows where it is False.

import pandas as pd

# Create a DataFrame
df = pd.DataFrame({
    'Date': ['1/1/1999', '1/2/1999', '1/3/1999', '1/4/1999', '1/5/1999', '1/6/1999', '1/7/1999'],
    '#1': [4, 5, 5, 6, 8, 3, 1],
    '#2': [2, 2, 2, 4, 3, 2, 3],
    '#3': [4, 3, 3, 2, 4, 3, 4],
    '#4': [5, 3, 8, 6, 7, 8, 1]
})

# Create a mask array to represent rows that meet the criteria
mask = (df['#1'].shift(1) + df['#4'].shift(2) >= 6)

# Filter the DataFrame using the mask array
newdf = df[mask]

# Print the filtered DataFrame
print(newdf)

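The output is as follows:

       Date  #1  #2  #3  #4
2  1/3/1999   5   2   3   8
3  1/4/1999   6   4   2   6
4  1/5/1999   8   3   4   7
5  1/6/1999   3   2   3   8
6  1/7/1999   1   3   4   1

The first two rows can never be selected, because shift() leaves NaN where no earlier row exists and a comparison with NaN is False.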
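
To see how shift() lines up earlier rows with the current one, it can help to print the intermediate values next to each other. The following is only a small sketch: the labels prev_1, prev2_4, total and selected are names chosen for this illustration and are not part of the original data.

```python
import pandas as pd

df = pd.DataFrame({
    '#1': [4, 5, 5, 6, 8, 3, 1],
    '#4': [5, 3, 8, 6, 7, 8, 1],
})

# Lay the shifted columns side by side (labels are illustrative only)
steps = pd.DataFrame({
    'prev_1': df['#1'].shift(1),    # value of #1 one row earlier
    'prev2_4': df['#4'].shift(2),   # value of #4 two rows earlier
})
steps['total'] = steps['prev_1'] + steps['prev2_4']

# NaN >= 6 evaluates to False, so rows without enough history are excluded
steps['selected'] = steps['total'] >= 6

print(steps)
```

Rows 0 and 1 show NaN in the total column, which is why the mask is False for them.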

2) Use logical operators to combine conditions

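We can use logical operators (such as & and |) to combine multiple conditions into a new Boolean mask array. Each condition must be wrapped in parentheses, because & and | bind more tightly than comparison operators. We can then use this new mask array to filter the DataFrame, selecting only the rows that satisfy all the criteria.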

import pandas as pd

# Create a DataFrame
df = pd.DataFrame({
    'Date': ['1/1/1999', '1/2/1999', '1/3/1999', '1/4/1999', '1/5/1999', '1/6/1999', '1/7/1999'],
    '#1': [4, 5, 5, 6, 8, 3, 1],
    '#2': [2, 2, 2, 4, 3, 2, 3],
    '#3': [4, 3, 3, 2, 4, 3, 4],
    '#4': [5, 3, 8, 6, 7, 8, 1]
})

# Create a mask array to represent rows that meet the criteria
mask = ((df['#1'].shift(1) + df['#4'].shift(2) >= 6) & (df['#2'] > 2))

# Filter the DataFrame using the mask array
newdf = df[mask]

# Print the filtered DataFrame
print(newdf)

The output is as follows:

       Date  #1  #2  #3  #4
3  1/4/1999   6   4   2   6
4  1/5/1999   8   3   4   7
6  1/7/1999   1   3   4   1

We can see that only three rows satisfy both conditions.
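
Besides &, masks can be combined with | (or) and inverted with ~ (not). The sketch below uses two simple conditions chosen only for illustration (they are not the shifted condition from the example) to show all three operators. Note that Python's own and / or keywords cannot be used on a Series; they raise a ValueError.

```python
import pandas as pd

df = pd.DataFrame({
    '#1': [4, 5, 5, 6, 8, 3, 1],
    '#2': [2, 2, 2, 4, 3, 2, 3],
})

high_1 = df['#1'] >= 6   # True where #1 is at least 6
high_2 = df['#2'] > 2    # True where #2 is greater than 2

either = df[high_1 | high_2]      # OR: at least one condition holds
both = df[high_1 & high_2]        # AND: both conditions hold
neither = df[~(high_1 | high_2)]  # NOT: neither condition holds

print(either.index.tolist())   # [3, 4, 6]
print(both.index.tolist())     # [3, 4]
print(neither.index.tolist())  # [0, 1, 2, 5]
```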

3) Use the query() method to filter DataFrame

Pandas also provides a query() method for filtering a DataFrame. The query() method lets us specify the filtering condition as a string expression. The expression must evaluate to a Boolean value for each row, where True indicates that the conditions are met and False indicates that they are not.

import pandas as pd

# Create a DataFrame
df = pd.DataFrame({
    'Date': ['1/1/1999', '1/2/1999', '1/3/1999', '1/4/1999', '1/5/1999', '1/6/1999', '1/7/1999'],
    '#1': [4, 5, 5, 6, 8, 3, 1],
    '#2': [2, 2, 2, 4, 3, 2, 3],
    '#3': [4, 3, 3, 2, 4, 3, 4],
    '#4': [5, 3, 8, 6, 7, 8, 1]
})

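# query() parses its string as Python-like code, where '#' starts a comment,
# so the columns are given identifier-safe names first
# (c1..c4 are illustrative names, not part of the original data)
renamed = df.rename(columns=lambda name: name.replace('#', 'c'))

# Use the query() method to filter, then select the matching rows from df
matches = renamed.query('(c1.shift(1) + c4.shift(2) >= 6) & (c2 > 2)')
newdf = df.loc[matches.index]

# Print the filtered DataFrame
print(newdf)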

The output is as follows:

       Date  #1  #2  #3  #4
3  1/4/1999   6   4   2   6
4  1/5/1999   8   3   4   7
6  1/7/1999   1   3   4   1

We can see that only three rows satisfy both conditions.
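
A query() string can also read ordinary Python variables by prefixing them with @. The sketch below assumes a small frame whose column names a and b are hypothetical, chosen because they are valid identifiers (unlike #1 and #2); the threshold variable is likewise just an illustration. It checks that the query gives the same rows as the equivalent mask.

```python
import pandas as pd

# Hypothetical column names that are valid Python identifiers
df = pd.DataFrame({
    'a': [4, 5, 5, 6, 8, 3, 1],
    'b': [2, 2, 2, 4, 3, 2, 3],
})

threshold = 6  # ordinary Python variable, referenced as @threshold below

by_query = df.query('(a >= @threshold) & (b > 2)')
by_mask = df[(df['a'] >= threshold) & (df['b'] > 2)]

# Both approaches select the same rows
print(by_query.equals(by_mask))
```

Whether to use query() or a mask is largely a matter of readability; the string form tends to be shorter when many columns are involved.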

4) Use the iterrows() method to filter DataFrame

We can use the iterrows() method to traverse the DataFrame row by row and collect the rows that meet the conditions. This is much slower than the vectorized approaches above on large data, so it is best kept for small DataFrames.

import pandas as pd

# Create a DataFrame
df = pd.DataFrame({
    'Date': ['1/1/1999', '1/2/1999', '1/3/1999', '1/4/1999', '1/5/1999', '1/6/1999', '1/7/1999'],
    '#1': [4, 5, 5, 6, 8, 3, 1],
    '#2': [2, 2, 2, 4, 3, 2, 3],
    '#3': [4, 3, 3, 2, 4, 3, 4],
    '#4': [5, 3, 8, 6, 7, 8, 1]
})

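# Create an empty list to store rows that meet the criteria
rows = []

# Traverse the DataFrame row by row, tracking each row's position
for pos, (index, row) in enumerate(df.iterrows()):
    # The first two rows have no row two positions earlier, so skip them
    if pos < 2:
        continue
    # Same condition as the mask: #1 from the previous row plus #4 from two rows earlier
    shifted_sum = df['#1'].iloc[pos - 1] + df['#4'].iloc[pos - 2]
    if (shifted_sum >= 6) and (row['#2'] > 2):
        # Add the current row to the list
        rows.append(row)

# Convert the list to a DataFrame
newdf = pd.DataFrame(rows)

# Print the filtered DataFrame
print(newdf)

The output is as follows:

       Date  #1  #2  #3  #4
3  1/4/1999   6   4   2   6
4  1/5/1999   8   3   4   7
6  1/7/1999   1   3   4   1

We can see that three rows satisfy both conditions.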
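
To get a feel for the cost of row-by-row traversal, the two styles can be timed on a larger frame. This is only a rough sketch: the data is synthetic, the size and random seed are arbitrary, the conditions are simplified (no shifting), and the timings will vary from machine to machine.

```python
import time

import numpy as np
import pandas as pd

# Synthetic data; size and seed are arbitrary choices for this illustration
rng = np.random.default_rng(0)
big = pd.DataFrame(rng.integers(1, 10, size=(20_000, 2)), columns=['#1', '#2'])

# Vectorized: one Boolean mask for the whole frame
start = time.perf_counter()
fast = big[(big['#1'] >= 6) & (big['#2'] > 2)]
mask_seconds = time.perf_counter() - start

# Row by row: iterrows() builds a Series object for every row
start = time.perf_counter()
kept = [idx for idx, row in big.iterrows() if row['#1'] >= 6 and row['#2'] > 2]
slow = big.loc[kept]
loop_seconds = time.perf_counter() - start

print(f'mask: {mask_seconds:.4f}s, iterrows: {loop_seconds:.4f}s')
print(fast.equals(slow))  # both approaches select the same rows
```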
